Interim Report on Status of P5 [MEW07]
Lou Burnard and Sebastian Rahtz
12 Jan 2004
Licensed under
Revised from SPQR draft of 26 Nov
16 September 2007
Converted to P5
Lou Burnard
Revision: #9
This document attempts to cover several different things, which may
at some stage be better unravelled into different documents.
  Summarize and clarify our strategic goals in revising TEI P5
  List some specific steps we currently identify as necessary to accomplish those goals
  Request endorsement from the Council for those steps
  Document the current state of affairs with respect to the new ODD format
The document includes a number of specific technical proposals for
change which we would like the Council to endorse. Without this
agreement, we cannot make much further progress.
The interdependency of many of the steps involved makes this a
difficult process. For example, we have agreed to produce a complete
revision of the Tag Set Documentation chapter of P4, so that TEI P5
will be truly self-documenting. But we cannot produce this chapter until
we have finalized our revisions of TEI ODD format. But we cannot
finalize those until we have thought through the documentary consequences of
other changes in the TEI scheme as a whole. This is not simply a
boot-strapping problem (though certainly that is part of it): it has
to be possible to revise the
draft document defining the ODD format independently of the review
process, since we may need to revise the ODD format in order to produce
other documents for review!
Strategic Goals
These may be summarized under the following headings:
  interoperability: taking advantage of the work done by others
  expansion: addressing areas as yet untamed
  house-cleaning: clearing out the accretions of a decade
The need for interoperability implies (at least) the following
subsidiary goals:
  support for multiple namespaces
  support for non-DTD schema mechanisms
  clearer definition of the TEI abstract model
The need for expansion implies (at least) the following
subsidiary goals:
  incorporation of additional modules
  survey of user requirements for further additions
  testing and documentation of the customization layer, and training in its use
House-cleaning is a self-evident need, which probably only counts as
a strategic goal because we have not had any opportunity to
re-think the basic assumptions built into the TEI Architecture since
c. 1994.
TEI P5: contents
As of this writing, we expect to add the following completely new chapters at
P5:
  Manuscript description
  Multimedia and graphics
  Standoff annotation
In addition to these new chapters, we expect substantial revision of the following chapters:
  Authoring and tag documentation
  Feature structures
  Transcription of primary sources
  Languages and Character sets
  Linking, Segmentation, and Alignment
In addition, substantial revision will be needed in the following
chapters (but has not yet been assigned, much less begun):
  Gentle Introduction (does not discuss RelaxNG)
  The header
  Default structure (discusses class system)
  Core (too long)
It is currently proposed to remove the following chapters:
  Writing System Declaration (agreed at Council meeting May 2003)
  Graphs, networks, and trees
  Terminology
Steps on the path
Currently chartered workgroups are producing relevant material for
several of the new and revised chapters specified above. A major
problem at the moment is that drafts cannot be produced in semi-final
form until we have a working documented version of the new ODD format.
In November 2003, we thought the timetable was like this:
  public call for small changes
  complete and document revisions of new ODD format
  incorporate new/revised chapters
  end of public call for small changes
  seek approval by council of new/revised chapters
  public alpha review (possibility of feature change)
  public beta review (feature frozen)
  release to TEI members
  full public release of P5
Although somewhat optimistic, this timetable usefully
indicates the sequence of events: we cannot reasonably expect new
drafts until we have documented their format, and we cannot do that
until we've decided what it should be! For that reason, it seems
to us rather urgent to reach agreement on a number of
pervasive changes in the current TEI system, which are listed below,
in no particular order.
At P5 we plan to make a large number of systematic changes in
naming conventions. A separate working paper (MEW08) lists these.
  Agree or improve name change proposals in MEW08
At the Oxford meeting of the TEI Council, we agreed to move to
using RelaxNG as the means of formal expression for the TEI
schema. This has been implemented. We also agreed that it would be
desirable to generate output schemas in both XML DTD language and
W3C Schema as well, on user request. We did not explicitly address the
question of whether user extension and modification of the scheme
should be supported in all three schema languages; our
current belief is that this is really only practicable in RelaxNG. (A
RelaxNG schema can, however, be converted to either of the other languages as a
second step.)
  User extensions and modification files must be prepared using RelaxNG syntax
With the move to RelaxNG we aim to introduce a better range of
attribute value validation facilities. The goal is that all attribute
values should match a W3C datatype, or one of a small set of
TEI-defined patterns. This subsumes the agreed need to remove
attribute values which potentially contain tagged text.
  All attribute values to be reviewed and changed to match W3C datatypes or TEI-defined patterns
  All elements bearing "text" attribute values to be rethought and redesigned, perhaps using the choice mechanism
The class system of P4 is only partially applied, largely
because its implementation via parameter entities is so fiendishly
complicated. The use of RelaxNG patterns gives us a class system which
is easy to apply and understand, and also greatly simplifies
modification of the schema. We therefore propose to extend the class
system systematically, and to deprecate content models which refer to
specific elements rather than to element classes.
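The benefit of class-based content models can be sketched in a few lines of Python. This is only an illustration of the idea, not the ODD or RelaxNG machinery itself; the class names, element names, and helper functions below are hypothetical and chosen purely for the example.

```python
# Hypothetical class registry: class name -> names of member elements.
classes = {
    "phrase": {"hi", "emph"},
    "inter": {"list", "quote"},
}

# Content models refer only to classes, never to specific elements.
content_models = {
    "p": ["phrase", "inter"],
}


def allowed_children(element):
    """Expand a class-based content model into concrete element names."""
    allowed = set()
    for class_name in content_models.get(element, []):
        allowed |= classes[class_name]
    return allowed


def add_to_class(element, class_name):
    """Register an element as a member of a class (a user extension)."""
    classes.setdefault(class_name, set()).add(element)


before = allowed_children("p")
add_to_class("myTerm", "phrase")   # "myTerm" is an invented extension element
after = allowed_children("p")
```

Because the content model of "p" names classes rather than elements, adding "myTerm" to the "phrase" class makes it available inside "p" without the content model itself being edited.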
  As far as possible, all content models to be re-expressed either using one of the standard content macros or by reference to classes of element
In P4, a small number of elements (e.g. eg,
gram) have different definitions in different
modules. This is now seen as a needless complication: if an element is
to have a different definition in some context, this should be achieved
by redefinition of the element in the same way as usual.
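A check for elements defined in more than one module could look something like the following Python sketch. The module names and element lists are invented for illustration, and the function name is our own; a real check would read the element declarations from the ODD sources.

```python
from collections import defaultdict


def find_duplicate_elements(modules):
    """Return {element name: sorted module names} for every element
    that is declared in more than one module."""
    owners = defaultdict(set)
    for module_name, element_names in modules.items():
        for element_name in set(element_names):
            owners[element_name].add(module_name)
    return {
        name: sorted(module_names)
        for name, module_names in owners.items()
        if len(module_names) > 1
    }


# Illustrative input only: not the real module inventory.
modules = {
    "tagdocs": ["eg", "gi"],
    "dictionaries": ["eg", "form"],
    "core": ["p", "hi"],
}

duplicates = find_duplicate_elements(modules)
```

With this sample input the only name reported is "eg", which is the kind of clash that renaming one of the two elements would remove.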
  All element names to be unique across the scheme
There are four existing auxiliary tagsets: writing system
declaration (WSD), feature system declaration (FSD), independent
header (IHS), and tagset declaration (TSD). It has already been agreed
to drop the WSD and to recast the TSD as an additional tagset. The IHS
is an artefact to enable a valid document consisting only of headers,
which could be accomplished in other ways. This leaves only
the FSD, which could also be handled in the same way as any other
module. With a view to further simplifying the process of schema construction, we
are considering whether or not the current distinction between base
and additional tagsets is necessary.
  Concept of auxiliary DTD to be dropped in favour of a discussion about namespaces and ways of combining TEI modules
At present the maintenance version of the Guidelines is still in the
(undocumented) P4 ODD format. The experimental P5 versions are
generated from these by an increasingly tortuous series of scripts. To
make real progress in testing and developing the new ODD format, it
has to become the maintenance form for TEI P5. We would therefore like
to freeze the current P4 ODDs (they will still be needed for maintenance
of TEI P4), to run the conversion process once more, and then to use
TEI P5 format ODDs as the development source for all subsequent work
on TEI P5. Some external checking of the process seems advisable
before this can take place however.
  Check equivalence amongst (at least) P4 content models, generated P5 RelaxNG equivalents, and P5 DTD-generated equivalents
  Switch to developing and maintaining TEI P5 sources in ODD-NG only
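One rough way to approach the equivalence check is sketched below in Python. It rests on a strong simplifying assumption: each content model is reduced to the set of child element names it permits, ignoring order and repetition, so it can only flag gross discrepancies. The source labels, sample data, and function name are hypothetical.

```python
def compare_content_models(sources):
    """Compare simplified content models across several sources.

    sources maps a label (e.g. "P4") to a dict of
    element name -> set of permitted child element names.
    Returns {element: {label: children or None}} for every element
    on which the sources disagree (None means "not declared").
    """
    all_elements = set()
    for models in sources.values():
        all_elements |= set(models)

    mismatches = {}
    for element in sorted(all_elements):
        views = {label: models.get(element) for label, models in sources.items()}
        distinct = {
            None if children is None else frozenset(children)
            for children in views.values()
        }
        if len(distinct) > 1:
            mismatches[element] = views
    return mismatches


# Invented sample data: the third source has lost "list" from "p".
sources = {
    "P4": {"p": {"hi", "list"}, "list": {"item"}},
    "P5-RelaxNG": {"p": {"hi", "list"}, "list": {"item"}},
    "P5-DTD": {"p": {"hi"}, "list": {"item"}},
}

mismatches = compare_content_models(sources)
```

A fuller check would have to compare sequencing and occurrence indicators as well, which this sketch deliberately leaves out.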
Detailed changes
The document lists changes made
to the P5 ODDs to date. We highlight
below a few specific kinds of change:
Class changes, content model changes
  added new classes: paragraph, categorize, segment, profile, encoding, header. These are mostly coping with vagaries in the header, where content models refer to elements which are not defined until modules are loaded.
  listBibl.tag: in place of "trailer", use "divbot" class, to avoid dependency
  add class.teiText and class.teiHeader, and change content model of TEI accordingly.
  dieg.tag: changed name of this eg to dicteg, to avoid overlap with eg from tagdocs
  add pattern for "schemapattern"; this defaults to ANY, but p5odds.xsl redefines it as "anything from RelaxNG"
  define and add ODDPHR and ODDREF from tagdocs to low-level classes
Schema generation
  when creating output modules, do not produce those with type "decls"; instead, include the contents of that module at the front of the module with the corresponding name with "-decl" stripped off.
  put a fixed list of overrides of special defines at the start of tei.rng
  in the schema, each element interleaves itself into its model class. We just include the schema file if we want to use it, and it extends the classes of which it is a member
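The handling of "decls" modules described in the first item above might be sketched as follows in Python. Only the type value "decls" and the "-decl" suffix come from the description; the data layout, the other type value, the sample module names, and the function name are assumptions made for the example.

```python
def merge_decl_modules(modules):
    """Fold each module of type "decls" into the front of the module
    whose name is the same with "-decl" stripped off.

    modules maps a module name to {"type": str, "content": list of str}.
    Returns a dict of output module name -> list of content lines;
    no output module is produced for the "decls" modules themselves.
    """
    suffix = "-decl"
    output = {}

    # First copy every ordinary module to the output.
    for name, module in modules.items():
        if module["type"] != "decls":
            output[name] = list(module["content"])

    # Then prepend each declarations module to its target module.
    for name, module in modules.items():
        if module["type"] == "decls":
            target = name[: -len(suffix)] if name.endswith(suffix) else name
            output[target] = list(module["content"]) + output.get(target, [])

    return output


# Invented sample input; "core" as a type value is a placeholder.
modules = {
    "core-decl": {"type": "decls", "content": ["define class.phrase"]},
    "core": {"type": "core", "content": ["define p", "define hi"]},
}

merged = merge_decl_modules(modules)
```

Here the single output module "core" starts with the class declaration, so that the definitions which follow can refer to it.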
Additions and deletions
  figure.tag: added url, width, height, scale attributes
  xpointer.cla: added url attribute
  kill all of WSD, which means all the following files are gone: basewsd.tag directn.tag exceptns.tag script.tag teiwsd.tag wsdccs.tag wsdchar.tag wsdchars.tag wsddesc.tag wsdents.tag wsdfig.tag wsdform.tag wsdglob.cla wsdlang.tag wsdnote.tag wsdxfig.tag
  take out top-level material which makes TSD a separate DTD