Naming Conventions in the TEI scheme
Following the decision to implement support for XML namespaces in
TEI P5, we have an opportunity to review naming conventions and
practice at all levels of the TEI scheme. This document sketches out
the decisions taken, or proposed, so far, and will be revised as
necessary.
The namespace
To begin at the beginning, we have the namespace itself. It is
currently proposed that there should be only a single TEI namespace
(rather than, say, a different one for each base, or for each
module). There was a little discussion on TEI-L
of what form of identifier that namespace should take, with a
consensus emerging in favour of a URI-style identifier including a
version number. The current proposal is therefore that the namespace
for TEI P5 will be http://www.tei-c.org/ns/1.0, and that
is the string which our current P5 processing utilities expect to find
as the value of an xmlns attribute on the root of a TEI
document.
Note that the version number in the namespace is not the version
number of the TEI Guidelines, but the version of the namespace
declaration. We therefore additionally propose a new attribute on
TEI to identify the release of the
Guidelines. We have long been unable to simply identify fixed
or enhanced versions of the DTDs and schemas, and this will be an
important new tool. We propose the name teiversion,
with a simple floating-point number as the datatype. It has also been
proposed that the version number should be derived from the date of
release, but on balance we prefer to simply increment the number.
A TEI document using this namespace and the first release of TEI P5
will thus commence:
<TEI xmlns="http://www.tei-c.org/ns/1.0" teiversion="5.0">
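As a rough illustration of what a processing utility might check, the following Python sketch reads the namespace and the proposed teiversion attribute from the root of a document. The function name read_tei_root is hypothetical, and the sketch assumes only the standard library's xml.etree.ElementTree; it is not a description of the existing P5 tools.

```python
import xml.etree.ElementTree as ET

# Namespace proposed in the text for TEI P5.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def read_tei_root(xml_text):
    """Return (namespace, local_name, teiversion) for the root element.

    Hypothetical helper: teiversion is None when the attribute is absent.
    """
    root = ET.fromstring(xml_text)
    # ElementTree spells a qualified name as "{namespace}local".
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
    else:
        namespace, local_name = None, root.tag
    return namespace, local_name, root.get("teiversion")


sample = '<TEI xmlns="http://www.tei-c.org/ns/1.0" teiversion="5.0"/>'
namespace, local_name, version = read_tei_root(sample)
print(namespace == TEI_NS, local_name, version)  # True TEI 5.0
```

The version is returned as a string; whether a consumer should convert it to a floating-point number is left open here, since the datatype is still only a proposal.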
Elements and classes
We have already made the decision to rename the root element of a
TEI document to TEI (or teiCorpus), without the .2 suffix which has
puzzled users ever since P3. It may be worth noting that these two element
names will remain the only ones to flout the general principles
underlying the naming of TEI elements, as stated in TEI document
[ML W26](http://www.tei-c.org.uk/Vault/ML/mlw26.htm), which concludes as follows:
The following recommendations should be applied where possible in
generating names for TEI DTDs, and should govern usage in TEI documents
and examples:
- TEI documents and examples should give all one-word tag and
attribute names in lowercase; phrasal names should uppercase the
initial character of each word but the first.
- Names should be natural-language words or phrases.
- Avoid abbreviation except for very common items.
- Where possible, avoid forming names from phrases.
- Avoid collisions among names of different types.
- Use nouns and adjectives for tag and attribute names; avoid verbs.
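The first recommendation can be read as a simple rule for building a name from its component words. The sketch below is one possible rendering of that rule in Python; the function name phrasal_name is an assumption made for illustration and does not come from ML W26.

```python
def phrasal_name(words):
    """Join words into a single name.

    The first word is lowercased; each later word has its initial
    character uppercased and the rest left as given.
    """
    if not words:
        raise ValueError("at least one word is required")
    first, rest = words[0], words[1:]
    return first.lower() + "".join(w[:1].upper() + w[1:] for w in rest)


print(phrasal_name(["encoding", "desc"]))  # encodingDesc
print(phrasal_name(["bibl"]))              # bibl
```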
These principles still seem valid, and are more or less
consistently applied across the Guidelines. However, they do not cover all the
conventions which subsequently developed during the authoring and maintenance of the
Guidelines, with varying degrees of consistency. This document
proposes a new list of specific naming conventions for all components
of the TEI scheme.
The proficient user of P4 must distinguish the following
categories of objects, which all have different forms of name in the
generated DTD fragments:
- elements and attributes, e.g. encodingDesc, id
- names of elements, e.g. n.encodingDesc
- model classes, e.g. m.bibl
- attribute classes, e.g. a.global
- extension classes, e.g. x.bibl
- common content models, e.g. phrase.seq
In P5, with the use of RelaxNG as the underlying formalism, it
becomes much simpler to express the notion of class
membership. Moreover, the mechanism of RelaxNG patterns is a
much better fit with the requirements of the TEI abstract
model than the rather idiosyncratic use of SGML parameter
entities in P4. In P5, indirection is provided by the use of
patterns and the class system is directly supported.
In P5, for every defined element blort, there is a
pattern with the same name which carries the definition. Another
pattern, the name of which is content.blort, defines
its content model by reference to macros, element classes, or other
element patterns (though the latter is deprecated). This indirection
makes renaming of elements easy.
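The effect of this indirection can be sketched with a toy model in Python. This is not RelaxNG, and the dictionary layout and helper names (GRAMMAR, references, rename_element) are assumptions made purely for illustration. Each pattern is looked up by name, so changing the element name carried by a pattern leaves every reference to that pattern untouched.

```python
# Toy grammar: pattern name -> definition. All names are illustrative.
GRAMMAR = {
    # Element pattern: carries the element name and points at its content pattern.
    "blort": {"element": "blort", "content": "content.blort"},
    # Content pattern: refers to macros and classes rather than specific elements.
    "content.blort": ["macro.phraseSeq"],
    "macro.phraseSeq": ["class.phrase"],
    # A class lists its members by pattern name.
    "class.phrase": ["blort"],
}


def references(grammar):
    """Collect every pattern name that some definition refers to."""
    refs = set()
    for definition in grammar.values():
        if isinstance(definition, dict):
            refs.add(definition["content"])
        else:
            refs.update(definition)
    return refs


def rename_element(grammar, pattern, new_name):
    """Return a copy of the grammar in which one element pattern emits a new name."""
    renamed = dict(grammar)
    renamed[pattern] = {**grammar[pattern], "element": new_name}
    return renamed


renamed = rename_element(GRAMMAR, "blort", "blurt")
print(renamed["blort"]["element"])                 # blurt
print(references(renamed) == references(GRAMMAR))  # True
```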
Our intention is that every element should be a member of at least
one (possibly several) model classes, and that its content should
generally be defined in terms of element classes rather than specific
elements. A knowledge of element classes thus becomes rather more
important than it was in P4, where the class system was really only of
relevance when defining user extensions. In P5, the class system is
also of importance for interoperability with other XML vocabularies
and namespaces.
In P5, we intend to adopt the following naming conventions:
- an unadorned name (e.g. encodingDesc, id) is an element or an
attribute, or a pattern defining an element
- a name starting with content. (e.g. content.encodingDesc) is a pattern
defining the content model of the named element
- a name starting with class. (e.g. class.biblPart, class.phrase) is a
model class, members of which have common hierarchic or semantic
properties, for use in content models
- a name starting with attributes.class. (e.g. attributes.class.linking,
attributes.class.global) is an attribute class, defining the set of
attributes shared by all members of the class
- a name starting with macro. (e.g. macro.phraseSeq) is a pattern
defining a common content model
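Because the proposed conventions are purely prefix-based, the kind of object a name denotes can be recovered mechanically. The following Python sketch shows one way to do so; the function name classify_p5_name and the wording of the returned labels are assumptions for illustration only.

```python
# Prefixes from the list above, paired with a descriptive label.
P5_PREFIXES = [
    ("attributes.class.", "attribute class"),
    ("content.", "content model pattern"),
    ("class.", "model class"),
    ("macro.", "macro (common content model)"),
]


def classify_p5_name(name):
    """Return a label for the kind of object a P5 name denotes."""
    for prefix, kind in P5_PREFIXES:
        if name.startswith(prefix):
            return kind
    # An unadorned name is an element, an attribute, or an element pattern.
    return "element, attribute, or element pattern"


for name in ["encodingDesc", "content.encodingDesc", "class.biblPart",
             "attributes.class.global", "macro.phraseSeq"]:
    print(name, "->", classify_p5_name(name))
```

An unadorned name cannot be narrowed down further from its spelling alone, which is why the fallback label covers all three possibilities.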
Architectural components
At P4, the TEI scheme is organized as a number of
tagsets which are characterized as being core, base,
additional, or auxiliary. To build a view of the TEI DTD, you take one
base, the core tagsets, and zero or more additional tagsets. Auxiliary
tagsets are free-standing DTDs, which re-use some of the other
tagsets, but cannot be combined in the same way.
Each tagset is actually generated from the ODD files by processing
a number of dtdFrag elements in a rather complicated
way. Attributes on these elements control in a somewhat non-obvious way how
these fragments of schema are to be assembled and
cross-referenced.
At P5, we have dropped the misnomer tagset in favour of the
neutral term module. With the advent of support for
multiple namespaces, there is no need for auxiliary modules, and so
we propose to drop the term from P5 (and in any case, of the existing five
auxiliary tagsets, only one, the FSD, seems to have any future). For
the moment we are retaining the distinction between core, base, and
additional modules, but we are seriously questioning whether
the distinction between base and additional modules is needed. It seems likely
that further simplification will become possible during the process of
revising the chapter on basic TEI structure.
The way in which modules are generated from ODDs has been
completely changed, as a consequence of which there are several naming
changes. However, as the original ODD format was never formally
published or approved by the TEI Council, it is probably unnecessary
to describe these changes in full detail. A brief summary is given in
the next section: full details are provided in the ChangeLog.
Simplifying and renaming the ODDs
The original ODD format defined large numbers of different
documentation crystals for different kinds of object: these included
tagDoc, which documented elements and attributes;
peDoc, which documented parameter entities; entDoc,
which documented entities; and classDoc, which documented
classes. Corresponding to each of these, the ODD system allowed for a
range of declaration elements (tagDecl, entDecl,
claDecl, etc.), which produced the formal declaration for the
object concerned, and a number of description elements
(tagDesc, etc.), which generated the prose documentation for it,
and so on.
Following discussion reported in [MEW05](mew05.xml), we carried out a number of
simplifications in the ODD internal vocabulary and usage:
- tagDecl and patternDecl are removed, and
replaced with the corresponding contents of tagDoc and
patternDoc (as entities). The level of indirection provided
served no useful purpose, and placing the tagDoc and
patternDoc inline makes for ODD processing which is easier to
understand. claDecl is replaced by an inline
classDoc only at its first occurrence for a given class. More
work may be needed here.
- dataDesc demoted to a comment (should be scanned and
interesting parts put into remarks)
- change gi, attName, entName,
class to ident, for consistency.
- change name to gloss in entDoc,
classDoc, tagDoc, attDef
- add a schemaContent wrapper around the RelaxNG content in
tagDoc and classDoc elements
- change name of dtdRef and dtdFrag to
chunkRef and chunk
- disallow rs as alternative to gloss in
tagdocs
- entDoc renamed to patternDoc
- in valList, change from a val and desc pair to valitem,
containing a standard ident, equiv and gloss trio
- get rid of @contin in dtdFrag (for each dtdFrag, find
people who have it as @contin, and make a dtdRef to them)
- take each eg and put the contents into an
xmleg element if it appears to be well-formed.
The contents of xmleg are in the namespace
http://www.tei-c.org/P5/Examples/. This has the advantage
that examples are parsed and can be validated.
- get rid of part in classDoc as it is not used
and contents are wrong
- get rid of string elements containing IGNORE, INCLUDE
and CDATA in favour of RelaxNG elements
- add new type for patternDoc "epe" for when it is providing
attributes
- redo the string elements which contain public IDs,
replacing them with publicID file="..."...
- introduce the module element (and moduleref) as a
replacement for chunk file="", dtdFrag,
peRef, etc.
- introduce the patternDoc element to replace those
entDoc elements which define patterns, and remove those
which defined the system entities used for TEI modules (this involved
moving information about public identifiers from the entDoc
to the module element; it also means that
module becomes the only place to reference a filename).
- redefine the entities used to define common content models as
patterns, and rename them as follows:
  - component to macro.component
  - component.plus to macro.componentPlus
  - component.seq to macro.componentSeq
  - paraContent to macro.paraContent
  - phrase.seq to macro.phraseSeq
  - phrase to macro.phrasegroup
  - seq to macro.seq
  - specialPara to macro.specialPara
- rename all classDoc elements of type "model" to include the
"class." prefix within their ident element; similarly, for
all classDoc elements of type A, rename "a." to
"attributes.class.", so that "a.global" becomes
"attributes.class.global"