This document describes how to produce a customization of the TEI
P5 schema. From the start, the TEI was intended to be used as a set of
building blocks for creating a schema suitable for a particular
project. This is in keeping with the TEI philosophy of providing a
vocabulary for describing texts, not dictating precisely what those
texts must contain or might have contained.
There are three ways of customizing the TEI:
- Writing a high-level specification for a view of the TEI, and generating
an ad hoc DTD or schema; this is the preferred method.
- Using the DTD modules, and specifying in the document DTD subset which features you want activated.
- Using the RELAX NG modules, and writing a wrapper schema.
Note that it is not possible at present to use W3C Schema modules for customization.
Although there is no default schema, TEI P5 provides a number of example customizations which may well meet your needs; they can be downloaded from the TEI web site or from within the Roma interface.
Choosing the basic set of modules may be sufficient, but it’s also possible that you may want to tailor your TEI schema more tightly.
We will see examples of such tailoring in the following sections.
Below is a table of all of the TEI modules. More information about each one is given in the TEI Guidelines; each module corresponds to a single chapter.
|Module|Description|
|---|---|
|analysis|Simple analytic mechanisms|
|certainty|Certainty and uncertainty|
|core|Elements common to all TEI documents|
|corpus|Header extensions for corpus texts|
|declarefs|Feature system declarations|
|dictionaries|Dictionaries and other lexical resources|
|figures|Tables, formulae, and figures|
|gaiji|Character and glyph documentation|
|header|The TEI Header|
|linking|Linking, segmentation and alignment|
|namesdates|Names and dates|
|nets|Graphs, networks and trees|
|tagdocs|Documentation of TEI modules|
|tei|Declarations for datatypes, classes, and macros available to all TEI modules|
|textstructure|Default text structure|
|transcr|Transcription of primary sources|
The TEI is written in a source format called ODD
(One Document Does it All), which includes the schema fragments, prose documentation, and reference documentation for the TEI Guidelines in a single document.
The recommended way to customize the TEI is to create a formal
specification expressing your customizations, as an XML document using TEI ODD markup;
this can then be compiled into a suitable DTD, RELAX NG schema or W3C Schema (together
with the appropriate reference documentation), using the Roma program.
If, however, you intend to make extensive use of the TEI in conjunction with other schemas written in RELAX NG, working directly with the RELAX NG modules is probably the best skill to learn. Typical TEI users are more likely to work solely within the confines of the TEI, and may need to use DTDs or W3C Schema as well as RELAX NG, and so writing customizations in the TEI’s own language is usually better.
There are several important reasons why this high-level method is recommended:
- It is independent of the schema type (DTD, RELAX NG schema, W3C schema) and the resulting specification can be used to generate a schema in any of these schema languages.
- It lets you document your work using the familiar TEI markup.
- It provides full access to the TEI class system.
- The Roma utilities generate a single, portable schema file which you can transfer to other people without worrying about link dependencies.
There are several core components in the TEI infrastructure which
you should understand before creating your own ODD files.
In addition to being defined in modules, elements are also organized into
model classes. A model class may, for example, group elements containing
images, formulae, and similar objects.
Other attributes are used on a range of different elements. It is not efficient
or practical to define the same attribute multiple times, once on each element, so
these attributes are defined as part of attribute classes.
Attribute classes may be nested. In other words, one attribute class may be a member of
another. This is a convenient way of grouping similar classes of attributes so that an element
can claim membership of all of them in one operation. For example, there are three base attribute
classes relating to dating attributes:
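As a hedged illustration of such nesting, the sketch below is modelled on the way att.datable is declared in current TEI P5 releases, where it claims membership of att.datable.w3c, att.datable.iso, and att.datable.custom. These class names are an assumption taken from those releases rather than from this text, and the declaration is abridged.

```xml
<!-- Sketch only: an abridged attribute class declaration.
     The three class names are assumed from current TEI P5 releases. -->
<classSpec xmlns="http://www.tei-c.org/ns/1.0"
           ident="att.datable" type="atts" module="tei">
  <classes>
    <!-- att.datable is itself a member of three base attribute classes -->
    <memberOf key="att.datable.w3c"/>
    <memberOf key="att.datable.iso"/>
    <memberOf key="att.datable.custom"/>
  </classes>
</classSpec>
```

An element that declares membership of att.datable would then pick up the attributes of all three base classes in one operation.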
The final important concept is the idea of a macro. One macro, for example, defines
the content of paragraphs and similar elements, and is used in the definition of 50
different elements. Macros are defined using the macroSpec element.
With this basic introduction to how elements, attributes and their components and relationships are defined, you may now want to take a look at some example specifications from the TEI repository:
- The age element contains one attribute definition, as mentioned above; it is also a member of three attribute classes, and one model class. If you look at the root elementSpec element, you’ll see the attribute module="namesdates"; this is what determines that this element is part of the namesdates (Names and Dates) module.
- The att.typed attribute class defines two attributes, type and subtype. It is part of the tei module.
- The model.orgPart specification demonstrates how simple a model class specification can be.
A TEI schema is defined by a schemaSpec element, which refers to the modules it uses by means of moduleRef elements:
- If the key attribute is provided, it refers to the TEI name for a module, and details of that are accessed from the TEI web service database (which may be a local installation).
- If the url attribute is provided, it refers to an external file of schema definitions in the RELAX NG language (this is used to pull in non-TEI schemas).
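The two kinds of module reference can be sketched side by side. This is a minimal illustration, not taken from the TEI repository: the ident value and the external file name are hypothetical placeholders.

```xml
<!-- Sketch only: "oddSketch" and "external-schema.rng" are placeholders. -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="oddSketch" start="TEI">
  <!-- TEI modules, referenced by name through the key attribute -->
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="tei"/>
  <moduleRef key="textstructure"/>
  <!-- a non-TEI RELAX NG schema, referenced through the url attribute -->
  <moduleRef url="external-schema.rng"/>
</schemaSpec>
```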
In the simplest case, a user-defined schema might simply combine all the declarations from some nominated modules:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>TEI with simple setup</title>
        <author>Sebastian Rahtz</author>
      </titleStmt>
      <publicationStmt>
        <p>freely available</p>
      </publicationStmt>
      <sourceDesc>
        <p>Written from scratch.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <schemaSpec ident="oddex1" start="TEI">
        <moduleRef key="header"/>
        <moduleRef key="core"/>
        <moduleRef key="tei"/>
        <moduleRef key="textstructure"/>
      </schemaSpec>
    </body>
  </text>
</TEI>
Note that this is a normal TEI document, with a metadata header.
In the other examples that follow, we will usually omit the outer TEI
wrapper and just show the schemaSpec element.
An ODD processor, given such a document, will combine the
declarations which belong to the named modules, and deliver the result
as a schema of some requested type. It might also generate documentation for
all (and only) the elements declared by those modules.
You can address individual elements or classes of modules by means of elementSpec or classSpec elements whose mode attribute takes one of four values:
- add: the object is entirely new.
- replace: the object entirely replaces the existing object with the same ident.
- delete: all references to the original object with the same ident are removed from the schema.
- change: child elements of the object which appear in the original specification are replaced by the versions in the new specification. This may be at any level, as we will see in examples below.
A schema can include declarations for new elements, as in the following example:
  <moduleRef key="core"/>
  <moduleRef key="tei"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="soundClip" mode="add">
    <classes>
      <memberOf key="model.pPart.data"/>
    </classes>
    <content>
      <rng:text/>
    </content>
  </elementSpec>
</schemaSpec>
A declaration for the element soundClip is added to the schema, and the element becomes a member of the model.pPart.data class.
In the following example
we add a new element, rebirth, modelled on the existing birth element:
  <elementSpec ident="rebirth" mode="add">
    <gloss>Rebirth details</gloss>
    <desc>contains information about a soul’s rebirth, such as its date and place.</desc>
    <classes>
      <memberOf key="model.persEventLike"/>
      <memberOf key="att.editLike"/>
      <memberOf key="att.datable"/>
      <memberOf key="att.naming"/>
    </classes>
    <content>
      <rng:ref name="macro.phraseSeq"/>
    </content>
  </elementSpec>
</schemaSpec>
There are usually four parts to such an element definition:
- An identifier (in this case the value rebirth for the ident attribute).
- Documentation (the gloss and desc elements).
- Declaration of which classes this element is to be a member of (model.persEventLike, att.editLike, att.datable and att.naming); this is the same as birth, which we have to find out by looking at the definition of that element.
- The content model for the element, here the general purpose macro.phraseSeq.
There is no need to specify a module for the element to appear in, as this would not be used for anything.
Specifying that we do not want some of the elements to appear in our final schema is easy:
  <moduleRef key="tei"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="headItem" mode="delete" module="core"/>
  <elementSpec ident="headLabel" mode="delete" module="core"/>
  <elementSpec ident="hyphenation" mode="delete" module="header"/>
</schemaSpec>
Note that no child elements of the deleted object are needed, or taken notice of.
Changing existing elements
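When we come to change an existing element, the specification names the element and supplies only the parts which are to change: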
  <elementSpec ident="div" mode="change">
    <attList>
      <attDef ident="type" usage="req" mode="change">
        <gloss>You must indicate the level of the section</gloss>
        <datatype>
          <rng:ref xmlns:rng="http://relaxng.org/ns/structure/1.0" name="datatype.Code"/>
        </datatype>
        <valList type="closed" mode="replace">
          <valItem ident="section">
            <gloss>1st level section</gloss>
          </valItem>
          <valItem ident="subsection">
            <gloss>2nd level section</gloss>
          </valItem>
          <valItem ident="subsubsection">
            <gloss>3rd level section</gloss>
          </valItem>
        </valList>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
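In this example, we are changing the behaviour of the type attribute of the div element. Note that a mode attribute
must be supplied on each identifiable part of the object which is to
change. So the elementSpec and the attDef are both in change mode, while the valList is in replace mode.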
Change mode can apply to classes as well as elements. In the
following example, we remove a set of attributes which are provided
for any element which is a member of the att.global.linking class:
  <classSpec module="linking" ident="att.global.linking" mode="change">
    <attList>
      <attDef ident="corresp" mode="delete"/>
      <attDef ident="synch" mode="delete"/>
      <attDef ident="sameAs" mode="delete"/>
      <attDef ident="copyOf" mode="delete"/>
      <attDef ident="next" mode="delete"/>
      <attDef ident="prev" mode="delete"/>
      <attDef ident="exclude" mode="delete"/>
      <attDef ident="select" mode="delete"/>
    </attList>
  </classSpec>
</schemaSpec>
If you want to change which elements
A good example of this would be if you wanted to use the W3C XInclude scheme in your XML. This is a way of referring to external files to be transcluded (DTD users will be familiar with the use of file entities to perform this job). This document, for example, pulls in a table (created by an automatic process) by using this piece of code:
          <rng:zeroOrMore>
            <rng:element>
              <rng:anyName/>
              <rng:zeroOrMore>
                <rng:attribute>
                  <rng:anyName/>
                </rng:attribute>
              </rng:zeroOrMore>
            </rng:element>
          </rng:zeroOrMore>
        </rng:element>
      </rng:optional>
    </content>
    <attList>
      <attDef ident="href" usage="req">
        <datatype>
          <rng:data type="anyURI"/>
        </datatype>
      </attDef>
      <attDef ident="parse">
        <datatype>
          <rng:choice>
            <rng:value>xml</rng:value>
            <rng:value>text</rng:value>
          </rng:choice>
        </datatype>
        <defaultVal>xml</defaultVal>
      </attDef>
      <attDef ident="xpointer">
        <datatype>
          <rng:text/>
        </datatype>
      </attDef>
      <attDef ident="encoding">
        <datatype>
          <rng:text/>
        </datatype>
      </attDef>
      <attDef ident="accept">
        <datatype>
          <rng:text/>
        </datatype>
      </attDef>
      <attDef ident="accept-charset">
        <datatype>
          <rng:text/>
        </datatype>
      </attDef>
      <attDef ident="accept-language">
        <datatype>
          <rng:text/>
        </datatype>
      </attDef>
    </attList>
  </elementSpec>
Note the new
When you have finished writing your customization, you can turn your ODD into schemas or DTDs for use with XML editors or validators, or create schema documentation showing the specification for your elements and classes. There are a few options for carrying out both of these tasks:
If you want to use the RELAX NG schema modules, you write a wrapper schema which includes them:
namespace ns1 = "http://www.tei-c.org/ns/1.0"
include "http://www.tei-c.org/schema/relaxng/header.rnc" inherit = ns1
include "http://www.tei-c.org/schema/relaxng/core.rnc" inherit = ns1
include "http://www.tei-c.org/schema/relaxng/tei.rnc" inherit = ns1
include "http://www.tei-c.org/schema/relaxng/textstructure.rnc" inherit = ns1
start = TEI
This is clearer than the DTD method, as it loads files containing definitions from explicit URLs. It is then possible to override any patterns in the included files; so the following schema disables the mentioned element by redefining its pattern as notAllowed:
namespace ns1 = "http://www.tei-c.org/ns/1.0"
include "http://www.tei-c.org/schema/relaxng/header.rnc" inherit = ns1
include "http://www.tei-c.org/schema/relaxng/core.rnc" inherit = ns1 {
  mentioned = notAllowed
}
include "http://www.tei-c.org/schema/relaxng/tei.rnc" inherit = ns1
include "http://www.tei-c.org/schema/relaxng/textstructure.rnc" inherit = ns1
start = TEI
RELAX NG patterns are defined for the TEI as follows:
This can be redefined as desired.
- An attribute class generates a pattern which references the definition of each of the class attributes.
- Each attribute generates a pattern.
- A model class generates a pattern with an initial value of notAllowed.
- Each element generates its own patterns, and each of these can be redefined
separately. In addition, for each model class of which the element is
a member, it generates an addition to the class pattern,
so that any reference to a class such as
model.biblLike will now allow for that element.
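The way an element adds itself to a model class pattern can be sketched in RELAX NG XML syntax. This is a simplified illustration, not the schema the TEI actually generates: it assumes the class pattern starts out as notAllowed, reduces the content of bibl to plain text, and uses listBibl only as a convenient place to reference the class.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.tei-c.org/ns/1.0">
  <!-- the model class pattern starts out allowing nothing -->
  <define name="model.biblLike">
    <notAllowed/>
  </define>
  <!-- an element pattern (content simplified for this sketch) -->
  <define name="bibl">
    <element name="bibl">
      <text/>
    </element>
  </define>
  <!-- class membership: the element is added to the class pattern
       as a further alternative -->
  <define name="model.biblLike" combine="choice">
    <ref name="bibl"/>
  </define>
  <!-- any reference to the class now accepts bibl -->
  <start>
    <element name="listBibl">
      <oneOrMore>
        <ref name="model.biblLike"/>
      </oneOrMore>
    </element>
  </start>
</grammar>
```

Because membership is expressed with combine="choice", a wrapper schema can add further members to a class without touching the original definition.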
It is also possible to work with DTD modules, although the TEI does not recommend this any more. You specify which modules of the TEI you want to use by means of the DTD internal subset. A minimal TEI document using this method might start as follows:
This loads the obligatory modules.
There is a parameter entity for each module (created by prefixing
the module name with TEI.), so we could request a further module by adding
the corresponding parameter entity declaration
to the DTD subset. It is also possible to disable particular elements from the modules by setting a parameter entity corresponding to the element.
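A document using the DTD method might be sketched as follows. This is an illustration under stated assumptions, not the original example: the public identifier, the path to tei.dtd, the choice of the linking module as the extra module, and the use of an entity named after the element (here mentioned) to switch that element off are all assumptions to be checked against your local TEI installation.

```xml
<!-- Sketch only: identifiers, paths and entity names are assumptions. -->
<!DOCTYPE TEI PUBLIC "-//TEI P5//DTD Main Document Type//EN" "tei.dtd" [
  <!-- one parameter entity per module: "TEI." plus the module name -->
  <!ENTITY % TEI.header 'INCLUDE'>
  <!ENTITY % TEI.core 'INCLUDE'>
  <!ENTITY % TEI.textstructure 'INCLUDE'>
  <!-- requesting a further module -->
  <!ENTITY % TEI.linking 'INCLUDE'>
  <!-- disabling a single element -->
  <!ENTITY % mentioned 'IGNORE'>
]>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>DTD subset sketch</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished example.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Written from scratch.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Some text.</p>
    </body>
  </text>
</TEI>
```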
An ODD specification can be processed in a scripting environment
by using the Roma script.
By default the script creates DTD, XSD and RELAX NG schemas; each
of these can be suppressed if needed, and a set of summary
documentation can be created.
For information on using the web-based interface to
The TEI was designed to capture all the vagaries of literary and
Note that the representation is as found in the text, not normalized.
But what about validating this XML against a schema? Using the
Namespace-based Validation Dispatching Language (see
Two common cases which do not require interleaving are:
- redefining the content of formula to allow for MathML markup.
- redefining the content of figure to allow SVG markup.
In each case, we first need a
These schemas can be downloaded from
        <rng:ref name="p"/>
        <rng:ref name="svg.svg"/>
      </rng:choice>
    </rng:zeroOrMore>
  </content>
</elementSpec>
A common requirement for changing existing elements is to make the visible names suit a local language. If we want to use the TEI in an entirely Spanish-speaking environment, it can be useful to have a copy of the TEI schema with all the names converted to Spanish. Documents can be created and edited using this schema, and then translated back to the canonical form for long-term archiving or distribution.
These translations are possible because the TEI
defines names in English for elements and attributes, but does not use
these names directly in content models for other elements. This means
that the names can be changed without breaking the rest of the
system. For example, the content model for
title here refers to the
title; this is defined with:
If we change it to
the definition for
title remain correct.
If we create documents using this schema, how can we be sure the back translation is easy? Because we can always go back to the source of the customization to find the original name.
The translation process in ODD is simple. Each element or attribute
affected must be supplied in change mode, with an altIdent element giving its new name:
  </elementSpec>
  <elementSpec ident="mentioned" module="core" mode="change">
    <altIdent type="lang">mencionado</altIdent>
  </elementSpec>
  <elementSpec ident="when" module="linking" mode="change">
    <altIdent type="lang">cuando</altIdent>
    <attList>
      <attDef mode="change" ident="unit">
        <altIdent type="lang">unidad</altIdent>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
Notice that each
Constructing specifications like this by hand is both tedious and
error-prone, and it would be unwise for each separate project to make
its own translations. The TEI Consortium therefore maintains a set of
The effect of using a translated schema is shown in the image below; the