Getting Started with P5 ODDs


Introduction

This document describes how to produce a customization of the TEI P5 schema. From the start, the TEI was intended to be used as a set of building blocks for creating a schema suitable for a particular project. This is in keeping with the TEI philosophy of providing a vocabulary for describing texts, not dictating precisely what those texts must contain or might have contained. This means that it is likely, not just possible, that you will want to have a tailored view of the TEI.

What do we mean by a ‘customization’? It is important to understand that there is no single DTD or schema which is the TEI; you always choose the modules you want from those available (there are currently 22, listed in Figure 1, ‘The TEI modules’), with the caveat that the three modules core, header and textstructure (and tei, when using RELAX NG) should always be chosen unless you are certain you know what you are doing. Elements in these modules are referred to throughout the other modules, and hence these modules cannot be eliminated without careful adjustments.

There are three ways of customizing the TEI:
  1. Writing a high-level specification for a view of the TEI, and generating an ad hoc DTD or schema; this is the preferred method.
  2. Using the DTD modules, and specifying in the document DTD subset which features you want activated.
  3. Using the RELAX NG modules, and writing a wrapper schema.
Note that it is not possible at present to use W3C Schema modules for customization.
Although there is no default schema, TEI P5 does provide a number of example customizations which may well meet your needs. They can be downloaded from the TEI web site or from within the Roma interface:
  • tei_bare: TEI Absolutely Bare
  • teilite: TEI Lite
  • tei_corpus: TEI for Linguistic Corpora
  • tei_ms: TEI for Manuscript Description
  • tei_drama: TEI with Drama
  • tei_speech: TEI for Speech Representation
  • tei_odds: TEI for authoring ODD
  • tei_allPlus: TEI with maximal setup, plus external additions
  • tei_svg: TEI with SVG
  • tei_math: TEI with MathML
  • tei_xinclude: TEI with XInclude (experimental)
Choosing the basic set of modules may be sufficient, but it's also possible that you may want to tailor your TEI schema more tightly. For instance, once you have decided that your application will make use of the msdescription and linking modules, you may also want to
  • remove elements from some of the modules which you do not expect to use, to reduce confusion and avoid the accidental use of elements you don't need
  • rename elements (see Internationalisation for more discussion of this)
  • add, delete or change attributes for existing elements, perhaps to make the datatype stricter
  • add new elements, and insert them into the TEI class system
We will be seeing examples of each of these in the following sections.

Below is a table of all of the TEI modules. More information about each one is given in the TEI Guidelines; each module corresponds to a single chapter.

Figure 1. The TEI modules.
Module         Description
analysis       Simple analytic mechanisms
certainty      Certainty and uncertainty
core           Elements common to all TEI documents
corpus         Header extensions for corpus texts
declarefs      Feature system declarations
dictionaries   Dictionaries and other lexical resources
drama          Performance texts
figures        Tables, formulae, and figures
gaiji          Character and glyph documentation
header         The TEI Header
iso-fs         Feature structures
linking        Linking, segmentation and alignment
msdescription  Manuscript Description
namesdates     Names and dates
nets           Graphs, networks and trees
spoken         Transcribed Speech
tagdocs        Documentation of TEI modules
tei            Declarations for datatypes, classes, and macros available to all TEI modules
textcrit       Text criticism
textstructure  Default text structure
transcr        Transcription of primary sources
verse          Verse structures

Writing ODD specifications

The TEI is written in a source format called ‘ODD’ (‘One Document Does it All’) which includes the schema fragments, prose documentation, and reference documentation for the TEI Guidelines in a single document (see Note 1). An ODD specification is a normal TEI XML document which makes use of the tagdocs module. This adds a series of elements which are used to specify a new schema, and modifications to the TEI element structure. It is described in detail in the TEI Guidelines chapter Documentation Elements, so only a brief summary will be given here.

The recommended way to customize the TEI is to create a formal specification expressing your customizations, as an XML document using TEI ODD markup; this can then be compiled into a suitable DTD, RELAX NG schema or W3C Schema (together with the appropriate reference documentation), using the Roma program. Roma is a web-based interface for creating TEI customizations, which allows you to fill in simple forms to choose modules, add and delete elements, change attributes, and make other customizations. Advanced users can also create the ODD by hand using normal XML editing tools.

If, however, you intend to make extensive use of the TEI in conjunction with other schemas written in RELAX NG, working directly with the RELAX NG modules is probably the best skill to learn. Typical TEI users are more likely to work solely within the confines of the TEI, and may need to use DTDs or W3C Schema as well as RELAX NG, and so writing customizations in the TEI's own language is usually better.

There are several important reasons why this high-level method is recommended:
  1. It is independent of the schema type (DTD, RELAX NG schema, W3C schema) and the resulting specification can be used to generate a schema in any of these schema languages.
  2. It lets you document your work using the familiar TEI markup.
  3. It provides full access to the TEI class system.
  4. The Roma utilities generate a single, portable, schema file which you can transfer to other people without worrying about link dependencies.

Key concepts

There are several core components in the TEI infrastructure which you should understand before creating your own ODD files. The concept of modules has been explained above: TEI elements and attributes are organized into a set of modules which group them according to their purpose.

Elements are normally defined in the context of modules. For example, the castList element is defined in the drama module. An element is defined (or ‘declared’) using the <elementSpec> element.

In addition to being defined in modules, elements are also organized into model classes. A model class is a method of grouping a number of elements together so that they can be referred to easily as a group. For example, model.graphicLike ‘groups elements containing images, formulae, and similar objects’, such as <formula> and <graphic>. Other elements (for instance <figure>) can then make use of this model as part of their content definition. Model classes are defined using the <classSpec> element, with the attribute type="model". The <classSpec> does not contain a list of elements which belong to the class; instead, elements "claim membership" of the class through <memberOf> elements in their own <elementSpec>s.
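The membership mechanism can be sketched with a pair of fragments. This is an illustration only: the class model.clipLike and the element <videoClip> are hypothetical names invented for the example, not part of the TEI.

```xml
<!-- A hypothetical model class: note that it lists no members itself -->
<classSpec ident="model.clipLike" type="model">
  <desc>groups elements containing media clips (invented for this sketch)</desc>
</classSpec>

<!-- A hypothetical element which claims membership of that class -->
<elementSpec xmlns:rng="http://relaxng.org/ns/structure/1.0" ident="videoClip">
  <desc>contains a reference to a video clip (invented for this sketch)</desc>
  <classes>
    <memberOf key="model.clipLike"/>
  </classes>
  <content>
    <rng:text/>
  </content>
</elementSpec>
```

Any content model which refers to model.clipLike would then permit <videoClip>, without the class definition itself ever being edited.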

Attributes are always defined using the <attDef> element, but they may be defined in two different contexts. Some attributes are defined directly on elements; in other words, their definition forms part of the definition of the element on which they appear. For example, the age element, which is used to specify the age of a person, has an attribute called @value, which holds the numeric value of the person's age. This attribute is defined directly as part of the definition of the <age> element itself.

Other attributes are used on a range of different elements. It is not efficient or practical to define the same attribute multiple times, once on each element, so these attributes are defined as part of attribute classes. The Guidelines provide a list of all the TEI attribute classes. Each class provides one or more attributes which are likely to be useful in a particular context or for a particular purpose. For example, the @source attribute, which is used to provide a pointer to a bibliographical source for a quotation or reference, is defined in the att.source class. The elements <quote>, <q>, <writing> and <egXML> each declare membership of the att.source class, and thus acquire the @source attribute.

Attribute classes may be nested. In other words, one attribute class may be a member of another. This is a convenient way of grouping similar classes of attributes so that an element can claim membership of all of them in one operation. For example, there are three base attribute classes relating to dating attributes: att.datable.iso (providing attributes for the expression of dates in ISO format), att.datable.w3c (providing attributes for dates in W3C format), and att.datable.custom (for dates in non-Gregorian calendars). There is also the att.datable class, of which the other three are all members. An element which claims membership of att.datable (such as <date>) will acquire all the attributes in the three base classes. Attribute classes are defined using the <classSpec> element, with the attribute type="atts".

The final important concept is the idea of a macro. A macro is basically a method of re-using the same block of content in multiple places. The most common macros are those which define datatypes for TEI attributes. For example, the data.count macro provides the definition of a positive integer, which is used as the datatype for more than 20 different attributes. Defining this datatype in one location avoids duplication and provides consistency. Other macros are used to define content models which are useful in the definition of many different elements; for instance, the macro.paraContent macro ‘defines the content of paragraphs and similar elements’, and is used in the definition of 50 different elements. Macros are defined using the <macroSpec> element.
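As a rough sketch of a datatype macro and its use, consider the fragments below. The macro name data.exampleCount and the attribute copies are hypothetical, chosen for illustration; this is not the TEI's own definition of data.count.

```xml
<!-- A hypothetical datatype macro, defined once -->
<macroSpec xmlns:rng="http://relaxng.org/ns/structure/1.0"
           ident="data.exampleCount" type="dt">
  <desc>defines a positive whole number (invented for this sketch)</desc>
  <content>
    <rng:data type="positiveInteger"/>
  </content>
</macroSpec>

<!-- A hypothetical attribute which re-uses the macro as its datatype -->
<attDef xmlns:rng="http://relaxng.org/ns/structure/1.0"
        ident="copies" usage="opt">
  <desc>gives the number of copies (invented for this sketch)</desc>
  <datatype>
    <rng:ref name="data.exampleCount"/>
  </datatype>
</attDef>
```

Any number of attribute definitions can point at the same macro, so tightening the datatype later means editing one place only.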

With this basic introduction to how elements, attributes and their components and relationships are defined, you may now want to take a look at some example specifications from the TEI repository:
  • The <age> element contains one attribute definition, as mentioned above; it is also a member of three attribute classes, and one model class. If you look at the root <elementSpec> element, you'll see the attribute module="namesdates"; this is what determines that this element is part of the namesdates (Names and Dates) module.
  • The att.typed attribute class defines two attributes, @type and @subtype. It is part of the tei module.
  • The model.orgPart specification demonstrates how simple a model class specification can be.

The basic structure of a <schemaSpec>

A TEI schema is defined by a <schemaSpec> element containing an arbitrary mixture of explicit declarations for objects (i.e. elements, classes, or macro specifications) and references to other objects containing such declarations. In simplified form, the data model is
schemaSpec = (moduleRef | elementSpec | macroSpec | classSpec )*
where <elementSpec>, <macroSpec> and <classSpec> contain definitions of TEI objects. <moduleRef> references groups of existing definitions, in one of two ways:
  1. If the key attribute is provided, it refers to the TEI name for a module, and details of that are accessed from the TEI web service database (which may be a local installation).
  2. If the url attribute is provided, it refers to an external file of schema definitions in the RELAX NG language (this is used to pull in non-TEI schemas).
In the simplest case, a user-defined schema might simply combine all the declarations from some nominated modules:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>TEI with simple setup</title>
        <author>Sebastian Rahtz</author>
      </titleStmt>
      <publicationStmt>
        <p>freely available</p>
      </publicationStmt>
      <sourceDesc>
        <p>Written from scratch.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <schemaSpec ident="oddex1" start="TEI">
        <moduleRef key="header"/>
        <moduleRef key="core"/>
        <moduleRef key="tei"/>
        <moduleRef key="textstructure"/>
      </schemaSpec>
    </body>
  </text>
</TEI>
Note that this is a normal TEI document, with a metadata header. In the other examples that follow, we will usually omit the outer TEI wrapper and just show the <schemaSpec> element.

An ODD processor, given such a document, will combine the declarations which belong to the named modules, and deliver the result as a schema of some requested type. It might also generate documentation for all (and only) the elements declared by those modules. The start attribute of <schemaSpec> is used to specify in a RELAX NG schema which elements are valid entry points.

You can address individual elements, classes or macros within the modules by adding <elementSpec>, <classSpec> or <macroSpec> elements after the <moduleRef> elements. Each of these must have a mode attribute on it, which can take one of four values:
add
the object is entirely new.
replace
the object entirely replaces the existing object with the same ident.
delete
all references to the original object with the same ident are removed from the schema.
change
child elements of the object which appear in the original specification are replaced by the versions in the new specification. This may be at any level, as we will see in examples below.
It is an error to provide replace, delete or change versions for objects which do not already exist in the TEI, and an error to add something with the same ident attribute as an existing object in the TEI.
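Since replace mode has the most sweeping effect, a minimal sketch may help. The schema identifier oddexReplace is hypothetical, and the choice of class membership and content model is an assumption made purely for illustration.

```xml
<schemaSpec xmlns:rng="http://relaxng.org/ns/structure/1.0"
            ident="oddexReplace" start="TEI">
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="tei"/>
  <moduleRef key="textstructure"/>
  <!-- replace: this declaration is used instead of the original one
       for <mentioned>, so everything wanted must be restated here -->
  <elementSpec ident="mentioned" module="core" mode="replace">
    <desc>marks words or phrases mentioned, not used; plain text only in this sketch.</desc>
    <classes>
      <!-- assumed class membership, for illustration -->
      <memberOf key="model.pPart.data"/>
    </classes>
    <content>
      <rng:text/>
    </content>
  </elementSpec>
</schemaSpec>
```

Compare this with change mode, where only the parts supplied are substituted and the rest of the original declaration is kept.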

Adding new elements

A schema can include declarations for new elements, as in the following example:
<schemaSpec xmlns:rng="http://relaxng.org/ns/structure/1.0" ident="oddex1.5" start="TEI" xml:base="examples/odd1.5.xml">
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="tei"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="soundClip" mode="add">
    <classes>
      <memberOf key="model.pPart.data"/>
    </classes>
    <content>
      <rng:text/>
    </content>
  </elementSpec>
</schemaSpec>
A declaration for the element <soundClip>, which is not defined in the TEI scheme, will be added to the output schema. This element will also be added to the existing TEI class model.pPart.data, and will thus be available in TEI-conformant documents.
In the following example we add a new element <rebirth> which is modelled on the existing <birth> element:
<schemaSpec xmlns:rng="http://relaxng.org/ns/structure/1.0" ident="oddex4" start="TEI" xml:base="examples/odd4.xml">
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="tei"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="corpus"/>
  <elementSpec ident="rebirth" mode="add">
    <gloss>Rebirth details</gloss>
    <desc>contains information about a soul's rebirth, such as its date and place.</desc>
    <classes>
      <memberOf key="model.persEventLike"/>
      <memberOf key="att.editLike"/>
      <memberOf key="att.datable"/>
      <memberOf key="att.naming"/>
    </classes>
    <content>
      <rng:ref name="macro.phraseSeq"/>
    </content>
  </elementSpec>
</schemaSpec>
There are usually four parts to such an element definition:
  1. An identifier (in this case the value rebirth for the ident attribute).
  2. Documentation (the <gloss> and <desc> elements).
  3. A declaration of the classes of which this element is to be a member (here model.persEventLike, att.editLike, att.datable and att.naming); these are the same classes as for <birth>, which we find out by looking at its definition.
  4. The content model for the element, here the general-purpose pattern macro.phraseSeq.
There is no need to specify a module for the element to appear in, as this would not be used for anything.

Removing elements

Specifying that we do not want some of the elements to appear in our final schema is easy:
<schemaSpec ident="oddex2" start="TEI" xml:base="examples/odd2.xml">
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="tei"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="headItem" mode="delete" module="core"/>
  <elementSpec ident="headLabel" mode="delete" module="core"/>
  <elementSpec ident="hyphenation" mode="delete" module="header"/>
</schemaSpec>
Note that no child elements of the deleted object are needed, or taken notice of.

Changing existing elements

When we come to changing existing elements, the specification looks a little more complex:
<schemaSpec ident="oddex3" start="TEI" xml:base="examples/odd3.xml">
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="tei"/>
  <moduleRef key="textstructure"/>
  <elementSpec ident="div" mode="change">
    <attList>
      <attDef ident="type" usage="req" mode="change">
        <gloss>You must indicate the level of the section</gloss>
        <datatype>
          <rng:ref xmlns:rng="http://relaxng.org/ns/structure/1.0" name="datatype.Code"/>
        </datatype>
        <valList type="closed" mode="replace">
          <valItem ident="section">
            <gloss>1st level section</gloss>
          </valItem>
          <valItem ident="subsection">
            <gloss>2nd level section</gloss>
          </valItem>
          <valItem ident="subsubsection">
            <gloss>3rd level section</gloss>
          </valItem>
        </valList>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
In this example, we are changing the behaviour of the <div> element so that the type attribute (inherited from the class att.divLike) is mandatory and chosen from a fixed set of values. The change value for mode must be supplied on each identifiable part of the object which is to change. So the <elementSpec> itself is in change mode, plus the <attDef> for type, while the <valList> is in replace mode. The elements we have not specified any change for (examples, references, etc) are copied from the original.
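To see the effect on documents, here is a small sketch of markup which a schema generated from this customization should accept. The headings and paragraph text are invented for the example.

```xml
<body xmlns="http://www.tei-c.org/ns/1.0">
  <!-- type is now required, and must be one of the three listed values -->
  <div type="section">
    <head>Introduction</head>
    <p>First-level material.</p>
    <div type="subsection">
      <head>Background</head>
      <p>Second-level material.</p>
    </div>
  </div>
  <!-- By contrast, a div with no type attribute, or with a value
       outside the closed list such as type="chapter", should be
       reported as invalid. -->
</body>
```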
Change mode can apply to classes as well as elements. In the following example, we remove a set of attributes which are provided for any element which is a member of the att.global.linking class:
<schemaSpec ident="oddex5" start="TEI" xml:base="examples/odd5.xml">
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="tei"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="linking"/>
  <classSpec module="linking" ident="att.global.linking" mode="change">
    <attList>
      <attDef ident="corresp" mode="delete"/>
      <attDef ident="synch" mode="delete"/>
      <attDef ident="sameAs" mode="delete"/>
      <attDef ident="copyOf" mode="delete"/>
      <attDef ident="next" mode="delete"/>
      <attDef ident="prev" mode="delete"/>
      <attDef ident="exclude" mode="delete"/>
      <attDef ident="select" mode="delete"/>
    </attList>
  </classSpec>
</schemaSpec>
If you want to change which elements belong to att.global.linking, you must change the <classes> element of each of the elements separately.

Adding new elements in a different namespace

A good example of this would be if you wanted to use the W3C XInclude scheme in your XML. This is a way of referring to external files to be transcluded (DTD users will be familiar with the use of file entities to perform this job). This document, for example, pulls in a table (created by an automatic process) by using this piece of code:
<include href="examples/modules.xml" xmlns="http://www.w3.org/2001/XInclude"/>
Since the <include> could occur anywhere, we want to add it to a TEI class which is referenced almost everywhere; model.inter does this job nicely. We could pull in an external schema which defines <include> , but it may be amusing to define it ourselves using this <elementSpec> :
<elementSpec xmlns:rng="http://relaxng.org/ns/structure/1.0" ident="xinclude" mode="add" ns="http://www.w3.org/2001/XInclude" xml:base="examples/odd6.xml">
  <altIdent>include</altIdent>
  <classes>
    <memberOf key="model.inter"/>
  </classes>
  <content>
    <rng:optional>
      <rng:element name="fallback" ns="http://www.w3.org/2001/XInclude">
        <rng:zeroOrMore>
          <rng:element>
            <rng:anyName/>
            <rng:zeroOrMore>
              <rng:attribute>
                <rng:anyName/>
              </rng:attribute>
            </rng:zeroOrMore>
          </rng:element>
        </rng:zeroOrMore>
      </rng:element>
    </rng:optional>
  </content>
  <attList>
    <attDef ident="href" usage="req">
      <datatype>
        <rng:data type="anyURI"/>
      </datatype>
    </attDef>
    <attDef ident="parse">
      <datatype>
        <rng:choice>
          <rng:value>xml</rng:value>
          <rng:value>text</rng:value>
        </rng:choice>
      </datatype>
      <defaultVal>xml</defaultVal>
    </attDef>
    <attDef ident="xpointer">
      <datatype>
        <rng:text/>
      </datatype>
    </attDef>
    <attDef ident="encoding">
      <datatype>
        <rng:text/>
      </datatype>
    </attDef>
    <attDef ident="accept">
      <datatype>
        <rng:text/>
      </datatype>
    </attDef>
    <attDef ident="accept-charset">
      <datatype>
        <rng:text/>
      </datatype>
    </attDef>
    <attDef ident="accept-language">
      <datatype>
        <rng:text/>
      </datatype>
    </attDef>
  </attList>
</elementSpec>
Note the new ns attribute on <elementSpec> which says that this element is not to be defined in the default (TEI) namespace, and the use of the shorthand RELAX NG method of inline element definition of <fallback> within the <include> element.
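As a sketch of what this definition permits in a document, the fragment below uses <include> with a <fallback>. The xi prefix, the paragraph text and the choice of <gap> as fallback content are assumptions made for illustration; note that the content model above only allows attribute-bearing elements without text inside <fallback>.

```xml
<body xmlns="http://www.tei-c.org/ns/1.0"
      xmlns:xi="http://www.w3.org/2001/XInclude">
  <p>The table of modules is generated automatically:</p>
  <!-- href is required; parse takes the value xml or text -->
  <xi:include href="examples/modules.xml" parse="xml">
    <xi:fallback>
      <!-- any element name, carrying any attributes, is accepted here -->
      <gap reason="missing"/>
    </xi:fallback>
  </xi:include>
</body>
```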

Processing your ODD specification

When you have finished writing your customization, you can turn your ODD into schemas or DTDs for use with XML editors or validators, or create schema documentation showing the specification for your elements and classes. There are a few options for carrying out both of these tasks:

Working with RELAX NG schema modules

If you want to use the RELAX NG schema modules (see Note 2), you must always write a wrapper schema, selecting the appropriate modules. Thus a minimal TEI schema might look like this:
namespace ns1 = "http://www.tei-c.org/ns/1.0"
namespace rng = "http://relaxng.org/ns/structure/1.0"
include "http://www.tei-c.org/schema/relaxng/header.rnc" inherit = ns1
include "http://www.tei-c.org/schema/relaxng/core.rnc" inherit = ns1
include "http://www.tei-c.org/schema/relaxng/tei.rnc" inherit = ns1
include "http://www.tei-c.org/schema/relaxng/textstructure.rnc" inherit = ns1
start = TEI
This is clearer than the DTD method, as it loads files containing definitions from explicit URLs. It is then possible to override any patterns in the included files; so the following schema
namespace ns1 = "http://www.tei-c.org/ns/1.0"
namespace rng = "http://relaxng.org/ns/structure/1.0"
include "http://www.tei-c.org/schema/relaxng/header.rnc" inherit = ns1
include "http://www.tei-c.org/schema/relaxng/core.rnc" inherit = ns1 {
  mentioned = notAllowed
}
include "http://www.tei-c.org/schema/relaxng/tei.rnc" inherit = ns1
include "http://www.tei-c.org/schema/relaxng/textstructure.rnc" inherit = ns1
start = TEI
loads the core module, but then redefines the meaning of <mentioned> (which is declared in core) to be the special RELAX NG pattern notAllowed. An override must be attached to the include for the module in which the pattern is defined. This is a powerful and elegant mechanism; the only downside is that you must understand the inner structure of the TEI modules.
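For readers who prefer the XML syntax of RELAX NG, the same kind of wrapper with an override might be sketched as follows. The .rng file URLs are assumptions made by analogy with the .rnc URLs used above, and the override is placed on the include for core, assuming that is the module in which <mentioned> is declared.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.tei-c.org/ns/1.0">
  <include href="http://www.tei-c.org/schema/relaxng/header.rng"/>
  <!-- definitions placed inside an include override those of the
       same name in the included file -->
  <include href="http://www.tei-c.org/schema/relaxng/core.rng">
    <define name="mentioned">
      <notAllowed/>
    </define>
  </include>
  <include href="http://www.tei-c.org/schema/relaxng/tei.rng"/>
  <include href="http://www.tei-c.org/schema/relaxng/textstructure.rng"/>
  <start>
    <ref name="TEI"/>
  </start>
</grammar>
```

The ns attribute on <grammar> plays the role of inherit = ns1 in the compact syntax: the included files pick up the TEI namespace from the wrapper.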
RELAX NG patterns are defined for the TEI as follows:
  • Each element, macro and class identifies the module it is part of; this determines which schema file its definition is written to.
  • Every macro (defined by a <macroSpec> in the source) has a RELAX NG pattern of the same name. e.g.
    macro.glossSeq = altIdent?, equiv*, gloss?, desc?
    This can be redefined as desired.
  • Class specifications generate a number of patterns, depending on their type:
    1. An attribute class generates a pattern which references the definition of each of the class attributes.
    2. Each attribute generates a pattern.
    3. A model class generates a pattern with an initial value of notAllowed
    Thus the att.timed attribute class generates
    tei.timed.attributes = tei.timed.attribute.start, tei.timed.attribute.end, tei.timed.attribute.dur
    tei.timed.attribute.start = attribute start { datatype.uri }?
    tei.timed.attribute.end = attribute end { datatype.uri }?
    tei.timed.attribute.dur = attribute dur { xsd:duration }?
    while the model.listLike model class generates
    model.listLike = notAllowed
  • Every element generates at least three patterns; the first defines the element itself, the second defines its content, and the third its attributes. For example, the top-level element <TEI> is defined with:
    TEI = element TEI { TEI.content, TEI.attributes }
    TEI.content = tei.teiHeader, tei.teiText
    TEI.attributes =
      [ a:defaultValue = "5.0" ] attribute version { xsd:decimal }?,
      [ a:defaultValue = "TEI" ] attribute TEIform { text }?
    Each of these can be redefined separately. In addition, for each model class of which the element is a member, it generates an addition to the class pattern. Thus <biblItem> is a member of the model.biblLike, att.declarable, and att.typed classes, so it produces:
    tei.bibl |= biblItem
    tei.declarable |= biblItem
    tei.typed |= biblItem
    so that any reference to model.biblLike will now allow for <biblItem> too.

Working with the DTD subset

It is also possible to work with DTD modules, although the TEI does not recommend this any more. You specify which modules of the TEI you want to use by means of the DTD internal subset. A minimal TEI document using this method might start as follows:
<!DOCTYPE TEI SYSTEM "http://www.tei-c.org/release/xml/tei/schema/dtd/tei.dtd" [
  <!ENTITY % TEI.header "INCLUDE">
  <!ENTITY % TEI.core "INCLUDE">
  <!ENTITY % TEI.textstructure "INCLUDE">
]>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
This loads the obligatory modules header, core, and textstructure by setting the corresponding parameter entity to INCLUDE.
There is a parameter entity for each module (created by prefixing the module name with TEI.), so we could request the linking module to be loaded by adding
<!ENTITY % TEI.linking "INCLUDE">
to the DTD subset. It is also possible to disable particular elements from the modules by setting a parameter corresponding to the element. So
<!ENTITY % ab "IGNORE" >
would remove <ab> from the list of allowed elements. Although this type of customization is useful, it is not possible to use the method to add new elements, change attributes, or manipulate classes. That sort of change requires a deeper understanding of writing DTD extensions, beyond the scope of this introduction.

Roma (command line)

An ODD specification can be processed in a scripting environment by using the roma command-line script. This takes the form:
Usage: roma [options] schemaspec [output_directory]
options, shown with defaults:
  --xsl=/usr/share/xml/tei/stylesheet
  --teiserver=http://www.tei-c.org/Query/
  --localsource=          # local copy of P5 sources
options, binary switches:
  --doc                   # create expanded documented ODD (TEI Lite XML)
  --lang=LANG             # language for names of attrbutes and elements
  --doclang=LANG          # language for documentation
  --dochtml               # create HTML version of doc
  --patternprefix=STRING  # prefix relax patterns with STRING
  --docpdf                # create PDF version of doc
  --nodtd                 # suppress DTD creation
  --norelax               # suppress RELAX NG creation
  --noxsd                 # suppress W3C XML Schema creation
  --noteic                # suppress TEI-specific features
  --debug                 # leave temporary files, etc.
By default the script creates DTD, XSD and RELAX NG schemas; each of these can be suppressed if needed, and a set of summary documentation can be created. The xsl and teiserver options point to resources which roma needs to do its job; if you have a local copy of the TEI XSL stylesheets, or a local TEI eXist database, you can make the script independent of web access.

For information on using the web-based interface to roma, see Creating Customizations with Roma.

Making use of non-TEI schemas

The TEI was designed to capture all the vagaries of literary and linguistic text; it does not attempt to describe other specialised descriptive languages, such as those for chemistry, mathematics, and vector graphics, or the technical vocabulary of fields like law, health care and computer science. Some of these areas have been addressed as thoroughly as the TEI in their own standards. But what if we want to write a composite document mixing material from two fields? Since all the TEI elements are in their own XML namespace, it is easy to write a document which interleaves TEI markup with markup from another namespace, as in this example of TEI and DocBook:
<p>
  The button on our web page shows the date of the manuscript:
  <dbk:guibutton xmlns:dbk="http://docbook.org/docbook-ng">
    <date calendar="Julian" when="1732-02-22">Feb. 11, 1731/32, O.S.</date>
  </dbk:guibutton>
  Note that the representation is as found in the text, not normalized.
</p>
But what about validating this XML against a schema? Using the Namespace-based Validation Dispatching Language (see http://www.nvdl.org/), we can validate the two languages separately, but we also want a TEI customization which checks where the insertion of ‘foreign’ elements is permitted. This means importing another schema, and changing one or more TEI classes to allow for the new element(s). If it is also required that TEI elements be allowed inside the elements of the other namespace, we also have to modify the schema for the other namespace.
Two common cases which do not require interleaving are:
  1. redefining the content of <formula> to allow for MathML markup.
  2. redefining the content of <figure> to allow SVG markup.
In each case, we first need a <moduleRef> which loads the external schema in RELAX NG format:
<moduleRef url="mathml2-main.rng"/> <moduleRef url="svg-main.rng"/>
These schemas can be downloaded from http://www.w3.org/Math/ and http://www.w3.org/TR/SVG11/ ; note that they may each need a small fix to remove the RELAX NG <start> pattern, as this causes a conflict with the TEI definition. These define respectively two patterns called mathml.math and svg.svg, which we can proceed to add to TEI content models.
  • For MathML, we can redefine an existing macro which is already provided as a hook inside the content of <formula> :
    <macroSpec xmlns:rng="http://relaxng.org/ns/structure/1.0" type="pe" ident="datatype.Formula" mode="change" xml:base="examples/addmath.xml">
      <content>
        <rng:ref name="mathml.math"/>
      </content>
    </macroSpec>
  • For SVG, we need to change the model of <figure> , simply adding a reference to svg.svg at the end of a <choice> list:
    <elementSpec ident="figure" mode="change" xml:base="examples/addsvg.xml">
      <content>
        <rng:zeroOrMore xmlns:rng="http://relaxng.org/ns/structure/1.0">
          <rng:choice>
            <rng:ref name="model.Incl"/>
            <rng:ref name="figure"/>
            <rng:ref name="figDesc"/>
            <rng:ref name="graphic"/>
            <rng:ref name="head"/>
            <rng:ref name="p"/>
            <rng:ref name="svg.svg"/>
          </rng:choice>
        </rng:zeroOrMore>
      </content>
    </elementSpec>
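With both changes in place, documents along the following lines should become valid. This is a sketch only: the formula, the heading and the drawing are invented for the example.

```xml
<!-- MathML inside the TEI <formula> element -->
<p xmlns="http://www.tei-c.org/ns/1.0">
  The area of the circle is
  <formula>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>A</mi>
      <mo>=</mo>
      <mi>π</mi>
      <msup>
        <mi>r</mi>
        <mn>2</mn>
      </msup>
    </math>
  </formula>
</p>

<!-- SVG inside the TEI <figure> element -->
<figure xmlns="http://www.tei-c.org/ns/1.0">
  <head>A circle</head>
  <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
    <circle cx="50" cy="50" r="40"/>
  </svg>
</figure>
```

In each case the foreign markup sits in its own namespace, and the TEI elements around it stay in the TEI namespace.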

Internationalisation

A common requirement for changing existing elements is to make the visible names suit a local language. If we want to use the TEI in an entirely Spanish-speaking environment, it can be useful to have a copy of the TEI schema with all the names converted to Spanish. Documents can be created and edited using this schema, and then translated back to the canonical form for long-term archiving or distribution.

These translations are possible because the TEI defines names in English for elements and attributes, but does not use these names directly in content models for other elements. This means that the names can be changed without breaking the rest of the system. For example, the content model for <series> is
series.content = (text | model.gLike | title | editor | respStmt | biblScope | model.global)*
but the ‘title’ here refers to the pattern called ‘title’; this is defined with:
title = element title { title.content, title.attributes }
If we change it to
title = element titulo { title.content, title.attributes }
the definition for <series> will still work, and the pointers to the content and attributes of ‘title’ remain correct.

If we create documents using this schema, how can we be sure the back translation is easy? Because we can always go back to the source of the customization to find the original name.

The translation process in ODD is simple. Each element or attribute affected must be supplied in change mode, with simply an <altIdent> provided. For example, here are some translations into Spanish:
<schemaSpec xml:base="examples/spanish.xml">
  <elementSpec ident="quote" module="core" mode="change">
    <altIdent type="lang">cita</altIdent>
  </elementSpec>
  <elementSpec ident="cit" module="core" mode="change">
    <altIdent type="lang">citaCompl</altIdent>
  </elementSpec>
  <elementSpec ident="mentioned" module="core" mode="change">
    <altIdent type="lang">mencionado</altIdent>
  </elementSpec>
  <elementSpec ident="when" module="linking" mode="change">
    <altIdent type="lang">cuando</altIdent>
    <attList>
      <attDef mode="change" ident="unit">
        <altIdent type="lang">unidad</altIdent>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
Notice that each <attDef> element must also specify change mode, as well as the parent <elementSpec> .
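To illustrate the round trip, here is a sketch of a paragraph as it might be written against the translated schema, followed by its canonical equivalent. The sentence itself is invented for the example; only the element names come from the specification above.

```xml
<!-- As edited with the Spanish names -->
<p xmlns="http://www.tei-c.org/ns/1.0">
  La palabra <mencionado>ODD</mencionado> viene de la frase
  <cita>One Document Does it All</cita>.
</p>

<!-- After translation back to the canonical TEI names -->
<p xmlns="http://www.tei-c.org/ns/1.0">
  La palabra <mentioned>ODD</mentioned> viene de la frase
  <quote>One Document Does it All</quote>.
</p>
```

Elements which have no <altIdent>, such as <p> here, keep their canonical names in both versions.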
Constructing specifications like this by hand is both tedious and error-prone, and it would be unwise for each separate project to make its own translations. The TEI Consortium therefore maintains a set of translated names (see Note 3) and a utility to generate the appropriate ODD code for elements from all the modules you have selected. The Roma application reduces this to choosing a language from a drop-down list, as shown in the figure below.
Figure 1. Choosing language for names in Roma
The effect of using a translated schema is shown in the image below; the oXygen editor is shown editing Hamlet with Spanish element and attribute names.
Figure 2. Editing TEI text using a schema translated to Spanish
Notes
1.
The concepts of ODD were devised and implemented by Lou Burnard and Michael Sperberg-McQueen early in the development of the TEI. The language developed over time as the TEI was put together, and one form of it was documented in the TEI Guidelines (versions 3 and 4); unfortunately, that version of the markup was not what was actually used to write the TEI Guidelines, which diverged into a more complex scheme. For version 5 of the TEI, the entire ODD language was heavily revised and simplified by a working group led by Sebastian Rahtz, and the Guidelines themselves brought into conformance with it.
2.
Examples of RELAX NG in this section are presented using the compact syntax; when you write TEI customizations in the ODD system, it is necessary to use the XML syntax.
3.
A project initiated by Alejandro Bia, and extended by Sebastian Rahtz and Arno Mittelbach.

Last recorded change to this page: 2013-12-08  •  For corrections or updates, contact web@tei-c.org