23 Using the TEI
This section discusses some technical topics concerning the deployment of the TEI markup scheme documented elsewhere in these Guidelines. In section 23.3 Personalization and Customization we discuss the scope and variety of the TEI customization mechanisms, distinguishing ‘clean’ modifications, which result in a schema that supports a subset of the distinctions made in the full TEI system, from ‘unclean’ modifications, which result in a schema that does not have this property. In 23.4 Conformance we define the notion of TEI Conformance, distinguishing documents which are algorithmically TEI conformant ("TEI Conformable") from those which are intrinsically conformant ("TEI Conformant"); we also define the concept of a TEI extension. Since the ODD markup description language defined in chapter 22 Documentation Elements is fundamental to the way conformance and customization are handled in the TEI system, these two definitional sections are followed by a section (23.5 Implementation of an ODD System) which describes the intended behaviour of an ODD processor.
23.1 Serving TEI files with the TEI media type
In February 2011, the media type application/tei+xml was registered with
IANA for ‘markup languages defined in accordance with the Text Encoding
and Interchange guidelines’ (RFC 6129). We
recommend that any XML file whose root element is in the TEI
namespace be served with the media type application/tei+xml to enable
and encourage automated recognition and processing of TEI files by
external applications.
23.2 Obtaining the TEI Schemas
As discussed in chapter 22 Documentation Elements, the modules making up the TEI scheme are generated from a single set of XML source files. Schemas can be generated for TEI customizations in each of XML DTD language, W3C schema language, and RELAX NG schema language. In the body of the Guidelines, only the last of these is presented, using the compact syntax.
The TEI schemas and Guidelines are widely available over the Internet and elsewhere. The canonical home for the TEI source, the schema fragments generated from it, and example modifications, is the TEI repository at http://tei.sf.net; versions are also available in other formats, along with copies of the Guidelines and related materials, from the TEI web site at http://www.tei-c.org.
23.3 Personalization and Customization
These Guidelines provide an encoding scheme suitable for encoding a very wide range of texts, and capable of supporting a wide variety of applications. For this reason, the TEI scheme supports a variety of different approaches to solving similar problems, and also defines a much richer set of elements than is likely to be necessary in any given project. Furthermore, the TEI scheme may be extended in well-defined and documented ways for texts that cannot be conveniently or appropriately encoded using what is provided. For these reasons, it is almost impossible to use the TEI scheme without customizing or personalizing it in some way.
This section describes how the TEI encoding scheme may be customized, and should be read in conjunction with chapter 22 Documentation Elements, which describes how a specific application of the TEI encoding scheme should be documented. The documentation system described in that chapter is, like the rest of the TEI scheme, independent of any particular schema or document type definition language.
Formally speaking, these Guidelines provide both syntactic rules about how elements and attributes may be used in valid documents and semantic recommendations about what interpretation should be attached to a given syntactic construct. In this sense, they provide both a document type definition and a document type declaration. More exactly, we may distinguish between the TEI Abstract Model, which defines a set of related concepts, and the TEI schema which defines a set of syntactic rules and constraints. Many (though not all) of the semantic recommendations are provided solely as informal descriptive prose, though some of them are also enforced by means of such constructs as datatypes (see 1.4.2 Datatype Macros). Although the descriptions have been written with care, there will inevitably be cases where the intention of the contributors has not been conveyed with sufficient clarity to prevent users of the Guidelines from ‘extending’ them in the sense of attaching slightly variant semantics to them.
Beyond this unintentional semantic extension, some of the elements described can intentionally be used in a variety of ways; for example, the element note has an attribute type which can take on arbitrary string values, depending on how it is used in a document. A new type of ‘note’, therefore, requires no change in the existing model. On the other hand, for many applications, it may be desirable to constrain the possible values for the type attribute to a small set of possibilities. A schema modified in this way would no longer necessarily regard as valid the same set of documents as the corresponding unmodified TEI schema, but would remain faithful to the same conceptual model.
This section explains how the TEI scheme can be customized by suppressing elements, modifying classes of elements, adding elements, and renaming elements. Documents which validate against an application of the TEI scheme which has been customized in this way may or may not be considered ‘TEI conformant’, as further discussed in section 23.4 Conformance.
The TEI scheme is designed to support modification and customization in a documented way that can be validated by an XML processor. This is achieved by writing a small TEI Conformant document, from which an appropriate processor can generate both human-readable documentation, and a schema expressed in a language such as RELAX NG or DTD. The mechanisms used to instantiate a TEI schema differ for different schema languages, and are therefore not defined here. In XML DTDs, for example, extensive use is made of parameter entities, while in RELAX NG schemas, extensive use is made of patterns. In either case, the names of elements and, wherever possible, their attributes and content models are defined indirectly. The syntax used to implement this indirection also varies with the schema language used, but the underlying constructs in the TEI Abstract Model are given the same names.
A schema built simply by selecting a set of TEI modules, with no further customization, has the following properties:
- all the elements defined by the module (and described in the corresponding section of these Guidelines) are included in the schema;
- each such element is identified by the canonical name given it in these Guidelines;
- the content model of each such element is as defined by these Guidelines;
- the names, datatypes, and permitted values declared for each attribute associated with each such element are as given in these Guidelines;
- the elements comprising element classes, and the meaning of macro declarations expressed in terms of element classes, are determined by the particular combination of modules selected.
The TEI customization mechanisms allow these defaults to be changed in the following ways:
- particular elements may be suppressed, removing them from any classes in which they are members, and also from any generated schema;
- within certain limits, the name (generic identifier) associated with an element may be changed, without changing the semantic or syntactic properties of the element;
- new elements may be added to an existing class, thus making them available in macros or content models defined in terms of those classes;
- additional attributes, or attribute values, may be specified for an individual element or for classes of elements;
- within certain limits, attributes, or attribute values, may also be removed either from an individual element or from classes of elements;
- the characteristics inherited by one class from another class may be modified by modifying its class membership: all members of the class then inherit the changed characteristics;
- the set of values legal for an attribute or attribute class may be constrained or relaxed by supplying or modifying a value list, or by modifying its datatype.
The recommended way of implementing and documenting all such modifications is by means of the ODD system described in chapter 22 Documentation Elements; in the remainder of this section we give specific examples to illustrate how that system may be applied. An ODD processor, such as the Roma application supported by the TEI or any other comparable set of stylesheets, will use the declarations provided by an ODD to generate appropriate sets of declarations in a specific schema language such as RELAX NG or the XML DTD language. We do not discuss in detail here how this should be done, since the details are specific to each schema language; some background information about the methods used for XML DTD and RELAX NG schema generation is however provided in section 1.2 Defining a TEI Schema. Several example ODD files are also provided as part of the standard TEI release: see further section 23.3.4 Examples of Modification below.
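As a rough illustration of the small TEI document from which a schema is generated, the following sketch shows what a minimal ODD might look like. The title, the prose, and the schema identifier exampleSchema are placeholders invented for this sketch; the four modules referenced are those generally needed for a basic TEI schema, and a real customization would normally select more.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <!-- Placeholder title for this illustration -->
        <title>An example customization</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished illustration</p>
      </publicationStmt>
      <sourceDesc>
        <p>Written from scratch as an example</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Prose documentation of the customization goes here.</p>
      <!-- The schema specification itself; the ident value is hypothetical -->
      <schemaSpec ident="exampleSchema">
        <moduleRef key="tei"/>
        <moduleRef key="header"/>
        <moduleRef key="core"/>
        <moduleRef key="textstructure"/>
        <!-- elementSpec, classSpec, or macroSpec modifications would follow -->
      </schemaSpec>
    </body>
  </text>
</TEI>
```

Keeping the prose and the schemaSpec in one document is what allows a processor to derive both the documentation and the schema from a single source.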
23.3.1 Kinds of Modification
The following kinds of modification may be distinguished:
- deletion of elements;
- renaming of elements;
- modification of content models;
- modification of attribute and attribute-value lists;
- modification of class membership;
- addition of new elements.
Each kind of modification changes the set of documents that will be considered valid according to the resulting schema. Any combination of unchanged TEI modules may be thought of as defining a certain set of documents. Each schema resulting from a modified combination of TEI modules will define a different set of documents. The set of documents valid according to the unmodified schema may or may not be properly contained in the set of documents considered to be valid according to the modified schema. We use the term clean modification to describe a modification which regards as valid a subset of the documents considered valid by the same combination of TEI modules unmodified. Alternatively, the set of documents considered valid by the original schema might be disjoint from the set of documents considered valid by the modified schema, with neither being properly contained by the other. Modifications that have this result are called unclean modifications. Despite this terminology, unclean modifications are not particularly deprecated, and their use may often be vital to the success of a project. The concept is introduced solely to distinguish the effects of different kinds of modification.
Cleanliness can only be assessed with reference to elements in the TEI namespace.
23.3.1.1 Deletion of Elements
The simplest way to modify the supplied modules is to suppress one or more of the supplied elements. This is simply done by setting the mode attribute to delete on an elementSpec for the element concerned.
<!-- the ident value on schemaSpec is a placeholder -->
<schemaSpec ident="example">
 <moduleRef key="core"/>
 <!-- other modules used by this schema -->
 <elementSpec ident="note" module="core" mode="delete"/>
</schemaSpec>
In most cases, deletion is a clean modification, since most elements are optional. Documents that are valid with respect to the modified schema are also valid according to the unmodified schema. To say this another way, the set of documents matching the new schema is contained by the set of documents matching the original schema.
There are however some elements in the TEI scheme which have mandatory children; for example, the element fileDesc must contain both a titleStmt and a sourceDesc. A modification which deleted either of these would be unclean, because it would regard as valid documents that the unmodified schema would regard as invalid. Deleting one of the many optional children of fileDesc (editionStmt or notesStmt for example) would not have this effect, and would be a clean modification.
In general, whenever the element deleted by a modification is mandatory within the content model of some other (undeleted) element, the result is an unclean modification, and may also break the TEI Abstract Model (23.4.3 Conformance to the TEI Abstract Model). However, the parent of a mandatory child can be safely removed if it is itself optional.
To determine whether an element is mandatory in a given context, the user must inspect the content model of the element concerned. In most cases, content models are expressed in terms of model classes rather than elements; hence, removing an element will generally be a clean modification, since there will generally be other members of the class available. If a class is completely depopulated by a modification, then the cleanliness of the modification will depend upon whether the class reference is mandatory or optional, in the same way as for an individual element.
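The distinction between clean and unclean deletions can be sketched as follows, using the children of fileDesc as the example. The schema identifier is a placeholder, and the unclean deletion is left commented out so that the fragment remains usable as it stands.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="exampleSchema">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- Clean: both elements are optional children of fileDesc -->
  <elementSpec ident="editionStmt" module="header" mode="delete"/>
  <elementSpec ident="notesStmt" module="header" mode="delete"/>
  <!-- Unclean, and therefore disabled here: titleStmt is mandatory
       within fileDesc, so deleting it changes what counts as valid
  <elementSpec ident="titleStmt" module="header" mode="delete"/>
  -->
</schemaSpec>
```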
23.3.1.2 Renaming of Elements
Every element and other named markup construct in the TEI scheme has a canonical name, usually in the English language: this name is supplied as the value of the ident attribute on the elementSpec, attDef, classSpec, or macroSpec used to define it. The element or attribute declaration used within a schema generated from that specification may however be different, thus permitting schemas to be written using elements with generic identifiers from a different language, or otherwise modified. There may be many alternative identifiers for the same markup construct, and an ODD processor may choose which of them to use for a given purpose. Each such alternative name is supplied by means of an altIdent element within the specification element concerned.
<elementSpec ident="note" module="core" mode="change">
 <altIdent>annotation</altIdent>
</elementSpec>
Renaming in this way is always a reversible modification. Although it is an inherently unclean modification (because the set of documents matched by the resulting schema is disjoint from the set matched by its unmodified equivalent), the process of converting any document in which elements have been renamed into an exactly equivalent document using canonical names is completely deterministic, requiring only access to the ODD in which the renaming has been specified. This assumes that the renamed elements are not placed in the TEI namespace but use either a null namespace or some user-defined namespace, as further discussed in 23.3.2 Modification and Namespaces; if this is not the case, care must be taken to avoid name collision between the new name and all existing TEI names. Furthermore, unclean modifications which do not specify a namespace are not conformant (see further 23.3 Personalization and Customization).
The TEI provides a systematic set of renamings into languages other than English. These all use a language-specific namespace.
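To make the reversibility concrete, the following sketch shows the same content encoded twice: once with a renamed element and once with its canonical name. It assumes an ODD that renames note to annotation and places the renamed element in a user-defined namespace; the namespace URI, the prefix my, and the text are all invented for illustration.

```xml
<div xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:my="http://www.example.com/ns/notTEI">
  <!-- As encoded with the renamed element (hypothetical namespace) -->
  <p>Some text<my:annotation>a remark</my:annotation></p>
  <!-- The exactly equivalent encoding using the canonical TEI name -->
  <p>Some text<note>a remark</note></p>
</div>
```

Converting the first paragraph into the second needs nothing beyond the mapping recorded in the altIdent element of the ODD.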
23.3.1.3 Modification of Content Models
The content model for an element in the TEI scheme is defined by means of a content element within the elementSpec which specifies it. As shown elsewhere in these Guidelines, the content model is defined using RELAX NG syntax, whether the resulting schema is expressed in RELAX NG or in some other schema language.
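For example, the content model for the term element is defined by a reference to the macro macro.phraseSeq:
<content>
 <rng:ref name="macro.phraseSeq"/>
</content>
This macro expands to an alternation of plain text (rng:text) with references to three other classes (model.gLike, model.phrase, or model.global). For some particular application it might be preferable to insist that term elements should only contain plain text, excluding these other possibilities. This could be achieved simply by supplying a specification for term like the following:
<elementSpec ident="term" module="core" mode="change">
 <content>
  <rng:text/>
 </content>
</elementSpec>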
This is a clean modification which does not change the meaning of a TEI element; there is therefore no need to assign the element to some other namespace than that of the TEI, though it may be considered good practice; see further 23.3.2 Modification and Namespaces below.
A change of this kind, which simplifies the possible content of an element by reducing its model to one of its existing components, is always clean, because the set of documents matched by the resulting schema is a subset of the set of documents which would have been matched by the unmodified schema.
Note that content models are generally defined (as far as possible) in terms of references to model classes, rather than to explicit elements. This means that the need to modify content models is greatly reduced: if an element is deleted or modified, for example, then the deletion or modification will be available for every content model which references that element via its class, as well as those which reference it explicitly. For this reason it is not (in general) good practice to replace class references by explicit element references, since this may have unintended side effects.
An unqualified reference to an element class within a content model generates a content model which is equivalent to an alternation of all the members of the class referenced. Thus, a content model which refers to the model class model.phrase will generate a content model in which any one of the members of that class is equally acceptable. It is also possible to reference predefined content model fragments based on classes, such as ‘an optional repeatable alternation of all members of a class’, ‘a sequence containing no more than one of each member of the class’, etc. as described further in 22.4.6 Element Classes.
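The way a class reference behaves as an alternation can be seen in a content model that mixes it with plain text. The sketch below is an illustration rather than a recommended customization: it restricts term to text and members of model.gLike. On the assumption that macro.phraseSeq expands as described earlier in this section, this should remain a clean modification, since it keeps only components of the existing model.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:rng="http://relaxng.org/ns/structure/1.0"
             ident="term" module="core" mode="change">
  <content>
    <!-- Any mixture of text and members of model.gLike, in any order -->
    <rng:zeroOrMore>
      <rng:choice>
        <rng:text/>
        <!-- Expands to an alternation of all current members of the class -->
        <rng:ref name="model.gLike"/>
      </rng:choice>
    </rng:zeroOrMore>
  </content>
</elementSpec>
```

Because the reference is to the class rather than to named elements, any element later added to or removed from model.gLike is picked up without editing this content model.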
Content model changes which are not simple restrictions on an existing model should be undertaken with caution. The set of documents matching the schema which results from such changes is likely to be disjoint with the set of documents matching the unmodified schema, and such changes are therefore regarded as unclean. When content models are changed or extended, care should be taken to respect the existing semantics of the element concerned as stated in the Guidelines. For example, the element l is defined as containing a line of verse. It would not therefore make sense to redefine its content model so that it could also include members of the class model.pLike: such a modification although syntactically feasible would not be regarded as TEI conformant because it breaks the TEI Abstract Model.
23.3.1.4 Modification of Attribute and Attribute Value Lists
The attributes applicable to a given element may be specified in two ways: they may be given explicitly, by means of an attList element within the corresponding elementSpec, or they may be inherited from an attribute class, as specified in the classes element. To add a new attribute to an element, the schema builder should therefore first check to see whether this attribute is already defined by some existing attribute class. If it is, then the simplest method of adding it will be to make the element in question a member of that class, as further discussed below. If this is not possible, then a new attDef element must be added to the existing attList for the element in question.
Whichever method is adopted, the modification capabilities are the same as those available for elements. Attributes may be added to or deleted from the list, using the mode attribute on attDef in the same way as on elementSpec. The ‘content’ of an attribute is defined by means of the datatype, valList, or valDesc elements within the attDef element. Any of these elements may be changed.
Suppose, for example, that we wish to add two attributes to the eg element (used to indicate examples in a text), type to characterize the example in some way, and source to indicate where the example comes from. A quick glance through the Guidelines indicates that the attribute class att.typed could be used to provide the type attribute, but there is no comparable class which will provide a source attribute. The existing eg element in fact has no local attributes defined for it at all: we will therefore need to add not only an attDef element to define the new attribute, but also an attList to hold it.
<elementSpec ident="eg" module="tagdocs" mode="change">
 <attList>
  <attDef
    ident="source"
    mode="add"
    ns="http://www.example.com/ns/nonTEI">
   <desc>specifies the source of an example by pointing to a
    single bibliographic reference for it</desc>
   <datatype maxOccurs="1">
    <rng:ref name="data.pointer"/>
   </datatype>
  </attDef>
 </attList>
</elementSpec>
The value supplied for the mode attribute on the attDef element is add; if this attribute already existed on the element we are modifying this should generate an error, since a specification cannot have more than one attribute of the same name. If the attribute is already present, we can replace the whole of the existing declaration by supplying replace as the value for mode; alternatively, we can change some parts of an existing declaration only by supplying just the new parts, and setting change as the value for mode.
Because the new attribute is not defined by the TEI, we must specify a namespace for it on the attDef; see further 23.3.2 Modification and Namespaces.
As noted above, adding the new type attribute involves changing this element's class membership; we therefore discuss that in the next section (23.3.1.5 Class Modification).
The canonical name for the new attribute is source, and is supplied on the ident attribute of the attDef element. In this simple example, we supply only a description and datatype for the new attribute; the former is given by the desc element, and the latter by the datatype element. (There are of course many other pieces of information which could be supplied, as documented in 22 Documentation Elements). The content of the datatype element, like that of the content element, uses patterns from the RELAX NG namespace, in this case to select one of the predefined TEI datatypes (1.4.2 Datatype Macros).
The values of the new attribute may instead be constrained to a closed list by supplying a valList:
<elementSpec ident="eg" module="tagdocs" mode="change">
 <attList>
  <attDef
    ident="source"
    ns="http://www.example.com/ns/notTEI"
    mode="add">
   <desc>specifies the source of an example by supplying one of three
    predefined codes for it.</desc>
   <datatype maxOccurs="1">
    <rng:ref name="data.word"/>
   </datatype>
   <valList type="closed">
    <valItem ident="A">
     <desc>Examples taken from the A-list</desc>
    </valItem>
    <valItem ident="B">
     <desc>Examples taken from the B-list</desc>
    </valItem>
    <valItem ident="C">
     <desc>Examples taken from the C-list</desc>
    </valItem>
   </valList>
  </attDef>
 </attList>
</elementSpec>
The same technique may be used to replace or extend the valList supplied as part of any attribute in the TEI scheme.
Depending on the modification, the set of documents matched by a schema generated from an ODD modified in this way may or may not be a subset of the set of documents matched by the unmodified schema. It is therefore not possible to say in general whether such modifications are clean or unclean; each must be assessed individually.
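As an illustration of changing an attribute that already exists, the following sketch restricts the type attribute of note to a closed list of two values. The value names and their descriptions are invented for this example. Since the change only narrows the values permitted, it would be expected to count as a clean modification.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             ident="note" module="core" mode="change">
  <attList>
    <!-- mode="change": only the parts supplied here are altered -->
    <attDef ident="type" mode="change">
      <valList type="closed" mode="replace">
        <valItem ident="editorial">
          <desc>a note supplied by the editor</desc>
        </valItem>
        <valItem ident="authorial">
          <desc>a note supplied by the author</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```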
23.3.1.5 Class Modification
The concept of element classes was introduced in 1.3.2 Model Classes; an understanding of it is fundamental to successful use of the TEI scheme. As noted there, we distinguish model classes, the members of which all have structural similarity, from attribute classes, the members of which simply share a set of attributes.
The part of an element specification which determines its class membership is an element called classes. All classes to which the element belongs must be specified within this, using a memberOf element for each.
<elementSpec
  ident="eg"
  module="tagdocs"
  mode="change"
  ns="http://www.example.com/ns/notTEI">
 <classes mode="change">
  <memberOf key="att.typed"/>
 </classes>
</elementSpec>
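An individual membership may also be removed, by setting mode to delete on the memberOf element concerned:
<elementSpec ident="term" module="core" mode="change">
 <classes mode="change">
  <memberOf key="att.declaring" mode="delete"/>
 </classes>
</elementSpec>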
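If the mode attribute on classes is set to replace, the memberships indicated by its child memberOf elements replace all the existing memberships of the element:
<elementSpec
  ident="term"
  module="core"
  mode="change"
  ns="http://www.example.com/ns/notTEI">
 <classes mode="replace">
  <memberOf key="att.interpLike"/>
 </classes>
</elementSpec>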
If however the mode attribute is set to change, the implication is that the memberships indicated by its child memberOf elements are to be combined with the existing memberships for the element.
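An attribute may also be removed from a class as a whole, and hence from all of its members:
<classSpec ident="att.global" type="atts" mode="change">
 <attList>
  <attDef ident="rend" mode="delete"/>
 </attList>
</classSpec>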
The classes used in the TEI scheme are further discussed in chapter 1 The TEI Infrastructure. Note in particular that classes are themselves classified: the attributes inherited by a member of attribute class A may come to it directly from that class, or from another class of which A is itself a member. For example, the class att.global is itself a member of the classes att.global.linking and att.global.analytic. By default, these two classes are predefined as empty. However, if (for example) the linking module is included in a schema, a number of attributes (corresp, sameAs, etc.) are defined as members of the att.global.linking class. All elements which are members of att.global will then inherit these new attributes (see further section 1.3.1 Attribute Classes). A new attribute may thus be added to the global class in two ways: either by adding it to the attList defined within the class specification for att.global; or by defining a new attribute class, and changing the class membership of the att.global class to reference it.
Such global changes should be undertaken with caution: in general removing existing non-mandatory attributes from a class will always be a clean modification, in the same way as removing non-mandatory elements. Adding a new attribute to a class however can be a clean modification only if the new attribute is labelled as belonging to some namespace other than the TEI.
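The second route for adding a global attribute, defining a new attribute class and making att.global a member of it, might be sketched as follows. The class name att.project, the attribute name reviewStatus, the namespace URI, and the schema identifier are all hypothetical. The namespace is given on the attDef so that the addition has a chance of remaining clean.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            xmlns:rng="http://relaxng.org/ns/structure/1.0"
            ident="exampleSchema">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- A new attribute class carrying one project-specific attribute -->
  <classSpec ident="att.project" type="atts" mode="add">
    <desc>provides an attribute recording project review status</desc>
    <attList>
      <attDef ident="reviewStatus" mode="add"
              ns="http://www.example.com/ns/notTEI">
        <desc>indicates how far the element has been checked</desc>
        <datatype maxOccurs="1">
          <rng:ref name="data.word"/>
        </datatype>
      </attDef>
    </attList>
  </classSpec>
  <!-- att.global joins the new class, so every member of att.global
       inherits the new attribute -->
  <classSpec ident="att.global" type="atts" mode="change">
    <classes mode="change">
      <memberOf key="att.project"/>
    </classes>
  </classSpec>
</schemaSpec>
```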
The same mechanisms are available for modification of model classes. Care should be taken when modifying the model class membership of existing elements since model class membership is what determines the content model of most elements in the TEI scheme, and a small change may have unintended consequences.
23.3.1.6 Addition of New Elements
To add a completely new element into a schema involves providing a complete element specification for it, the classes element of which includes a reference to at least one TEI model class. Without such a reference, the new element will not be referenced by the content model of any other TEI element, and will therefore be inaccessible within a TEI document.
<elementSpec
  ident="myBibl"
  mode="add"
  ns="http://www.example.com/ns/notTEI">
 <classes>
  <memberOf key="model.biblLike"/>
 </classes>
 <!-- other parts of the new declaration here -->
</elementSpec>
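In a document instance, a new element such as myBibl, added to model.biblLike in a non-TEI namespace, would then be available wherever members of that class are permitted. The fragment below is a sketch of how this might look inside a listBibl; the prefix my and the citation text are invented for illustration.

```xml
<listBibl xmlns="http://www.tei-c.org/ns/1.0"
          xmlns:my="http://www.example.com/ns/notTEI">
  <!-- An ordinary TEI member of model.biblLike -->
  <bibl>A conventional TEI citation</bibl>
  <!-- The added element, in its own namespace -->
  <my:myBibl>A citation in the project-specific format</my:myBibl>
</listBibl>
```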
23.3.2 Modification and Namespaces
All the elements defined by the TEI scheme are labelled as belonging to a single namespace, maintained by the TEI and with the URI http://www.tei-c.org/ns/1.0. Only elements which are unmodified or which have undergone a clean modification may use this namespace. In a TEI-conformant document, it is assumed that all attributes not explicitly labelled with a namespace (as, for example, xml:id is) also belong to the TEI namespace, and are defined by the TEI.
This implies that any other modification (including a renaming or reversible modification) must either specify a different namespace or specify no namespace at all. The ns attribute is provided on elements schemaSpec, elementSpec, and attDef for this purpose.
<elementSpec ident="p" module="core" mode="change">
 <attList>
  <attDef ident="topic" mode="add" ns="http://www.example.org/ns/nonTEI">
   <desc>indicates the topic of a TEI paragraph</desc>
   <datatype>
    <!-- ... -->
   </datatype>
  </attDef>
 </attList>
</elementSpec>
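A document using such an attribute must declare the corresponding namespace and give the attribute a prefix bound to it. The following sketch shows a TEI paragraph carrying a topic attribute from the non-TEI namespace; the prefix my, the value, and the text are invented, and the value assumes that the datatype chosen accepts a single word.

```xml
<p xmlns="http://www.tei-c.org/ns/1.0"
   xmlns:my="http://www.example.org/ns/nonTEI"
   my:topic="whaling">Call me Ishmael.</p>
```

The element itself stays in the TEI namespace; only the added attribute is labelled as belonging elsewhere.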
