Text Encoding Initiative
The XML Version of the TEI Guidelines
14 Linking, Segmentation, and Alignment
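This chapter discusses a number of ways in which encoders may represent analyses of the structure of a text which are not necessarily linear or hierarchic. Tag sets and global attributes are provided for a number of common requirements: pointing and linking, segmentation, correspondence and alignment, aggregation, and alternation, each of which is discussed in its own section below.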
These facilities all use the same basic set of techniques, which depend on the ability to point to an element which has some form of identifier. The most convenient such identifier, and that which is recommended by these Guidelines wherever possible, is provided by the global id attribute, as defined in section 3.5 Global Attributes. For elements which are located in different documents, or to which identifiers cannot be attached (perhaps because they are held on read-only media), an extension to this mechanism is provided: the TEI extended pointer mechanism, described in section 14.2 Extended Pointers.
For many of the topics discussed in this chapter, a choice of methods of encoding is offered, ranging from simple but less general ones, which use attribute values only, to more elaborate and more general ones, which use specialized elements.
The following DTD fragments show the overall organization of the additional tag set discussed in the remainder of this chapter. The file teilink2.ent begins by declaring a set of additional attributes available globally when this tag set is enabled. This is followed by declarations for the attribute classes pointer and pointerGroup, to which most of the elements discussed in this chapter belong; these attributes are all further described in the remainder of the chapter.
<!-- 14.: Global attributes for the TEI.linking tag set-->
<!--
** Copyright 2004 TEI Consortium.
** See the main DTD fragment 'tei2.dtd' or the file 'COPYING' for the
** complete copyright notice.
-->
<!--When tag set TEI.linking is used, the following attributes
may be attached to any element:-->
<!ENTITY % a.linking '
corresp IDREFS #IMPLIED
synch IDREFS #IMPLIED
sameAs IDREF #IMPLIED
copyOf IDREF #IMPLIED
next IDREF #IMPLIED
prev IDREF #IMPLIED
exclude IDREFS #IMPLIED
select IDREFS #IMPLIED'>
<!--The following attributes apply to all pointer
elements:-->
<!ENTITY % a.pointer '
type CDATA #IMPLIED
resp CDATA #IMPLIED
crdate %ISO-date; #IMPLIED
targType CDATA #IMPLIED
targOrder (Y | N | U) "U"
evaluate ( all | one | none ) #IMPLIED'>
<!--The following attributes apply to all pointer group
elements:-->
<!ENTITY % a.pointerGroup '
%a.pointer;
domains IDREFS #IMPLIED
targFunc NMTOKENS #IMPLIED'>
<!-- end of 14.-->
The element declarations for this tag set are contained in the file teilink2.dtd:
<!-- 14.: Linking, Segmentation and Alignment-->
<!--
** Copyright 2004 TEI Consortium.
** See the main DTD fragment 'tei2.dtd' or the file 'COPYING' for the
** complete copyright notice.
-->
[declarations from 14.1.3: Links inserted here ]
[declarations from 14.2.1: Extended pointers inserted here ]
[declarations from 14.3: Blocks, Segments and Anchors inserted here ]
[declarations from 14.5.2: Temporal specification inserted here ]
[declarations from 14.7: Aggregation inserted here ]
[declarations from 14.8: Alternation inserted here ]
<!-- end of 14.-->
This tag set is made available by the mechanisms described in section 3.3 Invocation of the TEI DTD; this implies that the document type subset for a document using any of the tags or attributes described in this chapter must define a parameter entity TEI.linking with the value INCLUDE. For example, a document using this additional tag set and the prose base would begin with a series of declarations like the following:
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN" "tei2.dtd" [
<!ENTITY % TEI.XML 'INCLUDE'>
<!ENTITY % TEI.prose 'INCLUDE'>
<!ENTITY % TEI.linking 'INCLUDE'>
]>
14.1 Pointers
We say that one element points to others if the first has an attribute whose value is a reference to the others: such an element is called a pointer element, or simply a pointer. Among the pointers that have been introduced up to this point in these Guidelines are <note>, <ref> and <ptr>. These elements all indicate an association between one place in the document (the location of the pointer itself) and one or more others (the elements whose identifiers are specified by the pointer's target attribute). This element set defines a variation on this basic kind of pointer, known as a link, which specifies both `ends' of an association. In addition, we define a syntax for representing locations in a document by a variety of means not dependent on the use of id attributes.
14.1.1 Pointers and Links
In section 6.6 Simple Links and Cross References we introduced the simplest pointer elements, <ptr> and <ref>. Here we introduce additionally the <link> element, which represents an association between two (or more) locations by specifying each location explicitly. Its own location is irrelevant to the intended linkage.
As members of the class pointer, these elements share a common set of attributes: type, resp, crdate, targType, targOrder, and evaluate, as declared in the a.pointer entity above. The targType and targOrder attributes may be used to constrain the scope of a link to certain element types. For example:
<link type="echo" targets="p1 p2"/>
This is a completely unconstrained link, of type echo. It assumes only that there is an element with identifier p1 and another with identifier p2 somewhere in the current document.
<link type="echo" targType="p seg note" targets="p1 p2"/>
This is a slightly more constrained link of the same type. p1 and p2 must now both identify a <p>, a <seg>, or a <note>, but there is no requirement as to which is which. (This may be useful if, as is often the case, different elements may participate in the same kind of link.)
<link type="echo" targType="p note" targOrder="Y" targets="p1 p2"/>
In this variation, not only must the link targets be either <p> or <note> elements, but the one with identifier p1 must be a <p>, and that with identifier p2 must be a <note>. Note that the present Guidelines provide no direct way of saying that p1 may identify either a <seg> or a <p> and p2 must identify a <note>. These attributes are most useful if applied to a group of links, when additional constraints may also be specified, as further discussed in section 14.1.3 Groups of Links below.
Double connection among elements could also be expressed by a combination of pointer elements, for example, two <ptr> elements, or one <ptr> element and one <note> element. All that is required is that the value of the target (or other pointing) attribute of the one be the value of the id attribute of the other. What the <link> element accomplishes is the handling of double connection by means of a single element. Thus, in the following encoding:
<ptr id="p1" target="p2"/> ... <ptr id="p2" target="p1"/>
p1 points to p2, and p2 points to p1. This is logically equivalent to the more compact encoding:
<link targets="p1 p2"/>
As noted above, all elements pointed to or linked by these elements must be identifiable using the global id attribute. This implies that they must be present in the same document, and that they must bear unique id values. Pointing or linking to external documents, and pointing or linking where identifiers are not available, are implemented by the external pointing mechanisms discussed in section 14.2 Extended Pointers, where the <xptr> and <xref> elements are discussed. External links and links involving elements without identifiers do not require a special element; they may be represented using the standard <link> element, but an intermediate <xptr> element must be provided within the current document, to bear the id attribute used in the targets attribute of the link.
14.1.2 Using Pointers and Links
As an example of the use of these mechanisms which establish
connections among elements, consider the practice (common in 18th
century English verse and elsewhere) of providing footnotes citing
parallel passages from classical authors.
<l>(Diff'rent our parties, but with equal grace</l>
<l>The Goddess smiles on Whig and Tory race,</l>
<l><note type="imitation" place="foot" anchored="no">
<bibl>Virg. Æn. 10.</bibl>
<quote>
<l>Tros Rutulusve fuat; nullo discrimine habebo.</l>
<l>—— Rex Jupiter omnibus idem.</l>
</quote>
</note>'Tis the same rope at sev'ral ends they twist,</l>
<l>To Dulness, Ridpath is as dear as Mist)</l>
This use of the <note> element can be called implicit pointing (or implicit linking). It relies on the juxtaposition of the note to the text being commented on for the connection to be understood. If it is felt that the mere juxtaposition of the note to the text does not make it sufficiently clear exactly what text segment is being commented on (for example, is it the immediately preceding line, or the immediately preceding two lines, or what?), or if it is decided to place the note at some distance from the text, then the pointing or the linking must be made explicit. We now consider various methods for doing that.
Firstly, a <ptr> element might be placed at an appropriate point within the text to link it with the annotation:
<l>(Diff'rent our parties, but with equal grace</l>
<l>The Goddess smiles on Whig and Tory race,
<ptr rend="unmarked" target="n3.284"/></l>
<l>'Tis the same rope at sev'ral ends they twist,</l>
<l>To Dulness, Ridpath is as dear as Mist)</l>
<!-- elsewhere in the document ... -->
<note id="n3.284" type="imitation" place="foot" anchored="no">
<bibl>Virg. Æn. 10.</bibl>
<quote>
<l>Tros Rutulusve fuat; nullo discrimine habebo.</l>
<l>—— Rex Jupiter omnibus idem.</l>
</quote>
</note>
The <note> element has been given an arbitrary identifier
(n3.284) to enable it to be specified as the
target of the pointer element. Because there is nothing in the text
to signal the existence of the annotation, the rend
attribute has been given the value unmarked.
Secondly, the target attribute of the <note> element can be used to point at its associated text, provided that an id attribute has been supplied for the associated text. Since, in this case, the note itself contains a pointer to the place in the text which it is annotating, this has also been encoded, using a <ref> element, which bears a target attribute of its own and contains a (slightly misquoted) extract from the text marked as a <quote> element:
<l id="l3.283">(Diff'rent our parties, but with equal grace</l>
<l id="l3.284">The Goddess smiles on Whig and Tory race,</l>
<l id="l3.285">'Tis the same rope at sev'ral ends they twist,</l>
<l id="l3.286">To Dulness, Ridpath is as dear as Mist)</l>
<!-- elsewhere... -->
<note type="imitation" place="foot" anchored="no" target="l3.284">
<ref rend="sc" target="l3.284">Verse 283–84.
<quote>
<l>——. With equal grace</l>
<l>Our Goddess smiles on Whig and Tory race.</l>
</quote>
</ref>
<bibl>Virg. Æn. 10.</bibl>
<quote>
<l>Tros Rutulusve fuat; nullo discrimine habebo.</l>
<l>—— Rex Jupiter omnibus idem. </l>
</quote>
</note>
Combining these two approaches gives us the following associations: the <ptr> element within the text points to the note; the target attribute of the <note> points to the verse line; and the <ref> element within the note also points to the verse line.
Note that we do not have any way of pointing from the line itself to the note: the association is implied by containment of the pointer. We do not as yet have a true double link between text and note.
Thirdly, therefore, we supply identifiers for both verse line and annotation, and use a <link> element to associate the two. Note that the <ptr> element and the target attribute on the <note> may now be dispensed with:
<l id="l3.283">(Diff'rent our parties, but with equal grace</l>
<l id="l3.284">The Goddess smiles on Whig and Tory race,</l>
<l id="l3.285">'Tis the same rope at sev'ral ends they twist,</l>
<l id="l3.286">To Dulness, Ridpath is as dear as Mist)</l>
<!-- elsewhere in the document ... -->
<note id="n3.284" type="imitation" place="foot" anchored="no">
<ref rend="sc" target="l3.284">Verse 283–84.
<quote>
<l>——. With equal grace</l>
<l>Our Goddess smiles on Whig and Tory race.</l>
</quote></ref>
<bibl>Virg. Æn. 10.</bibl>
<quote>
<l>Tros Rutulusve fuat; nullo discrimine habebo.</l>
<l>—— Rex Jupiter omnibus idem. </l>
</quote>
</note>
<!-- ... and yet elsewhere in the document ... -->
<link targType="note l" targOrder="Y" targets="n3.284 l3.284"/>
The targets attribute of the <link> element here bears the identifier of the note followed by that of the verse line. The targType and targOrder attributes may be used to enable application programs to check that the identifiers in fact pick out a <note> element and an <l> element, in that order. If targOrder has the value N, then the elements indicated by the targets attribute have to be either <note> or <l> elements, but are otherwise unconstrained. If neither attribute is present, then the only constraint is that the identifiers given must apply to some element within the current document.
For completeness, we could also allocate an identifier to the reference within the note and encode the association between it and the verse line in the same way:
<!-- ... -->
<note id="n3.284" type="imitation" place="foot" anchored="no">
<ref id="r3.284" rend="sc">
<!-- ... -->
</ref>
</note>
<!-- ... -->
<link targType="ref l" targOrder="Y" targets="r3.284 l3.284"/>
Indeed, the two <link>s could be combined into one, as follows:
<link targType="note ref l" targOrder="Y" targets="n3.284 r3.284 l3.284"/>
14.1.3 Groups of Links
Clearly, there are many reasons for which an encoder might wish to represent a link or association between different elements. For some of them, specific elements are provided in these Guidelines; some of these are discussed elsewhere in the present chapter. The <link> element is a general purpose element which may be used for any kind of association. The element <linkGrp> may be used to group links of a particular type together in a single part of the document; such a collection may be used to represent what is sometimes referred to in the literature of Hypertext as a web, a term introduced by the Brown University FRESS project in 1969.
The <linkGrp> element provides a convenient way of establishing a default for the type attribute on a group of links of the same type: by default, the type attribute on a <link> element has the same value as that given for type on the enclosing <linkGrp>.
Typical software might hide a web entirely from the user, but use it as a source of information about links, which are displayed independently at their referenced locations. Alternatively, software might provide a direct view of the link collection, along with added functions for manipulating the collection, as by filtering, sorting, and so on.
To continue our previous example, this text contains many other notes, of a kind similar to the one shown above. To avoid having to repeat the type="imitation" on each <note>, we may specify it once for all on a <linkGrp> element containing all links of this type. The targType and targOrder attributes can also be specified for a <linkGrp> element:
<l id="l2.79">A place there is, betwixt earth, air and seas</l>
<l id="l2.80">Where from Ambrosia, Jove retires for ease.</l>
<!-- ... -->
<l id="l2.88">Sign'd with that Ichor which from Gods distills.</l>
<!-- ... -->
<l id="l3.283">(Diff'rent our parties, but with equal grace</l>
<l id="l3.284">The Goddess smiles on Whig and Tory race,</l>
<l id="l3.285">'Tis the same rope at sev'ral ends they twist,</l>
<l id="l3.286">To Dulness, Ridpath is as dear as Mist)</l>
<!-- ... -->
<!-- elsewhere in the document ... -->
<note id="n2.79" place="foot" anchored="no">
<bibl>Ovid Met. 12.</bibl>
<quote lang="la">
<l>Orbe locus media est, inter terrasq; fretumq;</l>
<l>Cœlestesq; plagas —</l>
</quote>
</note>
<note id="n2.88" place="foot" anchored="no">
Alludes to <bibl>Homer, Iliad 5</bibl> ...
</note>
<!-- ... -->
<note id="n3.284" place="foot" anchored="no">
<bibl>Virg. Æn. 10.</bibl>
<quote>
<l>Tros Rutulusve fuat; nullo discrimine habebo.</l>
<l>—— Rex Jupiter omnibus idem.</l>
</quote>
</note>
<!-- ... -->
<!-- yet elsewhere in the document ... -->
<linkGrp type="imitation" targType="note l" targOrder="Y">
<link targets="n2.79 l2.79"/>
<link targets="n2.88 l2.88"/>
<!-- ... -->
<link targets="n3.284 l3.284"/>
<!-- ... -->
</linkGrp>
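As a small illustration of this defaulting behaviour, the sketch below assumes that an individual <link> may override the group default simply by carrying its own type attribute. The value allusion is a hypothetical type name invented for this example; it is not one defined by these Guidelines.

```xml
<linkGrp type="imitation" targType="note l" targOrder="Y">
  <!-- no type attribute: takes type="imitation" from the enclosing linkGrp -->
  <link targets="n2.79 l2.79"/>
  <!-- hypothetical override: this link alone is typed "allusion" -->
  <link type="allusion" targets="n2.88 l2.88"/>
  <!-- again defaults to type="imitation" -->
  <link targets="n3.284 l3.284"/>
</linkGrp>
```

An application reading such a group would presumably take the type from the <link> itself where one is given, and from the <linkGrp> otherwise.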
Additional information for applications that use <linkGrp> elements can be provided by means of special attributes. First, the domains attribute can be used to identify the text elements within which the individual targets of the links are to be found. Suppose that the text under discussion is organized into a <body> element, containing the text of the poem, and a <back> element containing the notes. Then the domains attribute can have as its value the identifiers of the <body> and the <back>, to enable an application to verify that the link targets are in fact contained by appropriate elements, or to limit its search space:
<body id="dunciad">
<!-- ... -->
<l id="l2.79">A place there is, betwixt earth, air and seas</l>
<l id="l2.80">Where from Ambrosia, Jove retires for ease.</l>
<!-- ... -->
<l id="l2.88">Sign'd with that Ichor which from Gods distills.</l>
<!-- ... -->
<l id="l3.283">(Diff'rent our parties, but with equal grace</l>
<l id="l3.284">The Goddess smiles on Whig and Tory race,</l>
<l id="l3.285">'Tis the same rope at sev'ral ends they twist,</l>
<l id="l3.286">To Dulness, Ridpath is as dear as Mist)</l>
<!-- ... -->
</body>
<back>
<div id="dunnotes" type="Notes">
<head>Notes to the Dunciad</head>
<!-- ... -->
<note id="n2.79" place="foot" anchored="no">
<bibl>Ovid Met. 12.</bibl>
<quote lang="la">
<l>Orbe locus media est, inter terrasq; fretumq;</l>
<l>Cœlestesq; plagas —</l>
</quote>
</note>
<note id="n2.88" place="foot" anchored="no">
Alludes to <bibl>Homer, Iliad 5</bibl> ...
</note>
<!-- ... -->
<note id="n3.284" place="foot" anchored="no">
<bibl>Virg. Æn. 10.</bibl>
<quote>
<l>Tros Rutulusve fuat; nullo discrimine habebo.</l>
<l>—— Rex Jupiter omnibus idem.</l>
</quote>
</note>
<!-- ... -->
</div>
</back>
<!-- elsewhere in the document ... -->
<linkGrp type="imitation" targType="note l" targOrder="Y" domains="dunciad dunnotes">
<link targets="n2.79 l2.79"/>
<link targets="n2.88 l2.88"/>
<!-- ... -->
<link targets="n3.284 l3.284"/>
<!-- ... -->
</linkGrp>
Note that there must be a single parent element for each `domain'; if some notes are contained by a section with identifier dunnotes, and others by a section with identifier dunimits, an intermediate pointer must be provided (as described in section 14.1.4 Intermediate Pointers) within the <linkGrp> and its identifier used instead.
Next, the targFunc attribute can be used to provide further information about the role or function of the various targets specified for each link in the group. The value of the targFunc attribute is a list of names (formally, name tokens), one for each of the targets in the link; these names can be chosen freely by the encoder, but their significance should be documented in the encoding declaration in the header. In the current example, we might think of the note as containing the source of the imitation and the verse line as containing the goal of the imitation. Accordingly, we can specify the <linkGrp> in the preceding example thus:
<linkGrp type="imitation" targType="note l" targOrder="Y"
domains="dunciad dunnotes" targFunc="source goal">
<link targets="n2.79 l2.79"/>
<link targets="n2.88 l2.88"/>
<!-- ... -->
<link targets="n3.284 l3.284"/>
<!-- ... -->
</linkGrp>
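The single-parent requirement on domains noted above might be met as in the following sketch. It assumes that the notes are split between two sections identified as dunnotes and dunimits, and that an intermediate <ptr>, given the hypothetical identifier allnotes, stands in for both within the <linkGrp>. How such a pointer is evaluated is left to the processing application.

```xml
<linkGrp type="imitation" targType="note l" targOrder="Y"
         domains="dunciad allnotes">
  <!-- intermediate pointer: a single identifier covering both note sections;
       "allnotes" is an invented identifier used only for this illustration -->
  <ptr id="allnotes" target="dunnotes dunimits"/>
  <link targets="n2.79 l2.79"/>
  <link targets="n3.284 l3.284"/>
</linkGrp>
```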
The <link> and <linkGrp> elements are formally defined as follows:
<!-- 14.1.3: Links-->
<!ELEMENT link %om.RO; EMPTY >
<!ATTLIST link
%a.global;
%a.pointer;
targets IDREFS #REQUIRED
TEIform CDATA 'link' >
<!ELEMENT linkGrp %om.RR; (link | ptr | xptr)+ >
<!ATTLIST linkGrp
%a.global;
%a.pointerGroup;
TEIform CDATA 'linkGrp' >
<!-- end of 14.1.3-->
14.1.4 Intermediate Pointers
In the preceding examples, we have shown various ways of linking an annotation and a single verse line. However, the example cited in fact requires us to encode an association between the note and a pair of verse lines (lines 283 and 284). There are a number of possible ways of correcting this error: one could use the target and targetEnd attributes of the <note> element to delimit the span to which the note applies (see further section 6.8 Notes, Annotation, and Indexing). Alternatively one could create an element to encode the couplet itself and assign it an id attribute, which can then be linked to the <note> and <ref> elements. This could be done either explicitly by means of an <lg> element, as defined in section 6.11.1 Core Tags for Verse, or a <seg> element, as defined in section 14.3 Blocks, Segments and Anchors, or implicitly, by means of the <join> element discussed in section 14.7 Aggregation. A third possibility, however, is to use an `intermediate pointer' as follows:
<l id="l3.283">(Diff'rent our parties, but with equal grace</l>
<l id="l3.284">The Goddess smiles on Whig and Tory race,</l>
<!-- ... -->
<ptr id="l3.283284" targOrder="Y" target="l3.283 l3.284"/>
When the target attribute of a <ptr> or <ref> element specifies more than one element, the indicated elements are intended to be combined or aggregated in some way to produce the object of the pointer. (Such aggregation is however the task of a processing application, and cannot be defined simply by the mark-up.) In this example, the targOrder attribute should be specified to indicate that the order in which identifier values are supplied in the target attribute is significant.
The id attribute provides an identifier which can then be linked to the <note> and <ref> elements:
<link targType="note ref ptr" evaluate="all" targets="n3.284 r3.284 l3.283284"/>
The evaluate="all" attribute value is used on the <link> element to specify that any pointer encountered as a target of that element is itself evaluated. If evaluate had the value none, the link target would be the pointer itself, rather than the objects it points to.
Where a <linkGrp> element is used to group a collection of <link> elements, any intermediate pointer elements used by those <link> elements should be included within the <linkGrp>. Intermediate pointers of this kind are particularly important when extended pointers (discussed in the next section) are in use.
14.2 Extended Pointers
Where the object of a link or pointer element is not contained within the current document, or where it does not bear an id attribute, it is not possible to point at it with a <ptr> or <ref> element, nor to link it directly with a <link> element, because no IDREF value can be supplied for the target or targets attribute of these elements. In such cases, the encoder must indicate the intended element indirectly by means of the elements discussed in this section. These elements identify their target using a special TEI-defined extended pointer notation, defined in section 14.2.2 Extended Pointer Syntax below.
This notation was originally designed for compatibility with an ISO standard called HyTime, and also informed the design of the later W3C XPath and XPointer specifications. The W3C has since adopted as a Recommendation the XML Path Language (http://www.w3.org/TR/xpath), which defines a language for addressing parts of an XML document, and as a Candidate Recommendation the XPointer language, which extends that language in a number of ways (see http://www.w3.org/TR/xptr).
A later revision of these Guidelines will review and revise the recommendations of this chapter in light of the close overlap between the facilities provided by the TEI Extended Pointer Syntax and these two W3C proposals.
The most widespread application of such external document linking is, of course, provided by the World Wide Web. The original version of these Guidelines did not provide specific guidance concerning the representation in TEI of the subset of linking facilities provided by HTML, since the Guidelines predate the widespread adoption of HTML. For the present edition, a brief note on recommended ways of providing this capability in TEI documents has been added below (14.2.4 Representation of HTML links in TEI).
14.2.1 Extended Pointer Elements
To point or refer to locations in the current or some other document without requiring that the target bear an identifier, the <xptr> and <xref> elements should be used.
Unlike the pointer elements discussed in the previous section, these elements do not specify their target by means of a target attribute. Instead these elements use one or both of the attributes from and to to delimit a portion of some document specified by the doc attribute. In all other respects, these elements correspond with the elements <ptr> and <ref> discussed in sections 6.6 Simple Links and Cross References, and 14.1 Pointers. Note that there is no element <xlink> corresponding with the <link> element; links can be made both within and between documents using the same syntax, as further discussed below.
The values of the from and to attributes on the <xptr> and <xref> elements indicate the point or passage being referred to by showing how to locate it, using one or more special keywords, as defined below in section 14.2.2 Extended Pointer Syntax. Examples are given there.
The <xptr> and <xref> elements are formally defined as follows:
<!-- 14.2.1: Extended pointers-->
<!ELEMENT xref %om.RO; %paraContent;>
<!ATTLIST xref
%a.global;
%a.xPointer;
TEIform CDATA 'xref' >
<!ELEMENT xptr %om.RO; EMPTY>
<!ATTLIST xptr
%a.global;
%a.xPointer;
TEIform CDATA 'xptr' >
<!-- end of 14.2.1-->
14.2.2 Extended Pointer Syntax
As noted above, the elements <xptr> and <xref> are used to represent a link between their own location (the `link origin') and some other location (the `destination'), which may or may not be in the same document. Software supporting intra- and inter-document links (e.g. hypertext systems) should provide access from the location of such an element to the destination. This section defines the allowable values for the attributes from, to, and doc of the <xptr> and <xref> elements.
An <xptr> or <xref> element with no attributes at all is, by definition, a link to the root element of the current document (i.e. by default, the <TEI.2> element).
The doc attribute value must be the name of an entity declared in the document type declaration. If only the doc attribute is given a value, then by definition the destination is the entire entity named by the doc value. A more specific location within another entity must be specified with the from and the to attributes, as described below.
The from and the to attributes indicate the specific location pointed at, within the entity named by the doc attribute (or within the current document, if no doc value is given). Their values are referred to below as location pointer specifications. When both attributes are specified, the span pointed at by the element runs from the starting point of the span indicated by from to the ending point of the span indicated by to. If the latter precedes the former in the document, then the pointer is in error and fails. If only the from attribute is specified, the to attribute defaults to the same value; the effect is that the element as a whole points to the span indicated by the from attribute. It is a semantic error to specify a value for to but not for from.
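The attribute combinations described above can be sketched as follows. The entity name dunciadB is hypothetical and is assumed to have been declared in the document type declaration; the ID keyword used in the location pointer specifications is described later in this section, and the identifiers are borrowed from the earlier examples purely for illustration.

```xml
<!-- doc only: the destination is the whole of the entity dunciadB -->
<xptr doc="dunciadB"/>

<!-- from only: to defaults to the same value, so the pointer selects
     the single element with identifier l3.284 in the current document -->
<xptr from="ID (l3.284)"/>

<!-- from and to: the span runs from the start of l3.283 to the end of
     l3.284 within dunciadB; the to location must not precede from -->
<xref doc="dunciadB" from="ID (l3.283)" to="ID (l3.284)">lines 283 and 284</xref>

<!-- NOT valid: specifying to without from is a semantic error
<xptr to="ID (l3.284)"/>
-->
```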
14.2.2.1 Location Ladders
Each location pointer specification consists of a sequence of location terms, each of which consists of a keyword specifying a location type followed by one or more parenthesized parameter lists, each of which specifies a location value via a list of parameters. Location types and values, and the parameters within a location value, must be separated by white space characters.
Using terms borrowed from HyTime, we say that each TEI location term in a specification provides the location source for the next, and the entire specification is equivalent to a location ladder. By specifying the entire ladder in a single attribute value, the TEI extended pointer mechanism greatly reduces the syntactic and processing complexity of hypertextual pointers. In formal terms:
ladder ::= locterm
| ladder locterm
14.2.2.2 Location Terms
The keywords used in location terms are given in the grammar below; references to ‘the tree’ mean the tree representing the document hierarchy.
locterm ::= 'ROOT' // default first location
| 'HERE' // location of the xptr
| 'ID' '(' NAME ')' // only one ID allowed.
| 'REF' '(' characters ')' // only one ref allowed
| 'CHILD' steps
| 'DESCENDANT' steps
| 'ANCESTOR' steps
| 'PREVIOUS' steps
| 'NEXT' steps
| 'PRECEDING' steps
| 'FOLLOWING' steps
| 'PATTERN' regs // mult patterns allowed
| 'TOKEN' '(' range ')'
| 'STR' '(' range ')'
| 'SPACE' '(' NAME ')' pointpair
| 'FOREIGN' parms
| 'HYQ' parms
| 'DITTO' // valid only in TO att.
Note that the keywords, though shown here quoted in uppercase, are not
case sensitive.
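To make the grammar more concrete, the following sketch shows a few location ladders as they might appear in the from attribute of an <xptr>. The ladders themselves are those used as examples later in this section; the comments give only an informal reading of how each breaks down into location terms.

```xml
<!-- two location terms: ROOT, then DESCENDANT with one step (2 div1) -->
<xptr from="ROOT DESCENDANT (2 div1)"/>

<!-- three location terms: HERE, ANCESTOR (1 p), PREVIOUS (1 p) -->
<xptr from="HERE ANCESTOR (1 p) PREVIOUS (1 p)"/>

<!-- keywords are not case sensitive, so these two are equivalent -->
<xptr from="ID (a27)"/>
<xptr from="id (a27)"/>
```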
Each location term specifies a location in the target document; this location may be a single point or, more often, a span of text (often the span of a single element) within the target document. The location ladder as a whole is interpreted from left to right, and each location term specifies a location relative to the location specified by the sequence prior to that point (i.e. to its location source). Unless here or id is specified as the first location term, the beginning location source is always root. An empty location sequence thus is the same as root and specifies the entire destination entity.
In general, the search for the location specified by a location term will be conducted only within its location source (i.e. within the location already identified by preceding location terms). There are however several exceptions. The terms root, here, and id all ignore the location source defined by any preceding terms and therefore make sense only as the first items in the ladder. The terms ancestor, next, and previous do not ignore the location source, but select a new span from the adjacent or enclosing portions of the text, and not from within the location source. Finally the location terms foreign, space, and HyQ are not defined fully here; they may or may not ignore the existing location source.
Some of the location terms make sense only in hierarchical documents; these are id, child, ancestor, descendant, previous, next, preceding, and following. The latter six involve traversing the tree representing the document hierarchy and are most easily understood when their location source is a single element. If the location source is not a single element, the tree-traversal keywords operate upon its beginning end-point, its `front end' (in English, this will be the leftmost point of the location source; in Arabic or Hebrew it will be the rightmost point). In this case child and descendant have no meaning, since character data has no descendants in the document tree; the first ancestor of such a location source is the element immediately containing the character data in question, and the siblings referred to by next and previous are the other children of that immediately containing element.
The details of each keyword are given below, along with a definition of its syntax and the semantics of its results. Examples are also provided. It is strongly recommended that when IDs are available, they should be used in preference to the other methods for pointing defined here.
For all keywords, the description assumes that the target document does in fact contain a span or element which matches the description; otherwise, the location term has no referent and is said to `fail'. If any location term fails, the entire pointer fails. No backtracking or retrying is performed (and indeed for the most part the location terms are defined as having only one matching location, so backtracking would in most cases lead to no better result).
14.2.2.3 The ROOT Keyword
The location term root selects the root element of the destination document tree; in SGML terms, this is the `document element'. Since it ignores any existing location source, the root keyword makes sense only as the first location term in the ladder. Since root is assumed as the implicit first term in any ladder, the following two location ladders have the same meaning:
ROOT DESCENDANT (2 div1)
DESCENDANT (2 div1)
14.2.2.4 The HERE Keyword
The keyword here designates the location at which the pointer element itself is situated; it allows extended pointers to select items like ‘the paragraph immediately preceding the one within which this pointer occurs’. Since it ignores any existing location source, this keyword typically makes sense only as the first location term in a location specification. To designate ‘the paragraph preceding the current one’, the following location ladder could be used:
HERE ANCESTOR (1 p) PREVIOUS (1 p)
(See below for descriptions of the keywords ancestor and previous.)
14.2.2.5 The ID Keyword
The resulting location is the element within the destination entity whose ID attribute has the value specified as the location value. The ID location type typically makes sense only as the first location term in a location specification, but there is no syntactic requirement that it be so. For example, the location specification
ID (a27)
chooses the necessarily unique element of the destination entity which has an attribute of declared type ID whose value is a27.
14.2.2.6 The REF Keyword
The resulting location is an element which can be found by interpreting the location value in accordance with document-specific rules for a canonical reference. Such reference systems, particularly common in documents of interest to classical and biblical scholars, must also be defined in the TEI header, using the <refsDecl> element (see section 5.3.5 The Reference System Declaration). If more than one element matches the canonical reference, the first one encountered is chosen. For example, the location specification
ref (MT.2.1)
chooses the first element of the destination entity which is identified by the canonical reference ‘MT.2.1’.
14.2.2.7 The CHILD Keyword
The child location type specifies an element or span of character data in the document hierarchy using a location value which functions as a domain-style address. The value is a series of parenthesized steps, separated by white space. Each such step represents one level of the hierarchy within the location source. Each step may contain one or more parameters separated by white space and interpreted in order as follows:
In formal terms, the location value of child is a series of steps:
steps ::= '(' step ')'
| steps '(' step ')'
step ::= instance
| instance element
| instance element avspecs
avspecs ::= attribute value
| avspecs attribute value
Location values of the same form are also used by the keywords descendant, ancestor, previous, and next; details of the interpretation may vary from keyword to keyword. If an instance indicator alone is specified, as a number n, it selects the nth child of the location source. If the special value ALL is given, then all the children of the location source are selected. If the instance indicator is specified with following parameters, it selects all, or the nth, among those children of the location source which satisfy the other parameters. If a negative number is given, the nth child is counted from the last child of the location source to the first. The location source must contain at least n children;116 if it does not, the child term fails. In formal terms, the first parameter of a step is an instance indicator, which in turn is either the special value ALL or a signed integer:
instance ::= 'ALL'
| signed
signed ::= NUMBER // default sign is +
| '+' NUMBER
| '-' NUMBER
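The instance-indicator rules described above can be sketched in Python. This is an illustration only, not part of the Guidelines: the function name select_instance is hypothetical, the candidates are assumed to have been collected already in document order, and treating an instance of 0 as a failure is an assumption, since the grammar assigns it no meaning.

```python
def select_instance(candidates, instance):
    """Apply an instance indicator ('ALL' or a signed integer) to a list of
    candidates given in document order. Returns a list, or None on failure."""
    if instance == "ALL":
        return list(candidates)
    n = int(instance)  # accepts '3', '+3' and '-2'
    # Assumption: 0 is treated as a failure; the grammar gives it no meaning.
    if n == 0 or abs(n) > len(candidates):
        return None  # the location term fails
    # Positive n counts from the first candidate, negative n from the last.
    return [candidates[n - 1]] if n > 0 else [candidates[n]]
```

With five candidates, an instance of -2 picks the fourth, and an instance of 6 fails because fewer than six candidates exist.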
If a second parameter is given, it is interpreted as a generic identifier, and only elements of the type indicated will be selected. For example, the location specification CHILD (3 div1) (4 div2) (29 p)chooses the 29th paragraph of the fourth sub-division of the third major division of the initial location source. The location specification CHILD (3 div1) (4 div2) (-2 p)chooses the next-to-last paragraph of the fourth <div2> of the third <div1> in the location source. Constraint by generic identifier is strongly recommended, because it makes links more perspicuous and more robust. It is perspicuous because humans typically refer to things by type: as ‘the second section’, ‘the third paragraph’, etc. It is robust because it increases the chance of detecting breakage if (due to document editing) the target originally pointed at no longer exists. The generic identifier may be specified as a literal name, as a (parenthesized) regular expression, or using the reserved values #CDATA or *. Regular expressions take the form described below; the location term CHILD (3 (div[123]))matches the third element which has a generic identifier of div1, div2, or div3. If the generic identifier is specified as *, any generic identifier is matched; this means that ‘CHILD (2 *)’ is synonymous with CHILD (2). If the second parameter is #CDATA, the location term selects only untagged sub-portions of an element having mixed content (a mixture of sub-elements and text portions). The location ladder CHILD (3 #CDATA)thus chooses the third span of character data directly contained by the current location source. If the location source is a paragraph containing
where the three sentences A, B, and C are character data enclosed by no element smaller than the paragraph itself, then CHILD (3 #CDATA) selects sentence C, while CHILD (3) selects sentence B. If specified as a name (i.e. without parentheses), the generic identifier is case sensitive if and only if the SGML declaration specifies that generic identifiers are case sensitive (in XML they are always case sensitive; in SGML by default they are not). If specified as a regular expression, the expression given is always case sensitive. In formal terms the second parameter of a step is defined thus:
element ::= NAME
| '#CDATA'
| '*'
| '(' regular ')'
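The difference between CHILD (3) and CHILD (3 #CDATA) in mixed content can be made concrete with a small Python sketch. It is a rough illustration under stated assumptions: the sample paragraph and the <hi> elements are invented for this sketch (they are not the sample from the Guidelines), the helper names child_nodes and child are hypothetical, whitespace-only text runs are ignored, and only a literal generic identifier, * or #CDATA is handled.

```python
import xml.etree.ElementTree as ET

def child_nodes(elem):
    """Children of elem in document order: sub-elements and non-blank text runs."""
    nodes = []
    if elem.text and elem.text.strip():
        nodes.append(elem.text.strip())
    for sub in elem:
        nodes.append(sub)
        if sub.tail and sub.tail.strip():
            nodes.append(sub.tail.strip())
    return nodes

def child(elem, n, gi="*"):
    """Sketch of one CHILD step: nth child, optionally constrained by type."""
    nodes = child_nodes(elem)
    if gi == "#CDATA":
        nodes = [x for x in nodes if isinstance(x, str)]
    elif gi != "*":
        nodes = [x for x in nodes if not isinstance(x, str) and x.tag == gi]
    if n == 0 or abs(n) > len(nodes):
        return None  # the step fails
    return nodes[n - 1] if n > 0 else nodes[n]

# Hypothetical paragraph: three text runs separated by two elements.
para = ET.fromstring(
    "<p>Sentence A. <hi>one</hi> Sentence B. <hi>two</hi> Sentence C.</p>"
)
third_child = child(para, 3)            # counts elements and text runs alike
third_text = child(para, 3, "#CDATA")   # counts text runs only
```

In this sample the children are, in order, text, element, text, element, text, so the unconstrained step reaches sentence B while the #CDATA step reaches sentence C.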
The third and fourth parameters, if given, are interpreted as an attribute-value pair, and only elements which match that pair in the way described below will be selected; the fifth and sixth parameters, and all following pairs of parameters, are interpreted in the same way. When more than one pair is given, all must be matched. The third, fifth, seventh, etc., parameters are interpreted, if specified, as attribute names. Like generic identifiers, attribute names may be specified as * in location ladders in the (unlikely) event that an attribute value constitutes a constraint regardless of what attribute name it is a value for. The attribute name parameter may also be specified as a parenthesized regular expression. For example, the location term CHILD (1 * target *) selects the first child of the location source for which the attribute target has a value. The location term CHILD (1 * (target(s?)) *) will select the first child of the location source for which an attribute called either target or targets has a value. As with generic identifiers, attribute names are case sensitive if and only if the SGML declaration says they are (in XML they are always case sensitive; in SGML by default they are not); regular expressions are always case sensitive, as shown here. In formal terms, the attribute-name parameter of a tree-traversal step is defined thus:
attribute ::= NAME
| '*'
| '(' regular ')'
If a fourth, sixth, eighth, etc., parameter is specified, it is interpreted as an attribute value, and only elements satisfying the other constraints and also bearing an attribute of the specified name and value will be selected. The attribute value may be specified exactly as in an SGML document: if the attribute value to be specified contains non-name characters, it must be enclosed in quotation marks. The attribute value may also be specified as a regular expression, enclosed in parentheses, or using the two special values #IMPLIED and *. For example, the location specification CHILD (1 * n 2) (1 * n 1) chooses an element using the global n attribute. Beginning at the location source, the first child (whatever kind of element it is) with an n attribute having the value 2 is chosen; then that element's first direct sub-element having the value 1 for the same attribute is chosen. CHILD (1 fs resp ((lanc|LANC)(s|S|ashire|ASHIRE))) selects the first child of the location source which is an <fs> element bearing a resp attribute with the value lancs, lancashire, LANCS, or LANCASHIRE (as well as other possible combinations which are left to the reader's ingenuity). If specified with quotation marks or as a regular expression, the attribute-value parameter is case-sensitive; otherwise not. CHILD (1 fs resp #IMPLIED) selects the first child of the location source which is an <fs> element for which the resp attribute has been left unspecified. The location ladder ROOT DESCENDANT (1 (div[01234567]?) type chapter n 2) selects the second chapter of a text, regardless of whether chapters are tagged using <div>, <div1>, <div2>, or some other text-division element. It does so by selecting the first text-division element in the document which is of type chapter and has the n value 2. In formal terms, the attribute-value parameter of a tree-traversal step is defined thus:
value ::= LITERAL // i.e. quoted string.
| NAME // As for attribute values in
| NUMBER // document, NMTOKENs need not
| NUMTOKEN // be quoted
| '#IMPLIED' // No value specified, no default
| '*' // Any value matches.
| '(' regular ')'
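A step that combines an instance number, a regular expression for the generic identifier, and attribute-value pairs can be sketched as follows. This is a minimal Python illustration, not a conforming implementation: the function name descendant and the sample document are hypothetical, the search covers all descendants in document order, attribute values are compared as exact strings (the special values #IMPLIED and *, and regular expressions for attribute names or values, are left out), and Python's re syntax is assumed to be close enough to the notation of the Guidelines for this simple pattern.

```python
import re
import xml.etree.ElementTree as ET

def descendant(source, instance, gi_pattern=".*", attrs=None):
    """Return the nth descendant of source whose tag matches gi_pattern and
    whose attributes equal every name/value pair in attrs, else None."""
    attrs = attrs or {}
    matches = [
        el for el in source.iter()          # document order, depth first
        if el is not source                 # descendants only, not the source
        and re.fullmatch(gi_pattern, el.tag)
        and all(el.get(name) == value for name, value in attrs.items())
    ]
    if instance == 0 or abs(instance) > len(matches):
        return None                         # the location term fails
    return matches[instance - 1] if instance > 0 else matches[instance]

# Hypothetical document in which chapters happen to be tagged as <div2>.
doc = ET.fromstring(
    "<text><body>"
    "<div1 type='part' n='1'>"
    "<div2 type='chapter' n='1'><p>first</p></div2>"
    "<div2 type='chapter' n='2'><p>second</p></div2>"
    "</div1></body></text>"
)
chapter = descendant(doc, 1, r"div[01234567]", {"type": "chapter", "n": "2"})
```

Here chapter is the second <div2>, found without the caller needing to know at which level chapters are tagged.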
14.2.2.8 The DESCENDANT KeywordIf the descendant keyword is used, the location term selects an element or character-data string which is a descendant of the current location source. Like child, descendant takes as a value a series of one or more parenthesized steps, which may contain the same four parameters described above. The set of elements and strings which may be selected, however, is the set of all descendants of the location source (i.e. the set of all elements contained by it), rather than only the set of immediate children. ID (a23) DESCENDANT (2 term lang de)thus selects the second <term> element with a lang of de occurring within the element with an id of a23. The search for matching elements occurs in document order; in terms of the document tree, this amounts to a depth-first left-to-right search. If the instance number is negative, the search is a depth-first right-to-left search, in which the right-most, deepest matching element is numbered -1, etc. The location specification DESCENDANT (-1 note)thus chooses the last <note> element in the document, that is, the one with the rightmost start-tag. 14.2.2.9 The ANCESTOR KeywordThe ancestor location term selects an element from among the direct ancestors of the location source in the document hierarchy. The location value is of the same form as defined for the child and descendant location types. However, the ancestor keyword selects elements from the list of containing elements or `ancestors' of the location source, counting upwards from the parent of the location source (which is ancestor number 1) to the root of the document instance (which is ancestor number -1). The location source must have at least as many ancestors as the absolute value of the instance number specified as the first parameter of the step. The ancestor type thus may not be specified as the first component of a location specification, because the initial location source in effect at that point is the root, which has no ancestors. 
For example, the location term ANCESTOR (1 * n 1) (1 div)first chooses the smallest element properly containing the location source and having attribute n with value 1; and then the smallest <div> element properly containing it. The location term ANCESTOR (1)chooses the immediate parent of the location source, regardless of its type or attributes. The location term ANCESTOR (1 * lang fr)selects the smallest ancestor for which the lang attribute has the value fr. The term ANCESTOR (-1 * lang fr)selects the largest ancestor for which the lang attribute has the value fr. Without the attribute specification, the term ANCESTOR (-1)selects the largest ancestor of the location source and is thus normally synonymous with the keyword ROOT. If the instance indicator is given as ALL, then all the ancestor elements which match the later parameters are selected; since the largest of these will necessarily include all the others, the value ALL is thus synonymous with the value (-1) when used with ANCESTOR. Finally, the term ANCESTOR (1 (div[0123456789]?))chooses the smallest <div> element of any level which contains the location source. 14.2.2.10 The PREVIOUS KeywordThe previous keyword selects an element or character-data string from among those which precede the location source within the same containing element. We speak of the elements and character-data strings contained by the same parent element as siblings; those which precede a given element or string in the document are its elder siblings; those which follow it are its younger siblings. The instance number in the location value of a previous term designates the nth elder sibling of the location source, counting from most recent to less recent. The location ladder id (a23) PREVIOUS (1)thus designates the element immediately preceding the element with an id of a23. Negative instance numbers also designate elder siblings, counting from the eldest sibling to the youngest. 
The location source must have at least as many elder siblings as the absolute value of the instance number. If the location source has at least one elder sibling, then the location term PREVIOUS (-1)designates its eldest sibling and is thus synonymous with the ladder ANCESTOR (1) CHILD (1)The value ALL may be used to select the entire range of elder siblings of an element: the location ladder ID (a23) PREVIOUS (ALL)thus designates the set of elements which precede the element with an id of a23 and are contained by the same parent. 14.2.2.11 The NEXT KeywordThe keyword next behaves like previous, but selects from the younger siblings of the location source, not the elder siblings. The location ladder ID (a23) NEXT (1)thus designates the element or string immediately following the element which has an id of a23. Negative instance numbers also designate younger siblings, counting from the youngest sibling to the location source. The location source must have at least as many younger siblings as the absolute value of the instance number. If the location source has at least one younger sibling, then the location term NEXT (-1)designates its youngest sibling and is thus synonymous with the ladder ANCESTOR (1) CHILD (-1) 14.2.2.12 The PRECEDING KeywordThe preceding keyword selects an element or character-data string from among those which precede the location source, without being limited to the same containing element. The set of elements and strings which may be selected is the set of all elements and strings in the entire document which occur or begin before the location source. (For purposes of the keywords PRECEDING and FOLLOWING, elements are interpreted as occurring where their start-tag occurs.) The PRECEDING keyword thus resembles PREVIOUS but differs in searching a larger set of strings and elements; its result is not guaranteed to be a subset of its location source. 
The instance number in the location value of a preceding term designates the nth element or character-data string preceding the location source, counting from most recent to less recent. The location ladder ID (a23) PRECEDING (5) thus designates the fifth element or string before the element with an id of a23. Negative instance numbers also designate preceding elements or strings, counting from the eldest to the youngest; the ladder ID (a23) PRECEDING (-5) thus selects the fifth element or string in the document overall, assuming that it precedes the element with an id of a23. It is thus normally synonymous with ROOT DESCENDANT (5), differing only in that it fails if four items or fewer precede element A23. There must be at least as many elements or strings preceding the location source as the absolute value of the instance number; otherwise, the preceding term fails. The value ALL may be used to select the entire portion of the document preceding the beginning of the location source: the location ladder ID (a23) PRECEDING (ALL) designates the entire portion of the document preceding the start-tag for element A23.
14.2.2.13 The FOLLOWING Keyword
The keyword following behaves like preceding, but selects from the portion of the document following the location source, not that preceding it. The location ladder ID (a23) FOLLOWING (1) thus designates the element or string immediately following the element which has an id of a23. Negative instance numbers select elements or strings counting from the end of the document to the location source. There must be at least as many elements or strings following the location source as the absolute value of the instance number.
If the location source has at least one following element or string, then the location term FOLLOWING (-1)designates the youngest of these and is thus synonymous with the ladder ROOT DESCENDANT (-1) 14.2.2.14 The PATTERN KeywordThe pattern keyword selects the first place within the location source which matches a pattern-matching expression included as the location value. If more than one location matches that expression, there is no error, but the second and later matches are ignored. Matching is defined to be case-sensitive, i.e. ‘abc’ is not the same as ‘ABC’. The pattern is expressed as a regular expression in which the following characters have special meanings, similar to those of many Unix programs (such as grep) which handle regular expressions:
For example, the location specification PATTERN (Chapter.8) chooses the first instance of the content string ‘Chapter’ which is followed by any single character and then the digit 8, within the location source. Various elements which contain that location could be selected by following the pattern location term with one or more terms of other types, such as ancestor (see above). It is recommended practice to use structure-oriented location types to specify the destination element as narrowly as possible, and then to specify a pattern only within that element context. If element boundaries are encountered within the location source, however, they are ignored and have no effect on the pattern matching operation. In formal terms, the location value of the pattern keyword is defined thus:
regs ::= '(' regular ')'
| regs '(' regular ')'
regular ::= character
| '.' // match any character
| '[^' & characters & ']' // match any char not in list
| '[' & characters & ']' // match any char in list
| '\a' // match any alphabetic
| '\d' // match any digit 0-9
| '\n' // match newline (&#RE;&#RS;)
| '\s' // match any whitespace character
| '\\' // match backslash (rev. solidus)
| '\' & nonspecial // match nonspecial character
| regular & '*' // match 0-n of 'regular'
| regular & '+' // match 1-n of 'regular'
| regular & '?' // match 0-1 of 'regular'
| '^' & regular // match at start of loc source
| regular & '$' // match at end of loc source
| regular & regular // match 1st, then 2d regular exp.
| regular & '|' & regular // match either 1st or 2d
| '(' & regular & ')' // use parentheses for grouping
characters ::= /* empty string */
| characters character
nonspecial ::= /* any character except a, d, n, or s */
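Most of this notation coincides with common regular-expression syntax, but \a does not: in Python's re module, for instance, \a matches the BEL control character rather than any alphabetic. The following sketch shows one possible translation into Python. It is a simplification under stated assumptions: the function names are hypothetical, ‘alphabetic’ is taken to mean ASCII letters only, escapes inside bracketed character lists are not treated specially, and re.ASCII is used so that \d matches only the digits 0-9.

```python
import re

def tei_pattern_to_python(pattern):
    """Translate the escapes of the pattern notation into Python re syntax."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "a":
                out.append("[A-Za-z]")       # assumption: ASCII letters only
            elif nxt in "dns\\":
                out.append("\\" + nxt)       # same meaning in Python re
            else:
                out.append(re.escape(nxt))   # backslash + nonspecial character
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

def find_pattern(text, pattern):
    """Return (start, end) of the first case-sensitive match, or None."""
    m = re.search(tei_pattern_to_python(pattern), text, flags=re.ASCII)
    return m.span() if m else None

sample = "See Chapter 8 and Chapter 18"
start, end = find_pattern(sample, "Chapter.8")
first_hit = sample[start:end]
```

Only the first match is returned, mirroring the rule that second and later matches are ignored.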
14.2.2.15 The TOKEN Keyword
The token keyword selects a sequence of one or more tokens chosen from within the character content of the location source, where tokens are counted exactly as for the corresponding HyTime tokenloc form. The location value must be either a single positive integer, or a pair of positive integers separated by white space, representing the first and the last token numbers to be included in the resulting location. If two integers are specified, the second must not be less than the first. The location source must contain at least as many tokens as are specified in the location value. This location type should not be used to count across element boundaries. It is recommended practice to use structure-oriented location types to specify the destination element as narrowly as possible, and then to specify a token location only within that element context. If element boundaries are encountered within the location source, they are ignored. This location type, like the corresponding HyTime construct, behaves intuitively only for strings containing an alternating sequence of SGML name-characters and white space; this is the type of string found, for example, in attribute values of type IDREFS, such as a21 z a13. The related XPath and XPointer specifications do not provide such a construct, and those interested in maximizing compatibility may wish to avoid it. For compatibility with the HyTime standard, all characters not included in the class of name characters by the current SGML declaration (by default this includes all punctuation other than the hyphen and full stop) are treated as white space characters. For example, the location specification ID (a27) TOKEN (3 5) chooses the 3rd, 4th, and 5th tokens from the content of the element whose identifier is a27. If this element contained the string ‘This is _not_ a very good idea’, the target selected would be ‘not_ a very’.
In formal terms the location value of the token and str keywords is defined as a range:
range ::= NUMBER
| NUMBER NUMBER
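The token-counting rule can be sketched in Python as follows. The sketch assumes the default set of name characters mentioned above (letters, digits, hyphen and full stop), with every other character acting as a separator; the function name token_span is hypothetical, and the selected span is taken to run from the start of the first token to the end of the last.

```python
import re

# Assumption: default SGML name characters are letters, digits, '.' and '-'.
NAME_RUN = re.compile(r"[A-Za-z0-9.\-]+")

def token_span(text, first, last=None):
    """Return the substring running from token number `first` to token
    number `last` (1-based, inclusive), or None if the term fails."""
    last = first if last is None else last
    tokens = list(NAME_RUN.finditer(text))
    if first < 1 or last < first or last > len(tokens):
        return None
    return text[tokens[first - 1].start():tokens[last - 1].end()]

selected = token_span("This is _not_ a very good idea", 3, 5)
```

Because the underscore is not a name character under this assumption, it separates tokens but is carried along when it falls inside the selected span, which is why the result for the sample string is ‘not_ a very’.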
14.2.2.16 The STR Keyword
The str keyword identifies a sequence of one or more characters chosen from within the character content of the location source, where characters are counted exactly as for the HyTime dataloc form with quantum=str, which has a corresponding meaning and usage. The location value must be either a single positive integer, or a pair of positive integers separated by white space, indicating the first and the last characters to be included in the resulting location. If two integers are specified, the second must not be less than the first. The location source must have at least as many characters as are specified in the larger of the integers. This location type should not be used to count across element boundaries. The recommended practice is to use structure-oriented location types to specify the destination element, and then to specify a character location only within that element context. If element boundaries are encountered within the location source, however, they have no effect. Character offsets in a document must be counted not from the original source file, but from the output of the SGML parser (the element structure information set or ESIS, or the XML Document Object Model). This is because a parser may delete or expand certain characters transparently. For example, the location specification ID (a27) STR (3 5) chooses the 3rd, 4th, and 5th characters of the content of the element having identifier a27. If this element contained the string ‘This turned out to be an even worse idea’, the result would be the string ‘is ’ (i, s and a space). In multi-byte character sets it is characters which are counted, not bytes. However, in the case of diacritics coded by sequences of bit combinations rather than having separate code points for every combination of letter and diacritic, the diacritics are counted.
This means that the following location ladder may retrieve different strings, depending on the system character set in use and on the entity declarations in effect:
PATTERN (Wagner's\sGötterd&auml;mmerung) STR (10 24)
In some character sets, where ö and ä are encoded as single characters, it will select the string ‘Götterdämmerung’; in others, where they are encoded with distinct characters for umlaut, a, and o, it will select the string ‘Götterdämmeru’, truncating the last two letters. If a system-dependent definition is used (containing e.g. a printer escape sequence), the results are even less predictable. For this reason, the str keyword must be used with caution and should be avoided where possible.
14.2.2.17 The SPACE Keyword
The space location term applies to entities which represent graphical or spatio-temporal data; typically such entities are not encoded in SGML or XML, but in one of many specialized graphical formats. The NOTATION declaration and related constructs provide for specifying what format such an entity uses. The location value for space consists of two or three parenthesized parameter lists. The first contains the name of the co-ordinate space in use. The second and third each consist of any number of signed integers. The numbers in a parameter list represent locations along each dimension of a Cartesian co-ordinate space with all axes orthogonal; the length of the list equals the number of dimensions/axes of the space (usually, but not inevitably, 2, 3, or 4). If the third parameter list is not specified, the location is the single point in the co-ordinate space specified by the second parameter list. If all three parameter lists are specified, the location is the rectangular prism defined by treating corresponding items of the second and third lists as inclusive bounds along each dimension in turn. The mapping from co-ordinates to physical or display space, and the meaning and ordering of the axes, are not defined by these Guidelines.
They should be specified in the TEI header unless they can be determined by definition from the format in which the referenced entity is known to be encoded (for example, many graphics formats can only encode locations in units of pixels, counted in a 3 dimensional left-handed co-ordinate space). Time may be construed as an axis in addition to any others; when it is, it is TEI recommended practice that it be positioned last. The units used must be defined in the TEI header; it is acceptable in certain media (such as videodiscs) to use frame numbers as a surrogate axis for time. SPACE (D2) (0 0) (1 1) specifies the location of the unit square tangent to the origin in quadrant 1 of a common graph. The location value for a space location term is a NAME enclosed in parentheses, followed by a point pair:
pointpair ::= '(' numbers ')'
| '(' numbers ')' '(' numbers ')'
numbers ::= signed
| numbers signed
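How a space location value might be read and tested against a point can be sketched in Python. This is an assumption-laden illustration: the function names parse_space and contains are hypothetical, the input is the location value without the SPACE keyword itself, and a single parameter list is represented as a degenerate prism whose two corners coincide.

```python
import re

def parse_space(value):
    """Parse '(NAME) (n ...)' or '(NAME) (n ...) (n ...)' into
    (name, first_corner, second_corner)."""
    groups = re.findall(r"\(([^()]*)\)", value)
    if len(groups) not in (2, 3):
        raise ValueError("expected a name and one or two parameter lists")
    name = groups[0].strip()
    first = tuple(int(tok) for tok in groups[1].split())
    second = tuple(int(tok) for tok in groups[2].split()) if len(groups) == 3 else first
    if len(first) != len(second):
        raise ValueError("both lists need one number per axis")
    return name, first, second

def contains(space, point):
    """True if point lies within the inclusive bounds on every axis."""
    _, first, second = space
    return len(point) == len(first) and all(
        min(a, b) <= x <= max(a, b) for a, b, x in zip(first, second, point)
    )

unit_square = parse_space("(D2) (0 0) (1 1)")
```

What the axes and units mean is deliberately left open here, since that information belongs in the TEI header or in the definition of the referenced format.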
14.2.2.18 The FOREIGN Keyword
The foreign keyword takes any number of parenthesized parameter lists, and is terminated by the end of the attribute value, or by the next non-parenthesized token, whichever comes first. The meaning of the foreign location term is not defined by these Guidelines. It is intended for use in pointing to special kinds of non-hierarchical, non-coordinate space data. That is, it should be used for making links to data which cannot be specified using the other mechanisms. The meaning of any foreign location types must be specified in the TEI header, as a series of paragraphs at the end of the <encodingDesc> element defined in section 5.3 The Encoding Description. If more than one such type is used, it is TEI recommended practice that the first parameter list to foreign be a name associated with the particular type by documentation in the TEI header. For example, assume that some program uses a proprietary data format called XFORM, and that the program has supplied an identifier 06286208998 for some piece of data it owns. Then the location specification FOREIGN (XFORM) (06286208998) would be one way of expressing a link to that piece of data.
14.2.2.19 The HYQ Keyword
The HyQ keyword takes a single parenthesized parameter list, which contains an expression in the HyQ query language defined by the HyTime standard. See documentation on HyTime and HyQ for definitions of HyQ expressions.
14.2.2.20 The DITTO Keyword
The ditto keyword is valid only as the first location term in a ladder, and only within the to attribute of an extended pointer element. It designates the location result of the from attribute on the same element. Thus in the pointer
<xptr from="ID (a23) ANCESTOR (1 (div[0123])) PATTERN (Wagnerian)"
to="DITTO PATTERN (Liebestod)"/>
the from attribute designates the first occurrence of the
string ‘Wagnerian’ in the <div> containing the
element with an id of a23. The
to attribute designates the first occurrence of the string
‘Liebestod’ which occurs after
‘Wagnerian’, within the same <div>. Without the
ditto keyword, it would be necessary to repeat
the entire location ladder of the from attribute in the
to attribute, which would be error-prone for complex
expressions.
14.2.3 Using Extended Pointers
As noted above, when only the from attribute is specified, the <xref> or <xptr> element points at the span indicated by from. When both from and to are specified, the element points at the span running from the beginning of the span indicated by the former to the end of the span indicated by the latter. To point at the second, third, and fourth paragraphs of the second chapter (<div1>) in the body of the current document, therefore, one may specify either of the following:
<xptr from="DESCENDANT (1 body) CHILD (2 div1) (2 p)"
to="DESCENDANT (1 body) CHILD (2 div1) (4 p)" />
<!-- or equivalently: -->
<xptr from="DESCENDANT (1 body) CHILD (2 div1) (2 p)"
to="DITTO NEXT (2 p)" />
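The equivalence of the two forms above can be checked with a small Python sketch. It rests on several assumptions: the sample document is invented, the helper names nth_child, next_sibling and span are hypothetical, both end-points are children of the same parent, and DITTO is modelled simply by reusing the element that the from ladder selected.

```python
import xml.etree.ElementTree as ET

def nth_child(parent, n, gi):
    """CHILD (n gi): the nth child element of the given type, or None."""
    same = [el for el in parent if el.tag == gi]
    return same[n - 1] if 0 < n <= len(same) else None

def next_sibling(parent, source, n, gi):
    """NEXT (n gi): the nth younger sibling of source of the given type."""
    siblings = list(parent)
    younger = [el for el in siblings[siblings.index(source) + 1:] if el.tag == gi]
    return younger[n - 1] if 0 < n <= len(younger) else None

def span(parent, start, end):
    """Everything from the beginning of start to the end of end."""
    siblings = list(parent)
    return siblings[siblings.index(start):siblings.index(end) + 1]

# Hypothetical document with two chapters.
doc = ET.fromstring(
    "<text><body>"
    "<div1><p>1a</p></div1>"
    "<div1><p>2a</p><p>2b</p><p>2c</p><p>2d</p><p>2e</p></div1>"
    "</body></text>"
)
body = next(doc.iter("body"))              # DESCENDANT (1 body)
chapter = nth_child(body, 2, "div1")       # CHILD (2 div1)
start = nth_child(chapter, 2, "p")         # ... (2 p)
end_explicit = nth_child(chapter, 4, "p")  # to: ... (4 p)
end_ditto = next_sibling(chapter, start, 2, "p")  # to: DITTO NEXT (2 p)
selected = [p.text for p in span(chapter, start, end_ditto)]
```

Both ways of writing the to attribute arrive at the same fourth paragraph, so the selected span covers the second, third and fourth paragraphs of the second chapter.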
To point to ‘the <term> occurring in the current <termEntry> with attribute n = 2’, only the from attribute would be required:
<xptr from="HERE ANCESTOR (1 termEntry) DESCENDANT (1 term n 2)"/>
The following example demonstrates how elements from two different documents may be combined:
<xptr id="x1" doc="doc1" from="ID (d1.1)"/>
<xptr id="x2" doc="doc2" from="ID (d2.1) CHILD (2 *)"/>
<ptr id="p1" target="x1 x2"/>
<link evaluate="all" targets="p1 s1 s2"/>
The first <xptr> indicates the element in doc1 which has identifier d1.1. The second indicates the second subelement of the element in doc2 which has identifier d2.1. These two elements are pointed to as a single item by the <ptr> element and given the identifier p1. This aggregation, finally, is linked with two other elements both in the current document, with identifiers s1 and s2. An extended pointer, as described above, may specify as its target only a single destination. Where the intended destination of a link is an aggregation or alignment of destinations, possibly in separate documents, an intermediate pointer of some kind must be used, as described in section 14.1.4 Intermediate Pointers elsewhere in this chapter. Like any other element, an <xref> or <xptr> may be given a unique id within the document that contains it. This id value can then be supplied as one of the target values for an intermediate <ptr> or <link> element, to represent aggregation or linkage respectively. The <join> element discussed in section 14.7 Aggregation may also be used. For example, a modern commentary on an older text must frequently refer to that text, which might well be encoded in a separate document. Some discussions will refer to a set of discrete passages in the older text, and will thus require multi-headed pointers.
In such a case, the document type declaration must contain a declaration for an entity containing the older text, which might look something like this:
<!-- the 1729 Dunciad Variorum -->
<!ENTITY dunciad SYSTEM 'dunc1729.tei' NDATA TEI-XML >
In the commentary itself, reference will be made to this external document, using <xptr> and <xref> elements. When the commentary refers to aggregates of discontiguous passages, <xptr> elements are used to point to the individual passages, and a <ref> element may refer to these passages as a group by pointing to the <xptr>s:
<xptr id="xl2.5" doc="dunciad" from="ID (L2.5)"/>
<xptr id="xn1.48" doc="dunciad" from="ID (N1.48)"/>
<xptr id="xn1.68" doc="dunciad" from="ID (N1.68)"/>
<xptr id="xn1.104" doc="dunciad" from="ID (N1.104)"/>
<xptr id="xn1.106" doc="dunciad" from="ID (N1.106)"/>
...
<p>In <ref evaluate="all" target="xl2.5 xn1.48 xn1.68 xn1.104 xn1.106">the references to Theobald</ref>, Pope's satire characteristically ...</p>
If the same discontiguous target is to be referred to repeatedly, it may be convenient to give it a single identifier, thus:
<xptr id="xl2.5" doc="dunciad" from="ID (L2.5)"/>
<xptr id="xn1.48" doc="dunciad" from="ID (N1.48)"/>
<xptr id="xn1.68" doc="dunciad" from="ID (N1.68)"/>
<xptr id="xn1.104" doc="dunciad" from="ID (N1.104)"/>
<xptr id="xn1.106" doc="dunciad" from="ID (N1.106)"/>
<ptr id="theobald" target="xl2.5 xn1.48 xn1.68 xn1.104 xn1.106"/>
...
<p>In <ref evaluate="all" target="theobald">the references to Theobald</ref>, Pope's satire characteristically ...</p>
A hypertext web might associate passages of the text and notes with the individuals mentioned, the ancient authors imitated, or thematic content, thus:
<xptr id='xl2.5' doc='dunciad' from='ID (L2.5)' />
<xptr id='xn1.48' doc='dunciad' from='ID (N1.48)' />
<xptr id='xn1.68' doc='dunciad' from='ID (N1.68)' />
<xptr id='xn1.104' doc='dunciad' from='ID (N1.104)'/>
<xptr id='xn1.106' doc='dunciad' from='ID (N1.106)'/>
<ptr id='theorefs' target='xl2.5 xn1.48 xn1.68 xn1.104 xn1.106'/>
<xptr id='xn2.3' doc='dunciad' from='ID (N2.3)' />
<xptr id='xn2.46' doc='dunciad' from='ID (N2.46)' />
<!-- ... -->
<xptr id='xn2.54' doc='dunciad' from='ID (N2.54)' />
<xptr id='xn2.66' doc='dunciad' from='ID (N2.66)' />
<xptr id='xn2.78' doc='dunciad' from='ID (N2.78)' />
<ptr id='curlrefs' target='xn2.3 xn2.46 xn2.54 xn2.66 xn2.78 xn1.104'/>
<!-- ... -->
<div><head>Individuals Named in the Text</head>
<list type='gloss'>
<label id='curlname'>Curll, Edmund</label>
<item id='curldesc'>A bookseller and publisher ... </item>
<!-- ... -->
<label id='theoname'>Theobald, Lewis.</label>
<item id='theodesc'>Attorney, active also as editor and reviewer ... </item>
<!-- ... -->
</list>
</div>
<div><head>Ancient Authors Imitated in the Text</head>
<list type='bulleted'>
<item id='virgil'>Virgil</item>
<item id='homer' >Homer</item>
<item id='ovid' >Ovid</item>
<!-- ... -->
</list>
</div>
<div type="links">
<ab type="pointer-set">
<ptr id='curll' target='curlname curldesc'/>
<ptr id='theobald' target='theoname theodesc'/>
<!-- ... -->
</ab>
<linkGrp type='bio'>
<link targets='theobald theorefs'/>
<link targets='curll curlrefs'/>
<!-- ... -->
</linkGrp>
<linkGrp type='imitations'>
<link targets='virgil virgrefs' />
<link targets='homer homerefs' />
<!-- ... -->
</linkGrp>
</div>
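The way a processor might follow intermediate pointers in a web like the one above can be sketched in Python. This is only an illustration under stated assumptions, not part of the TEI scheme: the function names (index_by_id, expand, resolve_links) are hypothetical, the sample is a cut-down fragment modelled on the example rather than the full text, every <ptr> is expanded fully (as if evaluate="all" applied throughout), and each <xptr> is treated as an end point without being resolved into the external document. The sketch also reports target values for which no element with that id exists.

```python
import xml.etree.ElementTree as ET

# Cut-down fragment modelled on the example above (assumed sample data).
# The second <link> deliberately names an id that does not exist.
SAMPLE = """
<div>
  <xptr id="xl2.5" doc="dunciad" from="ID (L2.5)"/>
  <xptr id="xn1.48" doc="dunciad" from="ID (N1.48)"/>
  <ptr id="theorefs" target="xl2.5 xn1.48"/>
  <label id="theoname">Theobald, Lewis.</label>
  <item id="theodesc">Attorney ...</item>
  <ptr id="theobald" target="theoname theodesc"/>
  <linkGrp type="bio">
    <link targets="theobald theorefs"/>
    <link targets="nosuchid theorefs"/>
  </linkGrp>
</div>
"""


def index_by_id(root):
    """Map each id value to the element that carries it."""
    return {el.get("id"): el for el in root.iter() if el.get("id")}


def expand(ident, by_id, seen=None):
    """Follow <ptr> elements from ident; return (end_point_ids, missing_ids).

    Each identifier is visited at most once per expansion, which also
    guards against pointers that refer back to themselves.
    """
    seen = set() if seen is None else seen
    if ident in seen:
        return [], []
    seen.add(ident)
    el = by_id.get(ident)
    if el is None:
        return [], [ident]          # dangling reference
    if el.tag != "ptr":
        return [ident], []          # an end point (xptr, label, item, ...)
    ends, missing = [], []
    for target in el.get("target", "").split():
        sub_ends, sub_missing = expand(target, by_id, seen)
        ends.extend(sub_ends)
        missing.extend(sub_missing)
    return ends, missing


def resolve_links(root):
    """For each <link>, expand every value of its targets attribute."""
    by_id = index_by_id(root)
    resolved = []
    for link in root.iter("link"):
        link_ends, link_missing = [], []
        for target in link.get("targets", "").split():
            ends, missing = expand(target, by_id)
            link_ends.append(ends)
            link_missing.extend(missing)
        resolved.append((link_ends, link_missing))
    return resolved


root = ET.fromstring(SAMPLE)
results = resolve_links(root)
for ends, missing in results:
    print("ends:", ends, "unresolved:", missing)
```

A check of this kind is a cheap way to catch a mistyped identifier in a targets value before the links are used.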
14.2.4 Representation of HTML links in TEI
As we have indicated, linking to another document (in any format, including HTML) should be done by means of the <xref> or <xptr> element, the former being used if some text is to be supplied to identify the title of the intended link, the latter if it is not. In either case, it is the responsibility of the processor to determine what the target URL for the link should be. In canonical TEI, this target must be supplied as a predefined external entity, the name of which is supplied as the value of the doc attribute on the pointer element concerned:
<p>This is discussed in <xref doc="TEIP3">The TEI Guidelines</xref>.</p>
or, equivalently,
<p>This is discussed in <xptr doc="TEIP3"/>.</p>
In either case, the DTD must also include a declaration for the external entity TEIP3, which a processor can use to determine the intended URL, such as the following:
<!ENTITY TEIP3 SYSTEM "http://www.tei-c.org/TEI/Guidelines/" NDATA HTML>
The target of a link of this kind must always be a complete document. If it is desired to link to some element within the target HTML document, the from attribute may be used to specify its identifier. For example, to point to a subsection within one of the files making up the HTML version of the TEI Guidelines, one would first define an entity corresponding with the appropriate file:
<!ENTITY TEIP3SA SYSTEM "http://www.tei-c.org/TEI/Guidelines/SA.html" NDATA HTML>
and then use an extended pointer to indicate a point within that entity:
<p>This is discussed in <xref doc="TEIP3SA" from="id(SAXR)">the chapter on linking</xref>.</p>
This is equivalent to the following HTML link:
<p>This is discussed in <a href="http://www.tei-c.org/TEI/Guidelines/SA.html#SAXR">the subsection on external linking</a>.</p>
In this example, we use the XML identifier as a convenient way of indicating the element which forms the target of the link, since both HTML and XML support this concept.
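The correspondence just described, between a doc and from pair and an HTML href, can be sketched in Python. This is a minimal illustration under stated assumptions: the names parse_entities and href_for are hypothetical, entity declarations are read with a simple regular expression rather than by a real XML parser or catalogue, and only the simplest form of the from attribute, id(NAME), is handled.

```python
import re

# Entity declarations as given in the text above.
DECLS = '''
<!ENTITY TEIP3 SYSTEM "http://www.tei-c.org/TEI/Guidelines/" NDATA HTML>
<!ENTITY TEIP3SA SYSTEM "http://www.tei-c.org/TEI/Guidelines/SA.html" NDATA HTML>
'''

# Matches: <!ENTITY name SYSTEM "system-id" NDATA notation>
ENTITY_RE = re.compile(
    r'<!ENTITY\s+(\S+)\s+SYSTEM\s+(["\'])(.*?)\2\s+NDATA\s+(\S+?)\s*>'
)
# Matches only the simplest location expression: id(NAME)
ID_RE = re.compile(r'^\s*id\s*\(\s*([^\s()]+)\s*\)\s*$', re.IGNORECASE)


def parse_entities(text):
    """Return a mapping from entity name to its system identifier."""
    return {m.group(1): m.group(3) for m in ENTITY_RE.finditer(text)}


def href_for(entities, doc, from_expr=None):
    """Build the URL implied by a doc attribute and an optional from attribute."""
    base = entities[doc]            # KeyError if the entity was never declared
    if from_expr is None:
        return base                 # the link targets the whole document
    match = ID_RE.match(from_expr)
    if match is None:
        raise ValueError("only id(NAME) is handled in this sketch: %r" % from_expr)
    return base + "#" + match.group(1)


entities = parse_entities(DECLS)
print(href_for(entities, "TEIP3SA", "id(SAXR)"))
print(href_for(entities, "TEIP3"))
```

Location expressions other than id(NAME) have no direct equivalent as a URL fragment, which is why the sketch rejects them rather than guessing.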
In the case of an HTML document, the target identifier (SAXR) must be supplied as the value for the name attribute on some <a> element in the document; in an XML document, of course, the target element may be of any type. Note that it is illegal to supply a URL like that in the HTML example above as the value for an external entity, since its target is only a part of a document. External entities must always be complete documents.
The requirement to predefine all target URLs as external entities has some obvious advantages, from the point of view of simplifying the maintenance of a suite of reliable links. It may be easier to maintain a single document containing declarations for all external links than to search through a large suite of documents checking that each link is still valid. However, it may also be regarded as an unnecessary additional chore. As with other parts of the TEI scheme, this method also assumes that external entity declarations can easily be declared and embedded in a DTD subset, a mechanism which may not be appropriate in all XML processing environments. For these reasons, TEI encoders may wish to declare an additional attribute url for the elements <xptr> and <xref>.
Since in XML it is permissible to add attributes to an existing element by means of an additional ATTLIST declaration, all that is needed is to provide a DOCTYPE declaration like the following:
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN" "tei2.dtd" [
<!ENTITY % TEI.XML "INCLUDE">
<!ENTITY % TEI.prose "INCLUDE">
<!ENTITY % TEI.linking "INCLUDE">
<!ATTLIST xptr url CDATA #IMPLIED >
<!ATTLIST xref url CDATA #IMPLIED >
]>
A document with these additional declarations can then simply specify the intended target of a cross-reference using the new url attribute without further formality:
<p>This is discussed in <xref url="http://www.tei-c.org/TEI/Guidelines/SA.html">the chapter on linking</xref>.</p>
or, equivalently,
<p>This is discussed in <xptr url="http://www.tei-c.org/TEI/Guidelines/SA.html"/>.</p>
This modification may also, of course, be effected using the standard TEI DTD modification mechanisms discussed in 29 Modifying and Customizing the TEI DTD; this would be preferable if, for example, other modifications are also being made to the TEI DTD. In such a case declarations for the new attributes concerned would be supplied within the TEI extensions entity file.
The same approach may be used to embed figures or graphics in an XML document: the <figure> element discussed in section 22 Tables, Formulae, and Graphics may also be given a url attribute for use in place of its existing entity attribute.
This extension is not currently a formal recommendation of the TEI Guidelines. Its use is not recommended in documents intended for interchange.
It is often convenient to specify the URL from which a document is canonically available within the document itself. This should be done within the <publicationStmt> of the document's TEI Header (5.2.4 Publication, Distribution, etc.)
as in the following example:
<publicationStmt>
<distributor>Made available by the TEI Consortium at http://www.tei-c.org/Guidelines</distributor>
</publicationStmt>
or, equivalently, either of the following:
<!-- assuming availability of URL attribute -->
<publicationStmt>
<distributor>Made available by the TEI Consortium at <xptr url="http://www.tei-c.org/Guidelines"/></distributor>
</publicationStmt>
<!-- assuming pre-declaration of TEIP3 external entity -->
<publicationStmt>
<distributor>Made available by the TEI Consortium at <xptr doc="TEIP3"/></distributor>
</publicationStmt>
14.3 Blocks, Segments and Anchors
In this section, we define three general-purpose elements which may be used to mark and categorize both a span of text and a point within one. These elements have several uses, most notably to provide elements which can be given identifiers for use when aligning or linking to parts of a document, as discussed elsewhere in this chapter. They also provide a convenient way of extending the semantics of the TEI markup scheme in a theory-neutral manner, by providing for two neutral or ‘anonymous’ elements to which the encoder can add any meaning not supplied by other TEI-defined elements.