Construction of an XML Version of the TEI DTD


C. M. Sperberg-McQueen

7 July 1999

This unpublished document is distributed privately for comment by friends and colleagues; it is not now a formal publication and should not be quoted in published material.

This document has not yet been reviewed by both editors of the TEI; what it says about the beliefs of the editors should be taken as a proposal by the author for the approval of his co-editor.

Abstract

This document describes issues involved in creating an XML version of the SGML document type definition (DTD) created by the Text Encoding Initiative, and proposes solutions. It defines a TEI extensions file which incorporates those solutions, in order to allow experimentation.

The discussion of inclusion exceptions defines a method of rewriting SGML content models so as to achieve effects similar to those provided by inclusion exceptions. To make an SGML document type definition compatible with XML, inclusion exceptions must be eliminated. The simplest method of ensuring that this change does not invalidate existing documents is to modify the content model of every element which can occur as a descendant of any element with inclusion exceptions in its content model, in the manner described here. That will ensure that elements named in inclusion exceptions remain legal in all the locations where they are currently legal.

The methods of changing content models described in this paper are believed to preserve determinism (what ISO 8879 calls lack of ambiguity) and to simulate the effects of inclusion exceptions properly. At this point, however, no proof of either conjecture is offered.


1 Introduction

1.1 XML and DTDs

The Extensible Markup Language (XML) defines a syntax for document type definitions similar to that provided by the Standard Generalized Markup Language (SGML), but more restrictive. In particular, XML allows neither inclusion nor exclusion exceptions, and prohibits the ampersand connector.

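Modifying an existing SGML document type definition (DTD), such as the TEI DTD, to conform to XML thus involves, among other things, removing tag omissibility information, normalizing parameter-entity references, eliminating ampersand connectors, normalizing mixed-content models, and eliminating inclusion and exclusion exceptions; these tasks are taken up in turn in the sections that follow.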

1.2 Modifying the TEI DTD for XML

This document describes in detail the changes necessary to perform these modifications on the TEI DTD. The changes take the form of TEI modifications files suitable for use as the entities TEI.extensions.ent and TEI.extensions.dtd.

The modifications have different degrees of difficulty. Some affect the technical content of the TEI DTD in serious ways, and therefore require review by the TEI's Technical Review Committee before being formally integrated into TEI P3, while others do not affect the technical content of the TEI at all, or affect it only in minor ways. Changes of this latter type may be regarded as corrections of obvious simple errors, and may be performed by the editors under their authority to correct corrigible errors in the text of the Guidelines. (The concept of corrigible error is defined in document TEI ED W46 (?); in brief, a corrigible error is one which both editors agree is an error, which has an obvious fix, and the fix for which will not affect any existing data.) Each change proposed in this paper is identified as either a correction to a corrigible error, which the editors expect to fix in the course of preparing a revised and corrected reprint of TEI P3, or else a substantive change requiring review by the Technical Review Committee.

1.3 Overview of changes to the TEI DTD

Not all of the changes to the DTD are handled by this document. [1] Those that are, are summarized in the following overviews of the extensions files.

< 1 teixml.ent >(teixml.ent) =

<!--* teixml.ent:  XML version of TEI (1999-07-07)           *-->
<!--* This is the TEI.extensions.ent file of an experimental
    * version of the TEI P3 DTD, adapted to be XML conformant.
    * N.B. using this extensions file with the standard TEI DTD
    * will not make the DTD completely XML compliant.  Some
    * post-processing is needed.  Use the pizza chef at
    * http://www.uic.edu/orgs/tei/pizza.html or
    * http://firth.natcorp.ox.ac.uk/TEI/nupizza.html
    *
    * This version:  1999-07-07b
    *
    * Send comments to tei-l@listserv.uic.edu or to 
    * teitech@listserv.uic.edu
    * Thank you for beta testing! 
    *-->
< Provide default tagset declarations 152 >
< Define TEI keywords 153 >

< Fix placePart class 154 >
< Reproduce class declarations for phrases 22 >
< Reproduce inclusion classes 42 >
< Reproduce classes used by specPara 51 >
< Embed tag-set-specific ent files 151 >
< Element class m.Incl 41 >
< New specialPara 50 >
< New declaration for phrase and phrase.seq 45 >
< New declaration for paraContent 49 >
< New declaration for component and component.seq 47 >

< Suppress definitions of elements with ampersand 3 >
< Suppress element declarations with exclusions 40 >
< Suppress some mixed content elements 11 >
< Suppress users of phrase.seq 24 >
< Suppress standard definitions of PCDATA elements 43 >

< Suppress definitions in core tag set 54 >
< Suppress definitions in text-structure tag set 67 >
< Suppress definitions in front-matter tag set 82 >
< Suppress definitions in header tag set 86 >
< Suppress definitions in verse tag set 98 >
< Suppress definitions in drama tag set 104 >
< Suppress definitions in spoken-text tag set 110 >
< Suppress definitions in terminology tag set 112 >
< Suppress definitions in segmentation and alignment tag set 117 >
< Suppress definitions in analysis tag set 122 >
< Suppress definitions in feature-structures tag set 128 >
< Suppress definitions in text-criticism tag set 136 >
< Suppress definitions in graphs tag set 140 >
< Suppress definitions in tables tag set 146 >

< 2 teixml.dtd >(teixml.dtd) =

<!--* teixml.dtd:  XML version of TEI (1999-07-07)           *-->
<!--* This is the TEI.extensions.dtd file of an experimental
    * version of the TEI P3 DTD, adapted to be XML conformant.
    * N.B. using this extensions file with the standard TEI DTD
    * will not make the DTD completely XML compliant.  Some
    * post-processing is needed.  Use the pizza chef at
    * http://www.uic.edu/orgs/tei/pizza.html or
    * http://firth.natcorp.ox.ac.uk/TEI/nupizza.html
    *
    * This version:  1999-07-07b
    *
    * Send comments to tei-l@listserv.uic.edu or to 
    * teitech@listserv.uic.edu
    * Thank you for beta testing! 
    *-->
< New definitions of elements with ampersand 4 >
< Redeclare elements with mixed content elements 12 >
< New declarations for users of phrase.seq 25 >
< New declarations for exclusion exceptions 37 >
< New definitions for PCDATA elements 44 >
<!--* handle specialPara *-->
< New definition of set element 53 >

< New definitions for core tag set 55 >
< New definitions for text-structure tag set 68 >
< New definitions for front-matter tag set 83 >
< New definitions for header tag set 87 >
< New definitions for verse tag set 99 >
< New definitions for drama tag set 105 >
< New definitions for spoken-text tag set 111 >
< New definitions for terminology tag set 113 >
< New definitions for flat terminology tag set 116 >
< New definitions for segmentation and alignment tag set 118 >
< New definitions for analysis tag set 123 >
< New definitions for feature-structures tag set 129 >
< New definitions for text-criticism tag set 137 >
< New definitions for graphs tag set 141 >
< New definitions for tables tag set 147 >

1.4 Intended use of this document

The immediate goal of this document is to allow experimentation with the TEI DTD and XML processors, by providing the extensions files needed to make the full TEI P3 DTD work with XML processors. To use the extensions files created by this document with other extensions files (e.g. those of TEI Lite), manual merger of the extensions files is required. The editors plan to automate this merger as soon as possible; the following stages of development are anticipated:

A list of open questions is included at the end of the document.

2 Tag omissibility information

Removing tag omissibility information is a trivial task which can be accomplished by a DTD pretty printer, or even a simple editor script. The strings - -, - O, O -, and O O are legal in a DTD only as tag omissibility information, within comments, or within literals. In the TEI DTDs, they do not occur within literals or comments, so a global change in an editor would handle the problem.

To enable the necessary changes to be made with a minimum of manual intervention, however, it is probably better to add a run-time option to a DTD pretty printer, to make it suppress this information, or replace it with a reference to one of the parameter entities om.RR, om.RO, om.OR, or om.OO. If the run-time flag is set, the following entities will be added to the beginning of the DTD:

<!ENTITY % om.RR '- -'>
<!ENTITY % om.RO '- O'>
<!ENTITY % om.OR 'O -'>
<!ENTITY % om.OO 'O O'>
The program carthago has accordingly been outfitted with two run-time options: one to suppress the omissibility markers, the other to replace them with entity references.
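
By way of illustration, the two behaviours can be sketched in a few lines of Python. This is not the carthago implementation, only a sketch under stated assumptions: the function names are invented here, every element declaration is assumed to name a single element (or a single parameter-entity reference) rather than a name group, and no declaration-like text is assumed to occur inside comments or literals.

```python
import re

# Matches the tag-omissibility field that follows the element name in an
# SGML element declaration: two tokens, each "-" or "O".
# Assumption: declarations have the shape "<!ELEMENT name  - O  model >".
_OMIT = re.compile(r"(<!ELEMENT\s+\S+\s+)([-Oo])\s+([-Oo])(\s+)")

# Entity names taken from the text (om.RR, om.RO, om.OR, om.OO).
_ENTITY = {("-", "-"): "om.RR", ("-", "O"): "om.RO",
           ("O", "-"): "om.OR", ("O", "O"): "om.OO"}


def strip_omissibility(dtd: str) -> str:
    """Delete the omissibility markers, as XML requires."""
    return _OMIT.sub(lambda m: m.group(1), dtd)


def entify_omissibility(dtd: str) -> str:
    """Replace the markers with references to the om.* parameter entities."""
    def repl(m):
        key = (m.group(2).upper(), m.group(3).upper())
        return f"{m.group(1)}%{_ENTITY[key]};{m.group(4)}"
    return _OMIT.sub(repl, dtd)
```

With the second function, an XML build can declare the four om.* entities as empty strings and an SGML build can keep the values shown above, so that one source serves both.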

3 Normalizing parameter-entity references

In the short term, we will normalize parameter-entity references using the pretty printer mentioned above (or else eliminate them entirely, by running the test DTD through a pre-processor like carthago, which expands all parameter-entity references).

In the long run, we will systematically normalize all content models in the tagdocs of TEI P3 by adding semicolons to parameter-entity references which currently do not have them. N.B. the editors regard this as a correction of a corrigible error, and this normalization will be performed in the text of TEI P3 as soon as possible.
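
The normalization itself is simple enough to sketch. The following Python fragment is an illustration only, not the pretty printer referred to above; the function name is invented here, and the sketch assumes that a percent sign followed directly by a letter always begins a parameter-entity reference (which would not hold for arbitrary text in comments or literals).

```python
import re

# A parameter-entity reference is "%" immediately followed by a name.  The
# negative lookahead requires that the name is complete (no further name
# character follows) and that no ";" is already present.
_OPEN_REF = re.compile(r"%([A-Za-z][A-Za-z0-9._-]*)(?![A-Za-z0-9._;-])")


def close_pe_refs(text: str) -> str:
    """Append the missing ';' to every unterminated parameter-entity reference."""
    return _OPEN_REF.sub(r"%\1;", text)
```

The percent sign in an entity declaration (`<!ENTITY % phrase ...`) is left alone, because it is followed by white space rather than by a name.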

4 Ampersand connectors

Removing ampersand connectors involves either rewriting the content model as a set of alternative sequence groups (thus retaining strict equivalence with the existing model) or revising the content model entirely. In the case of the TEI, both editors agree that most uses of & have proven to be design errors, so we propose simply to revise the content models.
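
For readers who want to see the first option spelled out, here is a minimal Python sketch of the rewriting into alternative sequence groups. The function name is invented for illustration, and the operands are plain strings standing in for the groups of a real content model. The sketch makes no attempt to check that the result is unambiguous, which it may well not be when operands are optional or can begin with the same element.

```python
from itertools import permutations


def expand_and_group(operands):
    """Rewrite (a & b & ...) as an alternation of every ordering."""
    sequences = ["(" + ", ".join(p) + ")" for p in permutations(operands)]
    return "(" + " | ".join(sequences) + ")"


# Two operands give two alternative sequences.
print(expand_and_group(["(q | quote)", "(bibl | loc)"]))
```

The number of alternatives grows as the factorial of the number of operands, which is one practical reason to prefer revising the model.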

The following content models use ampersand connectors in TEI P3: <cit>, <respStmt>, <publicationStmt>, and <graph>.

In this section, we provide alternate declarations for each of them. In the entity extensions file we must first suppress all of them:

< 3 Suppress definitions of elements with ampersand > =

<!ENTITY % cit             'IGNORE' >
<!ENTITY % respStmt        'IGNORE' >
<!ENTITY % publicationStmt 'IGNORE' >
<!ENTITY % graph           'IGNORE' >


And in the DTD extensions file we must redefine them all:

< 4 New definitions of elements with ampersand > =

< New cit declaration 5 >
< Define new respStmt 8 >
< New publicationStmt 9 >
< New graph element 10 >

N.B. All the ampersand-eliminating content-model changes in this section are regarded by the editors as corrections of corrigible errors, and will be integrated into the text of TEI P3 as soon as possible.

4.1 The <cit> element

The standard declaration for <cit> is as follows:

<!ELEMENT %n.cit;       - -  ((%n.q; | %n.quote;) & (%m.bibl; |
                             %m.loc;))                          >
We will redefine it with a slightly more general content model (well, almost -- see below):

< 5 New cit declaration > =

<!ENTITY % XML.cit "INCLUDE" >
<![%XML.cit;[
<!ELEMENT %n.cit;       - -  ((%n.q; | %n.quote; | %m.bibl; |
                             %m.loc; | %m.Incl;)+)              >
<!ATTLIST %n.cit;            %a.global;
          TEIform            CDATA               'cit'          >
]]>


(The Incl class included here has to do with inclusion exceptions; see below.) If we wished to replicate precisely the original content model, without the ampersand, we could define <cit> thus:
<!ELEMENT %n.cit;       - -  (((%n.q; | %n.quote;),
                               (%m.bibl; | %m.loc;))
                             | ((%m.bibl; | %m.loc;),
                               (%n.q; | %n.quote;)))            >

As it turns out, however, the declaration proposed above is ambiguous, since <link> is a member of both the loc and Incl classes. We'll have to unroll one or the other of these two classes; a coin toss decides that we should unroll loc.

< 6 New cit declaration (alternate) > =

<!ENTITY % XML.cit "INCLUDE" >
<![%XML.cit;[
<!ELEMENT %n.cit;       - -  ((%n.q; | %n.quote; | %m.bibl; 
                             | %n.ptr; | %n.ref; 
                             | %n.xptr; | %n.xref;
                             | %m.Incl;)+)                      >
<!ATTLIST %n.cit;            %a.global;
          TEIform            CDATA               'cit'          >
]]>


After further investigation (i.e. further attempts to use the DTD produced by a draft of this paper), however, it becomes clear that loc is a subclass of phrase, so that every content model which uses both the phrase class and the Incl class is going to have troubles. So instead of unrolling each case individually, we take a harsher approach, and remove <link> from the loc class.

< 7 New loc class > =

<!--* remove link from loc class to avoid ambiguity          *-->
<!ENTITY % x.loc ''                                             >
<!ENTITY % m.loc '%x.loc; %n.ptr; | %n.ref; |
           %n.xptr; | %n.xref;'                                 >


This should not cause problems for any existing data, since <link> is still a member of the class Incl, which is (after all) allowed virtually everywhere.
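
The kind of clash described in this section can be caught mechanically before a parser complains. The Python sketch below is an illustration with invented names; the membership of loc is taken from the declarations above, while for Incl only <link> is known from the text, so the second member is a placeholder.

```python
from itertools import combinations


def overlapping_members(classes):
    """Return {element: [class names]} for elements listed in more than one class."""
    clashes = {}
    for (name_a, members_a), (name_b, members_b) in combinations(classes.items(), 2):
        for gi in sorted(set(members_a) & set(members_b)):
            found = clashes.setdefault(gi, [])
            for name in (name_a, name_b):
                if name not in found:
                    found.append(name)
    return clashes


# loc before the change: the four pointers kept in the new class, plus link.
old_loc = ["ptr", "ref", "xptr", "xref", "link"]
new_loc = ["ptr", "ref", "xptr", "xref"]
# Only link is named in the text as a member of Incl; "someOtherIncl" is a
# hypothetical stand-in for the remaining members.
incl = ["link", "someOtherIncl"]

print(overlapping_members({"loc": old_loc, "Incl": incl}))
print(overlapping_members({"loc": new_loc, "Incl": incl}))
```

For the old loc class the function reports link as belonging to both loc and Incl; for the new one it reports nothing.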

4.2 The <respStmt> element

Similarly, we could replicate the original definition of <respStmt> if we wished, but it's probably better regarded as a design error to be fixed:

<!ELEMENT %n.respStmt;  - O  ((%n.resp; & %n.name;), (%n.resp;
                             | %n.name;)*)                      >
We give it a simpler and looser declaration instead:

< 8 Define new respStmt > =

<!ENTITY % XML.respStmt "INCLUDE" >
<![%XML.respStmt;[
<!ELEMENT %n.respStmt;  - O  (%n.resp; | %n.name;
                             | %m.Incl;)+                       >
<!ATTLIST %n.respStmt;       %a.global;
          TEIform            CDATA               'respStmt'     >
]]>


The prose should make clear that in principle, a <respStmt> should have at least one <resp> and at least one <name>. Enforcing that with the content model may be more pedantic than we want to be, though; a declaration which did enforce it would look like this:
<!ELEMENT %n.respStmt;  - O  (((%n.resp;)+,
                             (%n.name;, (%n.resp; | %n.name;)*))
                             | ((%n.name;)+,
                             (%n.resp;, (%n.resp; | %n.name;)*)))
                                                                >

4.3 The <publicationStmt> element

The content model for <publicationStmt> includes an editorial error I am glad to have the occasion to fix. (In normal bibliographic practice, when place and publisher are both given, the place is given first. I don't know what got into me that morning.)

<!ELEMENT %n.publicationStmt;
                        - O  ((%n.p;)+ | ( (%n.publisher; |
                             %n.distributor; | %n.authority;) &
                             ((%n.pubPlace)?, (%n.address)?,
                             (%n.idno)*, (%n.availability)?,
                             (%n.date)?)+ )+ )                  >
Rather than simply replace the current content model with an equivalent ampersand-less expression, we'll change it. For compatibility with existing data, we'll make the new expression loose rather than tight.

< 9 New publicationStmt > =

<!ENTITY % XML.publicationStmt "INCLUDE" >
<![%XML.publicationStmt;[
<!ELEMENT %n.publicationStmt;
                        - O  ( (%n.p;, (%m.Incl;)*)+
                             | ((%n.publisher; | %n.distributor;
                             | %n.authority; | %n.pubPlace;
                             | %n.address; | %n.idno;
                             | %n.availability; | %n.date;),
                               (%m.Incl;)*)+ )                  >
<!ATTLIST %n.publicationStmt; %a.global;
          TEIform            CDATA               'publicationStmt'
                                                                >
]]>


4.4 The <graph> element

The <graph> element uses the content model to require that graphs be encoded nodes-first or arcs-first, but not mixed hugger-mugger. We'll retain that characteristic. The old declaration is this:

<!ELEMENT %n.graph;     - -  ((%n.node;)+ & (%n.arc;)*)         >
We could require arbitrarily that all nodes come first; it's not clear whether any legacy data using <graph> actually exists. But in the interests of backward compatibility, the new content model might as well allow precisely what the old one did, even if that now seems like a design error:

< 10 New graph element > =

<![%TEI.nets;[
<!ENTITY % XML.graph "INCLUDE" >
<![%XML.graph;[
<!ELEMENT %n.graph;     - -  (((%n.node;, (%m.Incl;)*)+,
                               (%n.arc;, (%m.Incl;)*)*)
                             | ((%n.arc;, (%m.Incl;)*)+,
                               (%n.node;, (%m.Incl;)*)+))       >
<!ATTLIST %n.graph;          %a.global;
          type               CDATA               #IMPLIED
          label              CDATA               #IMPLIED
          order              NUMBER              #IMPLIED
          size               NUMBER              #IMPLIED
          TEIform            CDATA               'graph'        >
]]>
]]>
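
The ordering rule which this content model enforces can be restated as a small check, which may be useful for testing legacy data. The Python below is a sketch with an invented function name; it assumes that any child other than <node> or <arc> belongs to the Incl class and may therefore be ignored.

```python
import re

# One letter per child: n = <node>, a = <arc>.  The pattern mirrors the new
# content model with the m.Incl elements left out:
# (node+, arc*) | (arc+, node+)
_GRAPH_ORDER = re.compile(r"n+a*|a+n+")


def graph_children_ok(children):
    """True if the node/arc children are nodes-first or arcs-first, not mixed."""
    letters = "".join("n" if c == "node" else "a"
                      for c in children if c in ("node", "arc"))
    return _GRAPH_ORDER.fullmatch(letters) is not None
```

As in the old declaration, at least one <node> is required, so a graph containing only arcs is rejected.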


5 Normalizing mixed-content models

5.1 Individual elements

The following elements use the keyword #PCDATA in ways that must be changed to be legal in XML: <sense>, <re>, <persName>, <placeName>, <geogName>, <dateStruct>, <timeStruct>, and <dateline>.

In most of these cases, the #PCDATA keyword is given last, not first, in the content model; in one or two, it's neither first nor last. For example:
<!ELEMENT %n.sense;     - -  (%n.sense; | %m.dictionaryTopLevel
                             | %m.phrase | #PCDATA)*            >
In one or two cases, the group also has a plus operator instead of a star operator.
<!ELEMENT %n.timeStruct;
                        - -  ((%m.temporalExpr; | #PCDATA)+)    >

We must redeclare each of them, which means first of all that we must suppress their standard declarations:

< 11 Suppress some mixed content elements > =

<!ENTITY % sense 'IGNORE' >
<!ENTITY % re 'IGNORE' >
<!ENTITY % persName 'IGNORE' >
<!ENTITY % placeName 'IGNORE' >
<!ENTITY % geogName 'IGNORE' >
<!ENTITY % dateStruct 'IGNORE' >
<!ENTITY % timeStruct 'IGNORE' >
<!ENTITY % dateline 'IGNORE' >


and separately we must redefine them:

< 12 Redeclare elements with mixed content elements > =

<![%TEI.dictionaries;[
< New mixed content elements for dictionaries 13 >
]]>
<![%TEI.names.dates;[
< New mixed content elements for names and dates 15 >
]]>
< New mixed content elements for structure 20 >

Since the normalization is purely mechanical, there seems to be no need to reproduce the original declarations here. The new declarations are given below.

N.B. All the mixed-content normalization changes in this section are regarded by the editors as corrections of corrigible errors, and will be integrated into the text of TEI P3 as soon as possible.
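
To make "purely mechanical" concrete, the normalization can be sketched as follows. This Python fragment is an illustration with an invented function name, not the tool actually used; it handles only flat groups (no nested parentheses inside the alternation) and does not add the Incl class that appears in the new declarations.

```python
def normalize_mixed(model: str) -> str:
    """Rewrite a flat mixed-content group in the form XML requires:
    #PCDATA first, a single level of parentheses, and a closing '*'."""
    body = model.strip()
    while True:
        if body and body[-1] in "*+":
            body = body[:-1].rstrip()          # drop the occurrence indicator
        elif body.startswith("(") and body.endswith(")"):
            body = body[1:-1].strip()          # drop one level of parentheses
        else:
            break
    if "(" in body or ")" in body:
        raise ValueError("nested groups are outside the scope of this sketch")
    tokens = [t.strip() for t in body.split("|")]
    if "#PCDATA" not in tokens:
        raise ValueError("not a mixed-content model")
    others = [t for t in tokens if t != "#PCDATA"]
    return "(" + " | ".join(["#PCDATA"] + others) + ")*"
```

Applied to the two examples quoted earlier in this section, it moves #PCDATA to the front in the model for <sense> and replaces the plus operator with a star in the model for <timeStruct>.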

Two elements in this group are from the dictionary tag set:

< 13 New mixed content elements for dictionaries > =

<!ENTITY % XML.sense "INCLUDE" >
<![%XML.sense;[
<!ELEMENT %n.sense;     - -  (#PCDATA | %n.sense;
                             | %m.dictionaryTopLevel;
                             | %m.phrase; | %m.Incl;)*          >
<!ATTLIST %n.sense;          %a.global;
                             %a.dictionaries;
          level              NUMBER              #IMPLIED
          TEIform            CDATA               'sense'        >
]]>


< 14 New mixed content elements for dictionaries 13 (cont'd) > =

<!ENTITY % XML.re "INCLUDE" >
<![%XML.re;[
<!ELEMENT %n.re;        - O  (#PCDATA | %n.sense;
                             | %m.dictionaryTopLevel;
                             | %m.phrase; | %m.Incl;)*          >
<!ATTLIST %n.re;             %a.global;
                             %a.dictionaries;
          type               CDATA               #IMPLIED
          TEIform            CDATA               're'           >
]]>


Note that the standard declaration for <re> also has an exclusion exception which has been dropped silently here. N.B. Elimination of exclusion exceptions is not a corrigible error; the version of this declaration which will go into TEI P3 without review is this:
<!ELEMENT %n.re;        - O  (#PCDATA | %n.sense;
                             | %m.dictionaryTopLevel;
                             | %m.phrase;)*      -(%n.re;)      >

The other elements in this group are from the tag set for names and dates.

< 15 New mixed content elements for names and dates > =


<!ENTITY % XML.persName "INCLUDE" >
<![%XML.persName;[
<!ELEMENT %n.persName;  - -  (#PCDATA | %m.personPart;
                             | %m.phrase; | %m.Incl;)*          >
<!ATTLIST %n.persName;       %a.global;
                             %a.names;
          type               CDATA               #IMPLIED
          TEIform            CDATA               'persName'     >
]]>


< 16 New mixed content elements for names and dates 15 (cont'd) > =

<!ENTITY % XML.placeName "INCLUDE" >
<![%XML.placeName;[
<!ELEMENT %n.placeName; - -  (#PCDATA | %m.placePart;
                             | %m.phrase; | %m.Incl;)*          >
<!ATTLIST %n.placeName;      %a.global;
          type               CDATA               #IMPLIED
          full               (yes | abb | init)  yes
                             %a.names;
          TEIform            CDATA               'placeName'    >
]]>


< 17 New mixed content elements for names and dates 15 (cont'd) > =

<!ENTITY % XML.geogName "INCLUDE" >
<![%XML.geogName;[
<!ELEMENT %n.geogName;  - -  (#PCDATA | %n.geog; | %n.name;
                             | %m.Incl;)*                       >
<!ATTLIST %n.geogName;       %a.global;
                             %a.placePart;
          TEIform            CDATA               'geogName'     >
]]>


< 18 New mixed content elements for names and dates 15 (cont'd) > =

<!ENTITY % XML.dateStruct "INCLUDE" >
<![%XML.dateStruct;[
<!ELEMENT %n.dateStruct;
                        - -  (#PCDATA | %m.temporalExpr;
                             | %m.Incl;)*                       >
<!ATTLIST %n.dateStruct;     %a.global;
                             %a.temporalExpr;
          calendar           CDATA               #IMPLIED
          exact              CDATA               #IMPLIED
          TEIform            CDATA               'dateStruct'   >
]]>


< 19 New mixed content elements for names and dates 15 (cont'd) > =

<!ENTITY % XML.timeStruct "INCLUDE" >
<![%XML.timeStruct;[
<!ELEMENT %n.timeStruct;
                        - -  (#PCDATA | %m.temporalExpr;
                             | %m.Incl;)*                       >
<!ATTLIST %n.timeStruct;     %a.global;
                             %a.temporalExpr;
          zone               CDATA               #IMPLIED
          TEIform            CDATA               'timeStruct'   >
]]>


The <dateline> element (from the default text-structure tag set) is the last one needing a mixed-content fix:

< 20 New mixed content elements for structure > =

<!ENTITY % XML.dateline "INCLUDE" >
<![%XML.dateline;[
<!ELEMENT %n.dateline;  - O  (#PCDATA | %n.date; | %n.time;
                             | %n.name; | %n.address;
                             | %m.Incl;)*                       >
<!ATTLIST %n.dateline;       %a.global;
          TEIform            CDATA               'dateline'     >
]]>


5.2 The entities phrase and phrase.seq

The XML rules for mixed-content models also require that the declarations for phrase and phrase.seq be changed slightly. The current definitions are:

<!ENTITY % phrase '(#PCDATA | %m.phrase)'                       >
<!ENTITY % phrase.seq '(%phrase;)*'                             >
These give us one level too many of parentheses; we need to remove the parentheses from the entity phrase:

< 21 New declaration for phrase and phrase.seq > =

<!ENTITY % phrase '#PCDATA | %m.phrase;'                        >
<!ENTITY % phrase.seq '(%phrase;)*'                             >


N.B. This change to the declaration of phrase is regarded by the editors as the correction of a corrigible error, and will be integrated into the text of TEI P3 as soon as possible.
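
The effect of the extra parentheses is easiest to see after the entity references have been expanded. The Python sketch below performs a purely textual expansion, roughly as a DTD parser would; the function name is invented, and m.phrase is cut down to two members for illustration only.

```python
import re

# "%" + name, with the closing ";" optional, as in the TEI P3 sources.
_REF = re.compile(r"%([A-Za-z][A-Za-z0-9._-]*);?")


def expand(text, entities, max_passes=20):
    """Textually expand parameter-entity references until nothing changes."""
    for _ in range(max_passes):
        expanded = _REF.sub(lambda m: entities.get(m.group(1), m.group(0)), text)
        if expanded == text:
            break
        text = expanded
    return text


# m.phrase is abbreviated here; the real class has many more members.
old = {"m.phrase": "abbr | emph",
       "phrase": "(#PCDATA | %m.phrase)",
       "phrase.seq": "(%phrase;)*"}
new = dict(old, phrase="#PCDATA | %m.phrase;")

print(expand("%phrase.seq;", old))
print(expand("%phrase.seq;", new))
```

With the old definitions the expansion has two levels of parentheses around the alternation; with the new definition of phrase it has the single level that XML requires.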

Unfortunately, integrating this particular fix into the XML modifications file for testing will require either that we hard-code the effective value of m.phrase, or that we recreate the entire sequence of class declarations for phrase in the modifications file. (Sigh.) While we are here, we will introduce some fixes to the declarations of some classes:

< 22 Reproduce class declarations for phrases > =

< Declare new GIs 23 >
<!ENTITY % x.hqphrase ''                                        >
<!ENTITY % m.hqphrase '%x.hqphrase; %n.distinct; | %n.emph; |
           %n.foreign; | %n.gloss; | %n.hi; | %n.mentioned; |
           %n.soCalled; | %n.term; | %n.title;'                 >
<!ENTITY % x.data ''                                            >
<!ENTITY % m.data '%x.data; %n.abbr; | %n.address; | %n.date; 
           | %n.dateRange; | %n.dateStruct; | %n.expan; 
           | %n.geogName; 
           | %n.lang; | %n.measure; | %n.name; | %n.num;
           | %n.orgName; | %n.persName; | %n.placeName; 
           | %n.rs; | %n.time; | %n.timeRange; 
           | %n.timeStruct;'                                    >
<!ENTITY % x.edit ''                                            >
<!ENTITY % m.edit '%x.edit; %n.add; | %n.app; |
           %n.corr; | %n.damage; | %n.del; | 
           %n.orig; | %n.reg; | %n.restore; | %n.sic;
           | %n.space; | %n.supplied; | %n.unclear;'            >
<!ENTITY % x.editIncl ''                                        >
<!ENTITY % m.editIncl '%x.editIncl; %n.addSpan; | %n.delSpan; | 
           %n.gap;'                                             >

< New loc class 7 >
<!ENTITY % x.seg ''                                             >
<!ENTITY % m.seg '%x.seg; %n.c; | %n.cl; | %n.m; |
           %n.phr; | %n.s; | %n.seg; | %n.w;'                   >
<!ENTITY % x.sgmlKeywords ''                                    >
<!ENTITY % m.sgmlKeywords '%x.sgmlKeywords; %n.att; | %n.gi; |
           %n.tag; | %n.val;'                                   >
<!ENTITY % x.phrase.verse ''                                    >
<!ENTITY % m.phrase.verse '%x.phrase.verse; %n.caesura;'        >
<!ENTITY % x.formPointers ''                                    >
<!ENTITY % m.formPointers '%x.formPointers; %n.oRef; | %n.oVar; 
           | %n.pRef; | %n.pVar;'                               >
<!ENTITY % x.phrase ''                                          >
<!ENTITY % m.phrase '%x.phrase; %m.data; | %m.edit; |
           %m.formPointers; | %m.hqphrase; | %m.loc; |
           %m.phrase.verse; | %m.seg; | %m.sgmlKeywords; |
           %n.dictAnomaly; |
           %n.formula; | %n.fw; | %n.handShift;'                >

<!ENTITY % x.fmchunk ''                                         > 
<!ENTITY % m.fmchunk '%x.fmchunk; %n.argument; | %n.byline; | 
           %n.docAuthor; | %n.docDate; | %n.docEdition; | 
           %n.docImprint; | %n.docTitle; | %n.epigraph; | 
           %n.head; | %n.titlePart;'                            >


The element <dictAnomaly> is new; for a description, see below, section The problem of the dictionary chapter.

We need to declare the name of <dictAnomaly>.

< 23 Declare new GIs > =

<!ENTITY % n.dictAnomaly 'dictAnomaly'                          >

5.3 Elements using phrase.seq and paraContent

Note that in XML neither phrase.seq nor paraContent may be combined with other elements in a content model, because XML requires that mixed-content models not have nested groups. This affects the declarations for <castItem>, <docImprint>, <catDesc>, <byline>, <opener>, <closer>, <form>, <gramGrp>, <trans>, <etym>, and <xr>.

These must be suppressed, in order to be redeclared:

< 24 Suppress users of phrase.seq > =

<!ENTITY % castItem 'IGNORE' >
<!ENTITY % docImprint 'IGNORE' >
<!ENTITY % catDesc 'IGNORE' >
<!ENTITY % byline 'IGNORE' >
<!ENTITY % opener 'IGNORE' >
<!ENTITY % closer 'IGNORE' >
<!ENTITY % form 'IGNORE' >
<!ENTITY % gramGrp 'IGNORE' >
<!ENTITY % trans 'IGNORE' >
<!ENTITY % etym 'IGNORE' >
<!ENTITY % xr 'IGNORE' >


And they need to be redefined, tag set by tag set. (We put elements from each tag set into separate scraps to simplify production of specialized modification files.)

< 25 New declarations for users of phrase.seq > =

< New castItem 26 >
< New docImprint 27 >
< New catDesc 28 >
< New opener and closer 29 >
< New phrase.seq elements for dictionaries 32 >

First, the base tag set for drama:

< 26 New castItem > =

<![%TEI.drama;[
<!ENTITY % XML.castItem "INCLUDE" >
<![%XML.castItem;[
<!ELEMENT %n.castItem;  - O  (#PCDATA | %n.role; | %n.roleDesc;
                             | %n.actor; | %m.phrase;
                             | %m.Incl;)*                       >
<!ATTLIST %n.castItem;       %a.global;
          type               (role | list)       role
          TEIform            CDATA               'castItem'     >
]]>
]]>


Next the tag set for front matter:

< 27 New docImprint > =

<!ENTITY % XML.docImprint "INCLUDE" >
<![%XML.docImprint;[
<!ELEMENT %n.docImprint;
                        - O  (#PCDATA | %m.phrase; | %n.pubPlace;
                             | %n.docDate; | %n.publisher;
                             | %m.Incl;)*                       >
<!ATTLIST %n.docImprint;     %a.global;
          TEIform            CDATA               'docImprint'   >
]]>


Then, the header:

< 28 New catDesc > =

<!ENTITY % XML.catDesc "INCLUDE" >
<![%XML.catDesc;[
<!ELEMENT %n.catDesc;   - O  (#PCDATA | %m.phrase;
                             | %n.textDesc;)*                   >
<!ATTLIST %n.catDesc;        %a.global;
          TEIform            CDATA               'catDesc'      >
]]>


And the default text-structure tag set:

< 29 New opener and closer > =

<!ENTITY % XML.byline "INCLUDE" >
<![%XML.byline;[
<!ELEMENT %n.byline;    - O  (#PCDATA | %m.phrase;
                             | %n.docAuthor; | %m.Incl;)*       >
<!ATTLIST %n.byline;         %a.global;
          TEIform            CDATA               'byline'       >
]]>


< 30 New opener and closer 29 (cont'd) > =

<!ENTITY % XML.opener "INCLUDE" >
<![%XML.opener;[
<!ELEMENT %n.opener;    - O  (#PCDATA | %m.phrase;
                             | %n.argument; | %n.byline;
                             | %n.epigraph;
                             | %n.signed; | %n.dateline;
                             | %n.salute; | %m.Incl;)*          >
<!ATTLIST %n.opener;         %a.global;
          TEIform            CDATA               'opener'       >
]]>


< 31 New opener and closer 29 (cont'd) > =

<!ENTITY % XML.closer "INCLUDE" >
<![%XML.closer;[
<!ELEMENT %n.closer;    - O  (#PCDATA | %m.phrase;
                             | %n.signed; | %n.dateline;
                             | %n.salute; | %m.Incl;)*          >
<!ATTLIST %n.closer;         %a.global;
          TEIform            CDATA               'closer'       >
]]>


And finally the base tag set for dictionaries; unlike the preceding elements, these all use paraContent, not phrase.seq. N.B. these content models will require further changes before publication. See below, The problem of the dictionary chapter.

< 32 New phrase.seq elements for dictionaries > =

<![%TEI.dictionaries;[
<!ENTITY % XML.form "INCLUDE" >
<![%XML.form;[
<!ELEMENT %n.form;      - -  (#PCDATA | %m.phrase; | %m.inter;
                             | %m.formInfo; | %m.Incl;)*        >
<!ATTLIST %n.form;           %a.global;
                             %a.dictionaries;
          type               CDATA               #IMPLIED
          TEIform            CDATA               'form'         >
]]>


< 33 New phrase.seq elements for dictionaries 32 (cont'd) > =

<!ENTITY % XML.gramGrp "INCLUDE" >
<![%XML.gramGrp;[
<!ELEMENT %n.gramGrp;   - -  (#PCDATA | %m.phrase; | %m.inter;
                             | %m.gramInfo; | %m.Incl;)*        >
<!ATTLIST %n.gramGrp;        %a.global;
                             %a.dictionaries;
          TEIform            CDATA               'gramGrp'      >
]]>


< 34 New phrase.seq elements for dictionaries 32 (cont'd) > =

<!ENTITY % XML.trans "INCLUDE" >
<![%XML.trans;[
<!ELEMENT %n.trans;     - O  (#PCDATA | %m.phrase; | %m.inter;
                             | %m.dictionaryParts; | %m.Incl;)* >
<!ATTLIST %n.trans;          %a.global;
                             %a.dictionaries;
          TEIform            CDATA               'trans'        >
]]>


< 35 New phrase.seq elements for dictionaries 32 (cont'd) > =

<!ENTITY % XML.etym "INCLUDE" >
<![%XML.etym;[
<!ELEMENT %n.etym;      - O  (#PCDATA | %m.phrase; | %m.inter;
                             | %n.usg; | %n.lbl; | %n.def;
                             | %n.trans; | %n.tr;
                             | %m.morphInfo; | %n.eg;
                             | %n.xr; | %m.Incl;)*              >
<!ATTLIST %n.etym;           %a.global;
                             %a.dictionaries;
          TEIform            CDATA               'etym'         >
]]>


< 36 New phrase.seq elements for dictionaries 32 (cont'd) > =

<!ENTITY % XML.xr "INCLUDE" >
<![%XML.xr;[
<!ELEMENT %n.xr;        - O  (#PCDATA | %m.phrase; | %m.inter;
                             | %n.usg; | %n.lbl; | %m.Incl;)*   >
<!ATTLIST %n.xr;             %a.global;
                             %a.dictionaries;
          type               CDATA               #IMPLIED
          TEIform            CDATA               'xr'           >
]]>
]]>


Since paraContent also occurs in the definition of specialPara, in a form not legal in XML, the specialPara entity must also be redefined; see below, The problem of specialPara elements.

6 Exceptions

Removing inclusion and exclusion exceptions typically involves changing the set of documents accepted by the DTD.[2] In the discussion which follows, I assume that our goal is to ensure that every document legal in the original DTD remains legal in the modified DTD. The changes will cause the modified DTD to accept some other documents which are not valid instances of the original DTD. That is, if the original DTD is taken as an absolutely correct definition of a language, the revised DTD will overgenerate.[3] We will wish to keep the overgeneration to a minimum, but in general we cannot eliminate it entirely, since inclusion and exclusion exceptions do extend the expressive power of the DTD notation.[4]

7 Exclusions

Rewriting a declaration without its exclusion exception simply involves removing the exception and adding an application-specific constraint, to be checked outside the SGML parser, which says that the excluded element types must not occur within the element type which excluded them. For example, the TEI <s> element (for end-to-end segmentation on the level of the orthographic sentence) is currently declared thus:

<!ELEMENT s  - -  (%phrase.seq)  -(s) >

An XML-compatible TEI DTD would replace this with:

<!ELEMENT s %phrase.seq;  >

<!--* CONSTRAINT:  <s> must not occur within
    * an <s>, i.e. Ancestor(1,s) = NIL
    *-->

The important change here, for present purposes, is the removal of the exclusion exception. In addition, we have removed the tag omissibility indicators and the parentheses around phrase.seq, for reasons that should be clear from other portions of this document.

It would be possible to simulate the effect of exclusion exceptions by modifying the content models of possible descendants of <s>, so as to remove <s> from their content model; for elements which can occur both as parents and as descendants of <s>, however, this change would render some existing documents illegal; it is thus not pursued further here.
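
A constraint of the kind recorded in the comment on <s> has to be enforced by some application outside the parser. The following is only a minimal sketch of such a check, assuming the document is available as well-formed XML and using Python's standard xml.etree.ElementTree module; the function name nested_violations is hypothetical and is not part of any TEI software.

```python
import xml.etree.ElementTree as ET


def nested_violations(root, name="s"):
    """Return every element called `name` that has an ancestor called `name`."""
    found = []

    def walk(elem, inside):
        is_target = elem.tag == name
        if is_target and inside:
            found.append(elem)
        for child in elem:
            # Once we are inside a <name>, all descendants are "inside" too.
            walk(child, inside or is_target)

    walk(root, False)
    return found


if __name__ == "__main__":
    good = ET.fromstring("<p><s>One <hi>two</hi>.</s><s>Three.</s></p>")
    bad = ET.fromstring("<p><s>One <hi><s>two</s></hi>.</s></p>")
    print(len(nested_violations(good)), len(nested_violations(bad)))
```

The same walk could be parameterized for <speaker> and <stage> once the element types they exclude are supplied.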

The following elements have exclusion exceptions in TEI P3:

The new declarations are precisely the same as the old declarations, only without the exclusions:

< 37 New declarations for exclusion exceptions > =

<![ %TEI.analysis; [
<!ENTITY % XML.s "INCLUDE" >
<![%XML.s;[
<!ELEMENT %n.s;         - -  %phrase.seq;                       >
<!ATTLIST %n.s;              %a.global;
                             %a.seg;
          TEIform            CDATA               's'            >
]]>
]]>


< 38 New declarations for exclusion exceptions 37 (cont'd) > =

<!ENTITY % XML.speaker "INCLUDE" >
<![%XML.speaker;[
<!ELEMENT %n.speaker;   - O  %phrase.seq;                       >
<!ATTLIST %n.speaker;        %a.global;
          TEIform            CDATA               'speaker'      >
]]>


< 39 New declarations for exclusion exceptions 37 (cont'd) > =

<!ENTITY % XML.stage "INCLUDE" >
<![%XML.stage;[
<!ELEMENT %n.stage;     - -  %specialPara;                      >
<!ATTLIST %n.stage;          %a.global;
          type               CDATA               mix
          TEIform            CDATA               'stage'        >
]]>


And they have to be excluded from the base DTD:

< 40 Suppress element declarations with exclusions > =

<!ENTITY % s       'IGNORE' >
<!ENTITY % speaker 'IGNORE' >
<!ENTITY % stage   'IGNORE' >


A new definition of <re> has already been given above, in the context of normalizing mixed-content models. The new definition of <hom> would be as follows:

<!ELEMENT %n.hom;       - O  (%n.sense; |
                             %m.dictionaryTopLevel)*            >

The actual form to be used for <hom> in an XML DTD, however, varies from this, as described below in The problem of the dictionary chapter.

8 Inclusions

Removing inclusion exceptions requires simulating their effect in the content model of each element type which can occur as a descendant of the element type bearing the inclusions. This section discusses

A brief note on the notation used is given in an appendix.

8.1 The Effect of Inclusions

Inclusions make included elements legal at any location in a content model, without however changing the requirements of the basic content model, which must still be fulfilled. (For now, I make the simplifying assumption that the set of included elements and the set of elements named in the content model are disjoint. When they are not, special considerations will apply, because of SGML's requirement that content models be deterministic.)

We can summarize the effect of inclusions very simply if we think of an FSA recognizing a content model: included elements do not change the state of the FSA. So to change an FSA without inclusions to an FSA that accepts the same language, except that it also allows the inclusion of any element i in the set of inclusions I,

    for each state s in the FSA {
       for each element i in I {
          add a transition from s to s, on i
       }
    }
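
The loop above can be tried out on a small automaton. This is a sketch only: it assumes a deterministic FSA stored as a dictionary from (state, symbol) pairs to states, and it relies on the simplifying assumption already made in the text that the inclusions are disjoint from the symbols of the content model. The names add_inclusions and accepts are hypothetical.

```python
def add_inclusions(transitions, states, inclusions):
    """Return a copy of `transitions` with a self-loop on every state
    for every included symbol (included symbols do not change state)."""
    extended = dict(transitions)
    for s in states:
        for i in inclusions:
            extended[(s, i)] = s
    return extended


def accepts(transitions, start, finals, symbols):
    """Run the deterministic FSA over `symbols`; True if it ends in a final state."""
    state = start
    for sym in symbols:
        if (state, sym) not in transitions:
            return False
        state = transitions[(state, sym)]
    return state in finals


if __name__ == "__main__":
    # FSA for the content model (a, b): 0 --a--> 1 --b--> 2, with 2 final.
    base = {(0, "a"): 1, (1, "b"): 2}
    with_incl = add_inclusions(base, [0, 1, 2], ["x"])
    print(accepts(base, 0, {2}, ["a", "x", "b"]))
    print(accepts(with_incl, 0, {2}, ["x", "a", "x", "b", "x"]))
```

Because the self-loops never move the automaton, the requirements of the basic content model (here, exactly one a followed by one b) still have to be met.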

8.2 The Function imf()

We can characterize the language recognized using inclusion exceptions this way. Let us construct a function imf(E,I) which maps from a regular expression E and a set of inclusions I to a new regular expression E'. Ideally we want the following to be true:

In general, for sequences of terminals x, y in Sigma*:

My best cut so far at defining such a function relies in some places on a couple of auxiliary functions. So let us define functions imf(E), mf(E), and m(E) (where i is for `initial', m for `medial', f for `final').[5] imf(E) makes the claim about xiy true for all x, y in Sigma*. mf(E) makes it true for x in Sigma+ and y in Sigma*. m(E) makes it true for x, y in Sigma+. Equivalently, we can say that any element i in I can appear initially, medially, or finally in imf(E), medially or finally (but not initially) in mf(E), and medially (but not initially or finally) in m(E).

The care we have to take with initial and final positions results from the SGML rules about determinism, but also helps keep the resulting expressions simpler than they'd be if we just slapped (I*) in everywhere in the content model.

Here is a first cut at defining the functions. In a number of circumstances, they are undefined; it might perhaps be useful, therefore, to define a simple normalization on (ampersand-free) content models, which would ensure that the functions are always defined.

If E is the empty set, then the content model in question cannot be satisfied; this would be the case if a DTD which lacked any element called <nonesuch> nevertheless included an element which required it as a subelement:

<!ELEMENT impossible - - (nonesuch) >

Given that we want L(E) to be a subset of L(E'), we must define imf etc. thus for this case:

An element may accept the empty string as its content in either of two ways. First, the element may be declared EMPTY: in this case, inclusions are not legal inside the element.

Second, the element's content model may accept the empty string, either because all subelements are optional or because the content model may be satisfied by #PCDATA: in this case, inclusions are legal within the element.

If E is an atomic symbol, e.g. a, then

If E has the form F?, and F is not nullable (does not accept the empty string), then

Note that we require F to be non-nullable in order to preserve determinism.

If E has the form F?, and F is nullable, then

In other words, if F is nullable, the ? is redundant and may be stripped without loss of information.

If E has the form F+, and F is not nullable, then

If E has the form F+, and F is nullable, then

If E has the form F*, and F is not nullable, then

If E has the form F*, and F is nullable, then

If E has the form (F,G), then

If E has the form (F|G), then

If E has the form (F&G), then

8.3 Examples

Let's do some simple examples, abstracted from the TEI.

8.3.1 Simple Examples

8.3.2 A Complex Example: back

The element <back> is defined thus:

<!ELEMENT %n.back;      - O
  ( (%m.front)*,
    ( ( (%m.divtop),
        (%m.divtop | %n.titlePage;)*
      )
    | ( (%n.div;),
        (%n.div; | (%m.front))*
      )
    | ( (%n.div1;),
        (%n.div1; | (%m.front))*
      )
    )?
  )     >

Removing the parameter entities and using single-letter identifiers, we can rewrite the content model this way to show its structure a little more clearly:

( (a | b | c)*,
  ( ( (d | e | f),
      (d | e | f | g)*
    )
  | ( (h),
      (h | (a | b | c))*
    )
  | ( (i),
      (i | (a | b | c))*
    )
  )?
)

Or more compactly:

( (a | b | c)*,
  ( ( (d | e | f), (d | e | f | g)* )
  | ( h, (h | a | b | c)* )
  | ( i, (i | a | b | c)* )
  )?
)

i.e. E has the form F,G where F=(a|b|c)* and G=(((d|e|f) ... (i|a|b|c)*))?. So imf(E) = imf(F), mf(G).

Now, F is simple: imf(F) = imf((a|b|c)*) = (a | b | c | I)*

But mf(G) requires more work.

G = H? where H =

     ( ( (d | e | f), (d | e | f | g)* )
     | ( h, (h | a | b | c)* )
     | ( i, (i | a | b | c)* )
     )

So mf(G) = (m(H), I*)?

H in turn is an alternation of three sequences, each of the form (x, (y|z)*). This leads to a problem, because the final term in each sequence is nullable; we will have a determinism conflict with the trailing I*.

So we add a new definition of mf(E) for the case where E = F?, namely mf(F?) = (mf(F))?.

Applied to G, we have: mf(G) = (mf(H))?, with H = (J | K | L).

So mf(H) = ((m(J) | m(K) | m(L)), I*)

But J, K, and L don't have m() forms, since their final term is nullable. So we use the alternate definition:

mf(H) = (mf(J) | mf(K) | mf(L))

We have the following:

So mf(H) =

        ( ( (d | e | f), I*, (d | e | f | g | I)*)
        | ( h, I*, (h | a | b | c | I)* )
        | ( i, I*, (i | a | b | c | I)* )
        )

Recall that mf(G) = (mf(H))?.

So mf(G) =

        ( ( (d | e | f), I*, (d | e | f | g | I)*)
        | ( h, I*, (h | a | b | c | I)* )
        | ( i, I*, (i | a | b | c | I)* )
        )?

and imf(E) = imf(F), mf(G) =
         ( (a | b | c | I)*,
           ( ( (d | e | f), I*, (d | e | f | g | I)*)
           | ( h, I*, (h | a | b | c | I)* )
           | ( i, I*, (i | a | b | c | I)* )
           )?
         )

Or, in content model terms (using the usual TEI conventions for names of element classes):

<!ELEMENT %n.back;      - O
  ( (%m.front; | %m.I;)*,
    ( ( (%m.divtop;),
        (%Istar;),
        (%m.divtop; | %n.titlePage; | %m.I;)*
      )
    | ( (%n.div;),
        (%Istar;),
        (%n.div; | %m.front; | %m.I;)*
      )
    | ( (%n.div1;),
        (%Istar;),
        (%n.div1; | %m.front; | %m.I;)*
      )
    )?
  )     >

I think we've got a system we can use manually, though I don't know for sure how to make it a program, given the problems we have defining some of the functions.
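
Pending a proof, the single-letter result derived above for <back> can at least be spot-checked by brute force. The sketch below is an illustration, not part of the method: it transcribes the original and rewritten expressions into Python regular expressions, uses the letter X to stand for any one member of I, and checks that a string is accepted by the rewritten model exactly when deleting its X's leaves a string accepted by the original. Python's re module is a backtracking matcher, so this tests only the languages accepted and says nothing about determinism.

```python
import re
from itertools import product

# The compact single-letter form of the content model of <back>.
ORIGINAL = re.compile(
    r"(a|b|c)*(((d|e|f)(d|e|f|g)*)|(h(h|a|b|c)*)|(i(i|a|b|c)*))?"
)
# imf(E) as derived in the text, with X standing for one element of I.
REWRITTEN = re.compile(
    r"(a|b|c|X)*(((d|e|f)X*(d|e|f|g|X)*)|(hX*(h|a|b|c|X)*)|(iX*(i|a|b|c|X)*))?"
)


def disagreements(max_len=4, alphabet="abcdefghiX"):
    """List every string up to max_len on which the two models disagree,
    after the X's are stripped for the original model."""
    bad = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            expected = ORIGINAL.fullmatch(word.replace("X", "")) is not None
            actual = REWRITTEN.fullmatch(word) is not None
            if expected != actual:
                bad.append(word)
    return bad


if __name__ == "__main__":
    print(disagreements())
```

An empty list for short strings is evidence, not proof, that the rewriting simulates the inclusions for this one content model.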

8.4 Removing inclusions in TEI P3

The following elements have inclusion exceptions in TEI P3 (as of September 1994):

The inclusions on <entry>, <entryFree>, and <eg> will be taken care of separately, in the section on the dictionary chapter.

The inclusions on <orgName> were dropped in October 1994 (though this change has not been propagated to any public version of the DTD), and so we will ignore them.

The inclusions on <text> must be propagated to all potential descendants of <text>.

The inclusions on <lem> and <rdg> must be propagated to all potential descendants; it might be possible to do without these, but it's probably not worth the effort.

Note that in the case of terminologyInclusions, the set of inclusions is not disjoint from the set of children named directly in content models.

Study of the full TEI DTD shows that the sets of possible descendants of <text>, <lem>, <rdg>, and <termEntry> are all identical. This is not surprising given that <text> is recursive.

The 263 elements in this set fall into the following groups:

Note that this list excludes most element types from the dictionary tag set, since they need special treatment anyway. (It does not exclude all of them, though, which puzzles me.)

Empty elements need no changes.

The other groups of elements do require changes to the DTD, which are described in the following sections.

8.4.1 The m.Incl element class

In order to simplify the process of adding inclusions to the content models of the DTD, we define a new class for use in content models, namely m.Incl. This consists of:

For now, we ignore the problems posed by the <termEntry> element. In the long run, they mean the terminology tag set is going to need to be rewritten. (Of course, it needs rewriting anyway, to align it with more recent ISO work.)

< 41 Element class m.Incl > =

<!ENTITY % x.Incl ''>
<![%TEI.textcrit;[
<!--* If text criticism tag set is selected, include m.fragmentary
    * in the class m.Incl.
    *-->
<!ENTITY % m.Incl '%x.Incl; %m.globincl; | %m.editIncl; 
    | %m.fragmentary; | %n.anchor;'                             >
]]>
<!--* Otherwise, don't.                                      *-->
<!ENTITY % m.Incl '%x.Incl; %m.globincl; | %m.editIncl;
    | %n.anchor;'                                               >
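
The scrap above relies on the rule that the first declaration of a parameter entity is binding and later declarations of the same name are ignored. A rough simulation of that behaviour, under the assumption that a dictionary is an adequate stand-in for the parser's entity table, is sketched below; the helper names declare, expand, and m_incl are hypothetical, and references to entities the table does not know are simply left unexpanded.

```python
import re

PE_REFERENCE = re.compile(r"%([A-Za-z][\w.]*);")


def declare(table, name, value):
    """Record a parameter entity; only the first declaration of a name counts."""
    table.setdefault(name, value)


def expand(table, text):
    """Expand known parameter entity references until nothing changes.
    Assumes the declarations are not recursive."""
    while True:
        new = PE_REFERENCE.sub(
            lambda m: table.get(m.group(1), m.group(0)), text
        )
        if new == text:
            return new
        text = new


def m_incl(textcrit_selected):
    """Mimic the two declarations of m.Incl in the scrap above."""
    table = {}
    declare(table, "x.Incl", "")
    if textcrit_selected:  # the marked section keyed on TEI.textcrit
        declare(table, "m.Incl",
                "%x.Incl; %m.globincl; | %m.editIncl; "
                "| %m.fragmentary; | %n.anchor;")
    declare(table, "m.Incl",
            "%x.Incl; %m.globincl; | %m.editIncl; | %n.anchor;")
    return expand(table, "%m.Incl;")


if __name__ == "__main__":
    print(m_incl(True))
    print(m_incl(False))
```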


We have to reproduce the standard declarations for the inclusion classes:

< 42 Reproduce inclusion classes > =

<!ENTITY % x.metadata ''                                        >
<!ENTITY % m.metadata '%x.metadata; %n.alt; | %n.altGrp; | 
           %n.certainty; | %n.fLib; | %n.fs; | %n.fsLib; | 
           %n.fvLib; | %n.index; | %n.interp; | %n.interpGrp; | 
           %n.join; | %n.joinGrp; | %n.link; | %n.linkGrp; | 
           %n.respons; | %n.span; | %n.spanGrp; | %n.timeline;' >
<!ENTITY % x.refsys ''                                          >
<!ENTITY % m.refsys '%x.refsys; %n.cb; | %n.lb; | %n.milestone; 
           | %n.pb;'                                            >
<!ENTITY % x.globincl ''                                        >
<!ENTITY % m.globincl '%x.globincl; %m.metadata; | %m.refsys;'  >


8.4.2 Changing #PCDATA elements

Each element which now has a content model of #PCDATA should, for compatibility, be revised to have a content model of (#PCDATA | %m.Incl;)*.

In some cases, it might be preferable to leave the content model alone: it's not clear that it's really useful to allow index entries, feature structure libraries, and joins to occur within attribute names, generic identifiers, and the components of structured times and dates. Even within generic identifiers and so on, however, there might be line breaks, page breaks, or other milestones, so perhaps we should define at least some of these elements as (#PCDATA | %m.refsys;)*.

For now, for purposes of the experimental XML DTD, I propose to use the first form given.

First, we suppress all of these elements:

< 43 Suppress standard definitions of PCDATA elements > =

<!ENTITY % day             'IGNORE' >
<!ENTITY % hour            'IGNORE' >
<!ENTITY % minute          'IGNORE' >
<!ENTITY % month           'IGNORE' >
<!ENTITY % offset          'IGNORE' >
<!ENTITY % second          'IGNORE' >
<!ENTITY % week            'IGNORE' >
<!ENTITY % year            'IGNORE' >
<!ENTITY % idno            'IGNORE' >
<!ENTITY % postBox         'IGNORE' >
<!ENTITY % postCode        'IGNORE' >
<!ENTITY % str             'IGNORE' >


Then we supply the new declarations:

< 44 New definitions for PCDATA elements > =

<![%TEI.names.dates;[
<!ENTITY % XML.day "INCLUDE" >
<![%XML.day;[
<!ELEMENT %n.day;         - -  (#PCDATA | %m.Incl;)*  >
<!ATTLIST %n.day;            %a.global;
                             %a.temporalExpr;
          TEIform            CDATA               'day'          >
]]>
<!ENTITY % XML.hour "INCLUDE" >
<![%XML.hour;[
<!ELEMENT %n.hour;        - -  (#PCDATA | %m.Incl;)*  >
<!ATTLIST %n.hour;           %a.global;
                             %a.temporalExpr;
          TEIform            CDATA               'hour'         >
]]>
<!ENTITY % XML.minute "INCLUDE" >
<![%XML.minute;[
<!ELEMENT %n.minute;      - -  (#PCDATA | %m.Incl;)*  >
<!ATTLIST %n.minute;         %a.global;
                             %a.temporalExpr;
          TEIform            CDATA               'minute'       >
]]>
<!ENTITY % XML.month "INCLUDE" >
<![%XML.month;[
<!ELEMENT %n.month;       - -  (#PCDATA | %m.Incl;)*  >
<!ATTLIST %n.month;          %a.global;
                             %a.temporalExpr;
          TEIform            CDATA               'month'        >
]]>
<!ENTITY % XML.offset "INCLUDE" >
<![%XML.offset;[
<!ELEMENT %n.offset;      - -  (#PCDATA | %m.Incl;)*  >
<!ATTLIST %n.offset;         %a.global;
          value              CDATA               #IMPLIED
                             %a.placePart;
          TEIform            CDATA               'offset'       >
]]>
<!ENTITY % XML.second "INCLUDE" >
<![%XML.second;[
<!ELEMENT %n.second;      - -  (#PCDATA | %m.Incl;)*  >
<!ATTLIST %n.second;         %a.global;
                             %a.temporalExpr;
          TEIform            CDATA               'second'       >
]]>
<!ENTITY % XML.week "INCLUDE" >
<![%XML.week;[
<!ELEMENT %n.week;        - -  (#PCDATA | %m.Incl;)*  >
<!ATTLIST %n.week;           %a.global;
                             %a.temporalExpr;
          TEIform            CDATA               'week'         >
]]>
<!ENTITY % XML.year "INCLUDE" >
<![%XML.year;[
<!ELEMENT %n.year;        - -  (#PCDATA | %m.Incl;)*  >
<!ATTLIST %n.year;           %a.global;
                             %a.temporalExpr;
          TEIform            CDATA               'year'         >
]]>
]]>
<!ENTITY % XML.idno "INCLUDE" >
<![%XML.idno;[
<!ELEMENT %n.idno;        - -  (#PCDATA | %m.Incl;)*  >
<!ATTLIST %n.idno;           %a.global;
          type               CDATA               #IMPLIED
          TEIform            CDATA               'idno'         >
]]>
<!ENTITY % XML.postBox "INCLUDE" >
<![%XML.postBox;[
<!ELEMENT %n.postBox;     - -  (#PCDATA | %m.Incl;)*  >
<!ATTLIST %n.postBox;        %a.global;
          TEIform            CDATA               'postBox'      >
]]>
<!ENTITY % XML.postCode "INCLUDE" >
<![%XML.postCode;[
<!ELEMENT %n.postCode;    - -  (#PCDATA | %m.Incl;)*  >
<!ATTLIST %n.postCode;       %a.global;
          TEIform            CDATA               'postCode'     >
]]>
<![%TEI.fs;[
<!ENTITY % XML.str "INCLUDE" >
<![%XML.str;[
<!ELEMENT %n.str;         - -  (#PCDATA | %m.Incl;)*  >
<!ATTLIST %n.str;            %a.global;
          rel                (eq | ne | sb | ns | lt | le | gt 
                             | ge)               eq
          TEIform            CDATA               'str'          >
]]>
]]>
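
The suppressions and the new element declarations above follow one fixed pattern per element, so the repetitive part could be generated rather than typed. The following sketch shows one way to do that; the function names are hypothetical, the column spacing is only approximate, and the attribute list declarations are deliberately not generated, since they differ from element to element and must still be copied from the standard DTD.

```python
def suppress_line(name):
    """Declaration that switches off the standard definition of one element."""
    return "<!ENTITY % {:<16}'IGNORE' >".format(name)


def pcdata_skeleton(name):
    """Marked section redefining a #PCDATA element to admit the m.Incl class.
    The ATTLIST declaration for the element still has to be added by hand."""
    return "\n".join([
        '<!ENTITY % XML.{0} "INCLUDE" >'.format(name),
        "<![%XML.{0};[".format(name),
        "<!ELEMENT %n.{0}; - - (#PCDATA | %m.Incl;)* >".format(name),
        "]]>",
    ])


if __name__ == "__main__":
    for element in ["day", "hour", "idno"]:
        print(suppress_line(element))
    print(pcdata_skeleton("day"))
```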


8.4.3 Changing phrase.seq

The parameter entity phrase.seq should be redefined as follows:

< 45 New declaration for phrase and phrase.seq > =

<!ENTITY % phrase '#PCDATA | %m.phrase; | %m.Incl;'             >
<!ENTITY % phrase.seq '(%phrase;)*'                             >


(This supersedes the redefinition given earlier. Adding the inclusions to the class phrase (i.e. to the entity m.phrase) might enable some of the redefinitions already given above to stand unchanged, but for now, at least, I propose to keep the inclusions logically separate from the original element classes.) Note that the entity phrase is used only once, in the definition of <u>.

No changes to the actual content models are needed. (Ah, the joys of indirection.)

(Note, 14 May 1999.) No, wait, actually, that's not true. Many of these declarations read

<!ELEMENT %n.foo;       - O  (%phrase.seq;)                     >

which, expanded, would be

<!ELEMENT %n.foo;       - O  ((#PCDATA | %m.phrase; | %m.Incl;)*)>

which is illegal. The content models do need to be changed, to

<!ELEMENT %n.foo;       %phrase.seq;                            >

This is only required if we wish to allow the extensions file to work with the current (1994-09) production DTDs. Since those are what I currently have on this laptop, I do wish. But since we will shortly be releasing corrected versions, we want to make this part of the extensions file optional. We'll do so using a conditional inclusion on the parameter entity base9409, which by default will be defined IGNORE.

The same logic applies to paraContent and (for now) specialPara.

(Note, 30 May 1999.) No, no, wait. Doesn't carthage already normalize these correctly by omitting extra parentheses? I've already spent several hours making the scraps below, and now realize we may not need them after all. (17 June 1999.) I've removed them, since carthage actually does produce legal XML.
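
For illustration, the kind of normalization at issue (dropping a pair of parentheses that encloses nothing but another complete group) can be sketched as follows. This is not a description of how carthage works; it is an assumed, simplified procedure that operates on content models whose parameter entities have already been expanded, and the function names are hypothetical.

```python
def matching_close(text, start):
    """Return the index of the ')' that matches the '(' at text[start]."""
    depth = 0
    for pos in range(start, len(text)):
        if text[pos] == "(":
            depth += 1
        elif text[pos] == ")":
            depth -= 1
            if depth == 0:
                return pos
    raise ValueError("unbalanced parentheses")


def strip_redundant_parens(model):
    """Remove outer parentheses that wrap a single inner group, where the
    inner group may carry an occurrence indicator (*, + or ?)."""
    model = model.strip()
    while model.startswith("(") and matching_close(model, 0) == len(model) - 1:
        inner = model[1:-1].strip()
        if not inner.startswith("("):
            break  # the parentheses enclose a real group, keep them
        rest = inner[matching_close(inner, 0) + 1:].strip()
        if rest not in ("", "*", "+", "?"):
            break  # something follows the inner group, keep the outer pair
        model = inner
    return model


if __name__ == "__main__":
    print(strip_redundant_parens("((#PCDATA | a | b)*)"))
    print(strip_redundant_parens("((a | b), c)"))
```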

8.4.4 Changing component.seq

The entity component.seq must be redefined to allow inclusions between any two components. In the long run, the changes should be made directly within the various declarations which go into component.seq, but those declarations are among the most complicated of the entire TEI DTD, since there are variant versions for each of the two hundred or so possible combinations of base tag sets.

The quick and dirty approach most suitable for use in the experimental XML DTD is to include the Incl class as a subclass of common, thus:

< 46 New declaration for x.common > =

<!ENTITY % x.common '%m.Incl; |'>

If this proves to introduce ambiguity in the content model, we'll have to find a slower, cleaner way to do it.

Experiment shows that it does indeed introduce ambiguity in content models, notably those for <body> and text divisions. Rather than hack at those content models, I am going to take the longer and slower approach.

< 47 New declaration for component and component.seq > =

<!ENTITY % x.common ''                                          >
<!ENTITY % m.common '%x.common; %m.bibl; | %m.chunk; | 
           %m.hqinter; | %m.lists; | %m.notes; | %n.stage;'     >
< Reproduce standard component declarations 48 >
<!-- The entity component.seq is always a starred sequence    -->
<!-- of component elements. Its definition does not vary      -->
<!-- with the base (unless we are using the general base, in  -->
<!-- which case it has already been de