Notes from TEI Meeting on Element Classes


This was a meeting of the subset of the TEI Council charged by the complete Council to examine and improve the TEI class system.

The meeting was held at the University Club at Oxford University. Thanks are extended to the members from Oxford (LB, JC, SR) for their efforts in organizing the meeting, and to SR for providing home-grown apples.

Meeting started with SB, LB, JC, SR, JW, CW at 09-26 09:25; LR arrived at 12:04; broke at ~17:00; continued 09-27 starting at 09:04; ended at 16:57. A subset (SB, JC, SR, CW) met the following morning (09-28) from 09:30 to 12:25 and worked on developing the guidelines for naming model classes.

Preliminaries

CW chaired the meeting; notes of decisions taken were recorded by SB, LB, and JC. This document is based upon SB's notes with some additions from JC's, revised by LB.

First, the agenda was reviewed, and printed copies of ED W 84 were handed out. It was quickly realized that the current ED W 84 was insufficient for our purposes, so SR updated it so that part 1 also included a column for the class membership of each element. This new version was placed on the TEI website, and projected for the group.

LB provided a quick overview of how classes work, defining attribute, model, and content classes as (respectively) groups of elements which share a set of attributes, which occur in the same places in content models, and which share the same content model.

General Decisions

We began by reviewing several general questions circulated before the meeting.
  1. We agreed to distinguish attribute classes from model classes. (I.e., no more ‘both’ classes, and the naming scheme should reflect the different purposes.)

    We discussed whether or not we wanted another kind of class in order to group elements that share a content model ("content classes"), or for syntactic sugar issues. The following example was used to help anchor this question:
    • Suppose we have a model class X with members A and B.
    • A is a member of X, has content D*
    • thus there exists a content class DSTAR of which A is a member
    • DSTAR has members A and others
    • Now I want to change A so that it has E also: E|D*
    • have I removed A from DSTAR, or have I changed DSTAR?

    We also considered whether it was helpful to divide classes into semantic and structural classes, noting that all current semantic classes are required to be subclasses of structural ones -- a semantic class such as tei.edit cannot combine elements which are members of different structural classes (say tei.phrase and tei.chunk). We decided that, at least for now, we would consider only attribute and model classes, although recognizing the potential usefulness of the ‘content class’ concept.

  2. We affirmed that although different modules may extend membership of classes, the TEI class namespace is global. Consequently, all TEI classes are declared in the same module (at present, core, but possibly textstructure).

  3. We developed three criteria for an element to directly reference elements instead of only classes in its content model.
    • the content does not already appear in some other TEI element, and we deem it unlikely a customizer will want to use it in an added element; or
    • we deem it very unlikely that a customizer will want to add other child elements to the content model; or
    • the content model includes occurrence indicators or sequence markers, making it more complex than a simple starred alternation of elements (in which case, it should probably be expressed as a macro).

    Because attribute classes are global, it makes no sense for the same attribute to be declared by more than one attribute class, unless we can be very sure that no element would need to be a member of both classes.

  4. We agreed that greater use of classes was unlikely to result in significantly looser content models than those already in place. We also noted the desirability of making explicit constraints (beyond the content model) currently represented only by prose.

    A <constraint> element as a child of <elementSpec> might be needed to do this, rather than (as at present) including them in the general <remarks> element. It might also contain Schematron code.
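The content-class question raised under item 1 above can be made concrete with a small sketch. It is only an illustration of the two possible readings, not a description of how TEI tooling behaves. The element C is a hypothetical stand-in for the "others" in DSTAR, and the function name is invented for this example.

```python
from collections import defaultdict


def derive_content_classes(content_models):
    """Group element names by content model string (sorted for stable output)."""
    classes = defaultdict(list)
    for element in sorted(content_models):
        classes[content_models[element]].append(element)
    return dict(classes)


# A comes from the example in the notes; C is a hypothetical other member.
content_models = {"A": "D*", "C": "D*"}
before = derive_content_classes(content_models)

# Reading 1: membership is derived from the content model, so editing A
# moves A out of the D* group and leaves C where it was.
content_models["A"] = "E|D*"
after = derive_content_classes(content_models)

# Reading 2: DSTAR is declared explicitly and owns the model, so the same
# edit made on the class changes every member at once.
declared = {"DSTAR": {"model": "D*", "members": ["A", "C"]}}
declared["DSTAR"]["model"] = "E|D*"

print(before)
print(after)
print(declared)
```

Under the first reading the edit removes A from DSTAR. Under the second it changes DSTAR for all members. The notes leave open which of the two a content class would mean.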

We discussed putting <note> into tei.Incl but decided not to, based on the arguments that
  • it can be a pain to process
  • it may cause content model problems (we would not be able to have a <div> containing only <note>s, and would need to fix lots of content models)
  • the existing <note> element represents both the content of a note and its point of attachment. It might be good practice to separate the two functions: in this case, the point of attachment would be represented by an <anchor>, and <anchor> is already a member of tei.Incl
It was noted that the last argument implies a need for specific recommendations on how the anchoring element and the note should be linked (e.g., what the value of type should be).

Participants were reminded that mixing leaf and non-leaf nodes in a single class is generally frowned upon.

We agreed to the rule that if S is a sub-class of C and element <E> is a member of S then it cannot also be a member of C.
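A minimal sketch of how this rule could be checked mechanically, assuming each class has at most one superclass (the notes do not say so). The function name and the second element F are hypothetical; S, C and E are taken from the rule as stated.

```python
def redundant_memberships(parent_of, members_of):
    """Return (element, subclass, ancestor) triples that break the rule.

    parent_of maps a class name to its superclass name.
    members_of maps a class name to the set of its direct member elements.
    """
    violations = []
    for subclass, elements in members_of.items():
        seen = {subclass}
        ancestor = parent_of.get(subclass)
        # Walk up the superclass chain, guarding against accidental cycles.
        while ancestor is not None and ancestor not in seen:
            seen.add(ancestor)
            shared = elements & members_of.get(ancestor, set())
            for element in sorted(shared):
                violations.append((element, subclass, ancestor))
            ancestor = parent_of.get(ancestor)
    return violations


parent_of = {"S": "C"}
members_of = {"S": {"E"}, "C": {"E", "F"}}
print(redundant_memberships(parent_of, members_of))
```

Here E is flagged because it is a direct member of both S and its superclass C. Removing E from C clears the violation while E still reaches C through S.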

Specific Decisions on Classes

In going through the element list, the following specific changes and recommendations were agreed upon.

  1. move <addrLine> into tei.addrPart, and change <address> content model accordingly: ( tei.Incl*, ( tei.addrPart, tei.Incl* )+ )
  2. add the elements referenced by macro.glossSeq into a new class tei.glossing; redefine macro.glossSeq to reference this class
  3. remove <analytic>, <monogr>, and <imprint> from tei.biblPart
  4. new class tei.resp contains <author>, <editor>, and <respStmt>, so that this threesome can be factored out of various places (tei.biblPart, <analytic>, <monogr> ×3, <titleStmt>)
  5. new class, tei.biblPhrases, containing those phrase-level elements which should be available within <bibl> (<title>, <date>, <dateRange>): this new class should be a subclass of tei.biblPart. The content model of <bibl> should reference only tei.biblPart. (See email discussion of 1 Oct 05.)
  6. need a new class, tei.biblBoxes, with members <bibl[Struct|Item|Full]>; many if not all references to <bibl> should be to the new class. (But tei.bibl already does this.)
  7. factor out repeating parts of the <fileDesc> and <biblFull> content model to a macro macro.fileDesc. I.e.,
    • macro.fileDesc = titleStmt, editionStmt?, extent?, publicationStmt, seriesStmt?, notesStmt?
    • fileDesc.content = macro.fileDesc, sourceDesc+
    • biblFull.content = macro.fileDesc, sourceDesc*
  8. change name of tei.refsys to tei.refSys.
  9. new class, tei.citable, that currently contains only <quote> and is referred to instead of q | quote in the content of <cit>: cit.content = ( tei.citable | tei.bibl | tei.loc | tei.Incl )+
  10. change the content model of all the various non-empty members of tei.edit, except <choice>, to macro.paraContent. The list is <add>, <app>, <corr>, <damage>, <del>, <orig>, <reg>, <restore>, <sic>, <supplied>, and <unclear>.
  11. tei.data should reference tei.date, thus remove <date>, <dateRange>, and <dateStruct> from it; i.e. tei.data = ( abbr | address | tei.date | dimensions | eg | egXML | expan | geogName | lang | measure | name | num | orgName | persName | placeName | rs | time | timeRange | timeStruct ) (But note that more reductions should occur to this class later.)
  12. move <abbr> and <expan> from tei.data to tei.edit
  13. move <gap> into tei.intervention, removing its resp
  14. make macro.glossSeq (or tei.glossing) the content for all those elements that used to have desc (i.e., <certainty>, <event>, <gap>, <join>, <joinGrp>, <kinesic>, <relation>, <respons>, and <restore>)
  15. <headItem> and <headLabel> should not become a class; they may be removed during a reconsideration of the content model of <list>
  16. define new tei.imprint class for current children of <imprint> ( pubPlace | publisher | date | biblScope ); factor them out of <imprint>, which becomes ( tei.imprint, tei.Incl* )+ (note that this model requires an element from tei.imprint, which the previous did not); remove <imprint> from tei.biblPart and make the content model of <bibl> reference tei.imprint
  17. Remove <indexTerm> element, replacing it by <term> within <index>; add sortKey to <term>. (yes, this gives target and cRef to the <term> in an index) (no, we don't know where the <gloss> that the target points to goes).
  18. move <label> from tei.lists to tei.inter and tei.common.
  19. new class, tei.metricalComponents … we did not decide whether this class contains only <l> or both <l> and <lg>, or references to two other classes, one for potentially recursive (e.g. <lg>) and one for non-recursive components (e.g. <l>). This class is referenced as the content model of <lg>; LR argued that the potential recursion should not be hidden.
  20. change content models of <list> and <listBibl> so that <head> is replaced by tei.divtop; also add tei.divbot at the end. E.g., ( tei.divtop | tei.Incl )*, ... ( tei.divbot | tei.Incl )*
  21. take <measure> out of tei.names; remember to change example in Guidelines (addendum: all examples with key or reg are in CO/co.odd or CO/measure.odd)
  22. new attribute class tei.measured to contain unit, quantity, commodity (a different name is needed to avoid confusion with the existing tei.measured class defined for manuscript description)
  23. factor <persName> and <orgName> out of tei.naming into tei.agent (which becomes a subclass of tei.naming), and change content of <respStmt> accordingly: ( ( tei.agent, resp ) | ( resp, tei.agent ) ), ( tei.agent | resp )*
  24. factor <p> elements out passim in favour of reference to tei.paragraph, always putting tei.paragraph first in the content model
  25. Class tei.citable needs to be a subclass of msItemPart (and possibly other elements currently directly referencing this class need similar treatment, e.g. making tei.bibl )
  26. make <choice> content model ( seg | choice | tei.choosable )*, removing <choice> and <seg> from tei.choosable, thus allowing it to be a pure subclass of tei.edit
  27. new class tei.spComponent used only in <sp>; only current member is <stage>; thus content of <sp> becomes ... (( tei.paragraph | tei.metricalComponent | lg | tei.segment | tei.spComponent ), tei.Incl* )+
  28. move <ab> out of tei.segment into tei.paragraph
  29. add <time>, <timeRange> (if kept), and <timeStruct> to tei.date (perhaps rename to tei.temporal)
  30. <scriptStmt>, <sourceDesc>, <taxonomy>, and <broadcast> need to refer to tei.bibl instead of directly to its elements
  31. rename tei.front to tei.extraBodyDiv and tei.fmchunk to tei.extraBodyChunk, but with something other than ‘extraBody’ to indicate that this is stuff which can appear either at the front or the back. The content of <front> and <back> should be the same: boiling down to something akin to ( tei.extraBodyChunk | tei.chunk )*, ( tei.extraBodyDiv | div-stuff-here )+
  32. transform the repetitive parts of <body> content model into macros (and consider ways of abolishing numbered divs)
  33. define new class tei.bookends with members byline | dateline | epigraph | salute | signed as content of <closer> and <opener>; factor these out of divtop and divbot
  34. put all the individual children of <publicationStmt> into new class tei.publicationComponents
  35. new content model defined for <change>: move <date> to a required attribute, change <item> to macro.specialPara, change <respStmt> to required who, which points to a <person> or <personGrp>, [schematron rule ... pending work on prosopography]
  36. define new class tei.recording with members respStmt | equipment | broadcast | date. Then change content model of <recording> to use it: p+ | tei.recording+
  37. define new macro, macro.rendition, which contains macro.paraContent and becomes the sole token in the content of <rendition>; this makes it easier to modify content of <rendition>, e.g. to include an XML vocabulary
  38. try changing <sourceDesc> to have content tei.paragraph+ | ( tei.bibl | tei.sourceDesc )+
  39. remove tei.teiHeader and tei.teiText classes
  40. replace references to <note> with references to tei.notes in classes and content models
  41. <setting> should reference new tei.temporal class
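To make one of the content models above concrete, the sketch below checks child sequences against the <respStmt> model from item 23, ( ( tei.agent, resp ) | ( resp, tei.agent ) ), ( tei.agent | resp )*. It encodes each child as a single letter and matches with a regular expression. The membership of tei.agent is taken from item 23. The function names and the letter encoding are assumptions made for this illustration.

```python
import re

# Members of tei.agent as proposed in item 23.
TEI_AGENT = {"persName", "orgName"}

# 'a' = any tei.agent member, 'r' = resp. The first two children must be one
# of each, in either order; any mix of the two may follow.
RESP_STMT_MODEL = re.compile(r"(?:ar|ra)[ar]*")


def to_symbols(children):
    """Encode a list of child element names as a string of one-letter symbols."""
    symbols = []
    for name in children:
        if name in TEI_AGENT:
            symbols.append("a")
        elif name == "resp":
            symbols.append("r")
        else:
            symbols.append("x")  # anything else is not allowed by the model
    return "".join(symbols)


def valid_resp_stmt(children):
    """True if the child sequence satisfies the item 23 content model."""
    return RESP_STMT_MODEL.fullmatch(to_symbols(children)) is not None


print(valid_resp_stmt(["persName", "resp"]))
print(valid_resp_stmt(["resp", "orgName", "resp"]))
print(valid_resp_stmt(["persName", "persName", "resp"]))
```

The last call is rejected: the model requires the first two children to be one agent and one resp, so two agents in a row at the start do not match.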

Specific Decisions in Related Areas

In going through each element in the core, textstructure, and header modules looking for improvements in the class system, we came across quite a few other areas where changes are needed.

  • remove <p> from content of <langUsage>
  • change content of <foreign> to macro.phraseSeq instead of macro.paraContent.
  • change model of <teiCorpus> to teiHeader, ( TEI | teiCorpus )+
  • remember to fix tagdoc of <binaryObject> : attributes need proper declarations
  • add mimeType attribute to <rendition>
  • consider making <encodingDesc> and <tagsDecl> require at least one child element
  • remove tei.Incl from <monogr>, <biblFull>, and all of <teiHeader>
  • suggested changing name of tei.tpParts to tei.model.titlePageComponents
  • default position is to change text to tei.text wherever it occurs. Only exception noticed so far is <binaryObject> but there may be others.

Action 1: LR investigate merging <label> and <lbl>.
Action 2: SR fix stylesheet so that namespace isn't lost from xml:id and xml:lang in examples
Action 3: eds. & SR sort out getting <g> into all references to text; i.e., make new class tei.text with only one member, <g>, and new macro macro.text which is ( text | tei.text )*.

On Naming

We had time to discuss the major structural names, listed below.

  • datatypes: tei.data.*
  • attribute class: tei.att.*
  • model class: tei.model.*
  • macro: tei.macro.*
  • module: tei..*

We agreed that both the actual name and the corresponding filename should be in camelCase. We did not resolve whether the ‘tei.’ prefix would be included in filenames.
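As a rough illustration of the prefix scheme listed above, the sketch below classifies a name by its prefix. It covers only the four prefixes that are fully spelled out in the list; the module pattern is left out because its form is not clear from these notes. The function name is invented for this example, and the scheme itself was still under discussion at this point.

```python
# Prefixes as listed in the naming discussion; the module pattern is omitted.
PREFIX_KINDS = {
    "tei.data.": "datatype",
    "tei.att.": "attribute class",
    "tei.model.": "model class",
    "tei.macro.": "macro",
}


def kind_of(name):
    """Return the kind implied by the name's prefix, or None if none applies."""
    for prefix, kind in PREFIX_KINDS.items():
        # Require something after the prefix, so a bare prefix is not a name.
        if name.startswith(prefix) and len(name) > len(prefix):
            return kind
    return None


print(kind_of("tei.model.titlePageComponents"))
print(kind_of("tei.refSys"))
```

A name such as tei.refSys, which follows the older convention used elsewhere in these notes, falls outside the scheme and returns None.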

Several members (SB, JC, SR, and CW) met at OUCS 2005-09-28 from ~09:30 to 11:57 and continued the naming discussion. The results of this meeting have been incorporated into ED W 87, thanks to SR.

Further Work

We agreed that work was needed in the following areas, but that it could not be taken up at the current meeting.

  • tei.sourceDesc needs a new name
  • whether or not the various <docXX> elements are needed and, if so, whether they should still be in divtop and fmchunk
  • Need to manage two things in ND: the murky mess of names, and prosopographic data. Could be a complete WG or a quick hack
  • fix the content model of <list>, possibly dividing into two: <list> and a separate <glossList>
  • tei.chunk needs to be broken up into subclasses