How to edit the TEI Guidelines
- Organization of the Guidelines chapters
- Organization of the specifications
- Processing Instructions
- Style Notes
- Making a change to the Guidelines
- Adding Schematron constraints to specifications
- Building the release
- Reference section
This document is intended to set out the way things are currently managed in the editing of the TEI Guidelines. General notes on the rationale for this state -- why it is the way it is -- may be added here later. The intention is to provide information for Council members wishing to contribute actively to the continued development and maintenance of the text of the Guidelines.
Organization of the Guidelines chapters
Each chapter that defines a module is organized in the same way:
- it begins with a paragraph explaining what the module is for, and containing links to the individual subsections it contains;
- each subsection introduces a (small) group of elements, usually beginning with a <specList>, which typically lists all the elements discussed within that section, in whatever order makes sense for the context, but not necessarily all possible child elements of a given element;
- each element is then introduced in turn, usually including an appropriate usage example (on examples, see further Examples);
- a <specGrp> for each group of elements defined may be given at the end of each section;
- a <specGrp> for the whole module is given at the end of the chapter: it includes the other specifications either directly (by means of XIncludes) or indirectly (by means of a <specGrpRef> pointing to a preceding <specGrp>).
The only chapters not organized in this way are those which do not introduce or define particular modules.
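As a rough sketch of this layout, a module chapter might be skeletonized as follows. All xml:id values, headings, and the choice of included files are hypothetical and are not copied from any actual chapter; the fragment is well-formed but heavily elided.

```xml
<!-- Hypothetical skeleton of a module chapter; ids and headings are invented. -->
<div xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:xi="http://www.w3.org/2001/XInclude"
     type="div1" xml:id="XX">
  <head>A Hypothetical Module</head>
  <!-- Opening paragraph: purpose of the module, with links to its subsections -->
  <p>This chapter describes ... Abbreviations are discussed in <ptr target="#XXAB"/>.</p>

  <div type="div2" xml:id="XXAB">
    <head>Abbreviations</head>
    <!-- The elements discussed in this subsection -->
    <p>
      <specList>
        <specDesc key="abbr"/>
        <specDesc key="expan"/>
      </specList>
    </p>
    <p>Discussion of each element in turn, with usage examples ...</p>
    <!-- Optional specGrp for this group of elements -->
    <specGrp xml:id="DXXAB" n="Abbreviations">
      <xi:include href="../../Specs/abbr.xml"/>
      <xi:include href="../../Specs/expan.xml"/>
    </specGrp>
  </div>

  <div type="div2" xml:id="XXMOD">
    <head>Module summary</head>
    <!-- specGrp for the whole module: direct XIncludes or specGrpRef pointers -->
    <specGrp xml:id="DXX" n="The hypothetical module">
      <specGrpRef target="#DXXAB"/>
      <xi:include href="../../Specs/att.example.xml"/>
    </specGrp>
  </div>
</div>
```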
Organization of the specifications
Each element, class, and macro defined in the Guidelines is declared within its own XML file, containing an <elementSpec>, <classSpec>, or <macroSpec> as appropriate. These files are in the directory Source/Specs. For example, the file Source/Specs/abbr.xml contains the element spec for the <abbr> element. Note that the elements for the major components of the spec each have a @versionDate attribute. If you edit the content of such an element, you must update the value of this attribute so that translations (stored in the same file) that need updating can be detected. As a general rule, don't update a translation into any language of which you are not a native speaker. If you do feel confident enough to adjust a translation, leave the @versionDate attribute on the translation unchanged, to ensure that the translation will eventually be reviewed.
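To make the @versionDate rule concrete, here is a minimal sketch of a spec file after its English description has been edited. The element name, wording, and dates are invented for illustration, and the use of xml:lang to distinguish the translation is an assumption about how the same-file translations are marked.

```xml
<!-- Hypothetical fragment of a spec file; wording and dates are invented. -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" module="core" ident="exampleThing">
  <!-- English text was edited, so its versionDate is updated to the date of the edit -->
  <desc versionDate="2014-06-10" xml:lang="en">contains a revised description of the element.</desc>
  <!-- Translation left alone: its older versionDate now shows that it needs review -->
  <desc versionDate="2009-03-02" xml:lang="fr">contient une description de l'élément.</desc>
</elementSpec>
```

Comparing the two dates is what lets stale translations be detected, which is why the date on the translation should stay untouched unless a native speaker has actually revised it.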
Each chapter of the Guidelines is stored in a file called Source/Guidelines/xx/YY-name.xml where xx is the language (currently only en or fr), YY is the two-letter identifier for each chapter (see Chapter codes) and name is the name of the module being defined by that chapter.
The file Source/guidelines-xx.xml (where xx is either en or fr) is the ‘driver file’ for the whole shebang. It contains XInclude elements for each of the chapters making up the Guidelines.
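A driver file of this kind might contain something like the excerpt below. This is only a sketch: the surrounding <body> element and the exact href form are assumptions, with the paths derived from the file naming scheme described above.

```xml
<!-- Hypothetical excerpt of Source/guidelines-en.xml; surrounding structure is assumed. -->
<body xmlns="http://www.tei-c.org/ns/1.0"
      xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- One XInclude per chapter, in reading order -->
  <xi:include href="Guidelines/en/ST-Infrastructure.xml"/>
  <xi:include href="Guidelines/en/HD-Header.xml"/>
  <xi:include href="Guidelines/en/CO-CoreElements.xml"/>
</body>
```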
Suppose, for example, that you want to add a new element called <saintName>. The steps are as follows:
- Write a new file saintName.xml containing an <elementSpec> for your new element and add it to the Specs folder. Look at other specifications to see which ODD elements to use. Note that we do not use <valDesc> at this time, instead using only a <datatype>.
- Edit the source of the relevant chapter (presumably ND-NamesDates.xml in this example) to include documentation of the element. Use a <specDesc> to reference the description from your new spec within the body of the text, like this:
<p>This module also defines the following canonical element:
  <specList>
    <specDesc key="saintName"/>
  </specList>
</p>
and follow up with some discussion of usage.
- Also in the relevant chapter, make sure you include an XInclude instruction to bring the specification file for your element into the Guidelines source. Normally you would add it to an existing <specGrp> element somewhere in the chapter source which already contains similar links:
<specGrp xml:id="DNDPER" n="Personal and organizational names">
  <include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/orgName.xml"/>
  <include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/saintName.xml"/>
  [...]
</specGrp>
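The new specification file written in the first step might look roughly like the following. This is a sketch only: the module assignment, class memberships, content model, wording, and dates are all assumptions made for illustration, and the real choices should be modelled on similar existing specs such as orgName.xml.

```xml
<!-- Hypothetical Source/Specs/saintName.xml; every value below is illustrative. -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             module="namesdates" ident="saintName">
  <gloss versionDate="2014-06-10" xml:lang="en">saint's name</gloss>
  <desc versionDate="2014-06-10" xml:lang="en">contains the name of a saint.</desc>
  <classes>
    <!-- Assumed memberships, chosen by analogy with other naming elements -->
    <memberOf key="att.global"/>
    <memberOf key="model.nameLike.agent"/>
  </classes>
  <content>
    <!-- Assumed content model: a sequence of phrase-level content -->
    <macroRef key="macro.phraseSeq"/>
  </content>
  <exemplum xml:lang="en">
    <egXML xmlns="http://www.tei-c.org/ns/Examples">
      <saintName>Saint Hildegard of Bingen</saintName>
    </egXML>
  </exemplum>
</elementSpec>
```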
Processing Instructions
The following processing instructions may appear in the Guidelines source:
- <?insert tab-content-models?>: inserts a generated table of content models into the ST chapter (not likely to be used elsewhere)
- <?insert totalElements?>: inserts a count of the total number of elements defined in the current version
- <?insert version?>: inserts the current version number of the TEI release
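In chapter prose, these instructions would sit inline wherever the generated value is wanted. The sentence below is invented purely to show the placement:

```xml
<!-- Hypothetical paragraph; only the two processing instructions come from the list above. -->
<p xmlns="http://www.tei-c.org/ns/1.0">This is version <?insert version?> of the Guidelines,
which defines <?insert totalElements?> elements in total.</p>
```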
Style Notes
See the Style Guide for Editing the TEI Guidelines, which attempts to state preferred practice on vexed issues about spelling, punctuation, etc. The goal of these rules is to avoid inconsistency, and also (wherever possible) to avoid producing text which is markedly either British or American English.
Examples
The purpose of an example is to illustrate a specific element or feature. Do not include irrelevant encoding which does not contribute to this primary goal. If such encoding is unavoidable (e.g. to make your example valid), then it must be explained in the supporting text.
Where an example is taken from an external source, cite the source in one of two ways:
- in a <note> element, with a @place whose value is bottom or foot, following the <egXML> (if the citation is only a URL in a <ptr>)
- in a full bibliographic citation (with or without a URL in <ptr>) in the file BIB-Bibliography.xml
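The first of these options could be sketched as follows. The example content and the URL are hypothetical; only the <note> with @place and the <ptr> reflect the convention described above.

```xml
<!-- Hypothetical example with a URL-only source citation in a following note. -->
<div xmlns="http://www.tei-c.org/ns/1.0">
  <egXML xmlns="http://www.tei-c.org/ns/Examples">
    <p>Finished on <date when="1685-03-21">the 21st of March 1685</date>.</p>
  </egXML>
  <!-- The note follows the egXML and carries only the source URL -->
  <note place="bottom"><ptr target="http://www.example.org/hypothetical-source"/></note>
</div>
```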
All examples should be valid against a modified TEI schema in which any element can act as a root element; this validity is checked during the build process.
When you have added or edited examples, always check in the following Jenkins build that they are displaying correctly. In particular, note that if your examples include long lines without linebreaks, the result can be a horizontally-scrolling page and a broken table layout in the reference page. If you see this in the output, you can fix it by adding hard-coded linebreaks at suitable points in the example code.
Good encoding practice
Good encoding practice will ensure not only valid but also highly functional Guidelines.
When referring to figures and to other sections of the Guidelines, use <ptr>, not <ref>, to ensure that the title and number of the referenced item are automatically inserted when the Guidelines are compiled.
The build process validates cross-references. Since the Guidelines is compiled into a single XML document at build time, IDs must be unique across the text and the examples. Consequently, any @xml:id attribute values appearing in your examples must be unique within the text of the whole of the Guidelines. Furthermore, any @target (etc.) values which do not point to anything in the source will be flagged with a warning during the build process.
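A short sketch of both points follows. The target id XXDATES and the example xml:id are invented; in real use the target must exist in the source, and the example id must not clash with any other id anywhere in the Guidelines.

```xml
<!-- Hypothetical fragment; ids are invented for illustration. -->
<div xmlns="http://www.tei-c.org/ns/1.0" xml:id="XXREFS">
  <!-- ptr lets the build insert the number and title of the target section -->
  <p>Dating attributes are discussed further in <ptr target="#XXDATES"/>.</p>
  <!-- An xml:id inside an example shares one id space with the rest of the text -->
  <egXML xmlns="http://www.tei-c.org/ns/Examples">
    <person xml:id="egHypotheticalPerson01">
      <persName>Jane Doe</persName>
    </person>
  </egXML>
</div>
```

Giving example ids a distinctive prefix, as here, is one possible way to keep them from colliding with section ids; the source does not prescribe a particular scheme.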
Making a change to the Guidelines
Most changes to the Guidelines are the result of a bug or feature request ticket on GitHub. Once a ticket is assigned to you, and you're sure there is agreement to proceed with the change, follow the steps below.
- If you don't have a local clone of the git repo, go to the GitHub site and clone the repository following the instructions there. It's usually something like git clone https://github.com/TEIC/TEI.git TEI
- If you have already checked out a copy, make sure to update it (git pull) before you make any changes in it.
- Make sure you are on the dev branch (we do not make direct changes to the master branch): git checkout dev
- Edit the appropriate file(s) to make your changes, or if the change requires it, create a new file. Make sure your source is still valid. The TEI source files should all contain xml-model processing instructions linking them to the latest version of the NVDL file used for validation of the P5 source.
- If you have a locally installed P5 build environment, make sure you can still build, and that the examples are still valid. If you don't, just use git to check your updated version back in and wait for our two Jenkins Continuous Integration servers (http://tei.oucs.ox.ac.uk/jenkins/ and http://teijenkins.hcmc.uvic.ca/) to assess your work.
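The xml-model processing instruction mentioned in the editing step sits at the top of each source file, before the root element. A minimal sketch is shown below; the href is a placeholder, since the actual location of the NVDL file is not given here, and the schematypens value is the standard NVDL namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical href: substitute the real location of the NVDL file -->
<?xml-model href="path/to/p5.nvdl" type="application/xml"
            schematypens="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"?>
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" module="namesdates" ident="saintName">
  <!-- spec content omitted -->
</elementSpec>
```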
Check in your changes:
- If the file is new, add it to the git repository: git add filename.xml
- Commit your changes: git commit -m "Your commit message" filename.xml. Be sure to add a detailed commit message which includes a link to the GitHub issue ticket which prompted the change.
- Push the change to the git repository: git push origin dev
- Make a note of the revision hash when git receives your change.
- Assuming the change was successful (see below), add a comment to the GitHub issue ticket which includes the hash of the revision. You may also close the ticket if it is complete.
Error messages may appear at any stage. Please do not leave the source in an invalid state (it makes life unnecessarily difficult for others). If you cannot immediately fix a validity error, revert your change while you think about it. If you are working on a particularly complex change involving multiple files, it may be better to create a branch in which to work on your changes, and then to merge the branch when you are confident that the work is successfully completed. See any guide to git for instructions on how to approach that. Bear in mind that our Jenkins servers only build the dev branch, so you cannot depend on them to test-build your working branch.
The Jenkins servers monitor the Git repository, and when they detect a change, they check it out and commence building several targets, just as you would build them on your local machine. There are several advantages to letting the Jenkins servers check your build for you:
- You don't have to have all the various packages and other software required for a build installed on your system. This means you can make a quick fix to the Guidelines on any computer you happen to be using, without installing a lot of extra software.
- The required packages on the Jenkins servers tend to be updated regularly, and we're watching them to make sure they work properly.
- Jenkins attempts to let you know by email if there's a problem, and provides useful debugging tools.
If you submit a change, and later get an email from one of the Jenkins servers telling you that the build failed, it will provide a link to the build information on the server. Here's what to do:
- First, check that the build is broken on both Jenkins servers. If it's only broken on one of them, it may have been caused by a lag in updates to packages on that server.
- If both servers have completed a build since your commit, and both are showing an error, then you need to check where the error is occurring. On the page for that build on the Jenkins server site, click on ‘Parsed Console Output’ on the left menu. You'll see links to ‘Errors’ and ‘Warnings’; these will show you the exact point in the build script where the errors or warnings occurred. This may give you a useful clue to the cause of the failure.
- If you still can't figure out the problem, email the Council list with a link to the build information, and someone will be able to help.
- Once you know what the problem is, fix it by editing the source again and committing the change to the git repository. Jenkins will then do its stuff, and you'll know whether your fix worked as expected.
Error messages appearing during the make test phase (the ‘TEIP5-Test’ job on Jenkins) usually indicate that your changes are in conflict with the Birnbaum Doctrine, which decrees that changes in the Guideline schemas should not invalidate existing documents. You may wish to discuss the specific issue with other Council members.
If you use an image in your Guidelines change, you will need to add it to the git repository in the P5/Images directory. If you have asked for and received permission from a rights-holder to use the image, include all of the relevant correspondence in a zip file named the same as the image file (so for an image fred.png, include permission documents in a file called fred.zip).
Adding Schematron constraints to specifications
The TEI ODD system is primarily concerned with generating schemas in the form of RELAX NG or XML Schema. However, there are often circumstances in which you want to apply constraints to elements and attributes which cannot easily be captured by normal XML schemas. For instance, you might want to apply a co-occurrence constraint on some attributes. The @targetLang attribute is a good example. @targetLang is an optional attribute which “specifies the language of the content to be found at the destination referenced by @target, using a ‘language tag’ generated according to BCP 47.” Obviously, there is no point in using @targetLang if you're not also using @target. However, many such co-occurrence constraints are difficult to express in RELAX NG schemas, and may not survive conversion to other schema formats such as XML Schema or DTD.
For this reason, we often use ISO Schematron to express constraints like this. If you look in att.pointing.xml, where the @targetLang attribute is defined, you'll find this constraint, inside the <attDef> for @targetLang:
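A constraint of the kind described would look something like the sketch below. This is an illustrative reconstruction and not a copy of the text in att.pointing.xml: the message wording, the ident value, and the binding of the tei: prefix are assumptions.

```xml
<!-- Hypothetical sketch of a co-occurrence constraint on @targetLang. -->
<attDef xmlns="http://www.tei-c.org/ns/1.0"
        xmlns:sch="http://purl.oclc.org/dsdl/schematron"
        ident="targetLang" usage="opt">
  <!-- desc and datatype omitted -->
  <constraintSpec ident="targetLang" scheme="isoschematron">
    <constraint>
      <!-- Fires for any TEI element that carries @targetLang -->
      <sch:rule context="tei:*[@targetLang]">
        <!-- The assertion fails, and its message is shown, when @target is absent -->
        <sch:assert test="@target">@targetLang should only be used
on <sch:name/> if @target is specified.</sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
</attDef>
```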
This Schematron rule is an assertion that if @targetLang is used, @target should also be present. <constraintSpec> has an attribute @scheme (normally set to isoschematron). Inside <constraintSpec>, <constraint> elements contain <assert> elements, which have @test attributes. The @test attribute value is always an XPath expression; if the XPath tests false, the assertion will be triggered, and its contents will appear on the console when you build or validate. There is also a <report> element which is similar, but which triggers when true rather than when false, so you can check both positive and negative conditions. In Roma, you can also generate a Schematron schema against which to test your document. This schema is essentially a compilation in Schematron of all the TEI constraints. The operation of checking a document with Schematron is independent of any other validation processes that take place using other schemas. For a full introduction to Schematron, see the Schematron website.
<constraintSpec> can appear as a child of <attDef>, <classSpec>, <elementSpec>, <macroSpec>, and <schemaSpec>. We'll go through the process of adding a constraint like the one above. The constraint we're going to add relates to dating elements (<date>, <birth>, etc.) and the @calendar attribute. @calendar ‘indicates the system or calendar to which the date represented by the content of this element belongs.’ In other words, @calendar should only be used if the dating element has textual content. A dating element with both @calendar and textual content makes sense (assuming that @calendar points at a valid <calendar> element); an empty dating element with @calendar does not.
When writing a constraint like this, bear the following points in mind:
- Ensure that the context item for the rule is an element, not an attribute, as in the example above. For technical reasons, the Schematron processing in the build process will generate error messages if the context item is an attribute.
- Break the message up into fairly short lines, so that it's easy to read when it appears in the build log.
- Use the <name/> element in place of the name of the context element. This will be helpful for future processing needs.
- Refer to attributes using the "@" prefix, rather than quotation marks or plain names (as above).
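Putting these points together, a constraint for @calendar might be sketched as follows. The test expression, ident value, and message wording are assumptions for illustration, as is the placement inside the <attDef> for @calendar; the message quotes the attribute description given earlier.

```xml
<!-- Hypothetical sketch of a constraint requiring textual content alongside @calendar. -->
<attDef xmlns="http://www.tei-c.org/ns/1.0"
        xmlns:sch="http://purl.oclc.org/dsdl/schematron"
        ident="calendar" usage="opt">
  <!-- desc and datatype omitted -->
  <constraintSpec ident="calendar" scheme="isoschematron">
    <constraint>
      <!-- Context is an element (any TEI element bearing @calendar), not the attribute -->
      <sch:rule context="tei:*[@calendar]">
        <!-- Short message lines; sch:name stands in for the context element name -->
        <sch:assert test="string-length(normalize-space(.)) > 0">
@calendar indicates the system or calendar to which the date
represented by the content of this element belongs, but this
<sch:name/> element has no textual content.</sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
</attDef>
```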
The Schematron constraint above should cover what we're trying to accomplish. However, it's quite difficult for us to test whether it is in fact doing exactly what it should be, unless we build a new copy of Roma and use it to generate a Schematron schema, then validate a test document against it. This is probably not practical for most of us. Fortunately, the TEI build system provides a way for us to do this; in fact, we can put in place a couple of tests that will always be run whenever P5 is built, checking that our Schematron constraint is intact and functioning as we expect.
The first thing we're going to do is add a couple of tests that should pass. We'll add a dating element which has both @calendar and some textual content, as well as an empty dating element with no textual content. If these tests pass, then we know that our constraint is not doing anything wrong. (We don't yet know whether it's doing anything at all, of course; that comes later.)
If you look at P5/Test, you'll see there is a whole folder full of files whose purpose is to test various aspects of the TEI build process and products. We want to add our tests to one of these files. The question is which one? We'll add it to the basic test file, which is testbasic.xml; this is tested against schemas generated from testbasic.odd, which should contain all the dating features we're interested in testing. If we look at that file, we find there are already several date elements in there, so we can try adding our @calendar attribute to one of those. Let's choose the date of 1685 on a dictionary entry sense:
We also want to add, somewhere, a date element which has no textual content and no @calendar attribute. We might as well do this in the header, by adding a simple <revisionDesc> element, which gives us the added bonus of being able to describe our change:
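The two passing tests might look roughly like this in testbasic.xml. The fragment is a sketch: the #julian pointer, the change date, and the change text are hypothetical, and most of the file is elided, so it is well-formed but not valid as it stands.

```xml
<!-- Hypothetical outline of the two additions to testbasic.xml. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- fileDesc and other header content omitted -->
    <revisionDesc>
      <!-- Empty date with no @calendar: should not trigger the constraint -->
      <change><date when="2013-01-15"/> Added tests for the @calendar constraint.</change>
    </revisionDesc>
  </teiHeader>
  <text>
    <!-- Elsewhere, inside a dictionary entry sense: @calendar with textual content -->
    <date calendar="#julian">1685</date>
  </text>
</TEI>
```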
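So far we have only added tests that should pass. To confirm that the constraint actually fires, we also need a test that should fail. Tests of that kind go in detest.xml, which the build handles as follows:
- Schemas are built from detest.odd (including a Schematron schema).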
- The file detest.xml is validated against those schemas.
- Resulting error messages are collected in a file called detest.log (in the Test directory).
- That file is compared with the detest.log file in the expected-results subdirectory.
- If they are not identical, the test build fails.
Adding a failing test therefore involves these steps:
- Add our new test to detest.xml.
- Commit the change to the git repository.
- Let Jenkins run the build (which should fail).
- Examine the resulting detest.log on Jenkins, and copy it to our local expected-results/detest.log.
- Commit that change to the repository.
- Let Jenkins build again, and make sure that the build completes successfully.
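The deliberately failing test added in the first step can be very small. As a sketch (the #julian pointer is hypothetical), an empty dating element carrying @calendar should be enough to trigger the assertion:

```xml
<!-- Hypothetical addition to detest.xml: @calendar on an element with no textual content -->
<date xmlns="http://www.tei-c.org/ns/1.0" calendar="#julian"/>
```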
In more detail, the expected results are updated like this:
- Download the detest.log file from the TEIP5-Test workspace on the Jenkins server (job/TEIP5-Test/ws/Test/).
- Copy its contents into our local file expected-results/detest.log.
- Commit this change to git (git commit followed by git push).
- Watch Jenkins build P5-Test again, and make sure it completes successfully.
Building the release
Note: the original content of this section has been removed, because a longer document dedicated to documenting the release process has been created. Please refer to TCW22: Building a TEI Release.
Reference section
Chapter codes
Following a lengthy debate in the Council as to whether the two-character codes originally used to identify individual chapters should be dropped in favour of longer, more human-readable names, a compromise solution was reached in which the two-character codes were retained as prefixes to longer human-readable names. The same two-character codes are also used to identify the HTML and PDF files generated during the release process.
| Section | Title | File |
| [i] | Releases of the TEI Guidelines | TitlePageVerso.xml |
| [iii] | Preface and Acknowledgments | FM1-IntroductoryNote.xml |
| [iv] | About These Guidelines | AB-About.xml |
| [v] | A Gentle Introduction to XML | SG-GentleIntroduction.xml |
| [vi] | Languages and Character Sets | CH-LanguagesCharacterSets.xml |
| | The TEI Infrastructure | ST-Infrastructure.xml |
| | The TEI Header | HD-Header.xml |
| | Elements Available in All TEI Documents | CO-CoreElements.xml |
| | Default Text Structure | DS-DefaultTextStructure.xml |
| | Representation of Non-standard Characters and Glyphs | WD-NonStandardCharacters.xml |
| | Transcriptions of Speech | TS-TranscriptionsofSpeech.xml |
| | Representation of Primary Sources | PH-PrimarySources.xml |
| | Names, Dates, People, and Places | ND-NamesDates.xml |
| | Tables, Formulæ, and Graphics | FT-TablesFormulaeGraphics.xml |
| | Linking, Segmentation, and Alignment | SA-LinkingSegmentationAlignment.xml |
| | Simple Analytic Mechanisms | AI-AnalyticMechanisms.xml |
| | Graphs, Networks, and Trees | GD-GraphsNetworksTrees.xml |
| | Certainty, Precision, and Responsibility | CE-CertaintyResponsibility.xml |
| | Using the TEI | USE.xml |
| [A5] | Datatypes and Other Macros | REF-MACROS.xml |
In most chapters, the two-character code is also used as a prefix for the @xml:id values given to each <div> element. Note that every <div> element carries an @xml:id value, whether or not it is actually referenced explicitly elsewhere in the Guidelines.
Note that files with names beginning REF contain only <divGen> elements: their content, which provides the reference documentation (sections A1 to A5 inclusive), is automatically generated during the build process.
TEI naming conventions have evolved over time, but remain fairly consistent.
- generic identifiers: Element and attribute identifiers should be a single natural language word in lowercase if possible. If more than one word is conjoined to form a name, then the first letter of the second and any subsequent word should be uppercased. Hyphens, underscores, dots, etc. are not used within element or attribute names.
- class names: Class names are made up of three parts: a name, constructed like an element name, with a prefix and optionally a suffix. The prefix is one of model. or att. and indicates whether this is a model or an attribute class. The suffix, if present, is used to indicate subclassing: for example, att.linking.foo is the foo subclass of the attribute class att.linking.
- xml:id values: The conventions for these vary somewhat. Most of the older chapters of the Guidelines have consistently constructed identifiers, derived from the individual section headings. Identifiers must be provided for:
  - every <div>, whether or not it is explicitly linked to elsewhere
  - every bibliographic reference in the BIB-Bibliography.xml file
File release structure
Appendix A: Some other (mostly superseded) documents on the topic
- TEI ED W9, Points of Style For Drafts of TEI Guidelines, 2 Mar 1990 (Waterloo Script format)
- TEI ED W11, Notes on House Style, 14 Sep 1992 (Waterloo Script formatted text)
- TEI ED W55, Form for Draft Chapters of the TEI Guidelines, 5 June 1996 (TEI P2, HTML, and ODD formats)
- TEI ED W57, Procedures for Correcting Errors in the TEI Guidelines, 23 July 1994 (TEI P2 and HTML formats)