A TEI-conformant SGML instance is typically characterized by the presence of a mandatory TEI Header which is often referred to as the electronic title page. The TEI Header meets the demands of the scholarly community for an in-file documentation of text-specific metadata. The record of this information is essential for any satisfactory interchange of texts coming from multiple sources, or for which long term uses are envisaged, and can serve multiple goals:
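The TEI Header is introduced by the element <teiHeader> and has four major parts, only the first of which is mandatory: the File Description (<fileDesc>), the Encoding Description (<encodingDesc>), the Profile Description (<profileDesc>), and the Revision Description (<revisionDesc>).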
The full form of a TEI Header is thus:
<teiHeader> <fileDesc> ... </fileDesc> <encodingDesc> ... </encodingDesc> <profileDesc> ... </profileDesc> <revisionDesc> ... </revisionDesc> </teiHeader>
while a minimal header takes the form:
<teiHeader> <fileDesc> ... </fileDesc> </teiHeader>
The strength of this system of one mandatory and three optional elements is that it caters for a wide range of applications in the Humanities. According to the needs and the specific features of the project for which texts are encoded, the encoder can more or less freely decide which set of elements (s)he will use for the in-file documentation of the encoded text. But when this freedom of user-defined metadata is applied to an enterprise such as ours (i.e. agreeing on a communal set of minimal tagging in order for our texts to be maximally interchangeable and (re)usable), it may turn out to be a poisoned gift. Therefore, we should agree on a solid header structure which is a trade-off between completeness and density.
In trying to agree on a minimal header structure it is important to emphasize that this is not a limiting proposal. Putting the proposal into practice only requires that the complete structure of the proposal be present in a proposal-conformant electronic text, if only as a basis for more. It would, for instance, be absurd to read this proposal as a prohibition on using <editionStmt> with its typical elements inside <fileDesc> when creating a new edition of an electronic text, e.g. by adding referential links to external datasets.
The following is a list of the possible contents of such a minimal header; it is a working document under discussion (originally devised as the BEB-TEC2 draft and adopted as the CTB-TEC1 draft). The elements are presented in the order in which they appear in chapter 5 of TEI P3, "The TEI Header", and chapter 20 of TEI U5, "The Electronic Title Page". The proposed structure is further explained in 3. Documentation.
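Conformance to such a minimal structure can be checked mechanically. The following is only a sketch under stated assumptions: the header is available as well-formed XML (an SGML instance that relies on tag minimization would need normalizing first), and the list of required paths is read off the documentation sections of this proposal rather than taken from an official schema. The names `REQUIRED_PATHS` and `missing_parts` are illustrative.

```python
import xml.etree.ElementTree as ET

# Paths, relative to <teiHeader>, that the proposed structure expects.
# Assumption: read off the documentation sections of this proposal;
# this is not an official TEI schema.
REQUIRED_PATHS = [
    "fileDesc/titleStmt",
    "fileDesc/publicationStmt",
    "fileDesc/sourceDesc",
    "encodingDesc/projectDesc",
    "encodingDesc/editorialDecl",
    "encodingDesc/tagsDecl",
    "revisionDesc/change",
]


def missing_parts(header_xml):
    """Return the required paths that are absent from the given header."""
    header = ET.fromstring(header_xml)
    if header.tag != "teiHeader":
        raise ValueError("root element is not <teiHeader>")
    # find() returns None when no element matches the path.
    return [path for path in REQUIRED_PATHS if header.find(path) is None]
```

Calling `missing_parts` on a header that is minimal in the TEI sense (a <fileDesc> only) returns every path the proposal adds on top of it, which makes the gap between "TEI-minimal" and "proposal-minimal" explicit.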
In this chapter, all the elements of the above structure are documented with their respective and proposed attributes. Extensive use has been made of the aforementioned documents TEI P3 and TEI U5.
All elements in the proposed header structure (as all elements in the TEI Lite DTD) have the following global attributes:
The use of these global attributes is recommended when applicable but the attributes as such do not belong to the proposed header structure.
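Each header starts with the <teiHeader> tag, which carries two attributes: creator, which names the person who created the header, and date.created, which records the date on which the header was created.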
The header ends with the </teiHeader> tag.
<teiHeader creator="Edward Vanhoutte" date.created="2000-01-10"> <!-- header --> </teiHeader>
The File Description is a mandatory element in TEI (Lite) and thus in the proposed header structure. It provides full bibliographic information on the electronic file. The File Description element has been closely modelled on existing standards in library cataloguing and should provide enough information to enable referencing and cataloguing.
The File Description starts with the <fileDesc> tag and in the proposed structure it consists of three parts: the title statement, the publication statement, and the source description. The File Description ends with the </fileDesc> tag.
<fileDesc> <titleStmt> ... </titleStmt> <publicationStmt> ... </publicationStmt> <sourceDesc> ... </sourceDesc> </fileDesc>
The Title Statement <titleStmt> contains the title given to the electronic work together with information on the parties responsible for the contents of the electronic text:
<titleStmt> <title>It's all in the Head(er): From minimal to optimal use of the TEI Header.</title> <author>Edward Vanhoutte</author> <principal>Edward Vanhoutte</principal> <funder> <name>Centrum voor Teksteditie en Bronnenstudie - CTB</name> <address> <addrline>Koningstraat 18</addrline> <addrline>B-9000 Gent</addrline> <addrline>(België)</addrline> <addrline>tel: +32 (0)9 265 93 50 x334</addrline> <addrline>fax: +32 (0)3 265 93 49</addrline> <addrline>email: firstname.lastname@example.org</addrline> </address> </funder> </titleStmt>
<titleStmt> <title>De gedichten I: a machine readable transcription</title> <author>Herman de Coninck</author> <respStmt> <resp>compiled by</resp> <name>Hugo Brems</name> </respStmt> <principal>Edward Vanhoutte</principal> </titleStmt>
The Publication Statement <publicationStmt> groups information concerning the publication or distribution of an electronic text. At least one of the first three elements (<publisher>, <distributor>, <authority>) must be present, followed by (one or more of) the other elements as given in the proposed structure:
<publicationStmt> <publisher>KANTL.</publisher> <pubPlace>Gent</pubPlace> <date>2000</date> <distributor>Amsterdam University Press. Amsterdam, 2000</distributor> <idno type="ISBN">90-5356-441-1</idno> <availability status="RESTRICTED"> <p>© Copyright 2000, Edward Vanhoutte</p> <p>Niets uit deze uitgave mag door middel van elektronische of andere middelen, met inbegrip van automatische informatiesystemen, worden gereproduceerd en/of openbaar gemaakt zonder schriftelijke toestemming van de uitgever, uitgezonderd korte fragmenten, die uitsluitend voor recensies en onderwijs mogen worden geciteerd.</p> </availability> </publicationStmt>
<publicationStmt> <authority>CTB</authority> <pubPlace>Gent</pubPlace> <date>2001</date> <idno type="internal">CTB-TEC1</idno> <availability status="RESTRICTED"> <p>revised draft version for discussion purposes only</p> </availability> </publicationStmt>
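The rule that at least one of <publisher>, <distributor>, or <authority> must be present lends itself to a simple automated check. The sketch below assumes the publication statement is available as well-formed XML and tests only for presence, not for the order of the elements; `AGENCY_TAGS` and `has_agency` are illustrative names.

```python
import xml.etree.ElementTree as ET

# The three elements of which at least one must be present.
AGENCY_TAGS = {"publisher", "distributor", "authority"}


def has_agency(publication_stmt_xml):
    """Return True if <publicationStmt> has at least one agency child."""
    stmt = ET.fromstring(publication_stmt_xml)
    if stmt.tag != "publicationStmt":
        raise ValueError("root element is not <publicationStmt>")
    # Only direct children count; nested elements are ignored.
    return any(child.tag in AGENCY_TAGS for child in stmt)
```

A statement that gives only a place and a date, with no publisher, distributor, or authority, is rejected by this check.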
The Source Description <sourceDesc> records bibliographic details of the source(s) from which computer files are derived or generated. This may be a printed text or a manuscript, another computer file, an audio or video recording of some kind or a combination of these. The bibliographic details are documented inside a <bibl> element:
An electronic file may also have no source, when it is created as an original electronic text. This is signalled inside a <p> element using the formula "No source: created in machine readable form."
If a machine-readable text is based not on a printed source but on another machine-readable text which includes a TEI Header, the header information of the latter will have to be incorporated in the header information of the former. TEI P3 chapter 5.2.8 "Computer Files Derived from Other Computer Files" explains how to do that.
<sourceDesc> <bibl>Stijn Streuvels, De teleurgang van den Waterhoek, Brugge, Excelsior, s.d. (1927). &amp; Amsterdam, L.J. Veen, s.d. (1927). Eerste druk.</bibl> </sourceDesc>
<sourceDesc> <p>No source: created in machine readable form.</p> </sourceDesc>
The Encoding Description is the second mandatory major division in the proposed header structure. Though not formally required by the TEI Guidelines, its use is made mandatory in this proposal, because it specifies the methods and editorial principles which governed the transcription or encoding of the text in hand, and thus it provides an answer to the question about the intellectual integrity of the encoded text.
The Encoding Description starts with the <encodingDesc> tag and in the proposed structure it consists of three parts: the project description, the editorial practices declaration, and the tagging declaration. The Encoding Description ends with the </encodingDesc> tag.
<encodingDesc> <projectDesc> ... </projectDesc> <editorialDecl> ... </editorialDecl> <tagsDecl> ... </tagsDecl> </encodingDesc>
The Project Description <projectDesc> gives, inside a <p> element, a detailed prose description of the aim or purpose for which an electronic file was encoded:
<projectDesc> <p>This SGML instance was created for the Electronic Streuvels Project (ESP): Stijn Streuvels, De teleurgang van den Waterhoek. Elektronisch-kritische editie door Marcel De Smedt en Edward Vanhoutte. Amsterdam: AUP/KANTL, 2000. ISBN: 90-5356-441-1.</p> </projectDesc>
The Editorial Practices Declaration <editorialDecl> documents the editorial principles applied in transcribing or encoding the text, such as correction, normalization, quotation, hyphenation, and interpretation:
<editorialDecl> <p>All editorial principles are explained in the chapter <ref target="constitutie">Tekstconstitutie</ref>. <list> <item>correction: the text is thoroughly collated against the original and proofread several times. Mistakes in the source text are corrected inside a &lt;CORR&gt; tag and the original reading is documented inside a SIC attribute. The editor responsible for the correction is named in a RESP attribute with "MD" for Marcel De Smedt and "EV" for Edward Vanhoutte.</item> <item>normalization: no normalizations are carried out.</item> <item>quotation: all quotation marks are transcribed and standardized as data in the text. The source text uses low opening marks and high closing marks.</item> <item>hyphenation: end-of-line hyphenation has been removed and is documented in the chapter <ref target="constitutie">end-of-line hyphenation</ref>. All other hyphenation has been retained.</item> <item>interpretation: no analytical or interpretive encoding added.</item> </list> </p> </editorialDecl>
The Tagging Declaration <tagsDecl> is used to record information about the tagging used within a particular text, such as how often each element occurs and how it is used.
The <tagsDecl> element consists of a sequence of <tagUsage> elements, one for each distinct element occurring within the outermost <text> element of a TEI document:
<tagsDecl> <tagUsage gi="add" occurs="1">Addition</tagUsage> <tagUsage gi="body" occurs="1">Body of a text, excluding front or back matter</tagUsage> <tagUsage gi="corr" occurs="1">Corrected form</tagUsage> <tagUsage gi="del" occurs="1">Deletion</tagUsage> <tagUsage gi="div" occurs="75">Subdivision</tagUsage> <tagUsage gi="eg" occurs="73">Example</tagUsage> <tagUsage gi="emph" occurs="73">Used to mark Examples</tagUsage> <tagUsage gi="figure" occurs="29">Figure of any kind</tagUsage> <tagUsage gi="gi" occurs="146">Generic identifier</tagUsage> <tagUsage gi="head" occurs="65">Heading of subdivision or list</tagUsage> <tagUsage gi="hi" occurs="342">Used only to mark text italicized in the source text</tagUsage> <tagUsage gi="item" occurs="349">Component of a list</tagUsage> <tagUsage gi="lb" occurs="2">Linebreak</tagUsage> <tagUsage gi="list" occurs="70">Sequence of items</tagUsage> <tagUsage gi="note" occurs="26">Annotation</tagUsage> <tagUsage gi="p" occurs="199">Paragraph</tagUsage> <tagUsage gi="ref" occurs="96">Reference to another location</tagUsage> <tagUsage gi="seg" occurs="1">Nestable segment of any kind</tagUsage> <tagUsage gi="text" occurs="1">Individual text, unitary or composite</tagUsage> <tagUsage gi="xref" occurs="1">External reference</tagUsage> </tagsDecl>
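Counting elements by hand is error-prone, so a tagging declaration of this kind is a natural candidate for generation by script. The sketch below assumes a well-formed XML serialization of the document and counts every element from the outermost <text> downwards, <text> itself included, as in the example above. The function name `build_tags_decl` is illustrative, and the prose description of each element is left for the encoder to fill in.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def build_tags_decl(document_xml):
    """Build a <tagsDecl> element from the outermost <text> of a document."""
    root = ET.fromstring(document_xml)
    # find() returns the first match in document order, i.e. the outermost <text>.
    text = root if root.tag == "text" else root.find(".//text")
    if text is None:
        raise ValueError("no <text> element found")

    # iter() walks <text> and all of its descendants; header elements are skipped.
    counts = Counter(elem.tag for elem in text.iter())

    tags_decl = ET.Element("tagsDecl")
    for gi in sorted(counts):
        # One <tagUsage> per distinct element; the description is added by hand.
        ET.SubElement(tags_decl, "tagUsage", gi=gi, occurs=str(counts[gi]))
    return tags_decl
```

The result can be serialized with `ET.tostring(tags_decl, encoding="unicode")` and pasted into the header, after which the descriptions are written manually.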
The Revision Description is the third and last mandatory major division in the proposed header structure. Though not formally required by the TEI Guidelines, its use is made mandatory in this proposal because it summarizes the revision history for a file. No change should be made in any TEI-conformant file without corresponding entries being added in a change log. The Revision Description provides the means to do so. It provides essential information for the administration of large numbers of files which are being updated, corrected, or otherwise modified. It proves to be extremely useful for files which are being exchanged between researchers or systems. Without change logs, as provided by this element, it is easy to confuse different versions of a file, or to remain unaware of small but important changes made in the file by some earlier link in the chain of distribution.
The Revision Description starts with the <revisionDesc> tag and ends with the </revisionDesc> tag. It contains one or more <change> elements:
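Each <change> element has the following child elements: <date>, giving the date of the change; <respStmt>, naming the person responsible and the nature of the responsibility; and <item>, describing the change itself.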
<revisionDesc> <change> <date>2000-01-11</date> <respStmt> <resp>markup</resp> <name>Edward Vanhoutte (EV)</name> </respStmt> <item>added chapters 3.1-3.4</item> </change> <change> <date>2000-01-10</date> <respStmt> <resp>markup</resp> <name>Edward Vanhoutte (EV)</name> </respStmt> <item>creation file</item> </change> </revisionDesc>
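Since no change should be made without a corresponding log entry, it can help to script the bookkeeping. The sketch below assumes the <revisionDesc> has already been parsed into an ElementTree element and places the newest entry first, matching the order of the example above; the function name `log_change` and its parameters are illustrative.

```python
import xml.etree.ElementTree as ET


def log_change(revision_desc, date, resp, name, item):
    """Insert a new <change> as the first child of <revisionDesc>."""
    change = ET.Element("change")
    ET.SubElement(change, "date").text = date
    resp_stmt = ET.SubElement(change, "respStmt")
    ET.SubElement(resp_stmt, "resp").text = resp
    ET.SubElement(resp_stmt, "name").text = name
    ET.SubElement(change, "item").text = item
    # Newest entry first, so the latest change is the first thing a reader sees.
    revision_desc.insert(0, change)
    return change
```

A call such as `log_change(rd, "2000-01-11", "markup", "Edward Vanhoutte (EV)", "added chapters 3.1-3.4")` would add the first entry of the example to a log that so far contains only the file-creation entry.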