2 The TEI Header
This chapter addresses the problems of describing an encoded work so that the text itself, its source, its encoding, and its revisions are all thoroughly documented. Such documentation is equally necessary for scholars using the texts, for software processing them, and for cataloguers in libraries and archives. Together these descriptions and declarations provide an electronic analogue to the title page attached to a printed work. They also constitute an equivalent for the content of the code books or introductory manuals customarily accompanying electronic data sets.
These descriptions and declarations together make up the TEI header, tagged teiHeader, which has four major parts:
- a file description, tagged fileDesc, containing a full bibliographical description of the computer file itself, from which a user of the text could derive a proper bibliographic citation, or which a librarian or archivist could use in creating a catalogue entry recording its presence within a library or archive. The term computer file here is to be understood as referring to the whole entity or document described by the header, even when this is stored in several distinct operating system files. The file description also includes information about the source or sources from which the electronic document was derived. The TEI elements used to encode the file description are described in section 2.2 The File Description below.
- an encoding description, tagged encodingDesc, which describes the relationship between an electronic text and its source or sources. It allows for detailed description of whether (or how) the text was normalized during transcription, how the encoder resolved ambiguities in the source, what levels of encoding or analysis were applied, and similar matters. The TEI elements used to encode the encoding description are described in section 2.3 The Encoding Description below.
- a text profile, tagged profileDesc, containing classificatory and contextual information about the text, such as its subject matter, the situation in which it was produced, the individuals described by or participating in producing it, and so forth. Such a text profile is of particular use in highly structured composite texts such as corpora or language collections, where it is often highly desirable to enforce a controlled descriptive vocabulary or to perform retrievals from a body of text in terms of text type or origin. The text profile may however be of use in any form of automatic text processing. The TEI elements used to encode the profile description are described in section 2.4 The Profile Description below.
- a revision history, tagged revisionDesc, which allows the encoder to provide a history of changes made during the development of the electronic text. The revision history is important for version control and for resolving questions about the history of a file. The TEI elements used to encode the revision description are described in section 2.5 The Revision Description below.
A TEI header can be a very large and complex object, or it may be a very simple one. Some application areas (for example, the construction of language corpora and the transcription of spoken texts) may require more specialized and detailed information than others. The present proposals therefore define both a core set of elements (all of which may be used without formality in any TEI header) and some additional elements which become available within the header as the result of including additional specialized modules within the schema. When the module for language corpora (described in chapter 15 Language Corpora) is in use, for example, several additional elements are available, as further detailed in that chapter.
The next section of the present chapter briefly introduces the overall structure of the header and the kinds of data it may contain. This is followed by a detailed description of all the constituent elements which may be used in the core header. Section 2.6 Minimal and Recommended Headers , at the end of the present chapter, discusses the recommended content of a minimal TEI header and its relation to standard library cataloguing practices.
2.1 Organization of the TEI Header
2.1.1 The TEI Header and its Components
The teiHeader element should be clearly distinguished from the front matter of the text itself (for which see section 4.5 Front Matter). A composite text, such as a corpus or collection, may contain several headers, as further discussed below. In the usual case, however, a TEI-conformant text will contain a single teiHeader element, followed by a single text element.
- teiHeader (TEI Header) supplies the descriptive and declarative information making up an electronic title page prefixed to every TEI-conformant text.
  - type (attribute) specifies the kind of document to which the header is attached, for example whether it is a corpus or individual text.
- fileDesc (file description) contains a full bibliographic description of an electronic file.
- encodingDesc (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived.
- profileDesc (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting.
- revisionDesc (revision description) summarizes the revision history for a file.
A full TEI header has a structure like the following:
<teiHeader>
  <fileDesc>
    <!-- ... -->
  </fileDesc>
  <encodingDesc>
    <!-- ... -->
  </encodingDesc>
  <profileDesc>
    <!-- ... -->
  </profileDesc>
  <revisionDesc>
    <!-- ... -->
  </revisionDesc>
</teiHeader>
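Of these components, only fileDesc is required; a minimal header therefore looks like this:
<teiHeader>
  <fileDesc>
    <!-- ... -->
  </fileDesc>
</teiHeader>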
In a corpus or collection, a corpus-level header may be combined with a separate header for each component text:
<teiCorpus>
  <teiHeader type="corpus">
    <!-- corpus-level metadata here -->
  </teiHeader>
  <TEI>
    <teiHeader type="text">
      <!-- metadata specific to this text here -->
    </teiHeader>
    <text>
      <!-- ... -->
    </text>
  </TEI>
  <TEI>
    <teiHeader type="text">
      <!-- metadata specific to this text here -->
    </teiHeader>
    <text>
      <!-- ... -->
    </text>
  </TEI>
</teiCorpus>
2.1.2 Types of Content in the TEI Header
- free prose
- Most elements contain simple running prose at some level. Many elements may contain either prose (possibly organized into paragraphs) or more specific elements, which themselves contain prose. In this chapter's descriptions of element content, the phrase prose description should be understood to imply a series of paragraphs, each marked as a p element. The word phrase, by contrast, should be understood to imply character data, interspersed as need be with phrase-level elements, but not organized into paragraphs. For more information on paragraphs, highlighted phrases, lists, etc., see section 3.1 Paragraphs.
- grouping elements
- Elements whose names end with the suffix Stmt (e.g. editionStmt, titleStmt) usually enclose a group of specialized elements recording some structured information. In the case of the bibliographic elements, the suffix Stmt is used in names of elements corresponding to the ‘areas’ of the International Standard Bibliographic Description. In most cases grouping elements may contain prose descriptions as an alternative to the set of specialized elements, thus allowing the encoder to choose whether the information concerned should be presented in a structured form or in prose.
- declarations
- Elements whose names end with the suffix Decl (e.g. tagsDecl, refsDecl) enclose information about specific encoding practices applied in the electronic text; often these practices are described in coded form. Typically, such information takes the form of a series of declarations, identifying a code with some more complex structure or description. A declaration which applies to more than one text or division of a text need not be repeated in the header of each such text or subdivision. Instead, the decls attribute of each text (or subdivision of the text) to which the declaration applies may be used to supply a cross-reference to it, as further described in section 15.3 Associating Contextual Information with a Text.
- descriptions
- Elements whose names end with the suffix Desc (e.g. settingDesc, projectDesc) contain a prose description, possibly, but not necessarily, organized under some specific headings by suggested sub-elements.
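The choice between structured and prose content in a grouping element can be illustrated with a small sketch. The two fragments below record the same edition information, first with specialized sub-elements and then as a plain paragraph; the edition number and date are hypothetical and serve only as illustration.

```xml
<!-- Structured form: specialized sub-elements carry the information -->
<editionStmt>
  <edition n="2">Second edition, <date when="2001">2001</date></edition>
</editionStmt>

<!-- Prose form: the same information given as a paragraph -->
<editionStmt>
  <p>Second edition, released in 2001.</p>
</editionStmt>
```

The structured form is easier to process automatically, while the prose form may be more convenient when the information does not fit the specialized elements neatly.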
2.1.3 Model Classes in the TEI Header
The TEI Header provides a very rich collection of metadata categories, but makes no claim to be exhaustive. It is certainly the case that individual projects may wish to record specialised metadata which either does not fit within one of the predefined categories identified by the TEI Header or requires a more specialized element structure than is proposed here. To overcome this problem, the encoder may elect to define additional elements using the customization methods discussed in 23.2 Personalization and Customization. The TEI class system makes such customizations simpler to effect and easier to use in interchange.
- model.applicationLike groups elements used to record application-specific information about a document in its header.
- model.catDescPart groups component elements of the TEI Header Category Description.
- model.editorialDeclPart groups elements which may be used inside editorialDecl and appear multiple times.
- model.encodingPart groups elements which may be used inside encodingDesc and appear multiple times.
- model.profileDescPart groups elements which may be used inside profileDesc and appear multiple times.
- model.headerPart groups high level elements which may appear more than once in a TEI Header.
- model.sourceDescPart groups elements which may be used inside sourceDesc and appear multiple times.
- model.textDescPart groups elements used to categorise a text for example in terms of its situational parameters.
2.2 The File Description
This section describes the fileDesc element, which is the first component of the teiHeader element.
The bibliographic description of a machine-readable or digital text resembles in structure that of a book, an article, or any other kind of textual object. The file description element of the TEI header has therefore been closely modelled on existing standards in library cataloguing; it should thus provide enough information to allow users to give standard bibliographic references to the electronic text, and to allow cataloguers to catalogue it. Bibliographic citations occurring elsewhere in the header, and also in the text itself, are derived from the same model (on bibliographic citations in general, see further section 3.11 Bibliographic Citations and References). See further section 2.7 Note for Library Cataloguers.
- fileDesc (file description) contains a full bibliographic description of an electronic file.
- titleStmt (title statement) groups information about the title of a work and those responsible for its intellectual content.
- editionStmt (edition statement) groups information relating to one edition of a text.
- extent describes the approximate size of a text as stored on some carrier medium, whether digital or non-digital, specified in any convenient units.
- publicationStmt (publication statement) groups information concerning the publication or distribution of an electronic or other text.
- seriesStmt (series statement) groups information about the series, if any, to which a publication belongs.
- notesStmt (notes statement) collects together any notes providing information about a text additional to that recorded in other parts of the bibliographic description.
- sourceDesc (source description) supplies a description of the source text(s) from which an electronic text was derived or generated.
A complete file description will resemble the following:
<teiHeader>
  <fileDesc>
    <titleStmt>
      <!-- ... -->
    </titleStmt>
    <editionStmt>
      <!-- ... -->
    </editionStmt>
    <extent>
      <!-- ... -->
    </extent>
    <publicationStmt>
      <!-- ... -->
    </publicationStmt>
    <seriesStmt>
      <!-- ... -->
    </seriesStmt>
    <notesStmt>
      <!-- ... -->
    </notesStmt>
    <sourceDesc>
      <!-- ... -->
    </sourceDesc>
  </fileDesc>
</teiHeader>
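Of these elements, only titleStmt, publicationStmt, and sourceDesc are required; a minimal file description therefore has the following structure:
<teiHeader>
  <fileDesc>
    <titleStmt>
      <!-- ... -->
    </titleStmt>
    <publicationStmt>
      <!-- ... -->
    </publicationStmt>
    <sourceDesc>
      <!-- ... -->
    </sourceDesc>
  </fileDesc>
  <!-- other optional parts of the header here -->
</teiHeader>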
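As an illustration of how the required parts fit together, the following sketch shows a complete, very small TEI document whose header contains only the three mandatory components of the file description, each given in prose or minimal form. The title and the wording of the publication statement are hypothetical; the namespace is the standard TEI namespace.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <!-- hypothetical title, following the advice to mark the work as electronic -->
        <title>A sample text: an electronic transcription</title>
      </titleStmt>
      <publicationStmt>
        <!-- prose alternative to publisher, distributor, or authority -->
        <p>Unpublished working file (hypothetical example).</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Text of the document.</p>
    </body>
  </text>
</TEI>
```

A header this small is enough to satisfy the structural requirements, but it gives a cataloguer very little to work with; the sections that follow describe the more specific elements that a fuller header would use.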
2.2.1 The Title Statement
- titleStmt (title statement) groups information about the title of a work and those responsible for its intellectual content.
- title contains a title for any kind of work.
- author in a bibliographic reference, contains the name of the author(s), personal or corporate, of a work; the primary statement of responsibility for any bibliographic item.
- editor secondary statement of responsibility for a bibliographic item, for example the name of an individual, institution or organization, (or of several such) acting as editor, compiler, translator, etc.
- sponsor specifies the name of a sponsoring organization or institution.
- funder (funding body) specifies the name of an individual, institution, or organization responsible for the funding of a project or text.
- principal (principal researcher) supplies the name of the principal researcher responsible for the creation of an electronic text.
- respStmt (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.
- resp (responsibility) contains a phrase describing the nature of a person's intellectual responsibility.
- name (name, proper noun) contains a proper noun or noun phrase.
The title element contains the chief name of the electronic work, including any alternative title or subtitles it may have. It may be repeated, if the work has more than one title (perhaps in different languages) and takes whatever form is considered appropriate by its creator. Where the electronic work is derived from an existing source text, it is strongly recommended that the title for the former should be derived from the latter, but clearly distinguishable from it, for example by the addition of a phrase such as ‘: an electronic transcription’ or ‘a digital edition’. This will distinguish the electronic work from the source text in citations and in catalogues which contain descriptions of both types of material.
The electronic work will also have an external name (its ‘filename’ or ‘data set name’) or reference number on the computer system where it resides at any time. This name is likely to change frequently, as new copies of the file are made on the computer system. Its form is entirely dependent on the particular computer system in use and thus cannot always easily be transferred from one system to another. Moreover, a given work may be composed of many files. For these reasons, these Guidelines strongly recommend that such names should not be used as the title for any electronic work.
Helpful guidance on the formulation of useful descriptive titles in difficult cases may be found in the Anglo-American Cataloguing Rules (Gorman and Winkler, 1978, chapter 25) or in equivalent national-level bibliographical documentation.
The elements author, editor, sponsor, funder, and principal are specializations of the more general respStmt element. These elements are used to provide the statements of responsibility which identify the person(s) responsible for the intellectual or artistic content of an item and any corporate bodies from which it emanates.
Any number of such statements may occur within the title statement. At a minimum, identify the author of the text and (where appropriate) the creator of the file. If the bibliographic description is for a corpus, identify the creator of the corpus. Optionally include also names of others involved in the transcription or elaboration of the text, sponsors, and funding agencies. The name of the person responsible for physical data input need not normally be recorded, unless that person is also intellectually responsible for some aspect of the creation of the file.
Where the person whose responsibility is to be documented is not an author, sponsor, funding body, or principal researcher, the respStmt element should be used. This has two subcomponents: a name element identifying a responsible individual or organization, and a resp element indicating the nature of the responsibility. No specific recommendations are made at this time as to appropriate content for the resp: it should make clear the nature of the responsibility concerned, as in the examples below.
Names given may be personal names or corporate names. Give all names in the form in which the persons or bodies wish to be publicly cited. This would usually be the fullest form of the name, including first names.
<titleStmt>
  <title>Capgrave's Life of St. John Norbert: a
    machine-readable transcription</title>
  <respStmt>
    <resp>compiled by</resp>
    <name>P.J. Lucas</name>
  </respStmt>
</titleStmt>
<titleStmt>
  <title>Two stories by Edgar Allan Poe: electronic version</title>
  <author>Poe, Edgar Allan (1809-1849)</author>
  <respStmt>
    <resp>compiled by</resp>
    <name>James D. Benson</name>
  </respStmt>
</titleStmt>
<titleStmt>
  <title>Yogadarśanam (arthāt
    yogasūtrapūṭhaḥ):
    a digital edition.</title>
  <title>The Yogasūtras of Patañjali:
    a digital edition.</title>
  <funder>Wellcome Institute for the History of Medicine</funder>
  <principal>Dominik Wujastyk</principal>
  <respStmt>
    <name>Wieslaw Mical</name>
    <resp>data entry and proof correction</resp>
  </respStmt>
  <respStmt>
    <name>Jan Hajic</name>
    <resp>conversion to TEI-conformant markup</resp>
  </respStmt>
</titleStmt>
2.2.2 The Edition Statement
- editionStmt (edition statement) groups information relating to one edition of a text.
- edition (edition) describes the particularities of one edition of a text.
- respStmt (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.
- name (name, proper noun) contains a proper noun or noun phrase.
- resp (responsibility) contains a phrase describing the nature of a person's intellectual responsibility.
For printed texts, the word edition applies to the set of all the identical copies of an item produced from one master copy and issued by a particular publishing agency or a group of such agencies. A change in the identity of the distributing body or bodies does not normally constitute a change of edition, while a change in the master copy does.
For electronic texts, the notion of a ‘master copy’ is not entirely appropriate, since they are far more easily copied and modified than printed ones; nonetheless the term edition may be used for a particular state of a machine-readable text at which substantive changes are made and fixed. Synonymous terms used in these Guidelines are version, level, and release. The words revision and update, by contrast, are used for minor changes to a file which do not amount to a new edition.
No simple rule can specify how ‘substantive’ changes have to be before they are regarded as producing a new edition, rather than a simple update. The general principle proposed here is that the production of a new edition entails a significant change in the intellectual content of the file, rather than its encoding or appearance. The addition of analytic coding to a text would thus constitute a new edition, while automatic conversion from one coded representation to another would not. Changes relating to the character code or physical storage details, corrections of misspellings, simple changes in the arrangement of the contents and changes in the output format do not normally constitute a new edition, whereas the addition of new information (e.g. a linguistic analysis expressed in part-of-speech tagging, sound or graphics, referential links to external data sets) almost always does.
Clearly, there will always be borderline cases and the matter is somewhat arbitrary. The simplest rule is: if you think that your file is a new edition, then call it such. An edition statement is optional for the first release of a computer file; it is mandatory for each later release, though this requirement cannot be enforced by the parser.
Note that all changes in a file, whether or not they are regarded as constituting a new edition or simply a new revision, should be independently noted in the revision description section of the file header (see section 2.5 The Revision Description).
The edition element should contain phrases describing the edition or version, including the word edition, version, or equivalent, together with a number or date, or terms indicating difference from other editions such as new edition, revised edition etc. Any dates that occur within the edition statement should be marked with the date element. The n attribute of the edition element may be used as elsewhere to supply any formal identification (such as a version number) for the edition.
One or more respStmt elements may also be used to supply statements of responsibility for the edition in question. These may refer to individuals or corporate bodies and can indicate functions such as that of a reviser, or can name the person or body responsible for the provision of supplementary matter, of appendices, etc., in a new edition. For further detail on the respStmt element, see section 3.11 Bibliographic Citations and References.
<editionStmt>
  <edition n="P2">Second draft, substantially
    extended, revised, and corrected.</edition>
</editionStmt>
<editionStmt>
  <edition>Student's edition, <date>June 1987</date>
  </edition>
  <respStmt>
    <resp>New annotations by</resp>
    <name>George Brown</name>
  </respStmt>
</editionStmt>
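By contrast, a minor change that does not amount to a new edition, such as the correction of misspellings, would be recorded only in the revision description. The following is a minimal sketch of such an entry; the date and the wording of the change are hypothetical, and the full set of elements is described in section 2.5 The Revision Description.

```xml
<revisionDesc>
  <!-- hypothetical entry: a correction that does not create a new edition -->
  <change when="1990-06-12">Corrected misspellings throughout; no change
    to the intellectual content of the file.</change>
</revisionDesc>
```

A change that does create a new edition would be noted both here and in a new or updated edition statement.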
2.2.3 Type and Extent of File
- extent describes the approximate size of a text as stored on some carrier medium, whether digital or non-digital, specified in any convenient units.
For printed books, information about the carrier, such as the kind of medium used and its size, is of great importance in cataloguing procedures. The print-oriented rules for bibliographic description of an item's medium and extent need some re-interpretation when applied to electronic media. An electronic file exists as a distinct entity quite independently of its carrier and remains the same intellectual object whether it is stored on a magnetic tape, a CD-ROM, a set of floppy disks, or as a file on a mainframe computer. Since, moreover, these Guidelines are specifically aimed at facilitating transparent document storage and interchange, any purely machine-dependent information should be irrelevant as far as the file header is concerned.
This is particularly true of information about file type, although library-oriented rules for cataloguing often distinguish two types of computer file: ‘data’ and ‘programs’. This distinction is quite difficult to draw in some cases, for example, hypermedia or texts with built-in search and retrieval software.
The extent element may be used to give the size or approximate size of the file in one of the following ways:
- in bytes of a specified length (e.g. ‘4000 16-bit bytes’)
- as falling within a range of categories, for example:
  - less than 1 Mb
  - between 1 Mb and 5 Mb
  - between 6 Mb and 10 Mb
  - over 10 Mb
- in terms of any convenient logical units (for example, words or sentences, citations, paragraphs)
- in terms of any convenient physical units (for example, blocks, disks, tapes)
The use of standard abbreviations for units of quantity is recommended where applicable, here as elsewhere (see http://physics.nist.gov/cuu/Units/binary.html).
<extent>4.2 MiB</extent>
<extent>4532 bytes</extent>
<extent>3200 sentences</extent>
<extent>5 90 mm High Density Diskettes</extent>
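Where the size needs to be available to software as well as to human readers, one possible approach is to wrap the phrase in the general-purpose measure element, which can carry a normalized unit and quantity as attribute values. The sketch below assumes that measure is available inside extent in the schema being used; the figures are illustrative only.

```xml
<extent>
  <!-- unit and quantity give a normalized form of the human-readable phrase -->
  <measure unit="MiB" quantity="4.2">About four megabytes</measure>
</extent>
```

The element content remains free text, so the phrase shown to readers need not match the normalized values exactly.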
2.2.4 Publication, Distribution, etc.
- publicationStmt (publication statement) groups information concerning the publication or distribution of an electronic or other text.
- publisher provides the name of the organization responsible for the publication or distribution of a bibliographic item.
- distributor supplies the name of a person or other agency responsible for the distribution of a text.
- authority (release authority) supplies the name of a person or other agency responsible for making an electronic file available, other than a publisher or distributor.
The publisher is the person or institution by whose authority a given edition of the file is made public. The distributor is the person or institution from whom copies of the text may be obtained. Where a text is not considered formally published, but is nevertheless made available for circulation by some individual or organization, this person or institution is termed the release authority.
- pubPlace (publication place) contains the name of the place where a bibliographic item was published.
- address contains a postal address, for example of a publisher, an organization, or an individual.
- idno (identifying number) supplies any standard or non-standard number used to identify a bibliographic item.
  - type (attribute) categorizes the number, for example as an ISBN or other standard series.
- availability supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, etc.
  - status (attribute) supplies a code identifying the current availability of the text.
- date contains a date in any format.
Note that the dates, places, etc., given in the publication statement relate to the publisher, distributor, or release authority most recently mentioned. If the text was created at some date other than its date of publication, its date of creation should be given within the profileDesc element, not in the publication statement. Give any other useful dates (e.g., dates of collection of data) in a note.
Additional detailed elements may be used for the encoding of names, dates, and addresses, as further described in section 3.5 Names, Numbers, Dates, Abbreviations, and Addresses when the module described in chapter 13 Names, Dates, People, and Places is included in a schema.
<publicationStmt>
  <publisher>Oxford University Press</publisher>
  <pubPlace>Oxford</pubPlace>
  <date>1989</date>
  <idno type="ISBN">0-19-254705-4</idno>
  <availability>
    <p>Copyright 1989, Oxford University Press</p>
  </availability>
</publicationStmt>
<publicationStmt>
  <authority>James D. Benson</authority>
  <pubPlace>London</pubPlace>
  <date>1984</date>
</publicationStmt>
<publicationStmt>
  <publisher>Sigma Press</publisher>
  <address>
    <addrLine>21 High Street,</addrLine>
    <addrLine>Wilmslow,</addrLine>
    <addrLine>Cheshire M24 3DF</addrLine>
  </address>
  <date>1991</date>
  <distributor>Oxford Text Archive</distributor>
  <idno type="ota">1256</idno>
  <availability>
    <p>Available with prior consent of depositor for
      purposes of academic research and teaching only.</p>
  </availability>
</publicationStmt>
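Two points made in this section can be sketched together: the status attribute on availability, and the placement of a creation date that differs from the date of publication. In the fragment below the dates and wording are hypothetical, the title statement and source description are omitted for brevity, and restricted is assumed to be one of the codes permitted for status in the schema in use.

```xml
<teiHeader>
  <fileDesc>
    <!-- titleStmt omitted for brevity -->
    <publicationStmt>
      <distributor>Oxford Text Archive</distributor>
      <!-- this date relates to the distributor named just above -->
      <date>1991</date>
      <availability status="restricted">
        <p>Available for purposes of academic research and teaching only.</p>
      </availability>
    </publicationStmt>
    <!-- sourceDesc omitted for brevity -->
  </fileDesc>
  <profileDesc>
    <!-- the date of creation is recorded here, not in the publication statement -->
    <creation>
      <date when="1984">1984</date>
    </creation>
  </profileDesc>
</teiHeader>
```

Keeping the two dates in separate parts of the header avoids any ambiguity about which event each one records.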
2.2.5 The Series Statement
- seriesStmt (series statement) groups information about the series, if any, to which a publication belongs.
A series may be defined in one of the following ways:
- A group of separate items related to one another by the fact that each item bears, in addition to its own title proper, a collective title applying to the group as a whole. The individual items may or may not be numbered.
- Each of two or more volumes of essays, lectures, articles, or other items, similar in character and issued in sequence.
- A separately numbered sequence of volumes within a series or serial.
The seriesStmt element may contain a prose description or one or more of the following more specific elements:
- title contains a title for any kind of work.
- idno (identifying number) supplies any standard or non-standard number used to identify a bibliographic item.
- respStmt (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.
- resp (responsibility) contains a phrase describing the nature of a person's intellectual responsibility.
- name (name, proper noun) contains a proper noun or noun phrase.
The idno may be used to supply any identifying number associated with the item, including both standard numbers such as an ISSN and particular issue numbers. (Arabic numerals separated by punctuation are recommended for this purpose: 6.19.33, for example, rather than VI/xix:33). Its type attribute is used to categorize the number further, taking the value ISSN for an ISSN for example.
<seriesStmt>
  <title level="s">Machine-Readable Texts for the Study of
    Indian Literature</title>
  <respStmt>
    <resp>ed. by</resp>
    <name>Jan Gonda</name>
  </respStmt>
  <idno type="vol">1.2</idno>
  <idno type="ISSN">0 345 6789</idno>
</seriesStmt>
2.2.6 The Notes Statement
Some information that conventional bibliographic practice places in the notes area has dedicated elements in these Guidelines. The following items should be tagged as indicated rather than as general notes:
- the nature, scope, artistic form, or purpose of the file; also the genre or other intellectual category to which it may belong: e.g. ‘Text types: newspaper editorials and reportage, science fiction, westerns, and detective stories’. These should be formally described within the profileDesc element (section 2.4 The Profile Description).
- summary description providing a factual, non-evaluative account of the subject content of the file: e.g. ‘Transcribes interviews on general topics with native speakers of English in 17 cities during the spring and summer of 1963.’ These should also be formally described within the profileDesc element (section 2.4 The Profile Description).
- bibliographic details relating to the source or sources of an electronic text: e.g. ‘Transcribed from the Norton facsimile of the 1623 Folio’. These should be formally described in the sourceDesc element (section 2.2.7 The Source Description).
- further information relating to publication, distribution, or release of the text, including sources from which the text may be obtained, any restrictions on its use or formal terms on its availability. These should be placed in the appropriate division of the publicationStmt element (section 2.2.4 Publication, Distribution, etc.).
- publicly documented numbers associated with the file: e.g. ‘ICPSR study number 1803’ or ‘Oxford Text Archive text number 1243’. These should be placed in an idno element within the appropriate division of the publicationStmt element. International Standard Serial Numbers (ISSN), International Standard Book Numbers (ISBN), and other internationally agreed upon standard numbers that uniquely identify an item, should be treated in the same way, rather than as specialized bibliographic notes.
The notesStmt element may be used to record other potentially significant details about the file and its features, for example:
- dates, when they are relevant to the content or condition of the computer file: e.g. ‘manual dated 1983’, ‘Interview wave I: Apr. 1989; wave II: Jan. 1990’
- names of persons or bodies connected with the technical production, administration, or consulting functions of the effort which produced the file, if these are not named in statements of responsibility in the title or edition statements of the file description: e.g. ‘Historical commentary provided by Mark Cohen’
- availability of the file in an additional medium or information not already recorded about the availability of documentation: e.g. ‘User manual is loose-leaf in eleven paginated sections’
- language of work and abstract, if not encoded in the langUsage element, e.g. ‘Text in English with summaries in French and German’
- The unique name assigned to a serial by the International Serials Data System (ISDS), if not encoded in an idno
- lists of related publications, either describing the source itself, or concerned with the creation or use of the electronic work, e.g. ‘Texts used in Burrows (1987)’
Each such item of information may be tagged using the general-purpose note element:
<notesStmt>
  <note>Historical commentary provided by Mark Cohen.</note>
  <note>OCR scanning done at University of Toronto.</note>
</notesStmt>
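Where more precise elements are available elsewhere in the header, the same information may be encoded with them instead, for example in the title statement:
<titleStmt>
  <title>…</title>
  <respStmt>
    <persName>Mark Cohen</persName>
    <resp>historical commentary</resp>
  </respStmt>
  <respStmt>
    <orgName>University of Toronto</orgName>
    <resp>OCR scanning</resp>
  </respStmt>
</titleStmt>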
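In the same spirit, a note on the languages of a work, such as ‘Text in English with summaries in French and German’, could instead be recorded in the langUsage element of the profile description. The following is a minimal sketch; the language codes are standard language tags, and the descriptive phrases are illustrative.

```xml
<profileDesc>
  <langUsage>
    <!-- one language element per language used in the text -->
    <language ident="en">English (main text)</language>
    <language ident="fr">French (summaries)</language>
    <language ident="de">German (summaries)</language>
  </langUsage>
</profileDesc>
```

Recording the languages in this structured form makes them available for retrieval, which a free-text note does not.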
2.2.7 The Source Description
- sourceDesc (source description) supplies a description of the source text(s) from which an electronic text was derived or generated.
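A text with no prior source, created directly in electronic form, might be described in prose as follows:
<sourceDesc>
  <p>Born digital.</p>
</sourceDesc>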
- model.biblLike groups elements containing a bibliographic description.
- model.sourceDescPart groups elements which may be used inside sourceDesc and appear multiple times.
- model.listLike groups list-like elements.
- bibl (bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged.
- biblStruct (structured bibliographic citation) contains a structured bibliographic citation, in which only bibliographic sub-elements appear and in a specified order.
- listBibl (citation list) contains a list of bibliographic citations of any kind.
<sourceDesc>
<bibl>The first folio of Shakespeare, prepared by
Charlton Hinman (The Norton Facsimile, 1968)</bibl>
</sourceDesc>
<sourceDesc>
<biblStruct xml:lang="fr">
<monogr>
<author>Eugène Sue</author>
<title>Martin, l'enfant trouvé</title>
<title type="sub">Mémoires d'un valet de chambre</title>
<imprint>
<pubPlace>Bruxelles et Leipzig</pubPlace>
<publisher>C. Muquardt</publisher>
<date when="1846">1846</date>
</imprint>
</monogr>
</biblStruct>
</sourceDesc>
- biblFull (fully-structured bibliographic citation) contains a fully-structured bibliographic citation, in which all components of the TEI file description are present.
- msDesc (manuscript description) contains a description of a single identifiable manuscript.
- scriptStmt (script statement) contains a citation giving details of the script used for a spoken text.
- recordingStmt (recording statement) describes a set of recordings used as the basis for transcription of a spoken text.
- listNym (list of canonical names) contains a list of nyms, that is, standardized names for any thing.
- listOrg (list of organizations) contains a list of elements, each of which provides information about an identifiable organization.
- listPerson (list of persons) contains a list of descriptions, each of which provides information about an identifiable person or a group of people, for example the participants in a language interaction, or the people referred to in a historical source.
- listPlace (list of places) contains a list of places, optionally followed by a list of relationships (other than containment) defined amongst them.
2.2.8 Computer Files Derived from Other Computer Files
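In the following list, A is an existing computer file that has its own TEI header, and B is a new computer file derived from A rather than from a printed source. The sections of A's header are carried over into B's header as follows:
- fileDesc
- A's file description should be copied into the sourceDesc section of B's file description, enclosed within a biblFull element.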
- profileDesc
- A's profileDesc should be copied into B's, in principle unchanged; it may however be expanded by project-specific information relating to B.
- encodingDesc
- A's encoding practice may or (more likely) may not be the same as B's. Since the object of the encoding description is to define the relationship between the current file and its source, in principle only changes in encoding practice between A and B need be documented in B. The relationship between A and its source(s) is then only recoverable from the original header of A. In practice it may be more convenient to create a new complete encodingDesc for B based on A's.
- revisionDesc
- B is a new computer file, and should therefore have a new revision description. If, however, it is felt useful to include some information from A's revisionDesc, for example dates of major updates or versions, such information must be clearly marked as relating to A rather than to B.
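The arrangement described above can be sketched as a skeleton header for B. This is only an outline under stated assumptions: the titles, the placeholder paragraphs, and the date on the change element are invented for illustration, and the encodingDesc and profileDesc sections are left out for brevity.

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>File B: a derived edition (hypothetical title)</title>
    </titleStmt>
    <publicationStmt>
      <p>Unpublished working file (placeholder).</p>
    </publicationStmt>
    <sourceDesc>
      <!-- A's file description, copied in and wrapped in biblFull -->
      <biblFull>
        <titleStmt>
          <title>File A: the original electronic edition (hypothetical title)</title>
        </titleStmt>
        <publicationStmt>
          <p>Publication details copied from A (placeholder).</p>
        </publicationStmt>
        <sourceDesc>
          <p>A's own source description, copied unchanged (placeholder).</p>
        </sourceDesc>
      </biblFull>
    </sourceDesc>
  </fileDesc>
  <revisionDesc>
    <!-- B starts a fresh revision history; the date is hypothetical -->
    <change when="2024-01-15">File B created from file A.</change>
  </revisionDesc>
</teiHeader>
```

Nesting A's sourceDesc inside the biblFull keeps the chain back to the original source visible from B's header alone.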
2.3 The Encoding Description
- encodingDesc (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived.
- projectDesc (project description) describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected.
- samplingDecl (sampling declaration) contains a prose description of the rationale and methods used in sampling texts in the creation of a corpus or collection.
- editorialDecl (editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text.
- tagsDecl (tagging declaration) provides detailed information about the tagging applied to a document.
- refsDecl (references declaration) specifies how canonical references are constructed for this text.
- classDecl (classification declarations) contains one or more taxonomies defining any classificatory codes used elsewhere in the text.
- appInfo (application information) records information about an application which has edited the TEI file.
2.3.1 The Project Description
- projectDesc (project description) describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected.
<encodingDesc>
<projectDesc>
<p>Texts collected for use in the
Claremont Shakespeare Clinic, June 1990.</p>
</projectDesc>
</encodingDesc>
2.3.2 The Sampling Declaration
- samplingDecl (sampling declaration) contains a prose description of the rationale and methods used in sampling texts in the creation of a corpus or collection.
- the size of individual samples
- the method or methods by which they were selected
- the underlying population being sampled
- the object of the sampling procedure used
<samplingDecl>
<p>Samples of 2000 words taken from the beginning of the text.</p>
</samplingDecl>
<samplingDecl>
<p>Text of stories only has been transcribed. Pull quotes, captions,
and advertisements have been silently omitted. Any mathematical
expressions requiring symbols not present in the ISOnum or ISOpub
entity sets have been omitted, and their place marked with a GAP
element.</p>
</samplingDecl>
A sampling declaration which applies to more than one text or division of a text need not be repeated in the header of each such text. Instead, the decls attribute of each text (or subdivision of the text) to which the sampling declaration applies may be used to supply a cross-reference to it, as further described in section 15.3 Associating Contextual Information with a Text.
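The cross-referencing mechanism just described might look like the following sketch. The identifier samp2000, the title, and the placeholder paragraphs are assumptions made for illustration; the sampling declaration itself is the one quoted earlier in this section.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sampled text (hypothetical title)</title></titleStmt>
      <publicationStmt><p>Unpublished (placeholder).</p></publicationStmt>
      <sourceDesc><p>Source details omitted in this sketch.</p></sourceDesc>
    </fileDesc>
    <encodingDesc>
      <!-- declared once, with an identifier that texts can point to -->
      <samplingDecl xml:id="samp2000">
        <p>Samples of 2000 words taken from the beginning of the text.</p>
      </samplingDecl>
    </encodingDesc>
  </teiHeader>
  <!-- decls points back to the declaration that applies to this text -->
  <text decls="#samp2000">
    <body>
      <p>…</p>
    </body>
  </text>
</TEI>
```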
2.3.3 The Editorial Practices Declaration
- editorialDecl (editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text.
- correction
- correction (correction principles) states how and under what circumstances corrections have been made in the text.
  - status indicates the degree of correction applied to the text.
  - method indicates the method adopted to indicate corrections within the text.
Was the text corrected during or after data capture? If so, were corrections made silently or are they marked using the tags described in section 3.4 Simple Editorial Changes? What principles have been adopted with respect to omissions, truncations, dubious corrections, alternate readings, false starts, repetitions, etc.?
- normalization
- normalization indicates the extent of normalization or regularization of the original source carried out in converting it to electronic form.
  - source indicates the authority for any normalization carried out.
  - method indicates the method adopted to indicate normalizations within the text.
Was the text normalized, for example by regularizing any non-standard spellings, dialect forms, etc.? If so, were normalizations performed silently or are they marked using the tags described in section 3.4 Simple Editorial Changes? What authority was used for the regularization? Also, what principles were used when normalizing numbers to provide the standard values for the value attribute described in section 3.5.3 Numbers and Measures and what format used for them?
- quotation
- quotation specifies editorial practice adopted with respect to quotation marks in the original.
  - marks (quotation marks) indicates whether or not quotation marks have been retained as content within the text.
  - form specifies how quotation marks are indicated within the text.
How were quotation marks processed? Are apostrophes and quotation marks distinguished? How? Are quotation marks retained as content in the text or replaced by markup? Are there any special conventions regarding for example the use of single or double quotation marks when nested? Is the file consistent in its practice or has this not been checked?
- hyphenation
- hyphenation summarizes the way in which hyphenation in a source text has been treated in an encoded version of it.
  - eol (end-of-line) indicates whether or not end-of-line hyphenation has been retained in a text.
Does the encoding distinguish ‘soft’ and ‘hard’ hyphens? What principle has been adopted with respect to end-of-line hyphenation where source lineation has not been retained? Have soft hyphens been silently removed, and if so what is the effect on lineation and pagination?
- segmentation
- segmentation describes the principles according to which the text has been segmented, for example into sentences, tone-units, graphemic strata, etc.
How is the text segmented? If s or seg segmentation units have been used to divide up the text for analysis, how are they marked and how was the segmentation arrived at?
- stdVals
- stdVals (standard values) specifies the format used when standardized date or number values are supplied.
In most cases, attributes bearing standardized values (such as the when or when-iso attribute on dates) should conform to a defined W3C or ISO datatype. In cases where this is not appropriate, this element may be used to describe the standardization methods underlying the values supplied.
- interpretation
- interpretation describes the scope of any analytic or interpretive information added to the text in addition to the transcription.
Has any analytic or ‘interpretive’ information been provided — that is, information which is felt to be non-obvious, or potentially contentious? If so, how was it generated? How was it encoded? If feature-structure analysis has been used, are fsdDecl elements (section 18.11 Feature System Declaration) present?
<editorialDecl>
<segmentation>
<p>
<gi>s</gi> elements mark orthographic sentences and
are numbered sequentially
within their parent <gi>div</gi> element
</p>
</segmentation>
<interpretation>
<p>The part of speech analysis applied throughout section 4 was
added by hand and has not been checked.</p>
</interpretation>
<correction>
<p>Errors in transcription controlled by using the
WordPerfect spelling checker.</p>
</correction>
<normalization source="http://szotar.sztaki.hu/webster/">
<p>All words converted to Modern American spelling following
Websters 9th Collegiate dictionary.</p>
</normalization>
<quotation marks="all" form="std">
<p>All opening quotation marks represented by entity reference
<ident type="ge">odq</ident>; all closing quotation marks
represented by entity reference <ident type="ge">cdq</ident>.</p>
</quotation>
</editorialDecl>
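The hyphenation and stdVals elements described earlier in this section could be filled in along the following lines. This is a sketch only: the editorial policy it states is invented for illustration, and the eol value none is assumed to mean that all end-of-line hyphenation has been removed.

```xml
<editorialDecl>
  <hyphenation eol="none">
    <!-- hypothetical policy statement -->
    <p>All end-of-line hyphenation has been removed; any hyphen
    remaining in the text occurred within a line of the source.</p>
  </hyphenation>
  <stdVals>
    <!-- hypothetical statement about standardized values -->
    <p>Standardized dates are supplied in the when attribute in the
    form yyyy-mm-dd.</p>
  </stdVals>
</editorialDecl>
```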
An editorial practices declaration which applies to more than one text or division of a text need not be repeated in the header of each such text. Instead, the decls attribute of each text (or subdivision of the text) to which it applies may be used to supply a cross-reference to it, as further described in section 15.3 Associating Contextual Information with a Text.
2.3.4 The Tagging Declaration
- the namespace to which elements appearing within the transcribed text belong.
- how often particular elements appear within the text, so that a recipient can validate the integrity of a text during interchange.
- any comment relating to the usage of particular elements not specified elsewhere in the header.
- a default rendition applicable to all instances of an element.
- rendition supplies information about the rendition or appearance of one or more elements in the source text.
  - scheme identifies the language used to describe the rendition.
- namespace supplies the formal name of the namespace to which the elements documented by its children belong.
- tagUsage supplies information about the usage of a specific element within a text.
The tagsDecl element consists of an optional sequence of rendition elements, each of which must bear a unique identifier, followed by an optional sequence of one or more namespace elements, containing a series of tagUsage elements, one for each distinct element from that namespace occurring within the outermost text element of a TEI document.
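The structure just described might be sketched as follows. The identifier, the element counts in the occurs attribute, and the usage comment are all hypothetical, and the list of tagUsage elements is abbreviated to two entries rather than covering every element used in a text.

```xml
<tagsDecl>
  <!-- each rendition carries a unique identifier -->
  <rendition xml:id="r-italic" scheme="css">font-style: italic;</rendition>
  <namespace name="http://www.tei-c.org/ns/1.0">
    <!-- counts are invented for this sketch -->
    <tagUsage gi="hi" occurs="28" render="#r-italic">Used to mark
    words emphasized in the source.</tagUsage>
    <tagUsage gi="p" occurs="212"/>
  </namespace>
</tagsDecl>
```

A recipient could compare the occurs values against a fresh count of each element to check that nothing was lost in interchange.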
2.3.4.1 Rendition
- using an informal prose description
- using a standard stylesheet language such as CSS or XSL-FO
- using a project-defined formal language
- the render attribute of the appropriate tagUsage element may be used to indicate a default rendition for all occurrences of the named element
- the global rendition attribute may be used on any element to indicate its rendition, over-riding any supplied default value
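The interplay between a default rendition and a local override can be sketched with two fragments, one from the header and one from the text. The identifiers and the sample sentence are assumptions made for illustration; the scheme values css and free are taken here to stand for a CSS description and an informal prose description respectively.

```xml
<!-- fragment 1: in the header -->
<tagsDecl>
  <rendition xml:id="r-it" scheme="css">font-style: italic;</rendition>
  <rendition xml:id="r-bo" scheme="css">font-weight: bold;</rendition>
  <rendition xml:id="r-red" scheme="free">Printed in red ink.</rendition>
  <namespace name="http://www.tei-c.org/ns/1.0">
    <!-- every hi is italic unless it says otherwise -->
    <tagUsage gi="hi" render="#r-it"/>
  </namespace>
</tagsDecl>

<!-- fragment 2: in the text -->
<p>The first <hi>word</hi> takes the default rendition, while the
second <hi rendition="#r-bo">word</hi> overrides it.</p>
```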
