Author: TEI SIG on Libraries
Editors: Kevin Hawkins, Michelle Dalmau, and Syd Bauman
Version 3.0 (October 2011)
This document has been superseded by version 4.0.0.
This document is the third version of a document formerly known as TEI Text Encoding in Libraries: Guidelines for Best Encoding Practices, which has been updated to comply with the Text Encoding Initiative’s Guidelines for Text Encoding and Interchange (P5). These guidelines are intended for use in large, library-based digitization projects, but may be useful in other scenarios as well. This version of the Best Practices for TEI in Libraries was created and is maintained by the TEI in Libraries: Guidelines for Best Practices Working Group.
There are many different library text digitization projects, serving a variety of purposes. With this in mind, these Best Practices are meant to be as inclusive as possible by specifying five encoding levels. These levels are meant to allow for a range of practice, from wholly automated text creation and encoding, to encoding that requires expert content knowledge, analysis, and editing. The encoding levels are not strictly cumulative: while higher levels tend to build upon lower levels by including more elements, higher levels are not supersets because some elements used at lower levels are not used at higher levels—often because more specific elements replace generic elements.
| Level | Description | Example of encoding of Alger Hiss document | Display example |
| Level 1 | The text is generated through OCR, is subordinate to the page image, and is not intended to stand alone as an electronic text (without page images). Encoding is done to assist in full text searching. | Alger Hiss document | example |
| Level 2 | The text is generated through OCR and is mainly subordinate to the page image, though navigational markers (textual divisions, headings) are captured. | Alger Hiss document | example |
| Level 3 | The text is created by conversion, either by way of OCR or keyboarding. Some structural elements of the text are encoded. The text may be used with or without page images. | Alger Hiss document | example |
| Level 4 | The text is generated either through corrected OCR or keyboarding and can stand alone without page images, so that it can be read by students, scholars, and general readers. | Alger Hiss document | example |
| Level 5 | The text is generated either through corrected OCR or keyboarding and can stand alone without page images, as in Level 4. In addition, the tagging requires substantial human intervention by encoders with subject knowledge. | (none) | example |
In these Best Practices, use of elements and attributes tends toward explicitness for ease of processing even though a human or possibly machine reader might be able to make inferences based on context. Only those elements and attributes mentioned below are recommended for use in encoding based on these Best Practices; use of other TEI elements and attributes is not recommended. Consult the full TEI Guidelines for guidance on use of elements and attributes beyond what is described below.
As guidelines rather than a specification, these Best Practices use "should" instead of "must" in nearly all cases (except where a practice is required by P5), with optional practices indicated by "may". However, to encourage conformance to these Best Practices, the ODD files will generate schemas that require use of recommended elements (those indicated by "should").
These Best Practices specify a recommended archival storage format. Local system needs may require transformation of documents in this archival format to another XML format for use by local indexing or delivery software.
The TEI Tite customization of the TEI Guidelines was developed as a subset of the TEI to be used as a vendor specification for outsourced encoding of the type often initiated by libraries, archives and other cultural heritage organizations. This Best Practices document was created to support in-house encoding that adheres as closely as possible to common TEI practice and library standards yet still leaves room for variation in local practice.
If a library uses TEI Tite for outsourced encoding, it should find that converting files from the TEI Tite format to a format conforming to these Best Practices is not difficult. TEI Tite files may be converted to Best Practices Level 3 with some loss of granularity, or to Level 4 by adding some markup with minimal human intervention. (Level 3 contains fewer elements than TEI Tite so that this encoding level, whether used for born-digital source documents or for upgrading Level 1 or Level 2 texts, requires less human intervention than TEI Tite would.)
These Best Practices are meant to complement the TEI Tite customization of the TEI Guidelines. Whereas TEI Tite is meant for vendors who need exact specifications for encoding, these Best Practices document how a library or other large-scale encoding project might create conformant TEI documents out of vendor-generated or locally created TEI documents. TEI Tite lacks header metadata and elements for encoding textual structures of possible interest to libraries; however, once Tite documents are transformed to a TEI-conformant encoding used by an institution, these Best Practices can serve as a point of reference for developing the TEI header and applying richer markup as reflected in Level 4 or 5 of these Best Practices.
For a comparison of the TEI Tite schema to these Best Practices, see TEI Tite's Appendix A.
The goal of the TEI is interchange, not interoperability. While seamless interoperability of texts created for different purposes is an elusive goal, use of a common markup vocabulary and syntax greatly aids interchange. Nevertheless, keep in mind that others, even within your organization, may in the future use your texts for purposes other than those you intended when encoding them.
An encoding project should strive for internal consistency and for use of standards so that the data can be modified or enhanced in the future with ease. In cases where local practice deviates from standards, there should at least be internal consistency in the local practice.
When reformatting to digital media using any level of encoding, the electronic text should begin with the transcription of the first word on the first leaf of the original work. At lower levels of encoding, it may be impractical or undesirable to transcribe and encode certain features of the text, such as publisher’s advertisements or indexes; at Level 4 and above, the transcription should be complete. Any omissions of material found in the original work should be noted in the <editorialDecl> in the TEI header.
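As an illustration, a project that decides not to transcribe a publisher's advertisements might record that omission as follows. The wording inside <p> is hypothetical; the point of the sketch is only that <editorialDecl> sits inside <encodingDesc> in the TEI header.

```xml
<!-- Inside <teiHeader>, after <fileDesc> -->
<encodingDesc>
  <editorialDecl>
    <!-- Hypothetical wording: describe each omission in prose -->
    <p>The publisher's advertisements following the index in the source
      volume have not been transcribed.</p>
  </editorialDecl>
</encodingDesc>
```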
Practice in encoding end-of-line, end-of-column, and end-of-page hyphenation varies considerably in the TEI community. Some projects capture all hyphens found on the printed page, while others remove those in the middle of words not normally hyphenated, to make full-text retrieval easier to implement. Among projects that preserve hyphens, some capture all hyphens using the same character, while others distinguish hyphens that must be present in any case (often called hard hyphens) from those that are present only because they fall at the end of a line, column, or page (often called soft hyphens).
This issue is complicated by the fact that Unicode prescribes the soft hyphen (U+00AD) not for a visible hyphen that might have been absent, but for a place where a hyphen may appear if the line is broken there. Furthermore, Unicode includes a non-breaking hyphen (U+2011), used in cases like ‘re-creation’ (meaning to create again, as opposed to recreation, meaning relaxation), in addition to the regular hyphen (U+2010), which would normally count as a word boundary. In short, Unicode is oriented toward electronic text that may be processed by computer in various ways, not toward capturing source documents.
Since OCR software relies on dictionaries to determine the probability not simply of characters but of whole words, it is often able to capture hyphenation in different ways, per the needs of a specific project.
At Levels 1 and 2, do not attempt to disambiguate different uses of hyphens. Encode all hyphens appearing in the source document using the character U+2010 (HYPHEN) if possible; alternatively, use the semantically ambiguous U+002D (HYPHEN-MINUS).
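A minimal sketch of this rule applied to the two line-end hyphens used in the table below. The enclosing <p> is only for illustration, and U+2010 is written as the character reference &#x2010; so that it is visible; entering the character itself in a UTF-8 file is equivalent. The source line break survives only as whitespace.

```xml
<!-- Levels 1 and 2: every hyphen is recorded with the same character,
     with no claim about whether it is hard or soft -->

<!-- Preferred: U+2010 HYPHEN -->
<p>This is not a run&#x2010;
on sentence. UTF-8 is a char&#x2010;
acter encoding for Unicode.</p>

<!-- Fallback: U+002D HYPHEN-MINUS, the ordinary keyboard hyphen -->
<p>This is not a run-
on sentence. UTF-8 is a char-
acter encoding for Unicode.</p>
```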
At Level 3, optionally distinguish uses of the hyphen with the @break and @rend attributes on <lb>, <cb>, and <pb> elements as appropriate. At Level 4, the use of the @break and @rend attributes on these elements is mandatory.
| Colloquial name | Appearance in source document | Encoding | Note |
| Hard hyphen |
This is not a run-
on sentence.
|
This is not a run-<lb break="no" rend="keep-hyphen"/>on sentence.
|
The use of no as the value of the @break attribute indicates that the encoder considers "run-on" to be a single orthographic token (loosely speaking, a single word). |
| Hard hyphen |
This is not a run-
on sentence.
|
This is not a run-<lb break="yes" rend="keep-hyphen"/>on sentence.
|
The use of yes as the value of the @break attribute indicates that the encoder considers "run-on" to consist of two separate orthographic tokens. |
| Soft hyphen |
UTF-8 is a char-
acter encoding for Unicode.
|
UTF-8 is a char-<lb break="no"/>acter encoding for Unicode.
|
The use of no as the value of the @break attribute indicates that the encoder considers "character" to be a single orthographic token. |
| Unclear case |
Some people say TEI is a mark-
up language.
|
Some people say TEI is a mark-<lb break="maybe"/>up language.
|
The use of maybe as the value of the @break attribute indicates that the encoder is unsure whether "mark-up" is a single orthographic token. |
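The table above shows line breaks, but the same attributes apply when a word is divided across a column or page. The following sketch reuses the sentences from the table; the page and column numbers are hypothetical.

```xml
<!-- Soft hyphen at the foot of a page: "char-" ends page 12, "acter" begins page 13 -->
<p>UTF-8 is a char-<pb n="13" break="no"/>acter encoding for Unicode.</p>

<!-- Hard hyphen at the foot of a column: the hyphen belongs to "run-on" -->
<p>This is not a run-<cb n="2" break="no" rend="keep-hyphen"/>on sentence.</p>
```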
A filename scheme that is internally consistent should be established for the project.
If the files are likely to be used on older systems (MS-DOS computers or unextended ISO 9660 CDs), it may be useful to limit filenames to 8 characters (drawn from the 26 lowercase ASCII letters, digits, hyphens, and underscores), a dot, and an extension of 3 alphanumeric characters. Likewise, if you will access files using a version of the Apple Filing Protocol (AFP) earlier than 3.0, filenames longer than 31 bytes are likely to be corrupted, so you may wish to limit filenames to 31 single-byte (e.g., ASCII) characters.
A number of attributes take a URI (Uniform Resource Identifier) as their value. Note that in addition to the full form of reference defined by URI syntax, these attributes can take a relative reference (e.g., filename.ext) or a fragment identifier (e.g., #foo).
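For example, the @facs attribute on <pb> takes a URI, so each of the three forms is possible. The host name, image paths, and identifier below are hypothetical.

```xml
<!-- Full URI (hypothetical host) -->
<pb n="1" facs="http://images.example.org/collection/p001.jpg"/>

<!-- Relative reference, resolved against the base URI
     (by default, the location of the TEI document itself) -->
<pb n="1" facs="images/p001.jpg"/>

<!-- Fragment identifier: points to the element with xml:id="p001"
     in the same document, here a <surface> inside <facsimile> -->
<pb n="1" facs="#p001"/>

<facsimile>
  <surface xml:id="p001">
    <graphic url="images/p001.jpg"/>
  </surface>
</facsimile>
```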
An encoding project should use only numbered divisions (i.e., <div1>, <div2>, etc.) or unnumbered divisions (i.e., <div>) but not both. This applies both within a TEI document (i.e., within <front>, <body>, <back>, even if nested within <group> or <floatingText>) and across TEI documents in any given collection. Keep in mind that numbering of textual subdivisions starts over (at <div1>) within <floatingText> nested inside a subdivision, so any software that expects to process nested numbered divisions within a document will need to account for this.
The choice of numbered or unnumbered divisions should be documented with the <tagUsage> element in the header. See 4.6, Element Recommendations for the TEI Header, below.
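A sketch of how that documentation might look for a collection that uses numbered divisions. The prose inside <tagUsage> is hypothetical; structurally, <tagUsage> sits in a <namespace> element within <tagsDecl>, which is part of <encodingDesc>.

```xml
<encodingDesc>
  <tagsDecl>
    <namespace name="http://www.tei-c.org/ns/1.0">
      <!-- Hypothetical project note recording the choice of numbered divisions -->
      <tagUsage gi="div1">Numbered divisions (div1, div2, and so on) are used
        throughout this collection; the unnumbered div element is not used.</tagUsage>
    </namespace>
  </tagsDecl>
</encodingDesc>
```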
Whether numbered or unnumbered divisions are used, the @type attribute of the division element is not recommended at Level 1 (because only one encoded division in the text exists), is optional at Level 2 (because the division-level metadata need not classify these divisions), is recommended at Level 3 (for broad yet useful analysis of text divisions), and is strongly recommended at Levels 4 and 5 (for full analysis of the text structure).
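The sketch below shows numbered divisions with @type, as might appear at Levels 3 to 5, including the restart of numbering inside <floatingText> described above. The @type values (chapter, section, letter) are hypothetical local values, and comments stand in for text. With unnumbered divisions, every <div1> and <div2> here would simply be <div>.

```xml
<body>
  <div1 type="chapter" n="1">
    <head>Chapter I</head>
    <p><!-- text --></p>
    <!-- A letter quoted in full inside the chapter -->
    <floatingText>
      <body>
        <!-- Numbering starts over: this is a div1, not a div2 -->
        <div1 type="letter">
          <p><!-- text of the letter --></p>
        </div1>
      </body>
    </floatingText>
    <div2 type="section">
      <head>Section 1</head>
      <p><!-- text --></p>
    </div2>
  </div1>
</body>
```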
Page breaks should be encoded using the <pb> element, with the value of the @n attribute denoting the number of the page whose text follows this element. The <pb> element should always be contained within a text division for ease of retrieval with indexing software. For example, a page break that occurs between chapters 2 and 3 should be encoded right after the opening tag of the textual division that opens chapter 3 rather than before the closing tag of the division that ends chapter 2.
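A sketch of the recommended placement, assuming unnumbered divisions and a hypothetical page 34 on which chapter 3 begins:

```xml
<div type="chapter" n="2">
  <head>Chapter 2</head>
  <p><!-- last paragraph of chapter 2 --></p>
  <!-- Not recommended: putting <pb n="34"/> here, just before the closing tag -->
</div>
<div type="chapter" n="3">
  <!-- Recommended: the page break opens the division that begins on page 34 -->
  <pb n="34"/>
  <head>Chapter 3</head>
  <p><!-- first paragraph of chapter 3 --></p>
</div>
```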
For those projects relying on the Metadata Encoding and Transmission Standard (METS), the @xml:id attribute is used as a conceptual identifier for content as opposed to an explicit pointer (i.e., @facs attribute) to a specific representation of that content. These identifiers are then used to generate a METS document that bundles the various content types (e.g., master image files, derivative image files for Web delivery, PDFs, etc.), explicitly lists all versions of the content, and defines the relationships between the constituent parts. This is achieved through the use of the <mets:fileSec> and <mets:structMap> sections of the METS document (see sample METS document for a TEI project).
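A minimal sketch of the idea, not the sample METS document referred to above: the TEI document carries only a conceptual identifier on <pb>, and a METS document ties that identifier to the TEI file and to master and Web-delivery images of the same page. All identifiers, file paths, and @TYPE/@USE values are hypothetical; pointing into the TEI file with <mets:area BETYPE="IDREF"> is one way METS can address an ID inside a file, assumed here for illustration.

```xml
<!-- In the TEI document: an identifier, not a pointer to any one image -->
<pb n="12" xml:id="p012"/>

<!-- In the METS document -->
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="master">
      <mets:file ID="master_p012" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="master/p012.tif"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="web">
      <mets:file ID="web_p012" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="web/p012.jpg"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="text">
      <mets:file ID="tei_file" MIMETYPE="text/xml">
        <mets:FLocat LOCTYPE="URL" xlink:href="text.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="physical">
    <mets:div TYPE="page" ORDER="12" LABEL="12">
      <!-- The TEI location, found through the xml:id on <pb> -->
      <mets:fptr>
        <mets:area FILEID="tei_file" BETYPE="IDREF" BEGIN="p012"/>
      </mets:fptr>
      <!-- Every image version of the same page -->
      <mets:fptr FILEID="master_p012"/>
      <mets:fptr FILEID="web_p012"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
```

Because the TEI file never names an image, new derivatives (for example, PDFs) can be added to the METS <mets:fileSec> without touching the encoded text.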
These Best Practices provide recommended usage of attributes as used in the TEI header and within the body of the TEI document (within the <text> element), as evidenced by attributes used in encoding example snippets and the prose description of this document.
Scores of attributes are available for use within <text>, and a list of those recommended for use in Tite documents is included in Appendix A.
This section gives general advice on the use of particular attributes commonly needed in library encoding projects. (All of the attributes below are commonly used on various elements, but not every element requires or even allows them.)
It is impossible to construct, for each element, a list of acceptable values for the @type attribute on which everyone could agree. Instead, it is recommended that projects describe the @type attribute values used in their texts in the project ODD file and that this list be made available to people using the texts. Note that, at present, Roma, the web front-end editor for ODD files, does not have a mechanism for providing this documentation; it should be added to the ODD file directly. For a list of standard names and definitions of bibliographic features of printed books, see ABC for Book Collectors by John Carter (8th edition, New Castle, Del. and London: Oak Knoll Books and the British Library, 2004, available online at http://www.ilab.org/images/abcforbookcollectors.pdf).
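One common ODD pattern for this documentation is sketched below for <div>; the values and descriptions are hypothetical. A valList of type "semi" documents the listed values while still allowing others to validate; type "closed" would make the generated schema reject any unlisted value.

```xml
<!-- Inside the project's <schemaSpec> -->
<elementSpec ident="div" mode="change">
  <attList>
    <attDef ident="type" mode="change">
      <valList type="semi" mode="add">
        <valItem ident="chapter">
          <desc>A chapter of a book.</desc>
        </valItem>
        <valItem ident="preface">
          <desc>A preface to the work.</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```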
The @n attribute is sometimes used to number elements for machine processing, but it often records data represented in the source document, such as page numbers or footnote numbers. Example: <pb n="456"/>
The @key and @ref attributes are both available on a variety of elements, including <persName>, <orgName>, <author>, and <title>. They are used to reference external metadata about the content of the element. The @key attribute may contain any string of Unicode characters, whereas the @ref attribute contains a URI (including a relative one, as discussed above). While @key may supply any identifier, there is no mechanism internal to XML for checking that the value of this attribute is valid.
By contrast, when @ref holds a fragment identifier, readily available software can check, on encountering ref="#tgn_7012924", that xml:id="tgn_7012924" exists elsewhere in the document.
In general it is recommended to use @ref when the metadata object being referenced is accessible via a URI (e.g., is on the web), and @key when it is not. To avoid ambiguity in referencing external data sources, it is recommended not to use both attributes on the same instance of an element.
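A sketch of the three cases. The URI, the local key, and the names are hypothetical; tgn_7012924 is the identifier used above, and the place name is left as a placeholder. Note that no element carries both @key and @ref.

```xml
<!-- @ref: the authority record is on the Web (hypothetical URI) -->
<persName ref="http://authorities.example.org/names/n0001">Doe, Jane</persName>

<!-- @key: an identifier with no URI, such as a local authority file (hypothetical) -->
<orgName key="localorg_0042">Example Historical Society</orgName>

<!-- @ref with a fragment identifier pointing inside the same document -->
<placeName ref="#tgn_7012924">[place name]</placeName>

<!-- Elsewhere in the document, for example in a <listPlace>: -->
<place xml:id="tgn_7012924">
  <placeName>[place name]</placeName>
</place>
```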
At Levels 3 and above, the @rend and @rendition attributes may be used when it is desirable to record how the textual feature was displayed in the source document.
Never use these attributes on header elements: metadata is transcribed and possibly regularized, as in a catalog record, but its exact appearance is not meant to be captured.
If a project is normalizing the rendering of text objects (for example, such that all titles should be italicized, regardless of how they appeared in the source document), there is no need to use these attributes; instead, a stylesheet will determine that all titles are displayed in italics.
However, if a project is faithfully recording the rendering in the source document, one of these attributes should be used to indicate this rendering, either on all elements to be rendered differently from the surrounding text or on all elements whose rendering does not follow the default stylesheet.
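Two hedged sketches of recording that a title was italicized in the source. The @rend value is a free-text local convention, and the sentence and xml:id are hypothetical. With @rendition, the formal description lives in a <rendition> element in the header; that element is a definition, so this does not put @rend or @rendition on a header element.

```xml
<!-- Option 1: @rend with a locally defined value -->
<p>He had been reading <title rend="italic">Moby-Dick</title> all week.</p>

<!-- Option 2: @rendition pointing to a formal description -->
<!-- In teiHeader/encodingDesc: -->
<tagsDecl>
  <rendition xml:id="italic" scheme="css">font-style: italic;</rendition>
</tagsDecl>
<!-- In the text: -->
<p>He had been reading <title rendition="#italic">Moby-Dick</title> all week.</p>
```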
The @xml:lang attribute indicates the natural language of the content of an element. It is generally not used for children of the <text> element at Level 1 or Level 2 but is common at Level 3 and above. See the data.language datatype in the TEI Guidelines.
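A sketch at Level 3, assuming language tags of the kind data.language expects (en for English, la for Latin). The sentence is hypothetical, and <foreign> is used here only to show where a change of language would be marked.

```xml
<text xml:lang="en">
  <body>
    <div type="chapter">
      <!-- Content inherits English from <text> unless overridden -->
      <p>The motto over the door read
        <foreign xml:lang="la">Festina lente</foreign>.</p>
    </div>
  </body>
</text>
```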
| | Element | Description |
| | <TEI xml:id="___" xmlns="http://www.tei-c.org/ns/1.0"> | The root element of a TEI document. Use of the @xml:id attribute is recommended, giving the same unique identifier for the TEI document as in teiHeader/fileDesc/publicationStmt/idno. |
| ├ | <teiHeader xml:lang="___"> | The <teiHeader> contains metadata about the TEI document. The @xml:lang attribute is recommended; it indicates the language used for the metadata describing the document. |
| ├ | <facsimile> | The <facsimile> element defines sets of images that correspond with the text. This element should only be used if page images are included and if this particular mechanism for linking page images is chosen. See Linking between encoded text and images of source documents. |
| └ | <text xml:lang="___"> | The <text> element contains the encoded transcription of the source document. The @xml:lang attribute is recommended; it indicates the primary language of the source document. |
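Put together, the top-level elements in the table above form a skeleton like the following. The identifier and language codes are hypothetical, and comments stand in for content.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="hypothetical_0001">
  <!-- Language of the metadata -->
  <teiHeader xml:lang="en">
    <!-- fileDesc and the other header elements -->
  </teiHeader>
  <!-- A <facsimile> element would go here, only if page images
       are linked through this mechanism -->
  <!-- Primary language of the source document -->
  <text xml:lang="en">
    <body>
      <!-- encoded transcription -->
    </body>
  </text>
</TEI>
```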
The child elements of the <teiHeader> and <text> elements are described below.
Note that this is not a complete customization. It is just one specification group that is used by each of the customizations for levels 1–4.
As with any descriptive metadata, the metadata in the TEI header can serve multiple audiences. In the local context, a TEI header provides metadata about the TEI document, its source, and its provenance. The TEI header may be used for metadata exchange, to automatically create indexes (author lists, title lists) for a collection of TEI documents, and to aid in browsing heterogeneous TEI documents. TEI headers may also be used as a basis for other metadata records (such as MARC or Dublin Core), though generation of other formats may require human intervention because they often are more granular, or have different granularity, than TEI headers.
While a TEI header is often perceived as similar to or at least related to a MARC record, a TEI header does not typically have a one-to-one correspondence with a MARC record. One TEI header may be described by multiple MARC analytic records, or one MARC record may be used to describe a collection of TEI documents with individual headers. Furthermore, while a MARC record captures metadata about a bibliographic entity in a library’s collection, a TEI header records information both about an encoded text and about the source document for that encoded text. Each institution and even each project may have a different approach to the way electronic texts are created in TEI and then represented in a larger public catalog through MARC. At one institution, the same unit (e.g., a cataloging department) may be responsible for creating both TEI headers and MARC records, while at other institutions the work may be distributed among different units. Within the library domain, metadata or cataloging experts are usually required for at least review and standardization of both the TEI header and the MARC record. In order to allow automatic generation of TEI headers from MARC records and MARC records from TEI headers, some elements (like <author>) contain content not typical for TEI practice but necessary due to a lack of granularity in the MARC format.
Several other descriptive metadata schemas are prevalent within the library domain, including Dublin Core (DC), Dublin Core Qualified (DCQ), and the Metadata Object Description Schema (MODS). Each of these schemas contains elements that capture the same data as many of the elements in the TEI header. As with MARC, a variety of automated or manual workflows can be implemented to crosswalk metadata from one standard to another and provide for increased sharing of metadata about electronic texts in larger contexts. In particular, DC and MODS are common schemas used by the Open Archives Initiative (OAI) and may be particularly valuable for sharing metadata across institutions. Unfortunately, there is currently no mechanism for specifying that the content of an element should be drawn from an outside metadata source or that this outside metadata source should supplement the content of the element. In the absence of such mechanisms, users of these Best Practices may use the <idno> element to supply identifiers for outside metadata records and may supply identifiers for certain authority records using the @key or @ref attributes, allowed on certain elements.
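To make the element order in the table below concrete, here is a skeletal header for a hypothetical monograph. Bracketed text marks values to be supplied; the identifier and dates are invented. @type on <title> is omitted and should be added using the values recommended for that element.

```xml
<teiHeader xml:lang="en">
  <fileDesc>
    <titleStmt>
      <title>[Title of the TEI document, based on the source document]</title>
      <author>
        <persName>[Name in authority form, e.g. Doe, Jane, 1850-1920]</persName>
      </author>
      <respStmt>
        <resp>[Role, e.g. encoder]</resp>
        <persName>[Name]</persName>
      </respStmt>
    </titleStmt>
    <publicationStmt>
      <publisher>[Party responsible for making the TEI document public]</publisher>
      <!-- Same value as the xml:id on the root <TEI> element -->
      <idno>hypothetical_0001</idno>
      <availability>
        <p>[Rights in the original work, the page images, and the encoded text]</p>
      </availability>
      <!-- First publication of the TEI document; the element has no content -->
      <date when="2011-10-01"/>
    </publicationStmt>
    <sourceDesc>
      <biblStruct>
        <monogr>
          <author>
            <persName>[Name in authority form]</persName>
          </author>
          <title>[Title of the source document]</title>
          <imprint>
            <pubPlace>[Place of publication]</pubPlace>
            <publisher>[Publisher]</publisher>
            <!-- "186-" means published at some point in the 1860s.
                 Other forms: <date when="1864">1864</date>,
                 <date when="1864" cert="low">1864?</date>,
                 <date cert="unknown">[n.d.]</date> -->
            <date notBefore="1860" notAfter="1869">186-</date>
          </imprint>
          <extent>[Number of pages and other physical details]</extent>
          <extent>[Dimensions]</extent>
        </monogr>
      </biblStruct>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```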
| Element | Description | Equivalent in MARC when cataloging the TEI document | Equivalent in MARC for the source document | ||||||||||
| <teiHeader xml:lang="___"> | The <teiHeader> contains metadata about the TEI document. The @xml:lang attribute is recommended; it indicates the language used for the metadata describing the document. | 040 $b | n/a | ||||||||||
| ├ | <fileDesc> | The <fileDesc> contains bibliographic metadata about the TEI document. One of its child elements, <sourceDesc>, describes the source document from which the TEI document was created. | n/a | n/a | |||||||||
| │ | ├ | <titleStmt> | n/a | n/a | |||||||||
| │ | │ | ├ | <title type="_"> |
One or more <title> elements are required to give the title of the TEI document being created. It is suggested that titles be constructed based on the source document according to a national cataloging code.
Use of the @level attribute is not recommended since it does not apply to a TEI document in a collection.
Use of the @type attribute is recommended. It should have one of the following values as suitable in local practice:
|
|
|
|||||||
| │ | │ | ├ | <author> |
One or more <author> elements (one name per element) are used to encode the names of entities primarily responsible for the content of the TEI document—usually, the author(s) of the source document. Use <persName> or <orgName> when applicable. Whenever possible, establish or use the form of the name from a national name authority file. Examples:
|
|
|
|||||||
| │ | │ | ├ | <editor> | If applicable, use one or more <editor> elements (one name per element) to encode the names of entities besides those in <author> elements that acted as editors of the TEI document—usually, the editor(s) of the source document. If considered appropriate by the encoding project, the editor of the TEI document should be entered here. Use <persName> or <orgName> when applicable. Whenever possible, establish or use the form of the name from a national name authority file. Unlike in the TEI Guidelines, do not use this element for translators, illustrators, compilers, or other roles not generally considered an editor. Therefore, do not use the @role attribute. |
|
|
|||||||
| │ | │ | ├ | <respStmt> | Record the names of other persons or organizations, one responsibility or party per <respStmt>, that have responsibility for the intellectual or artistic content of the TEI document—often by transitivity from the source document—not covered by <author> and <editor>. This includes translators, illustrators, compilers, proofreaders, encoders, and those who wrote a preface or introduction. Each <respStmt> should contain either:
|
|
|
|||||||
| │ | │ | └ | <meeting> | Optionally, record the name of a meeting or conference when this name is not clear from information in other parts of the <fileDesc>. Whenever possible, establish or use the form of the name from a national name authority file. |
|
|
|||||||
| │ | ├ | <editionStmt> | This element contains information about the edition of the TEI document produced, not the source document. | 250 | n/a | ||||||||
| │ | ├ | <publicationStmt> | Use the child elements below (rather than <p>) for a prose description. | n/a | n/a | ||||||||
| │ | │ | ├ | <publisher> | The publisher is the party responsible for making the file (the TEI document, not the source document) public. |
|
n/a | |||||||
| │ | │ | ├ | <distributor> | The distributor is the party from whom copies of the file (the TEI document, not the source document) can be obtained. Often the same as <publisher>, in which case no <distributor> should be given. | 260 $b ($b is repeatable) | n/a | |||||||
| │ | │ | ├ | <authority> | Only used for a text (the TEI document, not the source document) that is not formally published, but is nevertheless made available for circulation, in which case the party who makes it available should be recorded here. | 500 | n/a | |||||||
| │ | │ | ├ | <idno> | Any unique identifier for the TEI document as determined by the publisher of the TEI document. Use of this element is recommended. Optionally use a @type attribute to indicate the type of identifier. |
|
n/a | |||||||
| │ | │ | ├ | <availability><p> | Provide a prose rights statement for the TEI document. Provide a standard license, such as one from Creative Commons, if possible. Provide information on all applicable rights: rights in the original work, rights in page images of the source document, and rights in the encoded text. | 540 | n/a | |||||||
| │ | │ | └ | <date when="____"/> | Refers to the date of the first publication of the TEI document. Use the @when attribute (see the att.datable.w3c class, http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.datable.w3c.html) to aid machine processing. This element has no content. |
|
n/a | |||||||
| │ | ├ | <seriesStmt> | This element contains information about the electronic series being created. It has one recommended element (<title>) and other optional elements. | n/a | n/a | ||||||||
| │ | │ | └ | <title level="s" type="_"> | Required for the title of the series. Whenever possible, establish or use the form of the name from a national name authority file for the electronic series being created. Use of the @type attribute is optional, but if it is used, it should follow the instructions for use of this element in the full TEI Guidelines (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-title.html). |
|
n/a | |||||||
| │ | ├ | <notesStmt> | Optional. | 5xx | 5xx | ||||||||
| │ | └ | <sourceDesc> | Use one <sourceDesc> per source document. Metadata for the source document may be automatically generated from a MARC record. | n/a | n/a | ||||||||
| │ | └ | <biblStruct> | Use <biblStruct> with child elements arranged in the order below for ease of display according to ISBD. (This element is used instead of <bibl> to enforce structure, but <biblFull> is not used because it requires more elements than are typically available in library metadata sources.) | n/a | n/a | ||||||||
| │ | ├ | <analytic> | Use this element to group together elements describing the object of encoding when it would not have a corresponding catalog record (for example, an article in a journal issue, a chapter in a book, or a poem in a collection). If the object of encoding would have a corresponding catalog record, omit this element and its children. | n/a | n/a | ||||||||
| │ | │ | ├ | <author> | One or more <author> elements (one name per element) are used to encode the name for the personal author or corporate body responsible for the creation of the intellectual or artistic content of the object of encoding. Use <persName> or <orgName> when applicable. Whenever possible, establish or use the form of the name from a national name authority file. | n/a | n/a | |||||||
| │ | │ | └ | <title level="a" type="_"> |
At least one <title> element is required for the title of the object of encoding. Transcribe the title according to the national cataloging code.
Use of the @type attribute is recommended. It should have one of the following values as suitable in local practice:
|
n/a | n/a | |||||||
| │ | ├ | <monogr> | Use this element to group together the elements describing the bibliographic item that has (or would have) a corresponding catalog record. The TEI definition of this element specifies that it is used even for works that might not otherwise be considered “monographs,” so bibliographic data about a journal title would be included in this element. | n/a | n/a | ||||||||
| │ | │ | ├ | <author> | One or more <author> elements (one name per element) are used to encode the name for the personal author or corporate body responsible for the creation of the intellectual or artistic content of the source document bibliographic item, even if this creator is not the main entry in the catalog record. Use <persName> or <orgName> when applicable. Whenever possible, establish or use the form of the name from a national name authority file. |
|
|
|||||||
| │ | │ | ├ | <title level="_" type="_"> |
At least one <title> element is recommended for the title of the source document bibliographic item. Transcribe the title according to the national cataloging code.
Use of the @level attribute is optional. If used, it should be used as in the main TEI Guidelines (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-title.html).
Use of the @type attribute is recommended. It should have one of the following values as suitable in local practice:
|
|
|
|||||||
| │ | │ | ├ | <respStmt> | Statement of responsibility on the source document bibliographic item, according to the national cataloging code. Record one responsibility or party per <respStmt>. Each <respStmt> should contain either:
If generating the <sourceDesc> from a MARC record, it will be difficult to split the content of MARC field 245 $c into <resp> and <persName> (or <orgName>) elements, so it is recommended to use <title type="marc245c"> instead of this element. |
245 $c | 245 $c | |||||||
| │ | │ | ├ | <meeting> | Optionally, record the name of a meeting or conference when this name is not clear from information in other parts of the <sourceDesc>. Whenever possible, establish or use the form of the name from a national name authority file. |
|
|
|||||||
| │ | │ | ├ | <edition> | Edition statement (if present) according to the national cataloging code. |
|
|
|||||||
| │ | │ | ├ | <imprint> | n/a | n/a | ||||||||
| │ | │ | │ | ├ | <pubPlace> | Place of publication from the source document bibliographic item according to the national cataloging code. Optionally remove ISBD punctuation for separating areas of the bibliographic description (such as a colon) when deriving from a MARC record. However, leave brackets that indicate supplied information or an abbreviation like "S.l." (for no place of publication). |
|
|
||||||
| │ | │ | │ | ├ | <publisher> | Name of publisher, distributor, etc. from the source document bibliographic item according to the national cataloging code. Optionally remove ISBD punctuation for separating areas of the bibliographic description (such as a comma) when deriving from a MARC record. However, leave brackets that indicate supplied information or an abbreviation like "s.n." (for no publisher). |
|
|
||||||
| │ | │ | │ | └ |
<date when="____"> or <date notBefore="____" notAfter="____"> or <date from="____"> or <date to="____"> or <date from="____" to="____"> |
Date of publication, distribution, etc. from the source document bibliographic item. The content of the element is the statement of this data according to the national cataloging code.
Since content recorded according to the national cataloging code is not easily processed by machine, when possible include the following attribute(s) with valid values: either @when, or both @notBefore and @notAfter, or one or both of @from and @to.
National cataloging codes may distinguish between a possible range of dates for publication (such as "186-" for something certainly published during the 1860s) and an uncertain date of publication (such as "1864?" or "186-?" for a date or range of dates assumed by the cataloger). In the case of uncertainty, use cert="low".
If the date is unknown (for example, recorded according to the national cataloging code as "[n.d.]", use cert="unknown".
|
|
260 $c | ||||||
| │ | │ | └ | <extent> | Use of this element to describe the extent of the source document bibliographic item is recommended. If the data is generated by hand, it should include a comprehensible statement of the size of the item, such as the number of pages or leaves. If generated from a catalog record, there should be two <extent> elements: one for the extent of the item (e.g., number of pages) and other physical details, and a second one for the dimension(s). Both should be recorded according to a national cataloging code. | | |
| │ | ├ | <series> | Information about the series to which the source document bibliographic item belongs, given according to the national cataloging code. If generating this data from a catalog record, it is likely that you will have only one child element: a <title level="s">. Use of the @type attribute on the <title> element is optional, but if it is used, it should follow the instructions for this attribute in the full TEI Guidelines. | | |
| │ | ├ | <note> | Optionally, use for notes about the source document bibliographic item, given according to a national cataloging code. | 5xx |
| │ | ├ | <idno> |
Optionally use one or more <idno> elements to give identifiers for the source document, text, or work of the bibliographic item, whether assigned by the holding library (such as a call number), the publisher of the original document (such as an ISBN), or a standard bibliography (such as an identifier from the Short Title Catalogue or Books in Maori). Use the following values for the @type attribute if applicable, and create other values if appropriate:
| | |
| │ | └ | <relatedItem> | Use this element and its children to reference a related work, if applicable. | n/a | n/a |
| │ | └ | <bibl> | n/a | n/a |
| │ | ├ | <author> | Optionally use one or more <author> elements (one name per element) to encode the name for the personal author or corporate body responsible for the creation of the intellectual or artistic content of the related work. Use <persName> or <orgName> when applicable. Whenever possible, establish or use the form of the name from a national name authority file. | n/a | n/a |
| │ | └ | <title type="_"> |
At least one <title> element is recommended for the title of the related work. Transcribe the title according to the national cataloging code.
Use of the @level attribute is recommended. If used, it should be used as in the main TEI Guidelines.
Use of the @type attribute is optional. It should have one of the following values as suitable in local practice:
| 740 | 740 |
| ├ | <encodingDesc> | n/a | n/a |
| │ | ├ | <projectDesc><p> | Enter a description of the purpose for which the electronic file was encoded. | 500 | n/a |
| │ | ├ | <editorialDecl n="_"> |
Use of the @n attribute is recommended to record the encoding level: 1 for Level 1, 2 for Level 2, etc.
Include one or more <p> elements as children with information on:
| n/a |
| │ | ├ | <tagsDecl> | n/a | n/a |
| │ | │ | ├ | <rendition xml:id="_" scheme="css"> | Include a <rendition> element for each unique value of a @rendition attribute (not the @rend attribute) used in the body of the TEI document. The @xml:id attribute is required in order to provide an identifier to which @rendition attributes in the body refer. | n/a | n/a |
| │ | │ | └ | <namespace name="http://www.tei-c.org/ns/1.0"><tagUsage> | <tagUsage> should be one of the following:
| n/a | n/a |
| │ | └ | <classDecl><taxonomy xml:id="____"><bibl> |
Use to document classification schemes and controlled vocabularies referenced by a @scheme attribute elsewhere in the header or body of the TEI document. For example:
| 050-099 for call number classification schemes; 6xx 2nd indicator (or 6xx $2 when 2nd indicator = 7) for subject classification schemes | 050-099 for call number classification schemes; 6xx 2nd indicator (or 6xx $2 when 2nd indicator = 7) for subject classification schemes |
| ├ | <profileDesc> | n/a | n/a |
| │ | ├ | <langUsage> | Optionally use this element and child <language> elements to list languages used in the text. This supplements the @xml:lang attribute on the <text> (which is outside the header) in cases where more than one language is used in the text. It is not expected that the <langUsage> element will contain any description of language usage. | 008/35-37 | n/a |
| │ | │ | └ | <language ident="___"> | Use one or more <language> elements to indicate language(s) used in the source document. Use of the @ident attribute is required as in the full TEI guidelines. Since the value of this attribute is usually sufficient to indicate the language, the <language> element should normally have no content. In the unusual case where @ident is insufficient, provide additional information about the language as content of the element. | | |
| │ | └ | <textClass> | n/a | n/a |
| │ | ├ | <classCode scheme="___"> | True classification numbers, as opposed to call numbers, may be entered here. The value of the @scheme attribute corresponds to a classification scheme defined previously in <classDecl>.
Example: scheme="#LCC" | 050-099 | 050-099 |
| │ | └ | <keywords scheme="____"> | Repeat this element as many times as there are keyword schemes.
The value of the @scheme attribute is a URI for a controlled or uncontrolled vocabulary. The URI may be absolute, pointing to a version online, or may point to a vocabulary defined previously in <classDecl>.
Example: scheme="#LCSH" | 6xx 2nd indicator or 6xx $2 when 2nd indicator = 7 | 6xx 2nd indicator or 6xx $2 when 2nd indicator = 7 |
| │ | └ | <term> | Use for terms from controlled or uncontrolled vocabularies as defined according to the containing <keywords> element. | 6xx | 6xx |
| └ | <revisionDesc> | n/a | n/a |
| └ | <change when="YYYY-MM-DD" who="URI"> |
Create a <change> element to record each significant change to the TEI document, in reverse chronological order (i.e., most recent first). A prose description of the change is recorded as the content of each <change> element. This prose may contain lists for organization, and phrase-level markup (like <gi>, <ptr>, or <date>), but not paragraphs.
The date of the change should be recorded using the @when attribute (see the att.datable.w3c class).
The person who is responsible for making the change should be indicated by the @who attribute of <change>. Its value is a URI that points to a <respStmt> or <person> that encodes information about the responsible party. Note that this reference is a URI reference and not an ID/IDREF reference, and thus is not checked by validation software. Small projects sometimes take advantage of this by putting information into the URI itself, and not having a <respStmt> or <person> element. For example, the document might simply give who="#Jane_Smith", relying on human readers to understand this reference.
| n/a | n/a |
* Use only if TEI header metadata is based on the source document, not the encoded text.
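As an illustration of the imprint, date, and extent recommendations in the table above, the following sketch shows a <sourceDesc> that might be derived from a catalog record. The author, title, and physical description are hypothetical placeholders; only the choice of attributes (@notBefore and @notAfter, @when, @cert) follows the guidance given here. Encodings for other catalog date statements are suggested in the closing comment.

```xml
<sourceDesc>
  <biblStruct>
    <monogr>
      <author><persName>Doe, Jane</persName></author>
      <title>A hypothetical title</title>
      <imprint>
        <!-- ISBD punctuation removed; brackets for supplied information kept -->
        <pubPlace>[S.l.]</pubPlace>
        <publisher>[s.n.]</publisher>
        <!-- "186-": certainly published at some point during the 1860s -->
        <date notBefore="1860" notAfter="1869">186-</date>
      </imprint>
      <!-- first <extent>: extent of the item and other physical details -->
      <extent>viii, 212 p. : ill.</extent>
      <!-- second <extent>: dimensions -->
      <extent>24 cm.</extent>
    </monogr>
  </biblStruct>
</sourceDesc>
<!-- Other catalog date statements might be encoded as:
     <date when="1864">1864</date>
     <date when="1864" cert="low">1864?</date>
     <date notBefore="1860" notAfter="1869" cert="low">186-?</date>
     <date cert="unknown">[n.d.]</date>
-->
```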
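The <encodingDesc>, <profileDesc>, and <revisionDesc> rows above are linked by @scheme pointers: <classCode> and <keywords> point to <taxonomy> elements declared in <classDecl>. The fragments below sketch that linkage; in a full header they would follow <fileDesc> inside <teiHeader>. All identifiers, descriptions, bracketed class numbers and terms, and dates are hypothetical placeholders, and the CSS content of <rendition> is an assumption consistent with scheme="css".

```xml
<encodingDesc>
  <projectDesc>
    <p>Digitized to support full-text searching of a library collection.</p>
  </projectDesc>
  <!-- @n records the encoding level (here Level 1) -->
  <editorialDecl n="1">
    <p>Text produced by uncorrected OCR; page breaks recorded with pb.</p>
  </editorialDecl>
  <tagsDecl>
    <!-- referred to from the body as rendition="#italic" -->
    <rendition xml:id="italic" scheme="css">font-style: italic;</rendition>
  </tagsDecl>
  <classDecl>
    <taxonomy xml:id="LCC">
      <bibl>Library of Congress Classification</bibl>
    </taxonomy>
    <taxonomy xml:id="LCSH">
      <bibl>Library of Congress Subject Headings</bibl>
    </taxonomy>
  </classDecl>
</encodingDesc>
<profileDesc>
  <langUsage>
    <language ident="en"/>
    <language ident="fr"/>
  </langUsage>
  <textClass>
    <!-- @scheme points to the taxonomy declared in classDecl -->
    <classCode scheme="#LCC">[class number]</classCode>
    <keywords scheme="#LCSH">
      <term>[subject heading]</term>
    </keywords>
  </textClass>
</profileDesc>
<revisionDesc>
  <!-- most recent change first -->
  <change when="2011-10-15" who="#Jane_Smith">Added page-level metadata.</change>
  <change when="2011-09-30" who="#Jane_Smith">Created initial encoding from OCR.</change>
</revisionDesc>
```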
Note that this is not a ‘TEI conformant’ customization, because it does not follow the TEI abstract model. However, this is a ‘syntactically conformant’ customization, in that documents that are valid against this scheme will also be valid against the TEI_all schema.
To create electronic text with the primary purpose of keyword searching and linking to page images. The primary advantage in using the TEI at this very strictly limited level of encoding is that a TEI header is attached to the text file.
The text is subordinate to the page image, and is not intended to stand alone as an electronic text (without page images). Level 1 texts are not intended to be adequate for textual analysis; they are more likely to be suited to the goals of a preservation unit or mass digitization initiative. Though their encoding is minimal, Level 1 texts are fully valid XML texts. In addition to taking advantage of the TEI header, these texts, while lightly encoded, can be easily combined with more richly encoded texts (that also follow these guidelines) for searching. Further encoding based on document structures or content analysis can be added to a Level 1 text at any time.
Texts at Level 1 can be created and encoded by fully automated means. Page images are scanned and processed using OCR, but the text is left uncorrected ("dirty OCR"). Page images are tagged using software that assigns page-level metadata (page number and possibly tags for page features) to each page image for display in the user interface in a list of pages. Encoding is performed automatically: markup with page-level metadata is inserted at selected points into the dirty OCR text, generating a valid XML document. This encoding is both minimal and reliable, and does not typically require extensive review of each page of each text.
| <div> or <div1> | There should be only one child of <body>: a single <div> (or <div1>). |
| <ab> | There should be only one child of the <div> (or <div1>): a single <ab> wrapping all of the OCR text. If the text is ever “upgraded” to Level 3 or higher, the <ab> element will be replaced by structural elements like <p> and <table>. |
| <pb> or <facsimile> | See the explanation above for how to link between the encoded text and images of source documents. If using <pb>, it is recommended to put the element within an <ab> element. |
For technical reasons, the TEI namespace is not displayed in examples. However, a TEI namespace declaration is required. It is typically given once on the <TEI> root element, e.g. <TEI xmlns="http://www.tei-c.org/ns/1.0">.
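A minimal sketch of a Level 1 <text>, following the table above: a single <div> in <body>, a single <ab> wrapping all of the uncorrected OCR text, and a <pb> inside the <ab> at the start of each page. Page numbers and text are placeholders, and the mechanism for linking each <pb> to its page image is omitted here.

```xml
<text>
  <body>
    <div>
      <ab>
        <pb n="1"/>
        Uncorrected OCR text of the first page ...
        <pb n="2"/>
        Uncorrected OCR text of the second page ...
      </ab>
    </div>
  </body>
</text>
```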
Note that this is a ‘syntactically conformant’ customization, in that documents that are valid against this scheme will also be valid against the TEI_all schema. However, it is unknown whether or not it is truly ‘TEI conformant’, as the TEI Guidelines do not make clear whether or not encoding of individual paragraphs is mandatory.
To create electronic text for full-text searching, linking to page images, and identifying simple structural hierarchy to improve navigation. (For example, you can create a table of contents from such encoding.)
The text is mainly subordinate to the page image, though navigational markers (textual divisions, headings) are captured. However, the text could stand alone as electronic text (without page images) if the accuracy of its contents is suitable to its intended use and it is not necessary to display low-level typographic or structural information. Use cases for Level 2 require a set of elements more granular than those of Level 1, including bibliographic or structural information below the monographic or volume level. One of the motivations for using Level 2 is to avoid expensive analysis of textual elements and/or the expense of accurate text conversion, e.g., double-keying or detailed proofreading of automatic OCR.
For the most part, Level 2 texts are not intended to be displayed separately from their page images. Level 2 encoding of sections and headings provides greater navigational possibilities than Level 1 encoding, and enables searching to be restricted within particular textual divisions (for example, searching for two phrases within the same chapter).
Level 2 generally can be created and encoded by automated means. Pagination is identified as in Level 1, and metadata for the textual divisions is created, likely based on the page images. The textual division metadata might contain the page number on which the division begins and a transcription of that division's heading. This metadata is inserted into the raw OCR at the appropriate points, forming a valid XML document. Level 2 texts do not require any special knowledge or manual intervention below the section level.
| <front>, <back> | Optional. Contains one or more <div> or <div1>. |
| <body> | Contains one or more <div> or <div1>. |
| <div1> or <div> | Unlike in Level 1, in Level 2 one <div> or <div1> is used per section of the text identified with division-level metadata. If no @type attribute is specified, a @type value of section should be presumed. |
| <head> | Recommended if headings are present. As in the TEI, this element must be the first child of a <div> or <div1>. |
Note that for technical reasons the namespace is not shown in these examples, but it should always be supplied on the root <TEI> element, e.g.: <TEI xmlns="http://www.tei-c.org/ns/1.0">.
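By comparison with Level 1, a Level 2 <body> has one <div> per identified section, each beginning with its <head>. This sketch assumes that, as in Level 1, the OCR text within each division is wrapped in an <ab>; the @type values, @n values, and headings are hypothetical.

```xml
<text>
  <body>
    <div type="chapter" n="1">
      <head>Chapter I</head>
      <ab>
        <pb n="1"/>
        Uncorrected OCR text of chapter I ...
        <pb n="2"/>
        ...
      </ab>
    </div>
    <div type="chapter" n="2">
      <head>Chapter II</head>
      <ab>
        Uncorrected OCR text of chapter II ...
        <pb n="15"/>
        ...
      </ab>
    </div>
  </body>
</text>
```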
Note that this is intended to be a ‘TEI conformant’ customization, per P5 section 23.3.
To create a stand-alone electronic text and identify hierarchy (logical structure) and typography without content analysis being of primary importance.
Encoding at this level provides the foundation for upgrading to higher levels of encoding. Level 3 generally requires some human editing, but the features to be encoded are determined by the logical structure and appearance of the text and not specialized content analysis.
Level 3 texts identify front and back matter, textual divisions, and all paragraph breaks. Floating texts, or sub-texts like a poem or letter embedded in the greater text, are supported in this level. The finer granularity of encoding these features, as well as figures, notes, and all changes of typography, allows a range of options for display, delivery, and searching. For example, one has the option of identifying, and therefore specifying, the display characteristics of different typographic styles, and regularizing the display and placement of note text.
Level 3 texts can stand alone as text without page images, and therefore can be uploaded, downloaded, and delivered quickly, and require less storage space than digital collections with page images. However, the simple level of structural analysis and absence of specialized content analysis reflected in Level 3 encoding may make it desirable for some, depending on project priorities, to include page images in order to provide users with a fuller set of resources.
Level 3 texts can be created by conversion from an electronic source such as an HTML file or word-processor document or from a print source, either through OCR or keyboarding. They can be generated trivially by converting from outsourced double-keyboarded texts conforming to TEI Tite, though some granularity of encoding will be lost in the translation.
| <front>, <back> | Recommended if present. |
| <div> or <div1> | At least one is recommended within each of <front>, <body>, and <back>; @type attribute is recommended. |
| <p> | Recommended for paragraph breaks in prose. |
| <lg> and <l> | Recommended for identifying groups of lines and lines, respectively. |
| <figure> and appropriate child elements | Recommended to refer to illustrative images and descriptive information about those images. |
| <floatingText> | Optionally used to indicate a floating text. |
| <note> | Recommended for notes. |
| <ptr> and <ref> | If a table of contents is encoded, recommended for linking to sections of the document. If notes are encoded at the point they occur in the text or at another point convenient when converting from a born-digital source document, recommended for encoding the point of reference. |
| <hi> | Recommended to indicate changes in typeface; @rend attribute is optional. |
| <list> and <item> | Optionally used to indicate ordered and unordered list structures. |
| <table>, <row>, and <cell> | Optionally used to indicate table structures. |
| <lb> | Optionally used to indicate line breaks. |
| <cb> | Optionally used to indicate column breaks. |
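The following sketch shows several of the Level 3 elements listed above used together. The heading, text, image file name, and @rend value are placeholders; @rend is optional on <hi>, and the value "italic" is an assumption rather than a prescribed vocabulary.

```xml
<body>
  <div type="chapter">
    <head>Chapter I</head>
    <pb n="1"/>
    <p>A paragraph of running prose with an <hi rend="italic">italicized</hi> phrase.</p>
    <figure>
      <graphic url="figure01.jpg"/>
      <head>Plate 1</head>
      <figDesc>Description of the illustration.</figDesc>
    </figure>
    <lg type="stanza">
      <l>First line of verse,</l>
      <l>second line of verse.</l>
    </lg>
    <list>
      <item>First item</item>
      <item>Second item</item>
    </list>
  </div>
</body>
```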
Running heads, catchwords, signatures, and other artifacts derived from printing should not be included in Level 3. Page numbers likewise are not transcribed as text; instead, they are recorded using the @n attribute on <pb>. If upgrading a text from Level 1 or Level 2 that was generated using OCR, discard the forme work information.
You may choose not to include front matter content such as a table of contents or list of illustrations, especially if you plan to generate these automatically. If, however, you plan to encode the table of contents (or lists of illustrations and similar content) manually, use a <div> (or <div1>) element with an appropriate @type attribute (e.g., <div type="contents">). Within this division, use the <list> element to mark up the table of contents, list of illustrations, etc. Each list item should contain a <ptr> or <ref> element whose @target attribute references the @xml:id attribute on the <pb> or on the <div> (or <div1>) of the referenced page or section. Use <ref> if you wish to transcribe page numbers in the table of contents; use <ptr> if you do not.
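For instance, a manually encoded table of contents might be sketched as below. The first entry uses <ref> because its page number is transcribed and points to a division; the second uses an empty <ptr> pointing to a page break. The @xml:id values (ch1, p37) and headings are hypothetical.

```xml
<text>
  <front>
    <div type="contents">
      <head>Contents</head>
      <list>
        <!-- page number transcribed: use <ref> -->
        <item>Chapter I. Early Years <ref target="#ch1">1</ref></item>
        <!-- page number not transcribed: use <ptr> -->
        <item>Chapter II. Later Years <ptr target="#p37"/></item>
      </list>
    </div>
  </front>
  <body>
    <div type="chapter" xml:id="ch1">
      <head>Chapter I. Early Years</head>
      <p>...</p>
    </div>
    <pb n="37" xml:id="p37"/>
    <div type="chapter">
      <head>Chapter II. Later Years</head>
      <p>...</p>
    </div>
  </body>
</text>
```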
Use the <note> element to encode the text of a margin note, footnote, endnote, or other note found in the source document. This element may be used for encoding notes "inline" at the point of reference (such as where a superscript number appears), as in the Alger Hiss example below. In the case of conversion from OCR and from some born-digital source documents, this will require manual intervention to move the text of the note to the place of reference.
For margin notes, use <note place="margin">. Optionally combine notes that extend beyond one page into one <note>.
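As a sketch, a footnote moved to its point of reference and a margin note might be encoded as follows. The text and the @n value are placeholders; only place="margin" reflects the recommendation here, and no @place value is assumed for the footnote.

```xml
<div>
  <p>The committee first met that spring.<note n="1">Text of the footnote,
    moved here from the foot of the page.</note> The meetings continued for weeks.</p>
  <p>A paragraph with a gloss in the margin.<note place="margin">Text of the
    margin note.</note></p>
</div>
```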
Note that for technical reasons the namespace is not shown in these examples, but it should always be supplied on the root <TEI> element, e.g.: <TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="MBFG0236">.
Note that this is intended to be a ‘TEI conformant’ customization, per P5 section 23.3.
To create text that can stand alone as electronic text, identifies hierarchy and typography, specifies function of textual and structural elements, and describes the nature of the content and not merely its appearance. This level is not meant to encode or identify all structural, semantic, or bibliographic features of the text.
Finally, functionally accurate encoding in Level 4 texts allows them to be searched or displayed in sophisticated ways. For example, a searcher could limit his or her search in a dramatic text to stage directions or in a verse text to only first lines. In a political tract published by subscription, a search could be confined to names that appear in lists, thus limiting a search to names of people who subscribed to a particular volume. This ability to limit searches becomes more significant as textbases become larger, and thus is of great importance to the library community as it attempts to build into the initial design and implementation of textbases the features needed to enhance interoperability.
Text is generated by keyboarding (likely outsourced double keyboarding from page images using TEI Tite) or possibly by correcting OCR text using software that identifies spelling mistakes and consults a log from the OCR software to find regions of uncertainty in the OCR text. If converting from TEI Tite, minimal additional markup should be added, as discussed in Appendix A of TEI Tite.
Use all elements specified in Levels 1, 2, and 3 except <ab>, plus elements in the following table. Note that some of these elements are defined in Level 3 as well, but their use in Level 4 is more strict.
| <titlePage> and appropriate child elements | Recommended. |
| <group> | Recommended to encode a collection of independent texts that are regarded as a single group for processing or other purposes. |
| <div> or <div1>, <div2>, <div3>, etc. | Recommended for encoding a hierarchy of textual divisions. Use as many levels of hierarchy as needed to represent the source document. |
| <head> | Recommended if headings are present. As in TEI, this element must be the first child of a textual division. |
| <floatingText> | Recommended when a floating text is identified. |
| <list> and <item> | Recommended to indicate ordered and unordered list structures. |
| <table>, <row>, and <cell> | Recommended to indicate table structures. |
| <hi> | Recommended to indicate change in rendition when a more specific element is not being used; @rend attribute is optional. |
| <opener>, <dateline>, <salute>, <closer>, <signed>, <postscript> | Recommended to indicate specific parts of letters. |
| <castList>, <castItem>, <sp>, <speaker>, and <stage> | Recommended to encode different structures in performance texts (i.e. drama). |
| <sp> and <speaker> | Recommended to encode oral history interviews. |
| <epigraph> | Recommended for encoding epigraphs found as front matter. |
| <quote rend="___"> | Recommended for encoding blockquotes that appear outside the flow of a paragraph. In the @rend attribute, give a CSS declaration-block (such as padding-left: 0.5in;) |
| <argument> | Recommended to encode a list of topics sometimes found at the start of a chapter or other textual division. |
| <trailer> | Recommended to encode a closing title or footer at the end of a division. |
| <quote>, <said>, <mentioned>, or <soCalled> | Optional. |
| <emph>, <foreign>, <gloss>, or <term> | Optional. |
| <title type="_"> | Optional within the <text> (not the <teiHeader>), especially when text is typographically distinct. Optionally use the @type attribute with a value as given in the full TEI guidelines, except for main titles. (The value main should be used, when appropriate, for <title>s within a TEI header, but is not needed for <title>s elsewhere in a document.) |
| <ptr> and <ref> | In addition to using to point to notes (as in Level 3), optionally use for identifying cross-references within the text. |
| <sic>, <corr>, or <choice> | Optionally use to encode errors or typos. |
| <add>, <del>, <gap>, and <unclear> | Optionally use to encode material that is added, marked for deletion, or is illegible, invisible, or inaudible. |
| <persName>, <placeName>, <geogName>, and <orgName> | Optionally use to encode personal, place, and organizational names used in a text. |
| <listPerson>, <listPlace>, and <listOrg> | Optionally use in support of normalizing personal, place, and organizational names and to capture additional information about the names. These lists should be kept in an external TEI file or database so that the names are easier to maintain. |
| <listBibl> | Optionally use in support of bibliographies. Should contain a series of <bibl> elements, which may be further encoded using elements such as <author>, <title>, <publisher>, <biblScope>. |
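For example, the letter-part elements from the table above might be combined as follows within a division for a letter. The place, date, salutations, and signature are hypothetical.

```xml
<div type="letter">
  <opener>
    <dateline>Springfield, <date when="1901-03-04">March 4, 1901</date></dateline>
    <salute>Dear Sir,</salute>
  </opener>
  <p>Body of the letter ...</p>
  <closer>
    <salute>Yours very truly,</salute>
    <signed>John Doe</signed>
  </closer>
  <postscript>
    <p>P.S. A short postscript.</p>
  </postscript>
</div>
```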
There are many optional but not recommended elements at Level 4. While content for many of these elements can be identified within running prose based on changes in typography or use of quotation marks in the source document, such content is not always so easily identified, or it may occur so often that identification of each instance is impractical. Use only those optional elements that are appropriate for your users' needs and your encoding budget.
The presence of common front matter referring to the whole collection, possibly in addition to front matter relating to each individual text, is a good indication that a given text might usefully be encoded in this way.
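For instance, a volume of separately titled essays sharing a collective title page might be sketched as below, with the shared front matter on the outer <text> and each essay as its own <text> inside <group>. The titles are hypothetical.

```xml
<text>
  <front>
    <titlePage>
      <docTitle>
        <titlePart type="main">Collected Essays</titlePart>
      </docTitle>
    </titlePage>
  </front>
  <group>
    <text>
      <body>
        <div type="essay">
          <head>The First Essay</head>
          <p>...</p>
        </div>
      </body>
    </text>
    <text>
      <body>
        <div type="essay">
          <head>The Second Essay</head>
          <p>...</p>
        </div>
      </body>
    </text>
  </group>
</text>
```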
Names should be encoded using <persName>, <placeName>, <geogName>, and <orgName> elements with the @ref or @key attribute providing a reference to a <person>, <place>, or <org> element in an external file or database for managing name normalization and compilation of additional information such as biographical or geospatial information. See the discussion of @ref and @key above for how to choose between them.
If using @key, provide a unique internal identifier, such as in a local database.
If using @ref, an external TEI file may contain an entry for each name, grouped accordingly under <listPerson>, <listPlace>, and <listOrg>, which is uniquely identified with an @xml:id attribute. In such a case the value of the @ref attribute in the main TEI document (the transcription of the source document) references the value of the @xml:id attribute in the external file. (In the examples below, the external file is named context.xml for ‘contextual information’ and is in the same directory as the source file, but it may be named anything and placed anywhere that can be referenced by a URI.)
When referencing external files or databases, it is strongly recommended to provide an explanation in the <editorialDecl> section of the TEI header. References to controlled vocabularies and national or local authority files may be signified by a prefix in the @xml:id attribute (e.g., tgn_0000000 for the Getty Thesaurus of Geographic Names). When referencing a controlled vocabulary be sure to specify this information in the <classDecl> section of the TEI header.
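A sketch of the @ref approach, assuming an external file named context.xml in the same directory as the transcription. The @ref values in the transcription point to @xml:id values in that file. All identifiers and names are hypothetical, and tgn_0000000 is the placeholder pattern used above, not a real TGN identifier; where the lists sit inside context.xml is left open here.

```xml
<!-- In the transcription of the source document -->
<p><persName ref="context.xml#smith_jane">Mrs. Smith</persName> traveled to
  <placeName ref="context.xml#tgn_0000000">the capital</placeName> on behalf of
  <orgName ref="context.xml#org_library_society">the Society</orgName>.</p>

<!-- In context.xml -->
<listPerson>
  <person xml:id="smith_jane">
    <persName>Smith, Jane</persName>
  </person>
</listPerson>
<listPlace>
  <place xml:id="tgn_0000000">
    <placeName>Hypothetical City</placeName>
  </place>
</listPlace>
<listOrg>
  <org xml:id="org_library_society">
    <orgName>Hypothetical Library Society</orgName>
  </org>
</listOrg>
```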
If the embedded text is more than a short quotation, use <floatingText> even if the instance is still only an excerpt of the embedded text.
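For example, a poem embedded in a narrative might be encoded as a <floatingText> between the surrounding paragraphs, as in this sketch. The @type value and the text are placeholders.

```xml
<div type="chapter">
  <p>She enclosed the following verses in her letter:</p>
  <floatingText type="poem">
    <body>
      <lg type="stanza">
        <l>First line of the embedded poem,</l>
        <l>second line of the embedded poem.</l>
      </lg>
    </body>
  </floatingText>
  <p>The narrative resumes here.</p>
</div>
```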
Within the front matter (<front>) of a performance text, cast lists must be encoded as <castList>s, with each item in that list encoded as a <castItem>. If desired, each <castItem> may be uniquely identified with an @xml:id attribute.
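A sketch of a cast list in the front matter with identified <castItem>s. Using @who on <sp> to point back to a <castItem> is an assumption about local practice, not a requirement stated above; the roles and dialogue are hypothetical.

```xml
<text>
  <front>
    <castList>
      <castItem xml:id="king">The King</castItem>
      <castItem xml:id="messenger">A Messenger</castItem>
    </castList>
  </front>
  <body>
    <div type="act" n="1">
      <head>Act I</head>
      <stage>Enter the King.</stage>
      <sp who="#king">
        <speaker>King.</speaker>
        <p>A line of dialogue.</p>
      </sp>
    </div>
  </body>
</text>
```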
Speakers in oral history interviews, i.e. interviewee(s) and interviewer(s), may be identified in the <teiHeader> as a list of <author> elements (typically each with a single <persName>) within <fileDesc> / <titleStmt>.
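As a sketch, the interviewee and interviewer might be listed as <author>s in the <titleStmt> and then referenced from each <sp> in the body. Pointing @who at the @xml:id on each <persName> is an assumption about local practice; names, identifiers, and dialogue are hypothetical.

```xml
<!-- In fileDesc/titleStmt -->
<titleStmt>
  <title>Oral history interview with Jane Doe</title>
  <!-- interviewee -->
  <author><persName xml:id="doe_jane">Doe, Jane</persName></author>
  <!-- interviewer -->
  <author><persName xml:id="roe_richard">Roe, Richard</persName></author>
</titleStmt>

<!-- In the body -->
<sp who="#roe_richard">
  <speaker>ROE:</speaker>
  <p>Where were you born?</p>
</sp>
<sp who="#doe_jane">
  <speaker>DOE:</speaker>
  <p>In a small town.</p>
</sp>
```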
Use <lg> and <l> as in Level 3. In addition, use the @rend attribute to indicate lines that are indented.
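For instance, an indented second line might be marked as below. The CSS declaration in @rend mirrors the convention given above for block quotations and is an assumption; use whatever @rend values your project documents.

```xml
<lg type="stanza">
  <l>A first line, set flush left,</l>
  <l rend="padding-left: 2em;">a second line, indented,</l>
  <l>a third line, flush left again.</l>
</lg>
```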
Note that for technical reasons the namespace is not shown in this example, but it should always be supplied on the root <TEI> element, e.g.: <TEI xmlns="http://www.tei-c.org/ns/1.0">.
Level 5 texts are those that require substantial human intervention by encoders with subject knowledge. These texts might include encodings of semantic, linguistic, prosodic, or other features well beyond the basic structural elements discussed in Levels 1-4 above. They might also include elements for editorial, critical, or analytical additions; manuscript descriptions; translations; or other textual apparatus. It is impossible to make concrete recommendations for encoding at this level since the scholarly analysis required is usually specific to each project; instead, Level 5 offers the full set of P5 elements as needed.
To create deeply analytical encoded texts that might be appropriate for specific research purposes, as part of a scholarly publishing project, or for other purposes in library-based text encoding.
A significant number of library-based projects engage in high-level analytical text encoding as part of their efforts in digitization, scholarly editing, academic support, or other research. Level 5 is intended to represent that work, which can take advantage of the full richness of the complete TEI Guidelines, while still acknowledging the impact of library-specific practices on encoded text that is created under the auspices of a library.
The specific influences of library practice on a Level-5 encoded text are expressed primarily in adherence to the General Recommendations and TEI Header sections above.
Because of the vast range of possibilities for Level-5 encoding, these Best Practices have chosen to provide neither a list of recommended elements nor any specific examples for this Level.
Please refer to the TEI Header section above for recommendations for the <teiHeader>, and to the General Recommendations section and the Complete TEI P5 Guidelines for element recommendations and usage examples within the <text>.
This document is the work of a group of individuals with a range of experience in TEI text encoding, who came together under the umbrellas of the TEI Special Interest Group on Libraries and the Digital Library Federation. We would like to thank and acknowledge all of those who have given their time and expertise to develop these Best Practices.
Lastly, we would like to thank the Digital Library Federation (DLF) for sponsoring two in-person meetings as part of the Spring 2008 Forum in Minneapolis, Minnesota, and the Spring 2009 Forum in Raleigh, North Carolina, in support of our revision work. The DLF also provided teleconferencing support for our regularly scheduled meetings.
The following is a list of attributes recommended for use in TEI documents created according to these Best Practices. It includes both attributes explicitly mentioned in these Best Practices as well as those that the contributors to these Best Practices find likely to be used with the elements mentioned.
This document was formerly known as TEI Text Encoding in Libraries: Guidelines for Best Encoding Practices.
The Text Encoding Initiative Guidelines for Electronic Text Encoding and Interchange (referred to as the TEI Guidelines) were first published in 1994 and represent a tremendous achievement in electronic text standards by providing a highly sophisticated structure for encoding electronic text. Digital librarians have benefited greatly from the standardization provided by these guidelines, and the potential for interoperability and long-term preservation of digital collections facilitated by their wide adoption.
In 1998, the Digital Library Federation (DLF) sponsored the TEI and XML in Digital Libraries Workshop at the Library of Congress to discuss the use of the TEI Guidelines in libraries for electronic text, and to create a set of best practices for librarians implementing them. From this workshop, three working groups were formed, the members of which represented some of the largest and most mature digital library programs in the U.S.
Group 1 was charged to recommend some best practices for TEI header content and to review the relationship between the Text Encoding Initiative header and MARC. To this end, representatives of the University of Virginia Library and the University of Michigan Library gathered in Ann Arbor in early October 1998 to develop a recommended practice guide. This work was assisted by similar efforts that had taken place in the United Kingdom under the auspices of the Oxford Text Archive the previous year. The section on the header is based on a draft of those recommended practices. It was submitted to various constituencies for comment. In 2008 and 2009, it was heavily revised by Melanie Schlosser, Kevin Hawkins, and other members of the TEI SIG on Libraries.
At the ALA Midwinter Meeting (January 1999), the DLF task force revised a draft set of best practices, called TEI Text Encoding in Libraries: Guidelines for Best Practices (often referred to as TEI in Libraries Guidelines). The revised recommendations were circulated to the conference working group in May 1999 and presented at the joint annual meeting of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing in June 1999. Version 1.0 was circulated for comments in August 1999. These guidelines were endorsed by the DLF, and have been used by many digital libraries, including those of the task force members, as a model for their own local best practices. Libraries, museums, and end-users have benefited from a set of best practices for electronic text in a number of ways, including better interoperability between electronic text collections, better documented practices among digital libraries, and a starting point for discussion of best practices with commercial publishers regarding electronic text creation.
Written in 1998, this first iteration of TEI in Libraries Guidelines made no mention of XML, XSLT, or any of the other powerful tools that have now become common parlance and practice in creating digital documents and collections. Based on these important changes in markup technology, it came to the attention of the DLF and members of the original Task Force that the TEI in Libraries Guidelines required substantial revision. In 2002, the TEI Consortium published a new edition of the complete TEI Guidelines that conformed to XML specifications. In order to remain useful, the TEI in Libraries Guidelines had to be updated to reflect these developments.
Furthermore, librarians need more guidance than the original TEI in Libraries Guidelines provided. Many library-specific encoding issues need to be addressed and documented to ensure consistency. This document is intended to provide recommended encoding paths for these issues.
In addition, these library guidelines could be much more useful if they also served as a training document from which librarians can learn about text encoding and how to address particular encoding challenges. To fulfill this role, the guidelines need more examples and more detailed explanations documenting the use of TEI in a library context. Librarians also need a set of standards and best practices for vendors and publishers who create electronic text for digital libraries, so that these collections adhere to the same archival standards as locally created electronic text collections. With detailed guidelines that could serve as an encoding specification, librarians might encourage vendors to follow these principles, facilitating the long-term preservation of commercially published electronic text collections and making cross-collection searching easier.
The group then released Version 2.1 in March 2006.
Work continued through conference calls, in which Renee McBride (University of North Carolina, Chapel Hill) and Richard Wisneski (Case Western Reserve University) also participated, and at a DLF-sponsored meeting held as part of the DLF Spring Forum in Raleigh, North Carolina, on May 6, 2009.
In April 2009, a year after the revision work began, the significantly revamped Best Practices, soon to be known as Best Practices for TEI in Libraries (version 3), were disseminated for public comment. At DLF that year, a Birds-of-a-Feather session entitled TEI Text Encoding in Libraries was held to gather public feedback in person. Comments received at that session, from the TEILIB-L listserv, through a survey, and by direct email were gathered and prioritized at the DLF meeting. Renee McBride (University of North Carolina, Chapel Hill) agreed to map header elements to MARC elements, and Vitus Tang (Stanford University) provided valuable comments. In addition to addressing most of the comments received, it was resolved that Syd Bauman would generate an ODD specification (One Document Does it all: schema, prose documentation, etc.) for levels 1-4, further ensuring the interoperability of texts encoded according to these Best Practices.
The revised Best Practices contain updated versions of the widely adopted encoding ‘levels’, ranging from fully automated conversion to content analysis and scholarly encoding. They also include a substantially revised section on the TEI header, designed to support interoperability between text collections and the use of complementary metadata schemas such as MARC. Finally, they explore the relationship between METS and TEI, as well as the relationship between these Best Practices and the new vendor specification, TEI Tite.
The new Best Practices also reflect an organizational shift. Whereas earlier versions were authored by the DLF-sponsored TEI Task Force, the current revision is a partnership between members of the Task Force and the TEI SIG on Libraries. As a result of this partnership, responsibility for the Best Practices will migrate to the SIG, allowing closer work with the TEI Consortium as a whole and providing a stronger basis for advocating for the needs of libraries in future TEI releases.