The Association for Computers and the Humanities (ACH)
The Association for Computational Linguistics (ACL)
The Association for Literary and Linguistic Computing (ALLC)
Guidelines for Electronic Text Encoding and Interchange
Edited by C. M. Sperberg-McQueen and Lou Burnard
TEI P3 Text Encoding Initiative Chicago, Oxford
Copyright (c) 1990, 1992, 1993, 1994 ACH, ACL, ALLC
16 May 1994
Revised Reprint, Oxford, May 1999
In memoriam
Donald E. Walker
22 November 1928 - 26 November 1993
Introductory Note (May 1999)
Typographic corrections made
Specific changes in the DTD
Outstanding errors
Note
Acknowledgments
TEI Working Committees (1990-1993)
Advisory Board
Steering Committee Membership
Changes from TEI P1 to TEI P3
1 About These Guidelines
1.1 Structure and Notational Conventions of this Document
1.1.1 Structure
1.1.2 Notational Conventions
1.2 Underlying Principles and Intended Use
1.2.1 Design Principles of the TEI Scheme
1.2.2 Intended Use
1.2.2.1 Use in Text Capture and Text Creation
1.2.2.2 Use for Interchange
1.2.2.3 Use for Local Processing
1.3 Historical Background
1.3.1 Origin and Development of the TEI
1.3.2 Future Developments
2 A Gentle Introduction to SGML
2.1 What's Special about SGML?
2.1.1 Descriptive Markup
2.1.2 Types of Document
2.1.3 Data Independence
2.2 Textual Structure
2.3 SGML Structures
2.3.1 Elements
2.3.2 Content Models: An Example
2.4 Defining SGML Document Structures: The DTD
2.4.1 An Example DTD
2.4.2 Minimization Rules
2.4.3 Content Model
2.4.4 Occurrence Indicators
2.4.5 Group Connectors
2.4.6 Model Groups
2.5 Complicating the Issue: More on Element Declarations
2.5.1 Exceptions to the Content Model
2.5.2 Concurrent Structures
2.6 Attributes
2.7 SGML Entities
2.8 Marked Sections
2.9 Putting It All Together
2.9.1 The SGML Declaration
2.9.2 The DTD
2.9.3 The Document Instance
2.9.4 Ancillary Files
2.10 Using SGML
3 Structure of the TEI Document Type Definition
3.1 Main and Auxiliary DTDs
3.2 Core, Base, and Additional Tag Sets
3.2.1 The Core Tag Sets
3.2.2 The Base Tag Sets
3.2.3 The Additional Tag Sets
3.2.4 User-Defined Tag Sets
3.3 Invocation of the TEI DTD
3.4 Combining TEI Base Tag Sets
3.5 Global Attributes
3.6 The TEI2.DTD File
3.6.1 Structure of the TEI2.DTD File
3.6.2 Embedding Local Modifications
3.6.3 Embedding the Core Tag Sets
3.6.4 Embedding the Base Tag Set
3.6.5 Embedding the Additional Tag Sets
3.7 Element Classes
3.7.1 Classes Which Share Attributes
3.7.2 Classes Used in Content Models
3.7.3 The TEICLAS2.ENT File
3.7.4 Low-Level Element Classes
3.7.5 High-Level Element Classes
3.7.6 Elements Marked for Text Type
3.7.7 Standard Content Models
3.7.8 Components in Mixed and General Bases
3.7.9 Miscellaneous Content-Model Classes
3.8 Other Parameter Entities in TEI DTDs
3.8.1 Inclusion and Exclusion of Elements
3.8.2 Parameter Entities for Element Generic Identifiers
3.8.3 Parameter Entities for TEI Keywords
4 Characters and Character Sets
4.1 Local Character Sets
4.1.1 Characters Available Locally
4.1.2 Characters Not Available Locally
4.2 Shifting Among Character Sets
4.3 Character Set Problems in Interchange
4.4 The Writing System Declaration
5 The TEI Header
5.1 Organization of the TEI Header
5.1.1 The TEI Header and Its Components
5.1.2 Types of Content in the TEI Header
5.2 The File Description
5.2.1 The Title Statement
5.2.2 The Edition Statement
5.2.3 Type and Extent of File
5.2.4 Publication, Distribution, etc.
5.2.5 The Series Statement
5.2.6 The Notes Statement
5.2.7 The Source Description
5.2.8 Computer Files Derived from Other Computer Files
5.2.9 Computer Files Composed of Transcribed Speech
5.3 The Encoding Description
5.3.1 The Project Description
5.3.2 The Sampling Declaration
5.3.3 The Editorial Practices Declaration
5.3.4 The Tagging Declaration
5.3.5 The Reference System Declaration
5.3.5.1 Prose Method
5.3.5.2 Stepwise Method
5.3.5.3 Milestone Method
5.3.6 The Classification Declaration
5.3.7 The Feature System Declaration
5.3.8 The Metrical Declaration Element
5.3.9 The Variant-Encoding Method Element
5.4 The Profile Description
5.4.1 Creation
5.4.2 Language Usage
5.4.3 The Text Classification
5.5 The Revision Description
5.6 Minimal and Recommended Headers
5.7 Note for Library Cataloguers
6 Elements Available in All TEI Documents
6.1 Paragraphs
6.2 Treatment of Punctuation
6.3 Highlighting and Quotation
6.3.1 What Is Highlighting?
6.3.2 Emphasis, Foreign Words, and Unusual Language
6.3.2.1 Foreign Words or Expressions
6.3.2.2 Emphatic Words and Phrases
6.3.2.3 Other Linguistically Distinct Material
6.3.3 Quotation
6.3.4 Terms, Glosses, and Cited Words
6.3.5 Some Further Examples
6.4 Names, Numbers, Dates, Abbreviations, and Addresses
6.4.1 Referring Strings
6.4.2 Addresses
6.4.3 Numbers and Measures
6.4.4 Dates and Times
6.4.5 Abbreviations and Their Expansions
6.5 Simple Editorial Changes
6.5.1 Correction of Apparent Errors
6.5.2 Regularization and Normalization
6.5.3 Additions, Deletions and Omissions
6.6 Simple Links and Cross References
6.7 Lists
6.8 Notes, Annotation, and Indexing
6.8.1 Notes and Simple Annotation
6.8.2 Index Entries
6.9 Reference Systems
6.9.1 Using the ID and N Attributes
6.9.2 Creating New Reference Systems
6.9.3 Milestone Tags
6.9.4 Declaring Reference Systems
6.10 Bibliographic Citations and References
6.10.1 Elements of Bibliographic References
6.10.2 Components of Bibliographic References
6.10.2.1 Analytic, Monographic, and Series Levels
6.10.2.2 Authors, Titles, and Editors
6.10.2.3 Imprint, Pagination, and Other Details
6.10.2.4 Series Information
6.10.2.5 Notes and Other Additional Information
6.10.2.6 Order of Components within References
6.10.3 Bibliographic Pointers
6.10.4 Relationship to Other Bibliographic Schemes
6.11 Passages of Verse or Drama
6.11.1 Core Tags for Verse
6.11.2 Core Tags for Drama
6.12 Overview of the Core Tag Set
7 Default Text Structure
7.1 Divisions of the Body
7.1.1 Un-numbered Divisions
7.1.2 Numbered Divisions
7.1.3 Numbered or Un-numbered?
7.1.4 Partial and Composite Divisions
7.2 Elements Common to All Divisions
7.2.1 Headings and Trailers
7.2.2 Openers and Closers
7.2.3 Arguments and Epigraphs
7.2.4 Content of Textual Divisions
7.3 Groups of Texts
7.4 Front Matter
7.5 Title Pages
7.6 Back Matter
7.7 DTD Fragment for Default Text Structure
8 Base Tag Set for Prose
9 Base Tag Set for Verse
9.1 Structure of the Base Tag Set for Verse
9.2 Structural Divisions of Verse Texts
9.3 Components of the Verse Line
9.4 Rhyme and Metrical Analysis
9.4.1 Sample Metrical Analyses
9.4.2 Segment-Level versus Line-Level Tagging
9.4.3 Metrical Analysis of Stanzaic Verse
9.5 Rhyme
9.6 Encoding Procedures for Other Verse Features
10 Base Tag Set for Drama
10.1 Front and Back Matter
10.1.1 The Set Element
10.1.2 Prologues and Epilogues
10.1.3 Records of Performances
10.1.4 Cast Lists
10.2 The Body of a Performance Text
10.2.1 Major Structural Divisions
10.2.2 Speeches and Speakers
10.2.3 Stage Directions
10.2.4 Speech Contents
10.2.5 Embedded Structures
10.2.6 Simultaneous Action
10.3 Other Types of Performance Text
10.3.1 Technical Information
11 Transcriptions of Speech
11.1 General Considerations and Overview
11.1.1 Divisions
11.2 Elements Unique to Spoken Texts
11.2.1 Utterances
11.2.2 Pause
11.2.3 Vocal, Kinesic, Event
11.2.4 Writing
11.2.5 Temporal Information
11.2.6 Shifts
11.2.7 Formal Definition
11.3 Elements Defined Elsewhere
11.3.1 Segmentation
11.3.2 Synchronization and Overlap
11.3.3 Regularization of Word Forms
11.3.4 Prosody
11.3.5 Speech Management
11.3.6 Analytic Coding
12 Print Dictionaries
12.1 Dictionary Body and Overall Structure
12.2 The Structure of Dictionary Entries
12.2.1 Hierarchical Levels
12.2.2 Groups and Constituents
12.3 Top-level Constituents of Entries
12.3.1 Information on Written and Spoken Forms
12.3.2 Grammatical Information
12.3.3 Sense Information
12.3.3.1 Definitions
12.3.3.2 Translation Equivalents
12.3.4 Etymological Information
12.3.5 Other Information
12.3.5.1 Examples
12.3.5.2 Usage Information and Other Labels
12.3.5.3 Cross References to Other Entries
12.3.5.4 Notes within Entries
12.3.6 Related Entries
12.4 Headword and Pronunciation References
12.5 Typographic and Lexical Information in Dictionary Data
12.5.1 Editorial View
12.5.2 Lexical View
12.5.3 Retaining Both Views
12.5.3.1 Using Attribute Values to Capture Alternate Views
12.5.3.2 Recording Original Locations of Transposed Elements
12.5.4 Attributes for Dictionary Elements
12.6 Unstructured Entries
13 Terminological Databases
13.1 The Terminological Entry
13.2 Tags for Terminological Data
13.3 Basic Structure of the Terminological Entry
13.3.1 Nested Term Entries
13.3.2 Flat Term Entries Using Rules of Adjacency
13.3.3 Flat Term Entries Using Group and Depend Attributes
13.3.4 References between Term Entries
13.4 Overall Structure of Terminological Documents
13.4.1 DTD Fragment for Nested Style
13.4.2 DTD Fragment for Flat Style
13.5 Additional Examples of Term Entries
13.5.1 Example Term Entry from ISO 472
13.5.2 The Example Treated as a Single Term Entry in Nested Form
13.5.3 The Example Treated as Two Separate Term Entries in Nested Form
13.5.4 The Example Treated as a Flat Term Entry Using Adjacency Rules
13.5.5 The Example Treated as a Flat Term Entry Not Using Adjacency Rules
14 Linking, Segmentation, and Alignment
14.1 Pointers
14.1.1 Pointers and Links
14.1.2 Using Pointers and Links
14.1.3 Groups of Links
14.1.4 Intermediate Pointers
14.2 Extended Pointers
14.2.1 Extended Pointer Elements
14.2.2 Extended Pointer Syntax
14.2.2.1 Location Ladders
14.2.2.2 Location Terms
14.2.2.3 The ROOT Keyword
14.2.2.4 The HERE Keyword
14.2.2.5 The ID Keyword
14.2.2.6 The REF Keyword
14.2.2.7 The CHILD Keyword
14.2.2.8 The DESCENDANT Keyword
14.2.2.9 The ANCESTOR Keyword
14.2.2.10 The PREVIOUS Keyword
14.2.2.11 The NEXT Keyword
14.2.2.12 The PRECEDING Keyword
14.2.2.13 The FOLLOWING Keyword
14.2.2.14 The PATTERN Keyword
14.2.2.15 The TOKEN Keyword
14.2.2.16 The STR Keyword
14.2.2.17 The SPACE Keyword
14.2.2.18 The FOREIGN Keyword
14.2.2.19 The HYQ Keyword
14.2.2.20 The DITTO Keyword
14.2.3 Using Extended Pointers
14.3 Blocks, Segments and Anchors
14.4 Correspondence and Alignment
14.4.1 Correspondence
14.4.2 Alignment of Parallel Texts
14.4.3 A Three-way Alignment
14.5 Synchronization
14.5.1 Aligning Synchronous Events
14.5.2 Placing Synchronous Events in Time
14.6 Identical Elements and Virtual Copies
14.7 Aggregation
14.8 Alternation
14.9 Connecting Analytic and Textual Markup
15 Simple Analytic Mechanisms
15.1 Linguistic Segment Categories
15.2 Global Attributes for Simple Analyses
15.3 Spans and Interpretations
15.4 Linguistic Annotation
16 Feature Structures
16.1 Introduction
16.2 Elementary Feature Structures: Features with Binary Values
16.3 Feature, Feature-Structure and Feature-Value Libraries
16.4 Symbolic, Numeric, Measurement, Rate and String Values
16.5 Structured Values
16.6 Singleton, Set, Bag and List Collections of Values
16.7 Alternative Features and Feature Values
16.8 Boolean, Default and Uncertain Values
16.9 Indirect Specification of Values Using the rel Attribute
16.9.1 The Not-Equals Relation
16.9.2 Other Inequality Relations
16.9.3 Subsumption and Non-subsumption Relations
16.9.4 Relations Holding with Sets, Bags, and Lists
16.9.5 Varieties of Subsumption and Non-subsumption
16.10 Two Illustrations
17 Certainty and Responsibility
17.1 Levels of Certainty
17.1.1 Using Notes to Record Uncertainty
17.1.2 Structured Indications of Uncertainty
17.2 Attribution of Responsibility
18 Transcription of Primary Sources
18.1 Altered, Corrected, and Erroneous Texts
18.1.1 Use of Core Tags for Transcriptional Work
18.1.2 Abbreviation and Expansion
18.1.3 Correction and Conjecture
18.1.4 Additions and Deletions
18.1.5 Substitutions
18.1.6 Cancellation of Deletions and Other Markings
18.1.7 Text Omitted from or Supplied in the Transcription
18.2 Non-Linguistic Phenomena in the Source
18.2.1 Document Hands
18.2.2 Hand, Responsibility, and Certainty Attributes
18.2.3 Damage, Illegibility, and Supplied Text
18.2.4 The Use of the Gap, Del, Damage, Unclear and Supplied Tags in Combination
18.2.5 Space
18.2.6 Lines
18.3 Headers, Footers, and Similar Matter
18.4 Other Primary Source Features not Covered in These Guidelines
19 Critical Apparatus
19.1 The Apparatus Entry, Readings, and Witnesses
19.1.1 The Apparatus Entry
19.1.2 Readings
19.1.3 Indicating Subvariation in Apparatus Entries
19.1.4 Witness Information
19.1.4.1 Witness Detail Information
19.1.4.2 Witness Information in the Source
19.1.4.3 The Witness List
19.1.5 Fragmentary Witnesses
19.2 Linking the Apparatus to the Text
19.2.1 The Location-referenced Method
19.2.2 The Double End-Point Attachment Method
19.2.3 The Parallel Segmentation Method
19.3 Using Apparatus Elements in Transcriptions
20 Names and Dates
20.1 Personal Names
20.2 Place Names
20.2.1 Geo-political Place Names
20.2.2 Geographic Names
20.2.3 Relative Place Names
20.3 Organization Names
20.4 Dates and Time
20.4.1 Absolute Dates and Times
20.4.2 Relative Dates and Times
21 Graphs, Networks, and Trees
21.1 Graphs and Digraphs
21.1.1 Transition Networks
21.1.2 Family Trees
21.1.3 Historical Interpretation
21.2 Trees
21.3 Another Tree Notation
22 Tables, Formulae, and Graphics
22.1 Tables
22.1.1 The TEI Table DTD
22.1.2 Other Table DTDs
22.2 Formulae
22.3 Specific Elements for Graphic Images
22.4 Overview of Basic Graphics Concepts
22.5 Graphic Image Formats
22.5.1 Vector Graphic Formats
22.5.2 Raster Graphic Formats
22.5.3 Photographic and Motion Video Formats
23 Language Corpora
23.1 Varieties of Composite Text
23.2 Contextual Information
23.2.1 The Text Description
23.2.2 The Participants Description
23.2.3 The Setting Description
23.3 Associating Contextual Information with a Text
23.3.1 Combining Corpus and Text Headers
23.3.2 Declarable Elements
23.3.3 Summary
23.4 Linguistic Annotation of Corpora
23.4.1 Levels of Analysis
23.5 Recommendations for the Encoding of Large Corpora
24 The Independent Header
24.1 Definition and Principles for Encoders
24.2 Required and Recommended Tags
24.3 Header Elements and their Relationship to the MARC Record
24.4 MARC Fields for the File Description
24.5 MARC Fields for the Encoding Description
24.6 MARC Fields for the Profile Description
24.7 MARC Fields for the Revision Description
24.8 Structure of the DTD for Independent Headers
25 Writing System Declaration
25.1 Overall Structure of Writing System Declaration
25.2 Identifying the Language
25.3 Describing the Writing System
25.4 Documenting the Character Set and Its Encoding
25.4.1 Base Components of the WSD
25.4.2 Exceptions in the WSD
25.4.3 Documenting Coded Character Sets and Entity Sets
25.4.4 Documenting Transliteration Schemes
25.5 Notes in the WSD
25.6 Linkage between WSD and Main Document
25.7 Predefined TEI WSDs
25.8 Details of WSD Semantics
25.8.1 WSD Semantics: General Principles
25.8.2 Semantics of WSD Base Components
25.8.3 Multiple Base Components
25.8.4 Semantics of Exceptions
25.8.4.1 Case 1: replacement
25.8.4.2 Case 2: merger
25.8.4.3 Case 3: expansion
25.8.5 Merger of Form and Character Elements
26 Feature System Declaration
26.1 Linking a TEI Text to Feature System Declarations
26.2 The Overall Structure of a Feature System Declaration
26.3 Feature Declarations
26.4 Feature Structure Constraints
26.5 A Complete Example
27 Tag Set Documentation
27.1 The TagDoc Documentation Element
27.1.1 The AttList Documentation Element
27.2 Element Classes
27.3 Entity Documentation
28 Conformance
28.1 Definitions of Terms
28.1.1 TEI-Conformant Document
28.1.2 TEI Local Processing Format
28.1.3 TEI Interchange Format
28.1.4 TEI Packed Interchange Format
28.1.5 TEI Recommended Practice
28.1.6 TEI Abstract Model
28.2 Modifications to TEI SGML Declaration
28.3 Modifications to TEI Document Type Declarations
28.4 TEI Processing Model
28.4.1 Document Capture and Reclamation
28.4.2 Local Storage Format and Application Software
28.4.3 Enrichment and Other Processing
28.4.4 Data Export
28.4.5 Data Import
28.4.6 TEI Conformance in the Processing Model
28.5 Aspects of Conformance and Document Description
28.5.1 Character Sets
28.5.2 SGML Declaration
28.5.3 SGML Document Type Declaration
28.5.4 Tag Usage and Feature Marking
28.5.5 Non-SGML Markup
29 Modifying the TEI DTD
29.1 Kinds of Modification
29.1.1 Suppressing Elements
29.1.2 Renaming Elements
29.1.3 Class Extension
29.1.4 New Content Models
29.2 Documenting the Modifications
30 Rules for Interchange
30.1 Negotiated Interchange
30.2 Some Simple Examples
30.3 Non-Negotiated Interchange
30.4 Notes for Implementors
31 Multiple Hierarchies
31.1 Concurrent Markup of Multiple Hierarchies
31.2 Boundary Marking with Milestone Elements
31.3 Fragmentation of Elements
31.4 Reconstitution of Virtual Elements
31.5 Multiple Encodings of the Same Information
31.6 Concurrent Markup for Pages and Lines
32 Algorithm for Recognizing Canonical References
33 Element Classes
34 Entities
35 Elements
36 Obtaining the TEI DTD
37 Obtaining TEI WSDs
38 Sample Tag Set Documentation
38.1 Tag Documentation for the TEI P Element
38.2 Tag Documentation for the TEI HI Element
38.3 Tag Documentation for the TEI Div Element
38.4 Class Documentation for the TEI Divn Class
39 Formal Grammar for the TEI-Interchange-Format Subset of SGML
39.1 Notation
39.2 Grammar for SGML Document (Overview)
39.3 Grammar for SGML Declaration
39.4 Grammar for DTD
39.5 Grammar for Document Instance
39.6 Common Syntactic Constructs
39.7 Lexical Scanner
39.8 Differences from ISO 8879
40 Bibliography
This version generated by p3x2htm on 9 Sep 99