4 The TEI core

Two core tag sets are available to all TEI documents without any special declaration. The first defines a large number of elements which may appear in almost any kind of document, whatever base tag set is in use. The second defines the header, which provides something analogous to a title page for the electronic text.

4.1 Elements available to all bases

The core tag set common to all TEI documents provides means of encoding, with a reasonable degree of sophistication, such textual features as: typographically highlighted phrases (optionally distinguishing highlighting used for emphasis, technical terms, foreign words, titles, etc.); quoted phrases (optionally distinguishing amongst direct speech, quotation, glosses, cited phrases, etc.); "data-like" phrases such as names, numbers and measures, dates and times; lists of all kinds; basic editorial changes (e.g. correction of apparent errors, regularization and normalization, additions, deletions and omissions); simple links and cross-references, providing basic hypertextual features; and facilities for annotation, indexing, bibliographic citations and referencing systems. Few documents exhibit none of these features, and none of them is restricted to any one kind of document. For some of these features an additional tag set is also available, providing more specialized elements for those who wish to encode them in greater detail (for example, for verse and drama, and for names), but the elements defined in the core are believed to be adequate for most applications most of the time.

4.2 The header

The TEI scheme attaches particular importance to the provision of documentary or bibliographic information about electronic texts. Such information is essential for any satisfactory interchange of texts coming from multiple sources, or of texts for which long-term uses are envisaged.

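The TEI header is one of the few mandatory elements in a TEI document. It has four major divisions which together provide a detailed syntax for the documentation of: the electronic text itself, its sources and its publication (the file description); the encoding practices applied to it (the encoding description); its non-bibliographic characteristics, such as its languages and the situation in which it was produced (the profile description); and its revision history.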
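As a rough illustration of how the four divisions fit together, the following Python sketch builds a skeletal header using the element names of the current TEI Guidelines (P5): `fileDesc`, `encodingDesc`, `profileDesc` and `revisionDesc`. It uses only the standard-library `xml.etree.ElementTree` module, omits the TEI namespace for brevity, and fills every field with hypothetical placeholder content (titles, publisher, dates); it is a sketch, not a complete or validated TEI header.

```python
import xml.etree.ElementTree as ET


def minimal_header(title: str, publisher: str, source: str) -> ET.Element:
    """Build a skeletal teiHeader containing the four major divisions.

    Element names follow TEI P5; the namespace is omitted (an assumption
    made for brevity) and all content is placeholder text.
    """
    header = ET.Element("teiHeader")

    # File description: bibliographic description of the electronic text,
    # including its publication and its source(s).
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title
    pub_stmt = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(pub_stmt, "publisher").text = publisher
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = source

    # Encoding description: editorial principles and practices.
    encoding_desc = ET.SubElement(header, "encodingDesc")
    ET.SubElement(encoding_desc, "p").text = "Editorial practices go here."

    # Profile description: languages, setting, participants, etc.
    profile_desc = ET.SubElement(header, "profileDesc")
    lang_usage = ET.SubElement(profile_desc, "langUsage")
    ET.SubElement(lang_usage, "language", ident="en").text = "English"

    # Revision history: one change entry per revision (hypothetical date).
    revision_desc = ET.SubElement(header, "revisionDesc")
    ET.SubElement(revision_desc, "change", when="2024-01-01").text = "File created."
    return header


header = minimal_header(
    "A sample text: an electronic edition",
    "Example Archive",
    "Transcribed from the first printed edition.",
)
print(ET.tostring(header, encoding="unicode"))
```

Only the file description is strictly required in a P5 header; the other three divisions are optional, which is why a real header may contain fewer of them than this sketch.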

The first of these, the file description, contains traditional bibliographic material: the title, intellectual responsibility, and publication or distribution details of an electronic text. This information can readily be translated into a conventional catalogue record, for use by the growing number of forward-thinking academic and public libraries now coming to terms with their new role as curators of non-print electronic materials.
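To make the point about catalogue records concrete, here is a minimal Python sketch that flattens a file description into a simple field/value record. The sample `fileDesc`, the field names of the resulting dictionary, and the mapping itself are illustrative assumptions; a real library conversion would target a cataloguing standard such as MARC, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

# A hypothetical file description using TEI P5 element names (namespace omitted).
FILE_DESC = """
<fileDesc>
  <titleStmt>
    <title>A sample text: an electronic edition</title>
    <author>Anon.</author>
    <respStmt><resp>encoded by</resp><name>A. Encoder</name></respStmt>
  </titleStmt>
  <publicationStmt>
    <publisher>Example Archive</publisher>
    <pubPlace>Example City</pubPlace>
    <date>1994</date>
  </publicationStmt>
  <sourceDesc><p>Transcribed from the first printed edition.</p></sourceDesc>
</fileDesc>
"""


def catalogue_record(file_desc: ET.Element) -> dict[str, str]:
    """Map selected fileDesc fields onto a flat, catalogue-style record."""

    def text(path: str) -> str:
        node = file_desc.find(path)
        if node is None:
            return ""
        # itertext() gathers text from nested children (e.g. <p> inside
        # <sourceDesc>); split/join collapses layout whitespace.
        return " ".join("".join(node.itertext()).split())

    return {
        "title": text("titleStmt/title"),
        "author": text("titleStmt/author"),
        "publisher": text("publicationStmt/publisher"),
        "place": text("publicationStmt/pubPlace"),
        "date": text("publicationStmt/date"),
        "source": text("sourceDesc"),
    }


record = catalogue_record(ET.fromstring(FILE_DESC.strip()))
for field, value in record.items():
    print(f"{field:>9}: {value}")
```

Missing elements simply yield empty strings here, so a sparse header still produces a (sparse) record rather than an error.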

Several commentators, noting that the day-to-day information processing of every sector of the economy now takes place in electronic form only, have expressed concern at the difficulties faced by librarians and archivists in handling these new forms of historical record. Others, trying to come to terms with the wealth of information in "cyberspace", have lamented the absence of any effective cataloguing standards for networked resources and other forms of electronic publication. For creators of language corpora, the provision of such meta-descriptive information is essential, since without it analysis of the full complexity of language use is all but impossible. The TEI header represents a major contribution to overcoming all these problems.

Many electronic texts are essentially derivative works, created by keying or scanning existing print materials, by combining or modifying existing electronic materials, or by both. The source description part of the TEI header allows an encoder to specify the source or sources from which a text has been derived, using traditional bibliographic concepts. The pedigree of a TEI-conformant text can thus be documented, much as a conventional book documents its publishing history. A detailed formal description of the changes made in producing a text can be recorded as a distinct revision history; this is particularly useful for highly dynamic texts.

As noted above, the TEI is not a fixed encoding scheme, but offers a variety of options appropriate to different situations. Consequently, the encoding description within a TEI header is of particular importance to users of an electronic document. It provides, in structured or unstructured form, vital information about editorial conventions or policies, design decisions, and even the selection of tags actually used within the document.
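One part of the encoding description, the record of tags actually used, lends itself to automation. In TEI P5 this record takes the form of `tagUsage` elements (with `gi` and `occurs` attributes) inside a `namespace` element within `tagsDecl`. The Python sketch below counts the elements in a small hypothetical text and builds such a declaration; the sample text is invented, the text's own namespace is omitted for brevity, and this is an illustration rather than a TEI-supplied tool.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A hypothetical text body using a few core elements (namespace omitted).
TEXT = (
    "<text><body>"
    "<p>She said <q>hello</q> and left <foreign>en route</foreign>.</p>"
    "<p>See <title>Hamlet</title> and <title>King Lear</title>.</p>"
    "</body></text>"
)


def tags_decl(text_root: ET.Element) -> ET.Element:
    """Build a tagsDecl recording how often each element occurs in a text."""
    # iter() visits the root and every descendant element in document order.
    counts = Counter(el.tag for el in text_root.iter())
    decl = ET.Element("tagsDecl")
    ns = ET.SubElement(decl, "namespace", name="http://www.tei-c.org/ns/1.0")
    for gi, n in sorted(counts.items()):
        ET.SubElement(ns, "tagUsage", gi=gi, occurs=str(n))
    return decl


decl = tags_decl(ET.fromstring(TEXT))
print(ET.tostring(decl, encoding="unicode"))
```

Generating the declaration from the text itself, rather than maintaining it by hand, keeps the header consistent with the markup as the text is revised.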

The profile description groups together a wide range of additional descriptive information, from the languages used within the text, the situation or social context in which it was produced, and its topics or classification, to the demographic or social characteristics of its authors or participants. No one is likely to need all of these categories of information, but each of them is likely to be essential to some users.

A collection of TEI headers can also be regarded as a distinct document, and an auxiliary DTD is provided to support interchange of headers alone, for example between libraries or archives.
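The idea of interchanging headers on their own can be sketched in a few lines of Python: the example below pulls the `teiHeader` out of each of several documents and gathers them under a single root, leaving the text bodies behind. The two input documents are hypothetical, and the wrapper element name `headerCollection` is an invented placeholder, not a TEI element; the auxiliary DTD mentioned above defines its own structure, which this sketch does not reproduce.

```python
import xml.etree.ElementTree as ET

# Two hypothetical TEI documents, reduced to the essentials (namespace omitted).
DOCS = [
    "<TEI><teiHeader><fileDesc><titleStmt><title>Text A</title></titleStmt>"
    "</fileDesc></teiHeader><text><body><p>Body of A.</p></body></text></TEI>",
    "<TEI><teiHeader><fileDesc><titleStmt><title>Text B</title></titleStmt>"
    "</fileDesc></teiHeader><text><body><p>Body of B.</p></body></text></TEI>",
]


def collect_headers(documents: list[str]) -> ET.Element:
    """Copy each document's teiHeader into one header-only collection.

    The wrapper name "headerCollection" is hypothetical, not part of TEI.
    """
    collection = ET.Element("headerCollection")
    for source in documents:
        header = ET.fromstring(source).find("teiHeader")
        if header is not None:
            collection.append(header)  # the <text> body is left behind
    return collection


headers = collect_headers(DOCS)
print([h.findtext("fileDesc/titleStmt/title") for h in headers])
```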

