Text Encoding Initiative

10. Omissions, Deletions, and Additions


In addition to correcting or normalizing words and phrases, editors and transcribers may also supply missing material, omit material, or transcribe material deleted or crossed out in the source. Some material may also be difficult to transcribe because it is hard to make out on the page. The following elements may be used to record such phenomena:

<add>
contains letters, words, or phrases inserted in the text by an author, scribe, annotator, or corrector. Attributes include:

place
if the addition is written into the copy text, indicates where the additional text is written. Sample values include inline, supralinear, infralinear, left (in left margin), right (in right margin), top, bottom, etc.

<gap>
indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible or inaudible. Attributes include:

desc
gives a description of the omitted text.
reason
gives the reason for the omission.
resp
indicates the editor, transcriber, or encoder responsible for the decision not to transcribe the omitted material, and hence for the use of the <gap> element.

<del>
contains a letter, word or passage deleted, marked as deleted, or otherwise indicated as superfluous or spurious in the copy text by an author, scribe, annotator or corrector. Attributes include:

type
classifies the type of deletion using any convenient typology.
status
may be used to indicate faulty deletions, e.g. strikeouts which include too much or too little text.
hand
signifies the hand of the agent which made the deletion.

<unclear>
contains a word, phrase, or passage which cannot be transcribed with certainty because it is illegible or inaudible in the source. Attributes include:

reason
indicates why the material is hard to transcribe.
resp
indicates the individual responsible for the transcription of the letter, word or passage contained within the <unclear> element.

These elements may be used to record changes made by an editor, by the transcriber, or (in manuscript material) by the author or scribe. For example, if the source for an electronic text read

The following elements are provided for
for simple editorial interventions.
then an editor might wish to correct the obvious error while still recording the deletion of the superfluous second for, thus:
The following elements are provided for
<del hand="LB">for</del> simple editorial interventions.
The value LB of the hand attribute indicates that ‘LB’ corrected the duplication of for.

If the source read

The following elements provided for
for simple editorial interventions.
(i.e. if the verb had been inadvertently dropped), then the corrected text might read:
The following elements <add hand="LB">are</add> provided for
<del hand="LB">for</del> simple editorial interventions.
Here the hand attribute indicates that ‘LB’ both supplied the missing verb are and deleted the duplicated for.
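Because every <add> and <del> can carry a hand attribute, a processor can report who made which change. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree; the function name interventions_by_hand and the wrapping <p> element are illustrative assumptions, and the sketch assumes un-namespaced element names (in a TEI P5 document the names would be qualified with the namespace http://www.tei-c.org/ns/1.0).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The corrected sentence from the example above, wrapped in a
# hypothetical <p> root so that it parses as a well-formed fragment.
FRAGMENT = """<p>The following elements <add hand="LB">are</add> provided for
<del hand="LB">for</del> simple editorial interventions.</p>"""


def interventions_by_hand(xml_text):
    """Group <add> and <del> elements by their hand attribute.

    Returns a dict mapping each hand value (None if the attribute is
    absent) to a list of (element name, enclosed text) pairs in
    document order. Assumes un-namespaced element names.
    """
    root = ET.fromstring(xml_text)
    by_hand = defaultdict(list)
    for elem in root.iter():  # walks the tree in document order
        if elem.tag in ("add", "del"):
            by_hand[elem.get("hand")].append((elem.tag, "".join(elem.itertext())))
    return dict(by_hand)


print(interventions_by_hand(FRAGMENT))
# {'LB': [('add', 'are'), ('del', 'for')]}
```

Grouping under None rather than skipping unattributed changes keeps them visible in the report, which matters when only some interventions in a text have been assigned to a hand.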

These elements are not limited to changes made by an editor; they can also record authorial changes in manuscripts. A manuscript in which the author first wrote ‘How it galls me, what a galling shadow’, then crossed out the word galls and inserted dogs, might be encoded thus:

How it <del hand="DHL" type="overstrike">galls</del>
<add hand="DHL" place="supralinear">dogs</add> me,
what a galling shadow
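Since <del> keeps the deleted words in the electronic text, both the draft and the revised wording can be recovered from a single encoding. The following is a minimal Python sketch, assuming a one-stage revision in which every <del> belongs to the earlier state and every <add> to the later one (layered revisions would need the hand or other attributes to separate the stages). The wrapping <p> element and the function names reading and _collect are hypothetical, and element names are assumed to be un-namespaced.

```python
import xml.etree.ElementTree as ET

# The manuscript example above, wrapped in a hypothetical <p> root.
MANUSCRIPT = """<p>How it <del hand="DHL" type="overstrike">galls</del>
<add hand="DHL" place="supralinear">dogs</add> me,
what a galling shadow</p>"""


def _collect(elem, skip):
    """Concatenate the text of elem, leaving out any child named skip."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != skip:
            parts.append(_collect(child, skip))
        # A child's tail is the text after its end tag; it belongs to
        # the surrounding sentence, so it is kept even if the child is skipped.
        parts.append(child.tail or "")
    return "".join(parts)


def reading(xml_text, state):
    """Return the text before ("original") or after ("revised") the changes."""
    if state not in ("original", "revised"):
        raise ValueError("state must be 'original' or 'revised'")
    skip = "add" if state == "original" else "del"
    text = _collect(ET.fromstring(xml_text), skip)
    # Collapse the line breaks and runs of spaces left by the markup.
    return " ".join(text.split())


print(reading(MANUSCRIPT, "original"))  # How it galls me, what a galling shadow
print(reading(MANUSCRIPT, "revised"))   # How it dogs me, what a galling shadow
```

Note that an editorial <add> such as a conjectural emendation would be treated like an authorial one here; a fuller processor would consult the hand attribute before deciding which state an element belongs to.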

Similarly, the <unclear> and <gap> elements may be used together to indicate the omission of illegible material; the following example also shows the use of <add> for a conjectural emendation:

One hundred &amp; twenty good regulars joined to me
<unclear><gap reason="indecipherable"/></unclear>
&amp; instantly, would aid me signally <add hand="ed">in?</add>
an enterprise against Wilmington.
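An <unclear> element may hold either a tentative reading or, as here, only a <gap> showing that nothing could be transcribed. The minimal Python sketch below distinguishes the two cases. The function name uncertain_spots, the wrapping <p> elements, and the second fragment (with its reason value "faded ink") are hypothetical; the ampersands are escaped as &amp; so that the string is well-formed XML, and element names are assumed to be un-namespaced.

```python
import xml.etree.ElementTree as ET

# The letter example above, with "&" escaped as "&amp;" and a
# hypothetical <p> root added so the string is well-formed XML.
LETTER = """<p>One hundred &amp; twenty good regulars joined to me
<unclear><gap reason="indecipherable"/></unclear>
&amp; instantly, would aid me signally <add hand="ed">in?</add>
an enterprise against Wilmington.</p>"""

# A hypothetical fragment in which a tentative reading was recorded.
TENTATIVE = '<p>good <unclear reason="faded ink">regulars</unclear> joined</p>'


def uncertain_spots(xml_text):
    """Classify each <unclear> element, in document order.

    Returns ("tentative reading", text) when the element contains
    transcribed text; otherwise ("omitted", reason), taking reason from
    a child <gap> if there is one (None if not).
    """
    report = []
    for unclear in ET.fromstring(xml_text).iter("unclear"):
        # itertext() covers the element's own content, not its tail.
        text = "".join(unclear.itertext()).strip()
        if text:
            report.append(("tentative reading", text))
        else:
            gap = unclear.find("gap")
            report.append(("omitted", gap.get("reason") if gap is not None else None))
    return report


print(uncertain_spots(LETTER))     # [('omitted', 'indecipherable')]
print(uncertain_spots(TENTATIVE))  # [('tentative reading', 'regulars')]
```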

The <del> element marks material which is transcribed as part of the electronic text despite being marked as deleted, while <gap> marks the location of material which is omitted from the electronic text, whether it is legible or not. A language corpus, for example, might omit long quotations in foreign languages:

<p> ... An example of a list appearing in a fief ledger of
<name type="place">Koldinghus</name> <date>1611/12</date>
is given below. It shows cash income from a sale of
honey.</p>
<q><gap desc="quotation from ledger"
    reason="in Danish"/></q>
<p>A description of the overall structure of the account is
once again ... </p>

Other corpora (particularly those constructed before the widespread use of scanners) systematically omit figures and mathematics:

<p>At the bottom of your screen below the mode line is the
<term>minibuffer</term>.  This is the area where Emacs
echoes the commands you enter and where you specify
filenames for Emacs to find, values for search and replace,
and so on.
<gap desc="diagram of Emacs screen" reason="graphic"/>
</p>
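Because a <gap> has no content, the omitted material leaves no trace in the running text; only the element's position and attributes survive. The following minimal Python sketch, assuming un-namespaced element names, inventories the omissions in the two examples above (trimmed and wrapped in a hypothetical <div> root) by their desc and reason attributes. The function name list_gaps is illustrative.

```python
import xml.etree.ElementTree as ET

# The two corpus examples above, trimmed and wrapped in a hypothetical
# <div> root so that they parse as a single well-formed fragment.
CORPUS = """<div>
<p>An example of a list appearing in a fief ledger of
<name type="place">Koldinghus</name> <date>1611/12</date>
is given below.</p>
<q><gap desc="quotation from ledger" reason="in Danish"/></q>
<p>At the bottom of your screen below the mode line is the
<term>minibuffer</term>.
<gap desc="diagram of Emacs screen" reason="graphic"/>
</p>
</div>"""


def list_gaps(xml_text):
    """Return (desc, reason) for every <gap>, in document order.

    A missing attribute is reported as None.
    """
    root = ET.fromstring(xml_text)
    return [(gap.get("desc"), gap.get("reason")) for gap in root.iter("gap")]


for desc, reason in list_gaps(CORPUS):
    print(f"{desc}: {reason}")
# quotation from ledger: in Danish
# diagram of Emacs screen: graphic
```

Joining the text of CORPUS (for example with "".join(root.itertext())) yields nothing of the ledger quotation or the diagram, which is the practical difference between <gap> and <del>: deleted material is still present in the electronic text, omitted material is not.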

Date: (revised October 2004) Author: Lou Burnard (revised SPQR).
Copyright TEI 1995