By levels of transcription is meant, essentially, how much of the
information in the original document is included (or otherwise noted)
by the transcriber in his or her transcription. W. W. Greg's
distinction between
Such editions are often so close to the originals as to be all but unreadable for those unfamiliar with early palaeographical or typographical conventions, or in any case no easier to read than the originals (compare the above figures, showing the same passage in Verner Dahlerup's 1880 edition of
Forms of letters: Variant letter forms (high and round s, for example) are often distinguished in transcriptions of manuscripts and early printed materials. Texts which are semi-diplomatic or semi-normalized will generally distinguish only between variant letter forms which are felt to have a basis in phonological distinctions, which the two forms of /s/, for example, do not. For statistical purposes, however, it may be desirable to transcribe (or register in some other way) such palaeographical or typographical distinctions even where they have no apparent basis in phonology.
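Where letter forms are kept apart in the transcription, counts of this kind are easy to produce, and the distinction can still be dropped for reading purposes. The following is a minimal sketch in Python, assuming that the transcription stores high s as the Unicode character ſ (U+017F); the function names and the sample string are invented for illustration and are not taken from any source.

```python
from collections import Counter

# Assumption: high (long) s is transcribed as U+017F, round s as plain "s".
S_FORMS = {"ſ": "long_s", "s": "round_s"}


def count_s_forms(text: str) -> Counter:
    """Count the two forms of s in a transcription that distinguishes them."""
    return Counter(S_FORMS[ch] for ch in text if ch in S_FORMS)


def normalize_s(text: str) -> str:
    """Collapse the distinction, as a semi-normalized text would."""
    return text.replace("ſ", "s")


# Invented sample in the manner of an early printed text.
sample = "ſo bleſſed is the houſe"
counts = count_s_forms(sample)      # long_s: 4, round_s: 1
reading_text = normalize_s(sample)  # "so blessed is the house"
```

The point of the sketch is only that the statistical and the reading view can both be derived from one transcription, provided the distinction was recorded in the first place.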
Punctuation: It is standard practice in diplomatic transcriptions for the punctuation of the original to be reproduced, no matter how inconsistent it may appear by modern standards. The transcriber may try to reproduce the actual signs used in the original, or may choose to use the nearest modern equivalent. In antiquity, for example, it was common to place points at varying heights in order to represent different degrees of pause, in ascending order of importance; the transcriber may choose to use an ordinary full stop instead of a mid- or high-dot. Similarly, an ordinary semi-colon may be used for the
Capitalization: In a diplomatic transcription
the capitalization of the original will normally be retained, although
in some traditions proper names are given capital letters whether
there are capitals in the original or not. In some cases it may be
desirable to distinguish between large and small capitals,
i.e. letters which have the shape of majuscules but the size of
minuscules. In Old Norse manuscripts, for example, small capitals were
frequently used to indicate geminate consonants; transcribing these as
ordinary capitals would give a false impression of the use of capitals
in the text, while replacing them with double lower-case letters would
arguably involve an unacceptable degree of normalization. Some scholars
would also recognize the existence of
Structure and layout: By structure is meant the division of the work into its constituent parts: prose works will normally comprise a number of chapters, which will contain sections and/or paragraphs; works in verse will be divisible into cantos or fits, these into stanzas, the stanzas into lines and so on. Other types of texts, letters, for example, will have their own structures. By layout is meant the arrangement of the text on the page. In medieval manuscripts and many early printed books there is rarely any connection between these two; a new chapter will not necessarily begin on a new page, except by accident. Poetry was frequently written out like prose, although sometimes the beginning of a new line or stanza would be indicated in some other way, through a sign or mark of punctuation. The more diplomatic a transcription is, the more likely it is to favor the structure of the document, i.e. the layout, over the structure of the work. In strictly diplomatic transcriptions the line-, column- (if any) and page-boundaries of the original are reproduced in the transcription, while the structure of the work, if unmarked in the original, is left unmarked in the transcription. In less diplomatic transcriptions the structure of the work will be given priority, while the line-, column- and page-boundaries may be indicated by means of a vertical line (single for line-boundaries, double for page-boundaries, where both are indicated). In normalized and modernized texts the focus is entirely on the work itself, and there is generally no indication whatsoever of the layout of the original.
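The vertical-line convention can be illustrated with a small sketch. It assumes, which the text above does not state, that line and page beginnings are recorded with the empty TEI milestone elements <lb/> and <pb/> inside a paragraph; the sample text and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one paragraph of the work, with the layout of the
# document recorded as empty milestone elements.
SAMPLE = "<p>first line of text<lb/>second line<pb/>text on the next page</p>"

# Single bar for a line-boundary, double bar for a page-boundary.
LAYOUT_MARKS = {"lb": " | ", "pb": " || "}


def render_boundaries(xml: str, show_layout: bool) -> str:
    """Render a flat paragraph with or without its layout boundaries."""
    root = ET.fromstring(xml)
    parts = [root.text or ""]
    for child in root:
        # Without layout marks, a boundary is reduced to a plain space.
        parts.append(LAYOUT_MARKS.get(child.tag, " ") if show_layout else " ")
        parts.append(child.tail or "")
    return "".join(parts)


semi_diplomatic = render_boundaries(SAMPLE, show_layout=True)
# "first line of text | second line || text on the next page"
normalized = render_boundaries(SAMPLE, show_layout=False)
# "first line of text second line text on the next page"
```

The sketch handles only milestones that are direct children of the paragraph; nested markup would need a recursive walk.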
Abbreviations: The use of abbreviations, both to spare the scribe the labor of writing words which, due to their frequency generally or in a particular text, could easily be understood in an abbreviated form, and to save parchment or paper and ink, is a characteristic feature of ancient and most medieval vernacular manuscript traditions and early printed books. As an aid to the reader it is common in all but the strictest diplomatic transcriptions to expand abbreviations. When a word or phrase is abbreviated a number of letters are suppressed, and the expansion of the abbreviation thus involves supplying these letters. The letters so supplied are frequently marked in some way in the transcription, printed in italics or given in brackets, for example, but even in transcriptions which are otherwise fairly diplomatic abbreviations may be expanded silently, especially in cases where it can be argued that there is no doubt as to what they represent.
Corrections and emendations: In all but the most diplomatic of transcriptions obvious errors and omissions will be corrected: the sorts of things, it could be argued, which the scribe or compositor himself would have corrected had they been brought to his attention. An editor may also want to emend the text on the basis of readings from other witnesses, common sense or artistic inspiration,
The decisions facing the producer of an electronic transcription are essentially the same as those facing the producer of any transcription: how much information to include? And the basis for making these decisions is much the same as well, in that the choices made will depend on factors such as the amount of time available for the job, and, not least, the intended use to which the transcription will be put. The great advantage of electronic texts, however, is that many decisions need not be made, in that the transcriber can include a wide range of information in the transcription but then choose how much of it to make available to the reader, or, better still, allow the reader to choose for him- or herself how much of it he or she wishes to see. From a single marked-up text, it should be possible, if one so desired, to produce screen or print copy at any level from strictly diplomatic to fully normalized. Such mark-up would by necessity need to be fairly complex, and would almost certainly require several layers of input. And this, indeed, is another great advantage of electronic transcription over traditional print transcription: one can return again and again to one's transcription, adding further levels of mark-up.
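The idea of one transcription serving several levels, with layers added in later passes, can be sketched in Python. This is an illustration under stated assumptions rather than a description of any existing tool: the Token class, the three level names and the sample tokens are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical level names, ordered from least to most editorial intervention.
LEVELS = ("diplomatic", "expanded", "normalized")


@dataclass
class Token:
    diplomatic: str                    # the form as it stands in the source
    expanded: Optional[str] = None     # abbreviation expanded (later pass)
    normalized: Optional[str] = None   # spelling regularized (later pass)


def render(tokens, level):
    """Render at the requested level, falling back to the most processed
    layer that has actually been supplied for each token."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    wanted = LEVELS[: LEVELS.index(level) + 1]
    words = []
    for tok in tokens:
        layers = [getattr(tok, name) for name in reversed(wanted)]
        words.append(next(form for form in layers if form is not None))
    return " ".join(words)


# Invented sample tokens; only some of them carry the later layers.
tokens = [
    Token("the"),
    Token("houſe", normalized="house"),
    Token("MS.", expanded="manuscript"),
]

diplomatic_text = render(tokens, "diplomatic")  # "the houſe MS."
expanded_text = render(tokens, "expanded")      # "the houſe manuscript"
normalized_text = render(tokens, "normalized")  # "the house manuscript"
```

Because missing layers fall back to the layer beneath them, a transcription can be published at every level from the first pass onward and simply becomes richer as further mark-up is added.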
How one proceeds will be determined to some extent by whether one is starting from scratch or working with a text already in machine-readable form. In the latter case the most logical thing to do would be to mark up the text feature by feature, beginning with the overall structure and layout, for example, and then adding mark-up for abbreviations, variant letter forms, etc. If one is starting with a blank sheet, as it were, it would perhaps make more sense to mark up these various features as one goes, or at least as many of them as one can reasonably be expected to keep track of at any one time. In this case it might be preferable to transcribe the text as it comes, including the abbreviations, variant letter forms and so on first, and then add the structural mark-up.
As mentioned above, the text of the work and the physical object carrying that text have separate structural hierarchies, both of which need ideally to be encoded, even in the most basic of transcriptions. In order to represent the former it is recommended that the chapter, section, etc. Paragraphs within these divisions can be tagged using line
) and line-group
i.e. a group of lines functioning as a formal unit), again with a stanza or couplet. Lines and line-groups can also be numbered and identified using the
Variant letter forms — and indeed any
Marking up abbreviations and their expansions is one of the more problematic aspects of the transcription of primary sources. The theory is simple enough: one can use either the
MS. is a common abbreviation for manuscript (or rather manuscriptum). It is a suspension, or rather two, where the first letter of each part of the compound word is given and the rest omitted; the fact that it is an abbreviation is indicated by the full stop (which is frequently omitted), and by the fact that it is written upper case (which it frequently isn't). One might want to tag this as an abbreviation and indicate the expansion as an attribute value: <abbr expan="manuscript">MS.</abbr>. Alternatively, one could give the expanded form as the content of the element and the abbreviated form as the value of the attribute: <expan abbr="MS.">manuscript</expan>. To insist on M<expan abbr="">anu</expan>S<expan abbr=".">cript</expan> strikes me at least as patently absurd, in addition to producing the incorrectly expanded form ManuScript. But even if one is prepared to overlook this, what should one do with the form MSS. for manuscripts (or manuscripta), where the second S is there only to indicate that the word is plural? The superscript nasal stroke, on the other hand, has a specific graphemic reference: it stands for the letter m or n. With a word such as fratrū, it seems more natural to encode the expansion as fratru<expan abbr="&bar;">m</expan>, rather than <expan abbr="fratru&bar;">fratrum</expan>. Doing so also makes it completely explicit which letters have been supplied, and makes it possible for these letters to be displayed in italics (or round brackets), in the manner of a traditional printed edition, something which many scholars still feel to be of importance.
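The practical difference between the two encodings shows up as soon as one tries to display them. The sketch below, in Python, renders a word containing <expan> in four ways. Several details are assumptions made for the sake of a runnable example: the <w> wrapper element, the style names, the use of HTML <i> for italics, and the replacement of the entity &bar; by the numeric character reference &#x304; (combining macron), since no DTD is available here to resolve a named entity.

```python
import xml.etree.ElementTree as ET

# Letter-level encoding, as preferred above for the nasal stroke.
FRATRUM = '<w>fratru<expan abbr="&#x304;">m</expan></w>'
# Word-level encoding, as suggested above for MS.
MANUSCRIPT = '<w><expan abbr="MS.">manuscript</expan></w>'


def render_word(xml: str, style: str) -> str:
    """Render one encoded word: 'diplomatic' shows the abbreviation as
    written, the other styles show the expansion, marked or silent."""
    root = ET.fromstring(xml)
    parts = [root.text or ""]
    for el in root:
        supplied = el.text or ""
        if el.tag != "expan" or style == "silent":
            parts.append(supplied)
        elif style == "diplomatic":
            parts.append(el.get("abbr", ""))
        elif style == "brackets":
            parts.append(f"({supplied})")
        elif style == "italics":
            parts.append(f"<i>{supplied}</i>")
        else:
            raise ValueError(f"unknown style: {style}")
        parts.append(el.tail or "")
    return "".join(parts)


bracketed = render_word(FRATRUM, "brackets")       # "fratru(m)"
italicized = render_word(FRATRUM, "italics")       # "fratru<i>m</i>"
silent = render_word(FRATRUM, "silent")            # "fratrum"
whole_word = render_word(MANUSCRIPT, "brackets")   # "(manuscript)"
as_written = render_word(MANUSCRIPT, "diplomatic") # "MS."
```

With the letter-level encoding only the supplied m is bracketed or italicized, whereas the word-level encoding can do no better than mark the entire word, which is the limitation the paragraph above describes.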
One solution would be to use
Alterations made to the text, whether, in the case of manuscripts, by the scribe, or in some later hand, can be encoded using
The elements
Where a word has been supplied by the editor, the illegible in the former case and omitted in the latter. Where the reading of another witness supports the reconstruction it is possible to use the
Finally, there is the question of normalization/regularization. One can use the
For the most part the elements discussed here are used to tag features in the original in a way analogous to the typographical conventions of printed editions. There are, of course, many other elements available within the TEI for tagging other things, dates, for example, or the names of persons, places and institutions, all of which are useful for search purposes and indexing. And why stop here? The TEI also provides mechanisms for associating any kind of semantic or syntactic analysis and interpretation which an encoder might wish to attach to a text, including such familiar linguistic categorizations as