HTI Style Guide for American Verse Project and Middle English Texts

  • Front Matter
  • Openers, Closers, and Bylines
  • Page Numbers
  • Bibliographic Information for Epigraphs
  • Heads
  • Naming of DIV elements
  • Poems
  • Lines
  • Quotations
  • Milestones
  • Notes
  • Spacing
  • Corrections to Text
  • Hyphens
  • Oddities
  • HTI American Verse Imaging Guidelines
  • Indexing new texts

  • When you run into a situation that you don't know how to tag, first consult the guidelines below; if the information is not covered here, look in the TEI Guidelines (particularly volume 2). It is also helpful to look at some of the verses that are already online, using Panorama with the tags on (ctrl T). If you do so, be aware that different texts handle the same situation differently, so you may need to make some choices. If you still can't resolve your question after trying the above, talk to one of the HTI staff.

    Front Matter

    title page        <titlepage>
    title             <doctitle>
    author            <docAuthor>
    part title        <titlepart>
    dedication        <div type="dedication">
    preface           <div type="preface">
    acknowledgments   <div type="acknowledgment">
    epigraph          <epigraph>

    The <byline> tag refers to the primary statement of responsibility given on the title page or at the end of a work. When the author's name is given, use <docAuthor> within the byline. For instance: <byline>By <docAuthor>Louise Chandler Moulton</docAuthor></byline>.

    Note: If dedications or acknowledgements are part of the titlepage (or another div) they should not be tagged as separate divs.

    Title pages are not always scanned, so when you are encoding front matter, double-check that all title page information is present. Some information, such as title and author, may have to be typed in by hand.
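
    A minimal sketch of how the front matter elements listed above might fit together. The title, the dedication wording, and the enclosing <front> element are illustrative assumptions, not taken from a particular text; the byline is the example given above.

```sgml
<front>
<titlepage>
<doctitle>
<titlepart>Poems</titlepart>
</doctitle>
<byline>By <docAuthor>Louise Chandler Moulton</docAuthor></byline>
</titlepage>
<div type="dedication">
<p>To my mother</p>
</div>
</front>
```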

    Page numbers

    Wherever the page numbers appear in the printed text (top, bottom or even side of page), the page break element appears at the top of the text for that page.

    If you can determine the page number of un-numbered pages by counting backwards from the first page number given in the text, you should do so and assign page numbers. Place them in square brackets to differentiate them from the page numbers actually present in the text. For example, if the introduction begins on page vi, you can assign the numbers i-v to the pages that precede the introduction.

    The titlepage and verso are never assigned page numbers. The word verso should appear in place of a page number on the verso page.

    Sometimes the pagination is wrong in the original text. Leave in the page numbers of the original text and insert a note in the header that the error was noted and the original pagination was preserved.

    Page numbers should appear inside a div. For example, do not put a page number between two div1s -- put the page number inside the div that begins where the page break occurs.
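
    As a sketch of the rules above, assuming the page break is encoded with the TEI <pb> element and its n attribute (the element name is not stated in this section), and using invented page numbers and lines: the supplied number sits in square brackets, and each page break falls inside the div that begins on that page rather than between two divs.

```sgml
<div1 type="poem">
<pb n="[3]">
<head>First Poem</head>
<l>A line on a page that is unnumbered in the print</l>
<pb n="4">
<l>A line on the page numbered 4 in the print</l>
</div1>
<div1 type="poem">
<pb n="5">
<head>Second Poem</head>
<l>A line on page 5</l>
</div1>
```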

    Bibliographic Information for Epigraphs

    Epigraphs (quotations that precede entire texts or individual poems) often carry with them bibliographic information, such as an author's name and/or a title. Enclose these in a bibl tag. If the parts of the bibliographic information are differentiated by typeface (for example the title in bold), you should label the parts separately within the bibl tag.

    "Is it better to suffer the slings and arrows of outrageous fortune":

    <bibl> Shakespeare, Hamlet </bibl>
    <bibl> <author> Shakespeare </author> <title> Hamlet</title> </bibl>.

    Openers, Closers, and Trailers

    An opener groups together phrases appearing as preliminary matter in prose, poems, and especially letters. Use <opener> when a date and/or place is given at the beginning of a poem, introduction, or prefatory material like a foreword or preface.

    Don't break down <opener> into the subelements of dateline, byline, and salutation, even where those would be applicable. In Middle English texts, the openers often take the form of a sentence-long summary of the text that follows, and they may be in Latin.

    A <closer> groups together the same elements, but at the end of a division. Again, don't break them down into the various subelements.

    The <trailer> element contains a footer for a division. In the Middle English texts, a prayer (generally in Latin) at the end of a div in a religious text or "Here endeth cap. XXI" in a secular text would be trailers.
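
    A hedged illustration of where these elements sit inside a division. The poem, place, and dates are invented; the trailer text is the secular example quoted above. Neither the opener nor the closer is broken into subelements.

```sgml
<div1 type="poem">
<head>An Invented Poem</head>
<opener>Boston, June 1855</opener>
<l>Only line of the poem</l>
<closer>Cambridge, 1860</closer>
</div1>

<div1 type="chapter">
<head>Cap. XXI</head>
<p>Text of the chapter.</p>
<trailer>Here endeth cap. XXI</trailer>
</div1>
```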


    Heads

    Main title with subtitle:

    <head>The Red Wheelbarrow</head>

    <head type="sub">A meditation on chickens </head>

    Main title with a roman numeral: contain within the same head tag with a line break <lb> after the numeral.

    <head>IV<lb>Pine Trees</head>

    Main title with prefatory material:

    <head>Mending Wall</head> <opener>To my lovely wife</opener>

    <head>Howl</head> <opener>written while crossing Lake Champlain</opener>

    <head>The Walls Do Not Fall</head> <opener>April, 1943</opener>

    In the Middle English texts, there may not be heads for each obvious division. If the editor has supplied a head (seen as running text at the top of the page or perhaps as a marginal note summarizing the action), or if there is a table of contents for the manuscript where heads are indicated for each chapter but are inconsistently included in the text, add a type attribute of "supplied" to the head element. For example,
    <head type="supplied">Cap.ix.<lb>In which King Ban slays the white hart.</head>

    Naming of DIV elements

    DIV elements are numbered to reflect their level in the hierarchy (e.g., the DIV0 might be a section which contains a group of DIV1s, each a poem) but are named on the TYPE attribute to reflect their class or function. For example, a poem may be at any of several different levels (DIV0, DIV1, DIV2, etc.) depending on its place in a hierarchy, but will always be TYPE="poem".

    The TYPE attribute for a DIVn should, if possible, reflect its named value. This is frequently something like "chapter," "dedication," or "introduction." At other times, it may be more generic and unnamed by the author or printer. For the American Verse Project, we have chosen to name higher level groups of poems TYPE="part", and portions of a poem, TYPE="section". Please, you don't need a DIV0 to hold the contents of the entire book. After all, that's what the BODY element does.
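
    The naming scheme can be sketched as follows, with invented heads and lines. The numbers track depth, while TYPE names the function; the outermost division sits directly inside BODY, with no wrapper for the whole book.

```sgml
<BODY>
<DIV0 TYPE="part">
<HEAD>An Invented Group of Poems</HEAD>
<DIV1 TYPE="poem">
<HEAD>A Long Poem</HEAD>
<DIV2 TYPE="section">
<HEAD>I</HEAD>
<L>First line of the first section</L>
</DIV2>
<DIV2 TYPE="section">
<HEAD>II</HEAD>
<L>First line of the second section</L>
</DIV2>
</DIV1>
</DIV0>
</BODY>
```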


    Poems

    Tag all poems as <div(insert appropriate number) type="poem">.

    Line group <lg>

    verses <lg type="verse"> The default type for complete line groups will be verse.

    If partial verses appear in prose, do not tag as <lg> unless you are sure it is a complete line group. Tag as quote <q> with lines <l>.

    Note: If a poem has only one verse, mark up only the lines and do not tag it as a line group.
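
    For instance, a poem with two complete verses and a poem with only one might be marked up as below; the titles and lines are placeholders rather than quotations.

```sgml
<div1 type="poem">
<head>Two Verses</head>
<lg type="verse">
<l>First line of the first verse</l>
<l>Second line of the first verse</l>
</lg>
<lg type="verse">
<l>First line of the second verse</l>
<l>Second line of the second verse</l>
</lg>
</div1>

<div1 type="poem">
<head>One Verse</head>
<l>First line of the only verse</l>
<l>Second line of the only verse</l>
</div1>
```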

    Each line <l>

    Delete any line breaks created in the text solely because of the width of the page.


    Lines

    Preserve indentation at the level of the line, as it appears in the text, except in the cases where it seems to be supplied by the printer solely for indicating continuity in the line (when a line of poetry takes up more than one line on the page). Indicate the indentation on the element attributes, i.e. rend=indent.
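
    A small sketch with placeholder lines: the second line is indented in the print as part of the poem's layout, so it carries the rend attribute. A turnover forced by the page width is instead joined to its line and left unmarked.

```sgml
<lg type="verse">
<l>A line set flush left in the print</l>
<l rend="indent">A line the poet has indented</l>
<l>A long line that ran onto a second printed line, joined here into one</l>
</lg>
```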

    Sometimes it is clear that a line of poetry is deliberately incomplete. You will notice this particularly with poetry that has a clear metrical scheme. For instance in dramatic verse, there are often iambic pentameter lines that are started by one speaker and finished by another. Such lines should be tagged as lines but given the attribute part. In most cases the part will be initial [i] (the beginning of a line), medial [m], (the middle of a line), or final [f] (the end of a line).

    For example:
    Ribera We are the fools of habit

    Lorenzo Pray you, sir

    would be tagged as

    Ribera<l part=i> We are the fools of habit

    Lorenzo<l part=f> Pray you, sir


    Quotations

    Block quotes, letters in prose sections, and poems in prose sections are tagged as quotes <q> and can then include lines, paragraphs, etc.

    Note: you do not need to mark up all indirect quotes.
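
    A sketch of verse quoted inside a prose section, with invented wording. The quotation is a <q> inside the paragraph and holds lines directly, since it is not certain that the excerpt is a complete line group.

```sgml
<div type="preface">
<p>The author recalls an old song:
<q>
<l>First quoted line</l>
<l>Second quoted line</l>
</q>
and then returns to the narrative.</p>
</div>
```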


    Milestones

    In American Verse

    In American Verse texts, milestones are significant but unlabeled divisions of text such as line breaks and sections, or sometimes special page breaks.

    The most common use of the <milestone> tag is for a line of asterisks denoting an ambiguous section break or a possible missing line. Delete the asterisks and insert the tag <milestone unit=typographic n="******">. For an exact representation, use the same number of asterisks as in the original, with single spaces between them if spacing occurs.

    Do not use the milestone element for typographic elements that indicate the end of a natural division. For example, many texts have a centered hairline between poems, or between the head of a poem and the beginning of the first linegroup. These are not typographically or intellectually significant and are not encoded.
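
    In context, with placeholder lines and assuming the print shows five asterisks separated by single spaces, the milestone replaces the row of asterisks between two lines:

```sgml
<l>Last line before the row of asterisks</l>
<milestone unit="typographic" n="* * * * *">
<l>First line after the row of asterisks</l>
```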

    In Middle English

    Milestones in Middle English are generally used for folio markings from the original manuscript. There may be two sets of milestones, generally in texts which contain transcriptions of two manuscripts. Use the folio/column/page data as the UNIT attribute and the number or number/letter combination as the N attribute. For example, a footnote reference that says Fol. 15b would be encoded as <milestone unit="folio" n="15b"> in the place where the asterisk or superscript number referring to that footnote occurs.

    Not infrequently, you will find the location of the milestone is ambiguous because the editor or printer has forgotten to insert the asterisk or number locating the milestone. Sometimes this is simply because the milestone goes at the beginning of the line on which it falls but sometimes not. When it is ambiguous, insert it where you believe it goes and make a note of it in the editorial declaration.

    When words are broken by milestones, join the word and place it before the milestone and note these instances in the editorial declaration with a slash designating the break.
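
    A sketch of a folio milestone that falls inside a word, using an invented line. Suppose the print reads "chast*itie", with a footnote "Fol. 49b" at the asterisk: the word is joined, the milestone follows it, and the break is recorded in the editorial declaration as chast/itie.

```sgml
<l>And kept hir in chastitie<milestone unit="folio" n="49b"> al hir lyf</l>
```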


    Notes

    In American Verse, brief footnotes and endnotes, especially when there aren't too many of them, can be simply moved into the line and tagged with a note tag. So if the original is something like:
    <opener>For my sister*</opener>

    and the note is:
    *died Sept 18, 1812

    You can simply tag it as <opener>For my sister</opener><note>died Sept. 18, 1812</note>

    If the notes are contained in a section that does not come at the end of the text, but appear in a place where removing them and placing them inline will leave a gap in the numbering sequence, encode the section as data and do not worry about linking the notes to the appropriate text.

    In the Middle English texts, notes that indicate changes made to the manuscript, either by the scribe, by a later scribe, or by the editor are not encoded as notes. We use the elements in the TEI that identify these items specifically. Common elements used are supplied, add, del, sic, and corr. Supplied is used when the editor has supplied text, generally from another manuscript, generally due to damage and missing text in the manuscript used for the edition. The add element is used for situations such as "h is added in a later hand"; del is used similarly for mentions of deletions. Scribal errors in the manuscript, often noted as "Ms. reads "hir hir stede.", are encoded as <corr sic="hir hir">hir</corr> stede.
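
    The other elements might look like this in practice. The readings are invented for illustration, and any attributes these elements take under the DTD in use are left out.

```sgml
<!-- "and" supplied by the editor where the manuscript is damaged -->
<l>The knight <supplied>and</supplied> his squier rood forth</l>

<!-- "h" is added in a later hand -->
<l>w<add>h</add>an that he cam to the toun</l>

<!-- the first "the" is marked as deleted in the manuscript -->
<l>He saw <del>the</del> the kyng</l>
```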


    Spacing

    White spaces around punctuation such as em dashes, colons, and periods should be deleted. (Postprocessing routines will take care of spaces around punctuation automatically.) Note: if you think the space should be reinstated in some instances, do so on a case by case basis.

    You don't need to worry about two space characters together, as the representational tools dealing with SGML will automatically collapse them into one space.

    Corrections to Text

    When you find an error in the original text, suggest a correction, while leaving the original intact, as follows: <sic corr="write">rite</sic>.

    If the error is due to HTI scanning and is not in the original text, then simply correct the error.


    Hyphens

    When a hyphen occurs at the end of a line to break a word that is too long, remove the hyphen and join the word. Most of the time in the American Verse texts, there will be a hyphen splice comment to call your attention to the problem (be sure to remove the comment). If the hyphen occurs at the end of the last line of the page, bring the end of the word to the previous page, before the page break marker.

    Do the same for Middle English texts where the breaks occur at the end of folios (that is, a milestone tag occurs in the middle of a word). However, note that you have done this and list the words joined and their locations -- chast/itie (folio 49r) -- in the editorial declaration in the TEIheader.

    Sometimes Middle English words have hyphens in them as normal practice, with-outen, for example. In these cases the hyphens should be retained.
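
    A before-and-after sketch of a hyphen at the end of a page, assuming the page break is encoded as <pb> and using invented text. The word is joined on the earlier page, ahead of the page break marker, while the hyphen in with-outen is kept because it belongs to the word.

```sgml
<!-- as scanned -->
<p>The travellers came at last to the moun-
<pb n="24">
tain pass.</p>

<!-- after joining -->
<p>The travellers came at last to the mountain
<pb n="24">
pass.</p>

<!-- Middle English: the hyphen is part of the word and stays -->
<l>He rood with-outen any reste</l>
```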

    Foreign Languages

    Frequently, you will find foreign phrases in your text, throughout the Middle English and particularly in the epigraphs of poems in the American Verse. If the foreign phrase occurs in the middle of a paragraph, you can surround it with the <foreign> tag, which takes a lang attribute. If it is an epigraph in a foreign language, put the lang attribute on the quotation inside the epigraph instead. The value of this attribute is an abbreviation of the language in question (a list of abbreviations follows). Because this abbreviation is a reference to another value, you must also declare the languages you use in the TEIheader. For example, in the case of the foreign epigraph, you would mark the epigraph:
    <epigraph> <q lang="lat">"Labitur et labetur"</q> </epigraph>
    You will also need to declare the language as the last section in the header, after the encoding description <encodingdesc>. For the example above, you'd insert the following:
    <profiledesc><langusage><language id="lat">Latin</language></langusage></profiledesc>
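
    For a foreign phrase inside a paragraph, a sketch along the same lines (the sentence is invented, and the phrase reuses the lat identifier declared in the header example above):

```sgml
<p>He took for his motto <foreign lang="lat">carpe diem</foreign> and held to it.</p>
```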


    Oddities

    If there are 2 sets of copyright information; 2 sets of publication information; information in the front regarding the printer of the edition; or 2 title pages, these can be handled with the tag <imprint> or <docimprint>.

    If you run across other oddities and have developed a solution, please pass the information on so that it may be added to the style guide. If you run across other oddities that you can't resolve but think are important enough to be covered in our guidelines, please find or mail Chris Powell.

    HTI Imaging Guidelines


    Many of the texts the Humanities Text Initiative works with include images. For example, a book of American poetry might include a portrait of the author in the front of the book, and it may also contain other images related to specific poems. When the electronic text is completed, it will include the appropriate images from the original text. One person at HTI manages all of the tasks associated with images, but these guidelines should give anyone at HTI the basics about our imaging process.

    Which images does HTI use?

    How are images represented in SGML texts?

    OK, enough basic information! My text has images, so what do I do now?


    Find Andrew Midkiff, or send email to


    Example 1. A page from a book of American verse containing an image.

    [Image: bell.gif]

    This image will be scanned, and named pierp-bell.gif.

    Example 2. The entity declaration in the SGML header.

    <!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite 1.0//EN" [
    <?STYLESPEC "UM HTI American Verse"                                  ">
    <?NAVIGATOR "UM HTI American Verse"                                  
    <!ENTITY % ISOlat1 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN">
    <!ENTITY % ISOlat2 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 2//EN">
    <!ENTITY % ISOnum  PUBLIC "ISO 8879:1986//ENTITIES Numeric and       
           Special Graphic//EN">
    <!ENTITY % ISOpub  PUBLIC "ISO 8879:1986//ENTITIES Publishing//EN">
    <!NOTATION gif   PUBLIC
    "+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION CompuServe
     Graphic Interchange Format//EN"   "GIF" >
    <!ENTITY pierp-bell SYSTEM "images/pierp-bell.gif" NDATA GIF> ]>

    Example 3. The entity reference within the SGML markup.

    <DIV0 TYPE="poem" ID="DIV0.11">
    <P><FIGURE ENTITY="pierp-bell"></FIGURE></P>
    <LG TYPE="stanza" ID="LG94">
    <L ID="L551">THE LIBERTY BELL—the Liberty Bell—</L>
    <L ID="L552">The Tocsin of Freedom and Slavery's knell</L>

    Created by Jason P. Williams for the Humanities Text Initiative on April 17, 1996. Last revised by Chris Powell January 29, 1998.


    Adding new works to the collection

    As new works are completed (i.e., marked up, proofed, markup reviewed, and associated images created), they are added to the collection. This process includes two principal steps: adding links from the Browse page, and re-indexing the collection to facilitate searching the new texts and displaying them in HTML.

    Put the files on the HTI Web server

    Most files will be in Netware and will need to be exported from Author/Editor. Some will occasionally be on the HTI UNIX servers (possibly saltmine) as normalized SGML. In the case of files in Netware,

    1. FTP files over to HTI
    2. Open up the files in EMACS
    3. Replace the DOCTYPE declaration provided by A/E with the generic "amverse" DOCTYPE (which includes processing instructions for stylesheet and navigator):
      1. "C-k" to remove first line
      2. "C-x i amverse.doctype" to insert contents of amverse.doctype
    4. Parse the text again using sgmls by issuing the command: teiparse {filename}

    Edit the index.html file in /web/english/amverse/texts 

    1. Open index.html in EMACS (using the X-terminal or the aixterm/xterm on WinNT will allow you to cut and paste with the first and second mouse buttons)
    2. Add new title(s). This can be a copy and paste operation from existing entries, but it should reflect the new ID (until the "book bag" mechanism is in place) and title page/verso links of the new text. Check to make sure all links work by testing with a web browser. Note that the HTML link will not work until the collection has been re-indexed (see below). The SGML link will work, and should make use of the amverse style sheet.

    Re-indexing the corpus 

    1. cd to /prep/amverse
    2. run the program called "prep"; this concatenates the SGML files, indexes the text, and creates the SGML-based "region" files
    3. edit the data dictionary (amverse.dd), inserting the file make.special.dd.frag between the </REGION> and </REGIONS> tags
    4. run the program called "amv-install"; this puts all of the files in their appropriate places. Check several of the HTML links in the index.html file to make sure that the system isn't broken (e.g., because of bad path names or incomplete indexing); also try some searches of content that would be in the new texts.