Licensed under
No source: this is an original work
Epigraphy is the study of texts that are inscribed onto durable materials, typically (though not always) stone. These texts include honorary and memorial inscriptions (on statue bases or gravestones), laws and decrees, and even graffiti. Although stone does not rot or burn, it can be broken or worn, and inscribed stones were often re-used as building materials. As a result, many inscriptions are fragmentary texts.
Here we focus on Greek and Latin epigraphy. In the city-states of classical Greece and the Hellenistic world, from the sixth through at least the second century BC, laws and decrees were regularly inscribed on stone stelæ and displayed publicly. In the Roman world, from about the third century BC onward, decree stelæ were less common, but funerary and memorial inscriptions were very common. Inscriptions of all these types are an essential primary source for knowledge about the ancient world. Not only do they tell us about laws, alliances, and famous people, but they also give insight into daily life. Grave inscriptions, for example, can be mined for information about life expectancies, family sizes, and occupations. In addition, inscriptions preserve early forms of the Greek and Latin languages; even the spelling errors give valuable information about pronunciation.
Epigraphy has been studied systematically since the Renaissance. Inscriptions from various parts of the classical world have been collected into corpora and published in large print volumes, starting in the 19th century. The most important corpora are the
Epigraphers have a standard convention for marking up texts
in print, called the Leiden convention.
A brief description of the Leiden system may be useful. In this system, angle brackets enclose letters or words that are not present in the inscription but which, in the editor's judgement, should be. Square brackets enclose letters or words that are in the inscription but are wrong. In both of these cases, the error could be a spelling or transcription mistake on the part of the stone-cutter, or the editor could be normalizing the language or dialect. Commentary printed along with the text will generally explain which is the case.
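Markup of this kind lets software recover more than one view of a text. The minimal Python sketch below handles only the angle-bracket case just described. From a single transcription it derives both the letters actually cut on the stone and the editor's reading. The function names and the sample line are illustrative assumptions and do not come from any published tool.

```python
import re

# Angle brackets mark letters the editor adds because the stone-cutter omitted them.
ADDITION = re.compile(r"<([^<>]*)>")

def stone_text(leiden: str) -> str:
    """Letters actually present on the stone: drop the editor's additions."""
    return ADDITION.sub("", leiden)

def edited_text(leiden: str) -> str:
    """The editor's reading: keep the added letters, remove the brackets."""
    return ADDITION.sub(r"\1", leiden)

line = "ΔΗΜΟΣ ΑΘΗΝ<Α>ΙΩΝ"  # hypothetical sample line
print(stone_text(line))   # ΔΗΜΟΣ ΑΘΗΝΙΩΝ
print(edited_text(line))  # ΔΗΜΟΣ ΑΘΗΝΑΙΩΝ
```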
Letters are printed with underdots when they are not complete or not clear. This can happen when the stone has been broken or worn. For example, suppose a particular Greek capital letter might be alpha, lambda, or delta. In this case the editor will print the one which is most likely, with a dot under it. Sometimes context makes clear that only one of these letters is really possible. In our example, if the previous letter is nu and the following one is xi, the letter in between must be a vowel, so the disputed triangular form must be an alpha. In such cases, some editors will print the alpha with an underdot, since it would not be unambiguously legible on its own, but others will print an unmarked alpha, since in the particular context there is no ambiguity.
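In digital form, an underdot does not require a special font. Unicode provides COMBINING DOT BELOW (U+0323), which can follow any base letter. The sketch below uses hypothetical helper names. It marks the doubtful letter from the nu and xi example and then recovers the plain letters, which is useful when a search should ignore the editor's doubts.

```python
import unicodedata

UNDERDOT = "\u0323"  # COMBINING DOT BELOW

def mark_doubtful(letter: str) -> str:
    """Attach an underdot to a letter the editor cannot read with certainty."""
    return letter + UNDERDOT

def strip_underdots(text: str) -> str:
    """Remove all underdots, e.g. so that a search ignores them."""
    return text.replace(UNDERDOT, "")

def doubtful_letters(text: str) -> list[str]:
    """Return the base letters that carry an underdot."""
    return [base for base, mark in zip(text, text[1:]) if mark == UNDERDOT]

reading = "Ν" + mark_doubtful("Α") + "Ξ"  # nu, doubtful alpha, xi
print(unicodedata.name(UNDERDOT))  # COMBINING DOT BELOW
print(doubtful_letters(reading))   # ['Α']
print(strip_underdots(reading))    # ΝΑΞ
```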
Some ancient inscriptions include deliberate erasures, for
example to remove the name of a
Spacing, punctuation, and capitalization are all added or
adjusted silently, as are accents and breathings for Greek.
Readers are generally expected to know that ancient writing
conventions differ from modern ones. The editor's
commentary, however, will usually indicate the appearance of
the original text. In particular, many classical Greek stones
are written in
Unfortunately, the Leiden convention is neither universal nor flawless. Because it was devised only in 1931, many published texts pre-date it and use similar but different conventions. Square brackets may denote additions rather than subtractions, round parentheses may denote additions rather than expansions of abbreviations, and so on. In addition, a given book may or may not include a key to its particular markup system, so readers must learn different publishers' preferred styles. Moreover, because a fully marked text can be cumbersome to read, some publications omit the more complicated markings. This is especially likely in texts intended for beginners and students.
A more important problem, however, is that epigraphical
markup, like most markup, encodes an editor's judgement about
the text, which may be more or less certain. An editor's
supplement could be nearly certain, as in the case of the
alpha-lambda-delta example above, or it could be pure
conjecture, or anything in between. The Leiden system does
not include a mechanism for indicating the editor's confidence
in the proposed text. Normally this is discussed in the
commentary, of course, but certainty is not encoded in the
text itself. Moreover, editors differ in their application of
markup. Some will mark any supplement or emendation, even if
it is beyond any reasonable doubt; others will leave the
obvious emendations unmarked—and the determination of
what is
The first part of this problem, that the markup scheme in general use does not provide a way to encode certainty, can be addressed with the use of the TEI, as we will see below. The second part, that editors will disagree about what is obvious and what is arguable, cannot be solved by any particular markup scheme. We hope, though, that a scheme that provides a structured way to encode the discussion of certainty will ultimately help readers understand what editors know about a given text.
Although we focus on epigraphy here, it should be noted that the editorial and markup conventions for papyrology are similar. Textual criticism also uses similar signs in the text, and the commentary in an epigraphic or papyrological publication may include an apparatus criticus. As a result, all three fields face similar issues and have similar needs. A convention for using the TEI in epigraphy will be largely applicable to papyrology. The use of the TEI is well understood in more conventional textual criticism, which in classics involves establishing a text from several manuscripts, usually dating from the 10th to the 15th centuries AD for Greek texts, and sometimes rather earlier for Latin texts. The TEI tags that represent choices among variant readings are less useful in editing inscriptions and papyri, where there is generally only one copy of the text and we do not have the luxury of variants to choose from, but they can be used in collating prior editors' treatments of an inscription.
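Here the "witnesses" would be earlier editions rather than manuscripts. The sketch below assumes a simplified fragment built from the TEI apparatus elements app, lem, and rdg, with wit attributes naming the editions. The sigla and readings are invented. A real TEI document would also declare its namespace and list its witnesses.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: two earlier editions read one word differently.
FRAGMENT = """<ab>
  <app>
    <lem wit="#editionA">FECIT</lem>
    <rdg wit="#editionB">FECERVNT</rdg>
  </app>
</ab>"""

def readings_by_witness(xml_text: str) -> dict[str, list[str]]:
    """Map each witness siglum to the readings it supports."""
    root = ET.fromstring(xml_text)
    result: dict[str, list[str]] = {}
    for app in root.iter("app"):
        for reading in app:
            if reading.tag in ("lem", "rdg"):
                # wit may name several witnesses, separated by spaces.
                for wit in reading.get("wit", "").split():
                    result.setdefault(wit, []).append(reading.text or "")
    return result

print(readings_by_witness(FRAGMENT))
# {'#editionA': ['FECIT'], '#editionB': ['FECERVNT']}
```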
Over the last few years, several of the major epigraphic
corpora have begun digitization projects. The epigraphic
community also hopes to create a unified database of
information about all known Greek and Latin inscriptions. A
digitized corpus of inscriptions can include several different
representations of the inscriptions:
Many projects also find it convenient to store meta-data about the inscriptions in a database, to facilitate searching. The most useful meta-data fields include the date of the inscription, its language, the types of letter forms in use in it, where it was found, what material it is on, and its size.
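The following minimal sketch stores such meta-data in a table, using Python's built-in sqlite3 module. The column names and sample rows are invented for illustration. So is the convention of writing years BC as negative integers. Because the dates are stored as ranges, the query finds any inscription whose range overlaps the period of interest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inscription (
        id           TEXT PRIMARY KEY,
        date_from    INTEGER,  -- earliest likely year; BC as negative numbers
        date_to      INTEGER,  -- latest likely year
        language     TEXT,
        letter_forms TEXT,
        findspot     TEXT,
        material     TEXT,
        height_cm    REAL
    )""")
conn.executemany(
    "INSERT INTO inscription VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [  # invented sample rows
        ("ex1", -350, -300, "Greek", "four-bar sigma", "Athens", "marble", 92.0),
        ("ex2", 100, 150, "Latin", "rustic capitals", "Rome", "limestone", 40.5),
        ("ex3", -200, -150, "Greek", "lunate sigma", "Delos", "marble", 61.0),
    ],
)

# Greek inscriptions on marble whose date range overlaps the fourth century BC.
rows = conn.execute(
    """SELECT id FROM inscription
       WHERE language = ? AND material = ?
         AND date_from <= ? AND date_to >= ?
       ORDER BY id""",
    ("Greek", "marble", -301, -400),
).fetchall()
print(rows)  # [('ex1',)]
```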
Some digitization projects complement existing print projects. Here the goal is to provide more powerful searching than print indices allow, and to help produce the printed texts. The digital corpora may be made available on CD with appropriate programs to search and display the texts and images. Some projects plan to distribute their texts on CD instead of in print. Still other projects hope to make their inscriptions available over the web. Whatever the proposed dissemination mechanism, however, all these digitization projects face similar problems.
Digital epigraphy projects recently came together at the Second International Workshop on Digital Epigraphy, held at King's College, London, in July 2002. This workshop, hosted by the EpiDoc Aphrodisias Pilot Project (EPAPP), was part of a continuing discussion among epigraphers about standards and practices for semantic markup, in support of both electronic and print publication. EPAPP is a collaboration between the Aphrodisias team and the EpiDoc Group, based at the Ancient World Mapping Center, University of North Carolina. EpiDoc itself is a set of guidelines and a TEI DTD intended for use by epigraphers. It is likely to become a standard for those epigraphic projects that choose to use XML. EpiDoc was presented to the wider epigraphical community in an informal session at the
Over a dozen different epigraphic digitization projects
were represented at the July 2002 workshop, covering all of
the classical world and including researchers from half a
dozen different countries. Space does not permit detailed
discussion of all of them here; the workshop program,
including links to the various projects' home pages, is
available on line at
Of the projects represented at this workshop, very few are using XML yet. The Aphrodisias project, which hosted the workshop, is digitizing inscriptions from the Greek city of Aphrodisias in Asia Minor. Although Aphrodisias was first settled as early as the third millennium BC and the city continued to exist well into the thirteenth century AD, the project focuses on the inscriptions of late antiquity (see Roueché). This project is the first major test case for the EpiDoc guidelines and DTD. The inscriptions are transcribed using this DTD, then transformed to Leiden markup in HTML for web presentation. A database links the texts, information about the stones that contain them, photographs, and a site plan. The project began as an on-line reprint of
At Oxford, the Centre for the Study of Ancient Documents is publishing the inscriptions from Roman Britain, which are generally on small tablets of wood or lead. Because the history of the Roman presence in Britain is part of the school curriculum there, the on-line publication of these texts must be accessible to children as young as eight or nine years old as well as to scholars. Publication therefore includes print volumes, an on-line edition, and an on-line exhibition for non-specialists. The texts are marked up in EpiDoc, and archaeological data and other meta-data are stored in a relational database. The texts are converted to Leiden-like markup for HTML display, with some adjustment for the limited capacities of HTML: since standard HTML does not include underdots, for example, these letters will instead be shown in a lighter color. This corpus includes the well-known Vindolanda tablets, from the military base at Vindolanda near Hadrian's Wall in northern England. The tablets, small pieces of wood with text in ink, date from around the end of the first century AD; they include letters, military records, financial accounts, and other types of documents. Archaeological work continues at Vindolanda, so the corpus of texts continues to grow. As a result, the CSAD team expects to update the on-line edition of these inscriptions with new texts and the corresponding photographs, transcriptions, and translations, as well as with corrections or changes to earlier readings based on new materials. The print edition, naturally, will not change, and the editors must decide whether the on-line edition is the printed text with a separate list of corrections, or the most current text with a separate page of change history.
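A small sketch can show this display compromise. The function below turns letters inside a hypothetical TEI-style unclear element into HTML spans. A stylesheet rule such as `.unclear { color: #999; }` could then show them in a lighter color. The element name, class name, and color are illustrative assumptions, not the project's actual code. The function also handles only one level of nesting.

```python
import html
import xml.etree.ElementTree as ET

def to_html(xml_text: str) -> str:
    """Render <unclear> letters as spans (to be styled lighter) instead of underdots."""
    root = ET.fromstring(xml_text)
    parts = [html.escape(root.text or "")]
    for child in root:  # direct children only
        inner = html.escape(child.text or "")
        if child.tag == "unclear":
            parts.append(f'<span class="unclear">{inner}</span>')
        else:
            parts.append(inner)
        parts.append(html.escape(child.tail or ""))
    return "".join(parts)

print(to_html("<l>SALV<unclear>TE</unclear>M</l>"))
# SALV<span class="unclear">TE</span>M
```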
The oldest and largest epigraphic corpora, on the other hand, have so much material that conversion to XML would be prohibitively difficult. Both
Some general considerations emerge from comparison of the various projects. Whether they are using databases, XML, or hand-written notes, all epigraphic projects have structured data, and whether they are marked up in Leiden, TEI, or an older system, all epigraphic texts are structured as well. When a new project begins, or an existing one considers digitization, it faces several basic questions, some editorial and some technological.
The first question an epigraphic project must answer is whether the text or the physical object on which it is written should be considered the primary focus. That is, does the project study texts which happen to be written on stones, or stones which happen to contain texts? Either approach is possible. Projects often begin from the stones, because one stone may contain several different inscriptions. If the text is primary, then information about the stone itself will be repeated (or referenced repeatedly) with each of the texts it contains. It is rare, on the other hand, for a given text to appear on more than one stone, so it is rarely necessary to repeat information about a text in the records for different stones. A project whose main concern is with language, however, might prefer to treat the texts as primary and the stones as secondary.
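The following minimal sketch uses Python dataclasses with hypothetical class and field names to model the stone-primary approach. Each stone's physical description is stored once. A text-primary view is then derived by flattening the records, and in that view the stone's details repeat with every text it carries.

```python
from dataclasses import dataclass, field

@dataclass
class Text:
    """One inscription: what was written."""
    text_id: str
    kind: str  # e.g. "decree", "epitaph"

@dataclass
class Stone:
    """One physical object, described once however many texts it carries."""
    stone_id: str
    material: str
    findspot: str
    texts: list[Text] = field(default_factory=list)

def text_primary_view(stones: list[Stone]) -> list[tuple[str, str, str]]:
    """Flatten to one row per text; the stone's details repeat on every row."""
    return [(t.text_id, s.stone_id, s.material) for s in stones for t in s.texts]

# Invented example: a re-used block carrying two inscriptions.
block = Stone("S1", "marble", "city wall", [Text("S1.a", "decree"), Text("S1.b", "epitaph")])
print(text_primary_view([block]))
# [('S1.a', 'S1', 'marble'), ('S1.b', 'S1', 'marble')]
```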
The next major question is the scope of the collection. A project might work on inscriptions from a particular place or time, like the city of Aphrodisias or Roman Britain, or might try to catalog all known inscriptions in a given language, as the major corpora
After these fundamental editorial questions come the technological questions, of which the most important may be how to store the meta-data about the inscriptions, the objects that contain them, and the project's photographs, transcriptions, and editions. Although basic information about a TEI text goes into its TEI header—who transcribed it, whether it has been published before, and so on—it may also be convenient to store this information in the same place and format as the meta-data about the rest of the project's collections. These issues are not specific to epigraphical projects, of course, but common to any project that deals with texts, photographs, and physical objects. Many epigraphic projects, like many more general digital libraries, use relational databases for their meta-data. Some store the actual texts in the same database records, marked up in Leiden style or in XML. A sufficiently robust database system can even store photographs.
The technical question most relevant to the present chapter
is how to encode and store the text.[4
indicates a left double square
bracket, ]4
its matching right bracket, or #322
indicates the chi-rho symbol.
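Codes like these can be converted to Unicode mechanically. The sketch below uses only the three codes mentioned above. It maps them to MATHEMATICAL LEFT and RIGHT WHITE SQUARE BRACKET (U+27E6, U+27E7) and to CHI RHO (U+2627). A complete encoding scheme defines many more codes, so the sketch sorts them longest first. If codes that share a prefix are later added, the longest match then wins.

```python
import re
import unicodedata

# Only the codes named in the text; a full scheme has many more.
CODES = {
    "[4": "\u27e6",    # left double square bracket
    "]4": "\u27e7",    # right double square bracket
    "#322": "\u2627",  # chi-rho
}

# Regex alternation tries alternatives in order, so list longer codes first.
PATTERN = re.compile("|".join(re.escape(c) for c in sorted(CODES, key=len, reverse=True)))

def decode(text: str) -> str:
    """Replace each known code with its Unicode character."""
    return PATTERN.sub(lambda m: CODES[m.group(0)], text)

print(decode("[4ABC]4 #322"))           # ⟦ABC⟧ ☧
print(unicodedata.name(decode("#322")))  # CHI RHO
```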
If a project decides to use XML, it must then determine what DTD (or schema) to use. As in every other humanities discipline, the basic question is whether to use a general DTD, like the TEI, or to write a project-specific one. Exactly the same issues arise in the design of the database tables or other organizational schema for meta-data. Some projects want databases or DTDs that are extremely specific to the types of inscriptions they are dealing with. For example, the projects that work largely or exclusively with funerary inscriptions want a standard way to record the age and sex of the person being memorialized, while projects that work with legal texts do not need this. Other projects prefer not to write and maintain their own DTD. The EpiDoc TEI guidelines are a good compromise here: the EpiDoc DTD is the TEI, with a few epigraphically oriented modifications made using the standard TEI mechanisms. There are also projects that use their own versions of the TEI, for example the project working on the Protestant Cemetery in Rome (Rahtz).
A key incentive for using XML is the ability to exchange data with other projects. Epigraphic corpora may overlap, as the time periods or geographical areas they focus on may intersect. It is therefore convenient to be able to divide the labor of photographing, cataloging, and editing the inscriptions, and that means the resulting data must be in compatible forms. Using the same DTD in the same way makes this relatively easy. While projects that store their texts as word-processor files with Leiden markup can also share data, they must agree explicitly on the details of text layout, file formats, and character encodings.
Text management must also take into account the writing
systems used in the corpus. If a project is only dealing with
inscriptions in Latin written in the Roman alphabet,
The first approach to the writing system problem is often to use different fonts, as one might with a word processor. This approach is appealing since if the project ever wants to print its texts, it will sooner or later need fonts for the different scripts anyway. It is also analogous to the way texts are presented in print: we recognize that an inscription is in Greek because we see it printed in the Greek alphabet. There are long-standing conventions for the use of boldface, spaced type, and other typographic devices to represent the quasi-Roman alphabets used by the other ancient languages of Italy, like Oscan or Umbrian. Yet the font-based approach assumes that all the software that will manipulate a given text can recognize font-change markers. Some database packages do not allow change of font within a single text field, for example, and some export or interchange formats strip font information.
Unicode is a better approach when the scripts of interest
are all supported, which will be the case for any script still
in use by a living language (for example, Greek or Hebrew).
Hieroglyphic and cuneiform characters are not currently part
of the Unicode standard, however, and even in supported
scripts some particular old characters may not be available.
In Greek inscriptions, for example, numerals are often symbols
composed from the first letter of the word for the number;
fifty
would be represented as π for
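Unicode records the script in each character, so software can tell Greek from Latin without relying on font-change markers. The minimal sketch below uses Python's standard unicodedata module. Classifying a character by the first word of its Unicode name is a heuristic assumption that works for these two scripts, and the mixed sample string is invented.

```python
import unicodedata
from itertools import groupby

def script_of(ch: str) -> str:
    """Rough script label taken from the Unicode character name."""
    name = unicodedata.name(ch, "")
    for script in ("GREEK", "LATIN"):
        if name.startswith(script):
            return script
    return "OTHER"

def script_runs(text: str) -> list[tuple[str, str]]:
    """Split text into consecutive runs of the same script."""
    return [(script, "".join(chars)) for script, chars in groupby(text, key=script_of)]

print(script_runs("ΕΠΙ DIVI"))
# [('GREEK', 'ΕΠΙ'), ('OTHER', ' '), ('LATIN', 'DIVI')]
```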
With XML, it is possible to define either elements or
entities for unsupported characters. If the DTD contains an
element called, say, char, then the acrophonic fifty could be
encoded as <char type="acrophonic 50" font="numfont" pos="123"/>,
where numfont
names a (hypothetical) font in which this
character is available and pos
is the character
position of that character in that font. Alternatively, the
project might define an entity like
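The sketch below shows how software might process the hypothetical char element described above. It parses a line containing the element and substitutes a readable placeholder where the special font is unavailable. It also lists the font positions that a renderer would have to supply. The element, its attributes, the sample line, and the placeholder format are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

LINE = '<l>ΔΡΑΧΜΑΙ <char type="acrophonic 50" font="numfont" pos="123"/></l>'

def plain_text(xml_text: str) -> str:
    """Flatten a line, replacing each <char> with a textual placeholder."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "char":
            parts.append(f"[char: {child.get('type')}]")
        parts.append(child.tail or "")
    return "".join(parts)

def fonts_needed(xml_text: str) -> set[tuple[str, int]]:
    """Collect the (font, position) pairs a renderer would have to supply."""
    root = ET.fromstring(xml_text)
    return {(c.get("font"), int(c.get("pos"))) for c in root.iter("char")}

print(plain_text(LINE))    # ΔΡΑΧΜΑΙ [char: acrophonic 50]
print(fonts_needed(LINE))  # {('numfont', 123)}
```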
Because so many epigraphy projects deal with large numbers of small texts, whereas literary projects in the classics more often have relatively few larger texts (for example, a few dozen dramas or a couple of epics), epigraphers have been quick to recognize the benefits of digitization for searching and for global manipulation of a corpus. Although many digital epigraphy projects pre-date XML, they are beginning to adopt it, and EpiDoc is emerging as a common way of doing so.
The EpiDoc initiative, under the leadership of Tom Elliott
of the Ancient World Mapping Center, University of North
Carolina, is working out ways to encode epigraphic data with
the TEI. EpiDoc's basic assumption is that "Ancient
epigraphic texts ought to be widely available in digital form
for sharing and use in a variety of environments for a variety
of scholarly and educational purposes. Individuals,
organizations and projects require digital epigraphic texts
for personal or internal use as well; if standard tools and
formats were available, such needs would be more easily
met"
(EpiDoc Collaborative). The obvious standard for
sharing and presenting texts is XML. Rather than writing a
DTD for epigraphy from scratch, moreover, the EpiDoc group
uses the TEI because TEI has already addressed many of the
taxonomic and semantic challenges faced by epigraphers,
because the TEI-using community can provide a wide range of
best-practice examples and guiding expertise, and because
existing tooling built around TEI could easily lead to early,
effective presentation and use of TEI-encoded epigraphic
texts.
The EpiDoc approach has already been adopted by several epigraphic projects, and others are considering it. As noted above, Aphrodisias and the Roman Britain corpus use EpiDoc for their texts. The Dêmos project, directed by Christopher Blackwell of Furman University, is a library of materials about Athenian democracy which will include Greek inscriptions, marked up with EpiDoc, among the primary sources. Epigrapher Michael Arnush of Skidmore College is writing translations and commentaries for these inscriptions. The corpus of Macedonian and Thracian inscriptions being compiled at KERA, the Research Center for Greek and Roman Antiquity at Athens, is beginning to use the TEI and may choose to use EpiDoc.
The main product of the EpiDoc Collaborative is a set of
guidelines detailing how to use the TEI for epigraphy in a
standard way. There is also an EpiDoc DTD, which is an
extension of the TEI in the standard way, restricting the
allowable values for certain attributes, suppressing unused
elements, and adding a very small number of additional
elements. The guidelines suggest what features to mark, which
of a set of complementary tags to use for them (for example
The current version of the guidelines document is not
complete; several sections remain to be written, and some are
being revised based on experience. The basic philosophy of
the guidelines, however, is clear. The simplest rule is that
whatever is actually on the stone is in the content of the
elements, while editorial changes and additions are in
attributes. Thus EpiDoc prefers we are not re-writing
TEI,
as the EpiDoc guidelines state (sec. 6). Finally,
everything that can be expressed in the Leiden system, or
other similar schemes, must be expressible in EpiDoc.
Moreover, there must be a one-to-one match between markup
elements in the Leiden system (symbols and character
formatting) and those in EpiDoc, so that the two markup
schemes will be mechanically interconvertible.
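The minimal sketch below illustrates mechanical conversion. It covers only the two features described earlier in this chapter: letters the editor adds (angle brackets) and letters that are not clearly legible (underdots). It assumes a simplified fragment that uses TEI-like supplied and unclear elements. These element names and the reason value are assumptions for illustration and do not state the EpiDoc guidelines' actual rules. Any other element is passed through as plain text.

```python
import xml.etree.ElementTree as ET

UNDERDOT = "\u0323"  # COMBINING DOT BELOW

def to_leiden(elem: ET.Element) -> str:
    """Convert a simplified TEI-like fragment to Leiden-style text."""
    out = [elem.text or ""]
    for child in elem:
        inner = to_leiden(child)  # handles nested markup; child's tail is added below
        if child.tag == "supplied" and child.get("reason") == "omitted":
            out.append(f"<{inner}>")
        elif child.tag == "unclear":
            out.append("".join(ch if ch.isspace() else ch + UNDERDOT for ch in inner))
        else:
            out.append(inner)
        out.append(child.tail or "")
    return "".join(out)

fragment = ET.fromstring(
    '<ab>ΑΘΗΝ<supplied reason="omitted">Α</supplied>Ι<unclear>ΩΝ</unclear></ab>'
)
print(to_leiden(fragment))  # ΑΘΗΝ<Α>Ι followed by Ω and Ν, each with an underdot
```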
An EpiDoc text is structured as a series of un-numbered
The EpiDoc group is also working on tools, for example XSL
stylesheets, to facilitate working with EpiDoc texts; these
tools can be found at the EpiDoc home page at the Ancient
World Mapping Center. One tool that will be particularly
important to wide acceptance of EpiDoc is a transformer that
can convert between Leiden format and EpiDoc XML in either
direction; this is currently under development. This tool
will help projects convert their existing texts to EpiDoc
format, and it will also promote the use of EpiDoc as an
exchange mechanism: two projects that do not want to convert
their own holdings to XML can nonetheless use XML to give
texts to each other. An additional desideratum is an editor
with specific support for EpiDoc, as opposed to a general XML
editor that can read the DTD, by analogy with the HTML editors
that have the HTML DTD built in and do not claim to provide
general support for other DTDs. Such an editor could be
tailored to the needs of epigraphers rather than general
users, and should help overcome the perception among some
epigraphers that XML is
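The reverse direction, from Leiden text to XML, can be sketched for the same two features. The assumptions are the same: hypothetical supplied and unclear elements, angle brackets for editorial additions, and underdots for doubtful letters. A real transformer of the kind described here would also have to handle erasures, abbreviations, other bracket types, and nesting.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

UNDERDOT = "\u0323"  # COMBINING DOT BELOW
# Either an angle-bracketed addition, or a run of letters each followed by an underdot.
TOKEN = re.compile(r"<(?P<added>[^<>]*)>|(?P<dotted>(?:[^\s<>]\u0323)+)")

def _add_text(root: ET.Element, last: Optional[ET.Element], s: str) -> None:
    """Plain text goes in root.text before the first child, else in the previous child's tail."""
    if not s:
        return
    if last is None:
        root.text = (root.text or "") + s
    else:
        last.tail = (last.tail or "") + s

def from_leiden(text: str) -> ET.Element:
    """Convert angle-bracket additions and underdotted letters to TEI-like XML."""
    root = ET.Element("ab")
    last: Optional[ET.Element] = None
    pos = 0
    for m in TOKEN.finditer(text):
        _add_text(root, last, text[pos:m.start()])
        if m.group("added") is not None:
            last = ET.SubElement(root, "supplied", reason="omitted")
            last.text = m.group("added")
        else:
            last = ET.SubElement(root, "unclear")
            last.text = m.group("dotted").replace(UNDERDOT, "")
        pos = m.end()
    _add_text(root, last, text[pos:])
    return root

print(ET.tostring(from_leiden("ΑΘΗΝ<Α>ΙΩ\u0323Ν\u0323"), encoding="unicode"))
# <ab>ΑΘΗΝ<supplied reason="omitted">Α</supplied>Ι<unclear>ΩΝ</unclear></ab>
```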
Although the guidelines and DTD are primarily the work of Tom Elliott and his colleagues at UNC, the wider community has been involved from the beginning. Even before the first version of the DTD was prepared, EpiDoc existed in the form of a mailing list, bringing together epigraphers, historians, and humanities computing specialists to discuss how EpiDoc might work. Discussions on this list have ranged from basic philosophical questions to highly technical implementation details.
The disputes about what to mark, such as whether the broken
triangular letter between nu and xi in the example above
deserves an underdot, do not go away when texts are encoded
in XML instead of in typographic form. One advantage of
structured markup, however, is that editors can, if they
choose, encode more information about how certain a particular
feature is. The date of an inscription, for example, can be
encoded as a range of possible dates. EpiDoc includes the TEI
I am
95% certain of this letter, 83% certain of this letter, and
only 37% certain of this letter
seemed too complicated,
and it was decided that editors should be encouraged to put
these details into the commentary—as they have always
done. The advance of EpiDoc over the Leiden system here is
simply that the editor can note certainty in a standard way in
the markup, not
Other philosophical debates include how much can be assumed from applications that will work with EpiDoc texts, how best to handle characters that are not part of Unicode and will not be added, and how to handle the necessarily imprecise dates given for ancient texts. The archives of the mailing list trace the progress of the guidelines, and the guidelines themselves embody the collective wisdom of a group of practicing epigraphers and XML specialists.
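A date encoded as a range, rather than as a single year, lets software compare inscriptions whose dates are uncertain. The sketch below assumes that the limits are stored in notBefore and notAfter attributes, in the manner of later versions of the TEI. For simplicity, years BC are plain negative integers, and the absence of a year 0 is ignored. The elements and values are invented.

```python
import xml.etree.ElementTree as ET

def date_range(elem: ET.Element) -> tuple[int, int]:
    """Return (earliest, latest) year; negative numbers are years BC."""
    return int(elem.get("notBefore")), int(elem.get("notAfter"))

def may_be_contemporary(a: ET.Element, b: ET.Element) -> bool:
    """True if the two date ranges overlap, i.e. the texts could be contemporary."""
    a_from, a_to = date_range(a)
    b_from, b_to = date_range(b)
    return a_from <= b_to and b_from <= a_to

decree = ET.fromstring('<date notBefore="-350" notAfter="-300"/>')
epitaph = ET.fromstring('<date notBefore="-320" notAfter="-250"/>')
later = ET.fromstring('<date notBefore="100" notAfter="150"/>')
print(may_be_contemporary(decree, epitaph))  # True
print(may_be_contemporary(decree, later))    # False
```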
The epigraphic community has a long-established practice of using semantic markup. The markup systems in use have evolved over the past four hundred years, but until relatively recently have always involved special typographical symbols in the text—brackets, underdots, and so on. Some epigraphers see XML as a natural transformation of what they have always done, with all the additional benefits that come from standardization within the community.
The EpiDoc guidelines are emerging as one standard for digital epigraphy with the TEI. EpiDoc is not the only possible way to use the TEI for epigraphic texts, of course, but the tools, documentation, and examples that are growing up around it will make it a good place for new digitization projects to start.