The TEI has laid the foundation for the humanities digital library, but what are the implications for the next generation of librarians, scholars and students? How can we reconcile the need for individual scholarly interpretation of texts with the requirements for "standardization" in a generalized environment? Is it possible to build a library of texts with "layered" markup and what would this mean for working practices? How can TEI be used alongside other standards and markup schemes? This presentation will offer some thoughts on the future of digital scholarship and the role that the TEI might play in it.
While P4 does offer resources for the transcription of speech (#11) and for some kinds of linguistic analysis (e.g. #15), the basic problem with linguistic interviews is that they are essentially not documents. Today they are, first of all, sound recordings, and various kinds of information and encoding can be derived from them, only one of which is a text transcription covered by TEI. A central question for use of TEI with linguistic interviews is how a text transcription is related to other kinds of digital information (e.g. sound files, acoustical plots, maps), closely followed by the question of how TEI encoding might best be implemented with other layers of text encoding (e.g. lexical, phonetic, grammatical encoding for analysis; survey-specific encoding; document structure encoding for alternate organizational units such as breath groups or prompt/response objects).
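One way to picture the layering problem is standoff annotation over a shared time base: the transcription, phonetic labels, and survey-specific codes all point into the recording rather than into one another. The sketch below (plain Python, with hypothetical names; it reflects the general idea, not any actual TEI or survey implementation) keeps each layer as a separate set of spans anchored to time offsets in the sound file:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """An annotation anchored to a time interval in the recording."""
    start: float   # seconds into the sound file
    end: float
    label: str

@dataclass
class Interview:
    """A recording with independent annotation layers (hypothetical model)."""
    sound_file: str
    layers: dict = field(default_factory=dict)  # layer name -> list of Span

    def annotate(self, layer, start, end, label):
        self.layers.setdefault(layer, []).append(Span(start, end, label))

    def at(self, t):
        """All annotations, from every layer, that cover time t."""
        return {name: [s.label for s in spans if s.start <= t < s.end]
                for name, spans in self.layers.items()}

# One recording, three layers that never reference each other directly:
iv = Interview("interview_042.wav")
iv.annotate("transcription", 0.0, 1.4, "well I reckon so")
iv.annotate("phonetic", 0.3, 0.6, "ɹɛkən")
iv.annotate("survey", 0.0, 1.4, "prompt/response: Q17")

print(iv.at(0.4))
# → {'transcription': ['well I reckon so'], 'phonetic': ['ɹɛkən'],
#    'survey': ['prompt/response: Q17']}
```

Because the layers are independent, a TEI transcription can remain one derived view among several, and the question becomes how (or whether) the other layers should be expressed in, or linked from, the TEI encoding.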
Although the TEI Guidelines provide an outstanding framework for a number of textual applications, they stop short of creating a broad framework for digital libraries. Page image collections, compound documents, and mixed format repositories have become the everyday business of digital libraries throughout the world. The TEI must remain a vital part of that world, and if it is to do so, it must make the Guidelines relevant to the digital library practitioner.
The TEI has always had an ambiguous relation to public standards for markup technologies such as SGML and, now, XML. On the one hand, it relies on these standards both for its own implementations and (accordingly) for many of its core assumptions. On the other hand, its own purposes and aims suggest that it should not be identified with these standards, but rather must adopt a strategic policy of standards conformance as a means to an end. If the TEI is not XML by definition, what is it? I will argue that a more nuanced, long-term view of the relation of TEI to XML can help guide us through many of the immediate questions we face: the TEI's relation to various XML technologies (schema technologies, stylesheets, etc.); TEI infrastructure ("Son of ODD"); training for users at all levels; the development and refinement of tag sets and applications; and the growth and sustainability of an inclusive, vital community that can continue to sponsor innovation and progress while maintaining the TEI's commitment to such classic objectives as fostering interchange and facilitating top-notch scholarship in the humanities.
The TEI's most vociferous detractors have objected neither to its rigor nor to its scope, and certainly never to its objectives. The problems posed by the TEI tend to concentrate on how one should handle documents once one has prepared them. The very positive push toward XML and associated stylesheets has all but silenced those concerned about the limited options for display. What remains, in my opinion, is an implementation hurdle -- how to provide easy cross-document searching with extensive fielded capability. At the University of Chicago, we have effectively sidestepped this hurdle by parsing TEI-encoded documents into a standard format, which we call ATE (ARTFL Text Encoding), before loading them. After parsing, we load them into PhiloLogic, our full-text retrieval and analysis system. The latest implementations of PhiloLogic are based on a general model of textual objects that combines related sets of structured database tables (SQL), which manage the textual objects, with full-text searching. To handle more complex representations of textual objects, we have written special "extractors" for building the related SQL tables. Drawing on our experience with PhiloLogic, I will explore in this talk a model for a complete full-text engine that could leverage the full power of TEI-encoded documents from start to finish.
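The core of the "structured tables plus full text" model can be sketched in a few lines. The miniature below uses Python's built-in sqlite3 with an invented schema (one table of textual objects carrying metadata fields, one word index, and a fielded search joining the two); it is an illustration of the general approach, not PhiloLogic's actual schema or the ATE format:

```python
import sqlite3

# Hypothetical miniature of the "SQL tables + full-text search" model.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, author TEXT,
                          title TEXT, div_type TEXT, text TEXT);
    CREATE TABLE words (word TEXT, obj_id INTEGER, position INTEGER);
""")

def load(author, title, div_type, text):
    """Store one textual object and build its word index."""
    cur = db.execute(
        "INSERT INTO objects (author, title, div_type, text) VALUES (?,?,?,?)",
        (author, title, div_type, text))
    for pos, w in enumerate(text.lower().split()):
        db.execute("INSERT INTO words VALUES (?,?,?)", (w, cur.lastrowid, pos))

def search(word, **fields):
    """Fielded full-text search: a word, restricted by metadata fields."""
    sql = ("SELECT DISTINCT o.author, o.title FROM words w "
           "JOIN objects o ON o.id = w.obj_id WHERE w.word = ?")
    params = [word.lower()]
    for col, val in fields.items():   # e.g. author=..., div_type=...
        sql += f" AND o.{col} = ?"    # column names trusted; fine for a sketch
        params.append(val)
    return db.execute(sql, params).fetchall()

load("Diderot", "Encyclopédie", "article", "la liberté de penser")
load("Voltaire", "Candide", "chapter", "il faut cultiver notre jardin")
print(search("liberté", author="Diderot"))
# → [('Diderot', 'Encyclopédie')]
```

The point of the design is that structural and bibliographic fields live in ordinary relational columns, so fielded restriction is a plain SQL join rather than a feature bolted onto the text index; the "extractors" mentioned above would populate additional related tables for more complex textual objects.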