TEI Members Meeting 2003: Presentation Abstracts
- Nancy Ide: Fifteen (and a half) years of the TEI: A retrospective and prospective
- Patrick Durusau: What is a tree really?
- Dr. Thomas Burch and Dr. Ruth Cristmann: Using the TEI guidelines in digitizing the Deutsche Wörterbuch and other German dictionaries
- Stuart Brown: A Topic Map for the TEI
- Vincent Quint: XML, beyond the markup language
Nancy Ide: Fifteen (and a half) years of the TEI: A retrospective and prospective
The TEI was the first major effort to develop standards for representing electronic data, and its contribution should not be underestimated: it provided, in part, the impetus for the development of XML, and it is still widely consulted today as the first source for encoding textual data. However, from the perspective of the broader encoding community, which nowadays notably includes computational linguists who annotate corpora for large-scale linguistic analysis in support of language processing applications, the TEI's success is at best partial. In particular, the TEI has not provided adequate support for encoding the kinds of resources this community relies on, including corpora, linguistic annotations and lexical resources. As a result, numerous projects (including large national and international projects) have developed their own encoding conventions for these phenomena, largely without consulting the TEI.
In this presentation, I will attempt to outline the reasons why the TEI has not met the needs of the computational linguistics community, and then offer some recommendations for the TEI's future activities, including ways to collaborate with others involved in similar or related enterprises. More generally, I will outline several shifts in focus and perspective that I feel are needed for the TEI to align more closely with evolving standards (e.g., W3C standards, ISO work) and to take account of the major impact that the move toward web-based resources has had, and will continue to have, on representation requirements for electronic data.
Patrick Durusau: What is a tree really?
Every user of markup languages, from GML forward, has been schooled in the cant of "descriptive" versus "procedural" markup. ISO 8879 pans procedural markup as "inflexible", since users must change the procedural markup in order to change the presentation of their documents. Changing procedural markup to affect presentation? That sounds a lot like changing descriptive markup in order to get a different tree!
Does that mean that a tree is a particular presentation of descriptive markup? Or perhaps more precisely, a procedural representation of descriptive markup?
Techniques for transforming structured but non-SGML/XML files into SGML/XML have long been known in the markup community. Such techniques should also be able to produce multiple valid SGML/XML instances from a single file. A possible file format for TEI P5 is proposed that allows unlimited (including overlapping) descriptive markup while retaining, with preprocessing, compatibility with XML parsers.
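The abstract does not specify the proposed format. As a rough illustration only, the following Python sketch assumes a stand-off approach, which is not necessarily the one proposed in the talk: the base text is stored once, each annotation records a layer (one hierarchy), an element name and character offsets, and each layer is projected into its own well-formed XML instance. Annotations in different layers may overlap freely; only annotations within one layer must nest. The names `Span`, `project`, `layer` and the sample text are hypothetical.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-off annotation: element `name` over text[start:end]."""
    layer: str   # which hierarchy (tree) this annotation belongs to
    name: str    # element name to emit
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive


def project(text: str, spans: list[Span], layer: str, root: str = "text") -> str:
    """Serialize one layer of stand-off annotations as a well-formed XML string.

    Spans in other layers are ignored, so they may overlap this layer freely.
    Spans within the chosen layer must nest properly, otherwise ValueError.
    """
    chosen = sorted((s for s in spans if s.layer == layer),
                    key=lambda s: (s.start, -s.end))  # outer elements open first
    out = [f"<{root}>"]
    stack: list[Span] = []
    pos = 0

    def close_until(limit: int) -> None:
        # Close every open element that ends at or before `limit`.
        nonlocal pos
        while stack and stack[-1].end <= limit:
            top = stack.pop()
            out.append(escape(text[pos:top.end]))
            pos = top.end
            out.append(f"</{top.name}>")

    for span in chosen:
        close_until(span.start)
        if stack and span.end > stack[-1].end:
            raise ValueError(f"<{span.name}> overlaps <{stack[-1].name}> in layer {layer!r}")
        out.append(escape(text[pos:span.start]))
        pos = span.start
        out.append(f"<{span.name}>")
        stack.append(span)

    close_until(len(text))
    out.append(escape(text[pos:]))
    out.append(f"</{root}>")
    return "".join(out)


# Usage: sentences and verse lines overlap, so they live in separate layers.
text = "The cat sat. The dog ran."
spans = [
    Span("syntax", "s", 0, 12), Span("syntax", "w", 4, 7), Span("syntax", "s", 13, 25),
    Span("verse", "l", 0, 16), Span("verse", "l", 17, 25),
]
for name in ("syntax", "verse"):
    doc = project(text, spans, name)
    ET.fromstring(doc)  # each projection parses as ordinary XML
    print(doc)
# <text><s>The <w>cat</w> sat.</s> <s>The dog ran.</s></text>
# <text><l>The cat sat. The</l> <l>dog ran.</l></text>
```

Each projection is an ordinary tree that any XML parser accepts, while the overlap between sentences and lines is preserved in the single underlying file.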
Dr. Thomas Burch and Dr. Ruth Cristmann: Using the TEI guidelines in digitizing the Deutsche Wörterbuch and other German dictionaries
In this talk we will show how the TEI guidelines are used to encode the Deutsche Wörterbuch (DWB) of the Brothers Grimm, which has been freely available on the Internet for over a year (www.DWB.uni-trier.de). With its 33 volumes, this dictionary is both the most extensive and the most significant dictionary of the German language. It was compiled between 1838 and 1961 by many generations of scholars and offers a nearly exhaustive overview of the makeup and development of the German language from its beginnings to the time the last volume was completed.
For the markup of the German Dictionary we are employing the TEI guidelines for the encoding of dictionaries. The complete dictionary data is being structured and marked up in SGML/XML during an intensive markup phase. This is done partly automatically and partly semi-automatically: specific text features, predominantly typographical ones, are exploited to develop programs that apply the markup in a consistent and controllable way.
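To illustrate typography-driven markup in general terms, the following Python sketch assumes an invented input convention (asterisks around a bold headword, underscores around an italic grammatical label) and maps it onto TEI dictionary elements such as `<entry>`, `<orth>`, `<pos>`, `<sense>` and `<def>`. It is not the DWB project's actual program or data; the pattern, the function name `tag_entry` and the sample line are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical typographic conventions in a keyed-in entry:
#   *HEADWORD*  bold headword
#   _label_     italic grammatical label
#   remainder   definition text
ENTRY_PATTERN = re.compile(r"^\*(?P<orth>[^*]+)\*,?\s+_(?P<pos>[^_]+)_\s+(?P<gloss>.+)$")


def tag_entry(line: str) -> ET.Element:
    """Map one typographically coded line onto TEI-style dictionary elements.

    Lines that do not match the expected layout raise ValueError so that
    they can be set aside for manual (semi-automatic) treatment.
    """
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised entry layout: {line!r}")
    entry = ET.Element("entry")
    form = ET.SubElement(entry, "form")
    ET.SubElement(form, "orth").text = match["orth"].strip()
    gram = ET.SubElement(entry, "gramGrp")
    ET.SubElement(gram, "pos").text = match["pos"].strip()
    sense = ET.SubElement(entry, "sense")
    ET.SubElement(sense, "def").text = match["gloss"].strip()
    return entry


# Usage with an invented sample line (not taken from the DWB):
entry = tag_entry("*ABEND*, _m._ evening; the close of the day.")
print(ET.tostring(entry, encoding="unicode"))
# <entry><form><orth>ABEND</orth></form><gramGrp><pos>m.</pos></gramGrp><sense><def>evening; the close of the day.</def></sense></entry>
```

Rejecting lines that do not fit the expected layout, rather than guessing, is one way to keep automatic markup consistent and controllable: the rejected lines become the semi-automatic share of the work.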
Using the SGML/XML-encoded DWB as a starting point, we will apply the experience gained in this project to encoding other German dictionaries, especially dictionaries of the main German dialects and regional languages. This approach aims at setting up an interlinked network of all important dictionaries of the German language. The network includes the dictionaries of the older stages of German, especially the dictionaries of Middle High German, whose digital component formed the starting point of our use of the TEI guidelines in 1996 (www.MWV.uni-trier.de).
Stuart Brown: A Topic Map for the TEI
The TEI Guidelines define recommendations for the encoding of textual material of all kinds in all languages from all times. The system is widely used in the humanities. Two problems arise from the modularity and extensibility of the TEI Guidelines:
- Many TEI encoders have little experience of SGML/XML, and the initial work of developing a suitable view of the DTDs is often done for them by an external party. Procedures for documenting such views vary in nature and expressiveness, and references are usually organised around elements and classes rather than around the text features those elements are used to mark up.
- Whilst the Tag Set Documentation (TSD) DTD provides a mechanism for describing local extensions, there are no associated processes with which different TSDs can be compared, nor any way to assert that element A in TSD X corresponds to element B in TSD Y.
The project consists of the development of a topic map, expressed in the XML Topic Maps (XTM) syntax, that models the features of the TEI DTDs; the development of open source tools for processing it; and guidelines for those wishing to implement their own processes. The project will provide:
- a simple mechanism for providing documentation of local views of the TEI DTDs (demonstrated);
- an easy route to a highly navigable localised element/feature reference (demonstrated);
- a text feature-based element finder (demonstrated);
- a framework for sharing implementation notes and extension mechanisms within the TEI community.
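The abstract does not describe the topic map's internal structure. The following sketch is a deliberately simplified, in-memory Python model, not the XTM syntax and not the project's tools: elements declared in a given TSD are treated as topics, linked to text features by a "marks up" association and to each other by a "corresponds to" association, and a feature-based finder returns every element reachable from a feature. The class `FeatureMap`, its methods, the TSD labels and the element `personName` are hypothetical; treating correspondence as symmetric and transitive is a design assumption.

```python
from collections import defaultdict

# Hypothetical topic: an element as declared in a given TSD, e.g. ("TEI P4", "persName").
Topic = tuple[str, str]


class FeatureMap:
    """Toy in-memory stand-in for a topic map over TEI element declarations."""

    def __init__(self) -> None:
        self._marks_up: dict[str, set[Topic]] = defaultdict(set)
        self._corresponds: dict[Topic, set[Topic]] = defaultdict(set)

    def marks_up(self, tsd: str, element: str, feature: str) -> None:
        """Assert that `element` in `tsd` is used to mark up a text `feature`."""
        self._marks_up[feature].add((tsd, element))

    def corresponds(self, a: Topic, b: Topic) -> None:
        """Assert that two element declarations correspond (stored both ways)."""
        self._corresponds[a].add(b)
        self._corresponds[b].add(a)

    def find(self, feature: str) -> set[Topic]:
        """Return elements marking up `feature`, plus all (transitive) correspondents."""
        found = set(self._marks_up.get(feature, set()))
        frontier = list(found)
        while frontier:
            current = frontier.pop()
            for other in self._corresponds.get(current, set()):
                if other not in found:
                    found.add(other)
                    frontier.append(other)
        return found


# Usage: the "project TSD" label and "personName" are invented for illustration.
fm = FeatureMap()
fm.marks_up("TEI P4", "persName", "personal name")
fm.corresponds(("TEI P4", "persName"), ("project TSD", "personName"))
print(sorted(fm.find("personal name")))
# [('TEI P4', 'persName'), ('project TSD', 'personName')]
```

Organising lookup around text features rather than element names is what lets an encoder start from "personal name" and discover both the standard element and any local equivalent asserted in another TSD.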
Vincent Quint: XML, beyond the markup language
Earlier this year the W3C celebrated the fifth birthday of XML. What made this event important is not only the strong impact of XML on the encoding and interchange of structured information, but also the number of new technologies associated with XML that have emerged in the last few years. The XML markup language now comes with several other languages to describe document models, specify style, define transformations, query complex structures, address parts of documents, set hypertext links, etc.
The talk provides an overview of these technologies, reviewing both recent work and work in progress. It focuses on the new possibilities they offer for structured documents and data. Two of these are highlighted: modularity and dynamics. They are illustrated by the evolution of some markup languages, including (X)HTML, which are evolving from a static, monolithic SGML document type into a dynamic, modular and extensible family of XML modules.