TEI Internationalization proposal
Contents
- Summary
- Approach to translation
- Organization and Language Coverage
- Translation infrastructure
- Work Plan and Deliverables
Summary
The Text Encoding Initiative Guidelines ( http://www.tei-c.org/ ) have been widely adopted by projects and institutions in many countries in Europe, North America, and Asia, and are used for encoding texts in dozens of languages. The TEI community is broadly international and multilingual, and its geographical reach increases every year. However, the complex encoding of texts at which the TEI excels requires a close understanding of the 522 available elements, and non-English speakers are at a considerable disadvantage in learning and using the Guidelines. The TEI needs to be made more accessible to the international community it seeks to serve.
Ideally, the Guidelines themselves and the TEI tag set would be translated into as many languages as there are user communities. However, translation on this scale is well beyond the resources not only of the TEI but also of most funding bodies. A more realistic approach, which we propose here, is to develop an infrastructure for translation, through which individual language communities may easily produce versions of the essential portions of the Guidelines in other languages. The TEI Guidelines are written in a modular way that makes this infrastructure straightforward to develop. With comparatively modest initial funding, we can produce the necessary framework and sponsor translation into five initial high-priority languages, drawing on existing efforts already under way in the TEI community. Funding of this proposal will enable us to coordinate these efforts and bring them to prompt and consistent completion.
- a working architecture for delivering an internationalized TEI
- an interface for translators to use in creating translated versions of components of the TEI Guidelines
- translations of the TEI documentation into five languages which the TEI considers to be the highest priority
- a framework for coordinating further translation efforts to be undertaken on a volunteer basis, or to be supported by further funding if available
Approach to translation
- The detailed descriptive prose of the Guidelines chapters and TEI Lite documentation.
- The element names, attribute names, and suggested attribute values which are put into DTDs and schemas. Thus instead of <addrLine>, the TEI user might prefer to write <líneaDirección>, <ligneAdresse>, <linDireccio> or <AdressZeile>. The TEI has an established system for recording such translations, and preserving the relationship to the original names so that document instances can be put back into canonical form.
- The summary technical descriptions of elements or attributes. Thus instead of ‘contains a single TEI-conformant document, comprising a TEI header and a text, either in isolation or as part of a teiCorpus element.’, the Spanish-speaking user might find it more helpful to read ‘contiene un único documento TEI, compuesto de una cabecera TEI (TEI header) y un cuerpo de texto (text), aislado o como parte de un elemento corpusTei (teiCorpus)’.
- The examples of usage for each element. ‘Internationalization’ of these could take the form of simple translation, but in practice localisation would be considerably more useful. Localisation involves choosing examples originating in the target language, which illustrate the element's usage more effectively for a native speaker than a translated example could do.
- There have already been six ‘traditional’ translations of the TEI Lite ( http://www.tei-c.org/Lite/ ) documentation into other languages. These have not covered translation of the element names or technical reference documentation. They are in wide use, however, and have created a need for more extensive translations of the Guidelines themselves.
- The French Groupe d'experts n° 8 within CN 357 (Commission de normalisation «Modélisation, production et accès aux documents») of the CG 46 (Commission générale «Information et documentation») at AFNOR has an interest in TEI translation. Amongst other goals, this group intends to translate the definitions of the TEI elements and attributes into French. So far, they have worked in a ‘traditional way’ on some chapters of the P4 and P5 versions. Dissemination of the resulting French version of these chapters is very limited.
- Some ‘formal’ work has also been undertaken on translating element and attribute names; Alejandro Bia and Arno Mittelbach have prepared translation sets for Catalan, Spanish, and German. This work is integrated into the Roma ( http://www.tei-c.org/Roma/ ) application, allowing users to create tailored schemas in one of the supported languages. However, while this work is useful, it is not by itself the most effective way to proceed. For one thing, many of the element names are in an abbreviated form of English (e.g. <respStmt>) which is not easy to translate sensibly, and the abbreviated names are in any case relatively easy to recognize for people used to reading Latin script. Furthermore, unless the reference descriptions are also translated, the element names by themselves do not give a clear idea of what the element is for. Using <infoResp> instead of <respStmt> is not as helpful as translating the description ‘supplies a statement of responsibility for someone responsible for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.’
Study of the work described above suggests that translating the reference descriptions, followed by translation of element names, is likely to be the most effective way to promote the TEI and support its use in non-English-speaking communities. Translation or localisation of examples would be an important further step for some communities.
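To make the idea of returning document instances to canonical form concrete, here is a minimal Python sketch. It is not the TEI's actual mechanism: the lookup table `TO_CANONICAL` and the function `canonicalize` are hypothetical names introduced for illustration, the sample address is invented, and the sketch ignores XML namespaces and attribute names for brevity. The localized element names are the ones quoted in this section.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: localized element name -> canonical TEI name.
# The four localized names are the examples quoted in this section.
TO_CANONICAL = {
    "líneaDirección": "addrLine",
    "ligneAdresse": "addrLine",
    "linDireccio": "addrLine",
    "AdressZeile": "addrLine",
}

def canonicalize(xml_text: str) -> str:
    """Rename every known localized element back to its canonical name."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Names that are not in the table are left unchanged.
        elem.tag = TO_CANONICAL.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")

# Invented sample instance using the French element name.
localized = "<address><ligneAdresse>12 rue de la Paix</ligneAdresse></address>"
print(canonicalize(localized))
# prints <address><addrLine>12 rue de la Paix</addrLine></address>
```

Because the table maps many localized names to one canonical name, the conversion is lossless in the canonical direction only; going the other way requires knowing the target language.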
Organization and Language Coverage
| Chinese | Marcus Bingenheimer | Chung-hwa Institute of Buddhist Studies, Taipei |
| French | Veronika Lux | University of Nancy |
| German | Werner Wegstein | Wuerzburg University |
| Hindi | Paul Richards | UGS (The PLM Company) |
| Italian | Fabio Ciotti | University of Roma |
| Japanese | Ohya Kazushi | Tsurumi University, Yokohama |
| Polish | Radoslaw Moszczynski | Warsaw University |
| Romanian | Dan Matei | CIMEC - Institutul de Memorie Culturala, Bucharest |
| Slovenian | Tomaž Erjavec, Matija Ogrin | Jozef Stefan Institute, Ljubljana |
| Spanish | Manuel Sánchez | Miguel de Cervantes Digital Library |
| Tibetan | Linda Patrik, Tensin Namdak | www.nitartha.org |
We therefore propose to invite collaborating institutions to work with us on each language. For the initial set of languages, a small budget will be allocated to each in roughly equal proportions, to be expended in whatever way will have the most impact in the local context. The funding may be used to pay graduate students, to pay a supervisor to organize volunteer translators and check their work, to pay a single translator, or to support travel and participation costs in group meetings. In all cases, the funding serves not as full payment for a translation, but as support which makes it possible for local efforts to be completed successfully, and supervised so that they are consistent. The collaborators will supervise the creation of the translations. The initial translation will then undergo a check by a second collaborator. The final results will undergo a check by the TEI editors before final acceptance. All of the work will be done through a web interface to a central data repository (see below), so that translators can see the intermediate results of their work, and so that the TEI editors and Council can monitor progress and check results as they emerge.
- French: University of Nancy
- Spanish: Cervantes Digital Library
- German: University of Würzburg
- Chinese: Chung-hwa Institute of Buddhist Studies, Taipei
- Japanese: Tsurumi University
Translation infrastructure
- Develop a user-friendly environment through which translators can get access to the individual text units for translation, and upload translated text units. This environment will take the form of a web interface to the underlying data repository in which the TEI documentation is stored. The interface will include features to allow for review of translated material, and viewing of the translations in their context, so that translators can see the ongoing results of their work. This will also enable language communities to begin testing and commenting on the translations as soon as they are begun, rather than awaiting a compilation process once everything is complete.
- Make the necessary technical developments to Roma (the tool which compiles the TEI source data to produce specific schemas and documentation) so that it permits more advanced and flexible localizations of the TEI documentation. For example, the user may be permitted to generate any of the following combinations (where ‘names’ means tag and attribute names and attribute values, and ‘descriptions’ means the contents of the <gloss> and <desc> elements in the TEI source):
- canonical: English names, descriptions in English
- local descriptions: English names, descriptions in chosen language
- local names: names designed to make sense to a speaker of the chosen language, descriptions in English
- fully localized: both names and descriptions in chosen language
- Enhance the delivery infrastructure for TEI (XSLT stylesheets to make HTML and PDF output) so that string constants are available in the chosen language as well as the text. String constants are recurring phrases with specific technical meanings which can easily be translated and then substituted as necessary without regard for the context where they appear. Thus the reference document for an attribute might read ‘Obligatoriu dacă se potriveşte’ for Romanians instead of ‘Mandatory when applicable’.
The infrastructure work will be undertaken by the Research Technologies Service at Oxford University Computing Services.
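The four name and description combinations, and the substitution of string constants, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how Roma or the TEI stylesheets are implemented: `ElementSpec`, `MODES`, `render`, `STRING_CONSTANTS`, and `localize_constant` are hypothetical names, the fallback to English when a translation is missing is an assumed policy, and both description strings are illustrative wording rather than official TEI text.

```python
from dataclasses import dataclass

# Each mode records whether names and descriptions use the chosen language.
MODES = {
    "canonical":          {"local_names": False, "local_descriptions": False},
    "local descriptions": {"local_names": False, "local_descriptions": True},
    "local names":        {"local_names": True,  "local_descriptions": False},
    "fully localized":    {"local_names": True,  "local_descriptions": True},
}

@dataclass(frozen=True)
class ElementSpec:
    names: dict         # language code -> element name
    descriptions: dict  # language code -> description text

def pick(translations, lang, use_local):
    """Return the chosen-language text, falling back to English (assumed policy)."""
    if use_local and lang in translations:
        return translations[lang]
    return translations["en"]

def render(spec, mode, lang):
    """Return the (name, description) pair for one of the four combinations."""
    flags = MODES[mode]
    return (pick(spec.names, lang, flags["local_names"]),
            pick(spec.descriptions, lang, flags["local_descriptions"]))

# String constants: fixed phrases translated once and substituted anywhere.
STRING_CONSTANTS = {
    "Mandatory when applicable": {"ro": "Obligatoriu dacă se potriveşte"},
}

def localize_constant(phrase, lang):
    """Return the translated phrase, or the English phrase if none is recorded."""
    return STRING_CONSTANTS.get(phrase, {}).get(lang, phrase)

# Illustrative data only; the descriptions are not official TEI wording.
addr_line = ElementSpec(
    names={"en": "addrLine", "fr": "ligneAdresse"},
    descriptions={
        "en": "contains one line of a postal address.",
        "fr": "contient une ligne d'une adresse postale.",
    },
)

for mode in MODES:
    print(mode, "->", render(addr_line, mode, "fr"))
print(localize_constant("Mandatory when applicable", "ro"))
```

Keeping names and descriptions as two independent switches is what yields exactly four combinations; a language with only descriptions translated simply falls back to English names in the ‘fully localized’ mode.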
Work Plan and Deliverables
- January 2007: Begin work on web interface for translation workflow. Collaborators convene translation teams.
- February 2007: Web interface for workflow is ready for use. Translation teams begin translation process.
- August 2007: Materials are substantially complete, and review process has begun.
- September 2007: Final revisions to TEI delivery infrastructure and to Roma complete.
- October 2007: All materials have been reviewed locally and are ready for final review by TEI Council and editors. Demonstration of system at TEI members meeting.
- December 2007: Project is completed.
- A web-based interface for submitting translations.
- For a minimum of five languages, a translation of each element and attribute name, and their associated description text, maintained as part of the core TEI P5 source.
- Support in the Roma schema-generation tool for producing schemas and reference documentation in localized form.
- Translation of TEI documentation into French, Spanish, German, Chinese, and Japanese: £8,500
- Development of translation environment and web interface; adaptation of Roma tool; and enhancements to TEI stylesheets: £1,500
- Total project cost: £10,000


