TEI: TEI for Linguists SIG

Piotr Bański, 3 November 2010, The Text Encoding Initiative




The TEI recommendations relevant for linguists are scattered across several chapters of the Guidelines, especially Transcriptions of Speech (ch. 8), Dictionaries (ch. 9), Language Corpora (ch. 15), Simple Analytic Mechanisms (ch. 17) and Feature Structures (ch. 18). There was once an effort to draft a chapter on corpus encoding, but it proved inconclusive. In addition, TEI-based annotation schemes are not as widely known among linguists as most TEI insiders would expect, and consequently the TEI is not used very often in linguistics. This SIG addresses this situation and promotes the TEI Guidelines within this large group of researchers.


Within linguistics, the use of “Language Resources” is increasing steadily. The aim of the SIG “TEI for Linguists” is to provide a common, uniform set of recommendations for the encoding of Language Resources with TEI markup. This covers both “item-based” resources (lexica, ontologies) and “text-based” resources (corpora). Both types are also regarded as static resources, in contrast to dynamic language resources such as parsers and taggers. Moreover, the SIG “TEI for Linguists” aims to become a forum for scholars who are considering the use of TEI markup schemes for any of the diverse tasks in linguistics. Some of the main tasks of this SIG are:

  • the identification of issues that “Ordinary Working Linguists” would like the TEI to be able to handle for them (e.g. the encoding of corpora and lexica, but also ‘everyday encoding of linguistic structures’ for the purpose of teaching or theorizing);
  • the promotion of modules that could be used in linguistic subdisciplines, e.g. computational linguists could make use of the feature structures module or phoneticians could use (and extend) the TEI module for the transcription of speech;
  • the creation of a module (or a set of modules) for linguistic description, as a separate chapter of the Guidelines or a set of ODDs, to enable linguists to use TEI encoding more easily;
  • interfacing with other TEI SIGs whenever they brush against linguistic issues (Ontology SIG for lexical databases, Overlap SIG for corpus encoding, Tools SIG for language description and processing tools);
  • the possibility of collaboration with researchers working within ISO TC 37 SC 4 (http://www.tc37sc4.org/).
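
As an illustration of the feature structures module mentioned in the list above, the following is a minimal sketch of how a simple morphological description might be expressed with the TEI elements fs, f and symbol. The helper name make_fs, the feature names (pos, number) and their values are assumptions chosen for this example; they are not a scheme recommended by the SIG.

```python
import xml.etree.ElementTree as ET

# Namespace of TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def make_fs(fs_type, features):
    """Build a TEI <fs> element from a mapping of feature names to symbolic values.

    Hypothetical helper for illustration only: each entry becomes
    <f name="..."><symbol value="..."/></f> inside the <fs>.
    """
    # Serialise TEI elements in the default namespace (no prefix).
    ET.register_namespace("", TEI_NS)
    fs = ET.Element(f"{{{TEI_NS}}}fs", {"type": fs_type})
    for name, value in features.items():
        f = ET.SubElement(fs, f"{{{TEI_NS}}}f", {"name": name})
        ET.SubElement(f, f"{{{TEI_NS}}}symbol", {"value": value})
    return fs


# Example: an assumed analysis of a singular noun.
fs = make_fs("morph", {"pos": "noun", "number": "singular"})
xml_text = ET.tostring(fs, encoding="unicode")
print(xml_text)
```

The printed fragment could be placed inside a TEI document and referenced from a token, for instance through the ana attribute; how feature names and values are constrained is left to a project's own feature system declaration.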

Convenors

  • Piotr Bański, University of Warsaw
  • Andreas Witt, Institut für Deutsche Sprache, Mannheim

Mailing list and wiki space

Visit the homepage of the SIG's mailing list or the SIG's space on the TEI Wiki.


The first meeting of the SIG will be held during the TEI Members Meeting in Zadar.