Brown University proposes to renew its TEI host status for a term
of four years, from January 2005 through January 2009. Within Brown,
the primary locus of the host activities will continue to be the Women
Writers Project, a long-standing TEI project engaged in intensive
research on text encoding of early printed books, currently producing
TEI documentation under a grant from the US National Endowment for the
Humanities. The WWP would host the TEI with collaboration and sharing
of the host fee from the Center for Digital Initiatives at the Brown
University Library, and possibly also from Brown’s Scholarly
Technology Group. All three groups share a strong interest in the TEI
from varying perspectives, and are actively engaged in TEI research on
topics including digital libraries and best practices, scholarly text
archives, documenting TEI extensions, using TEI for web sites and
small scholarly projects, developing TEI publication tools and
systems, and exploring their use in scholarly research. Brown is also
home to several other projects that use the TEI or are considering
doing so, including the Modernist Journals Project.
The Brown staff who will contribute effort to the TEI Consortium are:
- Julia Flanders is the Director of the Women Writers Project. She is also the Associate Director for Textbase Development at Brown’s Scholarly Technology Group, and engages in consulting on TEI projects at STG. As a private consultant and through her work at the WWP, she regularly provides TEI training and consulting.
- Syd Bauman is a programmer/analyst at the Women Writers Project, where he sustains the WWP’s encoding research effort and provides all technical implementation of the WWP’s TEI work. He has served as North American Editor of the TEI Guidelines since January 2002, and before that served as interim editor.
In addition to Brown’s $5000 cash host contribution, Brown will commit to the following in-kind contributions:
- The WWP will contribute Julia Flanders’ time to perform the duties of TEI Chair at a level of 10% FTE, for as long as she remains in that role.
- The WWP will contribute Syd Bauman’s time to perform the duties of North American Editor at a level of 12% FTE, until any change is made to the level of payment or effort for the Editors, at which point this contribution can be adjusted or renegotiated.
In addition to these commitments, the WWP staff will continue to perform TEI training and publicity work, recruiting, and assistance with TEI grant proposal writing.
This document presents a proposal from three key institutions (LORIA, ATILF and INIST) located in Nancy (France) to act jointly as a new European host for the TEI Consortium. The three institutions combine wide-ranging expertise in fields central to our vision of the TEI’s mission. We also present this vision here, since it underlies the specific technical and managerial proposals we put forward for consideration by the Consortium.
The proposed host would be coordinated by the Loria laboratory in conjunction with two complementary institutions, ATILF and INIST. Together, these institutions provide a coherent pool of expertise with well-defined connections to other institutions at national and international levels. Each has long-standing experience in document management activities (spoken corpora, textual corpora, grey literature, dictionaries, terminology, etc.) and related research (natural language processing, lexicology, linguistics, information extraction). They have also carried out joint activities for many years, which demonstrates their ability to work together on this proposal.
Loria has been involved in TEI-related activities for many years and has progressively developed strong expertise in this domain. The following few examples illustrate our general interests:
- The Silfide project (1994-1996), an on-line concordancing environment fully based on TEI encoded texts;
- The European Telri project, which led to the production in 1998 of a TEI-based corpus of 22 translations of Plato’s Republic;
- The Asila project (2001-2003), which was an opportunity to compile several legacy corpora of transcribed oral dialogues from various French research laboratories;
- The Freebank project (2004-), which aims to provide a free environment for depositing and accessing on-line linguistic resources (e.g. annotated corpora).
Since 2000, Loria has taken a leading position in international standardization, taking responsibility for coordination of ISO standard 16642 (Terminological Markup Framework), and also chairing ISO committee TC 37/SC 4 (Language Resource Management).
Loria will contribute to TEI Consortium activities by providing its technical expertise in the domains of data modeling and corpus management. It will also act as the administrative centre for the three institutions.
Loria will also work in close collaboration with the headquarters of INRIA to develop a general framework for using the TEI Guidelines as a basis for INRIA’s annual scientific report.
ATILF conducts research along three main axes:
- History of Language: research based on the diachronic approach, which is essential to language study (Middle and Renaissance French, Etymology and History of the Lexicon, Meta-lexicology);
- Modern and Contemporary Languages: comparative approach to languages (Contrastive and multilingual studies, Dialectology and Regionalisms, Interface between grammar, semantics and discourse);
- Computerized Linguistics: definition, management and use of computerized linguistic resources in automated language processing (Development of computerized tools, databases and resources, linguistic resources and metadata, natural language processing).
At a national level, ATILF is the reference platform for linguistic resources supported by the CNRS Humanities and Social Sciences Department. Its computerized dictionaries and encyclopedias (Trésor de la Langue Française, Dictionnaires de l’Académie française, Encyclopédie de Diderot et d’Alembert, historical dictionaries of the French language), textual databases (Frantext and tagged Frantext), grammatical tagger (WinBrill) and linguistic databases (Historical database of French vocabulary) form the core of the resources distributed by the laboratory.
As a major European resource provider, ATILF will contribute real-life use cases to the TEI Consortium, together with the ability to evaluate new technical proposals. Its extensive experience covers both the management of legacy data and the creation of new textual and lexicographic data.
INIST is the CNRS national centre for scientific and technical information. It holds large document collections, available to the public through document delivery services. It also provides indexing services for literature in Science, Technology, Medicine, Humanities and Social Sciences, which feed bibliographic databases, as well as information services available online or on a variety of electronic media. The document holdings at INIST cover the core international scientific and technical literature.
In addition, INIST produces two multidisciplinary, multilingual bibliographic databases: PASCAL, which covers the core world literature in Science, Technology and Medicine back to 1973, and FRANCIS, which covers the Humanities and Social Sciences back to 1972. INIST also produces the BHA database in partnership with the Getty Research Institute.
INIST is gradually extending its role from that of a traditional library towards supporting digital archives for grey literature (e.g. dissemination of digital PhD theses). It will thus complement the contributions of the other members of the Nancy group by exploring the applicability of the TEI to scientific publications and reports. Work will also be done on designing and testing a new terminology chapter for the TEI.
We see the TEI community as a unique arena where people from potentially differing scientific and technical backgrounds can share their methods and experience in the domain of text encoding and management. Over the years, this community of experts has learned to share a common language which, as discussions on the TEI list show, goes beyond merely sharing the TEI tagset. The TEI community has also shown over the years that it can stay at the forefront of available technologies, both by making the right choices, such as adopting SGML from the outset, and by contributing heavily to the evolution of those technologies (e.g. the influence of TEI work on XML and related standards).
Our bid is therefore based on the assumption that the TEI community should continue to attract the best expertise and to connect with the most advanced initiatives worldwide in order to keep this leading position on the technological scene. To this end, we center our bid on the scientific and technical progress that can be made in close collaboration with other ongoing standardization efforts.
These three institutions have grouped together to combine their respective expertise in order to provide a strong technical contribution to TEI activities. In particular, we want to take a leading role in the following activities:
- Experimenting with the ODD platform to explore its various extension mechanisms and contribute to their dissemination;
- Revising the chapter on transcription of speech, in order to add some essential descriptors which are currently missing (e.g. for the management of turns), and to provide more elaborate guidelines on the representation of temporally synchronized events (e.g. to facilitate mapping to other formats such as Annotation Graphs, Transcriber, etc.);
- Designing a new chapter to replace the outdated P4 terminology chapter, making it compliant with a) ISO standard 16642 (Terminological Markup Framework) and b) the TBX syntax recommended by LISA (Localization Industry Standards Association). This future chapter would consider only a core set of descriptors (aka data categories) recommended by the TEI, and would make full use of the extension mechanisms provided by the ODD framework to allow users to describe more elaborate models;
- Validating, and possibly updating, the current P5 print dictionary chapter, in close collaboration with dictionary projects worldwide (e.g. the Grimm dictionary in Trier).
One important aspect of our technical involvement is to ensure that TEI developments are closely related to other international initiatives, continuing the links already established with W3C work. More specifically, we want to achieve close collaboration between the TEI and ISO committee TC 37 on the following topics:
- Ensure synchronization between the work done in the SO working group and the LAF (Linguistic Annotation Framework) project within ISO/TC 37/SC 4. In particular, the two resulting documents (the future TEI chapter and the ISO standard) need to share at least the same pointing and linking mechanisms and, where possible, the same underlying syntax (e.g. TEI attributes);
- Carry on the joint TEI-ISO work on feature structures, which is nearing completion for the FS representation part, by developing a joint proposal for FS declaration;
- Influence the ongoing work on lexical structures in ISO/TC 37/SC 4 (project LMF, Lexical Markup Framework) so that the print dictionary chapter can be seen in the near future as an LMF application;
- Specify and implement the link between the TEI tagset and the ISO/TC 37 Data Category Registry, so that the latter provides the semantics for the language-related descriptors used by the former;
- Compare the current TEI header with other metadata initiatives (IMDI, OLAC, the ISO/TC 37/SC 4 ad hoc group on metadata description) in order to provide mappings to and from them, and possibly some TEI extensions incorporating part of their contributions (e.g. OLAC roles).
To do so, we propose to organize meetings (as we have done in recent months) at our premises or, more practically, at AFNOR (close to downtown Paris and CDG airport).
Our application as TEI host is based on the idea that we can serve as a centre of competence for TEI-related activities at local, national and international (esp. European) levels. Specifically, we intend to:
- Create a core group of local experts within our three institutions, who can seed new TEI-related projects;
- Lead national initiatives in France to network the research teams involved in corpus gathering or lexicon creation. This will be done through dissemination and teaching activities; work has started in this direction with the creation of two national interest groups, on grey literature management and on spoken corpus transcription;
- Network European centres of competence in the domain of digital linguistic resources to foster more competence sharing and, incidentally, attract more members to the TEI Consortium.
Specific actions will be taken to establish strong links with German sites, which at present represent the greatest potential pool of TEI users in the EU. The excellent feedback received recently (8-9 Oct. 2004) during a tutorial given by Lou Burnard and Laurent Romary in Würzburg seems a promising start for future collaborations.
This activity will naturally include preparing and presenting tutorials.
Director: Hélène Kirchner
- Laurent Romary, Directeur de Recherche, already a member of the TEI Council and chair of ISO committee TC 37/SC 4, will coordinate the activities presented in this bid;
- Matthieu Quignard, Chargé de Recherche, will lead the speech transcription revision activity;
- Isabelle Kramer, Ingénieur expert, will be directly in charge of the editorial work related to the new terminology chapter.
Director: Jean-Marie Pierrel
- Etienne Petitjean, Ingénieur de Recherche, will contribute to developing tools for textual and lexical access;
- Zina Tucznac, Ingénieur d’étude, will help evaluate the use of the TEI header for textual archive management and explore possible bridges with initiatives such as OLAC or IMDI;
- Evelyne Jacquey, Chargée de Recherche, will work on applying the print dictionary framework to new lexicographic projects;
- Susanne Salmon-Alt, Chargée de Recherche, will provide her expertise in designing amendments and extensions to the print dictionary chapter.
Director: Robert Duval
- Xavier Polanco will contribute to organizing tutorials and dissemination events related to the TEI;
- Veronica Lux will bring her XML expertise to work on guidelines for applying the TEI to grey literature (including bibliographical aspects);
- Michèle Bonthoux will contribute to the design and evaluation of terminological aspects (indexing and terminology).
The key staff at Oxford who will provide services and support for the Consortium are:
- Lou is Assistant Director of Oxford University Computing Services, and European Editor of the TEI Guidelines. He has been at the epicentre of TEI work since its inception, and has left his mark on almost everything the TEI has done. He undertakes regular consultancy and teaching about markup, especially in the area of corpus linguistics. Outside the TEI Consortium, he is currently concentrating on Xaira, a TEI-XML text searching engine designed for language corpora.
- Sebastian is Information Manager for OUCS, and manager of OSS Watch (the JISC-funded Open Source Advisory Service). He has been involved with the TEI since 1999, when he instituted a conversion of the resources of OUCS to TEI XML markup. He took a major part in the conversion of the TEI to XML for P4, and leads the language redesign for TEI P5. For OUCS, he maintains complex stylesheets and other tools for authoring local web sites in TEI.
- Judy is administrator for the Research Technologies Services and has been preparing TEI accounts for the last three years.
The RTS, under the leadership of Mike Fraser, combines an unusually
broad range of expertise, involving most areas of text encoding,
humanities computing, metadata and digital libraries as well as
undertaking leading-edge collaborative research in the application of
IT to research support. Other relevant national facilities now hosted
by the RTS include the Oxford Text Archive, the Humbul Humanities Hub
and the OSS Watch Open Source Advisory Service. The RTS has a high
national and international reputation and a successful track record in
bidding for research contracts from the JISC and other funding
bodies. It also welcomes visiting researchers and research
Since the start of the TEI Consortium, Oxford has maintained one of the two mirrored web sites for the Consortium. It currently hosts a Perforce repository for the TEI web site and its archive.
As part of their role as a TEI host, staff at Oxford would expect to continue to:
- Take a major part in TEI workgroups and committees
- Regularly teach text encoding using the TEI
- Provide TEI consultancy in the UK and Europe, in particular to those research communities with which we have close links
- Contribute to the development of an enhanced online presence for the TEI
As an in-kind contribution, we will be developing software
and web services for the TEI; we expect the practical production aspects of P5
to be a major activity during 2005.
The University of Virginia proposes to continue to serve as a host of the TEI Consortium for a four-year term. Virginia played a critical role in establishing the Consortium, largely through the efforts of John Unsworth, then director of the Institute for Advanced Technology in the Humanities (IATH), but also with the assistance of staff in IATH and the Library. Over the four years that Virginia has served as host, the use of the TEI has continued to increase. It is being used for large-scale production of electronic texts in the Library, and in a wide variety of humanities research projects, many of them generously funded through private foundations and government grants. A new unit, the Electronic Imprint of the University of Virginia Press, has extended local use of the TEI into publishing. The TEI thus plays a central role in a wide variety of digital initiatives, and the University remains firmly committed to assisting in the long-term maintenance and development of the standard.
The Library and IATH will share hosting duties. Both units will share the $5000 annual membership dues evenly and be jointly responsible for coordinating $5000 of annual in-kind contributions of staff time and support. Provided we continue to have a host representative on the Board, the host representative will rotate between the two units on a two-year cycle. Beyond these arrangements, humanities researchers at Virginia associated with the Electronic Imprint, the Rossetti Archive, the Virginia Center for Digital History, and the NINES project have all expressed interest in becoming more involved with the TEI and in helping to meet the in-kind contribution.
In partial fulfillment of our hosting responsibilities, Virginia proposes to assume responsibility for the ongoing maintenance of the TEI Web site. Given the Board’s desire to redesign the Web site, Virginia will initially analyze the existing Web site, gather suggestions from the Board on desired revisions and extensions of content and functionality, and provide a detailed design proposal for consideration and discussion by the Board. Virginia will take full responsibility for implementing a design approved by the Board, and will subsequently maintain the site for the duration of the host appointment. Virginia will continue to host the Council and Board listservs.
In addition to these two commitments, staff at Virginia will continue to assist in writing and developing grant proposals, particularly those targeted at U.S. foundations and funding agencies; recruiting new members; and serving as needed and appropriate on working groups and committees.