For inclusion in the TEI Application Page

Information provided by Tomaz Erjavec, via this very file, 2004-11-24
Keywords: Multilingual; English; Bulgarian; Czech; Estonian; Hungarian; Romanian; Slovene; Serbian; Croatian; Resian; Russian; Language Corpora; Morphosyntax

Revision history:

  • 21 September 2007, Chris Ruotolo: Converted to TEI P5.
  • 24 November 2004, Tomaz Erjavec: Substantial update from the webpage, to synch with V3.
  • 17 December 2001, Stuart Brown: Substantial update from the webpage, as requested by Tomaz Erjavec. Have contacted him again, as it is not clear from the pages which institute is hosting it.
  • 23 August 1996, WP: Created file.

  • URL:


The resources are a multilingual dataset for language engineering research and development. For Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Lithuanian, Resian, Romanian, Russian, Slovene, and Serbian, the dataset contains some or all of the following language resources:

  • the morphosyntactic specifications, lexica, and annotated “1984” corpus;
  • parallel and comparable text and speech corpora;
  • associated documentation.

The complete corpora as well as the documentation are encoded in TEI P4.

The project was a spin-off of MULTEXT and ran from 1995 to 1997. It developed language resources for six languages (Bulgarian, Czech, Estonian, Hungarian, Romanian, and Slovene), as well as for English, the hub language of the project. It also adapted existing tools and standards to these languages. The main results of the project were an annotated multilingual corpus and lexical resources for all seven languages.

The extended results of the project were made available in 1998, first on CD-ROM and then via TRACTOR, the TELRI Research Archive of Computational Tools and Resources.

Within the Concede project, a second release was made available in 2002; it contained only the morphosyntactic resources from the first release, updated and corrected. This second release was made freely available for research use via the Web.

Finally, the third release, made in 2004, updates and brings together the first two, adds new languages, and moves the encoding from SGML to XML, in particular to TEI P4. This work was supported by the TEI task force on SGML to XML migration.

Version 3 is also available via the Web, from the home page of the project.

For further information on the project, its results, and their exploitation, consult the annotated bibliography of the project, available in HTML and various other formats from the project Web page.

(from the WWW page)


Tomaž Erjavec
Jožef Stefan Institute
Jamova 39
SI-1000 Ljubljana
Slovenia
Tel: +386 1 477-3507
Fax: +386 1 425-1038
Email: tomaz.erjavec@ijs.si