TEI: The FIDA Corpus of Slovene Language

For inclusion in the TEI Application Page

Notification by email from Tomaž Erjavec
Language Corpora; Slovene

21 September 2007, Chris Ruotolo: Converted to TEI P5
18 December 2001, Stuart Brown: Entry created

  • Host: DZS, General Publishing
  • Funders:
    • DZS, General Publishing
    • Amebis Software Company
  • URL:


FIDA, the Corpus of Slovene Language, is a reference corpus of the Slovene language. It was compiled in a joint project involving four partners, two from the academic/research sphere and two commercial: the Faculty of Arts (University of Ljubljana); the Jožef Stefan Institute; DZS, General Publishing; and the Amebis software company. Corpus compilation started in spring 1997 and was concluded by the end of 2000. The project was funded by the two commercial partners.

The corpus contains just over 100 million words of contemporary Slovene texts, encompassing a broad range of Slovene language variants and registers as found in the Slovene press, complemented by some texts from the Internet and speech transcripts.

The corpus represents contemporary Slovene from the second half of the 20th century, with the majority of texts produced in the 1990s. It is composed of written texts and texts written to be spoken; speech transcripts (parliamentary proceedings) are the only spoken component of the corpus.

The corpus is not freely available and can be accessed only with a valid username and password.

– FIDA Corpus WWW page


Simon Krek, Project Co-ordinator
DZS, General Publishing
Mestni trg 26
Ljubljana
Slovenia
Email: fida@dzs.si

Marko Stabej, Vojko Gorjanc, Corpus Editors
Faculty of Arts
Aškerčeva 2
Ljubljana
Slovenia
Email: marko.stabej@guest.arnes.si, vojko.gorjanc@guest.arnes.si

Tomaž Erjavec, Corpus Encoding Expert
Jožef Stefan Institute
Department of Intelligent Systems
Jamova 39
Ljubljana
Slovenia
Email: tomaz.erjavec@ijs.si