The FIDA Corpus of Slovene Language


FIDA, the Corpus of Slovene Language, is a reference corpus of the Slovene language. It was compiled in a joint project involving four partners, two from the academic/research sphere and two commercial ones: the Faculty of Arts (University of Ljubljana), the Jožef Stefan Institute, DZS, General Publishing, and the Amebis software company. Compilation of the corpus started in spring 1997 and was completed by the end of 2000. The project was funded by the two commercial partners.

The corpus contains just over 100 million words of contemporary Slovene text. It covers a broad range of Slovene language varieties and registers, drawn mainly from the Slovene press and complemented by some texts from the Internet and by speech transcripts.

The corpus represents contemporary Slovene from the second half of the 20th century, with most of the texts produced in the 1990s. It is composed of written texts and of texts written to be spoken; the speech transcripts (parliamentary proceedings) are the only spoken component of the corpus.

The corpus is not freely available and can be accessed only with a valid username and password.



Simon Krek, Project Co-ordinator
DZS, General Publishing
Mestni trg 26
Marko Stabej, Vojko Gorjanc, Corpus Editors
Faculty of Arts
Aškerčeva 2
Tomaž Erjavec, Corpus Encoding Expert
Jožef Stefan Institute
Department of Intelligent Systems
Jamova 39

Copyright TEI Consortium. Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 Unported license and a BSD 2-Clause license.
Last recorded change to this page: 2007-09-21