JOS corpora of Slovene

General description: The JOS project developed annotated corpora of Slovene and associated resources intended to support the development of Human Language Technologies for the Slovene language. The main results are the JOS morphosyntactic specifications (which define the tagset), two annotated corpora, and two Web services. The resources are available under Creative Commons licences.

Implementation description: The corpora and morphosyntactic specifications are encoded in TEI P5 using the additional modules for corpora, linking, analysis and iso-fs, plus a few local extensions.
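
As a rough illustration of what working with word-level annotation in a TEI P5 encoding can look like, the following Python sketch reads a small invented fragment and lists each word with its lemma and morphosyntactic tag. The fragment, the use of `lemma` and `msd` attributes on `w` elements, the tag values, and the function name `read_tokens` are all assumptions made for this example; the actual element and attribute layout of the JOS corpora should be checked against the distributed files.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical fragment: the attribute names and tag values below are
# illustrative assumptions, not copied from the JOS corpora.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p><s>
    <w lemma="biti" msd="Va-r3s-n">je</w>
    <w lemma="lep" msd="Agpmsnn">lep</w>
    <w lemma="dan" msd="Ncmsn">dan</w>
  </s></p></body></text>
</TEI>"""


def read_tokens(xml_text):
    """Return (word form, lemma, msd) triples for every <w> element."""
    root = ET.fromstring(xml_text)
    tokens = []
    for w in root.iter(f"{{{TEI_NS}}}w"):
        # Missing attributes come back as None rather than raising.
        tokens.append((w.text, w.get("lemma"), w.get("msd")))
    return tokens


if __name__ == "__main__":
    for form, lemma, msd in read_tokens(SAMPLE):
        print(form, lemma, msd, sep="\t")
```

Only the standard library is used here, so the sketch loads the whole document into memory; for a full corpus, a streaming parser such as `xml.etree.ElementTree.iterparse` would likely be a better fit.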

Related resources: Links to papers describing the corpora are given at

Copyright information: The corpora are distributed under the Creative Commons Attribution-NonCommercial licence.


Tomaž Erjavec
Department of Knowledge Technologies
Jožef Stefan Institute
Jamova cesta 39
1000 Ljubljana

Last recorded change to this page: 2011-12-02