- Host: Institute of Informatics, University of Warsaw
- Other institutions involved: The present form of the corpus is the result of a collaborative effort by several people with different affiliations, some working as volunteers and others funded by various grants. Details are given in the editorial declaration.
- URL: http://www.mimuw.edu.pl/polszczyzna/pl196x/index_en.htm
- Description: The corpus was originally created to build a general frequency dictionary of contemporary Polish. Work started in 1967; partial results were published between 1972 and 1977, and the complete dictionary appeared in 1990. The corpus was later extended in several respects, through both manual editing and automated procedures. It contains 10,000 samples divided into five parts: essays, news, scientific texts, fiction and plays. Each sample is approximately 50 words long (roughly 500,000 words in total), comes from a text published between 1963 and 1967, and includes a bibliographic description of its source. Each word is tagged with its base form and selected morphological properties, and sentence boundaries are marked.
- Implementation description: TEI P4
- Other Related Resources: Corpus documentation: http://www.mimuw.edu.pl/polszczyzna/pl196x/doc/index_en.htm
- Access: corpus data are distributed under the GNU General Public License; corpus documentation under the GNU Free Documentation License.
Katedra Lingwistyki Formalnej UW (Department of Formal Linguistics, University of Warsaw)
Tel: (48) 22 5520918
Fax: (48) 22 5520918