Polish language of the XX century sixties

Description: The original purpose of the corpus was to create a general frequency dictionary of contemporary Polish. The work started in 1967. Partial results were published between 1972 and 1977, and the completed dictionary in 1990. The corpus was later augmented in various respects, both by manual editing and by automated procedures. The corpus data comprise 10,000 samples divided into five parts: essays, news, scientific texts, fiction and plays. Every sample is approximately 50 words long. All samples come from texts published between 1963 and 1967, and each contains a bibliographic description of its source. Each word is tagged with its base form and some morphological properties. Sentence boundaries are also marked.

Implementation description: TEI P4

Other Related Resources: Corpus documentation: http://www.mimuw.edu.pl/polszczyzna/pl196x/doc/index_en.htm

Access: GNU General Public Licence for corpus data, GNU Free Documentation Licence for corpus documentation.


attn.: Janusz S. Bień
Katedra Lingwistyki Formalnej UW
Browarna 8/10
00-311 Warszawa
Tel: (48) 22 5520918
Fax: (48) 22 5520918
Email: jsbien@uw.edu.pl