TEI: Polish language of the 1960s

For inclusion in the TEI Application Page

Form posted from TEI website on 2008-12-01
Language Corpora | Polish | 1 Dec 2008 | attn.: Janusz S. Bień | Created using newproj webform

  • Host: Institute of Informatics, University of Warsaw
  • Other institutions involved: The present form of the corpus is the result of a collaborative effort by several people with different affiliations, some working as volunteers and some funded by various grants. Details are given in the editorial declaration.
  • URL:

Description: The corpus was originally created to compile a general frequency dictionary of contemporary Polish. Work began in 1967. Partial results were published between 1972 and 1977, and the complete dictionary appeared in 1990. The corpus was later extended in several respects, both through manual editing and through automated procedures.

The corpus data consist of 10,000 samples divided into five parts: essays, news, scientific texts, fiction, and plays. Each sample is approximately 50 words long, comes from a text published between 1963 and 1967, and includes a bibliographic description of its source. Each word is tagged with its base form and some of its morphological properties, and sentence boundaries are marked.
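To make the annotation described above more concrete, here is a minimal Python sketch of a single sample as a data structure. It is an illustration under stated assumptions: it does not reproduce the corpus's actual TEI P4 markup, and the names `Token`, `Sample`, `orth`, `base`, `morph`, `base_form_frequencies` and `approximate_corpus_size`, as well as the placeholder morphological labels, are hypothetical. The frequency helper only hints at how base-form tagging supports the counts a frequency dictionary is built on.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical model for illustration only; names are NOT taken from the corpus.
PARTS = ("essays", "news", "scientific texts", "fiction", "plays")


@dataclass
class Token:
    orth: str   # the word as it appears in the text
    base: str   # its base form (lemma)
    morph: str  # placeholder for its morphological properties (not the corpus tagset)


@dataclass
class Sample:
    part: str    # one of the five parts listed in PARTS
    source: str  # bibliographic description of the source text
    year: int    # publication year of the source, 1963-1967
    # Each inner list is one sentence, so sentence boundaries stay explicit.
    sentences: list[list[Token]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.part not in PARTS:
            raise ValueError(f"unknown part: {self.part!r}")
        if not 1963 <= self.year <= 1967:
            raise ValueError(f"publication year out of range: {self.year}")

    def word_count(self) -> int:
        return sum(len(sentence) for sentence in self.sentences)


def base_form_frequencies(samples: list[Sample]) -> Counter:
    """Count occurrences of each base form across the given samples."""
    return Counter(tok.base for s in samples for sent in s.sentences for tok in sent)


def approximate_corpus_size(samples: int = 10_000, words_per_sample: int = 50) -> int:
    """Rough total implied by the stated figures (about 50 words per sample)."""
    return samples * words_per_sample


if __name__ == "__main__":
    sample = Sample(
        part="fiction",
        source="(hypothetical bibliographic entry)",
        year=1965,
        sentences=[[Token("Ala", "Ala", "noun"), Token("ma", "mieć", "verb"),
                    Token("kota", "kot", "noun")]],
    )
    print(sample.word_count())                       # 3
    print(base_form_frequencies([sample])["kot"])    # 1
    print(approximate_corpus_size())                 # 500000
```

Keeping sentences as nested lists is just one convenient way to preserve the marked sentence boundaries; the figure of about 500,000 words is simple arithmetic on the approximate sample length, not an exact count.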

Implementation description: TEI P4

Other Related Resources: Corpus documentation:

Access: GNU General Public License for the corpus data; GNU Free Documentation License for the corpus documentation.


attn.: Janusz S. Bień
Katedra Lingwistyki Formalnej UW (Department of Formal Linguistics, University of Warsaw)
Browarna 8/10
00-311 Warszawa
Tel: (48) 22 5520918
Fax: (48) 22 5520918
Email: jsbien@uw.edu.pl