TEI: Szeged Corpus: a natural language processed Hungarian corpus

For inclusion in the TEI Application Page

Form posted from TEI website on 2004-4-27
Language Corpora
Hungarian

21 September 2007, Chris Ruotolo: Converted to TEI P5
27 May 2004, Zoltán Alexin: Created using newproj webform

  • Host: University of Szeged, Department of Informatics
  • Other institutions involved: 1. Research Institute for Linguistics at the Hungarian Academy of Sciences, Department of Corpus Linguistics, 2. MorphoLogic Ltd. Budapest
  • URL:

Description: The Szeged Corpus is a manually annotated natural language corpus that currently comprises 1.2 million words and 225 thousand punctuation marks. Its texts come from six topic areas: short business news, daily news, fiction, law, computer science texts, and compositions by 14 to 16 year-old students. The texts have passed through several phases of natural language analysis, such as morpho-syntactic analysis, POS tagging, shallow syntactic parsing, and semantic annotation. Current work aims at a more detailed syntactic analysis, including the annotation of adverbial, preverbal, postpositional, and adjectival structures and the identification of verbs and their argument structures. With this, the consortium intends to lay the foundation for a Hungarian treebank, which is planned to be enriched with detailed semantic information at a later stage. Different versions of the Szeged Corpus are publicly available after online registration and can be used free of charge for educational and research purposes. For more information, visit http://www.inf.u-szeged.hu/hlt.

Implementation description: The corpus files are in XML format. Their internal structure was first described by the TEIXLITE DTD and later by the TEI P4 DTD.
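
Because the files are XML, a standard XML parser should be able to read them. The following is a minimal sketch only: the sample document and the element names (`s` for a sentence, `w` for a word, `c` for a punctuation mark) are assumptions modeled on common TEI usage, not the confirmed markup of the Szeged Corpus. It shows how word and punctuation totals like those quoted in the description could be counted.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The element names (s, w, c) are illustrative
# assumptions and may differ from the actual Szeged Corpus markup.
SAMPLE = """<TEI.2>
  <text>
    <body>
      <p>
        <s>
          <w>Szeged</w>
          <w>szép</w>
          <c>.</c>
        </s>
      </p>
    </body>
  </text>
</TEI.2>"""


def count_tokens(xml_text):
    """Return (word_count, punctuation_count) for one TEI-style document."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so nesting depth does not matter.
    words = sum(1 for _ in root.iter("w"))
    marks = sum(1 for _ in root.iter("c"))
    return words, marks


words, marks = count_tokens(SAMPLE)
print(f"words: {words}, punctuation marks: {marks}")
```

For real corpus files, the tag names passed to `iter()` would need to be adjusted to whatever the distributed DTD actually defines.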

Other Related Resources:

  • Hungarian National Corpus (http://corpus.nytud.hu/mnsz/index_eng.html)
  • TELRI Corpus (http://www.telri.bham.ac.uk/)


Zoltán Alexin
University of Szeged, Department of Informatics
H-6720 Szeged, Árpád tér 2.
Hungary
Tel: +36 62 544 222/3411
Fax: +36 62 546 397
Email: alexin@inf.u-szeged.hu