The English-Norwegian Parallel Corpus
- Host: University of Oslo
"The aim of the project is (1) to compile corpora of parallel texts in different languages and prepare them for computer processing; (2) to develop tools for analysing parallel texts; and (3) to carry out studies of the structure and communicative use of the languages based on the corpus.
The English-Norwegian Parallel Corpus consists of extracts of 10,000–15,000 words from English and Norwegian original texts and their translations (English to Norwegian and Norwegian to English). There are 100 English texts and 100 Norwegian texts aligned at the sentence level, in all approximately 2.6 million words. The texts are encoded in accordance with the TEI conventions (see the ENPC manual for details). An automatic alignment program has been produced by Knut Hofland, Norwegian Computing Centre for the Humanities, Bergen, and a browser by Jarle Ebeling, Department of British and American Studies, University of Oslo.
When the English-Norwegian Parallel Corpus was completed, work began to include other languages, mainly German, Dutch, and Portuguese. The extension of the corpus to include other languages, especially German, later resulted in a new project, Languages in Contrast, and a new corpus, called the Oslo Multilingual Corpus. The texts in the new corpus, which also includes French, are aligned at sentence level and are encoded according to the TEI conventions.
Because of copyright restrictions, the corpora cannot be distributed to researchers outside the universities of Oslo and Bergen."
– Stig Johansson
Department of British and American Studies
Faculty of Arts, University of Oslo
Norwegian Computing Centre for the Humanities, Bergen