Corpus Encoding Standard (CES)

For inclusion in the TEI Application Page

E-mail from Nancy Ide, September 1997
European languages (non-specified)
Language Corpora

20 September 2007, Chris Ruotolo: Removed broken link; converted to TEI P5.

10 December 2001, Stuart Brown: Minor edit; URLs checked and OK.

5 June 2000, Frances Condron: Updated information on the CES, and changed links to next project.

14 October 1997, WP: Created file.

Host: Vassar College Department of Computer Science URL:

Description:

MULTEXT, along with EAGLES and the Vassar/CNRS collaboration (supported by the U.S. National Science Foundation), has developed a Corpus Encoding Standard (CES) that will "serve as a widely accepted set of encoding standards for corpus-based work... The CES specifies a minimal encoding level that corpora must achieve to be considered standardized in terms of descriptive representation (marking of structural and typographic information) as well as general architecture (so as to be maximally suited for use in a text database). It also provides encoding specifications for linguistic annotation, together with a data architecture for linguistic corpora." The CES is available in SGML and XML.

Contact:

Nancy Ide
Department of Computer Science
Vassar College
Poughkeepsie, New York 12601
USA
Tel: +1 914 437 5988
Fax: +1 914 437 7498
Email: ide@cs.vassar.edu