TEI Character Encoding Workgroup
Chris Ruotolo
September 2007

Licensed under

Created from scratch.

16 September 2007 Updated and converted to P5

The TEI Character Encoding Workgroup, chaired by Christian Wittern, began its work in 2003 and completed it in 2005.

Resources

Draft Documents for P5
  Replacement draft for TEI P5/CH
  Replacement draft for TEI P5/WD

Draft Papers
  CE01: Terms of Reference for the TEI Workgroup on Character Encoding
  CE W 01: [DRAFT] Chapter 4: Languages and Character sets
  CE W 02: XSLT-based proof of concept for solutions discussed at Tuebingen meeting
  CE W 03: A collection of use cases for extensions to the basic character set of a document
  CE W 04: Language and script identification (Additional comments from PD)
  CE W 05: Semantics for characters and linguistic features
  CE W 06: Extending the document character set
  CE W 07: Private use characters in XML
  CE W 08: An analysis of topics in P4 chapter 4 and CE W 01
  CE W 09: Language identification: draft for inclusion in P5/CH
  CE W 12: Report from Sanskrit Workgroup

Meetings and Reports
  CE M 01 Minutes of Workgroup Meeting in Nancy, 05-06 Nov 2003
  CE R 01 Report to the TEI Members Meeting in Chicago, Oct 2002
  CE M 01 Minutes of Workgroup Meeting in Tuebingen, 23-24 Jul 2002
Background Documents and Links
  Design Of An Electronic Method For Describing Writing Systems (Eric S. Albright's thesis)
  The Text in the Age of Digital Reproduction (Draft paper by Christian Wittern)
  (TEI-C) P4: The XML Version of the TEI Guidelines
  (W3C) Character Model for the World Wide Web 1.0
  (W3C, Unicode Consortium) Unicode in XML and other Markup Languages
  Jukka Korpela: A tutorial on character code issues

Some use cases
  Typographic Regularization in the WWP Textbase: A proposal for ACH/ALLC 2001 by Jacqueline H. Russom and Sydney D. Bauman (Scholarly Technology Group, Brown University)

How to refer to characters/glyphs not in the document character set
  The SVG Specification uses an element altGlyph to refer to variant glyphs.
  MathML uses an element <mglyph> for "presentation glyphs".
  Unicode has specific and generic Variation Selectors (U+FE00 to U+FE0F); see (Unicode Consortium) Standardized Variants. The usage of these is also discussed in the document Unicode in XML and other Markup Languages listed above.

Character semantics
  Unicode defines character semantics in the Unicode Character Database (UCD, available at UnicodeData.txt; for an explanation of its contents, see Unicode Data File Format). See also:
  (Unicode Consortium, UTR Draft) Unicode Technical Report #23 Character Properties
  (Unicode Consortium, TUS Annex 21) Case Mappings
  (Unicode Consortium, UTR Draft) Unicode Technical Report #30 Character Foldings
  (Unicode Consortium, TUS Annex 15) Unicode Normalization Forms
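
As a rough illustration of the character semantics mentioned above, the sketch below uses Python's standard unicodedata module, which exposes a subset of the Unicode Character Database. It is not part of the workgroup's documents. The helper names describe and is_variation_selector are hypothetical, chosen only for this example, and the range check covers only the U+FE00 to U+FE0F selectors named in the list.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return a few UCD-derived properties for a single character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),            # general category, e.g. "Ll"
        "combining": unicodedata.combining(ch),          # canonical combining class
        "decomposition": unicodedata.decomposition(ch),  # decomposition mapping, if any
    }


def is_variation_selector(ch: str) -> bool:
    """True for the variation selectors U+FE00 to U+FE0F (hypothetical helper)."""
    return 0xFE00 <= ord(ch) <= 0xFE0F


# Properties of a precomposed character.
info = describe("\u00E9")

# Normalization forms: NFC composes, NFD decomposes.
composed = unicodedata.normalize("NFC", "e\u0301")
decomposed = unicodedata.normalize("NFD", "\u00E9")

# Simple case mapping via the built-in string methods.
upper = "\u00E9".upper()

print(info)
print(composed == "\u00E9", decomposed == "e\u0301", upper == "\u00C9")
print(is_variation_selector("\uFE00"), is_variation_selector("a"))
```

The properties reported depend on the Unicode version bundled with the Python interpreter (available as unicodedata.unidata_version), so results for recently added characters may differ between installations.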