<?tei:SOF wsdchar.tag ?>
<!-- Revisions:                                               -->
<!-- 93-09-20 : MSM : correct datatype of class               -->
<!-- $Id: wsdchar.tag,v 1.3 93/09/22 19:27:31 msmcq Exp $ -->
<!-- $Log:	wsdchar.tag,v $
# Revision 1.3  93/09/22  19:27:31  msmcq
# changes as of publication of WD
# 
# Revision 1.2  1993/09/17  17:00:15  lou
# changed descs for consistency
#
# Revision 1.1  1993/08/03  02:27:04  msmcq
# tagdoc files:  resynching uicvm and onions
# -->
<!-- ********************************************** character -->
<tagDoc id=WSDCHAR usage=opt>
<gi>character</gi>
<desc>defines one unit in a writing system, supplementing or overriding
information provided in the base coded character sets, writing system
declarations, and entity sets.
<attList>
<!-- .................................................. class -->
<attDef usage=req>
<attName>class
<desc>describes the function of the character using a prescribed
classification.
<datatype>(lexical | punc | lexpunc | digit
| space | dl | ld | dia | joiner | other)
<valList type=closed>
<val>lexical <desc>character is used in writing words (lexical items) of
the language (includes members of syllabaries and ideographic systems,
as well as composite letter-plus-diacritic combinations)
<val>punc    <desc>character is a punctuation mark which does not appear
within lexical items
<val>lexpunc <desc>character can appear as a normal punctuation
mark, but can also appear within a lexical item (and should usually,
when occurring between two lexical characters, be treated as
lexical---in English, hyphen and apostrophe are typically treated as
members of this class)
<val>digit   <desc>character is an Arabic decimal numeral (0, 1, ... 9);
superscript numbers, circled numbers, numeric dingbats, etc. are not
included
<val>space   <desc>character represents some form of white space
(space character, horizontal or vertical tab, newline, etc.)
<val>dl      <desc>character is a diacritic applying to the following
lexical character
<val>ld      <desc>character is a diacritic applying to the preceding
lexical character
<val>dia     <desc>character is a diacritic which is explicitly joined to
a lexical character by a joiner character
<val>joiner  <desc>character is used to join a diacritic to the lexical
character to which it applies (in some encoding schemes, the
backspace control character may be used as a joiner; in others, a
graphic character is used for the same function)
<val>other   <desc>character does not fall into any of the other classes
(dingbats and other unusual characters fall here)
</valList>
<default>lexical
<eg><![ CDATA [
 
]]>
</eg>
 
<remarks><p>The classification of characters provided by this
attribute serves both informative and normative purposes:  it helps
identify the character being described, and the classification is used
to define the meaning of the special character-class codes in the TEI
extended pointer syntax described in chapter <xref target=SA>.
 
</remarks>
</attDef>
</attList>
<exemplum><eg><![ CDATA [
 
]]>
</eg></exemplum>
 
<remarks><p>The notion of <soCalled>characters</soCalled> as units in a
writing system is widespread, but not consistently defined; the
<gi>character</gi> element should be used to identify whatever units the
encoder wishes to distinguish as the meaningfully distinct graphic units
of the writing system.  In most cases, these will correspond to the
units of coded character sets, but this is not a requirement:
a-umlaut, for example, may be treated as one character or two, depending
on the user's preference, regardless of how the coded character set in
use treats it.  In most cases, also, the units distinguished by the
<gi>character</gi> element will be the <soCalled>graphemic</soCalled>
units of the writing system in question; however, since experts disagree
on whether items like umlaut (let alone a given set of Chinese
characters with regional variations in China, Korea, and Japan) are best
treated as distinct graphemes or not, the association of
<gi>character</gi> elements with the graphemes of a writing system
provides at most a heuristic device for making reasonable decisions,
rather than a definitive unambiguous test.
 
<p>Different forms of the same <soCalled>character</soCalled> may be
distinguished for whatever reason, as in the three-R example of chapter
<xref target=CH>.  In this case the different letter forms are
distinguished by documenting them in different <gi>form</gi> elements;
the fact that the different letter shapes do not make a lexical
difference in the text may be expressed by grouping all three letter
forms under the same <gi>character</gi> element.  (Alternatively, the
three forms may be treated as three distinct characters, for convenience
or for any other reason, by defining a distinct <gi>character</gi>
element for each.)
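 
<p>A minimal sketch of the first approach, in which a single
<gi>character</gi> element groups three letter forms, might look like
the following.  It follows the content model given for this element
(optional descriptions, one or more <gi>form</gi> elements, optional
notes); the <soCalled>entityLoc</soCalled> attribute on <gi>form</gi>
and the entity names r1, r2, and r3 are assumptions made purely for
illustration and are not defined in this section, and the end-tags of
the <gi>form</gi> elements are omitted on the assumption that their
declaration permits it.
<eg><![ CDATA [
<character class=lexical>
<desc>the letter r, in three letter forms</desc>
<form entityLoc='r1'>
<form entityLoc='r2'>
<form entityLoc='r3'>
</character>
]]>
</eg>
Treating the three forms as distinct characters would instead use three
<gi>character</gi> elements, each containing a single <gi>form</gi>.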
 
</remarks>
<part type=aux name=wsd>
<classes>
<dataDesc>May contain any number of optional description elements, a
series of one or more <gi>form</gi> elements identifying different forms
of the character, and an optional series of notes.
<elemDecl>- O  (desc*, form+, note*)
</elemDecl>
<xref target=WDCSEX>
</tagDoc>
<?tei:EOF wsdchar.tag ?>
