<?tei:SOF wsdform.tag ?>
<!-- $Id: wsdform.tag,v 1.1 93/08/03 02:27:04 msmcq Exp $ -->
<!-- $Log:	wsdform.tag,v $
# Revision 1.1  93/08/03  02:27:04  msmcq
# tagdoc files:  resynching uicvm and onions
#  -->
<!-- *************************************************** form -->
<tagDoc id=WSDFORM usage=mwa>
<gi>form</gi>
<name>letter form</name>
<desc>identifies one letter form taken by a particular character in a
writing system declaration.
<attList>
<!-- ................................................. string -->
<attDef usage=opt>
<attName>string
<desc>gives the byte string used to encode the letter form in the text.
<datatype>CDATA
<valDesc>any string of characters (often a single byte)
<default>#IMPLIED
<eg><![ CDATA [
 <form string='a/'>
    <desc>lowercase Greek alpha with acute accent</desc>
 </form>
]]>
</eg>
 
<remarks><p>If the character is encoded only using entity
references, then the value of <att>string</att> should be ''
(the empty string).
 
<p>In coded character sets which use character-set shifting (e.g. JIS
0208), the <att>string</att> attribute should typically contain the
required shift characters, in order to render the value unambiguous.  In
such a case, there is no expectation that every occurrence of the
character will be immediately preceded by the shift sequence; processing
software is responsible for understanding the shift mechanism and acting
accordingly.
 
<p>The same string value (other than the empty string) may not appear
on more than one <gi>form</gi> element, unless each occurrence is
associated with a different coded character set.
 
</remarks>
</attDef>
<!-- ........................................... codedCharSet -->
<attDef usage=opt>
<attName>codedCharSet
<name>coded character set
<desc>specifies which base coded character set the <att>string</att>
value occurs in.
<datatype>IDREF
<valDesc>a reference to the SGML identifier of a <gi>codedCharSet</gi>
element in the current writing system declaration.
<default>#IMPLIED
<eg><![ CDATA [
 
]]>
</eg>
 
<remarks><p>If more than one <gi>codedCharSet</gi> is specified as a
base component of the writing system declaration, then it is expected
that character-set shifting is in use, as described in ISO 2022 or some
equivalent.  In this case, each <gi>form</gi> element which has a value
for the <att>string</att> attribute should also identify, by means of
the <att>codedCharSet</att> attribute, which coded character set
actually contains the string in question.  Proper shifting among
character sets is the responsibility of the user.
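 
<p>As a minimal sketch of the situation described above, assume a
writing system declaration with two base coded character sets whose
SGML identifiers are LATIN and GREEK (both identifiers are hypothetical
and chosen only for illustration).  The same byte string may then
appear on two <gi>form</gi> elements, provided each names a different
coded character set:
<eg><![ CDATA [
 <form string='a' codedCharSet=LATIN>
    <desc>lowercase Latin a</desc>
 </form>
 <form string='a' codedCharSet=GREEK>
    <desc>lowercase Greek alpha</desc>
 </form>
]]>
</eg>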
 
</remarks>
</attDef>
<!-- .............................................. entityStd -->
<attDef usage=opt>
<attName>entityStd
<name>standard entity name
<desc>gives the names of one or more entities defined for this character
form in some standard entity set(s).
<datatype>ENTITIES
<valDesc>One or more valid SGML entity names declared in the document
type definition of the WSD; the entity must also be included in an
entity set mentioned in an <gi>entitySet</gi> declaration in the current
writing system declaration or in some base writing system referred to by
a <gi>baseWsd</gi> element.
<default>#IMPLIED
<eg><![ CDATA [
 <form entityStd='thorn'>
   <desc>lowercase Old English/Icelandic thorn</desc>
 </form>
]]>
</eg>
<remarks><p>If the same letter form is defined by more than
one public entity set, more than one value may appear in this
attribute.
 
<p>The same entity name may not appear in the <att>entityStd</att> or
<att>entityLoc</att> attributes of more than one <gi>form</gi> element.
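 
<p>As a sketch of the first point above, assume that both ISOgrk1 and
ISOgrk3 are named in <gi>entitySet</gi> declarations of the writing
system declaration (an assumption made only for this illustration).  A
letter form defined by both sets might then list both entity names:
<eg><![ CDATA [
 <form entityStd='agr alpha'>
    <desc>lowercase Greek alpha</desc>
 </form>
]]>
</eg>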
 
</remarks>
</attDef>
<!-- .............................................. entityLoc -->
<attDef usage=opt>
<attName>entityLoc
<name>local entity name
<desc>gives one or more entity names used locally for this character
form.
<datatype>ENTITIES
<valDesc>One or more valid SGML entity names declared in the document
type definition of the WSD; the entity must also be included in an
entity set mentioned in an <gi>entitySet</gi> declaration in the current
writing system declaration or in some base writing system referred to by
a <gi>baseWsd</gi> element.
<default>#IMPLIED
<eg><![ CDATA [
 <form entityStd='thorn' entityLoc='t'>
   <desc>lowercase Old English/Icelandic thorn</desc>
   <note>The standard entity name is 'thorn'; the local entity 't'
         is used for brevity and legibility.</note>
 </form>
]]>
</eg>
<remarks>
<p>The same entity name may not appear in the <att>entityStd</att> or
<att>entityLoc</att> attributes of more than one <gi>form</gi> element.
</remarks>
</attDef>
<!-- .................................................. ucs-4 -->
<attDef usage=opt>
<attName>ucs-4
<name>universal-character-set code
<desc>gives the position of the character form in the thirty-two bit
<soCalled>universal character set</soCalled> defined by ISO 10646.
<datatype>CDATA
 
<valDesc>one or more sets of two or four two-digit hexadecimal numbers
giving a valid ISO 10646 code point for the character form; for
legibility the two-digit hexadecimal numbers should be separated by
hyphens.  If more than one UCS-4 code is associated with a given
character form, the codes should be given separated by blanks.
If the character form is associated with a sequence of UCS-4 codes (e.g.
a base character followed by one or more non-spacing diacritics), then
the components of the sequence should be separated by '+'.
 
<default>#IMPLIED
<eg><![ CDATA [
 
]]>
</eg>
 
<remarks>
 
<p>The same UCS-4 code (or sequence) may not appear within more
than one <gi>character</gi> element within the writing system
declaration.  It may however appear on several forms of the same
character.
 
<p>Multiple UCS-4 codes can be given for a single character; this allows
sequences treated as distinct by ISO 10646 to be documented as referring
to a single <soCalled>character</soCalled> as defined by the WSD (e.g.
<q>lowercase a-umlaut</q> and <q>lowercase a</q> plus <q>umlaut</q>).
 
<p>If a single UCS-4 code is to be treated as relating to two or more
distinct <soCalled>characters</soCalled> as defined by the WSD (e.g. to reverse
the effects of Han unification on some character), then one of the
<gi>character</gi> elements should be associated with the UCS-4 code in
the normal way, and the others should call attention to the relevant
UCS-4 code by a comment in a <gi>note</gi> element.
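 
<p>The value syntax may be sketched as follows, using the a-umlaut case
mentioned above.  The code points are those of ISO 10646 (00E4 for the
precomposed letter, 0061 followed by the non-spacing diaeresis 0308 for
the sequence).  The two-octet short form is assumed here, the four-octet
equivalent of the first code being 00-00-00-E4, and the description
text is illustrative only:
<eg><![ CDATA [
 <form ucs-4='00-E4 00-61+03-08'>
    <desc>lowercase a with umlaut</desc>
 </form>
]]>
</eg>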
 
</remarks>
</attDef>
<!-- ............................................... afiicode -->
<attDef usage=opt>
<attName>afiicode
<name>AFII code
<desc>gives one or more codes associated with this letter form by the
Association for Font Information Interchange.
<datatype>CDATA
<valDesc>any valid AFII identifier.
<default>#IMPLIED
<eg><![ CDATA [
 
]]>
</eg>
 
<remarks><p>The AFII tables are designed as an inventory of
<term>glyphs</term> (identifiably distinct shapes, leaving differences
of font design out of account; one character may be associated with
several glyphs, and each glyph with items in several different fonts).
Because the same glyph may be associated with more than one character
(in some fonts, for example, the lowercase letter L and the digit 1
share the same glyph), the value of <att>afiiCode</att> is used for
informational purposes only and need not be unique within the writing
system declaration.
 
</remarks>
</attDef>
</attList>
<exemplum><eg><![ CDATA [
 
]]>
</eg></exemplum>
 
<remarks><p>The <gi>form</gi> element documents one form of a character;
in most cases, there will be only one.  If more than one form is given,
in general, they are to be regarded as free variants of the character
unless otherwise specified in the notes.
 
<p>The distinction between <gi>character</gi> and <gi>form</gi> makes it
possible to distinguish, in an encoding, among different letter forms
(which may have historical, aesthetic, linguistic, or other
significance) without having to claim that the different forms
constitute different <soCalled>characters</soCalled> in any normal
sense.  (Using the technical terms occasionally encountered, the
<gi>form</gi> element can be used to record each <term>allograph</term>
of a given character or <term>grapheme</term>.)  The concepts of
<soCalled>character</soCalled> and <soCalled>letter form</soCalled>,
however, vary from analyst to analyst; the decision to treat a given set
of forms as a single character or as a set of characters is not always
obvious, and may require the application of considerable learning and
judgement.  The <gi>note</gi> element should be used to record the
reasoning behind any particularly difficult decision.
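 
<p>As a hedged illustration of this distinction, the round and long
forms of lowercase s might be recorded as two forms of one character.
The local entity name <q>slong</q> is hypothetical, and any content of
the <gi>character</gi> element other than its forms is omitted from
this sketch:
<eg><![ CDATA [
 <character>
  <form string='s'>
     <desc>round s</desc>
  </form>
  <form string='' entityLoc='slong' ucs-4='01-7F'>
     <desc>long s</desc>
     <note>Treated as an allograph of s rather than as a separate
           character.</note>
  </form>
 </character>
]]>
</eg>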
 
</remarks>
<part type=aux name=wsd>
<classes>
<dataDesc>May contain a series of description elements, optionally one or
more figure elements showing the character form in question, and
optionally a series of notes.
<elemDecl>- O  (desc+, (figure | extFigure)*, note*)
</elemDecl>
<xref target=WDCSEX>
</tagDoc>
<?tei:EOF wsdform.tag ?>
