<?xml version="1.0" encoding="UTF-8"?>
<!-- © TEI Consortium. Dual-licensed under CC-by and BSD2 licenses; see the file COPYING.txt for details. -->
<?xml-model href="https://jenkins.tei-c.org/job/TEIP5-dev/lastSuccessfulBuild/artifact/P5/release/xml/tei/odd/p5.nvdl" type="application/xml" schematypens="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"?>
<div xmlns="http://www.tei-c.org/ns/1.0" type="div1" xml:id="WD" n="25">
<head>Characters, Glyphs, and Writing Modes</head>
<!-- to become : head>Characters, Glyphs, and Writing Systems</head-->

<p>Chapter <ptr target="#CH"/> introduced the fundamental notions of
language identification and character representation in an encoded TEI
document. In this chapter we discuss some additional issues relating
to the way that written language is represented in a TEI document. In
sections <ptr target="#WDNE"/> and <ptr target="#D25-20"/> we
introduce markup which may be used to represent and document
non-standard characters, that is, written symbols for which no
codepoint exists in Unicode. The same markup may be used to annotate
existing characters according to their visual or other properties, and
thus process them as distinct glyphs (see section <ptr target="#D25-30"/>), or to define new characters or glyphs (section
<ptr target="#D25-40"/>).  We also provide recommendations concerning
the Unicode Private Use Area (<ptr target="#D25-50"/>). Finally, in
section <ptr target="#WDWM"/> we
discuss ways of documenting the writing mode used in a source text,
that is, the directionality of the script, the orientation of
individual characters, and related questions. </p>



<div type="div2" xml:id="WDNE"><head>Is Your Journey Really Necessary?</head>
<p>Despite the availability of Unicode, text encoders still
sometimes find that the published repertoire of available
characters is inadequate to their needs. This is particularly the
case when dealing with ancient languages, for which encoding
standards do not yet exist, or where an encoder wishes to
represent variant forms of a character or <term>glyphs</term>.
The module defined by this chapter provides a mechanism to satisfy
that need, while retaining compatibility with standards.
</p>
<p>When encoders encounter some graphical unit in a document which is
to be represented electronically, the first issue to be resolved
should be <q>Is this really a different character?</q> To determine
whether a particular graphical unit <emph>is</emph> a character or
not, see <ptr target="#D4-42"/>. </p>
<p>If the unit is indeed determined to be a character, the next
question should be <q>Has this character been encoded already?</q>
To determine whether a character has been encoded,
encoders should take the following steps:
<list rend="numbered">
  <item><p>Check the Unicode
  web site at <ptr target="https://www.unicode.org"/>, in particular the page <ref target="https://unicode.org/standard/where/">"Where is my
  Character?"</ref>, and the associated character code charts.
  Alternatively, users can check the latest published version of
  <title>The Unicode Standard</title> (<ref target="#CH-BIBL-3">Unicode Consortium (2006)</ref>), though the web site is
  often more up to date than the printed version, and should be
  checked for preference.</p>
<p>The pictures (<soCalled>glyphs</soCalled>) in the Unicode code
charts are only meant to be representative, not definitive. If a
specific form of an already encoded character is required for a
project, refer to the guidelines contained below under <ref target="#D25-30">Annotating Characters</ref>. Remember that your
encoded document may be rendered on a system which has different fonts
from yours: if the specific form of a character is important to you,
then you should document it. </p></item>
  <item>Check the Proposed New Characters web page (<ptr target="https://unicode.org/alloc/Pipeline.html"/>) to see whether
  the character is in line for approval.</item>

<item>Ask on the Unicode email list (<ptr target="https://www.unicode.org/consortium/distlist.html"/>) to
see whether  a proposal is pending, or to determine whether this
character is considered eligible for addition
to the Unicode Standard.  </item>

</list> </p>
<p>Since there are now over 130,000 characters in Unicode,
chances are good that what you need is already there, but it might
not be easy to find, since it might have a different name in
Unicode. Editors working with East Asian writing systems should consult
the <ref target="https://unicode.org/charts/unihan.html">Unihan Database</ref>.
If an initial search is unsuccessful, look again, this time at other sites, preferably ones which also provide searches based on scripts and languages, for example <ptr target="https://www.chise.org"/> (for CJK characters) or <ptr target="http://www.eki.ee/letter/"/> (for non-CJK characters).
Take care, however, that all the
properties of what seems to be a relevant character are consistent
with those of the character you are looking for. For example, if
your character is definitely a digit, but the properties of the
best match you can find for it say that it is a letter, you may
have a character not yet defined in Unicode.</p>
<p>In general, it is advisable to avoid Unicode characters generally
described as presentation forms.<note place="bottom">Specifically,
characters in the Unicode blocks Alphabetic Presentation Forms, Arabic Presentation Forms-A, Arabic Presentation Forms-B, Letterlike Symbols, and Number Forms.</note> However, if the character you are looking for is being used in a notation (rather than as part of the orthography of a language) then it is quite acceptable to select characters from the Mathematical Operators block, provided that they have the appropriate properties (i.e. <code>So</code>: Symbol, Other; or <code>Sm</code>:
Symbol, Math).</p>
<p>An encoded character may be precomposed or it may be formed
from base characters and combining diacritical marks. Either will
suffice for a character to be <soCalled>found</soCalled> as an encoded character. If there are several possible Unicode characters to choose amongst,
it is good practice to consult other colleagues and practitioners to
see whether a consensus has emerged in favour of one or other of
them.</p>
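<p>As an illustrative sketch (the word chosen is arbitrary and not taken
from any particular source), the two spellings below are canonically
equivalent ways of encoding the same text. The first uses the
precomposed character U+00E9 LATIN SMALL LETTER E WITH ACUTE; the
second uses the base letter <mentioned>e</mentioned> followed by
U+0301 COMBINING ACUTE ACCENT. Both are written here as numeric
character references so that the difference remains visible:
<eg><![CDATA[caf&#xE9;
cafe&#x301;]]></eg>
In either case the character counts as already encoded, and no new
character needs to be defined.</p>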
<p>If, however, no suitable form of your character seems to exist, the
next question will be: <q>Does the graphical unit in question
represent a variant form of a known character, or does it represent a
completely unencoded character?</q> If the character is determined to
be missing from the Unicode Standard, it would be helpful to submit
the new character for inclusion (see <ptr target="https://unicode.org/pending/proposals.html"/>). For assistance
on writing or submitting a proposal, potential proposers can contact
the UC Berkeley Script Encoding Initiative (<ptr target="http://linguistics.berkeley.edu/sei/"/>).</p>
<p>These guidelines will help you proceed once you have
 identified a given graphical unit as either a variant or an
 unencoded character. Determining this will require knowledge of
 the contents of the document that you have. The first case will
 be called <emph>annotation</emph> of a character, while the
 second case will be called <emph>adding</emph> of a new
 character. How to handle graphical units that represent variants
 will be discussed below (<ptr target="#D25-30"/>)
 while the problem of representing new characters will be dealt
 with in section <ptr target="#D25-40"/>.</p>
<p>While there is some overlap between these requirements,
distinct specialized markup constructs have been created for each
of these cases. These constructs are presented in section <ptr target="#D25-20"/>
below.  </p>
   </div>
   <div type="div2" xml:id="D25-20">
<!--
<head>Markup constructs for representing non-standard characters</head>
<p>The gaiji module provides a mechanism to declare characters
additional to those available from the document character set. XML
allows for a document (or document component) to declare its
encoding, thus restricting the characters that can be encoded
directly within it without using numeric character references. For
example, an XML document which begins <code>&lt;?xml version="1.0"
encoding="iso-8859-1"?&gt;</code> can include non-ISO-8859-1
characters only by representing them as numeric character
references. In such a case, it might be convenient to declare as
additional characters some characters already defined by
the Unicode Standard. Generally speaking, however, the document character set
will be Unicode, and this mechanism will be needed only for
characters not defined by the Unicode Standard. </p>
-->
<head>Markup Constructs for Representation of Characters and Glyphs</head>
<p>An XML document can, in principle, contain any defined Unicode
character. The standard allows these characters to be represented
either directly, using an appropriate encoding (UTF-8 by default), or
indirectly by means of a <term>numeric character reference</term> (NCR), such as
<code><![CDATA[&#196;]]></code> (A-umlaut). The encoder can also restrict the
range of characters which are represented directly in a document (or
part of it) by adding a suitable encoding declaration. For example, if
a document begins with the declaration <code><![CDATA[<?xml version="1.0"
encoding="iso-8859-1"?>]]></code>, any Unicode characters which are not
in the ISO-8859-1 character set must be represented by NCRs. </p>
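<p>The following fragment is a minimal sketch of this situation; its
content is invented purely for illustration. Because the Greek letter
alpha (U+03B1) is not part of the ISO-8859-1 repertoire, it is
represented by an NCR rather than entered directly:
<eg><![CDATA[<?xml version="1.0" encoding="iso-8859-1"?>
<p>The angle &#x3B1; is measured in degrees.</p>]]></eg>
</p>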
 <p>The <mentioned>gaiji</mentioned> module defined by this
 chapter adds a further way of representing specific characters
 and glyphs in a document. (Gaiji is from Japanese <seg xml:lang="ja">外字</seg>, meaning <gloss>external
 characters</gloss>.) This allows the encoder to distinguish
 characters and glyphs which Unicode regards as identical, to add
 new nonstandard characters or glyphs, and to represent Unicode
 characters not available in the document encoding by an
 alternative means.</p>
<p>The mechanism provided here consists functionally of two parts:
 <list rend="numbered">
  <item>an element <gi>g</gi>, which serves as a proxy for new
   characters or glyphs</item>
  <item>elements <gi>char</gi> and <gi>glyph</gi>, providing information about such characters or glyphs; these elements are stored in the
   <gi>charDecl</gi> element in the header.</item>
 </list>
</p>
<p>When the gaiji module is included in a schema, the
<gi>charDecl</gi> element is added to the <ident type="class">model.encodingDescPart</ident>
class, and the <gi>g</gi> element is added to the phrase class. These
elements and their components are documented in the rest of this
section. </p>
<p>The Unicode Standard specifies properties for all the characters it
defines in the <ref target="https://unicode.org/ucd/">Unicode Character Database</ref>, knowledge of which is usually built into text processing systems. If the
character represented by the <gi>g</gi> element does not exist in Unicode at
all, its properties are not available. If the character represented is
an existing Unicode character, but is not available in the document
character set recognized by a given text processing system, it may
also be convenient to have access to its properties in the same way.
The <gi>char</gi> element makes it possible to store properties
for use by such applications in a standard way.</p>
<p> The list of attributes (properties) for characters is modelled on
those in the Unicode Character Database, which distinguishes
<term>normative</term> and <term>informative</term> character
properties. The Unicode Consortium also maintains a separate set of character properties specific to East Asian characters in the <ref target="https://www.unicode.org/charts/unihan.html">Unihan database</ref> which TEI fully supports. Lastly, non-Unicode properties may also be supplied.
Since the list of properties will vary with different versions of the
Unicode Standard, there may not be an exact correspondence between
them and the list of properties defined in these Guidelines.</p>
<!-- TODO Phase 5 Preceding sentence needs to mention update mechanisn #1805 -->
<p>Usage examples for these elements are given below at <ptr target="#D25-30"/> and <ptr target="#D25-40"/>.  The gaiji module
itself is formally defined in section <ptr target="#WSD-DEF"/>
below. It declares the following additional elements:
<specList>
<specDesc key="charDecl"/>
<specDesc key="g" atts="ref"/>
</specList>
The <gi>charDecl</gi> element is a member of the class <ident type="class">model.encodingDescPart</ident>, and thus becomes
available within <gi>encodingDesc</gi> when this module is included in
a schema.  The <gi>g</gi> element is the only member of the class
<ident type="class">model.gLike</ident>: this class is referenced as
an alternative to plain text in almost every element which contains
plain text, thus permitting the <gi>g</gi> element also to appear at
such places when this module is included in a schema.
</p>
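<p>The placement of these elements might be sketched as follows. This
is an outline only, assuming a hypothetical glyph identified as
<code>hypo-s-final</code> with an invented name; the rest of the
header and text is elided. The <gi>charDecl</gi> sits within
<gi>encodingDesc</gi> in the header, while the <gi>g</gi> element
appears in running text and points back to the declaration by means of
its <att>ref</att> attribute:
<egXML xmlns="http://www.tei-c.org/ns/Examples" valid="feasible" source="#NONE"><TEI>
 <teiHeader>
  <fileDesc>
   <!-- ... -->
  </fileDesc>
  <encodingDesc>
   <charDecl>
    <!-- hypothetical glyph declaration, for illustration only -->
    <glyph xml:id="hypo-s-final">
     <localProp name="name" value="LATIN SMALL LETTER S, WORD-FINAL FORM"/>
    </glyph>
   </charDecl>
  </encodingDesc>
 </teiHeader>
 <text>
  <body>
   <p>... thing<g ref="#hypo-s-final">s</g> ...</p>
  </body>
 </text>
</TEI></egXML>
</p>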
<p>The following elements may appear within a <gi>charDecl</gi>
 element:
<specList>
<specDesc key="desc"/>
<specDesc key="char"/>
<specDesc key="glyph"/>
 </specList>
</p>
<p>The <gi>char</gi> and <gi>glyph</gi> elements have similar contents
and are used in similar ways, but their functions are different. The
<gi>char</gi> element is provided to define a character which is not
available in the current document character set, for whatever reason,
as stated above. The <gi>glyph</gi> element is used to annotate a
character that has already been defined somewhere (either in the
document character set, or through a <gi>char</gi> element) by
providing a specific glyph that shows how a character appeared in the
original document. This is necessary since Unicode code points refer
not to a single, specific glyph shape of a character, but rather to a
set of glyphs, any of which may be used to render the code point in
question; in some cases they can differ considerably.</p>
<p>The <gi>glyph</gi> element is provided for cases where the encoder
wants to specify a specific glyph (or family of glyphs) out of all
possible glyphs. Unfortunately, due to the way Unicode has been
defined, there are cases where several glyphs that logically belong
together have been given separate code points, especially in the blocks
defining East Asian characters. In such cases, <gi>glyph</gi> elements
can also be used to express the view that these apparently distinct
characters are to be regarded as instances of the same character (see
further <ptr target="#D25-30"/>).</p>
<p>The Unicode Standard recommends naming conventions which should be
followed strictly where the intention is to annotate an existing
Unicode character, and which may also be used as a model when
creating new names for characters or glyphs.<note place="bottom">It should be noted, however, that this naming convention cannot meaningfully be applied to East Asian characters; the typical Unicode descriptions for these characters take the form <q>CJK Unified Ideograph <code>U+4E00</code></q>, where <code>U+4E00</code> is simply the Unicode code point value of the character in question.  In cases where no Unicode code point exists, there is little hope of finding a name that helps to identify the character. Names should therefore be constructed in a way meaningful to local practice, for example by using a reference number from a well-known character dictionary or a project-specific serial number.</note></p>
<p>Within both <gi>char</gi> and <gi>glyph</gi>, the following elements are available:
<specList>
<specDesc key="gloss"/>
<specDesc key="unicodeProp"/>
<specDesc key="unihanProp"/>
<specDesc key="localProp"/>
<specDesc key="desc"/>
<specDesc key="mapping"/>
<specDesc key="figure"/>
<specDesc key="note"/>
</specList>
</p>

<p>Four of these elements (<gi>gloss</gi>, <gi>desc</gi>,
<gi>figure</gi>, and <gi>note</gi>) are defined by other TEI
modules, and their usage here is no different from their usage
elsewhere. The <gi>figure</gi> element, however, is used here only to
link to an image of the character or glyph under discussion, or to
contain a representation of it in SVG. The <gi>figure</gi> element may
contain more than one <gi>graphic</gi>
element, for example to provide images with different
resolution, or in different formats, or may itself be repeated. As
elsewhere, the <att>mimeType</att> attribute
of <gi>graphic</gi> should be used to specify
the format of the image.</p>
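<p>For instance, a glyph description might offer the same image at two
resolutions and in a vector format. The following is a sketch only:
the identifier, the name, and the file names are all hypothetical.
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE"><charDecl>
 <glyph xml:id="hypo-r-scribeB">
  <localProp name="name" value="LATIN SMALL LETTER R, SCRIBE B FORM"/>
  <figure>
   <!-- the same glyph image at two resolutions, and as SVG -->
   <graphic url="r-scribeB-72dpi.png" mimeType="image/png"/>
   <graphic url="r-scribeB-300dpi.png" mimeType="image/png"/>
   <graphic url="r-scribeB.svg" mimeType="image/svg+xml"/>
  </figure>
 </glyph>
</charDecl>
</egXML>
</p>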
<p>The <gi>mapping</gi> element is similar to the standard TEI
<gi>equiv</gi> element. While the latter is used to express
correspondence relationships between TEI concepts or elements and
those in other systems or ontologies, the former is used to express
any kind of relationship between the character or glyph under
discussion and characters or glyphs defined elsewhere. It may contain
any Unicode character, or a <gi>g</gi> element linked to some other
<gi>char</gi> or <gi>glyph</gi> element, if, for example, the
intention is to express an association between two non-standard
characters. The type of association is indicated by the
<att>type</att> attribute, which may take such values as
<code>exact</code> for exact equivalences, <code>uppercase</code> for
uppercase equivalences, <code>lowercase</code> for lowercase
equivalences, <code>standard</code> for standardized forms, and
<code>simplified</code> for simplified characters, etc., as in the
following example: <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="D25-20-egXML-xv" source="#NONE"><charDecl>
<char xml:id="aenl">
<localProp name="name" value="LATIN LETTER ENLARGED SMALL A"/>
<localProp name="entity" value="aenl"/>
<mapping type="standard">a</mapping>
</char>
</charDecl>
</egXML>
</p>
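<p>Where the association is with another non-standard character, the
<gi>mapping</gi> element may contain a <gi>g</gi> element rather than
a Unicode character. The following sketch assumes two hypothetical
declarations, an enlarged small <mentioned>a</mentioned> and its
enlarged capital counterpart; the identifiers and names are invented
for illustration:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE"><charDecl>
 <char xml:id="hypo-aenl-cap">
  <localProp name="name" value="LATIN LETTER ENLARGED CAPITAL A"/>
  <mapping type="standard">A</mapping>
 </char>
 <char xml:id="hypo-aenl-small">
  <localProp name="name" value="LATIN LETTER ENLARGED SMALL A"/>
  <mapping type="standard">a</mapping>
  <!-- the uppercase equivalent is itself a non-standard character -->
  <mapping type="uppercase"><g ref="#hypo-aenl-cap"/></mapping>
 </char>
</charDecl>
</egXML>
</p>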
<p>The <gi>mapping</gi> element may also be used to represent a mapping of the
character or (more likely) glyph under discussion onto a character
from the private use area as in this example:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="D25-20-egXML-oy" source="#NONE"><charDecl>
<glyph xml:id="z103">
<localProp name="name" value="LATIN LETTER Z WITH TWO STROKES"/>
<mapping type="standard">Z</mapping>
<mapping type="PUA">U+E304</mapping>
</glyph>
</charDecl>
</egXML>
</p>
<p>A more precise documentation of the properties of any character or
glyph may be supplied using one of the three <soCalled>property</soCalled> elements: <gi>localProp</gi>, <gi>unicodeProp</gi>, or <gi>unihanProp</gi>; these are described in the next section.</p>
<div type="div3" xml:id="ucsprops"><head>Character Properties</head>
<p>The Unicode Standard documents <soCalled>ideal</soCalled>
characters, defined by reference to a number of
<term>properties</term> (or attribute-value pairs) which they are said
to possess. For example, a lowercase letter is said to have the value
<code>Ll</code> for the property <code>General_Category</code>. The
Standard distinguishes between <term>normative</term> properties
(i.e. properties which form part of the definition of a given
character), and <term>informative</term> or <term>additional</term>
properties which are not normative. It also allows for the addition of
new properties, and (in some circumstances) alteration of the values
currently assigned to certain properties. When making such
modifications, great care should be taken not to override standard
informative properties for characters which already exist in the Unicode
Standard, as documented in <ref target="#CH-eg-02">Freytag (2006)</ref>.</p>
<!-- TODO phase 6 insert comment about validation of values #1805 -->
<p>The <gi>unicodeProp</gi>, <gi>unihanProp</gi>, and
<gi>localProp</gi> elements allow a TEI encoder to record information
about a character or glyph:
<specList>
  <specDesc key="unicodeProp" atts="name value"/>
  <specDesc key="unihanProp" atts="name value"/>
  <specDesc key="localProp" atts="name value"/>
</specList>
</p>
<p>Where the information concerned relates to a property which has
already been identified in the Unicode Standard, use of the
appropriate Unicode property name with <gi>unicodeProp</gi> is
strongly encouraged. The use of available Unihan property names with
<gi>unihanProp</gi> is similarly encouraged. Validation rules for
property names <!-- and values --> according to Unicode conventions
are incorporated into the TEI schemas. Where neither of these
standards suffices use <gi>localProp</gi>.</p>
<!-- Phse 3-5 TODO add @version in here and override possible values for localProp #1805 -->
<p>The three elements for recording Unicode or locally defined properties belong to the <ident type="class">att.gaijiProp</ident> class. This class defines two required attributes for recording character properties as key-value pairs:
<!-- TODO phase 3: add version #1805 -->
<specList>
<specDesc key="att.gaijiProp" atts="name value"/>
</specList>
For each property, the encoder must supply both a
<att>name</att> and a <att>value</att>. For boolean properties, TEI requires an explicit <val>true</val> or <val>false</val> as the value of the <att>value</att> attribute:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="ucsprops-egXML-xw" source="#NONE">
  <unicodeProp name="Ideographic" value="false"/>
</egXML>
</p>
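<p>The three property elements may be combined within a single
declaration. The following sketch describes a hypothetical unencoded
ideograph; the identifier, the stroke count, and the dictionary
reference are assumptions made for the sake of the example, and the
local property names are project-specific inventions:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE"><charDecl>
 <char xml:id="hypo-ideo-001">
  <!-- properties named in the Unicode Character Database -->
  <unicodeProp name="General_Category" value="Lo"/>
  <unicodeProp name="Ideographic" value="true"/>
  <!-- a property named in the Unihan database -->
  <unihanProp name="kTotalStrokes" value="12"/>
  <!-- project-specific properties -->
  <localProp name="name" value="PROJECT IDEOGRAPH 001"/>
  <localProp name="source" value="hypothetical-dictionary:12345"/>
 </char>
</charDecl>
</egXML>
</p>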
<p>For convenience, we list here some of the normative character
properties and their values. For full information, refer to chapter 4 of <title>The Unicode Standard</title>, or the online documentation of the Unicode Character Database.
<list type="gloss">
	 <label>General_Category</label> <item>The general
	  category (described in the Unicode Standard chapter 4 section 5) is an assignment to some
	  major classes and subclasses of characters.  Suggested
	  values for this property are listed here:
<table xml:id="ucsprops-table-re">
<row><cell><code>Lu</code></cell><cell>Letter, uppercase</cell></row>
<row><cell><code>Ll</code></cell><cell>Letter, lowercase</cell></row>
<row><cell><code>Lt</code></cell><cell>Letter, titlecase</cell></row>
<row><cell><code>Lm</code></cell><cell>Letter, modifier</cell></row>
<row><cell><code>Lo</code></cell><cell>Letter, other</cell></row>
<row><cell><code>Mn</code></cell><cell>Mark, nonspacing</cell></row>
<row><cell><code>Mc</code></cell><cell>Mark, spacing combining</cell></row>
<row><cell><code>Me</code></cell><cell>Mark, enclosing</cell></row>
<row><cell><code>Nd</code></cell><cell>Number, decimal digit</cell></row>
<row><cell><code>Nl</code></cell><cell>Number, letter</cell></row>
<row><cell><code>No</code></cell><cell>Number, other</cell></row>
<row><cell><code>Pc</code></cell><cell>Punctuation, connector</cell></row>
<row><cell><code>Pd</code></cell><cell>Punctuation, dash</cell></row>
<row><cell><code>Ps</code></cell><cell>Punctuation, open</cell></row>
<row><cell><code>Pe</code></cell><cell>Punctuation, close</cell></row>
<row><cell><code>Pi</code></cell><cell>Punctuation, initial quote</cell></row>
<row><cell><code>Pf</code></cell><cell>Punctuation, final quote</cell></row>
<row><cell><code>Po</code></cell><cell>Punctuation, other</cell></row>
<row><cell><code>Sm</code></cell><cell>Symbol, math</cell></row>
<row><cell><code>Sc</code></cell><cell>Symbol, currency</cell></row>
<row><cell><code>Sk</code></cell><cell>Symbol, modifier</cell></row>
<row><cell><code>So</code></cell><cell>Symbol, other</cell></row>
<row><cell><code>Zs</code></cell><cell>Separator, space</cell></row>
<row><cell><code>Zl</code></cell><cell>Separator, line</cell></row>
<row><cell><code>Zp</code></cell><cell>Separator, paragraph</cell></row>
<row><cell><code>Cc</code></cell><cell>Other, control</cell></row>
<row><cell><code>Cf</code></cell><cell>Other, format</cell></row>
<row><cell><code>Cs</code></cell><cell>Other, surrogate</cell></row>
<row><cell><code>Co</code></cell><cell>Other, private use</cell></row>
<row><cell><code>Cn</code></cell><cell>Other, not assigned</cell></row>
</table>
	 </item>
	 <label>Bidi_Class</label>
<item>This property applies to all Unicode characters. It governs the
application of the algorithm for bi-directional behaviour, as further
specified in Unicode Standard Annex #9, <title>The Bidirectional
Algorithm</title>. The following 23 values are currently
defined for this property:
<table xml:id="ucsprops-table-ag">
<row><cell><code>L</code></cell><cell>Left-to-Right</cell></row>
<row><cell><code>R</code></cell><cell>Right-to-Left</cell></row>
<row><cell><code>AL</code></cell><cell>Right-to-Left Arabic</cell></row>
<row><cell><code>EN</code></cell><cell>European Number</cell></row>
<row><cell><code>ES</code></cell><cell>European Number Separator</cell></row>
<row><cell><code>ET</code></cell><cell>European Number Terminator</cell></row>
<row><cell><code>AN</code></cell><cell>Arabic Number</cell></row>
<row><cell><code>CS</code></cell><cell>Common Number Separator</cell></row>
<row><cell><code>NSM</code></cell><cell>Nonspacing Mark</cell></row>
<row><cell><code>BN</code></cell><cell>Boundary Neutral</cell></row>
<row><cell><code>B</code></cell><cell>Paragraph Separator</cell></row>
<row><cell><code>S</code></cell><cell>Segment Separator</cell></row>
<row><cell><code>WS</code></cell><cell>Whitespace</cell></row>
<row><cell><code>ON</code></cell><cell>Other Neutrals</cell></row>
<row><cell><code>LRE</code></cell><cell>Left-to-Right Embedding</cell></row>
<row><cell><code>LRO</code></cell><cell>Left-to-Right Override</cell></row>
<row><cell><code>RLE</code></cell><cell>Right-to-Left Embedding</cell></row>
<row><cell><code>RLO</code></cell><cell>Right-to-Left Override</cell></row>
<row><cell><code>PDF</code></cell><cell>Pop Directional Format</cell></row>
<row><cell><code>LRI</code></cell><cell>Left-to-Right Isolate</cell></row>
<row><cell><code>RLI</code></cell><cell>Right-to-Left Isolate</cell></row>
<row><cell><code>FSI</code></cell><cell>First Strong Isolate</cell></row>
<row><cell><code>PDI</code></cell><cell>Pop Directional Isolate</cell></row>
</table></item>
	 <label>Canonical_Combining_Class</label> <item>This
	  property exists for characters that are not used
	  independently, but in combination with other characters, for
	  example the strokes making up CJK (Chinese, Japanese, and Korean) characters.  It
	  records a class for these characters, which is used to
	  determine how they interact typographically. The following
	  values are defined in the Unicode Standard (see <ref target="https://www.unicode.org/reports/tr44/#Canonical_Combining_Class_Values">Unicode
Character Database: Canonical Combining Class Values</ref>); these were taken from version 12.1:
<table xml:id="ucsprops-table-wa">
<row><cell><code>0</code></cell><cell>Spacing, split, enclosing, reordrant, and Tibetan subjoined </cell></row>
<row><cell><code>1</code></cell><cell>Overlays and interior </cell></row>
<row><cell><code>7</code></cell><cell>Nuktas </cell></row>
<row><cell><code>8</code></cell><cell>Hiragana/Katakana voicing marks </cell></row>
<row><cell><code>9</code></cell><cell>Viramas </cell></row>
<row><cell><code>10</code></cell><cell>Start of fixed position classes </cell></row>
<row><cell><code>199</code></cell><cell>End of fixed position classes </cell></row>
<row><cell><code>200</code></cell><cell>Below left attached </cell></row>
<row><cell><code>202</code></cell><cell>Below attached </cell></row>
<row><cell><code>204</code></cell><cell>Below right attached </cell></row>
<row><cell><code>208</code></cell><cell>Left attached (reordrant around single base character) </cell></row>
<row><cell><code>210</code></cell><cell>Right attached </cell></row>
<row><cell><code>212</code></cell><cell>Above left attached </cell></row>
<row><cell><code>214</code></cell><cell>Above attached </cell></row>
<row><cell><code>216</code></cell><cell>Above right attached </cell></row>
<row><cell><code>218</code></cell><cell>Below left </cell></row>
<row><cell><code>220</code></cell><cell>Below </cell></row>
<row><cell><code>222</code></cell><cell>Below right </cell></row>
<row><cell><code>224</code></cell><cell>Left (reordrant around single base character) </cell></row>
<row><cell><code>226</code></cell><cell>Right </cell></row>
<row><cell><code>228</code></cell><cell>Above left </cell></row>
<row><cell><code>230</code></cell><cell>Above </cell></row>
<row><cell><code>232</code></cell><cell>Above right </cell></row>
<row><cell><code>233</code></cell><cell>Double below </cell></row>
<row><cell><code>234</code></cell><cell>Double above </cell></row>
<row><cell><code>240</code></cell><cell>Below (iota subscript) </cell></row>
</table></item>
<label>Decomposition_Mapping</label>
  <item>This property is defined for characters
	  which may be decomposed, for example to a canonical form
	  plus a typographic variation of some kind. For such characters the Unicode Standard specifies both
	  a decomposition type and a decomposition mapping
	  (i.e. another Unicode character to which this one may be
	  mapped in the way specified by the decomposition type). The
	  following types of mapping are defined in the Unicode Standard:
<table xml:id="ucsprops-table-ru">
<row><cell><code>font</code></cell><cell>A font variant (e.g. a blackletter form)</cell></row>
<row><cell><code>noBreak</code></cell><cell>A no-break version of a space or hyphen</cell></row>
<row><cell><code>initial</code></cell><cell>An initial presentation form (Arabic)</cell></row>
<row><cell><code>medial</code></cell><cell>A medial presentation form (Arabic)</cell></row>
<row><cell><code>final</code></cell><cell>A final presentation form (Arabic)</cell></row>
<row><cell><code>isolated</code></cell><cell>An isolated presentation form (Arabic)</cell></row>
<row><cell><code>circle</code></cell><cell>An encircled form</cell></row>
<row><cell><code>super</code></cell><cell>A superscript form</cell></row>
<row><cell><code>sub</code></cell><cell>A subscript form</cell></row>
<row><cell><code>vertical</code></cell><cell>A vertical layout presentation form</cell></row>
<row><cell><code>wide</code></cell><cell>A wide (or zenkaku) compatibility character</cell></row>
<row><cell><code>narrow</code></cell><cell>A narrow (or hankaku) compatibility character</cell></row>
<row><cell><code>small</code></cell><cell>A small variant form (CNS compatibility)</cell></row>
<row><cell><code>square</code></cell><cell>A CJK squared font variant</cell></row>
<row><cell><code>fraction</code></cell><cell>A vulgar fraction form</cell></row>
<row><cell><code>compat</code></cell><cell>Otherwise-unspecified compatibility character</cell></row>
</table>
</item>
	 <label>Numeric_Value</label> <item>This property applies for
	 any character which expresses any kind of numeric value. Its
	 value is the intended value in decimal notation.</item>
<label>Bidi_Mirrored</label> <item>The Bidi_Mirrored
	 character property is used to properly render characters such
	   as U+0028, <code>LEFT PARENTHESIS</code> independent of
	 the text direction: it has the value <code>Y</code>
(character is mirrored) or <code>N</code> (character is not mirrored).</item>
	</list></p>
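<p>By way of illustration, several of the properties listed above
might be recorded together as follows. Both characters are
hypothetical: a project-specific combining mark placed above its base
character, and a non-decimal numeric sign with the value five. The
identifiers, names, and property values are assumptions chosen to show
the pattern rather than descriptions of any real character:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE"><charDecl>
 <char xml:id="hypo-comb-hook">
  <unicodeProp name="General_Category" value="Mn"/>
  <unicodeProp name="Bidi_Class" value="NSM"/>
  <!-- 230: positioned above the base character -->
  <unicodeProp name="Canonical_Combining_Class" value="230"/>
  <localProp name="name" value="COMBINING PROJECT HOOK ABOVE"/>
 </char>
 <char xml:id="hypo-num-five">
  <unicodeProp name="General_Category" value="No"/>
  <unicodeProp name="Bidi_Class" value="L"/>
  <unicodeProp name="Numeric_Value" value="5"/>
  <localProp name="name" value="PROJECT NUMERIC SIGN FIVE"/>
 </char>
</charDecl>
</egXML>
</p>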
<p>The Unicode Standard also defines a set of informative (but non-normative) properties for Unicode characters. If encoders wish to provide such properties, they should be included using the Unicode name. If a Unicode name exists for a given character, it should always be used; encoders may, however, also supply locally defined names. To tag a Unicode name, use <tag>unicodeProp name="Name"</tag> (or <tag>unihanProp name="Name"</tag>). For names specified elsewhere or specified locally use <gi>localProp</gi>.</p>
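<p>A glyph which annotates an existing Unicode character might
therefore carry both kinds of name. In the following sketch the
identifier and the local name are hypothetical, while the value given
for <code>Name</code> is the Unicode name of U+0072:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE"><charDecl>
 <glyph xml:id="hypo-r-var1">
  <!-- the Unicode name of the character being annotated -->
  <unicodeProp name="Name" value="LATIN SMALL LETTER R"/>
  <!-- a locally defined name for this particular form -->
  <localProp name="name" value="LATIN SMALL LETTER R, FIRST VARIANT FORM"/>
 </glyph>
</charDecl>
</egXML>
</p>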
</div>
   </div>
   <div type="div2" xml:id="D25-30">
<head>Annotating Characters</head>
<p>Annotation of a character becomes necessary when it is desired
to distinguish it on the basis of certain aspects (typically, its
graphical appearance) only.  In a manuscript, for example, where
distinctly different forms of the letter <mentioned>r</mentioned> can be recognized, it
might be useful to distinguish them for analytic purposes, quite
distinct from the need to provide an accurate representation of the
page. A digital facsimile, particularly one linked to a
transcribed and encoded version of the text, will always provide a
superior visual representation (for information on how to link a
digital facsimile to a transcribed text see <ptr target="#PHFAX"/>), but cannot be used to support arguments based
on the distribution of such different forms. Character annotation
as described here provides a solution to this problem.<note place="bottom"> It should be kept in mind that any kind of text
encoding is an abstraction and an interpretation of the text at
hand, which will not necessarily be useful in reproducing an exact
facsimile of the appearance of a manuscript.</note> </p>

<p>Assuming that we wish to distinguish the variant glyphs from the
standard representation for the character concerned, we will need to
define distinct <gi>glyph</gi> elements, one for each of the forms of
the letter we wish to distinguish: <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="D25-30-egXML-ju" source="#NONE"><charDecl>
  <glyph xml:id="r1">
   <localProp name="Name" value="LATIN SMALL LETTER R WITH ONE FUNNY STROKE"/>
   <localProp name="entity" value="r1"/>
   <figure><graphic url="r1img.png"/></figure>
  </glyph>
  <glyph xml:id="r2">
   <localProp name="Name" value="LATIN SMALL LETTER R WITH TWO FUNNY STROKES"/>
   <localProp name="entity" value="r2"/>
   <figure><graphic url="r2img.png"/></figure>
  </glyph>
</charDecl> </egXML>
 With these definitions in place, occurrences of these two special
 <mentioned>r</mentioned>s in the text can be represented using the element <gi>g</gi>:
 <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="D25-30-egXML-na" source="#NONE">
<p>Wo<g ref="#r1">r</g>ds in this
  manusc<g ref="#r2">r</g>ipt are sometimes
  written in a funny way.</p> </egXML></p>
<p>
 As can be seen in this example, the <gi>glyph</gi> element pointed
 to from the <gi>g</gi> element will be interpreted as an
 annotation on the content of the element <gi>g</gi>.  This mechanism
 can be used to represent common manuscript abbreviations or ligatures, as in the
 following examples:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="D25-30-egXML-yz" source="#NONE"><p> ... <g ref="#Filig">Fi</g>lthy riches...</p>
<!-- in the charDecl -->
  <glyph xml:id="Filig">
   <localProp name="Name" value="LATIN UPPER F AND LATIN LOWER I LIGATURE"/>
   <figure><graphic url="Filig.png"/></figure>
 </glyph>
</egXML>
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="D25-30-egXML-rd" source="#NONE"><p> ... <abbr><g ref="#per">per</g></abbr> ardua</p>
<!-- in the charDecl -->
  <glyph xml:id="per">
   <localProp name="Name" value="LATIN ABBREVIATION PER"/>
   <figure><graphic url="per.png"/></figure>
 </glyph>

</egXML>
(In fact the Unicode Standard does provide a character to represent the
  lowercase <code>fi</code> ligature (U+FB01), though not the capitalized
  <code>Fi</code> form shown here; even where such a character exists, the
  encoder may prefer not to use it in order to simplify other text
  processing operations, such as indexing.)</p>
<p>With this
 markup in place, it will be possible to write programs to analyze
 the distribution of the different letters <mentioned>r</mentioned> as well as produce
 more <soCalled>faithful</soCalled> renderings of the original. It
 will also be possible to produce normalized versions by simply ignoring
 the annotation pointed to by the element <gi>g</gi>.  <!-- To make
 this kind of processing more efficient, the "type" attribute on
 <gi>g</gi> can be used, with an enumeration of different
 types and their usage documented in the TEIHeader.-->
</p>
<p>For brevity of encoding, it may be preferred to predefine
internal entities such as the following:
 <eg xml:space="preserve"><![CDATA[<!ENTITY r1 '<g ref="#r1">r</g>' >
<!ENTITY r2 '<g ref="#r2">r</g>' >]]></eg>
which would enable the same material to be encoded as follows:
 <eg xml:space="preserve"><![CDATA[<p>Wo&r1;ds in this manusc&r2;ipt are
  sometimes written in a funny way.</p> ]]></eg>
</p>
<p>The same technique may be used to represent particular
abbreviation marks as well as to represent other characters or
glyphs. For example, if we believe that the r-with-one-funny-stroke is
being used as an abbreviation for <code>receipt</code>, this might be
represented as follows:<eg><![CDATA[<abbr>&r1;</abbr>]]></eg></p>
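<p>Where the expansion is also to be recorded, the abbreviation might be
paired with it inside a <gi>choice</gi> element. The following is only a
sketch: it assumes the reading <mentioned>receipt</mentioned> suggested
above, and uses the <gi>g</gi> element directly rather than the entity
reference:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WD-egXML-add02" source="#NONE"><choice>
  <abbr><g ref="#r1">r</g></abbr>
  <expan>receipt</expan>
</choice></egXML></p>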
<!-- should become a choice element some time --><p>Note however that this technique employs markup objects to
provide a link between a character in the document and some
annotation on that character. Therefore, it cannot be used in
places where such markup constructs are not allowed, notably in
attribute values.
</p>
<!-- TODO Phase 5 add alternative unihanProp mechanism to define the relationship #1805 -->
<p>The need to use these constructs to annotate or define
characters arises frequently in Chinese, Japanese, and Korean
documents, which raise some specific issues. There are two
slightly different versions of the
problem. In the first, because of the way Unicode is defined,
more than one glyph has sometimes been encoded for a single
character. In such a case, one might want to retain the
character as used in the source, but add information that a
normalizer (for search or indexing operations) could
exploit. To achieve this, we simply define
within a <gi>charDecl</gi> element a <gi>glyph</gi> that has two
<gi>mapping</gi> elements, as shown here:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="D25-30-egXML-rw" xml:lang="zh" source="#NONE">
  <charDecl>
	<glyph xml:id="u8aaa">
	  <mapping type="Unicode">說</mapping>
	  <mapping type="standard">説</mapping>
	</glyph>
  </charDecl>
</egXML>
The first of these <gi>mapping</gi>s, of type <val>Unicode</val>,
simply maps our glyph to the code point where Unicode defined it.
The other one, of type <val>standard</val>, encodes the fact that
in our view, this glyph is a variation of the standard character
given in the content of the element. We could then use this
<gi>glyph</gi> element's unique identifier <val>u8aaa</val> to
refer to it from within a text as follows.
  <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="D25-30-egXML-jz" xml:lang="zh" source="#NONE">
  <g ref="#u8aaa">說</g>
</egXML>
</p>
<p>A slightly different, but related problem occurs when we have
multiple variants, none of which has been defined in Unicode. In
this case, we need to define one as a new character using
<gi>char</gi>, and the others as glyphs using <gi>glyph</gi>.
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="D25-30-egXML-vj" source="#NONE">
  <charDecl>
	<char xml:id="newchar1">
	  <!-- more properties here -->
	</char>
	<glyph xml:id="varofnewchar1">
	  <!-- more properties here -->
	  <mapping type="standard"><g ref="#newchar1"/></mapping>
	</glyph>
  </charDecl>
</egXML>
The <gi>char</gi> defines a new character, while the
<gi>glyph</gi> element then defines a variant glyph of this newly
defined character. Additional properties should be specified in
order to make these both identifiable.</p>
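<p>As a hedged illustration of what such additional properties might look
like, the following sketch fills in a skeleton of the same shape with
invented identifiers, local names, and image files (all hypothetical), and
shows both forms referenced from the text:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WD-egXML-add03" source="#NONE">
  <charDecl>
	<char xml:id="newchar2">
	  <localProp name="Name" value="LOCAL IDEOGRAPH EXAMPLE TWO"/>
	  <figure><graphic url="newchar2.png"/></figure>
	</char>
	<glyph xml:id="varofnewchar2">
	  <localProp name="Name" value="LOCAL IDEOGRAPH EXAMPLE TWO, CURSIVE VARIANT"/>
	  <figure><graphic url="varofnewchar2.png"/></figure>
	  <mapping type="standard"><g ref="#newchar2"/></mapping>
	</glyph>
  </charDecl>
  <!-- in the text -->
  <p> ... <g ref="#newchar2"/> ... <g ref="#varofnewchar2"/> ... </p>
</egXML></p>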
  </div>

   <div type="div2" xml:id="D25-40">
<head>Adding New Characters</head>
<p>The creation of additional characters for use in text encoding
is quite similar to the annotation of existing characters. The
same element <gi>g</gi> is used to provide a link from the
character instance in the text to a character definition provided
within the <gi>charDecl</gi> element. This character definition
takes the form of a <gi>char</gi> element.  The element <gi>g</gi>
itself will usually be empty, but could contain a code point from
the Private Use Area (PUA) of the Unicode Standard, which is an
area set aside for the very purpose of privately adding new
characters to a document.  Recommendations on how to use such PUA
characters are given in the following section.</p>
<p>In some circumstances, it may be desirable to provide a single
precomposed form of a character that is encoded in Unicode only as a
sequence of code points. For example, in Medieval
Nordic material, a character looking like a lowercase letter Y with a
dot and an acute-accent above it may be encountered so frequently that
the encoder wishes to treat it as a single precomposed character with
one single coded value. In the
transcription concerned, the encoder enters this letter as
<code><![CDATA[&ydotacute;]]></code>, which can be expanded in one of
three ways when the transcription is processed,
depending on the mapping in force. The entity reference might be
translated into the sequence of corresponding Unicode code points,
or into some locally-defined PUA character
(say <code><![CDATA[&#xE0A4;]]></code>) for local
processing only. Both these options have disadvantages: the former
loses the fact that the sequence of composed characters is regarded as
a single object; the latter is not reliably portable.
Therefore, the recommended
representation is to use the <gi>g</gi> element defined by
the module defined in this chapter: <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="D25-40-egXML-xc" xml:lang="und" source="#NONE"><g ref="#ydotacute"/></egXML>. This makes it possible for the encoder to
provide useful documentation for the particular character or glyph so referenced:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="D25-40-egXML-fh" source="#NONE"><char xml:id="ydotacute">
   <localProp name="Name" value="LATIN SMALL LETTER Y WITH DOT ABOVE AND ACUTE"/>
   <localProp name="entity" value="ydotacute"/>
 <mapping type="composed">&amp;#x0079;&amp;#x0307;&amp;#x0301;</mapping>
 <mapping type="PUA">U+E0A4</mapping> </char></egXML> This
 definition specifies the mapping between this composed character
 and the individual Unicode-defined code points which make it
 up, together with the PUA code point used for it locally. In
 addition to a locally-defined name, it supplies a further
 locally-defined property
 (<soCalled>entity</soCalled>) for the character concerned, the
 purpose of which is to supply a recommended character entity name
 for the character.
</p>
<p>The composition of ideographic characters typically requires more complex rules than the <code><![CDATA[&ydotacute;]]></code> example above. For these cases Unicode provides dedicated symbols to capture the composition in Ideographic Description Sequences (IDS). Encoders are strongly encouraged to provide an IDS for each variant ideograph in the header component of the gaiji module, to facilitate greater human and machine readability of rare or unencoded characters, as in the following example:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="D25-40-egXML-db" source="#NONE">
<glyph xml:id="U507D-var">
  <!-- more properties here -->
  <mapping type="IDS">⿻人為</mapping>
  <mapping type="standard">偽</mapping>
</glyph>
</egXML>
The composition rules and further examples appear in <ref target="https://www.unicode.org/versions/Unicode11.0.0/ch18.pdf#G28626">Chapter 18.2: Ideographic Description Characters</ref> of the Unicode Standard. Editors should be aware that different sequences can accurately describe the same character. In the example, the character "人" (U+4EBA) could have been substituted with "亻" (U+4EBB). Local preferences about how sequences are constructed should be documented in the <gi>encodingDesc</gi> of the corresponding TEI header (see <ptr target="#HD5"/>). Additionally, a number of online services, such as <ref target="https://chise.org">CHISE</ref>, offer querying and retrieving characters via IDS, which facilitates a greater degree of stability across different applications.</p>
<p>Under certain circumstances, Chinese Han characters can be written
within a circle. Rather than considering this as simply an aspect of the rendering, an encoder may wish to treat such circled characters as entirely distinct derived characters. For a given character
(say that represented by the numeric-character reference <code><![CDATA[&#x4EBA;]]></code>)
the circled variant might conveniently be represented as
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="D25-40-egXML-tw" xml:lang="und" source="#NONE"> <g ref="#U4EBA-circled"/></egXML>, which references a
definition such as the following:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="D25-40-egXML-xo" source="#NONE"><char xml:id="U4EBA-circled">
  <unicodeProp name="Decomposition_Mapping" value="circle"/>
  <localProp name="Name" value="CIRCLED IDEOGRAPH 36"/>
  <localProp name="daikanwa" value="36"/>
  <mapping type="standard">
   &amp;#x4EBA;
  </mapping>
  <mapping type="PUA">
   &amp;#xE000;
  </mapping>
 </char></egXML></p>
<p>In this example, the <soCalled>circled ideograph</soCalled>
character has been defined with two mappings, and with three
properties. The properties are the Unicode-defined
character-decomposition which specifies that this is a circled
character, using the appropriate terminology (see <ptr target="#ucsprops"/> above), a locally defined name, and a locally defined property known as
<soCalled>daikanwa</soCalled>. The two mappings indicate firstly that the standard form of this character is the character <code><![CDATA[&#x4EBA;]]></code>, and secondly that the character used to represent this character locally is the PUA character <code><![CDATA[&#xE000;]]></code>. For convenience of local processing this PUA character may in fact appear as content of the <gi>g</gi> element. In general, however, the <gi>g</gi> element
will be empty.</p>
   </div>
   <div type="div2" xml:id="D25-50">
<head>How to Use Code Points from the Private Use Area</head>
<p>The developers of the Unicode Standard have set aside an
area of the codespace for the private use of software vendors,
user groups, or individuals.  As of this writing (Unicode 12.1),
there are around 137,000 code points available in this area, which
should be enough for most needs. No code point assignments will be made
to this area by standards bodies and only some very basic default
properties have been assigned (which may be overridden where
necessary by the mechanism outlined in this chapter). Therefore,
unlike all other code points defined by the Unicode Standard, PUA code points should
<emph>not</emph> be used directly in documents intended for blind interchange.
<!--Instead of using PUA code points directly in the document content,
entity references should be used.  This will make it easier for
receiving parties to find out what PUA characters are used in a
document and where possible code point clashes with local use on
the receiving site might occur.--></p>
<p>In the two previous examples, we mentioned that the variant
characters concerned might well be assigned specific code points from
the PUA. This might, for example, facilitate the use of a particular
font which displays the desired character at this code point in the
local processing environment.  Since however this assignment would be
valid only on the local site, documents containing such code points are
unsuitable for blind interchange.  During the process of preparing
such documents for interchange, any PUA code points should be replaced by an appropriate use of the <gi>g</gi> element,  such as <tag>g ref="#xxxx"</tag>, thus associating the character required
with the documentation of it provided by the referenced  <gi>char</gi> element.  The PUA character
used during the preparation of the document might be recorded in the
<gi>char</gi> element, as shown in the example in <ptr target="#D25-40"/>, or retained as content of the <gi>g</gi> element. However, since there is no requirement that the same PUA
character be used to represent it at the receiving site, and since it
may well be the case that this other site has already made an
assignment of some other character to the original PUA code point, it is best practice to remove the locally-defined PUA character. It is to be expected that a further translation into the
local processing environment at the receiving site will be necessary
to handle such characters, during which variant letters can be
converted to hitherto unused code points on the basis of the
information provided in the <gi>char</gi> element.</p>
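<p>The difference between a local working copy and a copy prepared for
interchange might be sketched as follows. The sketch reuses the
<code>ydotacute</code> definition and the PUA code point
<code>U+E0A4</code> from <ptr target="#D25-40"/>; the surrounding text is
a placeholder:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WD-egXML-add04" xml:lang="und" source="#NONE">
<!-- local working copy: the PUA character is retained as content of g -->
<p> ... <g ref="#ydotacute">&amp;#xE0A4;</g> ... </p>
<!-- interchange copy: the locally-defined PUA character has been removed -->
<p> ... <g ref="#ydotacute"/> ... </p>
</egXML></p>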
<p>This mechanism is rather weak in cases where DOM trees or
parsed XML fragments are exchanged, which may increasingly be the
case.  The best an application can do here is to treat any
occurrence of a PUA character only in the context of the local
document and use the properties provided through the <gi>char</gi>
element as a handle to the character in other contexts.  </p>
<p>In the fullness of time, a character may become standardized, and
thus assigned a specific code point outside the PUA. Documents which
have been encoded using the mechanism must at the least ensure that
this changed code point is recorded within the relevant <gi>char</gi>
element; it will however normally be simpler to remove the
<gi>char</gi> element and replace all occurrences of <gi>g</gi>
elements which reference it by occurrences of the newly coded
character. </p>
   </div>


<div type="div2" xml:id="WDWM">
   <head>Writing Modes</head>

<p>The scripts used for writing human languages vary not only in the
glyphs they use, but also in the way (or ways) that those glyphs are
arranged on the writing surface. For the majority of modern languages,
writing is arranged as a series of lines which are to be read from top
to bottom. Within each line, individual characters are frequently
presented from left to right (English, Russian, Greek), but there are
also several widely-used scripts which run right-to-left (Arabic,
Hebrew). Writing in which the lines of glyphs are presented vertically
and read from right to left is also often encountered, notably in
East Asian scripts (Sinitic characters, Japanese Kana, Korean
Hangul, Vietnamese chữ nôm). In many cases, a language normally uses
the same <term>writing mode</term> (we use this term to
refer to the orientation of individual glyphs within a line and the
order in which glyphs and lines should be read), but there are exceptions in which
the same language may appear in different modes, for example either
vertically or horizontally. Many East Asian scripts were traditionally
written from top to bottom within the line, with their lines sequenced
from right to left. Although modern Japanese, Chinese, and Korean are
often written horizontally, the traditional vertical writing mode is
still widely used. There are also comparatively rare cases of ancient
scripts written with lines running left to right, each line being read
top to bottom (Ancient Uighur, classical Mongolian and Manchu), or
scripts such as Ogham where the writing direction may start from the
bottom left and run around the edge of an inscribed object.</p>

<p>When different languages are combined, it is possible that
different writing modes will be needed: for example, in Hebrew text,
running right to left, sequences of Latin digits still run left to
right. When different writing modes are available for the same
language, it may be that different glyphs will be preferred when the
script is used in different modes. For example, when Japanese is
written horizontally, the Unicode character U+3001, the
<soCalled>ideographic comma</soCalled>, is used in preference to
Unicode character U+FE11, the vertical mode comma. This ensures that
the comma appears in the correct position relative to the surrounding
glyphs. Even for scripts which are usually written in exactly the same
way, different writing modes may be encountered in particular
contexts; for example when a language using Roman script is embedded
within vertically-organized Chinese text, it may sometimes be
displayed vertically and sometimes horizontally. The writing mode may
also vary in response to layout constraints such as those imposed by a
complex table, where column or row labels may be written vertically or
diagonally to make the most effective use of available space, just as
it may vary in response to the size and shape of the carrier in the
case of a monumental inscription. </p>

<p>For many, perhaps most, TEI documents there may be no need to
encode the writing mode explicitly, even in so-called "mixed mode"
texts containing passages written in languages which use different
writing modes. Modern printed texts in most European languages, for
instance, may be expected to use left-to-right/top-to-bottom
directionality; while Arabic or Hebrew texts are expected to run
right-to-left/top-to-bottom. In a TEI document, language and script
are explicitly stated in the markup using the attribute
<att>xml:lang</att>; this indication will usually imply a particular
default writing mode. Even where this attribute is not used, passages
in different scripts will use different Unicode characters, and will
thus imply a particular default writing mode. </p>

<p>Consider the case of an English text containing a few Arabic words:
<eg><![CDATA[ The Arabic term قلم رصاص means "pencil".]]></eg>
A correct TEI encoding might read as follows:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WDWM-egXML-ng" source="#NONE">
  <s xml:lang="en">The Arabic term
  <term xml:lang="ar">قلم رصاص</term> means "pencil".</s>
</egXML>
We might assume that it is the presence of the <att>xml:lang</att>
attribute with value <val>ar</val> that causes processing software to
display the Arabic from right to left, but in fact, this is not the
case. The order in which the Arabic characters appear when rendered
would be the same, even if the markup were not present:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WDWM-egXML-yb" source="#NONE">
  <s>The Arabic term قلم رصاص means "pencil".</s>
</egXML>
This is because Arabic glyphs are always displayed right to left,
even when they appear within a left-to-right English sentence. Like
most other codepoints in the Unicode standard, they have a specific
directionality setting which helps any rendering software determine
how they should be ordered. The Latin glyph "a" has a strong
left-to-right bidirectionality setting; the
Hebrew א (alef) is strongly right-to-left. Some glyphs
(the digits 0 to 9 and common punctuation marks such as the period or
comma, for example) have weak or neutral settings, so that their
ordering depends on the surrounding context.</p>

<p>The Unicode Bidirectional Algorithm (<ref target="#WDBIDI">Unicode
Consortium, 2017</ref>)
 defines a number of
rules enabling software to render sequences of characters which have
differing directionality properties in a predictable and reliable way,
using only those properties. <note place="foot"> Because this
algorithm may not always give the desired result, Unicode also
provides a set of "directional formatting characters" (<ptr target="https://www.unicode.org/reports/tr9/#Directional_Formatting_Characters"/>). These
additional codepoints can be used to signal to rendering software that
a specific directionality setting should be turned on or off. However,
in the case of documents encoded in XML, there is generally no need to use such
characters, and the W3C advises against it unless markup is unavailable (<ptr target="https://www.w3.org/International/questions/qa-bidi-controls"/>).</note> It
should be remembered however that individual sequences of characters
are always stored in a file in the order in which they should be read,
irrespective of the order in which the characters making up a sequence
should be displayed or rendered. For example, in an RTL language such
as Hebrew, the first character in a file will be that which is
displayed at the rightmost end of the first line of text.</p>

<p>An encoder wishing to document or to control the order in which
sequences of characters in a TEI document are displayed will usually
do so by segmenting the text into sequences presented in the desired
order and specifying an appropriate language code for each. In
situations where this approach may result in ambiguity or lack of
precision, or if the encoder wishes to record directional information
explicitly in their encoding, we recommend using the global <att>style</att>
attribute to supply detail about the writing mode applicable to the
content of any element. The <att>style</att> attribute (discussed in
<ptr target="#STGAre"/>) permits use of any formatting language; for
these purposes however, we recommend use of CSS, which  includes a
Writing Modes module <note place="foot"> At the time of writing, this
W3C module has the status of a candidate recommendation: see further
<ptr target="#CSSWM"/> <!--
http://www.w3.org/TR/css-writing-modes-3/ -->
</note> which permits direct specification of a number of useful properties
associated with writing modes, notably <code>direction</code> (<code>ltr</code>
or <code>rtl</code>);  <code>writing-mode</code>
(<code>horizontal-tb</code>, <code>vertical-rl</code>, or <code>vertical-lr</code>);
and <code>text-orientation</code> (<code>mixed</code>, <code>upright</code>,
<code>sideways</code> ...)  <!-- | use-glyph-orientation<anchor xml:id="id_cite_ref-2"/>
  <ref target="#cite_note-2">[3]<note>The value "use-glyph-orientation" may be dropped from the CSS Writing Modes specification.</note></ref> -->
<!-- suppressing this because we dont discuss it and its likely to be dropped (LB) -->
as well as properties affecting the behaviour of the unicode-bidi (bidirectional) algorithm.
We discuss and exemplify how these properties may be used below.</p>

<p>The global TEI <att>style</att> attribute applies to the element on
which it is specified (and in most cases, its descendants). Rather
than specify it on every element, it will often be more efficient to
express sets of commonly-used styling rules as <gi>rendition</gi>
elements in the <gi>teiHeader</gi> and then point to them using the
global <att>rendition</att> attribute, as further discussed in <ptr target="#HD57-1"/>. Although the CSS specifications are mainly used to
provide instructions for software when rendering a digital text, they
also provide a useful means of describing the visual properties of a
pre-existing document in a formal and standardized way. </p>
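<p>A minimal sketch of this approach follows. The identifiers
<val>wm-vertical-rl</val> and <val>wm-rtl</val> are invented for the
example, and the content of the two paragraphs is elided:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WD-egXML-add05" source="#NONE">
<!-- in the teiHeader -->
<tagsDecl>
  <rendition xml:id="wm-vertical-rl" scheme="css">writing-mode: vertical-rl;</rendition>
  <rendition xml:id="wm-rtl" scheme="css">direction: rtl;</rendition>
</tagsDecl>
<!-- in the text -->
<p xml:lang="ja" rendition="#wm-vertical-rl"> ... </p>
<p xml:lang="he" rendition="#wm-rtl"> ... </p>
</egXML></p>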

<p>The next section presents some examples of how CSS can be used to
describe a variety of writing modes. A full description of the appearance
of a document will probably include many other properties of course. </p>
</div>

<div type="div2" xml:id="WDWMEG">
   <head>Examples of Different Writing Modes</head>
<p>The CSS recommendations provide several properties which can be used to encode aspects of the <soCalled>writing mode</soCalled>. The most useful of these is the property <code>writing-mode</code>, which may be used to specify a reading order for both characters within a single line and lines within a single block of text. The property <code>text-orientation</code> may also be used to indicate the orientation of individual characters with respect to the line, and the property <code>direction</code> to determine the reading order of characters within a line only. We give some examples of each below. </p>
   <div type="div3" xml:id="WDWMEG1">
  <head> Vertical Writing Modes</head>
   <p>The <code>writing-mode</code> property is particularly useful for languages
   which can be written in different writing modes, such as Chinese
   and Japanese. Its possible values include <code>horizontal-tb</code>,
   <code>vertical-rl</code> and <code>vertical-lr</code>. Each value has
   two components: <soCalled>horizontal</soCalled> or <soCalled>vertical</soCalled> specifies the inline
   writing direction, while the second component specifies the
   direction in which lines in a block, and blocks in a sequence are
   arranged: from top to bottom (as in most European languages, in
   which lines and paragraphs are arranged from top to bottom on a
   page), from right to left (as in the case of Japanese written vertically), or
   left-to-right (as in the case of Mongolian). </p>
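   <p>By way of illustration only, the three values might be attached to
   block-level elements as in the following sketch, in which the element
   content is elided and the language tags are assumptions chosen to match
   the cases just described:
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WD-egXML-add06" source="#NONE">
<!-- lines run top to bottom; text within the line is horizontal -->
<p xml:lang="en" style="writing-mode: horizontal-tb"> ... </p>
<!-- lines run right to left; text within the line is vertical -->
<p xml:lang="ja" style="writing-mode: vertical-rl"> ... </p>
<!-- lines run left to right; text within the line is vertical -->
<p xml:lang="mn-Mong" style="writing-mode: vertical-lr"> ... </p>
</egXML></p>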
   <p>The following example shows three versions of the same poem: first in
 Japanese, written top to bottom; next in <term>romaji</term> (Japanese in
 Latin script); and finally in an English translation. </p>
   <p>
 <figure xml:id="WDWMEG1-figure-pt">
<graphic width="250px" url="Images/basho_furu_ike_ya.png"/>
			<head>Taken from p.42 of <title>Haiku: Japanese Art and Poetry</title>. Judith Patt, Michiko Warkentyne (calligraphy) and Barry Till. 2010. </head>
   </figure>
  </p>
   <p>We might encode this as follows: </p>
 <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WDWMEG1-egXML-jv" source="#WD-BASHO"><div>
 <lg xml:lang="ja" style="writing-mode: vertical-rl">
 <l>古池や</l>
 <l>蛙飛び込む</l>
 <l>水の音</l>
 </lg>
 <lg xml:lang="ja-Latn" style="writing-mode: horizontal-tb">
 <l>furu ike ya</l>
 <l>kawazu tobikomu</l>
 <l>mizu no oto</l>
 </lg>
 <lg xml:lang="en">
 <l>Old pond,</l>
 <l>and a frog dives in—</l>
 <l>"Splash"!</l>
 </lg>
</div></egXML>
   <p>For the sake of simplicity, we have not attempted to capture in
   this encoding such aspects as the indenting of lines in the first
   Japanese version, or the central alignment of the other two
   versions, nor any other renditional features such as font weight or
   size etc. The Japanese transcription has <code><![CDATA[writing-mode:
   vertical-rl]]></code>, which is required because Japanese may be
   written either in this mode or horizontally. The transcription in
   romaji uses the attribute <att>xml:lang</att> to supply a value of
   <val>ja-Latn</val>, indicating Japanese written in Latin
   script. Its <att>style</att> attribute specifies a horizontal
   writing mode; this may seem superfluous, but vertically-written
   romaji is not unknown.</p>
   </div>
   <div type="div3" xml:id="WDWMEG2">
  <head>Vertical Text with Embedded Horizontal Text</head>

   <p>When Japanese is written vertically, the glyph orientation
   remains the same as when it is written horizontally. In other
   words, glyphs are not rotated (although as noted above some
   different glyphs may be used for some characters, in particular for
   punctuation which needs to be positioned differently in vertical
   and in horizontal text). However, it is very common for languages
   written vertically to have embedded runs of text from languages
   which are normally written horizontally. This raises the issue of
   the orientation of the glyphs from the horizontal language. Are
   they written upright, as they would normally appear in horizontal
   text runs, or are they rotated? Consider this fragment from a
   Japanese article about the Indonesian language, which takes the
   form of a glossary list: </p>
   <p>
 <figure xml:id="WDWMEG2-figure-mw">
<graphic width="500px" height="624px" url="Images/ja_vertical_indonesian_frag.jpg"/>
<head>Detail from p.62 of <title xml:lang="ja">"インドネシア語". 崎山理. 1985. 外国語との対照  II. 講座日本語学 11.</title></head>
 </figure>
  </p>

   <p>The text-orientation property allows us to indicate whether or
 not glyphs are rotated. In the following example, we have indicated
 that the list uses a <code>vertical-rl</code> writing mode, but that the orientation
 of individual glyphs may vary: </p>

<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WDWMEG2-egXML-eu" source="#WD-VERT-IND"><list type="gloss" xml:lang="ja" style="writing-mode: vertical-rl; text-orientation: mixed">
 <label xml:lang="id">hampir</label>
 <item>「近い、ほとんど」</item>
 <label xml:lang="id">baru</label>
 <item>「新しい、ばかい」</item>
 <!-- ... -->
</list></egXML>
   <p>The rule <code>text-orientation: mixed</code> specifies that
   <quote>characters from horizontal-only scripts are set sideways,
   i.e. 90° clockwise from their standard orientation in horizontal
   text. Characters from vertical scripts are set with their intrinsic
   orientation</quote> (<ref target="https://www.w3.org/TR/css-writing-modes-3/#text-orientation">fantasai
   2014</ref>). Since the default value for
   <code>text-orientation</code> is <code>mixed</code>, this rule is
   not strictly required. However, if the Indonesian glyphs (which are
   roman characters) had been set upright, like this:</p>
   <p>
 <figure xml:id="WDWMEG2-figure-yl">
   <graphic width="150px" url="Images/ja_vertical_indonesian_frag_rotated.jpg"/>
   <head>Fragment of previous image with Indonesian glyphs upright.</head>
 </figure>
   </p>
   <p>then an encoding like the following could be used to make this explicit: </p>
   <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WDWMEG2-egXML-we" source="#NONE">
 <list type="gloss" xml:lang="ja" style="writing-mode: vertical-rl; text-orientation: upright">
   <label xml:lang="id">hampir</label>
   <item>「近い、ほとんど」</item>
   <label xml:lang="id">baru</label>
   <item>「新しい、ばかい」</item>
   <!-- ... -->
 </list>
   </egXML>
   <p>The rule <code>text-orientation: upright</code> specifies that
   <quote>characters from horizontal-only scripts are rendered
   upright, i.e. in their standard horizontal orientation. Characters
   from vertical scripts are set with their intrinsic orientation and
   shaped normally</quote> (<ref target="https://www.w3.org/TR/css-writing-modes-3/#text-orientation">fantasai
   2014</ref>).</p>
   </div>
   <div type="div3" xml:id="WDWMEG3">
 <head>Vertical Orientation in Horizontal Scripts</head>
 <p>It is not unusual to see text in normally horizontal scripts
 written vertically even where no vertically-written script is
 involved. This example is a fragment from a table of information
 about agricultural development on Vancouver Island, written in
 1855: </p>
 <p>
   <figure xml:id="WDWMEG3-figure-kj">
	 <graphic width="450px" url="Images/bcgenesis_co_305_06_00131v_table_extract.jpg"/>
	 <head>Enclosure with <title>Despatch to London</title> 10048, CO
	 305/6, p. 131v from <ptr target="https://bcgenesis.uvic.ca/V55116.html"/></head>
   </figure>
 </p>
 <p>Four of the subheading cells in this fragment contain English text written vertically,
 bottom-to-top, to conserve space on the page. To describe this sort of phenomenon,
 we can use the <code>text-orientation</code> property again: </p>

   <p><code>text-orientation: mixed | upright | sideways-right | sideways-left | sideways | use-glyph-orientation</code></p>

   <p>For full details on this property, we refer the reader to the CSS Writing Modes specification.
 For the present example, we will make use only of the <soCalled>sideways-left</soCalled> value,
 which <quote>causes text to be set as if in a horizontal layout, but rotated 90° counter-clockwise.</quote>
 We might encode the third of the four cells containing vertical text like this: </p>
 <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WDWMEG3-egXML-lr" source="#NONE">
   <cell style="writing-mode: vertical-lr; text-orientation: sideways-left">
	 <lb/>Cash Value
	 <lb/>of
	 <lb/>Farms
   </cell>
 </egXML>
   <p>The <code>writing-mode</code> property captures the fact that the text is written vertically, and
 that its lines are to be read from left to right (so the line containing <quote>of</quote>
 is to the right of that containing <quote>Cash Value</quote>), while the <code>text-orientation</code>
 value encodes the orientation of the glyphs (rotated 90° counter-clockwise). We might also add
 <code>text-align: center</code> to the style, to express the fact that the text is centred.</p>
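   <p>As a minimal sketch of that last suggestion (an illustration only, in which the
 centred alignment is an assumed reading of the image rather than a checked
 transcription), the same cell with the alignment recorded might look like this: </p>
 <egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE">
   <!-- text-align: center is an assumed reading of the layout in the image -->
   <cell style="writing-mode: vertical-lr; text-orientation: sideways-left; text-align: center">
     <lb/>Cash Value
     <lb/>of
     <lb/>Farms
   </cell>
 </egXML>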
   </div>
   <div type="div3" xml:id="WDWMEG4">
  <head>Bottom-to-top Writing</head>
   <p>Of the rather small number of scripts which appear to be written
   bottom-to-top, perhaps the best-known is Ogham, an alphabet used
   mainly to write Archaic Irish. Ogham is typically found inscribed
   along the edge of a standing stone, starting at its base. The CSS Writing
   Modes specification does not explicitly distinguish between
   vertical scripts which are written  top-to-bottom and those which
   are written bottom-to-top. Instead, such bottom-to-top scripts are best treated
   as left-to-right horizontal scripts, oriented vertically because of
   the constraints of the medium on which they are inscribed. Such
   scripts are analogous to the vertical English text-runs in the
   table cells in the example above, and can be handled in exactly the
   same manner (<code><![CDATA[writing-mode: vertical-lr; text-orientation:
   sideways-left]]></code>). In cases where writing follows a curved path
   (such as Ogham running around the edge of a stone), a meticulous
   encoder might resort to the use of SVG to describe the path, rather
   than treating the phenomenon as a writing mode.</p>
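   <p>A minimal sketch of such an encoding follows. The Ogham string
   is an invented illustration rather than a transcription of any
   particular stone, and the language tag <code>pgl</code> (Primitive
   Irish) is an assumption which an encoder may well wish to replace
   with a more precise value: </p>
   <egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE">
 <!-- hypothetical inscription text, read from the base of the stone upwards -->
 <ab xml:lang="pgl" style="writing-mode: vertical-lr; text-orientation: sideways-left">ᚋᚐᚊᚔ</ab>
   </egXML>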
   </div>
   <div type="div3" xml:id="WDWMEG5">
  <head>Mixed Horizontal Directionality</head>
  <!-- [Question MDH to LB: Why is this bit detached from the original horizontal text section above? Because he section above isn't specifically about horizontal texts only, though it uses one as an initial example] </p-->
   <p>Returning to our previous simple example </p>
  <eg><![CDATA[ The Arabic term قلم رصاص means "pencil".]]></eg>
   <p>we could use the <code>direction</code> property to make directionality explicit:</p>
   <p><code>direction: ltr | rtl</code></p>
   <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WDWMEG5-egXML-fl" source="#NONE">
 <s xml:lang="en" style="direction: ltr">The Arabic term
 <term xml:lang="ar" style="direction: rtl; unicode-bidi: embed">قلم رصاص</term> means "pencil".</s>
   </egXML>
   <p>The use of the <code>direction</code> property to record the observed directionality
 of the text is unambiguous, even though it is (as we noted above) superfluous.
 The use of the <code>unicode-bidi</code> property here may require some explanation.
 By default this property has the value <soCalled>normal</soCalled>, the effect of which in this
 context would be to ignore any value supplied for the <code>direction</code> property. The CSS Writing
 Modes specification stipulates that the <code>direction</code> property <quote>has no effect on bidi
   reordering when specified on inline boxes whose <code>unicode-bidi</code> property’s
   value is <soCalled>normal</soCalled>, because the element does not open an additional
   level of embedding with respect to the bidirectional algorithm.</quote>
  </p>

   <p>Mixed horizontal directionality is very common in texts written in languages such as Arabic
   and Hebrew, particularly when numbers (which are always given LTR)
   or phrases from LTR languages are embedded. It is not
   impossible, though quite unusual, for ambiguities
to arise in such situations, with the result that
parts of a document are displayed in unexpected ways that do
not correspond to the natural reading order. A more detailed
   discussion of this issue from an HTML perspective is provided by a
   W3C Internationalization Working Group report <ref target="https://www.w3.org/International/articles/inline-bidi-markup/#where">Inline
   markup and bidirectional text in HTML</ref>. </p>
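   <p>To illustrate the kind of ambiguity involved, consider an
   Arabic book title which ends with a Latin-script name and is
   embedded in an English sentence. The following is a sketch only:
   the sentence is constructed for illustration, and the use of
   <code>unicode-bidi: isolate</code> (a value defined in the CSS
   Writing Modes specification) is just one possible way of recording
   that the whole title forms a single right-to-left run: </p>
   <egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE">
 <!-- illustrative sentence; the title ends with the LTR string "C++" -->
 <s xml:lang="en" style="direction: ltr">The title is
 <title xml:lang="ar" style="direction: rtl; unicode-bidi: isolate">مدخل إلى C++</title> in Arabic.</s>
   </egXML>
   <p>Without some such indication, the Unicode Bidi Algorithm would
   treat <q>C++</q> as part of the surrounding left-to-right text, so
   that it would be displayed to the right of the Arabic words rather
   than to their left, where the natural reading order of the title
   requires it. </p>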


  <!--p>[Would it be helpful to have another example presenting ambiguity arising out of the use of a g element at the end of a text run?] [how might a <g> element introduce ambiguity? only if the glyph or character concerned is vague about its directionality surely] [(MDH) A <g> element would normally be used for a glyph which has no Unicode representation; therefore it has no directionality per the Unicode character database; therefore its effect would be potentially disruptive. Imagine a case where a rtl text run ends with a weak-directionality character such as a period, followed by a <g> for a glyph which the encoder knows should represent an rtl character, but which isn't in Unicode, followed by a strongly ltr character.] [If the encoder knows that the glyph or character concerned has a strongly ltr character then they should use the <charProp> element to document this fact within the <glyph> or <char> definition, as per <ptr target="http://www.tei-c.org/release/doc/tei-p5-doc/en/html/WD.html#ucsprops"/>. If they want a rendering agent to deal with the character properly, they are at liberty to put a strongly ltr character as content for the <g> ]</p-->
<!-- <p>The title is "مدخل إلى C++" in Arabic.</p>-->

  </div>
   <div type="div3">
  <head>Summary</head>
   <p>For most texts, information about text directionality need not be explicitly
 encoded in a TEI text, either because it follows unambiguously from
 <att>xml:lang</att> values, or because it can be expected to be handled
 unequivocally by the Unicode Bidi Algorithm. Where it is considered important
 to encode such information, properties and values taken from the CSS Writing
 Modes module may be used by means of the global TEI <att>style</att> attribute
 (or using the TEI <gi>rendition</gi> element, linked with the <att>rendition</att>
 attribute). Most phenomena can be well described in this way; for those which
 cannot, other approaches based on the CSS Transforms module are presented
 in the next section.</p>
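   <p>As a sketch of the second of these mechanisms, a
 <gi>rendition</gi> element in the TEI header can hold the CSS once,
 and any number of elements can then point to it. The identifier
 <code>vertical-mixed</code> is invented for this illustration: </p>
   <egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE">
 <tagsDecl>
   <!-- hypothetical identifier; scheme="css" states that the content is CSS -->
   <rendition xml:id="vertical-mixed" scheme="css">writing-mode: vertical-rl; text-orientation: mixed</rendition>
 </tagsDecl>
 <!-- ... -->
 <list type="gloss" xml:lang="ja" rendition="#vertical-mixed">
   <label xml:lang="id">hampir</label>
   <item>「近い、ほとんど」</item>
   <!-- ... -->
 </list>
   </egXML>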
   </div>
</div>
<div xml:id="WDWMTT">
   <head>Text Rotation</head>
   <p>In what follows, we examine a range of textual phenomena which
   in some ways appear very similar to those examined above, and even
   overlap with them. We can categorize these as text transformation
   features, and suggest some strategies for encoding them based on
   the properties detailed in the <ref target="#CSSTM">CSS Transforms (Fraser et al 2013)</ref> specification.
 This CSS module provides a complex array of properties, values and
 functions which can be used to rotate, skew, translate and otherwise
 transform textual and graphical objects. We can borrow this vocabulary
 in order to describe textual phenomena in a precise manner.</p>

   <p>We begin with a simple example of a rotational transform: </p>
   <p>
 <figure xml:id="WDWMTT-figure-no">
<graphic url="Images/rotation_on_z_axis.png"/>
 </figure>
  </p>
   <p>Here a block of text has been rotated around its z-axis. This is clearly
 not a <soCalled>writing mode</soCalled>; the writing mode for this text
 is horizontal, left to right. Furthermore, even if we wished to treat this
 as a writing mode, we could not do so, because there is no way to use
 writing modes properties to describe a text orientation which is angled
 at 45 degrees; no human languages are consistently written in this
 orientation. It is more appropriate to treat this as a rotational transformation.
 We can do this using two properties: <code>transform</code> and
 <code>transform-origin</code>. (Both of these properties have quite complex
 value sets, and we will not look at all of them here. See the
 <ref target="#CSSTM">specification</ref> for full details.)</p>

   <p>The <code>transform</code> property takes as its value one or more of the transform functions,
 one of which is the function <code>rotateZ()</code>:</p>

   <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WDWMTT-egXML-kx" source="#NONE"><ab style="transform:rotateZ(-45deg)">TEI-C.ORG</ab></egXML>

   <p>Any rotation takes place around an axis positioned relative
 to the element being rotated; a positive angle describes a clockwise rotation, and a
 negative angle (as in the example above) a counter-clockwise one. The <code>transform-origin</code> property
 can be used to specify the pivot point. By default, the value of <code>transform-origin</code>
 is <soCalled>50% 50%</soCalled>, the point at the centre of the element, but these
 values can be changed to reflect rotation around a different origin point.
 (The TEI <gi>zone</gi> element also bears an attribute <att>rotate</att> which can
 specify rotation in degrees around the z-axis, but it is not available for any other
 element.)</p>
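
   <p>For instance, if the block were judged to pivot about its
 bottom left-hand corner rather than its centre, this might be
 recorded as follows. The origin chosen here is purely illustrative
 and has not been measured from the image above: </p>
   <egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE">
 <!-- 0% 100% places the pivot at the left edge (x) and bottom edge (y) of the block -->
 <ab style="transform: rotateZ(-45deg); transform-origin: 0% 100%">TEI-C.ORG</ab>
   </egXML>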

   <p>A block of text may also be rotated about either of its other axes. For example,
 this shows rotation around the Y (vertical) axis: </p>
   <p>
 <figure xml:id="WDWMTT-figure-mu">
<graphic url="Images/rotation_on_y_axis.png"/>
 </figure>
  </p>
 <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WDWMTT-egXML-qk" source="#NONE"> <ab style="transform:rotateY(45deg)">TEI-C.ORG</ab></egXML>

   <p>These are obviously trivial examples, but similar features do appear in historical texts.
 George Herbert's <title level="m">The Temple</title> includes two stanzas headed
 <title level="a">Easter Wings</title> which are both normally printed in a rotated form
 so that they represent a pair of wings:</p>
   <p>
 <figure xml:id="WDWMTT-figure-ri">
<graphic url="Images/herbert_church_p35_sm.jpg" width="300px"/>
   <head>Page 35 of George Herbert's <title level="m">The Temple</title>
   (1633), from a copy in the Folger Library.</head>
 </figure>
  </p>

   <p>This could be encoded thus: </p>
   <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WDWMTT-egXML-vn" source="#NONE">
				   <lg style="transform:rotateZ(90deg)">
 <l>My tender age in ſorrow did beginne:</l>
 <l>And ſtill with ſickneſſes and ſhame</l>
 <!-- ... -->
 </lg></egXML>
   <p>Alternatively, we might argue that this is in fact a vertical writing
   mode, and supply <code><![CDATA[writing-mode: vertical-rl;
   text-orientation: sideways-right]]></code> as the value for the
   <att>style</att> attribute in the preceding example.</p>
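   <p>Under that interpretation, the encoding might be sketched as
   follows; the transcription is unchanged, and only the value of
   the <att>style</att> attribute differs: </p>
   <egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE">
 <lg style="writing-mode: vertical-rl; text-orientation: sideways-right">
 <l>My tender age in ſorrow did beginne:</l>
 <l>And ſtill with ſickneſſes and ſhame</l>
 <!-- ... -->
 </lg></egXML>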

   <p>Rotation is also useful as a method of handling a true writing
   mode which is not covered by the CSS Writing Modes:
   <term>boustrophedon</term>. This is a writing mode common in
   inscriptions in Latin, Greek and other languages, in which
   alternate lines run from left to right and from right to left<note place="foot">The name is taken from the Greek βουστροφηδόν, meaning
   <q>ox-turning</q> from βοῦς (an ox) and στροφή (<q>turn</q>); that is,
   turning as an ox does when pulling a plough.</note>. Right-to-left
   lines in boustrophedon have another unexpected feature: their
   glyphs are reversed, so that these lines appear as <soCalled>mirror
   writing</soCalled>, as in the following ancient Greek inscription:
 <figure xml:id="WDWMTT-figure-tw">
<graphic width="592px" height="502px" url="Images/boustrophedon_small_J_NW_Epeiros_13_p03.jpg"/>
<head>Leaden plaque bearing an inquiry by Hermon from the oracular
precinct at Dodona. (L.H. Jeffery Archive)</head>
 </figure>
  </p>
   <p>This might be transcribed as follows (ignoring word boundaries for the moment): </p>
   <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:id="WDWMTT-egXML-se" source="#WD-BOUS"> <ab>
 <lb/>ΗΕΡΜΟΝΤΙΝA
 <lb/><seg style="transform:rotateY(180deg)">ΚΑΘΕΟΝΠΟΤΘΕΜ</seg>
 <lb/>ΕΝΟΣΥΕΝΕΑϜ
 <lb/><seg style="transform:rotateY(180deg)">ΟΙΥΕΝΟΙΤΙΕΚΚ</seg>
 <lb/>ΡΕΤΑΙΑΣΟΝΑ
 <lb/><seg style="transform:rotateY(180deg)">ΣΙΜΟΣΟΤΤΑΙΕ</seg>
 <lb/>ΑΣΣΑΙ
 </ab></egXML>
  <p>The 180-degree rotation around the Y (vertical) axis here
 describes what is happening in the RTL lines in boustrophedon; the order of glyphs
 is reversed, and so is their individual orientation (in fact, we see them
 <soCalled>from the back</soCalled>, as it were). <gi>seg</gi> elements
 have been used here because these are clearly not <soCalled>lines</soCalled>
    in the sense of poetic lines; the text is continuous prose, and the division into separate lines is incidental.</p>

   <p>There are obviously some unsatisfactory aspects of this manner of encoding
     boustrophedon. In the inscription above, some words are split across two lines,
 so if we wished to tag both words and the right-to-left phenomena, one
 hierarchy would have to be privileged over the other. By using a transform
 function rather than a writing mode property, we are apparently suggesting
 that boustrophedon is not in fact a writing mode, whereas it clearly is. But
 the CSS Writing Modes specification does not provide support for boustrophedon,
 because it is a rather obscure historical phenomenon; using a rotational transform
 is one practical alternative. </p>

   </div>
   <div xml:id="WDCAV">
  <head>Caveat</head>

   <p>As with other parts of the CSS specification, the intended
   effect of CSS Transforms properties and values is defined with
   reference to a specific <ref target="https://www.w3.org/TR/CSS2/visuren.html">Visual formatting
   model</ref>; the language is designed to describe how an HTML
   document should be formatted. This is not, of course, the case for
   the TEI, which lacks any explicit processing or formatting model,
   and attempts to define objects as far as possible without
   consideration of their visual appearance. As long as the properties
   and values from the CSS Transforms module are used as a convenient,
   well-specified descriptive language to capture features of a text,
   without any expectation of using them directly and reliably for
   rendering, this is not particularly problematic. CSS provides a
   useful and well-defined vocabulary to describe many aspects of the
   appearance of source texts, benefitting particularly from the
   clarity of definition provided by the specification. However, if
   there is any expectation of using this information to render a text
   in a predictable and accurate way, it will be essential to provide
   enough styling information throughout the document hierarchy to
   resolve all ambiguities with regard to size, positioning, block
   status, etc. before any element undergoes a transform
   operation.</p>
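   <p>By way of illustration only, an encoder who did intend the
   rotated block discussed in the preceding section to be rendered
   might supply something like the following. The dimensions are
   invented for the sake of the example and would need to be
   established from the source: </p>
   <egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NONE">
 <!-- hypothetical values: display and size fix the box before it is rotated about its centre -->
 <ab style="display: block; width: 12em; height: 1.5em; text-align: center; transform-origin: 50% 50%; transform: rotateZ(-45deg)">TEI-C.ORG</ab>
   </egXML>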
</div>


<div type="div2" xml:id="WSD-DEF"><head>Formal Definition</head>
<p>The gaiji module described in this chapter makes available the following
components:
<moduleSpec xml:id="DWD" ident="gaiji">
  <idno type="FPI">Character and Glyph Documentation</idno>
  <desc xml:lang="en" versionDate="2006-09-13">Character and glyph documentation</desc>
  <desc xml:lang="fr" versionDate="2018-07-12">Représentation des caractères et des glyphes non standard</desc>
  <desc xml:lang="zh-TW" versionDate="2018-07-12">文字與字體說明</desc>
  <desc xml:lang="it" versionDate="2018-07-12">Documentazione di caratteri non standard e glifi</desc>
  <desc xml:lang="pt" versionDate="2018-07-12">Documentação dos carateres</desc>
  <desc xml:lang="ja" versionDate="2018-07-12">外字モジュール</desc>
</moduleSpec>

The selection and combination of modules to form a TEI schema is described in
<ptr target="#STIN"/>.
</p>
<specGrp>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/g.xml"/>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/charDecl.xml"/>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/char.xml"/>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/glyph.xml"/>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/localProp.xml"/>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/mapping.xml"/>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/unihanProp.xml"/>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/unicodeProp.xml"/>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/att.gaijiProp.xml"/>
</specGrp>
</div>
</div>