TEI: East Asian/Japanese SIG

Kiyonori Nagasaki and A. Charles Muller
21 June 2016
The Text Encoding Initiative

Manually encoded based on source in Microsoft Word format

The East Asian cultural region has a long and rich literary tradition that extends as far back as the first millennium BCE. While the basic writing system for most of the early and middle periods of East Asian literature was that of the Chinese ideograph (hanzi), the cultures on the periphery of the hanzi cultural sphere also began to develop their own native scripts, which were used both alone and in mixed form with Chinese ideographs. For example, the Japanese syllabic writing system called kana began to develop soon after the turn of the first millennium CE, and the Korean hangeul system came into widespread use in the fifteenth century. Japanese kana includes a few variants, mainly hiragana (round cursive script), katakana (square cursive script), and hentaigana (nonstandard script). The third, which was popular in premodern Japanese printing, is now nearly obsolete, but it was recently proposed for inclusion in ISO/IEC 10646 for use in government documents and research. For historical reasons, Japanese documents embrace several writing systems: not only the above three kana systems but also parts of the Chinese writing systems and hybrid writings partially derived from the Korean Peninsula before the spread of the hangeul script.

Japanese researchers have recently begun encoding Japanese and East Asian texts according to the TEI Guidelines. However, although a large number of Japanese electronic texts are already available on the Web, including over 10,000 public domain texts distributed by Aozora-Bunko (similar to Project Gutenberg) and Buddhist texts comprising 100 million characters, Japanese textual researchers have often faced difficulties at various levels of TEI-XML encoding, such as the lack of appropriate elements and attributes, differences in text models, and the management and sharing of good encoding practices.

To begin to deal with these kinds of problems, we have formed the Special Interest Group for East Asian/Japanese. We expect the SIG to serve as a window between the Consortium and Japanese practitioners and so improve the situation. In the first phase, we will therefore work toward a general guideline for encoding Japanese text within the TEI P5 Guidelines and identify the textual models and elements that the Guidelines currently lack.

Furthermore, one of our primary goals is to expand the SIG to cover East Asian texts more broadly, including Chinese, Korean, Taiwanese, and Vietnamese practitioners.

The SIG also maintains a number of wiki pages.