<?xml version="1.0" encoding="UTF-8"?>
<?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/teilite.rnc" type="compact"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" rend="home">

    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title>TEI: Migrate to the P5 Guidelines</title>
            </titleStmt>
            <editionStmt>
                <edition> </edition>
            </editionStmt>
            <publicationStmt>
                <authority>The Text Encoding Initiative</authority>
            </publicationStmt>
            <sourceDesc>
                <p>No source</p>
            </sourceDesc>
        </fileDesc>
        <profileDesc> </profileDesc>
        <revisionDesc>
            <change when="2008-04-24" who="CJR">Updated link to wiki</change>
            <change when="2007-04-22" who="CJR">Converted to P5</change>
        </revisionDesc>
    </teiHeader>
    <text>
        <body>
            <p>Because the TEI is constantly developing to support more advanced encoding and more
                complex data, at intervals TEI projects may need to migrate their data and systems
                forward into a new version of the TEI Guidelines. The last migration process
                accompanied the release of P4 in 2002, when the TEI changed its underlying
                representation from SGML to XML. With the release of P5 in November 2007, the Guidelines
                changed again. Some of the most significant changes are architectural: in P5 the
                Guidelines themselves are stored and written using a different technology and the
                TEI schemas are expressed not only as DTDs but also in the RELAX NG schema language.
                Some of the changes affect the vocabulary and constraints of the TEI encoding
                language: adding new elements, improving content models, and in some cases adding
                entire chapters covering new material. More detail on the changes is available at
                the <ref target="index.xml">P5 page</ref>. </p>
            <p>The information below is intended to answer some basic questions about migrating from
                P4 to P5. For more detailed information, please post questions to the TEI-L
                discussion list, or search the list archives.</p>

            <div>
                <head>When should you migrate to P5?</head>

                <p>If your current P4 system is working and you are happy with it, there is no rush
                    to migrate. The TEI Consortium currently plans to support P4 for another five years,
                    until November 2012. Conversely, as soon as some change in your environment (for
                    instance, a new version of your XML editor or XSLT processor) causes your P4
                    system to break, or if you are planning improvements or changes to your current
                    encoding environment, it is a good idea to consider migrating to P5 as an
                    alternative to fixing or changing the P4 system.</p>

                <p>While P4 will have formal support for five more years, it is likely that a
                    significant majority of TEI users will migrate long before then. Thus if you are
                    looking for answers to questions, or to use someone else's stylesheets, or to
                    hire programming assistance, it will probably be easier to find other users who
                    are familiar with P5 than with P4.</p>
            </div>

            <div>
                <head>What's involved in migration?</head>
                <p>For a full-scale encoding operation there are several steps involved in
                    migration. Some of these may not apply to smaller-scale or individual users.</p>
                <list type="ordered">
                    <item>First, ascertain whether you want to migrate your current encoding system
                        exactly as it stands, or whether you want to make changes. Migration is a
                        good time to revisit your encoding system and assess whether it still fits
                        your needs. You may want to take advantage of the new features of P5 to
                        capture additional information, or there may be features that you no longer
                        find it useful to encode. </item>

                    <item>Next, you may need to migrate your DTD and any extensions. That is, you need
                        a P5 schema that is equivalent to your current P4 DTD. For TEI Lite projects this will
                        typically involve no more than an hour or two with Roma, constraining the possible values
                        of the <att>type</att> attribute. For projects that have developed their own
                        TEI extensions, schema migration will involve creating a new ODD file (using
                        Roma or a similar tool) based on your .ent and .dtd extension files. This
                        process will probably take up to a few days of work, and will require some
                        familiarity with Roma and with the TEI extension and customization
                        mechanisms. While this process probably cannot be automated, it is likely
                        that those who have already done this will be quite willing to advise and
                        assist, and we anticipate that the TEI-L list will be a useful forum for
                        discussion on this issue.</item>

                    <item>Next, you will need to make changes to any work processes that are
                        specific to your P4 markup and DTDs, and adapt them to work with P5 markup
                        and schemas. Depending on your workflow, this may include changes to
                        stylesheets, automated pre- or post-processing, documentation, conversion
                        scripts, and other tools and systems. XML editing tools that work with DTDs
                        may not work (or may not work in the same way) with schemas, and this may
                        necessitate some changes to your workflow. As above, this may be an
                        opportunity to rethink your workflow and take advantage of new tools and
                        systems. </item>

                    <item>The last practical step in migration is to migrate your XML data, and the
                        difficulty or ease of this process depends on the decisions you made at the
                        start concerning the nature of your intended P5 encoding. If you are simply
                        converting P4 markup into a P5 equivalent, the process is largely
                        automatable and not difficult. If you are planning to add markup or to
                        restructure the way you encode certain features, this may or may not be
                        automatable, and will depend on the nature of the new markup you intend.</item>

                    <item>Finally, we recommend that you share the results of your efforts, even if
                        only informally, by posting a report on the process to TEI-L or on your own
                        site. This knowledge may be of great use to others undertaking the same kind
                        of migration.</item>
                </list>
            </div>
            <div>
                <head>Migrating your instances</head>

                <p>The TEI Consortium expects that, for many projects, most steps involved in migrating your XML
                    data files from P4 to P5 will be automatable. Many simple documents can be
                    migrated by making the following changes:<list>
                        <item>Changing the root element from <gi>TEI.2</gi> to <gi>TEI</gi></item>
                        <item>Adding the TEI namespace declaration to <gi>TEI</gi></item>
                        <item>Changing <gi>xref</gi> to <gi>ref</gi>, and <gi>xptr</gi> to
                            <gi>ptr</gi>, and changing the linking attribute so that each link is
                            given as a URL in the <att>target</att> attribute</item>
                        <item>Changing pointer values on existing <gi>ptr</gi> and <gi>ref</gi> elements from IDREFs to
                            URLs; this usually means prefixing the existing value with a # character</item>
                        <item>Converting the encoding of <gi>abbr</gi>, <gi>orig</gi>, <gi>sic</gi>,
                            and their counterparts (<gi>expan</gi>, <gi>reg</gi>, and <gi>corr</gi>)
                            to use the P5 <gi>choice</gi> element</item>
                        <item>Changing <att>id</att> and <att>lang</att> to <att>xml:id</att> and
                                <att>xml:lang</att></item>
                        <item>Changing the <att>value</att> attribute of <gi>date</gi> to
                            <att>when</att></item>
                    </list>
                </p>
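                <p>The changes listed above can be sketched in a few lines of Python. This is only an
                    illustration under stated assumptions, not the TEI's own conversion tool: the function
                    names <code>wrap_in_choice</code> and <code>migrate_simple</code> are invented for this
                    example, the input is assumed to be a small P4 document with no namespace and no
                    undefined entity references, and external links on <gi>xref</gi> and <gi>xptr</gi> are
                    assumed to be held in a <att>url</att> attribute. The values of <att>lang</att> are
                    copied unchanged, so they may still need conversion afterwards.</p>
                <eg><![CDATA[
```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# P4 element names that are renamed in P5.
RENAMED = {"TEI.2": "TEI", "xref": "ref", "xptr": "ptr"}

# P4 element -> attribute that held its alternative reading. The attribute
# name is also the name of the P5 element that sits beside it in choice.
ALTERNATIVE = {"abbr": "expan", "expan": "abbr", "orig": "reg",
               "reg": "orig", "sic": "corr", "corr": "sic"}


def wrap_in_choice(root):
    """Move an attribute-held alternative into a sibling element inside choice."""
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            alt_name = ALTERNATIVE.get(child.tag)
            if alt_name is None or alt_name not in child.attrib:
                continue
            choice = ET.Element("choice")
            # The text that followed the old element now follows choice.
            choice.tail, child.tail = child.tail, None
            choice.append(child)
            alternative = ET.SubElement(choice, alt_name)
            alternative.text = child.attrib.pop(alt_name)
            parent[index] = choice
    return root


def migrate_simple(p4_xml):
    """Apply the mechanical P4-to-P5 changes to a P4 document given as a string."""
    root = ET.fromstring(p4_xml)
    wrap_in_choice(root)
    for el in root.iter():
        name = el.tag
        if name in ("ptr", "ref") and "target" in el.attrib:
            # IDREFs become URLs: prefix each ID with "#".
            ids = el.get("target").split()
            el.set("target", " ".join("#" + i for i in ids))
        elif name in ("xref", "xptr") and "url" in el.attrib:
            # ASSUMPTION: the P4 files keep external links in a url attribute.
            el.set("target", el.attrib.pop("url"))
        elif name == "date" and "value" in el.attrib:
            el.set("when", el.attrib.pop("value"))
        for attr in ("id", "lang"):
            if attr in el.attrib:
                el.set("{%s}%s" % (XML_NS, attr), el.attrib.pop(attr))
        # Rename where needed and put every element in the TEI namespace.
        el.tag = "{%s}%s" % (TEI_NS, RENAMED.get(name, name))
    # Serialize the TEI namespace as the default (this setting is global).
    ET.register_namespace("", TEI_NS)
    return ET.tostring(root, encoding="unicode")
```
]]></eg>
                <p>Calling <code>migrate_simple</code> on the text of a P4 file returns the converted
                    document as a string. The result should still be validated against your P5 schema,
                    since a sketch like this covers only the changes listed above.</p>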
                <p>However, there are a few steps that will require human intervention, either
                    because they cannot be automated, or because no one has yet written a program to
                    do so. Some such areas are: <list type="unordered">
                        <item>The values of <att>lang</att> must be converted to BCP 47 format for
                            the values of <att>xml:lang</att>; this is not automatable in the global
                            sense, but typically will be automatable at the project level, if needed </item>
                        <item>Normalized dates (e.g., values of <att>value</att> of <gi>date</gi>)
                            that are not already in W3C or ISO 8601 formats will need to be changed.
                            In general this is automatable, but as there are so many possible
                            formats, TEI has no plans at present to supply a general-purpose
                            conversion tool. Project-specific tools will probably not be difficult
                            to write. </item>
                        <item>Extended pointers must be converted into XPointers. Although a
                            proof-of-concept Perl script for this conversion exists, it is non-trivial
                            to install and has not been tested thoroughly.</item>
                    </list></p>
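                <p>A project-specific tool for the first two items might look something like the
                    following Python sketch. Everything project-specific in it is hypothetical: the two
                    date formats in <code>PROJECT_DATE_FORMATS</code>, the entries in
                    <code>LANG_TO_BCP47</code>, and the function names are placeholders to be replaced
                    with whatever your own files actually use. Values that the tables do not cover come
                    back as <code>None</code>, so that they can be reviewed by hand rather than guessed.</p>
                <eg><![CDATA[
```python
import re
from datetime import datetime

# ASSUMPTION: hypothetical formats used in one project's value attributes,
# here day/month/year with slashes and a compact eight-digit form.
PROJECT_DATE_FORMATS = ("%d/%m/%Y", "%Y%m%d")

# ASSUMPTION: hypothetical P4 lang values and their BCP 47 equivalents.
LANG_TO_BCP47 = {"ENG": "en", "FRA": "fr", "LAT": "la", "GRC": "grc"}

# Values that already look like YYYY, YYYY-MM or YYYY-MM-DD.
ISO_LIKE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def to_iso_date(value):
    """Return value as an ISO 8601 date, or None if it needs human review."""
    value = value.strip()
    if ISO_LIKE.fullmatch(value):
        return value
    for fmt in PROJECT_DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def to_bcp47(lang):
    """Look up a P4 lang value; None means the table needs a new entry."""
    return LANG_TO_BCP47.get(lang)
```
]]></eg>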

                <p>TEI has two efforts in place already to assist in instance migration. The first
                    is an XSLT stylesheet, <ptr target="p4top5.xsl"/>, written by Sebastian Rahtz, which
                    covers the simpler aspects of conversion in a single transformation.</p>

            	<p>In addition, the TEI has established a <ref target="http://www.tei-c.org/wiki/index.php/Category:P4toP5">space on the TEI wiki</ref>
                    dedicated to migration tools contributed by the TEI community. It includes a
                    collection of small XSLT stylesheets, each intended to transform a single
                    difference between P4 and P5. We welcome contributions of stylesheets to address
                    specific migration issues that projects encounter.</p>
            </div>
            <div>
                <head>Migrating P4 DTDs</head>

                <p>Migrating P4 DTDs and DTD extensions is a more complex process and depends greatly on the kinds of choices represented in the DTD. For P4 DTDs that are constructed simply by including specific tag sets (for instance, the tag set for Names and Dates, and the tag set for Simple Analytic Mechanisms), the process is quite simple. Using <ref target="../Customization/use_roma.xml">Roma</ref>, you can make a similar selection of modules, and generate a P5 schema that represents essentially the same set of elements as your original P4 DTD. There will be some differences, of course, because the P5 modules include changes and updates (some of which are described above). Once you have generated your new P5 schema and done an initial conversion on your document instances, you should validate all of your files and see whether there are discrepancies that need to be addressed through further changes to your XML, or by customizing the P5 schema.</p>
                <p>If your P4 DTD included any customizations, the process is somewhat more involved. If the extensions consist simply of eliminating elements from the DTD, or specifying attribute values, you can make these changes through the Roma interface. It is particularly important that you develop value lists for any attributes that were expressed as CDATA in P4; these are now classed as the datatype data.enumerated and must have a restricted set of values. If your DTD extension included any new elements, you will need to write an element specification for these using the ODD language. </p>
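                <p>When developing a value list, it may help to see first which values your files
                    actually contain. The Python sketch below counts them; the function name
                    <code>attribute_values</code> is invented for this example, and the documents are
                    assumed to have been parsed already with the standard
                    <code>xml.etree.ElementTree</code> module. It only reports what is in the data and
                    does not write any part of an ODD file.</p>
                <eg><![CDATA[
```python
from collections import Counter
import xml.etree.ElementTree as ET


def attribute_values(documents, attribute="type"):
    """Count each (element name, attribute value) pair in the parsed documents."""
    counts = Counter()
    for root in documents:
        for el in root.iter():
            if attribute in el.attrib:
                # Drop any "{namespace}" prefix so P4 and P5 files tally alike.
                local_name = el.tag.rsplit("}", 1)[-1]
                counts[(local_name, el.attrib[attribute])] += 1
    return counts
```
]]></eg>
                <p>Rare values in the resulting counts are often misspellings or one-off decisions,
                    and are worth reviewing before they are written into a value list.</p>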
            </div>

        </body>

    </text>
</TEI>