Automagical conversion notes


Automagical Conversion

These notes describe one way of converting Word documents into a well marked-up TEI XML corpus.

You will need the following tools: Open Office Writer, the TEI Open Office Filters, and an XML editor.
The TEI Open Office Filters are installed as follows:
  • Download the file teioop5.jar from http://www.tei-c.org/Software/teioo/
  • Start Open Office Writer and select XML Filter Settings from the Tools menu.
  • Click the Open Package button, navigate to the file teioop5.jar, and select it.
  • TEIP5 now appears as one of the available filter options.
Now proceed as follows:
  • Open any Word document using the Open command on the File menu of Open Office.
  • Select Save As from the File menu.
  • Scroll down the list of available File types to TEI P5 (.xml) and press Save.
  • Your document will now be saved as an XML file.
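
Once the file is saved, it can be worth confirming that it really is well-formed XML with a TEI root element before going further. The following is only a minimal sketch: the function name check_tei_root and the file name mydoc.xml are made up for illustration, and it assumes the filter writes its output in the standard TEI P5 namespace.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; assumed (not confirmed here) to be what the filter emits.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def check_tei_root(xml_bytes):
    """Return True if xml_bytes parses as XML and its root is TEI or teiCorpus."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError:
        # Not well-formed XML at all.
        return False
    expected = {"{%s}TEI" % TEI_NS, "{%s}teiCorpus" % TEI_NS}
    return root.tag in expected


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_tei.py mydoc.xml
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as handle:
            print(check_tei_root(handle.read()))
```

If this reports False for a file that opens fine in your XML editor, the root element or namespace probably differs from what the sketch assumes, so look at the first few lines of the file.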

If you look at the document in your XML editor, you will probably spot some tagging you'd like to improve and some data you weren't expecting to see in the TEI header. But it's a start! The document can also be automatically tagged by CLAWS...
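
One quick way to spot tagging you might want to improve is to list which elements the conversion actually produced and how often. This is a rough sketch rather than part of the conversion itself: the function name tag_counts and the file name mydoc.xml are illustrative, and namespaces are simply stripped so that the output shows plain element names.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def tag_counts(xml_bytes):
    """Count how often each element name occurs, ignoring namespaces."""
    root = ET.fromstring(xml_bytes)
    counts = Counter()
    for element in root.iter():
        # Skip anything that is not an ordinary element (tag is not a string).
        if isinstance(element.tag, str):
            # "{namespace}name" -> "name"
            counts[element.tag.split("}")[-1]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python tag_counts.py mydoc.xml
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as handle:
            for name, number in tag_counts(handle.read()).most_common():
                print(name, number)
```

A long tail of elements that each occur only once or twice, or a very large count for a generic element, is often a hint about where the automatic tagging needs attention.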

Good luck!