Automagical conversion notes


These notes describe one way of converting Word documents into a well marked-up TEI XML corpus.

You will need the following tools:
  • A copy of Open Office, a well-known open source replacement for Microsoft Office.
  • The TEI Open Office filters.
  • Access to a suitable online tagger: we used the CLAWS Trial Tagger from the University of Lancaster.
  • The XML tools used on this course.
The TEI Open Office filters are installed as follows:
  • Download the file teioop5.jar from http://www.tei-c.org/Software/teioo/
  • Open Open Office Writer and select XML Filter Settings from the Tools menu.
  • Click the Open Package button, navigate to the file teioop5.jar, and select it.
  • TEIP5 now appears as one of the available filter options.
Now proceed as follows:
  • Open any Word document using the Open command on the File menu of Open Office.
  • Select Save As from the File menu.
  • Scroll down the list of available File types to TEI P5 (.xml) and press Save.
  • Your document is now saved as a TEI XML file.
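
Once the file is saved, a quick inventory of the elements it contains can show what the filter actually produced. The following is only a minimal sketch in Python, not part of the TEI filters: the function names and the small SAMPLE document are invented for illustration, and it assumes the saved file uses the standard TEI P5 namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative stand-in for a converted file (an assumption, not real filter output).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample</title></titleStmt>
      <publicationStmt><p>Unpublished</p></publicationStmt>
      <sourceDesc><p>Converted from a Word document</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>First paragraph with <hi rend="bold">emphasis</hi>.</p>
      <p>Second paragraph.</p>
    </body>
  </text>
</TEI>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to element names."""
    return tag.rsplit("}", 1)[-1]


def element_inventory(xml_text):
    """Count how often each element name occurs in a TEI document.

    Parsing raises xml.etree.ElementTree.ParseError if the input
    is not well-formed XML, so this doubles as a basic sanity check.
    """
    root = ET.fromstring(xml_text)
    return Counter(
        local_name(el.tag) for el in root.iter() if isinstance(el.tag, str)
    )


if __name__ == "__main__":
    for name, count in sorted(element_inventory(SAMPLE).items()):
        print(f"{name}: {count}")
```

To try this on a real conversion, read the saved file's contents and pass them to element_inventory in place of SAMPLE.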

If you look at the document in your XML editor, you will probably spot some tagging you'd like to improve and some data you weren't expecting to see in the TEI header. But it's a start! The document can also be automatically tagged by CLAWS...
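
An online tagger typically expects plain text rather than XML, so one possible preparatory step is to pull the running text out of the converted file, leaving the TEI header behind. The sketch below is an assumption about how that might be done, not a description of how CLAWS works: the function name and SAMPLE document are hypothetical, and it assumes the body text sits in non-nested <p> elements in the standard TEI P5 namespace.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI P5 namespace in ElementTree notation

# Illustrative stand-in for a converted file (an assumption, not real filter output).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample</title></titleStmt>
      <publicationStmt><p>Unpublished</p></publicationStmt>
      <sourceDesc><p>Converted from a Word document</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>First paragraph with <hi rend="bold">emphasis</hi>.</p>
      <p>Second   paragraph.</p>
    </body>
  </text>
</TEI>"""


def body_paragraphs(xml_text):
    """Return the plain text of each <p> inside <text>, skipping the header."""
    root = ET.fromstring(xml_text)
    text_el = root.find(f"{TEI}text")
    if text_el is None:
        return []
    paragraphs = []
    for p in text_el.iter(f"{TEI}p"):
        # Join text across inline markup such as <hi>, then normalise whitespace.
        words = " ".join("".join(p.itertext()).split())
        if words:
            paragraphs.append(words)
    return paragraphs


if __name__ == "__main__":
    # One paragraph per line, ready to paste into a tagger's input box.
    print("\n".join(body_paragraphs(SAMPLE)))
```

The paragraphs in the header (such as the publication statement) are deliberately left out, since only the document text itself is worth tagging.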

Good luck!