TEI TF on SGML to XML Migration: Tools Page Syd Bauman NEH

For TEI website

No source; this XML file is the original.

13 December 2007, Chris Ruotolo: Updated and converted to P5.
14 July 2004, Lou Burnard: Added link to Windows version of OpenSP.
23 June 2004, Syd Bauman: Added tei2tei.xsl, convert.bat, wwp-store_sgml2xml, and xmlify (including prettyprint) per CR. Note that we don't actually seem to have convert.bat, so no link.
24 January 2003, Syd Bauman: Fixed command-line usage example of osx per Jessica Hekman.
17 January 2003, Syd Bauman: Minor improvements: credit to David Sewell, corrected release notes, etc.
17 January 2003, Lou Burnard: Vast improvements.
16 January 2003, Syd Bauman: Created with only OpenSP 1.5 for Mac OS X.

This page provides pointers to the tools recommended by the task force.

OpenSP

OpenSP (based on SP by James Clark, itself based on SGMLS by James Clark, which was based on ARCSGML by Charles Goldfarb) is maintained by the OpenJade project. It contains a number of related utilities: an SGML or XML parser, a normaliser and, of particular interest to this group, a utility for converting SGML documents to XML. This utility, called osx, has recently been enhanced by Jessica Hekman to include features that are particularly useful for TEI legacy conversion. The software is distributed in source form or as Windows binaries from SourceForge.
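As a rough illustration of how osx might be driven from a script, the following Python sketch builds an osx command line and saves the converted output. It assumes that osx is installed and on the PATH and that it writes the XML to standard output. The function names and file names are hypothetical, and the `-x` option names used as defaults (`lower`, `no-nl-in-tag`) should be checked against the documentation of the installed OpenSP version.

```python
import subprocess
from pathlib import Path


def build_osx_command(sgml_file, x_options=()):
    """Return the argument list for an osx run; each option becomes -x<option>."""
    return ["osx", *[f"-x{opt}" for opt in x_options], str(sgml_file)]


def sgml_to_xml(sgml_file, xml_file, x_options=("lower", "no-nl-in-tag")):
    """Run osx (assumed to be on PATH) and save its standard output as XML."""
    command = build_osx_command(sgml_file, x_options)
    with Path(xml_file).open("wb") as out:
        # check=False: a non-zero exit status is returned to the caller
        # instead of raising, so the caller decides how to treat parse errors.
        return subprocess.run(command, stdout=out, check=False).returncode
```

A call such as `sgml_to_xml("doc.sgml", "doc.xml")` would then leave the converted text in `doc.xml`; any messages osx prints on standard error still need to be reviewed by hand.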

tei2tei.xsl

An XSLT stylesheet written by Sebastian Rahtz, tei2tei.xsl is specifically designed to clean up the results of an SGML to XML transformation that was performed with sx/osx. It transforms TEI element names into their proper mixed case and removes attributes with default values. It requires that the DOCTYPE declaration and DTD subset be replaced by hand.
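The two clean-up steps described above can be sketched in Python. This is not the stylesheet itself, only an illustration of the idea under stated assumptions: the name table and the default-value table below are tiny hypothetical samples (a real table would have to cover the whole TEI DTD), and element names absent from the table are assumed to be all lower case.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only: upper-cased name -> proper TEI mixed-case name.
NAME_MAP = {
    "TEIHEADER": "teiHeader",
    "FILEDESC": "fileDesc",
    "TITLESTMT": "titleStmt",
}

# Illustrative sample only: (element, attribute) -> assumed DTD default value.
DEFAULTS = {("teiHeader", "type"): "text"}


def clean_tree(root):
    """Restore mixed-case element names and drop attributes left at their default."""
    for element in root.iter():
        # Names not in the table are assumed to be plain lower case.
        element.tag = NAME_MAP.get(element.tag.upper(), element.tag.lower())
        for name, value in list(element.attrib.items()):
            if DEFAULTS.get((element.tag, name)) == value:
                del element.attrib[name]
    return root
```

The function modifies the tree in place, so `ET.tostring(clean_tree(root))` gives the cleaned serialisation. As with the stylesheet, the DOCTYPE declaration is outside the scope of this sketch.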

convert.bat

A sample Unix batch script that uses sx and the Saxon XSLT processor to transform SGML documents into XML. The sed command preserves entity references through sx processing (note that the current version of osx provides command-line options to control entity handling). DOCTYPE declarations and the DTD subset must be replaced by hand.
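The sed expression used by the script is not reproduced on this page. One common way to keep entity references intact through a parser that would otherwise expand them is to disguise them before parsing and restore them afterwards. The Python sketch below shows that general technique only. The placeholder syntax `[[ENT:name]]` and the function names are hypothetical, and the pattern only recognises references that end in a semicolon.

```python
import re

# Named entity references such as &eacute; (numeric references like &#233; are left alone).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.\-]*);")
# The hypothetical placeholder that stands in for a reference during parsing.
PLACEHOLDER = re.compile(r"\[\[ENT:([A-Za-z][A-Za-z0-9.\-]*)\]\]")


def protect_entities(text):
    """Replace each &name; with [[ENT:name]] so the parser does not expand it."""
    return ENTITY_REF.sub(r"[[ENT:\1]]", text)


def restore_entities(text):
    """Turn each [[ENT:name]] placeholder back into &name; after conversion."""
    return PLACEHOLDER.sub(r"&\1;", text)
```

The placeholder must be a string that cannot occur in the documents being converted; the one chosen here is merely an example.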

wwp-store_sgml2xml.perl

A Perl script provided by Syd Bauman for converting files that conform to the Brown University Women Writers Project SGML DTD to XML. While not intended for general-purpose use, this program may work well in certain circumstances. Be sure to read the known bugs and limitations sections of the header comment.

xmlify

This shell script, provided by Lou Burnard, converts the SGML files of the British National Corpus. It runs each file through osx (preserving internal and external entity references) and then through an XSLT transformation that pretty-prints the output and replaces character entity references with numeric character references.
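The last step, replacing character entity references with numeric character references, can be illustrated with a short Python sketch. It is not the xmlify transformation itself. It assumes that the entity names in use coincide with the HTML names in Python's standard `html.entities` table, which holds for many ISO entity names but not all, and it leaves the predefined XML entities and any unknown names untouched. The function name is hypothetical.

```python
import re
from html.entities import name2codepoint

ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
# The predefined XML entities must stay as they are to keep the markup well-formed.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def to_numeric_references(text):
    """Rewrite known character entity references as hexadecimal numeric references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined and unknown names unchanged
        return f"&#x{name2codepoint[name]:X};"
    return ENTITY_REF.sub(replace, text)
```

Unknown names are deliberately passed through rather than dropped, so that they can be found and resolved by hand later.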