.sr docfile = &sysfnam. ;.sr docversion = 'Draft';.im teigmlp1
.* Document proper begins.
.sr docdate '29 November 1991'
Interim report on Maths and Tables
<author>Paul Ellison
<docnum>TEI &docfile.
<date>&docdate.
</titlep>
<!>
<abstract>General Introduction
<p>
This work group was formed in November 1990 to develop DTD
fragments for Maths and for Tables for inclusion in the final
report of the TEI. The group has met once (in Knoxville, USA, in
January, 1991). Further meetings have not been possible, and
communication has been difficult because the other members of the
work group are not accessible via electronic mail. This report
is `interim', in that I have been unable to obtain the views of
my colleagues as to its contents.
<p>
Math(s) and Tables are inherently different, but have been linked
together in the TEI allocation of work group responsibilities as
the group of non-Humanities topics. As they are different, they
will be treated as such from here on in.
<p>
The recommendations included in this interim report will be
submitted to the other members of the work group for comment at
the earliest opportunity, and the concepts discussed will then be
developed further.
</abstract>
<!>
</frontm>
<!>
<body>
<h1>Mathematics
<h2>1.1 Introduction
<p>
Mathematicians have, over the centuries, developed a variety of
techniques for representing on paper the abstract concepts that
they have `invented'. These representations have involved the
use of existing characters used in familiar and unfamiliar ways,
the invention of new characters (sometimes for one idea only,
which quickly dies), plus the use of all these characters in non-normal
positions in relation to some conceptual baseline or
reading direction, and in unusual orientations. The writing of
Mathematics was easy when all documents existed as manuscripts
only. Even when Maths was typeset in hot metal, life was not too
complex as the author and compositor could work together to
obtain the results the author wanted, and, very importantly, the
compositor had the full range of characters (`glyphs') available
to him in all the required point sizes. The advent of computers
and of early computer typesetting did much to destroy the art of
fine typography in general, and the typesetting of Maths in
particular. During the 1950s, 60s and 70s various computer
`languages' were developed for the processing of Mathematics.
Early attempts to develop systems that would let Mathematicians
use computers to typeset their Mathematical ideas were not
very successful. Then along came TeX. To many (the author of
this report included), TeX provides an almost perfect method to
use a computer to represent mathematics for typesetting. Its
one flaw is that it is only capable of setting Math as required
by the American Mathematical Society --- many other countries have
modified their layout requirements and will accept AMS layout
rather than develop another (different) version of TeX. However,
one thing was assumed --- typesetting the representation of the
concepts has always been considered separately from writing
programs to compute answers to those ideas. Then, along came
SGML...
<h2>1.2 Maths and SGML
<p>
There are three main groups (other than the TEI) who have put, or
are putting, considerable thought into the representation of Maths
using SGML. They are:
<ol>
<li>Association of American Publishers (AAP) --- In 1986/87,
the AAP developed a Maths DTD aimed solely at using SGML
to typeset Maths. This DTD maintains the essential
hierarchical structure of the Mathematics, but only
considers the requirements for typesetting it. This is not
unreasonable, considering the organisation that developed
the DTD. The system is not as thorough as TeX: it uses
`layout' concepts but does not provide as good a layout
as TeX, and it is not sufficient for the AMS. As of
November, 1991, the AAP have just set up a working party
to review this DTD.
<li>International Organization for Standardization (ISO) --- As part of
the first edition of TR9573 (Techniques for using SGML)
there is a `clause' devoted to the use of SGML to
represent the structure of mathematics. This clause does
not do the complete task, but aims to show that SGML can
be used to represent the structure in a manner that can
be clearly understood. The DTD does not contain any
layout-specific information, but an intelligent composing
system would/should be able to compose the maths in any
way that is required by the specific publisher. This is
a non-trivial task, but would remove the restriction of
only doing it the AMS-way (as with TeX). There are no
publishing implementations of ISO-Math. As part of the
second edition of TR9573, there are Parts specifying
entity sets for use in different applications, as well as
a Part that will contain an expansion of the existing
clause 8. One of these sets (in Part ?), contains entity
declarations for the special characters used by
Mathematicians (whether `invented' by them or `stolen'
from alphabets or other disciplines). These entity sets
are generally accepted as `complete'. A full set of
shapes is held by the Association for Font Information
Interchange (AFII) in its registry. This registry is the
only one recognised by ISO.
<li>The Euromath Project --- This project was initiated about
5 years ago to provide a complete computing and
communications environment for academic mathematicians in
Europe. At the time, the concept of a complete
environment was ahead of its time, and the project has been
dogged by bad luck, bad management, and the financial problems
of one of its software contractors. The situation has
now changed. Fundamentally, the concept was/is to allow
Mathematicians to be able to key in their Mathematics
along with associated text, be able to use that Maths in
any of a variety of computer processes, for example,
typesetting, algebraic manipulation, computation,
communication with colleagues, and to be able to edit the
maths in a WYSIWYG manner. Although not yet complete,
the author has seen aspects of this project working. The
DTD for this project is not yet written, but it will be
based on structure rather than layout.
</ol>
<h2>1.3 Maths and the TEI
<p>
Math (or Maths) is anathema to most people involved with the
Humanities. However, there are occasions when they need to be able
to deal with the maths in texts they are working with, or at
least, cope with it! Those occasions are as follows:
<ol>
<li>When representing a manuscript --- including representing
any emendations to the mathematics made by the author.
This occasion is probably the most complex, because, as
well as recording emendations, it may be necessary to
represent the mathematical structure (the concept) as
well as the original layout. It is clear from discussion
with members of the manuscript working group that layout
is of considerably less importance than structure.
<li>When preparing a manuscript for traditional paper
publication --- a subset of 1. The only requirement is to
encode the layout information.
<li>When encoding information (textual and mathematical) for
electronic storage, interchange and dissemination --- it
may be necessary to include layout information, but the
Maths structure is paramount.
</ol>
<p>
Should the TEI take into account the work of the groups referred
to above? The answer must be an emphatic YES. The reasons are
simple --- re-inventing the wheel without taking into account the
work of other inventors would restrict the use of information
stored with TEI-conformant documents to TEI purposes only; and
re-inventing the wheel from scratch would take considerable time
that no-one has to spare. The benefits of using the developments
of others could be that Humanities scholars would be able to
access software provided for other academic disciplines to be
able to input the maths, edit it, and prepare it for publication
without undue difficulty; that publishers could process the
resulting files easily, and that encoded information would have a
place outside the confines of a TEI-only domain. The problems are
not trivial --- the TEI solution must provide structural
information, should provide layout information, and must provide
the ability to include notes and emendations as necessary. The
only existing DTD that provides a realistic starting point for
the TEI is that included in TR9573 clause 8, even though this DTD
does not provide the layout information for typesetting. This
DTD will need further development before being acceptable as a
long term solution. It requires more Mathematical constructs to
be added, and (probably) some limited layout constructs. The
development of these constructs is in the ISO work program for
ISO-IEC/JTC1/SC18/WG8, but does not have a very high priority.
<h2>1.4 DTD fragments
<p>
The previous section recommends that the TEI adopts the Maths DTD
from ISO TR9573, even though this DTD is not yet completed.
However, it is important to formalise the interface between the
Mathematics and the remainder of the body of the text/document.
Maths can appear in two places in the flow of a document, and, in
one of these situations, the maths can either be included as
single or multiple equations, making three categories of
mathematical base construct which will need to be represented:
<ol>
<li>In line --- simple mathematical constructs can appear `in
line' with the normal flow of the narrative. These
constructs appear singly and will not be numbered (when
typeset). Within the restriction of being fundamentally
`simple', they could contain the full range of primitive
mathematical constructs. It is recommended that a
generic identifier of the form and name `inline.formula'
be used.
<li>Single constructs in `display' --- the maths is centred in
a `block' of its own between relevant narrative. Very
often, when typeset, such constructs are sequence
numbered down the right-hand margin. They will contain
the full range of primitive constructs. It is
recommended that an identifier of the form and name
`display.formula' be used. The content models of in-line
formulae and display formulae must be identical. Display
formulae may require additional attributes to hold the
position, style and value of the reference number (this
assumes that existing typeset texts will require
encoding).
<li>Multiple constructs in display --- there may be limited
text between each construct/formula, but the intention of
the author has been to link these particular formulae.
It is recommended that an identifier of the form and name
`display.formula.group' be used. This GI will require
attributes to hold position, style, value, and sub-value
of the formulae reference numbers. The content model
will consist of repeated display.formula with optional
text between each formula. This is an extension of the
current TR9573 DTD, which will be included in any
development of this DTD.
</ol>
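<p>
The three base constructs listed above might be declared along the
following lines. This is a sketch rather than a proposal: only the
three element names come from the recommendations above. The
parameter entity f.content, the element f.text, and the attribute
names n.position, n.style, n.value and n.subvalue are hypothetical,
and f.content merely stands in for the TR9573 clause 8 content
model, which is not reproduced here. Names of this length also
assume that NAMELEN has been raised above 8.

```sgml
<!-- Sketch only. Hypothetical names: f.content, f.text, n.position,
     n.style, n.value, n.subvalue. f.content is a placeholder for
     the TR9573 clause 8 formula content model. -->
<!ENTITY % f.content "(#PCDATA)" >

<!-- In-line and display formulae share one content model. -->
<!ELEMENT inline.formula  - - %f.content; >
<!ELEMENT display.formula - - %f.content; >
<!ATTLIST display.formula
          n.position (left | right) #IMPLIED -- margin carrying the number --
          n.style    CDATA          #IMPLIED -- style of the number as printed --
          n.value    CDATA          #IMPLIED -- the reference number itself -- >

<!-- Linked formulae, with optional text between each formula. -->
<!ELEMENT display.formula.group - -
          (display.formula, (f.text?, display.formula)*) >
<!ATTLIST display.formula.group
          n.position (left | right) #IMPLIED
          n.style    CDATA          #IMPLIED
          n.value    CDATA          #IMPLIED
          n.subvalue CDATA          #IMPLIED -- sub-value of the reference numbers -- >
<!ELEMENT f.text - - (#PCDATA) >
```

Because both formula elements refer to the same parameter entity,
their content models remain identical when the placeholder is
replaced by the real model.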
<p>
The TEI Guidelines relax the restriction on NAMELEN that is set in
the reference concrete syntax of SGML, raising it from 8 to 128. As a result
it is possible to avoid clashes between existing TEI element
names, and those used in TR9573 by the simple device of prefixing
all names in the TR9573 Maths DTD with a suitable code when
adopted into the TEI Guidelines. Even though the majority of
formulae encoded will be Mathematical, it is important not to
make users believe that they are restricted to encoding just
Mathematics, hence, I would recommend that the prefix be `f.'.
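<p>
The two devices described above, the larger NAMELEN and the `f.'
prefix, can be sketched as follows. The QUANTITY lines are an
excerpt of the kind that would appear in an SGML declaration. The
element names frac, num and den are invented stand-ins used only to
show the prefix at work; they are not taken from TR9573.

```sgml
-- Excerpt from the QUANTITY section of an SGML declaration:
   raise NAMELEN from the reference value of 8. --
QUANTITY SGMLREF
         NAMELEN 128

<!-- DTD side. Hypothetical element names (frac, num, den), shown
     only to illustrate the f. prefix; not copied from TR9573. -->
<!ELEMENT f.frac          - - (f.num, f.den) >
<!ELEMENT (f.num | f.den) - - (#PCDATA) >
```

With the prefix in place, a name that is also used elsewhere in the
TEI DTD can be kept in the Maths fragment without the two
declarations colliding.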
<h1>Tables
<h2>2.1 Introduction
<p>
Considerable work on representing two-dimensional tables using
SGML has been done by various groups over the last few years.
The prime aim of all of these DTDs has been to provide sufficient
information to allow tabular material to be typeset. The
fundamental requirement of the TEI is to be able to represent the
content and structure of existing typeset or manuscript tables in
such a way that further study can be undertaken on the
information stored in the table. As with Mathematics described
above, the fundamental requirement is to encode the structure and
content of the table, and not the actual layout. The remaining
clauses in this section of the report assess the characteristics
of the widely available existing Tables DTDs with the view to
recommending one for use within the TEI Guidelines.
<h2>2.2 Available Table DTDs
<p>
There are four popular DTDs worth investigating. These have been
created by AAP (Association of American Publishers), DoD for
CALS, ISO TR9573, and Software Exoterica. The author of this
report does not believe that the layout adds any further
information --- it only helps the reader to ascertain which parts
of the information (rows, or columns) are related to which other
parts of the information --- for example, to separate the titles
from the row data, or to separate the stub lines from the
columns. Hence, the layout aspects of these DTDs are not
considered (and would be removed from any tag-set recommended for
inclusion in the TEI Guidelines).
<ol>
<li>AAP DTD --- This DTD has two sets of elements, one for
simple tables and one for 'complex' tables (although
still quite simple). Both sets of elements are
orientated towards layout (hence deal with two-
dimensional tables only) and contain additional
attributes to define, for example, widths and rules. The
complex table model allows for `spanned' columns, but not
for spanned rows. [Note: `spanned' indicates that a single
entry spans two or more rows or columns of
the basic table.] Each data element in the table matrix
can be another table; this will accommodate divided rows
and/or columns. The AAP is about to revise this DTD and
has formed a subcommittee which will report in February
1992.
<li>CALS DTD --- (to be done)
<li>ISO TR9573 --- The DTD fragment in Edition 1 of TR9573 is
extremely simple. It handles neither spanned rows nor spanned
columns, but does allow tables within tables. It only
handles two-dimensional tables. The 2nd Edition of
TR9573 will include a separate Part for tables, which
will be a major enhancement of the original DTD fragment.
The time schedule for production of this part is ???
This extended DTD will take account of all the work being
done elsewhere.
<li>Software Exoterica --- This DTD is an extension of the AAP
DTD, and, importantly, is implemented in the Software
Exoterica SGML product XGML. It is one DTD fragment
for simple and complex tables. Complex tables are
accommodated by allowing nesting of tables within tables ---
this provides the facilities for spanned and divided
rows and columns, although simple spanned columns can be
handled by the simple DTD fragment. The content model
for a cell is `#PCDATA'. Attributes exist for the
definition of widths and depths, cell content alignment,
and rules.
</ol>
<p>
Currently, DTDs for Tables are the subject of an AAP committee
which is scheduled to report in February 1992. As a result,
other individuals and groups are focussing their attention on
this subject. Notably, Arbortext has indicated that it will
implement in its 'SGML-Publisher' software the results of the
AAP deliberations (currently, Arbortext supports either the CALS
tables fragment or the AAP fragment), and Soft-Quad have a tables
editor as part of Author/Editor. The most interesting
development may be produced by another US company, Avalanche.
At the recent SGML conference Ludo van Vooren (of Avalanche)
talked about tables as nothing more than <q>multi-dimensional
arrays</q>, where any part of the table can be accessed via sets of
co-ordinates, and information in the table can be manipulated.
One of van Vooren's final comments was that using a format-based
approach to tables makes it much harder to `do' anything
significant with the data in the table. I could not agree more.
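<p>
To make the array view concrete, a three-dimensional table might be
encoded as a flat list of cells, each carrying its own
co-ordinates. The markup below is a hypothetical illustration, not
a description of any Avalanche product: the element names table and
cell, and the attribute names dims and at, are assumptions, and the
cell contents are placeholders.

```sgml
<!-- Hypothetical markup. Assumed names: table, cell, dims, at.
     Cell contents are placeholders, not real data. -->
<table dims="3">
<cell at="1 1 1">entry for row 1, column 1, layer 1</cell>
<cell at="1 2 1">entry for row 1, column 2, layer 1</cell>
<cell at="1 1 2">entry for row 1, column 1, layer 2</cell>
<cell at="1 2 2">entry for row 1, column 2, layer 2</cell>
</table>
```

Because every cell states its position, software could select, say,
all cells whose third co-ordinate is 2 without knowing how the
table was laid out on paper.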
<h2>2.3 Requirements of the TEI
<p>
For the same reasons as with the DTD fragment for Maths, it is
important to adopt a DTD that will be generally acceptable to
other users. The users of the TEI DTD will have at least two
quite distinct additional requirements:
<ol>
<li>Encoding manuscripts with added material (e.g. notes and
emendations). If modelled correctly these will be
additional to the whole DTD and should not influence the
individual DTD fragments.
<li>Analysing the tabular information (to regain information
'lost' in typesetting on to flat paper). A
format/layout based approach will not assist in this at
all.
</ol>
In short, none of the current widely available DTD fragments will
be of much use to the TEI.
<h2>2.4 DTD Fragments
<p>
It is not possible at this stage to specify a full model for a
suitably rich table DTD fragment for use by the TEI. However, as
discussed for Maths above, it is possible to define a top-level
element that would be used to introduce the whole table. Unlike
Maths, tables appear only in 'display', and, in addition, they
can be considered to be 'floating' or 'fixed'. Within the TEI
DTD, it is recommended that a single identifier be used for
simple and complex tables of the form and name 'table'. This
element would contain additional attributes indicating the number
of dimensions, and whether the table is 'fixed' or 'floating'.
Within the rest of the fragment it must be possible to:
<sl>
<li>a) address individual cells by row/column, column/row, or via
multi-dimensional co-ordinates,
<li>b) handle spanned cells (in one or more dimensions),
<li>c) nest a table within a cell (primarily to handle divided
cells); such nested tables may again be multi-dimensional,
although they are impossible to picture!
</sl>
Note: Headings can be handled separately or via cells within
the table.
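<p>
A top-level element meeting requirements a) to c) might look
something like the sketch below. It is offered only to show that
the requirements can be expressed in a DTD, not as the recommended
model. Apart from the GI 'table' and the fixed/floating
distinction, every name here (cell, dims, place, at, span) is a
hypothetical placeholder, as is the default of 2 for the number of
dimensions.

```sgml
<!-- Sketch only. Hypothetical names: cell, dims, place, at, span.
     The default of 2 for dims is an assumption. -->
<!ELEMENT table - - (cell+) >
<!ATTLIST table
          dims   NUMBER              2         -- number of dimensions --
          place  (fixed | floating)  #IMPLIED  -- placement in the text -- >

<!-- A cell holds text, or a nested table for divided cells. -->
<!ELEMENT cell - - (#PCDATA | table)* >
<!ATTLIST cell
          at     NUMBERS  #REQUIRED  -- one co-ordinate per dimension --
          span   NUMBERS  #IMPLIED   -- extent in each dimension -- >
```

Under these assumptions, a cell covering two columns of a
two-dimensional table would be written with at="1 1" and
span="1 2", and a divided cell would contain a further table
element of its own.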
<p>
The original aim was to recommend an existing DTD fragment for
use by the TEI. I have not been able to do this. However, I
believe that the development of ISO/TR 9573 will result in a
structure-based DTD fragment that would be suitable. To ensure
that this development occurs, I would recommend that the TEI
submits a suitable discussion paper.
</body>
<!>
</gdoc>