Lou, Sebastian, and Laurent met at AFNOR in Paris on 16 October. We discussed
some specific details about the proposed ODD-NG, the likely outputs
from the TEI-ISO joint activity on Feature Structures, what should be
done about the terminology chapter, and also Laurent's plans as leader
of current ISO/TC 37/SC 4 activities, all of which turned out to have some interesting
synergies.

ISO has decided that there should be a single repository for
linguistic-related applications, which would collate and register data
categories of various kinds, by analogy with existing standards for
registration of country and language names. Current work in ISO/TC 37/SC 4
relates to the model to be used for representing information within
this repository and is thus terminological, in the same way that ODD
and FS are, being concerned with the use of identifiers, definitions, notes,
etc. together with names and mappings into different languages.
The repository would be concept-based rather than name-based, however;
representation and realization were secondary issues.

Laurent suggested that where a data category identified by some tagdoc
(e.g. an element, a class, a data value) also existed in the ISO
registry, there should be an explicit link. We agreed that this was
highly desirable, and that the old equiv element should be
repurposed for this. The new version would be used to specify
equivalences between the categories defined in the ODD and those
defined in other standards, e.g. the data registry for linguistic
data, ISO 11179 for metadata, etc.

The new data category registry would include metadata vocabulary from
OLAC and IMDI as well as the full morphosyntactic categorization
inherited from EAGLES and Multext. Laurent's view was that, although it was
unrealistic to attempt to define a full linguistic ontology, parts of
one might be built up incrementally in this way. He agreed to make
available a copy of the current draft data category specification.

We discussed the TEI Terminology recommendations, and agreed that
although the current TEI recommendations were now outdated,
terminology as such was an important part of the TEI intellectual
landscape which should not be abandoned. We thought that we should
find a way of embedding TBX-conformant terminological descriptions
into a TEI termEntry, since TBX is conformant to the model
developed in ISO 16642, which is derived from TMF, the Terminological
Markup Framework, which developed from the original TEI work. The TEI
model would specify where such descriptions fitted within a TEI
document, but not what their contents were; as such, this was
comparable with ways of embedding objects from other vocabularies,
e.g. SVG. We agreed that someone should check the ISO recommendations
against the current TEI model and produce a revised draft of the
chapter, but did not decide who would do this or by when.

Reviewing the likely timetable for the joint ISO/TEI work on
feature structures, we thought it would be worth trying to use the FS
specification as a testbed for ODD-NG, with a view to presenting some
preliminary version at the FS meeting in Nancy next month, and a more
stable one in time for the FS meeting in Jeju in Feb 2004. Laurent
felt that there was some potential for using TEI to draft technical
documentation in the ISO context.

Turning to ODD-NG itself, we agreed that there was an urgent need for simpler
documentation, which could be used by those wishing to use ODD-NG as a
means of describing new TEI customizations. We also agreed on
some specific changes to the vocabulary itself:
- rather than use CDATA marked sections for examples, we should
use the existing exemplum tag with a content model of ANY:
this implies that all GIs used must be unique within the TEI
namespace, and that ODD-NG can only be validated against a full TEI DTD
- simplify the content model for valList to contain only
valItem elements whose n attribute carries the
value being described, and whose content is the gloss, rather than the
current series of val and desc pairs
- remove the part element, if possible: membership of an
element in a particular module should not be hard-wired
- remove the dataDesc element, if possible: it is redundant
- where there is more than one ptr at the end of a doc element,
consider removing the first: the canonical reference for the object
concerned should be generated from the point at which it is
declared, not from its documentation
- there should be a single wrapper element for the
RNG specifications defining the content model (content? model?)
- replace the specific naming elements gi,
attName, entName, class with the generic
ident element, since the thing being named is implicit in the
parent element.
- rename the name element within entDoc, classDoc, and
tagDoc as gloss and remove rs from content
model: the name of the thing is given by its ident; the
content of this element simply expands that name where its meaning is
not obvious.
- rename dtdRef and dtdFrag to chunkRef and chunk respectively;
similarly rename entDoc, entDecl as
patternDoc and patternDecl: the old names are too SGML-DTD-specific
- disallow classes element
with null name attribute
- disallow empty attList element (i.e. it must have
attDef children)
- ideally, content models should refer only to element classes, not to
individual elements
- references to objects documented in ODD (tags, classes, etc.)
should all be made by pointing to the ident element rather
than by using the ID/IDREF mechanism
- add an (optional repeatable) equiv element wherever an
ident can occur: this can reference a concept in the
ISO data repository or elsewhere; or it might contain an XPath to
locate some equivalent markup construct elsewhere.
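
The valList simplification proposed in the list above can be illustrated with a
small conversion sketch. This is only an illustration, not something agreed at
the meeting: the function name simplify_val_list and the sample values are
hypothetical, and the sketch assumes that each val is immediately followed by
its desc and that both contain plain text only. The element and attribute names
(valList, val, desc, valItem, n) are those used in the notes.

```python
import xml.etree.ElementTree as ET


def simplify_val_list(val_list: ET.Element) -> ET.Element:
    """Rewrite a valList of val/desc pairs as a valList of valItem elements.

    Hypothetical helper: assumes val and desc hold plain text only.
    """
    new_list = ET.Element("valList", dict(val_list.attrib))
    children = list(val_list)
    i = 0
    while i < len(children):
        child = children[i]
        if child.tag != "val":
            raise ValueError(f"expected val, found {child.tag}")
        # The described value moves into the n attribute of valItem.
        item = ET.SubElement(new_list, "valItem", {"n": (child.text or "").strip()})
        # The following desc, if present, becomes the gloss (element content).
        if i + 1 < len(children) and children[i + 1].tag == "desc":
            item.text = (children[i + 1].text or "").strip()
            i += 2
        else:
            i += 1
    return new_list


# Sample input in the current val/desc style (values invented for illustration).
OLD = """<valList>
  <val>yes</val><desc>the feature applies</desc>
  <val>no</val><desc>the feature does not apply</desc>
</valList>"""

new = simplify_val_list(ET.fromstring(OLD))
print(ET.tostring(new, encoding="unicode"))
```

Each val/desc pair collapses into one valItem, so the value and its gloss can
no longer drift apart or be mismatched in number.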