Overlapping Markup SIG
Minutes
23 October, 2004
Dot Porter
November 12, 2004
Seven in attendance.
Approach: At last year's meeting (see the
minutes), we had discussed creating a web site to
explain in some detail many different approaches to
overlapping markup. This year, we discussed some approaches
that are currently in use by those of us in the SIG: the use
of milestone elements and Just In Time Trees (JITTS). The
traditional problem with using milestone elements
extensively to deal with overlapping markup is that existing
support languages (XPath, XSLT) cannot address the text
between two milestone elements acting as beginning and end
tags, because that text is not the content of either
element. However, Alex Dekhtyar and Emil
Iacob at the University of Kentucky have been working on an
extension of XPath (Extended XPath, or EXPath) that can
search overlapping encodings represented in a GODDAG. The
GODDAG can be stored in an XML file with milestones (plus a
set of DTDs, one per hierarchy) or in separate files: one
XML file per hierarchy. In fact, the storage method is not
important as long as there are parsers for GODDAG. The
GODDAG implementation provides a DOM-like API, which can
also be used by an XML editor. They have begun working on an
extension of XSLT as well. Patrick Durusau also cited a
paper presented at the 2004 Extreme Markup Conference by
Steven DeRose, Markup
Overlap: A Review and a Horse. In this paper,
DeRose outlines a system of milestone elements similar to
that already implemented at the University of Kentucky,
which he calls clix (not to be confused with Constraint Language
in XML (CLIX)).
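The difficulty described above can be illustrated with a minimal Python sketch using the standard library's ElementTree. The sample string, the `hi` element, and the function name `text_between` are assumptions made for illustration; this is not EXPath or the Kentucky implementation. A path query finds the start milestone, but the milestone is empty, so the text has to be gathered by walking the document in order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one milestone pair sharing the id "a".
SAMPLE = ('<p>before <q sID="a"/>inside one <hi>nested</hi>'
          ' inside two<q eID="a"/> after</p>')


def text_between(root, marker_id):
    """Collect the text between the sID and eID milestones that share marker_id."""
    collecting = False
    parts = []

    def walk(elem):
        nonlocal collecting
        if elem.get("sID") == marker_id:
            collecting = True
        elif elem.get("eID") == marker_id:
            collecting = False
        if collecting and elem.text:
            parts.append(elem.text)
        for child in elem:
            walk(child)
        # The tail is the text that follows the element's end tag.
        if collecting and elem.tail:
            parts.append(elem.tail)

    walk(root)
    return "".join(parts)


root = ET.fromstring(SAMPLE)
start = root.find(".//q[@sID='a']")
# The milestone is found, but it has no content of its own.
print(start.text, len(start))
print(text_between(root, "a"))
```

The walk keeps a single flag, so it handles one pair at a time; a query language that understands overlapping hierarchies would not need this kind of manual traversal.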
The SIG proposes to investigate the possibility of
implementing within TEI a system for dealing with
overlapping markup through a system of milestone elements
based on clix, JITTS, and the EXPath and EXSLT support being
developed at the University of Kentucky.
-
Provide several examples on the OM SIG website and
invite TEI users to comment and criticize the
approach.
-
Invite examples of overlapping markup from the user
community.
-
Finally, make a recommendation to the TEI editors
— either to look into making the milestone
approach an integrated part of P5 (if it appears to
handle most instances of OM), or not.
Examples
Example 1
<p><q who="Wilson" sID="001"/>The first thing that put us out was that advertisement.
Spaulding, he came down into the office just this day eight weeks with this very
paper in his hand, and he says:—</p>
<p><q who="Spaulding" sID="002"/>I wish to the Lord, Mr. Wilson,
that I was a red-headed man.<q eID="002"/></p>
<p><q who="Wilson" sID="003"/>Why that?<q eID="003"/> I asks.<q eID="001"/></p>
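One way to read Example 1 programmatically is to turn each milestone pair into a character range over the plain text. The sketch below is only an illustration under stated assumptions: the first paragraph is abridged, the wrapping `div` element is added so the fragment parses as a single document, and the function name `milestone_spans` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged copy of Example 1, wrapped in a <div> (an assumption) so it parses.
EXAMPLE_1 = """<div><p><q who="Wilson" sID="001"/>The first thing that put us out was that advertisement.</p>
<p><q who="Spaulding" sID="002"/>I wish to the Lord, Mr. Wilson,
that I was a red-headed man.<q eID="002"/></p>
<p><q who="Wilson" sID="003"/>Why that?<q eID="003"/> I asks.<q eID="001"/></p></div>"""


def milestone_spans(root):
    """Return (plain_text, spans); spans maps each id to [who, start, end]."""
    pieces = []
    spans = {}
    offset = 0

    def add(text):
        nonlocal offset
        if text:
            pieces.append(text)
            offset += len(text)

    def walk(elem):
        if elem.get("sID") is not None:
            # Record where this quotation starts in the plain text.
            spans[elem.get("sID")] = [elem.get("who"), offset, None]
        elif elem.get("eID") in spans:
            # Record where the matching quotation ends.
            spans[elem.get("eID")][2] = offset
        add(elem.text)
        for child in elem:
            walk(child)
        add(elem.tail)

    walk(root)
    return "".join(pieces), spans


if __name__ == "__main__":
    text, spans = milestone_spans(ET.fromstring(EXAMPLE_1))
    for marker_id, (who, start, end) in spans.items():
        print(marker_id, who, repr(text[start:end]))
```

With ranges in hand, the relationship between quotations becomes simple arithmetic: the range for 001 encloses those for 002 and 003, even though 001 crosses paragraph boundaries.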
Example 2
This example shows how we can use milestones to show, at
the same time, five different and overlapping organizational
sections:
-
<line> = folio line
-
<vline> = verse line (TEI <l>)
-
<HL> = half line
-
<oecno> = lines according to the Old English Corpus
-
<vsection> = verse section (TEI <lg>)
I used the same id for sID
and eID as we use for the
regular ID.
<p>
<oecno sID="boe014000005002" n="66"/>
...
<line sID="oa6003r12" n="12"/>
...
<vsection sID="oa6m05"/>
<vline sID="oa6m05001" n="m5.1"/>
<HL sID="oa6m05001a"/>ÐV meaht be ðære sunnan<HL eID="oa6m05001a"/>
<line eID="oa6003r12"/>
<line sID="oa6003r13" n="13"/>
<HL sID="oa6m05001b"/>sweotole geþencean<HL eID="oa6m05001b"/>
<vline eID="oa6m05001"/>
<vline sID="oa6m05002" n="m5.2"/>
<HL sID="oa6m05002a"/>7 be æghwel-
<line eID="oa6003r13"/>
<line sID="oa6003r14" n="14"/>cum
<HL eID="oa6m05002a"/>
<HL sID="oa6m05002b"/>oðrum steorran<HL eID="oa6m05002b"/>
<vline eID="oa6m05002"/>
<vline sID="oa6m05003" n="m5.3"/>
<HL sID="oa6m05003a"/>þara
<line eID="oa6003r14"/>
<line sID="oa6003r15" n="15"/>þe æfter burgum
<HL eID="oa6m05003a"/>
<HL sID="oa6m05003b"/>beorhtost
<line eID="oa6003r15"/>
<line sID="oa6003r16" n="16"/>scineð.
<HL eID="oa6m05003b"/>
<oecno eID="boe014000005002"/>
<oecno sID="boe014000005004" n="67"/>
<vline eID="oa6m05003"/>
<vline sID="oa6m05004" n="m5.4"/>
<HL sID="oa6m05004a"/>gif him wan fore<HL eID="oa6m05004a"/>
<HL sID="oa6m05004b"/>wolcen
<line eID="oa6003r16"/>
<line sID="oa6003r17" n="17"/>hangað
<HL eID="oa6m05004b"/>
<vline eID="oa6m05004"/>
<vline sID="oa6m05005" n="m5.5"/>
<HL sID="oa6m05005a"/>ne mægen hi swa leohtne<HL eID="oa6m05005a"/>
<HL sID="oa6m05005b"/>leo-
<line eID="oa6003r17"/>
<line sID="oa6003r18" n="18"/>man ansendan
<HL eID="oa6m05005b"/>
<vline eID="oa6m05005"/>
<vline sID="oa6m05006" n="m5.6"/>
<HL sID="oa6m05006a"/>ær se þicca mist<HL eID="oa6m05006a"/>
<line eID="oa6003r18"/>
<line sID="oa6003r19" n="19"/>
<HL sID="oa6m05006b"/>þynra weorðe<HL eID="oa6m05006b"/>
<oecno eID="boe014000005004"/>
<oecno sID="boe014000005007" n="68"/>
<vline eID="oa6m05006"/>
<vline sID="oa6m05007" n="m5.7"/>
<HL sID="oa6m05007a"/>swa oft smylte<HL eID="oa6m05007a"/>
<HL sID="oa6m05007b"/>sæ
<line eID="oa6003r19"/>
...
<HL eID="oa6m05007b"/>
<vline eID="oa6m05007"/>
...
<oecno eID="boe014000005007"/>
...
<vsection eID="oa6m05"/></p>
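Because an XML parser sees every milestone as an independent empty element, it cannot tell whether each sID has a matching eID. In a heavily milestoned file a slip such as typing sID where eID was intended is easy to make. The following Python sketch shows one possible check; the function name `check_milestones` and the small sample (with a deliberate error) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the style of Example 2; the second half line
# repeats sID where eID was intended.
SAMPLE = """<p>
<vline sID="v1"/>
<HL sID="v1a"/>first half<HL eID="v1a"/>
<HL sID="v1b"/>second half<HL sID="v1b"/>
<vline eID="v1"/>
</p>"""


def check_milestones(root):
    """Return a list of problems found in the sID/eID pairing."""
    open_markers = {}  # id -> element name of a start still waiting for its end
    problems = []
    for elem in root.iter():  # iter() visits elements in document order
        start, end = elem.get("sID"), elem.get("eID")
        if start is not None:
            if start in open_markers:
                problems.append(f"{start}: started twice")
            open_markers[start] = elem.tag
        if end is not None:
            if end not in open_markers:
                problems.append(f"{end}: ended but never started")
            elif open_markers.pop(end) != elem.tag:
                problems.append(f"{end}: start and end use different element names")
    problems.extend(f"{marker}: never ended" for marker in open_markers)
    return problems


if __name__ == "__main__":
    for problem in check_milestones(ET.fromstring(SAMPLE)):
        print(problem)
```

The check deliberately ignores nesting order, since the point of the milestone approach is that ranges from different hierarchies may cross each other freely.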