TEI MI M 01 (draft)
Draft of TEI XML Migration Task Force Meeting Minutes, 2002-10-13/14
Syd Bauman
TEI Consortium

None; this electronic version is the original.

SB: Took original notes (directly in TEI meet-mins DTD)

SB: Started cleaning up notes

SB: More cleaning up notes

Initials Used for People
SB: Syd Bauman
AB: Alejandro Bia
LB: Lou Burnard
JH: Jessica Hekman
TR: Tobias Rischer
CR: Christine Ruotolo
NS: Natalia Smith
ST: Syun Tutiya
JU: John Unsworth
CW: Christian Wittern

The meeting took place at the Claridge Hotel, Chicago IL, USA, on Sunday 13 and Monday 14 October. All times listed are local time in Chicago.

Commenced ~13:28 with SB, LB, AB, JH, TR, CR, NS, ST; CW joined ~13:50.

Introductions.

Objectives

Review of list of objectives from our charge.

CR asks: are DTDs in scope? Consensus is that they are, but because few people will need help here, they are a low priority. CR plans to have relatively vague suggestions in the recommendation documents.

CR suggests our focus should be on P3->P4. Consensus is that outlining the tasks for P3->P4 (XML) will include all the steps needed for P4 SGML -> P4 XML. CR also asks if we want to have more of an advocacy role. LB answers yes. SB agrees, but wonders if any advocacy is necessary. LB points out that (disregarding extensions) a P3 document is ipso facto a P4 document.

Brief discussion of why a project wants to move to XML: access to new technologies, new tools (XML); non-support of P3 (P).

We have 2 sections of document already! 1. Scoping; 2. Motivation.

Case studies.

SB asks: do we need a test suite? Is it hard to make? JH is concerned we may not be able to think of 'em all.

LB points out the difference between a test suite and using samples, and thinks we need to ascertain current practices via a survey (#2).

2002-10-22 Remind MP to send OTA materials
2002-10-22 Send WWP samples

CR asks whether or not we need a test suite. SB asks how hard it would be to build. CW suggests starting with a list of differences between XML and SGML. (Available from http://www.w3.org/TR/NOTE-sgml-xml; doesn't seem to be on the CD, though.)

The question comes up of whom we are surveying: SB holds that surveying only the repository reps is insufficient.

Summary: we will not use a test suite, but rather the results of a survey of real cases, perhaps augmented with a fabricated test if deemed necessary.

Although software development is not an output of this group, suggestions for areas ripe for new tools or modifications to existing ones are.

Modifications to ED W 76 made.

2002-11-01 Write preliminary work-plan and circulate to list

Survey

CR: not too many responses (to the posting to TEI-L of 2002-10-04 14:12-04, "migrating TEI resources from SGML to XML").

CW explains the recent experience of the Character Set WG with its survey.

SB suggests that, as only 50 projects are listed on the TEI website, we could perhaps do a phone survey. The idea is generally disliked, but LB counter-proposes e-mail with caveats about privacy; LB likes e-mail plus a phone call. Discussion of whether we want just files, or answers to survey questions too.

So, after the identification stage, a letter with a very brief survey, culminating in a request for only a small data sample (no DTD or other supporting files should be explicitly requested). Non-respondents are to be contacted by phone. Respondents about whom we have questions are to be followed up by e-mail. Also a thank-you.

Five stages of survey project:
Identification of projects using TEI (SGML).
Survey letter for collection of samples.
Telephone follow-up of non-responders (repository group to help).
Analysis stage: divvy up sample files and check for various features.
Follow-up based on number and nature of samples, e.g., asking for DTDs when needed, getting info on technical and organizational challenges and opportunities.
Samples will then be checked against a checklist of issues.
2002-11-01 Create MI W 04, the checklist for stage 3 examination of files
?? Develop database of contacts
2002-10-23 Follow up on JF's survey (of which data went to JU); find out where the data is.
2002-11-01 Develop list of projects that use TEI to which we should send survey; get data
2002-10-26 Send LB any projects you know of
2003-01-02 Draft survey letter asking for samples and asking questions
2002-11-01 Look for "tei" on HUMBUL; coordinate the great TEI search.
2002-10-26 Draft "stand up and identify yourself" letter

Mailing lists: XML4LIB, TEI-L, HUMANIST, BIBLIOTECH, DIGLIB, LINGUIST, CORPORA, ANSAX-L.
2002-11-15 Get a list of lists from MF and get the "stand up and identify yourself" letter posted to all lists (including those above).

Identify . . .

Technical issues to be split out to the expert group, organizational issues to the repository group.
2002-10-28 Initiate organizational discussion in repository group.

Discussion of the order of objectives in the charge. Decided the charge is really unordered, so not to worry about it. CR to provide an order in the work-plan.

Decided to discuss further issues (e.g. XPointer and other P5ish issues) in an appendix to the output reports.

Adjourned ~17:18.

Minimally invasive vs. canonical

Commenced ~09:15.

CR reviews discussion from list.

Discussion of whitespace. General agreement that we need to try to munge source whitespace so that the whitespace reported by an XML parser matches that of the original SGML parse.

Discussion of character entity references. LB argues that in migration character entity references should be converted to characters or numeric character references. Consensus is to have prose discussing reasons for desiring this conversion (that later XML processes won't be able to handle character entity references), but to recommend it as an option.
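The conversion LB describes might be sketched as follows. This is a minimal illustration, not a tool the task force has adopted. It assumes the project's entity names match the HTML 4 names in Python's `html.entities.name2codepoint` table (which covers ISO Latin-1 names such as `eacute`, but not every ISO entity set a TEI project may use), and it only handles references terminated by `;` (SGML also lets the semicolon be omitted in some contexts). The function name `entities_to_ncrs` is hypothetical. Names it cannot map are left alone and reported, since those are the cases the SDATA discussion has to deal with.

```python
import re
from html.entities import name2codepoint
from typing import Dict, Optional, Set, Tuple

# The five predefined XML entities are left as references.
XML_PREDEFINED = frozenset({"lt", "gt", "amp", "quot", "apos"})

# A named reference terminated by ";" (SGML names may contain ".", "-" and digits).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.\-]*);")


def entities_to_ncrs(
    text: str,
    mapping: Optional[Dict[str, int]] = None,
    as_chars: bool = False,
) -> Tuple[str, Set[str]]:
    """Replace named character entity references with numeric character
    references (or with the characters themselves if as_chars is True).

    Returns the converted text and the set of entity names that could not
    be mapped; those references are left untouched for a human to review.
    """
    table = name2codepoint if mapping is None else mapping
    unknown: Set[str] = set()

    def replace(match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        codepoint = table.get(name)
        if codepoint is None:
            unknown.add(name)
            return match.group(0)
        return chr(codepoint) if as_chars else f"&#x{codepoint:X};"

    return ENTITY_REF.sub(replace, text), unknown
```

Passing `as_chars=True` gives the "convert to characters" option instead of numeric character references; a project-specific `mapping` can stand in for the HTML table.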

Discussion of external entities. Consensus is the same as for the previous two: a user option, with discussion of why you'd prefer XInclude over system entities.
2002-10-28 Ask Steve DeRose for his notes of what he did to convert P3 ODD files to P4 ODD files.

Discussion of DTDs: yes, we need to keep 'em. XML tools that won't do well-formedness checking on files that include a DOCTYPE declaration are broken, so that's not our problem.

Can address dirty hacks.

Comments: in XML, we can't have comments inside other declarations; can't have multiple comments inside one comment declaration; and the empty comment declaration <!> is not permitted.
2002-10-14 Investigate how comments are processed in SX or other tools
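A rough checker for these comment differences could look like the following. It is a hypothetical sketch (the name `find_sgml_comment_problems` is invented here). It ignores marked sections and other SGML constructs, and it adds one related check not raised in the discussion: XML requires a comment to close with `-->` exactly, whereas SGML allows whitespace between the final `--` and the `>`.

```python
from typing import List, Tuple

Problem = Tuple[int, str]  # (character offset of the "<!", description)


def _scan_comment_decl(text: str, start: int, problems: List[Problem]) -> int:
    """Scan an SGML comment declaration starting at `start` ("<!--...").
    Returns the offset just past the declaration."""
    i = start + 2  # just past "<!"
    count = 0
    while text.startswith("--", i):
        close = text.find("--", i + 2)  # in SGML the next "--" ends the comment
        if close == -1:
            problems.append((start, "unterminated comment"))
            return len(text)
        count += 1
        i = close + 2
        j = i
        while j < len(text) and text[j].isspace():
            j += 1
        if j < len(text) and text[j] == ">":
            if count > 1:
                problems.append((start, f"{count} comments in one declaration"))
            if j != i:
                problems.append((start, "whitespace between '--' and '>'"))
            return j + 1
        i = j  # another "--" must start the next comment
    problems.append((start, "malformed comment declaration"))
    return i


def _scan_markup_decl(text: str, start: int, problems: List[Problem]) -> int:
    """Scan a declaration such as <!ENTITY ...> up to its ">" (or the "[" that
    opens a DOCTYPE internal subset), flagging "--" comments outside literals."""
    i = start + 2
    quote = None
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif text.startswith("--", i):
            problems.append((start, "comment inside markup declaration"))
            close = text.find("--", i + 2)
            if close == -1:
                return len(text)
            i = close + 2
            continue
        elif ch in ">[":
            return i + 1
        i += 1
    return i


def find_sgml_comment_problems(text: str) -> List[Problem]:
    """Return (offset, description) pairs for SGML comment usages XML forbids."""
    problems: List[Problem] = []
    pos = text.find("<!")
    while pos != -1:
        if text.startswith("<!>", pos):
            problems.append((pos, "empty comment declaration <!>"))
            end = pos + 3
        elif text.startswith("<!--", pos):
            end = _scan_comment_decl(text, pos, problems)
        elif text[pos + 2 : pos + 3].isalpha():
            end = _scan_markup_decl(text, pos, problems)
        else:
            end = pos + 2  # marked sections (<![ ... ]]>) are not handled here
        pos = text.find("<!", end)
    return problems
```

The scanner follows SGML's reading of `--` as a comment delimiter, so `<!-- one -- -- two -->` is reported as two comments in one declaration rather than as stray hyphens.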

The strategies document will include things like advising migrators to think about issues of, say, XInclude v. external entities. The practices document will have advice on how to convert to XInclude or how to migrate without converting.

In the strategy document we should probably point out that more migrations are likely in the future, but that if you're happy with P4 (which the TEI does plan to support), you could just stay there.

Specification of defaulted attributes: we'll recommend not specifying them (and hopefully point out ways to migrate without them) unless you really need them.
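If a normalizer has already written defaulted attribute values out explicitly, they could be stripped again after conversion along these lines. This sketch assumes the instance is already well-formed XML and that the caller supplies the defaults as a plain mapping (extracting them from the DTD is not shown). It also assumes these are ordinary literal defaults, not SGML `#CURRENT` defaults. The function name `strip_defaulted_attributes` is hypothetical, and the `TEIform` values in the test are only illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def strip_defaulted_attributes(
    root: ET.Element, defaults: Dict[str, Dict[str, str]]
) -> int:
    """Delete attributes whose value equals the declared default for their element.

    `defaults` maps an element name to {attribute name: default value}.
    Returns the number of attributes removed.
    """
    removed = 0
    for elem in root.iter():
        for attr, default in defaults.get(elem.tag, {}).items():
            # Only drop the attribute when it merely restates the default.
            if elem.get(attr) == default:
                del elem.attrib[attr]
                removed += 1
    return removed
```

The trade-off is that a processor reading the file without the DTD will no longer see those values, which is why the recommendation is hedged with "unless you really need them".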

Discussion of DTD conversion: we can't help those who did not use the extension mechanism, but we should have a paragraph addressing the problems created by not doing so.

The strategic document should discuss the fact that migration may be an opportunity to improve your DTD.

CR: In the technical report document we need to address: minimal conversion; easy conversion; conversion that maximizes XML tool usability; conversion that is forward-looking to P5, or at least to what we can predict of P5; in-depth discussion of macro issues identified in samples.

SDATA entity discussion. SB suggests three categories:
1. characters that are in Unicode;
2. characters that are not in Unicode. Solutions, à la P4 chap 4.2.1: CDATA, PIs, markup (<c>). SB suggests we need to better describe the disadvantages of each method in our practices document;
3. others: ambiguous glyph; glyph exists in Unicode with a different meaning in the document; temporary data capture flags.

Processing environment

LB points out the difficulty of actually managing all the little pieces of a sample (or real) case. A corollary is that the practices document needs to address catalog files.

Things to consider: instances; DTD extension files; catalog files; style-sheets and other parts of the processing environment.

Add questions about the processing environment to the third-round survey questions.

SDATA entities to be attacked by a separate individual in the practices document.

Discussion of problems found in samples

TR: ??

LB: consultancy may be desirable. General agreement that a workshop on a specific issue, e.g. extension files, would be a good thing.

SB asks about recommending open source v. proprietary software. In the resulting discussion LB points out that he'd prefer we say "this tool does this" rather than make a recommendation "use this tool".

LB sees only three strategies for obtaining tools for migration: in-house development; buy proprietary tools; use open source.
2002-11-04 Seek out vendors of useful tools, and contact them to find out rudimentary information about their tools.

Case Studies

CR expects each repository rep to write up a case study. Recommendations for tools & strategies should be ready by mid- to late December, to give repository reps a month to work before the joint meeting.

2002-12-01 Write up a framework of feedback information we want from repository reps, MI W 01, Format for Case Study Feedback

Dividing up Labor for Writing up Reports
Strategic document: MI W 02, Strategic Considerations in Migrating TEI documents from SGML to XML: challenges, opportunities, and motivation; types or scope of migration (P3->P4 or P4->P4); areas of migration (instances, DTD extensions, catalog files, processing environment); levels of migration, e.g. minimal surgery approach, get almost to P5 approach, et al.; appendix: potential impact of future versions of the Guidelines on migration issues.
MI W 03, Practical Guide to Migration of TEI Documents from SGML to XML: DTD conversions; SDATA (ST); extension files (TR); instance conversion: tools. Issues: whitespace & comments, prologue & file structure (e.g. external entities) (JH); recommended work-flow (AB).

Section write-ups due 2002-12-02.
2002-11-25 Send reminder mail to group to get write-ups done in 1 week

Adjourned ~16:00.