Minutes of the TD meeting, Rutgers University, 7-8 November 1991

Revised: 11 Dec 91: CMSMcQ
Made file: 7 Nov 91: CMSMcQ

Preliminaries

MSM reviewed the documents before the committee and apologized for the delays in distribution. The documents included:

- Agenda
- L. Burnard on revision of TEI header
- D. Biber et al. on text corpora (TR6 W1)
- S. Johansson et al. on spoken texts (extract from AI2 W1)
- David Rhead on revision of in-text citations (unnumbered)
- list of MARC tags from Marianne Gaunt (unnumbered)

The committee spent several minutes reviewing the documents.

The meeting formally began at 10:10; MG suggested we break at noon.

Apologies for Absence

SH reported that Dominik Wujastyk had regretfully found himself forced to resign from the committee. MSM reported that Barbara Ann Kipfer-Westerlund was unable to attend.

Review of TEI Header

LB's proposals were reviewed. They were:

1. The suffix .desc should be used for the high level grouping tags. This implies changing file.description, encoding.declarations and revision.history to file.desc, coding.desc, and revision.desc.
2. The suffix .group should be used for groupings within each heading rather than the current mixture of .statement, .declarations etc. Thus title.group instead of title.statement etc. For source.description, statement.of.responsibility, text.category.defs read source.group, resp.group, texttype.group.
3. Shorter names are suggested for some tags: for principal.researcher, created.by, creation.date, read principal, creator, date.
4. The suffix .decl should be used for all tags used to provide declarations that are invoked (by ID/IDREF) elsewhere in the text. Thus sampling.decl not sampling.def, etc.
5. The suffix .type should be used for all tags used to invoke tags in the group above. Thus refs.type and category.type not ref.type and text.category.
6. Incorporate the suggestions of TR6, AI2, TR12 in what are now pp. 66-67 of TEI P1.
7. Introduce some more specific tags into the section on general notes (TEI P1 p. 64), e.g. availability (MARC field 506), identification (MARC 510) instead of just ISSN.
8. Include MARC equivalents in the discussion of each group of tags.

RG noted that LB's name conventions would loosen the current tight relationship between the TEI header names and those of the ISBD. MG suggested that cataloguers could nevertheless see the relationship.

LB's point 1 accepted: naming conventions should be normalized. '.group' not accepted. '.stmt' or 'Stmt' preferred. '.stmt' recommended as being a common abbreviation; the dot may be strictly against ML W23, but we ask for it anyway.

LB 2 accepted with revisions: title.stmt, etc., extent, notes.stmt, resp (not resp.stmt: we use '.stmt' for ISBD 'areas'); 'texttype.stmt' (N.B. 'stmt' is used for all ISBD areas, but not only for them). Thus also texttype.decl and texttype.type.

LB 3 shorter forms accepted except for creation.date, which is retained. Add 'funder' for 'funding.agency'.

LB 4-5 accepted except for category.type, which is texttype.type.
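
The naming decisions recorded so far can be gathered into one lookup table. The following is a minimal sketch in Python, not part of the committee's proposals: the names TAG_RENAMES and rename_tag are hypothetical, and the table lists only renamings that the minutes state explicitly. source.description is left out because its final form is not spelled out above.

```python
# Old-to-new header tag names, as read from the decisions on LB 1-5.
# TAG_RENAMES and rename_tag are hypothetical names used for illustration.
TAG_RENAMES = {
    # LB 1: '.desc' for the high-level groupings
    "file.description": "file.desc",
    "encoding.declarations": "coding.desc",
    "revision.history": "revision.desc",
    # LB 2 as revised: '.stmt' rather than '.group'; 'resp' takes no suffix
    "title.statement": "title.stmt",
    "notes.statement": "notes.stmt",
    "statement.of.responsibility": "resp",
    "text.category.defs": "texttype.stmt",
    # LB 3: shorter forms (creation.date is retained, so it is not listed)
    "principal.researcher": "principal",
    "created.by": "creator",
    "funding.agency": "funder",
    # LB 4-5: '.decl' and '.type'; texttype.type rather than category.type
    "sampling.def": "sampling.decl",
    "ref.type": "refs.type",
    "text.category": "texttype.type",
}


def rename_tag(old_name):
    """Return the revised tag name in lower case, or the lower-cased
    input if no renaming was recorded for it."""
    key = old_name.lower()
    return TAG_RENAMES.get(key, key)


if __name__ == "__main__":
    for old in ("FILE.DESCRIPTION", "creation.date", "text.category"):
        print(old, "->", rename_tag(old))
```

Names with no recorded renaming, such as creation.date, pass through unchanged apart from case.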

LB 7 (prima). Within general NOTES.STMT, add

- availability (additional sources), or restrictions on access / terms of use (CONDITIONS); this needs to be available also in the DISTRIBUTION structure as an optional element.
- identifying numbers (IDNO); use content of element to describe what the system is, if needed.
- related publications (PUBLICATIONS); can distinguish descriptions of the creation of the data (related publications), publications of research carried out using the data, and publications on related topics.
- language(s) (LANGUAGE)

The group discussed the possibility of exchange of TEI headers independent of the text transcriptions. The consensus was that this should be possible, and so it was necessary to allow CONDITIONS to appear within DISTRIBUTION.

After lunch, the committee considered the possibility of defining a simpler version of the header. MSM described the simpler header developed for the 'TOY2' DTD. The consensus of the committee was that an official description at this level of detail would be catastrophic: no one would use the fuller structure, which would mean the more detailed information defined in the current proposal would be lost.

The simplifications necessary for low-level software should be achieved by omitting strictly optional items.

The committee felt it would be wise to review the status of each element in the header; this should follow AACR2 where possible.

MSM moved that FILE.DESC be allowed to contain either CITN.FULL (roughly the same as the current definition) or any other form of citation allowed in running text. MT asked whether such simpler citation forms would simply result in omitted information which could later never be recovered. JB suggested that a minimum level of information could reasonably be required of anyone wanting to produce TEI conformant texts. HJM pointed out that if the minimum level is too daunting, scholars may provide no information at all. MT reported that in her experience, requesting a lot of information in either an unstructured or highly structured form resulted in nothing at all; requesting a middle level of information in highly structured form was the most successful.

MSM's motion failed; the structure to be used is the full form.

The committee decided

- to list explicitly the minimum data required in a TEI header, in a separate section at the beginning of the chapter.
- to list explicitly the data recommended for a TEI header, in a separate section at the beginning of the chapter.
- to distribute templates with all tags defined for the header, with comments showing which are required, recommended, and optional.

Proposals from TR6 and AI2 for TEI Header

TR6 W1 Language Corpora

MSM reviewed the overall structure of corpora and collections. The proposals of work group TR6 amount to a list of situational information they would like to see specifiable in each header; the question before the committee is whether that information goes in the bibliographic description of the electronic document, in the source description, or in the CODING.DESC, or partly in each.

HJM pointed out that if we allow any work group to add matter to the header, the header can never be finished. Could we add a media-specific or discipline-specific area within which anyone could devise new tags? MT objected that chaos would result; such tags would not be well structured. HJM agreed on the danger, but felt that if we provide what we believe to be essential, there will always be the need for extensions. (Further discussion of extensions.)

SH observed that the paper does not make clear how to record situational parameters for texts in which the situation changes. It is also not clear what is to be treated as a 'text'. HJM suggested that the TEI as a whole does not assume that a 'text' is anything in particular. We have been talking more about conventional paper texts, but here we face very different matters.

The expectation is that each item in the following list will be represented by a tag of X.DECL or X.TYPE either in the NOTES.STMT or in the CODING.DESC. HJM suggested that some, at least, of these characteristics belong in the source description; the problem with putting them there is that they are not part of a traditional bibliographic description. It was also suggested that perhaps the current CODING.DESC should be divided into PROJECT.DESC, CODING.DESC, and TEXTTYPE.DESC, or subdivisions introduced into CODING.DESC.

Three possibilities were identified:

- leave coding.desc alone and place the situational parameters into it or into the notes.stmt (as originally planned)
- subdivide the existing coding.desc
- break up the existing coding.desc

MSM proposes names CHARACTERISTICS.DESC, CODING.DESC, and PROJECT.DESC. HJM suggests we need to consider very generally what might possibly go into the header; otherwise, if we are making the header into a collection point for everything anybody might want to record about a text, we run the risk of creating (as here) two new sections in the header for every working paper we read.

RG suggests borrowing the concept of a formatted note from bibliography. E.g. a FORMATTED.NOTE tag for corpora, which would have instructions to place mode information, semicolon, channel, semicolon, etc.

MSM suggested that perhaps the areas now provided could after all be considered adequate for general purposes. As H-JM has pointed out, one may view the text as a series of layers, not all of which are equally well understood. In the general case, we have at least

- the machine-readable text
- the original text
- the relation between them

These correspond to

- the bibliographic description of the MR text together with the description of the aims, etc., of the project which created it, in the FILE.DESC and PROJECT.DESC elements.
- the bibliographic description of the source text (which may itself have multiple layers), together with the descriptions of its situational parameters, in the SOURCE and CHARACTERISTICS.DESC (or PROFILE.DESC) elements.
- the description of the way in which the MR text encodes the latter, in the CODING.DESC.

SH suggested as a possible order for the five elements: FILE.DESC, PROJECT.DESC, PROFILE.DESC, CODING.DESC, REVISION.DESC.

The elements in the old encoding.declarations and notes.statement need to be relocated as follows. In the note to the cataloguer, specify that information found in notes area of a traditional catalogue record can in some cases be found elsewhere in the header.

Project description includes information on the creation of the machine readable text: its context, purpose, origin, aims, etc. It includes the existing

- aim
- composition.history

The CONTENTS element is dropped; this information should go into the SOURCE element in FILE.DESC.

CHARACTERISTICS.DESC includes any information describing the text itself which does not form a standard part of bibliographic description. Notably this includes specification of the text type or category, the subject matter treated, keywords for retrieval, and the language(s) in which the text is written.

- texttype.decl
- texttype.type
- subject.decl
- subject.type
- keywords
- language (not in notes.stmt anymore)
- Nature, scope, artistic form, or purpose.

CODING.DESC includes information on the relationship between the machine-readable text and its source: the techniques used in sampling, the inclusion or exclusion of specified parts of the source, the editorial principles used in transcription, reference systems, etc.

- sampling.decl
- sampling.type
- editorial.principles
- editorial.type
- refs.decl
- refs.type
- textincl
- correction
- normalization
- q.tags
- hyphenation
- standard.numeric.values, standard.date.values, analysis, segment.demarcation
- variant.encoding

Some materials do not get moved, necessarily.

- Suggest that the NOTES.STMT contain a summary note providing a brief objective summary of the content of an item unless another part of the description provides enough information. (AACR2 7.7B17)
- Acknowledgement of research assistants, etc. stays in NOTES.STMT.
- CONDITIONS stays in DISTRIBUTION.
- Other information about alternative sources, etc., stays in notes.
- PUBLICATIONS stays in NOTES.STMT.

After discussion, the project description element was deleted and its contents merged into the CODING.DESC element. AIM is renamed projectAim.
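
With the project description merged into CODING.DESC, the header is left with four top-level elements. The table below is a rough Python sketch of that layout as it reads from these minutes, not an authoritative definition. HEADER_LAYOUT and section_of are hypothetical names, the order follows SH's suggestion with PROJECT.DESC removed, and only contents the minutes name for each element are listed.

```python
# Hypothetical layout table for the revised header. Empty lists mean the
# contents are not enumerated in this part of the minutes, not that the
# element is empty.
HEADER_LAYOUT = {
    "file.desc": [],
    # The minutes also refer to this element as PROFILE.DESC.
    "characteristics.desc": [
        "texttype.decl", "texttype.type",
        "subject.decl", "subject.type",
        "keywords", "language",
    ],
    "coding.desc": [
        # merged in from the deleted project description
        "projectAim", "composition.history",
        # relation of the machine-readable text to its source
        "sampling.decl", "sampling.type",
        "editorial.principles", "editorial.type",
        "refs.decl", "refs.type",
        "textincl", "correction", "normalization", "q.tags",
        "hyphenation", "standard.numeric.values",
        "standard.date.values", "analysis",
        "segment.demarcation", "variant.encoding",
    ],
    "revision.desc": [],
}


def section_of(tag):
    """Return the top-level element listing the given tag, or None.
    Tag names are compared without regard to case."""
    key = tag.lower()
    for section, members in HEADER_LAYOUT.items():
        if key in (member.lower() for member in members):
            return section
    return None


if __name__ == "__main__":
    for tag in ("language", "projectAim", "refs.decl"):
        print(tag, "is listed under", section_of(tag))
```

A lookup for a tag the table does not list, such as the dropped CONTENTS element, returns None.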

The committee approved the proposal of AI2 to place description of script, recording event, and transcription within the source description inside the file.desc.

The script description should be a normal bibliographic citation.

The recording statement might benefit from closer examination of bibliographic practice, e.g. for oral-history cataloguing. This needs further study; it was decided however that demographic information, situational information, and setting description belong in the characteristics.desc rather than in the source description.

The transcription statement, like the script statement, should be / be replaced by a normal bibliographic citation (possibly a citation for ms material).

Characteristics.desc gets the ListOfParticipants.

SituationalInformation should perhaps move out of the List of Participants; it's not clear that embedding it in the list is apt; relationships among participants are not apt to change during a transcript, while other items (e.g. awareness of recording) may change.

Setting element should be in Characteristics.Desc as well.

The following decisions were made about the TR6 situational parameters.

- Mode: in characteristics.desc, a MODE.DECL and MODE.TYPE
- Channel: ditto. (Note: Clarification needed on usage.)
- Language: already in Characteristics.

The remainder should be unified with the AI2 list. HJM pointed out that demographic information may be of much more general applicability, and perhaps such concrete proposals are too concrete. What is there includes debatable points as well as items clearly to be included.

Review of in-text citations and references

MSM summarized the salient points of David Rhead's paper thus:

- Replace the CITN and CITN.STRUCT elements with REF.MONO, REF.SERIAL, REF.MONO.PART, etc.
  As a standard prefix, CIT. was preferred to REF. If any two of Rhead's forms are structurally indistinct, it was felt wise to merge them. HJM suggested that this list creates too many new element types and urged that types of citation be distinguished not by separate element types but by a TYPE attribute on the CIT element. JB agreed that the distinction of so many types of citation seemed to put a large burden on the encoder. Bibliographically every item is a monograph, a serial, or a component part of a monograph or serial; MSM observed that JB's analysis would allow the reduction of the citation types to four instead of Rhead's eight structured and one unstructured. The consensus of the committee, however, was to reject this further specification and define three types of citation: CIT.FULL (based on file.desc), CIT.STRUCT (as before), and CIT (as before).
- Distinguish primary and secondary responsibility with different tags.
  The committee decided to add a TYPE attribute instead.
- Distinguish personal from corporate names by introducing ORG tag.
  The committee decided to introduce a TYPE attribute on the NAME element.
- Flatten structure.
  This modification was rejected.
- Add some new detail tags.
  Some of these are included in the new CIT.FULL element; others seem negligible.
- Add IBID, OPCIT?
  No.
- Allow replacement of LIST.CIT with an empty tag to generate a bibliographic list.

MARBI and TEI Header

MARBI is the ALA committee which authorizes changes to the table of MARC tags. LC maintains agency for MARBI between meetings. JB called the relevant person in LC, and will check to see whether recent changes to MARC have implications for the TEI header. MG will check with the Rutgers cataloguers.

X.500 and TEI Header

X.500 is a suite of protocols for network information sources; JB reported that the same contact person in LC has information on X.500 and he will check on that. MG also volunteered to ask network personnel at Rutgers.

Any other business

SH summarized the timetable to June 1992; TEI P2 to be released in January and circulated for review. Final date for substantial input to TEI P2 is end of November. Comment on TEI P2 closes on 15 March 1992; a preliminary presentation will be made at ALLC/ACH in April. Advisory Board meets at end of May; current phase of project ends at end of June. The steering committee is planning future structure now; there will be further development, but we will also work more at consolidation and teaching. Funding and organizational arrangements will probably change in some ways.

The results of this meeting, in the form of these minutes, will be provided to the Steering Committee and the Myrdal meeting.

Action items were assigned as follows:

- RG will investigate cataloguing practice for oral history records and the practice of the British Sound Archives in London, as input for the revision of AI2's recording.statement.
- JB will consult sources at LC on recent MARBI work and X.500.
- MG will consult sources at Rutgers on recent MARBI work and X.500.
- JB will consult ISO standard for bibliographic abbreviations to ensure we are not in conflict.
- JB will determine whether there is a list of minimal data elements for MARC.