9 Dictionaries


This chapter defines a module for encoding human-oriented monolingual and multilingual dictionaries, glossaries, and similar documents. The elements described here may also be useful in the encoding of computational lexica and similar resources intended for use by language-processing software; they may also be used to provide a rich encoding for wordlists, lexica, glossaries, etc. included within other documents. Dictionaries are most familiar in their printed form; however, increasing numbers of dictionaries exist also in electronic forms which are independent of any particular printed form, but from which various displays can be produced.

Both typographically and structurally, print dictionaries are extremely complex. In addition, dictionaries are of interest to many communities with different and sometimes conflicting goals. As a result, many general problems of text encoding are particularly pronounced here, and more compromises and alternatives within the encoding scheme may be required in future. Two problems are particularly prominent.

First, because the structure of dictionary entries varies widely both among and within dictionaries, the simplest way for an encoding scheme to accommodate the entire range of structures actually encountered is to allow virtually any element to appear virtually anywhere in a dictionary entry. It is clear, however, that strong and consistent structural principles do govern the vast majority of conventional dictionaries, as well as many or most entries even in more ‘exotic’ dictionaries; encoding guidelines should include these structural principles. We therefore define two distinct elements for dictionary entries, one (entry) which captures the regularities of many conventional dictionary entries, and a second (entryFree) which uses the same elements, but allows them to combine much more freely. It is however recommended that entry be used in preference to entryFree wherever possible. These elements and their contents are described in sections 9.2 The Structure of Dictionary Entries, 9.6 Unstructured Entries, and 9.4 Headword and Pronunciation References.

Second, since so much of the information in printed dictionaries is implicit or highly compressed, their encoding requires clear thought about whether it is to capture the precise typographic form of the source text or the underlying structure of the information it presents. Since both of these views of the dictionary may be of interest, it proves necessary to develop methods of recording both, and of recording the interrelationship between them as well. Users interested mainly in the printed format of the dictionary will require an encoding to be faithful to an original printed version. However, other users will be interested primarily in capturing the lexical information in a dictionary in a form suitable for further processing, which may demand the expansion or rearrangement of the information contained in the printed form. Further, some users wish to encode both of these views of the data, and retain the links between related elements of the two encodings. Problems of recording these two different views of dictionary data are discussed in section 9.5 Typographic and Lexical Information in Dictionary Data, together with mechanisms for retaining both views when this is desired.

To deal with this complexity, and in particular to account for the wide variety of linguistic contexts within which a dictionary may be designed, it may be necessary to customize the schema by adding further restrictions or alternative content models for the elements defined in this chapter. Section 9.3.2 Grammatical Information illustrates this with the provision of a closed set of values for grammatical descriptors.

This chapter contains a large number of examples taken from existing print dictionaries; in each case, the original source is identified. In presenting such examples, we have tried to retain the original typographic appearance of the example as well as presenting a suggested encoding for it. Where this has not been possible (for example in the display of pronunciation) we have adopted the transliteration found in the electronic edition of the Oxford Advanced Learner's Dictionary. Also, the middle dot in quoted entries is rendered with a full stop, while within the sample transcriptions hyphenation and syllabification points are indicated by a vertical bar |, regardless of their appearance in the source text.

9.1 Dictionary Body and Overall Structure

Overall, dictionaries have the same structure of front matter, body, and back matter familiar from other texts. In addition, this module defines entry, entryFree, and superEntry as component-level elements which can occur directly within a text division or the text body.

The following tags can therefore be used to mark the gross structure of a printed dictionary; the dictionary-specific tags are discussed further in the following section.
  • text contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample
  • front (front matter) contains everything at the start of the document, before the main body of the text: headers, title page, prefaces, dedications, etc.
  • body (text body) contains the whole body of a single unitary text, excluding any front or back matter
  • back (back matter) contains everything that follows the main body of the text: appendixes, etc.
  • div (text division) contains a subdivision of the front matter, body, or back matter of a text
  • entry contains a single structured dictionary entry.
  • entryFree (unstructured entry) contains a dictionary entry that does not necessarily conform to the constraints imposed by the entry element.
  • superEntry (super entry) groups a sequence of successive entries for a set of homographs.
As members of the class att.entryLike, entry and entryFree share the following attributes:
  • att.entryLike groups the different kinds of dictionary entry.
    type indicates the type of entry, in dictionaries with multiple types of entry
    sortKey contains a sortable character sequence reflecting the alphabetical position of the entry in the printed dictionary.

The front and back matter of a dictionary may well contain specialized material such as lists of common and proper nouns, grammatical tables, gazetteers, a ‘guide to the use of the dictionary’, etc. These should be tagged using elements defined elsewhere in these Guidelines, chiefly in the core module (chapter 3 Elements Available in All TEI Documents) together with the specialized dictionary elements defined in this chapter.

The body element consists of a set of entries, optionally grouped into one or several div elements. These text divisions might correspond, for example, to sections for different letters of the alphabet, or to sections for different languages in bilingual dictionaries, as in the following example:
<body>
 <div>
  <head>English-French</head>
  <entry>
<!-- ... -->
  </entry>
  <entry>
<!-- ... -->
  </entry>
  <entry>
<!-- ... -->
  </entry>
 </div>
 <div>
  <head>French-English</head>
  <entry>
<!-- ... -->
  </entry>
  <entry>
<!-- ... -->
  </entry>
  <entry>
<!-- ... -->
  </entry>
 </div>
</body>

In a print dictionary, the entries are typically typographically distinct entities, each headed by some morphological form of the lexical item described (the headword), and sorted in alphabetical order or (especially for non-alphabetic scripts) in some other conventional sequence. Dictionary entries should be encoded as distinct successive items, each marked as an entry or entryFree element. The type attribute may be used to distinguish different types of entries, for example main entries, related entries, run-on entries, or entries for cross-references, etc.

Some dictionaries provide distinct entries for homographs, on the basis of etymology, part-of-speech, or both, and typically provide a numeric superscript on the headword identifying the homograph number. In these cases each homograph should be encoded as a separate entry; the superEntry element may optionally be used to group such successive homograph entries. In addition to a series of entry elements, the superEntry may contain a preliminary form group (see section 9.3.1 Information on Written and Spoken Forms) when information about hyphenation, pronunciation, etc., is given only once for two or more homograph entries. If the homograph number is to be recorded, the global attribute n may be used for this purpose. In some dictionaries, homographs are treated in distinct parts of the same entry; in these cases, they may be separated by use of the hom element, for which see section 9.2.1 Hierarchical Levels.

A sort key, given in the sortKey attribute, is often required for superentries and entries, especially in cases where the order of entries does not follow the local character-set collating sequence (as, for example, when an entry for ‘3D’ appears at the place where ‘three-D’ would appear).

A dictionary with no internal divisions might thus have a structure like the following; a superEntry is shown grouping two homograph entries.
<body>
 <entry>
<!-- ... -->
 </entry>
 <entry>
<!-- ... -->
 </entry>
 <superEntry>
  <entry type="hom" n="1"/>
  <entry type="hom" n="2"/>
 </superEntry>
</body>
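To illustrate how a sort key changes the position of an entry such as ‘3D’, here is a minimal Python sketch that orders the entry-like children of a body element, using sortKey when present and otherwise the text of the first orth. The helper names are hypothetical, the sample XML omits the TEI namespace (http://www.tei-c.org/ns/1.0) that real TEI files declare, and plain code-point comparison stands in for a proper language-specific collation.

```python
import xml.etree.ElementTree as ET

# Hypothetical helpers; sample XML below omits the TEI namespace.
ENTRY_LIKE = {"entry", "entryFree", "superEntry"}


def entry_sort_key(entry: ET.Element) -> str:
    """Return sortKey if present, else the text of the first orth element."""
    key = entry.get("sortKey")
    if key is not None:
        return key
    orth = entry.find(".//orth")
    return (orth.text or "") if orth is not None else ""


def sorted_entries(body: ET.Element) -> list:
    """Return the entry-like children of body ordered by their sort keys."""
    items = [child for child in body if child.tag in ENTRY_LIKE]
    # Plain code-point comparison; a real dictionary needs a proper collation.
    return sorted(items, key=entry_sort_key)


SAMPLE = """<body>
 <entry><form><orth>three-phase</orth></form></entry>
 <entry sortKey="three-D"><form><orth>3D</orth></form></entry>
 <entry><form><orth>threat</orth></form></entry>
</body>"""

body = ET.fromstring(SAMPLE)
print([e.find(".//orth").text for e in sorted_entries(body)])
# ['threat', '3D', 'three-phase']
```

Without the sortKey attribute, ‘3D’ would sort before every alphabetic headword, because the digit 3 precedes all letters in code-point order.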

9.2 The Structure of Dictionary Entries

A simple dictionary entry may contain information about the form of the word treated, its grammatical characterization, its definition, synonyms, or translation equivalents, its etymology, cross-references to other entries, usage information, and examples. These we refer to as the constituent parts or constituents of the entry; some dictionary constituents possess no internal structure, while others are most naturally viewed as groups of smaller elements, which may be marked in their own right. In some styles of markup, tags will be applied only to the low-level items, leaving the constituent groups which contain them untagged. We distinguish the class of top-level constituents of dictionary entries, which can occur directly within entries, from the class of phrase-level constituents, which can normally occur only within top-level constituents. The top-level constituents of dictionary entries are described in section 9.2.2 Groups and Constituents, and documented more fully, together with their phrase-level sub-constituents, in section 9.3 Top-level Constituents of Entries.

In addition, however, dictionary entries often have a complex hierarchical structure. For example, an entry may consist of two or more sub-parts, each corresponding to information for a different part-of-speech homograph of the headword. The entry (or part-of-speech homographs, if the entry is split this way) may also consist of senses, each of which may in turn be composed of two or more sub-senses, etc. Each sub-part, homograph entry, sense, or sub-sense we call a level; at any level in an entry, any or all of the constituent parts of dictionary entries may appear. The hierarchical levels of dictionary entries are documented in section 9.2.1 Hierarchical Levels.

9.2.1 Hierarchical Levels

The outermost structural level of an entry is marked with the elements entry or entryFree. The hom element marks the subdivision of entries into homographs differing in their part-of-speech. The sense element marks the subdivision of entries and part-of-speech homographs into senses; this element nests recursively in order to provide for a hierarchy of sub-senses of any depth. Each of these levels may contain any of the constituent parts of an entry. A special case of hierarchical structure is represented by the re (related entry) element, which is discussed in section 9.3.6 Related Entries. Finally, the element dictScrap may be used at any point in the hierarchy to delimit parts of the dictionary entry which are structurally anomalous, as further discussed in section 9.6 Unstructured Entries.
  • entry contains a single structured dictionary entry.
  • entryFree (unstructured entry) contains a dictionary entry that does not necessarily conform to the constraints imposed by the entry element.
  • hom (homograph) groups the information relating to one homograph within an entry.
  • sense groups together all the information relating to one word sense in a dictionary entry (definitions, examples, translation equivalents, etc.).
    level indicates the level of this sense in the hierarchy.
  • dictScrap (dictionary scrap) encloses a part of a dictionary entry in which other phrase-level elements are freely combined.
For example, an entry with two senses will have the following structure:
<entry>
 <sense n="1"/>
 <sense n="2"/>
</entry>
An entry with two homographs, the first with two senses and the second with three (one of which has two sub-senses), may have a structure like this:
<entry>
 <hom n="1">
  <sense n="1">
<!-- ... -->
  </sense>
  <sense n="2">
<!-- ... -->
  </sense>
 </hom>
 <hom n="2">
  <sense n="1">
   <sense n="a">
<!-- ... -->
   </sense>
   <sense n="b">
<!-- ... -->
   </sense>
  </sense>
  <sense n="2">
<!-- ... -->
  </sense>
  <sense n="3">
<!-- ... -->
  </sense>
 </hom>
</entry>
In some dictionaries, homographs have separate entries; in such a case, as noted in section 9.1 Dictionary Body and Overall Structure, the two homographs may be treated as entries, optionally grouped in a superEntry:
<superEntry>
 <entry n="1" type="hom">
  <sense n="1">
<!-- ... -->
  </sense>
  <sense n="2">
<!-- ... -->
  </sense>
 </entry>
 <entry n="2" type="hom">
  <sense n="1">
   <sense n="a">
<!-- ... -->
   </sense>
   <sense n="b">
<!-- ... -->
   </sense>
  </sense>
  <sense n="2">
<!-- ... -->
  </sense>
  <sense n="3">
<!-- ... -->
  </sense>
 </entry>
</superEntry>
The hierarchical structure of a dictionary entry is enforced by the content models defined in this module. The content model for entry specifies that entries do not nest, that homographs nest within entries, and that senses nest within entries, homographs, or senses, and may be nested to any depth to reflect the embedding of sub-senses. Any of the top-level constituents (def, usg, form, etc.) can appear at any level (i.e., within entries, homographs, or senses).
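The three nesting rules just stated can be checked mechanically. The following Python sketch is a hypothetical illustration, not a substitute for validating against the TEI schema: it tests only those three rules, skips entryFree subtrees (which deliberately relax them) and re subtrees (related entries, treated separately in section 9.3.6), and assumes sample XML without the TEI namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical checker; sample XML omits the TEI namespace.
SENSE_PARENTS = {"entry", "hom", "sense"}
SKIPPED = {"entryFree", "re"}


def check_levels(root: ET.Element) -> list:
    """Report violations of the three nesting rules; [] means none found."""
    problems = []

    def walk(elem, parent_tag, inside_entry):
        tag = elem.tag
        if tag in SKIPPED:
            return
        if tag == "entry" and inside_entry:
            problems.append("entry nested inside another entry")
        if tag == "hom" and parent_tag != "entry":
            problems.append(f"hom inside {parent_tag}")
        if tag == "sense" and parent_tag not in SENSE_PARENTS:
            problems.append(f"sense inside {parent_tag}")
        for child in elem:
            walk(child, tag, inside_entry or tag == "entry")

    walk(root, None, False)
    return problems


bad = ET.fromstring("<entry><sense><hom/></sense><entry/></entry>")
print(check_levels(bad))
# ['hom inside sense', 'entry nested inside another entry']
```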

9.2.2 Groups and Constituents

As noted above, dictionary entries, and subordinate levels within dictionary entries, may comprise several constituent parts, each providing a different type of information about the word treated. The top-level constituents of dictionary entries are:
  • information about the form of the word treated (orthography, pronunciation, hyphenation, etc.)
  • grammatical information (part of speech, grammatical sub-categorization, etc.)
  • definitions or translations into another language
  • etymology
  • examples
  • usage information
  • cross-references to other entries
  • notes
  • entries (often of reduced form) for related words, typically called related entries
Any of the hierarchical levels (entry, entryFree, hom, and sense) may contain any of these top-level constituents, since information about word form, particular grammatical information, special pronunciation, usage information, etc., may apply to an entire entry, or to only one homograph, or only to a particular sense. The examples below illustrate this point.
The following elements are used to encode these top-level constituents:
  • form (form information group) groups all the information on the written and spoken forms of one headword
  • gramGrp (grammatical information group) groups morphosyntactic information about a lexical item, e.g. part of speech (pos), gender (gen), number (number), case (case), or inflectional class (iType).
  • def (definition) contains the text of a definition in a dictionary entry
  • cit (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source
  • usg (usage) contains usage information in a dictionary entry
  • xr (cross-reference phrase) contains a phrase, sentence, or icon referring the reader to some other location in this or another text.
  • etym (etymology) contains the etymological information in a dictionary entry
  • re (related entry) contains an entry for a lexical item related to the headword, such as a compound or derived form, included within a larger entry.
  • note contains a note or annotation
In a simple entry with no internal hierarchy, all top-level constituents appear at the entry level.

com.peti.tor /k@m"petit@(r)/ n person who competes. OALD

<entry>
 <form>
  <orth>competitor</orth>
  <hyph>com|peti|tor</hyph>
  <pron>k@m"petit@(r)</pron>
 </form>
 <gramGrp>
  <pos>n</pos>
 </gramGrp>
 <def>person who competes.</def>
</entry>
For the elements which appear within the form and gramGrp elements of this and other examples, see below, section 9.3.1 Information on Written and Spoken Forms, and section 9.3.2 Grammatical Information.
Any top-level constituent can appear at any level when the hierarchical structure of the entry is more complex. The most obvious examples are def and cit, which appear at the sense level when several senses or translations exist:

disproof (dIs"pru:f) n. 1. facts that disprove something. 2. the act of disproving. CED

<entry>
 <form>
  <orth>disproof</orth>
  <pron>dIs"pru:f</pron>
 </form>
 <gramGrp>
  <pos>n</pos>
 </gramGrp>
 <sense n="1">
  <def>facts that disprove something.</def>
 </sense>
 <sense n="2">
  <def>the act of disproving.</def>
 </sense>
</entry>
In the following example, gramGrp is used to distinguish two homographs:

bray /breI/ n cry of an ass; sound of a trumpet. ∙ vt [VP2A] make a cry or sound of this kind. OALD

<entry>
 <form>
  <orth>bray</orth>
  <pron>breI</pron>
 </form>
 <hom>
  <gramGrp>
   <pos>n</pos>
  </gramGrp>
  <def>cry of an ass; sound of a trumpet.</def>
 </hom>
 <hom>
  <gramGrp>
   <pos>vt</pos>
   <subc>VP2A</subc>
  </gramGrp>
  <def>make a cry or sound of this kind.</def>
 </hom>
</entry>
Information of the same kind can appear at different levels within the same entry; here, grammatical information occurs both at entry and homograph level.

ca.reen /k@"ri:n/ vt,vi 1 [VP6A] turn (a ship) on one side for cleaning, repairing, etc. 2 [VP6A, 2A] (cause to) tilt, lean over to one side. OALD

<entry>
 <form>
  <orth>careen</orth>
  <hyph>ca|reen</hyph>
  <pron>k@"ri:n</pron>
 </form>
 <gramGrp>
  <pos>vt</pos>
  <pos>vi</pos>
 </gramGrp>
 <sense n="1">
  <gramGrp>
   <subc>VP6A</subc>
  </gramGrp>
  <def>turn (a ship) on one side for cleaning, repairing, etc.</def>
 </sense>
 <sense n="2">
  <gramGrp>
   <subc>VP6A</subc>
   <subc>VP2A</subc>
  </gramGrp>
  <def>(cause to) tilt, lean over to one side.</def>
 </sense>
</entry>
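When grammatical information is spread across levels, as in the careen entry above, a processing application typically needs to know what applies to each sense. The following Python sketch collects it by walking down the hierarchy. It rests on a simplifying assumption of our own, not a rule stated in these Guidelines: gramGrp content at a level applies to every sense below it, and lower levels add to rather than override it. The function name is hypothetical and the sample XML omits the TEI namespace.

```python
import xml.etree.ElementTree as ET


def effective_gram(entry: ET.Element) -> dict:
    """Map a dotted hom/sense path (e.g. '1' or '2.a') to inherited grammar.

    Hypothetical helper. Assumes gramGrp content applies to all senses below
    it and that lower levels add to, never override, what they inherit.
    """
    result = {}

    def walk(level, inherited, path):
        # Copy the lists so that sibling senses do not see each other's additions.
        info = {k: list(v) for k, v in inherited.items()}
        for grp in level.findall("gramGrp"):
            for item in grp.iter():
                if item.tag != "gramGrp":  # also flattens nested gramGrp
                    info.setdefault(item.tag, []).append((item.text or "").strip())
        if level.tag in ("hom", "sense"):
            path = path + [level.get("n", "?")]
        if level.tag == "sense":
            result[".".join(path)] = info
        for child in level:
            if child.tag in ("hom", "sense"):
                walk(child, info, path)

    walk(entry, {}, [])
    return result


CAREEN = """<entry>
 <form><orth>careen</orth></form>
 <gramGrp><pos>vt</pos><pos>vi</pos></gramGrp>
 <sense n="1"><gramGrp><subc>VP6A</subc></gramGrp></sense>
 <sense n="2"><gramGrp><subc>VP6A</subc><subc>VP2A</subc></gramGrp></sense>
</entry>"""

for path, info in effective_gram(ET.fromstring(CAREEN)).items():
    print(path, info)
# 1 {'pos': ['vt', 'vi'], 'subc': ['VP6A']}
# 2 {'pos': ['vt', 'vi'], 'subc': ['VP6A', 'VP2A']}
```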
Alone among the constituent groups, form can appear at the superEntry level as well as at the entry, hom, and sense levels:

a.ban.don 1 /@"band@n/ v [T1] 1 to leave completely and for ever; desert: The sailors abandoned the burning ship. 2 … abandon 2 n [U] the state when one's feelings and actions are uncontrolled; freedom from control ... LDOCE

<superEntry>
 <form>
  <orth>abandon</orth>
  <hyph>a|ban|don</hyph>
  <pron>@"band@n</pron>
 </form>
 <entry n="1">
  <gramGrp>
   <pos>v</pos>
   <subc>T1</subc>
  </gramGrp>
  <sense n="1">
   <def>to leave completely and for ever … </def>
  </sense>
  <sense n="2"/>
 </entry>
 <entry n="2">
  <gramGrp>
   <pos>n</pos>
   <subc>U</subc>
  </gramGrp>
  <def>the state when one's feelings and actions are uncontrolled; freedom
     from control…</def>
 </entry>
</superEntry>

9.3 Top-level Constituents of Entries

This section describes the top-level constituents of dictionary entries, together with the phrase-level constituents peculiar to each.

9.3.1 Information on Written and Spoken Forms

Dictionary entries most often begin with information about the form of the word to which the entry applies. Typically, the orthographic form of the word, sometimes marked for syllabification or hyphenation, is the first item in an entry. Other information about the word, including variant or alternate forms, inflected forms, pronunciation, etc., is also often given.

The following elements should be used to encode this information: the form element groups one or more occurrences of any of them; it can also be recursively nested to reflect more complex sub-grouping of information about word form(s), as shown in the examples.
  • form (form information group) groups all the information on the written and spoken forms of one headword
    type classifies the form as simple, compound, etc.
  • orth (orthographic form) gives the orthographic form of a dictionary headword
    type gives the type of spelling
    extent gives the extent of the orthographic information provided.
  • pron (pronunciation) contains the pronunciation(s) of the word
    extent indicates whether the pronunciation is for the whole word or only part of it
  • hyph (hyphenation) contains a hyphenated form of a dictionary headword, or hyphenation information in some other form.
  • syll (syllabification) contains the syllabification of the headword.
  • stress contains the stress pattern for a dictionary headword, if given separately
  • lbl (label) contains a label for a form, example, translation, or other piece of information, e.g. "abbreviation for", "contraction of", "literally", "approximately", "synonyms", etc.
In addition to those listed above, the following elements, which encode morphological details of the form, may also occur within form elements:
  • gram (grammatical information) contains grammatical information relating to a term, word, or form in a dictionary entry or a terminological data file.
    type classifies the grammatical information given according to some particular typology; in the case of terminological information, preferably by means of the dictionary of data element types specified in ISO WD 12 620.
  • gen (gender) identifies the morphological gender of a lexical item, as given in the dictionary.
  • number indicates the grammatical number associated with a form, as given in the dictionary.
  • case contains grammatical case information given by the dictionary for a given form.
  • per (person) contains an indication of the grammatical person (1st, 2nd, 3rd, etc.) associated with a given inflected form in a dictionary.
  • tns (tense) indicates the grammatical tense associated with a given inflected form in a dictionary
  • mood contains information about the grammatical mood of verbs (e.g. indicative, subjunctive, imperative)
  • iType (inflectional class) indicates the inflectional class to which a lexical item belongs.
    type indicates the type of indicator used to specify the inflectional class, when it is necessary to distinguish between the usual abbreviations (e.g. inv) and other kinds of indicators, such as special codes referring to conjugation patterns, etc.
Of these, the gram element is most general, and all of the others are synonymous with a gram element with appropriate values (gen, number, case, etc.) for the type attribute.

Different dictionaries use different means to mark hyphenation, syllabification, and stress, and they often use some unusual glyphs (e.g., the ‘middle dot’ for hyphenation). All of these glyphs are in the Unicode character set, as discussed in Character References. When transcribing representations of pronunciation the International Phonetic Alphabet should be used. It may be convenient (as has been done in the text of this chapter) to use a simple transliteration scheme for this; such a scheme should however be properly documented in the header.

In the simplest case, nothing is given but the orthography:
<form>
 <orth>doom-laden</orth>
</form>
Often, however, pronunciation is given.

soucoupe [sukup] … DNT

<form>
 <orth>soucoupe</orth>
 <pron>sukup</pron>
</form>

For a variety of reasons including ease of processing, it may be desired to split into separate elements information which is collapsed into a single element in the source text; orthography and hyphenation may for example be transcribed as separate elements, although given together in the source text. For a discussion of the issues involved, and of methods for retaining both the presentation form and the interpreted form, see section 9.5 Typographic and Lexical Information in Dictionary Data.

This example splits orthography and hyphenation, and adds syllabification because it differs from hyphenation:

ar.ea W7

<form>
 <orth>area</orth>
 <hyph>ar|ea</hyph>
 <syll>ar|e|a</syll>
</form>
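Splitting a printed headword into separate orth and hyph elements is easy to automate when the hyphenation mark is reliable. The Python sketch below is a hypothetical helper that treats both the full stop and the middle dot (U+00B7) as hyphenation marks; it assumes the headword contains no genuine full stops (abbreviations such as ‘U.S.’ would be split wrongly), and it cannot produce syll, since syllabification may differ from hyphenation, as the area example shows. The sample output omits the TEI namespace.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper: full stop and middle dot are both treated as
# hyphenation marks, so headwords must not contain real full stops.
HYPH_MARK = re.compile("[.\u00b7]")


def split_headword(printed: str) -> tuple:
    """Split a printed headword like 'com.peti.tor' into (orth, hyph)."""
    parts = HYPH_MARK.split(printed)
    return "".join(parts), "|".join(parts)


def form_element(printed: str) -> ET.Element:
    """Build a form element with separate orth and hyph children."""
    orth, hyph = split_headword(printed)
    form = ET.Element("form")
    ET.SubElement(form, "orth").text = orth
    ET.SubElement(form, "hyph").text = hyph
    return form


print(ET.tostring(form_element("ar.ea"), encoding="unicode"))
# <form><orth>area</orth><hyph>ar|ea</hyph></form>
```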
Multiple orthographic forms may be given, e.g. to illustrate a word's inflectional pattern:

brag … vb. brags, bragging, bragged … CED

<form>
 <orth>brag</orth>
</form>
<gramGrp>
 <pos>vb</pos>
</gramGrp>
<form type="infl">
 <orth>brags</orth>
 <orth>bragging</orth>
 <orth>bragged</orth>
</form>
Or the inflectional pattern may be indicated by reference to a table of paradigms, as here:

horrifier [ORifje] (7) vt … [C/R]

<form>
 <orth>horrifier</orth>
 <pron>ORifje</pron>
 <iType type="vbtable">7</iType>
</form>
Explanatory labels may be attached to alternate forms:

MTBF abbrev. for mean time between failures. CED

<entry>
 <form type="abbrev">
  <orth>MTBF</orth>
 </form>
 <form type="full">
  <lbl>abbrev. for</lbl>
  <orth>mean time between failures</orth>
 </form>
</entry>
When multiple orthographic forms are given, a pronunciation may be associated with all of them, as here:

biryani or biriani (%bIrI"A:nI) CED

<form>
 <orth>biryani</orth>
 <orth>biriani</orth>
 <pron>%bIrI"A:nI</pron>
</form>
In other cases, different pronunciations are provided for different orthographic forms; here, the form element is repeated to associate the first orthographic form explicitly with the first pronunciation, and the second orthographic form with the second pronunciation:

mackle ("mak^@l) or macule ("makju:l) CED

<form>
 <orth>mackle</orth>
 <pron>"mak@l</pron>
</form>
<form>
 <orth>macule</orth>
 <pron>"makju:l</pron>
</form>
Recursive nesting of the form element can preserve relations among elements that are implicit in the text. For example, in the CED entry for ‘hospitaller’, it is clear that ‘U.S.’ is associated only with ‘hospitaler’, but that the pronunciation applies to both forms. The following encoding preserves these relations:

hospitaller or U.S. hospitaler ("hQspIt@l@) CED

<form>
 <orth>hospitaller</orth>
 <form>
  <usg type="geo">U.S.</usg>
  <orth>hospitaler</orth>
 </form>
 <pron>"hQspIt@l@</pron>
</form>
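The relations preserved by nested form elements can be recovered by software. The following Python sketch lists, for each orth in a (possibly nested) form, the pronunciation and usage labels that apply to it. The inheritance rule is our assumption, modelled on the hospitaller example: a pron or usg applies to every orth in its own form and in nested forms, a nested pron overrides an inherited one, and only the first pron at each level is used. The function name is hypothetical and the sample omits the TEI namespace.

```python
import xml.etree.ElementTree as ET


def orth_readings(form: ET.Element, pron=None, usg=()) -> list:
    """Return (orth, pron, usage labels) for each orth in a nested form.

    Hypothetical helper. A pron or usg applies to all orths in its form and
    in nested forms; a nested pron overrides an inherited one.
    """
    own_pron = form.find("pron")  # only the first pron at this level
    if own_pron is not None:
        pron = own_pron.text
    usg = usg + tuple(u.text for u in form.findall("usg"))
    readings = [(o.text, pron, usg) for o in form.findall("orth")]
    for sub in form.findall("form"):
        readings.extend(orth_readings(sub, pron, usg))
    return readings


HOSPITALLER = """<form>
 <orth>hospitaller</orth>
 <form>
  <usg type="geo">U.S.</usg>
  <orth>hospitaler</orth>
 </form>
 <pron>"hQspIt@l@</pron>
</form>"""

for reading in orth_readings(ET.fromstring(HOSPITALLER)):
    print(reading)
# ('hospitaller', '"hQspIt@l@', ())
# ('hospitaler', '"hQspIt@l@', ('U.S.',))
```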

9.3.2 Grammatical Information

The gramGrp element groups grammatical information, such as part of speech, subcategorization information (e.g., syntactic patterns for verbs, count/mass distinctions for nouns), etc. It can contain any of the following elements:
  • pos (part of speech) indicates the part of speech assigned to a dictionary headword (noun, verb, adjective, etc.)
  • subc (subcategorization) contains subcategorization information (transitive/intransitive, countable/uncountable, etc.)
  • colloc (collocate) contains a collocate of the headword.

In addition, gramGrp can contain any of the morphological elements defined in section 9.3.1 Information on Written and Spoken Forms for form. Elements conveying morphological information bear different interpretations within gramGrp and form groups, the difference being that in the form group, the morphological information specified pertains to the specific alternate form in question, while within gramGrp it applies to the headword form. For example, in the entry ‘pinna ('pIn@) n., pl. -nae (-ni:) or -nas’ CED, the word defined can be either singular or plural; the ‘pl.’ specification applies only to the inflected forms provided. Compare this with ‘pants (paents) pl. n.’, where ‘pl.’ applies to the headword itself.

As noted above in section 9.3.1 Information on Written and Spoken Forms, the elements for morphological information are simply shorthand for the general-purpose gram element. Consider this entry for the French word médire:

médire v.t. ind. (de) … PLC

This entry can be tagged using specialized grammatical elements:
<form>
 <orth>médire</orth>
</form>
<gramGrp>
 <pos>v</pos>
 <subc>t ind</subc>
 <colloc type="prep">de</colloc>
</gramGrp>
Or using the gram element:
<form>
 <orth>médire</orth>
</form>
<gramGrp>
 <gram type="pos">v</gram>
 <gram type="subc">t ind</gram>
 <gram type="collocPrep">de</gram>
</gramGrp>
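Because the specialized elements are shorthand for gram, an encoding can be converted mechanically from the first style above to the second. The following Python sketch is a hypothetical normalizer: the camel-cased value for colloc (type="prep" becoming "collocPrep") simply follows the médire example rather than any fixed rule, iType is deliberately left alone because its own type attribute would otherwise be lost, and the sample omits the TEI namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical normalizer. iType is excluded because its own type
# attribute would be lost if it were rewritten as gram.
SPECIALIZED = {"pos", "subc", "colloc", "gen", "number", "case", "per", "tns", "mood"}


def gram_type(elem: ET.Element) -> str:
    """Choose the gram type value, e.g. colloc type="prep" -> 'collocPrep'."""
    sub = elem.get("type")
    if elem.tag == "colloc" and sub:
        return "colloc" + sub[0].upper() + sub[1:]
    return elem.tag


def normalize_gram(root: ET.Element) -> ET.Element:
    """Replace specialized grammatical elements with equivalent gram elements."""
    for parent in list(root.iter()):  # snapshot, since children are replaced
        for i, child in enumerate(list(parent)):
            if child.tag in SPECIALIZED:
                gram = ET.Element("gram", {"type": gram_type(child)})
                gram.text, gram.tail = child.text, child.tail
                gram.extend(list(child))
                parent[i] = gram
    return root


MEDIRE = '<gramGrp><pos>v</pos><subc>t ind</subc><colloc type="prep">de</colloc></gramGrp>'
print(ET.tostring(normalize_gram(ET.fromstring(MEDIRE)), encoding="unicode"))
# <gramGrp><gram type="pos">v</gram><gram type="subc">t ind</gram><gram type="collocPrep">de</gram></gramGrp>
```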
Like form, gramGrp can be repeated, recursively nested, or used at the sense level to show relations among elements.

isotope adj. et n. m. … DNT

<form>
 <orth>isotope</orth>
</form>
<gramGrp>
 <pos>adj</pos>
</gramGrp>
<gramGrp>
 <pos>n</pos>
 <gen>m</gen>
</gramGrp>

wits (wIts) pl. n. 1. (sometimes sing.) the ability to reason and act, esp. quickly … CED

<entry>
 <form>
  <orth>wits</orth>
  <pron>wIts</pron>
 </form>
 <gramGrp>
  <number>pl</number>
  <pos>n</pos>
 </gramGrp>
 <sense n="1">
  <gramGrp>
   <number>sometimes sing.</number>
  </gramGrp>
  <def>the ability to reason and act, esp. quickly …</def>
 </sense>
</entry>

9.3.3 Sense Information

Dictionaries may describe the meanings of words in a wide variety of different ways — by means of synonyms, paraphrases, translations into other languages, formal definitions in various highly stylized forms, etc. No attempt is made here to distinguish all the different forms which sense information may take; all of them may be tagged using the def element described in section 9.3.3.1 Definitions.

As a special case it is frequently desirable to distinguish the provision of translation equivalents in other languages from other forms of sense information; the use of <cit type="translation"> (which groups a translation equivalent with related information such as its grammatical description) for this purpose is described in section 9.3.3.2 Translation Equivalents.

9.3.3.1 Definitions

Dictionary definitions are those pieces of prose in a dictionary entry that describe the meaning of some lexical item. Most often, definitions describe the headword of the entry; in some cases, they describe translated texts, examples, etc.; see <cit type="translation">, section 9.3.3.2 Translation Equivalents, and <cit type="example">, section 9.3.5.1 Examples. The def element directly contains the text of the definition; unlike form and gramGrp, it does not serve solely to group a set of smaller elements. The close analysis of definition text, such as the tagging of hypernyms, typical objects, etc., is not covered by these Guidelines.

Definitions may occur directly within an entry; when multiple definitions are given, they are typically identified as belonging to distinct senses, as here:

demigod (…) n. 1.a. a being who is part mortal, part god. b. a lesser deity. 2. a godlike person. CP

<entry>
 <form>
  <orth>demigod</orth>
  <pron></pron>
 </form>
 <gramGrp>
  <pos>n</pos>
 </gramGrp>
 <sense n="1">
  <sense n="a">
   <def>a being who is part mortal, part god.</def>
  </sense>
  <sense n="b">
   <def>a lesser deity.</def>
  </sense>
 </sense>
 <sense n="2">
  <def>a godlike person.</def>
 </sense>
</entry>
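Since nested sense elements carry only their local n values, the compound numbering printed in the source (1.a, 1.b, 2) has to be reconstructed when the entry is displayed. The following Python sketch does this for the demigod entry above. The function name and the dotted label format are illustrative assumptions; the sketch walks only sense children, so for an entry divided into hom elements it would be called once per homograph. The sample omits the TEI namespace.

```python
import xml.etree.ElementTree as ET


def sense_labels(level: ET.Element, prefix: str = "") -> list:
    """Return (label, definition) pairs, labelling nested senses as '1.a' etc.

    Hypothetical helper: only direct sense children are walked, recursively.
    """
    labels = []
    for sense in level.findall("sense"):
        n = sense.get("n", "?")
        label = f"{prefix}.{n}" if prefix else n
        definition = sense.find("def")
        if definition is not None:
            labels.append((label, definition.text))
        labels.extend(sense_labels(sense, label))
    return labels


DEMIGOD = """<entry>
 <form><orth>demigod</orth></form>
 <gramGrp><pos>n</pos></gramGrp>
 <sense n="1">
  <sense n="a"><def>a being who is part mortal, part god.</def></sense>
  <sense n="b"><def>a lesser deity.</def></sense>
 </sense>
 <sense n="2"><def>a godlike person.</def></sense>
</entry>"""

for label, text in sense_labels(ET.fromstring(DEMIGOD)):
    print(label, text)
# 1.a a being who is part mortal, part god.
# 1.b a lesser deity.
# 2 a godlike person.
```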
In multilingual dictionaries, it is sometimes possible to distinguish translation equivalents from definitions proper; here a def element is distinguished from the translation information within which it appears.

rémoulade [Remulad] nf remoulade, rémoulade (dressing containing mustard and herbs). CR

<entry>
 <form>
  <orth>rémoulade</orth>
  <pron>Remulad</pron>
 </form>
 <gramGrp>
  <pos>n</pos>
  <gen>f</gen>
 </gramGrp>
 <cit type="translation" xml:lang="en">
  <quote>remoulade</quote>
  <quote>rémoulade</quote>
  <def>dressing containing mustard and herbs</def>
 </cit>
</entry>
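A processor can therefore separate the two kinds of content by child element. The following sketch (again Python with xml.etree.ElementTree and un-namespaced markup; split_translation is a hypothetical helper) returns the target language, the equivalents tagged quote, and the descriptive def text for each translation cit. ElementTree exposes xml:lang under its expanded name {http://www.w3.org/XML/1998/namespace}lang.

```python
import xml.etree.ElementTree as ET

# Expanded name under which ElementTree reports the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ENTRY = """
<entry>
 <form><orth>rémoulade</orth><pron>Remulad</pron></form>
 <gramGrp><pos>n</pos><gen>f</gen></gramGrp>
 <cit type="translation" xml:lang="en">
  <quote>remoulade</quote>
  <quote>rémoulade</quote>
  <def>dressing containing mustard and herbs</def>
 </cit>
</entry>
"""

def split_translation(cit):
    """Return (target language, equivalents, definitions) for one translation cit."""
    equivalents = [q.text for q in cit.findall("quote")]
    definitions = [d.text for d in cit.findall("def")]
    return cit.get(XML_LANG), equivalents, definitions

entry = ET.fromstring(ENTRY)
for cit in entry.findall("cit[@type='translation']"):
    print(split_translation(cit))
```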
9.3.3.2 Translation Equivalents

Multilingual dictionaries contain information about translations of a given word in some source language for one or more target languages. Minimally, the dictionary provides the corresponding translation in the target language; other material, such as morphological information (gender, case), various kinds of usage restrictions, etc., may also be given. If translation equivalents are to be distinguished from other kinds of sense information, they may be encoded using <cit type="translation">. The global xml:lang attribute should be used to specify the target language.

As in monolingual dictionaries, the sense element is used in multilingual dictionaries to group information (forms, grammatical information, usage, translation(s), etc.) about a given sense of a word where necessary. Information about the individual translation equivalents within a sense is grouped using <cit type="translation">. This information may include the translation text (tagged q or quote), morphological information (gen, case, etc.), usage notes (usg), translation labels (lbl), and definitions (def). When bibliographic data is provided, the quote element should be used.
  • cit (citation) a quotation from another document together with a bibliographic reference to its source
  • lbl (label) a label for the form of a word, an example, a translation, or any other kind of information, e.g. "abbreviation for", "contraction of", "literally", "approximately", "synonyms", etc.
Note how in the following example, different translation equivalents are grouped into the same or different senses, following the punctuation of the source and the usage labels:

dresser … (a) (Theat) habilleur m, -euse f; (Comm: window ~) étalagiste mf. she's a stylish ~ elle s'habille avec chic; V hair. (b) (tool) (for wood) raboteuse f; (for stone) rabotin m. CR

<entry n="1">
 <form>
  <orth>dresser</orth>
 </form>
 <sense n="a">
  <sense>
   <usg type="dom">Theat</usg>
   <cit type="translation" xml:lang="fr">
    <quote>habilleur</quote>
    <gen>m</gen>
   </cit>
   <cit type="translation" xml:lang="fr">
    <quote>-euse</quote>
    <gen>f</gen>
   </cit>
  </sense>
  <sense>
   <usg type="dom">Comm</usg>
   <form type="compound">
    <orth>window <oRef/>
    </orth>
   </form>
   <cit type="translation" xml:lang="fr">
    <quote>étalagiste</quote>
    <gen>mf</gen>
   </cit>
  </sense>
  <cit type="example">
   <quote>she's a stylish <oRef/>
   </quote>
    <cit type="translation" xml:lang="fr">
    <quote>elle s'habille avec chic</quote>
   </cit>
  </cit>
  <xr type="see">V. <ref target="#hair">hair</ref>
  </xr>
 </sense>
 <sense n="b">
  <usg type="category">tool</usg>
  <sense>
   <usg type="hint">for wood</usg>
   <cit type="translation" xml:lang="fr">
    <quote>raboteuse</quote>
    <gen>f</gen>
   </cit>
  </sense>
  <sense>
   <usg type="hint">for stone</usg>
   <cit type="translation" xml:lang="fr">
    <quote>rabotin</quote>
    <gen>m</gen>
   </cit>
  </sense>
 </sense>
</entry>
<!-- ... -->
<entry xml:id="hair">
<!-- ... -->
</entry>
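Because the translation equivalents of a sense are direct children of that sense, while the translation of an example sits inside the example's own cit, a consumer can keep the two apart by looking only at direct children. A sketch under the same assumptions as above, using an abridged copy of the entry (the compound form, sense b, and the cross-reference are omitted); equivalents is a hypothetical helper, and filtering on xml:lang follows the recommendation that this attribute name the target language.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Abridged from the dresser entry above.
ENTRY = """
<entry n="1">
 <form><orth>dresser</orth></form>
 <sense n="a">
  <sense>
   <usg type="dom">Theat</usg>
   <cit type="translation" xml:lang="fr"><quote>habilleur</quote><gen>m</gen></cit>
   <cit type="translation" xml:lang="fr"><quote>-euse</quote><gen>f</gen></cit>
  </sense>
  <sense>
   <usg type="dom">Comm</usg>
   <cit type="translation" xml:lang="fr"><quote>étalagiste</quote><gen>mf</gen></cit>
  </sense>
  <cit type="example">
   <quote>she's a stylish <oRef/></quote>
   <cit type="translation" xml:lang="fr"><quote>elle s'habille avec chic</quote></cit>
  </cit>
 </sense>
</entry>
"""

def equivalents(sense, lang):
    """List (usage labels, equivalent, gender) for translation cits that are
    direct children of this sense; translations nested in examples are skipped."""
    labels = [u.text for u in sense.findall("usg")]
    rows = []
    for cit in sense.findall("cit[@type='translation']"):
        if cit.get(XML_LANG) != lang:
            continue
        gen = cit.find("gen")
        rows.append((labels, cit.find("quote").text, gen.text if gen is not None else None))
    return rows

entry = ET.fromstring(ENTRY)
for sense in entry.iter("sense"):  # every sense, in document order
    for row in equivalents(sense, "fr"):
        print(row)
```

The outer sense a yields nothing, since its only direct cit is the example; the example's French translation is reachable through the example's cit when it is needed.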
In the following example, a distinction is made between the translation equivalent (‘OAS’) and a descriptive phrase providing further information for the user of the dictionary.

O.A.S. ... nf (abrév de Organisation de l'Armée secrète) OAS (illegal military organization supporting French rule of Algeria). CR

<entry>
 <cit type="translation" xml:lang="en">
  <quote>OAS</quote>
  <def>illegal military organization supporting French rule of
     Algeria</def>
 </cit>
</entry>
Note that <cit type="translation"> may also be used in monolingual dictionaries when a translation is given for a foreign word:

havdalah or havdoloh Hebrew. (Hebrew hAvdA"lA; Yiddish hAv"dOl@) n. Judaism. the ceremony marking the end of the sabbath or of a festival, including the blessings over wine, candles and spices. [literally: separation] CED

<entry type="foreign">
 <form>
  <orth>havdalah</orth>
  <orth>havdoloh</orth>
 </form>
 <usg type="dom">Judaism</usg>
 <def>the ceremony marking the end of the sabbath or of a festival,
   including the blessings over wine, candles and spices.</def>
 <cit type="translation" xml:lang="en">
  <note>literally</note>
  <quote>separation</quote>
 </cit>
</entry>

9.3.4 Etymological Information

The element etym marks a block of etymological information. Etymologies may contain highly structured lists of words in an order indicating their descent from each other, but often also include related words and forms outside the direct line of descent, for comparison. Not infrequently, etymologies include commentary of various sorts, and can grow into short (or long!) essays with prose-like structure. This variation in structure makes it impracticable to define tags which capture the entire intellectual structure of the etymology or record the precise interrelation of all the words mentioned. It is, however, feasible to mark some of the more obvious phrase-level elements frequently found in etymologies, using tags defined in the core module or elsewhere in this chapter. Of particular relevance for the markup of etymologies are:
  • etym (etymology) contains the etymological information in a dictionary entry
  • lang (language name) contains the name of a language mentioned in an etymological or other linguistic discussion
  • date (date) contains a date in any format
  • mentioned (mentioned) marks words or phrases that are mentioned but not used
  • gloss (gloss) identifies a phrase or word used to provide a gloss or definition for some other word or phrase
  • pron (pronunciation) contains the pronunciation(s) of the word
  • usg (usage) contains usage information in a dictionary entry
  • lbl (label) a label for the form of a word, an example, a translation, or any other kind of information, e.g. "abbreviation for", "contraction of", "literally", "approximately", "synonyms", etc.

As in other prose, individual word forms mentioned in an etymological description are tagged with mentioned elements. Pronunciations, usage labels, and glosses can be tagged using the pron, usg, and gloss elements defined elsewhere in these Guidelines. The lang element may also be used to mark a language name where it appears in the text, alongside the xml:lang attribute of the mentioned element.

Examples:

abismo m. (del gr. a priv. y byssos, fondo). Sima, gran profundidad. …

<entry>
 <form>
  <orth>abismo</orth>
 </form>
 <etym>del <lang>gr.</lang>
  <mentioned>a</mentioned> priv. y <mentioned>byssos</mentioned>,
 <gloss>fondo</gloss>
 </etym>
</entry>

neume\'n(y)üm\ n [F, fr. ML pneuma, neuma, fr. Gk pneuma breath — more at pneumatic]: any of various symbols used in the notation of Gregorian chant … [WNC]

<entry>
 <etym>
  <lang>F</lang> fr. <lang>ML</lang>
  <mentioned>pneuma</mentioned>
  <mentioned>neuma</mentioned> fr. <lang>Gk</lang>
  <mentioned>pneuma</mentioned>
  <gloss>breath</gloss>
  <xr type="etym">more at <ptr target="#pneumatic"/>
  </xr>
 </etym>
 <def>any of various symbols … </def>
</entry>
<!-- ... -->
<entry xml:id="pneumatic">
<!-- ... -->
</entry>
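As noted above, these tags do not record how the mentioned words relate to one another. If an application nonetheless wants a rough list of etyma, one heuristic (an assumption of this sketch, not something the markup guarantees) is to pair each mentioned form with the nearest preceding lang element in document order. Applied to the neume etymology, with the hypothetical helper mentioned_with_language:

```python
import xml.etree.ElementTree as ET

ENTRY = """
<entry>
 <etym>
  <lang>F</lang> fr. <lang>ML</lang>
  <mentioned>pneuma</mentioned>
  <mentioned>neuma</mentioned> fr. <lang>Gk</lang>
  <mentioned>pneuma</mentioned>
  <gloss>breath</gloss>
  <xr type="etym">more at <ptr target="#pneumatic"/></xr>
 </etym>
 <def>any of various symbols … </def>
</entry>
"""

def mentioned_with_language(etym):
    """Pair each <mentioned> child with the nearest preceding <lang> child.
    Heuristic only: it relies on document order, not on any encoded relation."""
    pairs, current_lang = [], None
    for child in etym:
        if child.tag == "lang":
            current_lang = child.text
        elif child.tag == "mentioned":
            pairs.append((current_lang, child.text))
    return pairs

entry = ET.fromstring(ENTRY)
etym = entry.find("etym")
print(mentioned_with_language(etym))
print([g.text for g in etym.iter("gloss")])
```

The heuristic happens to work here, but it would misassign languages in etymologies that list cognates or name the language after the form, so its output is best treated as a hint rather than a parse.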

9.3.5 Other Information

9.3.5.1 Examples

Dictionaries typically include examples of word use, usually accompanying definitions or translations. In some cases, the examples are quotations from another source, and are occasionally followed by a citation to the author.

The <cit type="example"> element contains usage examples and associated information; the example text itself should be enclosed in a q or quote element. The cit element associates a quotation with a bibliographic reference to its source.
  • q (quoted speech, thought, or writing) contains text marked as a quotation (apparently) from elsewhere; in narrative it is used to mark direct or indirect speech; in dictionaries it may be used to mark real or contrived examples of usage; and in manuscript descriptions or other kinds of metadata it marks quoted extracts from the source being described
  • quote (quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text
  • cit (citation) a quotation from another document together with a bibliographic reference to its source

Examples frequently abbreviate the headword, and so their transcription will frequently make use of the oRef or oVar elements described below in section 9.4 Headword and Pronunciation References.
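When such examples are rendered for display, the abbreviated headword usually has to be restored. The sketch below assumes that each empty oRef stands for the full headword and, following the S~ in the some example later in this section, treats type="cap" as a request for an initial capital; render is a hypothetical function, and section 9.4 remains the authority on what oRef means.

```python
import xml.etree.ElementTree as ET

def render(elem, headword):
    """Return elem's text with each <oRef/> replaced by the headword.
    Assumption: type="cap" means the headword should be capitalized."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "oRef":
            if child.get("type") == "cap":
                parts.append(headword[:1].upper() + headword[1:])
            else:
                parts.append(headword)
        else:
            # Other inline elements: keep their rendered text content.
            parts.append(render(child, headword))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)

# The example from the dresser entry in section 9.3.3.2.
entry = ET.fromstring("""
<entry>
 <form><orth>dresser</orth></form>
 <cit type="example"><quote>she's a stylish <oRef/></quote></cit>
</entry>
""")
headword = entry.findtext("form/orth")
for quote in entry.iter("quote"):
    print(render(quote, headword))  # prints: she's a stylish dresser
```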

Examples:

multiplex /…/ adj tech having many parts: the multiplex eye of the fly. LDOCE

<quote>the multiplex eye of the fly.</quote>
Or, where a fuller representation of the example is wanted:
<cit type="example">
 <quote>the multiplex eye of the fly.</quote>
</cit>
As the following example shows, cit can also contain elements such as pron, def, etc.

some … 4. (S~ and any are used with more): Give me ~ more /s@'mO:(r)/ OALD

<sense n="4">
 <usg type="colloc">
  <oRef type="cap"/> and <mentioned>any</