18 Feature Structures
Table of contents
- 18.1 Organization of this Chapter
- 18.2 Elementary Feature Structures and the Binary Feature Value
- 18.3 Other Atomic Feature Values
- 18.4 Feature and Feature-Value Libraries
- 18.5 Feature Structures as Complex Feature Values
- 18.6 Re-entrant Feature Structures
- 18.7 Collections as Complex Feature Values
- 18.8 Feature Value Expressions
- 18.9 Default Values
- 18.10 Linking Text and Analysis
- 18.11 Feature System Declaration
- 18.12 Formal Definition and Implementation
A feature structure is a general-purpose data structure which identifies and groups together individual features, each of which associates a name with one or more values. Because of their generality, feature structures can be used to represent many different kinds of information, but they are particularly useful in the representation of linguistic analyses, especially where such analyses are partial or underspecified. Feature structures represent the interrelations among various pieces of information, and their instantiation in markup provides a metalanguage for the generic representation of analyses and interpretations. Moreover, this instantiation allows feature values to be of specific types, and allows restrictions to be placed on the values for particular features, by means of feature system declarations.
18.1 Organization of this Chapter
This chapter is organized as follows. Following this introduction, section 18.2 Elementary Feature Structures and the Binary Feature Value introduces the elements fs and f, used to represent feature structures and features respectively, together with the elementary binary feature value. Section 18.3 Other Atomic Feature Values introduces elements for representing other kinds of atomic feature values such as symbolic, numeric, and string values. Section 18.4 Feature and Feature-Value Libraries introduces the notion of predefined libraries or groups of features or feature values along with methods for referencing their components. Section 18.5 Feature Structures as Complex Feature Values introduces complex values, in particular feature structures as values, thus enabling feature structures to be recursively defined. Section 18.6 Re-entrant Feature Structures introduces the vLabel element, used to represent structure-sharing. Section 18.7 Collections as Complex Feature Values discusses other complex values, in particular values which are collections, organized as sets, bags, and lists. Section 18.8 Feature Value Expressions discusses how the operations of alternation, negation, and collection of feature values may be represented. Section 18.9 Default Values discusses ways of representing underspecified, default, or uncertain values. Section 18.10 Linking Text and Analysis discusses how analyses may be linked to other parts of an encoded text. Section 18.11 Feature System Declaration describes the feature system declaration, a construct which provides for the validation of typed feature structures. Formal definitions for all the elements introduced in this chapter are provided in section 18.12 Formal Definition and Implementation.
18.2 Elementary Feature Structures and the Binary Feature Value
The fundamental elements used to represent a feature structure analysis are f (for feature), which represents a feature-value pair, and fs (for feature structure), which represents a structure made up of such feature-value pairs. The fs element has an optional type attribute which may be used to represent typed feature structures, and may contain any number of f elements. An f element has a required name attribute and an associated value. The value may be simple: that is, a single binary, numeric, symbolic (i.e. taken from a restricted set of legal values), or string value, or a collection of such values, organized in various ways, for example, as a list; or it may be complex, that is, it may itself be a feature structure, thus providing a degree of recursion. Values may be under-specified or defaulted in various ways. These possibilities are all described in more detail in this and the following sections.
Feature and feature-value representations (including feature structure representations) may be embedded directly at any point in an XML document, or they may be collected together in special-purpose feature or feature-value libraries. The components of such libraries may then be referenced from other feature or feature-value representations, using the feats or fVal attribute as appropriate.
- fs (feature structure) represents a feature structure, that is, a collection of feature-value pairs organized as a structural unit.
  - type specifies the type of the feature structure.
  - feats (features) references the feature-value specifications making up this feature structure.
- f (feature) represents a feature value specification, that is, the association of a name with a value of any of several different types.
  - name provides a name for the feature.
  - fVal (feature value) references any element which can be used to represent the value of a feature.
- binary/ (binary value) represents the value part of a feature-value specification which can contain either of exactly two possible values.
+---          ---+
| consonantal  + |
| vocalic      - |
| voiced       - |
| anterior     + |
| coronal      + |
| continuant   + |
| strident     + |
+---          ---+
<fs>
<f name="consonantal">
<binary value="true"/>
</f>
<f name="vocalic">
<binary value="false"/>
</f>
<f name="voiced">
<binary value="false"/>
</f>
<f name="anterior">
<binary value="true"/>
</f>
<f name="coronal">
<binary value="true"/>
</f>
<f name="continuant">
<binary value="true"/>
</f>
<f name="strident">
<binary value="true"/>
</f>
</fs>
The restriction of specific features to specific types of values (e.g. the restriction of the feature strident to a binary value) requires additional validation, as does any restriction on the features available within a feature structure of a particular type (e.g. whether a feature structure of type phonological segment necessarily contains a feature voiced). Such validation may be carried out at the document level, using special purpose processing, at the schema level using additional validation rules, or at the declarative level, using an additional mechanism such as the feature-system declaration discussed in 18.11 Feature System Declaration.
Although we have used the term binary for this kind of value, and its representation in XML uses values such as true and false (or, equivalently, 1 and 0), it should be noted that such values are not restricted to propositional assertions. As this example shows, this kind of value is intended for use with any binary-valued feature.
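The mapping from binary elements to truth values can be sketched in a few lines of Python. This is only an illustration, not part of the encoding scheme: the function name binary_features is hypothetical, the fragment is assumed to be un-namespaced, and every f is assumed to contain exactly one binary child.

```python
import xml.etree.ElementTree as ET

# Lexical forms accepted for a binary value, as stated in the text above.
TRUE_FORMS = {"true", "1"}
FALSE_FORMS = {"false", "0"}

def binary_features(fs_xml: str) -> dict[str, bool]:
    """Map each feature name to a bool for an fs of binary-valued features."""
    fs = ET.fromstring(fs_xml)
    result = {}
    for f in fs.findall("f"):
        # Assumption of this sketch: each f holds exactly one binary element.
        value = f.find("binary").get("value")
        if value in TRUE_FORMS:
            result[f.get("name")] = True
        elif value in FALSE_FORMS:
            result[f.get("name")] = False
        else:
            raise ValueError(f"not a binary value: {value!r}")
    return result

example = """
<fs>
  <f name="consonantal"><binary value="true"/></f>
  <f name="vocalic"><binary value="0"/></f>
  <f name="voiced"><binary value="false"/></f>
</fs>
"""
features = binary_features(example)
```

Here "0" and "false" are treated as the same value, so the features vocalic and voiced both come out as False.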
18.3 Other Atomic Feature Values
- symbol/ (symbolic value) represents the value part of a feature-value specification which contains one of a finite list of symbols.
  - value supplies the symbolic value for the feature, one of a finite list that may be specified in a feature declaration.
- numeric/ (numeric value) represents the value part of a feature-value specification which contains a numeric value or range.
- string (string value) represents the value part of a feature-value specification which contains a string.
<fs>
<f name="case">
<symbol value="accusative"/>
</f>
<f name="gender">
<symbol value="feminine"/>
</f>
<f name="number">
<symbol value="plural"/>
</f>
</fs>
<fs>
<f name="case">
<symbol value="accusative"/>
</f>
<f name="gender">
<symbol value="feminine"/>
</f>
<f name="singular">
<binary value="false"/>
</f>
</fs>
<fs>
<f name="address">
<string>3418 East Third Street</string>
</f>
</fs>
<fs>
<f name="houseNumber">
<numeric value="3418"/>
</f>
<f name="streetName">
<string>East Third Street</string>
</f>
</fs>
<fs>
<f name="houseNumber">
<numeric value="3418" max="3440"/>
</f>
<f name="streetName">
<string>East Third Street</string>
</f>
</fs>
<fs>
<f name="dailyRainFall">
<numeric value="0.0" max="1.3" trunc="false"/>
</f>
</fs>
<fs>
<f name="dailyRainFall">
<numeric value="0.0" max="1.3" trunc="true"/>
</f>
</fs>
As noted above, additional processing is necessary to ensure that appropriate values are supplied for particular features, for example to ensure that the feature singular is not given a value such as <symbol value="feminine"/>. There are two ways of attempting to ensure that only certain combinations of feature names and values are used. First, if the total number of legal combinations is relatively small, one can predefine all of them in a construct known as a feature library, and then reference the combination required using the feats attribute in the enclosing fs element, rather than give it explicitly. This method is suitable in the situation described above, since it requires specifying a total of only ten (5 + 3 + 2) combinations of features and values. Similarly, to ensure that only feature structures containing valid combinations of feature values are used, one can put definitions for all valid feature structures inside a feature value library (so called, since a feature structure may be the value of a feature). A total of 30 feature structures (5 × 3 × 2) is required to enumerate all the possible combinations of individual case, gender and number values in the preceding illustration. We discuss the use of such libraries and their representation in XML further in section 18.4 Feature and Feature-Value Libraries below.
However, the most general method of attempting to ensure that only legal combinations of feature names and values are used is to provide a feature-system declaration, as discussed in 18.11 Feature System Declaration.
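The two library sizes quoted above (ten features against thirty feature structures) can be checked with a short Python sketch. The value inventories below are hypothetical placeholders: the text fixes only their sizes (five cases, three genders, two numbers), not their members.

```python
from itertools import product

# Hypothetical value inventories: only the sizes (5, 3 and 2) come from the text.
CASES = ["nominative", "genitive", "dative", "accusative", "ablative"]
GENDERS = ["masculine", "feminine", "neuter"]
NUMBERS = ["singular", "plural"]

# A feature library needs one entry per feature-value pair ...
feature_library = (
    [("case", v) for v in CASES]
    + [("gender", v) for v in GENDERS]
    + [("number", v) for v in NUMBERS]
)

# ... whereas a feature-value library needs one entry per full combination.
feature_value_library = list(product(CASES, GENDERS, NUMBERS))

print(len(feature_library))        # 5 + 3 + 2
print(len(feature_value_library))  # 5 * 3 * 2
```

The additive count grows slowly as features are added, while the multiplicative count grows quickly, which is one reason a declarative mechanism scales better than exhaustive enumeration.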
18.4 Feature and Feature-Value Libraries
<fLib>
<f xml:id="CNS1" name="consonantal">
<binary value="true"/>
</f>
<f xml:id="CNS0" name="consonantal">
<binary value="false"/>
</f>
<f xml:id="VOC1" name="vocalic">
<binary value="true"/>
</f>
<f xml:id="VOC0" name="vocalic">
<binary value="false"/>
</f>
<f xml:id="VOI1" name="voiced">
<binary value="true"/>
</f>
<f xml:id="VOI0" name="voiced">
<binary value="false"/>
</f>
<f xml:id="ANT1" name="anterior">
<binary value="true"/>
</f>
<f xml:id="ANT0" name="anterior">
<binary value="false"/>
</f>
<f xml:id="COR1" name="coronal">
<binary value="true"/>
</f>
<f xml:id="COR0" name="coronal">
<binary value="false"/>
</f>
<f xml:id="CNT1" name="continuant">
<binary value="true"/>
</f>
<f xml:id="CNT0" name="continuant">
<binary value="false"/>
</f>
<f xml:id="STR1" name="strident">
<binary value="true"/>
</f>
<f xml:id="STR0" name="strident">
<binary value="false"/>
</f>
<!-- ... -->
</fLib>
<fs
feats="#CNS1 #VOC0 #VOI0 #ANT1 #COR1 #CNT0 #STR0"/>
<fs
feats="#CNS1 #VOC0 #VOI1 #ANT1 #COR1 #CNT0 #STR0"/>
<fs
feats="#CNS1 #VOC0 #VOI0 #ANT1 #COR1 #CNT1 #STR1"/>
<fs
feats="#CNS1 #VOC0 #VOI1 #ANT1 #COR1 #CNT1 #STR1"/>
<!-- ... -->
<fvLib>
<fs
xml:id="T.DF"
feats="#CNS1 #VOC0 #VOI0 #ANT1 #COR1 #CNT0 #STR0"/>
<fs
xml:id="D.DF"
feats="#CNS1 #VOC0 #VOI1 #ANT1 #COR1 #CNT0 #STR0"/>
<fs
xml:id="S.DF"
feats="#CNS1 #VOC0 #VOI0 #ANT1 #COR1 #CNT1 #STR1"/>
<fs
xml:id="Z.DF"
feats="#CNS1 #VOC0 #VOI1 #ANT1 #COR1 #CNT1 #STR1"/>
<!-- ... -->
</fvLib>
Feature structures stored in this way may also be associated with the text which they are intended to annotate, either by a link from the text (for example, using the TEI global ana attribute), or by means of standoff annotation techniques (for example, using the TEI link element): see further section 18.10 Linking Text and Analysis below.
Note that when features or feature structures are linked to in this way, the effect is as if a copy of the linked item had been placed at the point from which it is linked. This form of linking should be distinguished from the phenomenon of structure-sharing, where it is desired to indicate that some part of an annotation structure appears simultaneously in two or more places within the structure. This kind of annotation should be represented using the vLabel element, as discussed in 18.6 Re-entrant Feature Structures below.
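The copy semantics of the feats attribute described above can be illustrated with the following Python sketch. It is not a reference implementation: the function name expand_feats is hypothetical, the fragments are assumed to be un-namespaced, and all pointers are assumed to be local ones of the form #id.

```python
import copy
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

LIBRARY = """
<fLib>
  <f xml:id="CNS1" name="consonantal"><binary value="true"/></f>
  <f xml:id="VOC0" name="vocalic"><binary value="false"/></f>
  <f xml:id="VOI0" name="voiced"><binary value="false"/></f>
  <f xml:id="VOI1" name="voiced"><binary value="true"/></f>
</fLib>
"""

def expand_feats(fs: ET.Element, library: ET.Element) -> ET.Element:
    """Return a copy of fs in which each feats pointer is replaced by a copy of its target."""
    by_id = {f.get(XML_ID): f for f in library.iter("f")}
    expanded = copy.deepcopy(fs)
    for pointer in expanded.attrib.pop("feats", "").split():
        target = by_id[pointer.lstrip("#")]   # KeyError if the pointer dangles
        clone = copy.deepcopy(target)
        del clone.attrib[XML_ID]              # avoid duplicate xml:id values
        expanded.append(clone)
    return expanded

library = ET.fromstring(LIBRARY)
t = expand_feats(ET.fromstring('<fs feats="#CNS1 #VOC0 #VOI0"/>'), library)
d = expand_feats(ET.fromstring('<fs feats="#CNS1 #VOC0 #VOI1"/>'), library)
names_t = [(f.get("name"), f.find("binary").get("value")) for f in t]
names_d = [(f.get("name"), f.find("binary").get("value")) for f in d]
```

Because each pointer yields an independent copy, changing the expanded structure t has no effect on d or on the library. That is the behaviour the text contrasts with structure-sharing.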
18.5 Feature Structures as Complex Feature Values
Features may have complex values as well as atomic ones; the simplest such complex value is represented by supplying a fs element as the content of an f element, or (equivalently) by supplying the identifier of an fs element as the value for the fVal attribute on the f element. Structures may be nested as deeply as appropriate, using this mechanism. For example, an fs element may contain or point to an f element, which may contain or point to an fs element, which may contain or point to an f element, and so on.
<fs>
<f name="surface">
<string>love</string>
</f>
<f name="syntax">
<fs type="category">
<f name="pos">
<symbol value="verb"/>
</f>
<f name="val">
<symbol value="transitive"/>
</f>
</fs>
</f>
<f name="semantics">
<fs type="act">
<f name="rel">
<symbol value="LOVE"/>
</f>
</fs>
</f>
</fs>
<fvLib>
<!-- ... -->
<fs xml:id="N" type="noun">
<!-- noun features defined here -->
</fs>
<fs xml:id="V" type="verb">
<!-- verb features defined here -->
</fs>
</fvLib>
<fs xml:id="ADJ" type="adjective" feats="#F1 #F2"/>
<fs xml:id="PREP" type="preposition" feats="#F1 #F3"/>
<!-- ... -->
<fLib>
<f xml:id="NN-1" name="nominal">
<binary value="true"/>
</f>
<f xml:id="NN-0" name="nominal">
<binary value="false"/>
</f>
<f xml:id="VV-1" name="verbal">
<binary value="true"/>
</f>
<f xml:id="VV-0" name="verbal">
<binary value="false"/>
</f>
<!-- ... -->
</fLib>
<fs>
<f name="surface">
<string>love</string>
</f>
<f name="syntax">
<fs type="category">
<f name="pos" fVal="#V"/>
<f name="val" fVal="#TRNS"/>
</fs>
</f>
<f name="semantics">
<fs type="act">
<f name="rel" fVal="#LOVE"/>
</fs>
</f>
</fs>
Although in principle the fVal attribute could point to any kind of feature value, its use is not recommended for simple atomic values.
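The recursive nesting described in this section maps naturally onto nested dictionaries. The following Python sketch assumes un-namespaced fragments and exactly one value per f. The function names are hypothetical, and fVal pointers are deliberately left unresolved rather than looked up in a library.

```python
import xml.etree.ElementTree as ET

def value_of(f: ET.Element):
    """Convert the value of one f element; nested fs elements recurse."""
    if f.get("fVal") is not None:
        return {"ref": f.get("fVal")}   # pointer left unresolved in this sketch
    child = f[0]
    if child.tag == "fs":
        return fs_to_dict(child)
    if child.tag == "string":
        return child.text
    return child.get("value")           # symbol, numeric, binary

def fs_to_dict(fs: ET.Element) -> dict:
    """Convert an fs element into a dictionary keyed by feature name."""
    return {f.get("name"): value_of(f) for f in fs.findall("f")}

entry = fs_to_dict(ET.fromstring("""
<fs>
  <f name="surface"><string>love</string></f>
  <f name="syntax">
    <fs type="category">
      <f name="pos"><symbol value="verb"/></f>
      <f name="val" fVal="#TRNS"/>
    </fs>
  </f>
</fs>
"""))
```

The example mixes an inline symbol value with an fVal pointer in the same structure, so both ways of supplying a value are visible.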
18.6 Re-entrant Feature Structures
- vLabel (value label) represents the value part of a feature-value specification which appears at more than one point in a feature structure.
<fs>
<f name="nominal">
<fs>
<f name="nm-num">
<vLabel name="L1">
<symbol value="singular"/>
</vLabel>
</f>
<!-- other nominal features -->
</fs>
</f>
<f name="verbal">
<fs>
<f name="vb-num">
<vLabel name="L1"/>
</f>
</fs>
<!-- other verbal features -->
</f>
</fs>
In the above encoding, the features named vb-num and nm-num exhibit structure sharing. Their values, given as vLabel elements, are understood to be references to the same point in the feature structure, which is labelled by their name attribute.
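The difference between structure-sharing and copying can be made concrete in Python by letting every occurrence of a label resolve to one and the same object. This is a sketch under stated assumptions: the function names are hypothetical, atomic values are boxed in one-element lists purely so that sharing is observable, and the vLabel that carries the value is assumed to precede any empty vLabel that refers to it.

```python
import xml.etree.ElementTree as ET

def build(fs: ET.Element, labels: dict) -> dict:
    """Convert an fs to nested dicts, giving equally labelled values one shared object."""
    out = {}
    for f in fs.findall("f"):
        out[f.get("name")] = value(f[0], labels)
    return out

def value(node: ET.Element, labels: dict):
    if node.tag == "fs":
        return build(node, labels)
    if node.tag == "vLabel":
        name = node.get("name")
        if len(node):                   # the labelled value is supplied here
            labels[name] = value(node[0], labels)
        return labels[name]             # same object wherever the label recurs
    return [node.get("value")]          # boxed so that identity is observable

doc = ET.fromstring("""
<fs>
  <f name="nominal"><fs><f name="nm-num">
    <vLabel name="L1"><symbol value="singular"/></vLabel>
  </f></fs></f>
  <f name="verbal"><fs><f name="vb-num">
    <vLabel name="L1"/>
  </f></fs></f>
</fs>
""")
shared = build(doc, {})
nm = shared["nominal"]["nm-num"]
vb = shared["verbal"]["vb-num"]
```

Since nm and vb are the same object, updating one (for example nm[0] = "plural") is immediately visible through the other, which would not be the case for two copies.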
18.7 Collections as Complex Feature Values
- vColl (collection of values) represents the value part of a feature-value specification which contains multiple values organized as a set, bag, or list.
A feature whose value is regarded as a set, bag, or list may have any positive number of values as its content, or none at all (thus allowing for representation of the empty set, bag, or list). The items in a list are ordered, and need not be distinct. The items in a set are not ordered, and must be distinct. The items in a bag are not ordered, and need not be distinct. Sets and bags are thus distinguished from lists in that the order in which the values are specified does not matter for the former, but does matter for the latter, while sets are distinguished from bags and lists in that repetitions of values do not count for the former but do count for the latter.
If no value is specified for the org attribute, the assumption is that the vColl defines a list of values. If the vColl element is empty, the assumption is that it represents the null list, set, or bag.
<fs>
<f name="forenames">
<vColl>
<string>Daniel</string>
<string>Edouard</string>
</vColl>
</f>
<f name="mother" fVal="#p002"/>
<f name="father" fVal="#p009"/>
<f name="birthDate">
<fs type="date" feats="#y1988 #m04 #d17"/>
</f>
<f name="birthPlace" fVal="#austintx"/>
<f name="siblings">
<vColl org="set">
<fs copyOf="#pnb005"/>
<fs copyOf="#prb001"/>
</vColl>
</f>
</fs>
In this example, the vColl element is first used to supply a list of ‘name’ feature values, which together constitute the ‘forenames’ feature. Other features are defined by reference to values which we assume are held in some external feature value library (not shown here). The vColl element is then used a second time to indicate that the person's siblings should be regarded as constituting a set rather than a list. Each sibling is represented by a feature structure: in this example, each feature structure is a copy of one specified in the feature value library.
<fs>
<f name="category">
<symbol value="verb"/>
</f>
<f name="tense">
<symbol value="present"/>
</f>
<f name="agreement">
<fs>
<f name="person">
<symbol value="third"/>
</f>
<f name="number">
<symbol value="singular"/>
</f>
</fs>
</f>
</fs>
<fs>
<f name="category">
<symbol value="verb"/>
</f>
<f name="tense">
<symbol value="present"/>
</f>
<f name="agreement">
<vColl org="set">
<symbol value="third"/>
<symbol value="singular"/>
</vColl>
</f>
</fs>
<fs>
<f name="lex">
<symbol value="auxquels"/>
</f>
<f name="maf">
<vColl org="list">
<fs>
<f name="cat">
<symbol value="prep"/>
</f>
</fs>
<fs>
<f name="cat">
<symbol value="pronoun"/>
</f>
<f name="kind">
<symbol value="rel"/>
</f>
<f name="num">
<symbol value="pl"/>
</f>
<f name="gender">
<symbol value="masc"/>
</f>
</fs>
</vColl>
</f>
</fs>
The set, bag, or list which has no members is known as the null (or empty) set, bag, or list. A vColl element with no content and with no value for its feats attribute is interpreted as referring to the null set, bag, or list, depending on the value of its org attribute.
<vColl org="set"/>
</f>
A vColl element may also collect together one or more other vColl elements if, for example, one of the members of a set is itself a set, or if two lists are concatenated together. Note that such collections pay no attention to the contents of the nested vColl elements: if it is desired to produce the union of two sets, the vMerge element discussed below should be used to make a new collection from the two sets.
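The distinctions drawn in this section between lists, sets, and bags correspond closely to Python's list, frozenset, and Counter types. The sketch below assumes un-namespaced vColl fragments containing only symbol values. The function name collection is hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def collection(vcoll_xml: str):
    """Model a vColl of symbol values according to its org attribute."""
    vcoll = ET.fromstring(vcoll_xml)
    items = [s.get("value") for s in vcoll.findall("symbol")]
    org = vcoll.get("org", "list")      # a list is assumed when org is absent
    if org == "set":
        return frozenset(items)         # unordered, repetitions do not count
    if org == "bag":
        return Counter(items)           # unordered, repetitions count
    return items                        # ordered, repetitions count

def vcoll(org: str, *values: str) -> str:
    """Helper (hypothetical) that serializes a vColl for the comparisons below."""
    body = "".join(f'<symbol value="{v}"/>' for v in values)
    return f'<vColl org="{org}">{body}</vColl>'

same_set = collection(vcoll("set", "a", "b")) == collection(vcoll("set", "b", "a", "a"))
same_bag = collection(vcoll("bag", "a", "b")) == collection(vcoll("bag", "b", "a", "a"))
same_list = collection(vcoll("list", "a", "b")) == collection(vcoll("list", "b", "a"))
empty_set = collection('<vColl org="set"/>')
```

Only the set comparison succeeds: the two bags differ in the number of occurrences of "a", and the two lists differ in order.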
18.8 Feature Value Expressions
- vAlt (value alternation) represents the value part of a feature-value specification which contains a set of values, only one of which can be valid.
- vNot (value negation) represents a feature value which is the negation of its content.
- vMerge (merged collection of values) represents a feature value which is the result of merging together the feature values contained by its children, using the organization specified by the org attribute.
18.8.1 Alternation
<numeric value="2" max="3"/>
</f>
<vAlt>
<numeric value="2"/>
<numeric value="3"/>
</vAlt>
</f>
<vAlt>
<fs>
<f name="number.of.bathrooms">
<numeric value="2"/>
</f>
</fs>
<fs>
<f name="number.of.bedrooms">
<numeric value="2"/>
</f>
</fs>
</vAlt>
</f>
<vAlt>
<fs>
<f name="number.of.bathrooms">
<numeric value="2"/>
</f>
</fs>
<fs>
<f name="number.of.bedrooms">
<numeric value="2"/>
</f>
</fs>
<vColl>
<fs>
<f name="number.of.bathrooms">
<numeric value="2"/>
</f>
</fs>
<fs>
<f name="number.of.bedrooms">
<numeric value="2"/>
</f>
</fs>
</vColl>
</vAlt>
</f>
<fs>
<f name="selling.points">
<vColl org="set">
<string>alarm system</string>
<string>good view</string>
<vAlt>
<string>pool</string>
<string>jacuzzi</string>
</vAlt>
</vColl>
</f>
</fs>
<fs>
<f name="selling.points">
<vColl org="set">
<vAlt>
<string>alarm system</string>
<string>good view</string>
</vAlt>
<vAlt>
<string>pool</string>
<string>jacuzzi</string>
</vAlt>
</vColl>
</f>
</fs>
If a large number of ambiguities or uncertainties need to be represented, involving a relatively small number of features and values, it is recommended that a stand-off technique, for example using the general-purpose alt element discussed in section 16.8 Alternation, be used rather than the special-purpose vAlt element.
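One way to see what a collection containing vAlt elements denotes is to enumerate every reading it allows, choosing exactly one alternative from each alternation. The Python sketch below does this for the two selling.points encodings shown earlier in this section. The function name readings is hypothetical, and only string values are handled.

```python
from itertools import product
import xml.etree.ElementTree as ET

def readings(vcoll: ET.Element) -> list[frozenset]:
    """Enumerate the sets denoted by a vColl whose members may be vAlt alternations."""
    choices = []
    for member in vcoll:
        if member.tag == "vAlt":
            choices.append([alt.text for alt in member])   # exactly one alternative holds
        else:
            choices.append([member.text])
    return [frozenset(combo) for combo in product(*choices)]

first = ET.fromstring("""
<vColl org="set">
  <string>alarm system</string>
  <string>good view</string>
  <vAlt><string>pool</string><string>jacuzzi</string></vAlt>
</vColl>
""")
second = ET.fromstring("""
<vColl org="set">
  <vAlt><string>alarm system</string><string>good view</string></vAlt>
  <vAlt><string>pool</string><string>jacuzzi</string></vAlt>
</vColl>
""")
first_readings = readings(first)
second_readings = readings(second)
```

The first encoding admits two readings of three members each, while the second admits four readings of two members each. The number of readings is the product of the sizes of the alternations, which is why heavily ambiguous analyses become unwieldy when encoded inline.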
18.8.2 Negation
<vNot>
<numeric value="2"/>
</vNot>
</f>
<vNot>
<symbol value="genitive"/>
</vNot>
</f>
(ii)
<f name="case">
<vAlt>
<symbol value="nominative"/>
<symbol value="dative"/>
<symbol value="accusative"/>
</vAlt>
</f>
If, however, no such feature-system declaration is available, all that one can say about a feature specified via negation is that its value is something other than the negated value.
Negation is always applied to a feature value, rather than to a feature-value pair. The negation of an atomic value is the set of all other values which are possible for the feature.
Any kind of value can be negated, including collections (represented by vColl elements) or feature structures (represented by fs elements). The negation of any complex value is understood to be the set of values which cannot be unified with it. Thus, for example, the negation of the feature structure F is understood to be the set of feature structures which are not unifiable with F. In the absence of a constraint mechanism such as the Feature System Declaration, the negation of a collection is anything that is not unifiable with it, including collections of different types and atomic values. It will generally be more useful to require that the organization of the negated value be the same as that of the original value, for example that a negated set is understood to mean the set which is a complement of the set, but such a requirement cannot be enforced in the absence of a constraint mechanism.
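For atomic values, the rule that a negated value denotes all the other values possible for the feature amounts to a set difference. The Python sketch below assumes a hypothetical declaration in which the feature case has exactly the four values that appear in the examples above. The names DECLARED and negate are illustrative only.

```python
# Hypothetical declaration: the values permitted for the feature "case".
DECLARED = {"case": {"nominative", "genitive", "dative", "accusative"}}

def negate(feature: str, value: str) -> set[str]:
    """Return the values a negated atomic value stands for, given the declaration."""
    return DECLARED[feature] - {value}

not_genitive = negate("case", "genitive")
```

Under that assumption, negating genitive yields the same three values as the explicit alternation of nominative, dative, and accusative. Without a declaration there is no set to subtract from, and the negation cannot be expanded in this way.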
18.8.3 Collection of Values
The vMerge element can be used wherever a feature value can appear. It contains two or more feature values, all of which are to be collected together. The organization of the resulting collection is specified by the value of the org attribute, which need not necessarily be the same as that of its constituent values if these are collections. For example, one can change a list to a set, or vice versa.
<fs>
<f name="genders">
<vColl org="set">
<symbol value="masculine"/>
<symbol value="feminine"/>
</vColl>
</f>
</fs>
<fs>
<f name="genders">
<vMerge org="list">
<vColl org="set">
<symbol value="masculine"/>
<symbol value="feminine"/>
</vColl>
<symbol value="neuter"/>
</vMerge>
</f>
</fs>
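The effect of vMerge in the preceding example can be sketched in Python as a flattening step followed by reorganization according to the org attribute. The function name merged_values is hypothetical, only symbol values and one level of vColl nesting are handled, and members of a source set are taken in document order (an assumption, since a set has no inherent order).

```python
import xml.etree.ElementTree as ET

def merged_values(vmerge: ET.Element) -> list[str]:
    """Collect symbol values from a vMerge, flattening any vColl children."""
    values = []
    for child in vmerge:
        if child.tag == "vColl":
            values.extend(s.get("value") for s in child.findall("symbol"))
        else:
            values.append(child.get("value"))
    if vmerge.get("org") == "set":
        values = list(dict.fromkeys(values))   # drop repetitions, keep first occurrence
    return values

example = ET.fromstring("""
<vMerge org="list">
  <vColl org="set"><symbol value="masculine"/><symbol value="feminine"/></vColl>
  <symbol value="neuter"/>
</vMerge>
""")
genders = merged_values(example)

union = merged_values(ET.fromstring(
    '<vMerge org="set">'
    '<vColl org="set"><symbol value="a"/><symbol value="b"/></vColl>'
    '<vColl org="set"><symbol value="b"/><symbol value="c"/></vColl>'
    '</vMerge>'
))
```

The second call shows the union of two sets: the value shared by both source sets appears only once in the result.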
18.9 Default Values
- default/ (default feature value) represents the value part of a feature-value specification which contains a defaulted value.
<f name="gender">
<vAlt>
<symbol value="feminine"/>
<symbol value="masculine"/>
<symbol value="neuter"/>
</vAlt>
</f>
<default/>
</f>
<symbol value="neuter"/>
</f>
<vNot>
<default/>
</vNot>
</f>
<vAlt>
<symbol value="feminine"/>
<symbol value="masculine"/>
</vAlt>
</f>
18.10 Linking Text and Analysis
<s>
<w ana="#at0">The</w>
<w ana="#ajs">closest</w>
<w ana="#pnp">he</w>
<w ana="#vvd">came</w>
<w ana="#prp">to</w>
<w ana="#nn1">exercise</w>
<w ana="#vbd">was</w>
<w ana="#to0">to</w>
<w ana="#vvi">open</w>
<w ana="#crd">one</w>
<w ana="#nn1">eye</w>
<phr ana="#av0">
<w>every</w>
<w>so</w>
<w>often</w>
</phr>
<c ana="#pun">,</c>
<w ana="#cjs">if</w>
<w ana="#pni">someone</w>
<w ana="#vvd">entered</w>
<w ana="#at0">the</w>
<w ana="#nn1">room</w>
<!-- ... -->
</s>
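A processor following ana links first has to pair each annotated unit with the identifier of its analysis. The Python sketch below does only that step, on an abbreviated copy of the sentence above. It assumes un-namespaced markup and a single local pointer per ana attribute. The function name analysed_units is hypothetical, and the library of feature structures the pointers refer to is not shown.

```python
import xml.etree.ElementTree as ET

s = ET.fromstring("""
<s>
  <w ana="#at0">The</w>
  <w ana="#ajs">closest</w>
  <phr ana="#av0"><w>every</w><w>so</w><w>often</w></phr>
  <c ana="#pun">,</c>
</s>
""")

def analysed_units(sentence: ET.Element):
    """Yield (text, analysis id) for each element carrying an ana attribute."""
    for el in sentence.iter():
        ana = el.get("ana")
        if ana:
            # Join the text of descendant words with single spaces.
            text = " ".join(t.strip() for t in el.itertext() if t.strip())
            yield text, ana.lstrip("#")

pairs = list(analysed_units(s))
```

Note that the analysis on the phr element applies to the phrase as a whole, so its three words are reported as one unit rather than individually.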
