att.linguistic provides a set of attributes concerning linguistic features of tokens, for use within token-level elements, specifically w and pc in the analysis module.

lemma provides a lemma (base form) for the word, typically uninflected, serving both as an identifier (e.g. a headword in dictionary contexts) and as a basis for potential inflections. Example tokens: wives, Artzeneyen, hitting.

lemmaRef provides a pointer to a definition of the lemma for the word, for example in an online lexicon. Example tokens: hitting, nager.

pos (part of speech) indicates the part of speech assigned to a token (i.e. whether it is a noun, adjective, verb, and so on), usually according to some official reference vocabulary (e.g. for German: STTS, for English: CLAWS, for Polish: NKJP).
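As a rough illustration of the lemma and lemmaRef attributes described above, the example tokens might be encoded along the lines of the sketch below. The lemma values shown (wife, Arznei) and the lexicon URL are assumptions made for illustration only; they are not prescribed by these guidelines.

```xml
<!-- Illustrative sketch: the lemma values are assumed for the example tokens. -->
<p>
 <w lemma="wife">wives</w>
 <w lemma="Arznei">Artzeneyen</w>
 <!-- The lemmaRef target is a hypothetical URL standing in for an online lexicon entry. -->
 <w lemmaRef="http://www.example.com/lexicon/hitvb.xml">hitting</w>
</p>
```

The sketch keeps lemma and lemmaRef on separate tokens, since no precedence between the two is defined when both are supplied.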

The German sentence Wir fahren in den Urlaub. tagged with the Stuttgart-Tuebingen-Tagset (STTS).

Wir fahren in den Urlaub .
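A possible encoding of this sentence, with one w or pc element per token and the STTS tag carried by pos, is sketched below. The wrapping s element and the individual tag assignments are assumptions based on the STTS inventory rather than markup taken from the source.

```xml
<!-- Sketch only: tag assignments assume the standard STTS inventory. -->
<s>
 <w pos="PPER">Wir</w>     <!-- personal pronoun -->
 <w pos="VVFIN">fahren</w> <!-- finite full verb -->
 <w pos="APPR">in</w>      <!-- preposition -->
 <w pos="ART">den</w>      <!-- article -->
 <w pos="NN">Urlaub</w>    <!-- common noun -->
 <pc pos="$.">.</pc>       <!-- sentence-final punctuation -->
</s>
```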

The English sentence We're going to Brazil. tagged with the CLAWS-5 tagset, arranged inline (with significant whitespace).

We're going to Brazil.
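An inline arrangement, in which whitespace between elements is significant and adjacent tokens are simply written without intervening space, might look like the sketch below. The wrapping p element and the CLAWS-5 tag assignments are assumptions for illustration.

```xml
<!-- Sketch only: whitespace between elements is meaningful in this arrangement. -->
<p><w pos="PNP">We</w><w pos="VBB">'re</w> <w pos="VVG">going</w> <w pos="PRP">to</w> <w pos="NP0">Brazil</w><pc pos="PUN">.</pc></p>
```

Here the absence of a space between We and 're, and between Brazil and the full stop, is expressed by the layout itself rather than by an attribute.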

The English sentence We're going on vacation to Brazil for a month! tagged with the CLAWS-7 tagset and arranged sequentially.

We 're going on vacation to Brazil for a month !
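In a sequential arrangement each token sits on its own line and whitespace between elements carries no meaning. A sketch of such an encoding follows; the wrapping p element and the CLAWS-7 tag assignments are assumptions for illustration and should be checked against the tagset documentation.

```xml
<!-- Sketch only: one token per line; inter-element whitespace is not significant. -->
<p>
 <w pos="PPIS2">We</w>
 <w pos="VBR">'re</w>
 <w pos="VVG">going</w>
 <w pos="II">on</w>
 <w pos="NN1">vacation</w>
 <w pos="II">to</w>
 <w pos="NP1">Brazil</w>
 <w pos="IF">for</w>
 <w pos="AT1">a</w>
 <w pos="NNT1">month</w>
 <pc pos="!">!</pc>
</p>
```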

msd (morphosyntactic description) supplies morphosyntactic information for a token, usually according to some official reference vocabulary (e.g. for German: the STTS-large tagset; for a feature description system designed as (pragmatically) universal, see Universal Features). Example sentence: Wir fahren in den Urlaub .

join, when present, provides information on whether the token in question is adjacent to another, and if so, on which side. Its values are:

no: the token is not adjacent to another

left: there is no whitespace on the left side of the token

right: there is no whitespace on the right side of the token

both: there is no whitespace on either side of the token

overlap: the token overlaps with another; other devices (specifying the extent and the area of overlap) are needed to locate this token more precisely in the character stream
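To make the msd (morphosyntactic description) attribute more concrete, the German sentence might carry morphosyntactic descriptions alongside its part-of-speech tags, as in the sketch below. The msd strings are illustrative assumptions written in the general style of the STTS-large tagset; the exact notation should be taken from the reference vocabulary a project adopts.

```xml
<!-- Sketch only: msd values are assumed and merely imitate STTS-large notation. -->
<s>
 <w pos="PPER" msd="1.Pl.*.Nom">Wir</w>
 <w pos="VVFIN" msd="1.Pl.Pres.Ind">fahren</w>
 <w pos="APPR" msd="--">in</w>
 <w pos="ART" msd="Def.Masc.Akk.Sg.">den</w>
 <w pos="NN" msd="Masc.Akk.Sg.">Urlaub</w>
 <pc pos="$." msd="--">.</pc>
</s>
```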

The example below assumes that the lack of whitespace is marked redundantly, by using the appropriate values of join.

" Friends will be friends . "
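With redundant marking, each side of every whitespace-free boundary is recorded on both neighbouring tokens. A sketch of how the join values (left, right, both) could be distributed over this sentence follows; the wrapping s element is an assumption.

```xml
<!-- Sketch only: every missing space is recorded on both adjacent tokens. -->
<s>
 <pc join="right">"</pc>     <!-- no space before Friends -->
 <w join="left">Friends</w>  <!-- no space after the opening quote -->
 <w>will</w>
 <w>be</w>
 <w join="right">friends</w> <!-- no space before the full stop -->
 <pc join="both">.</pc>      <!-- adjacent to friends and to the closing quote -->
 <pc join="left">"</pc>      <!-- no space after the full stop -->
</s>
```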

Note that a project may decide to indicate lack of whitespace in one direction only, or to do so non-redundantly. The existing proposal is the broadest possible, on the assumption that we adopt the "streamable view", where all the information on the current element needs to be represented locally.

The English sentence We're going on vacation. tagged with the CLAWS-5 tagset, arranged sequentially, tagged on the assumption that only the lack of the preceding whitespace is indicated.

We 're going on vacation .
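Under the one-direction convention, only tokens that lack preceding whitespace carry a join attribute, always with the value left. A sketch follows; the wrapping p element and the CLAWS-5 tag assignments are assumptions for illustration.

```xml
<!-- Sketch only: join="left" marks tokens that directly follow their predecessor. -->
<p>
 <w pos="PNP">We</w>
 <w pos="VBB" join="left">'re</w>
 <w pos="VVG">going</w>
 <w pos="PRP">on</w>
 <w pos="NN1">vacation</w>
 <pc pos="PUN" join="left">.</pc>
</p>
```

Compared with the redundant style, the preceding tokens (We and vacation) carry no join value, so the original spacing can be recovered only by reading each token together with its successor.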

The definition of this attribute is adapted from ISO MAF (Morpho-syntactic Annotation Framework), ISO 24611:2012.

These attributes make it possible to encode simple language corpora and to add a layer of linguistic information to any tokenized resource. See section for discussion.

These guidelines provide no semantic basis or suggested precedence when both lemma and lemmaRef are provided. For this reason simultaneous use of both is not recommended for interchange unless documentation explaining the use is provided, probably in an ODD customization.