Canonical References
Introduction
A canonical reference is a means, specific to a community or corpus, of pointing into documents. For example, biblical scholars might understand ‘Matt 5:7’ to mean ‘the book called Matt, chapter 5, verse 7’. They might then wish to translate the string ‘Matt 5:7’ into a pointer into a TEI-encoded document, selecting the seventh <div> element within the fifth <div> element within the <div> element whose n attribute has the value ‘Matt’.
Algorithm for extracting and referencing targets
The application first follows the URI in the decls attribute, which points to a <refsDecl> element in the local document or a remote document. Within that declaration (see above for the corresponding example declaration), it consults the list of <fragmentPattern> elements and applies the regular expression of each, in order, to the reference ‘Matt 5:7’. If the first regular expression matches, the application applies the matched substrings (in this case, ‘Matt’, ‘5’, and ‘7’) to the string in the pat attribute of that <fragmentPattern> element, substituting the first matched substring for $1, the second for $2, and so on, to produce a Fragment Identifier. It then appends that Fragment Identifier (with an intervening #) to each of the URIs specified by the <ptr> elements that precede the <fragmentPattern> elements, generating a URI Reference for each. If the regular expression in the first <fragmentPattern> element does not match, the regular expression in the second <fragmentPattern> element is tried, and so on.
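The procedure can be sketched in Python. This is a minimal illustration under stated assumptions, not part of any TEI tooling: the function name resolve_reference, the representation of each <fragmentPattern> as a (regular expression, pat) pair, the use of re.fullmatch (so that the whole reference must match), and the example.org address are all hypothetical, and the step of following the decls attribute to the <refsDecl> is omitted.

```python
import re


def resolve_reference(reference, fragment_patterns, base_uris):
    """Turn a canonical reference into a list of URI References.

    fragment_patterns: ordered list of (regex, pat) pairs, one per
        <fragmentPattern> element (an assumed in-memory representation).
    base_uris: the URIs to which the Fragment Identifier is appended.
    Returns an empty list when no pattern matches.
    """
    for regex, pat in fragment_patterns:
        # Assumption: the regular expression must match the whole reference.
        match = re.fullmatch(regex, reference)
        if match is None:
            continue  # try the next <fragmentPattern>

        # Substitute the first matched substring for $1, the second for $2, ...
        fragment = re.sub(
            r"\$(\d+)",
            lambda m: match.group(int(m.group(1))),
            pat,
        )
        # Append the Fragment Identifier, with an intervening '#', to each URI.
        return [f"{base}#{fragment}" for base in base_uris]
    return []


# Hypothetical usage: one pattern and one made-up base URI.
patterns = [(r"(.+) (.+):(.+)", "xpath(//div[@n='$1']/div[$2]/div[$3])")]
uris = resolve_reference("Matt 5:7", patterns, ["http://example.org/doc.xml"])
print(uris)
```

The first pattern that matches wins, so the order of the pairs carries meaning, just as the order of the <fragmentPattern> elements does in the declaration.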
Worked examples
Specifically, in this case, the application would first apply the regular expression (.+) (.+):(.+) to ‘Matt 5:7’. This regular expression would successfully match. The first matched substring would be ‘Matt’, the second ‘5’, and the third ‘7’. The application would then apply these substrings to the pattern xpath(//div[@n='$1']/div[$2]/div[$3]), producing xpath(//div[@n='Matt']/div[5]/div[7]). It would append this, with an intervening #, to the value of xml:base in force, thus generating the URI Reference http://www.jph.org/resources/books/Bible.xml#xpath(//div[@n='Matt']/div[5]/div[7]).
If, however, the input string had been ‘Matt 5’, the first regular expression would not have matched. The application would have then tried the second, (.+) (.+), producing a successful match, and the matched substrings ‘Matt’ and ‘5’. It would have then proceeded to produce the URI Reference http://www.jph.org/resources/books/Bible.xml#xpath(//div[@n='Matt']/div[5]).
If the input string had been ‘Matt’, neither the first nor the second regular expression would have matched. The application would have then tried the third, (.+), producing the matched substring ‘Matt’, and the URI Reference http://www.jph.org/resources/books/Bible.xml#xpath(//div[@n='Matt']).
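The three cases above can be reproduced with a short Python sketch. The regular expressions, patterns, and base URI are taken from the examples in the text. The names PATTERNS, BASE, and to_uri_reference, and the choice of re.fullmatch (the whole input must match), are assumptions made for illustration.

```python
import re

# (regular expression, pat) pairs in the order they are tried.
PATTERNS = [
    (r"(.+) (.+):(.+)", "xpath(//div[@n='$1']/div[$2]/div[$3])"),
    (r"(.+) (.+)", "xpath(//div[@n='$1']/div[$2])"),
    (r"(.+)", "xpath(//div[@n='$1'])"),
]
BASE = "http://www.jph.org/resources/books/Bible.xml"


def to_uri_reference(reference):
    """Return the URI Reference for a canonical reference, or None."""
    for regex, pat in PATTERNS:
        match = re.fullmatch(regex, reference)  # assumed: whole-string match
        if match:
            # Replace $1, $2, ... with the corresponding matched substrings.
            fragment = re.sub(
                r"\$(\d+)", lambda m: match.group(int(m.group(1))), pat
            )
            return f"{BASE}#{fragment}"
    return None


for ref in ("Matt 5:7", "Matt 5", "Matt"):
    print(ref, "->", to_uri_reference(ref))
```

The patterns run from most to least specific. If (.+) were listed first, it would match every input, and ‘Matt 5:7’ would be looked up as a single book name.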
In practice, encoders may well prefer much more precise regular expressions than those used in the examples above. For example: ^\s*([1-9]?[A-Z][a-z]+)\s+([1-9][0-9]?[0-9]?):([1-9][0-9]?)\s*$
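To see what the stricter expression accepts, the following Python sketch applies it to a few strings. The expression is the one given above. The names STRICT and parse and the sample inputs are made up for illustration.

```python
import re

# Book: optional leading digit 1-9, one capital letter, lowercase letters.
# Chapter: 1 to 3 digits, no leading zero. Verse: 1 to 2 digits, no leading zero.
STRICT = re.compile(
    r"^\s*([1-9]?[A-Z][a-z]+)\s+([1-9][0-9]?[0-9]?):([1-9][0-9]?)\s*$"
)


def parse(reference):
    """Return (book, chapter, verse) as strings, or None if rejected."""
    match = STRICT.match(reference)
    return match.groups() if match else None


samples = [
    " Matt 5:7 ",   # surrounding whitespace is tolerated but not captured
    "1Cor 13:4",    # a leading digit is allowed in the book name
    "matt 5:7",     # rejected: book name must start with a capital
    "Matt 0:7",     # rejected: chapter may not start with 0
    "Matt 5:100",   # rejected: verse is limited to two digits
    "Matt 5",       # rejected: this expression requires a verse
]
for s in samples:
    print(repr(s), "->", parse(s))
```

Unlike (.+) (.+):(.+), this expression rejects malformed input outright instead of passing arbitrary text into the generated XPath. An expression this strict covers only the book, chapter, and verse form, so shorter references such as ‘Matt 5’ would need their own expressions.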

