Wittgenstein himself published only one philosophical book:
Some of the previously published editions are selections from several different manuscripts, but their relationships to the manuscripts are not recorded in any detail. The editions are results of different editorial approaches to the manuscripts, some of them containing a lot of editorial intervention, others less. Most of them contain no critical apparatus or other detailed documentation of editorial decisions. Given this background, it is no wonder that for a long time there was a demand for a complete, text-critical edition of the entire
Like many other modern manuscripts, Wittgenstein's writings contain deletions, overwritings, interlinear insertions, marginal remarks and annotations, substitutions, counterpositions, shorthand abbreviations, as well as orthographic errors and slips of the pen.
A particular problem is posed by Wittgenstein's habit of combining interlinear insertion, marking, and often also deletion, to form alternative expressions. In some cases he has clearly decided in favour of a specific alternative, in others the decision has been left open. Moreover, Wittgenstein had his own peculiar editorial conventions such as an elaborate system of section marks, cross-outs, cross-references, marginal marks and lines, various distinctive types of underlining, and so on.
Many of these features are results of Wittgenstein's continuous efforts to revise and rearrange his writings. Some of the revisions consisted in copying or dictating parts of the text of one manuscript into another. The
In consideration both of the nature of the
On the one hand, the repetitive nature of the
On the other hand, a
The problems with a documentary edition, however, were considered acceptable if the edition was to be published in electronic form. First, the bulkiness of a documentary edition is easier to deal with in electronic form. Second, an electronic edition is more open-ended and flexible than a book edition.
It was therefore decided to go for an electronic, documentary edition. There was (and still is) no intention to publish this edition in book form. However it may clearly provide the
The facsimile simply consists of digital, high-quality colour images of each and every page of the
The diplomatic and normalised transcriptions are differentiated, not so much in terms of how much detail they convey, but rather in virtue of their textual perspectives.
The diplomatic version records faithfully not only every letter and word, but also details relating to the original appearance of the text. One might say it acknowledges that our understanding of the text derives in no small part from the visual appearance of material on the page. It reproduces features such as deleted words and letters, shorthand abbreviations, orthographic inconsistencies, rejected formulations, authorial instructions for the re-ordering of material, marginal comments, etc. It has been assumed that one of the principal uses of the diplomatic text will be as an aid to reading the facsimile.
The normalised version, on the other hand, presents the text in its thematic and semantic aspect. Orthography is corrected to a standard form, slips of the pen and deleted materials are suppressed, shorthand abbreviations are expanded, and unequivocal instructions for the reordering of material are carried out. Variants have been merged into alternative readings, only one of which is visible on screen at any time, while the others may be displayed upon request. The result is a version which is easy to read and suitable for searching for words and phrases.
The three versions are linked, so that the reader can easily switch between them.
As already mentioned, the edition covers 20,000 pages, which are all presented in the three alternative formats just discussed. The 20,000 pages comprise altogether 3 million words, and in order to interlink the three versions approximately 200,000 links have been created. In addition, there are a few thousand links representing Wittgenstein's own cross-references. The source transcriptions from which the edition has been derived (see below) contain approximately 2 million coded elements (not including entity references, or codes for special characters).
Even though it was partly based on work done prior to this project (Huitfeldt and Rossvær), the edition took almost ten years to complete. The Wittgenstein Archives at the University of Bergen spent altogether 40 man-years on it (including, in addition to text transcription and editing, management, administration, systems development and maintenance, and all other tasks related to the project). This gives an average throughput of about two pages per person per day, which must be considered high compared to most other editorial projects.
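The throughput figure can be checked with a short calculation. The sketch below uses the page count and the effort stated above; the number of working days per person-year is an assumption made for illustration and is not given in the text.

```python
# Figures stated in the text.
pages = 20_000
person_years = 40

# Assumption (not from the text): about 230 working days per person-year.
working_days_per_year = 230

pages_per_person_year = pages / person_years
pages_per_person_day = pages_per_person_year / working_days_per_year

print(f"{pages_per_person_year:.0f} pages per person-year")
print(f"{pages_per_person_day:.1f} pages per person per day")
```

Under that assumption the result is close to the two pages per person per day quoted above; a different working-day count shifts the figure only slightly.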
In the preparation of this edition, basic requirements guiding the work of the Wittgenstein Archives were the following: Transcriptions should provide a fully sufficient basis for the production of both (1)
It was considered of utmost importance that the edition should document manuscript details according to the highest possible standards of text-critical
Ideally, transcriptions should not only (though most importantly) be accurate and interpretationally sound, consistent and systematic, but also
These requirements often implied conflicting demands. In particular, implications of the requirement for a diplomatic version easily conflicted with the other requirements. Yet for a number of reasons, most notably the concern for secure and reliable data maintenance, it was decided that for any given manuscript we wanted one and only one source transcription to serve
Out of concern for the longevity of the work it was also considered imperative that the format of the source transcriptions should as far as possible be hardware- and software-independent. It was decided to use a declarative text encoding system, i.e. to mark textual features explicitly according to a formal syntax which would enable us to produce secondary versions which satisfied the demands set forth above by means of off-the-shelf or specially designed software.
At the time when the Wittgenstein Archives was established, Standard Generalized Markup Language (SGML) was the only serious international standard to be considered (SGML). However, it was decided not to use SGML for this project. Instead, a special code syntax was developed for the Wittgenstein Archives, and software which allowed for flexible conversion to other formats was developed. This system was called Multi-Element Code System (MECS) (Huitfeldt,
One of the reasons for not choosing SGML was that SGML had problems representing overlapping and other complex textual features. Another reason was that little relevant software for SGML existed, and there was little experience available from applying SGML in scholarly editorial work. (The Text Encoding Initiative's Guidelines did not yet exist at the time.) Therefore, MECS was designed to overcome some of the problems with SGML and to provide software support for text-critical purposes beyond that provided by SGML at the time. In all other respects, MECS was kept as close to SGML as possible.
However, neither the reasons for the decision not to use SGML and to develop a special code system for the Wittgenstein Archives, nor the differences between SGML and MECS, are of any concern for the purposes of this discussion (but see Sperberg-McQueen and Huitfeldt,
The TEI Guidelines provide various alternative mechanisms for the encoding of many (or even most) textual phenomena. This is one of the strengths of the Guidelines, and one of the reasons why they are found applicable to a large number of widely different text encoding projects. At the same time, this openness and flexibility pose a danger of inconsistency.
For example, abbreviations may be encoded in basically two different ways according to the TEI Guidelines. Take the German abbreviation 'dh', which normally stands for 'das heißt' ("id est", "that is"). 'dh' may be encoded either as follows:
(1) <abbr expan='das heißt'>dh</abbr>
or as follows:
(2) <expan abbr='dh'>das heißt</expan>
A stylesheet specifying that the content of an abbr element should be replaced by its
There may indeed be a case for treating different instances of the same abbreviation differently (depending, for example, on context), but if the choice of representational form is left completely undecided by rules governing transcription and editing, the path to inconsistency is wide open. On the other hand, there may be a case for treating
For example, the German abbreviation 'dh' is a commonplace and may be regarded as a
In SGML element content can be marked up, but attribute values cannot. In other words, if a distinction is made between standard and non-standard abbreviations, and there is a need to mark up both, neither of them should be represented as attribute values.
One of the great advantages of text encoding systems like SGML is that they allow for automatic validation of document structure. What is checked, however, is the structure of the encoding, not the contents of text elements or attribute values of type text. In some cases there is a need to check the contents of standard and non-standard abbreviations separately. (For example, it may be desirable to check both against a list of standard abbreviations.) In such cases, both should be represented as element content, though of different element types.
The Wittgenstein Archives decided to make a distinction between standard and non-standard abbreviations, and to represent both as element content. (Had we used the TEI Guidelines, we might have used the abbr element for the former and the expan element for the latter.) The point of this discussion is not, however, to advocate the particular approach taken by the Wittgenstein Archives. The point is to illustrate that for virtually every textual phenomenon to be encoded, each project needs to reflect on issues like these in order to ensure the desired level of consistency.
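The benefit of keeping abbreviations as element content can be illustrated with a minimal sketch. It assumes a TEI-like fragment in which standard abbreviations appear as the content of `abbr` elements and non-standard ones as the content of `expan` elements; the sample fragment, the list of standard abbreviations, and the function name are all hypothetical and are not taken from the Wittgenstein Archives' actual system.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of standard abbreviations (illustration only).
STANDARD_ABBREVIATIONS = {"dh", "zB", "usw"}

# Hypothetical fragment: standard abbreviations are <abbr> content,
# non-standard ones are <expan> content.
SAMPLE = (
    "<p>Es gilt, <abbr>dh</abbr> es ist so, "
    "<abbr>dhh</abbr> und <expan>Philosophie</expan>.</p>"
)

def unknown_standard_abbreviations(xml_text, known):
    """Return the content of every <abbr> element that is not in `known`."""
    root = ET.fromstring(xml_text)
    return [
        (el.text or "")
        for el in root.iter("abbr")
        if (el.text or "") not in known
    ]

print(unknown_standard_abbreviations(SAMPLE, STANDARD_ABBREVIATIONS))
```

Because the two kinds of abbreviation are distinct element types, each can be selected and checked separately; the same check would be harder to target if the abbreviations were hidden in attribute values shared by both kinds.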
Consider the following, hypothetical example from a manuscript source:
In order to comply with the Wittgenstein Archives' requirements for the diplomatic version, one has to account for the facts that "weiße" is inserted above the line, that "Schloß" is struck through, and that "große" has been misspelt "grosse".
According to the requirements of the normalised version, one will not only have to account for the fact that "große" is the correct spelling of "grosse", but also to sort out the different possible readings of the text in question. Taken out of context, the example may seem intuitively to have at least the following possible readings:
In text-critical work, one will invariably rely on transcribers and editors to make their choice of possible readings using their best judgement based on thorough knowledge of the author, the history of the text, its historical and cultural context and other interpretationally relevant factors. Quite often, however, such considerations do not decide matters of details like these with any degree of certainty. In such situations, leaving the choice of readings entirely to the individual transcriber without further guidance is almost certain to lead to inconsistency.
According to the editorial principles employed at the Wittgenstein Archives, the example above has exactly two readings, (a) and (b), and no others (unless interpretational considerations decide otherwise). We will not go into the details of the principles leading to this decision here.
One might say that the aim of a diplomatic representation is to get every
The following is a brief description of the criteria and procedures developed and employed at the Wittgenstein Archives. As mentioned, the Wittgenstein Archives did not use the TEI Guidelines, or indeed even SGML. However, the criteria and procedures discussed are entirely independent of such technicalities, and could equally well be adopted by e.g. TEI-based projects.
Let us start with the criteria for ensuring that a transcription is suited for diplomatic reproduction of the original text: It is not obvious what this means. According to some conventions a diplomatic reproduction retains an almost exact positioning of every text element in two-dimensional page space, faithfully reproduces differences between allographs of the same graphemes, and represents visual markings like strikeouts, underlinings etc. as close to the original in their visual appearance as possible.
The Wittgenstein Archives decided on a less strict definition: The diplomatic reproduction should reproduce the original grapheme by grapheme, contain an indication of indentation and relative spatial positioning of text elements on the page, and include information about deletion and interlinear insertion and a number of different kinds of underlining. It was not considered necessary, however, to indicate every line break or allograph variation.
Consequently, the markup system contained markers for phenomena of the kinds mentioned, and the procedure followed by transcribers was simply to mark up every such phenomenon with the required element. By means of a style sheet the marked up features were reproduced according to certain conventions, and the correctness of transcriptions was checked by visually comparing the output with the original text.
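The idea of reproducing marked-up features through a style sheet can be sketched as follows. This is only an illustration under stated assumptions: the feature names, the rendering conventions, and the four-word fragment are hypothetical, and they are neither MECS codes nor a reconstruction of the manuscript example.

```python
# Each token is (text, feature). Feature names are hypothetical, not MECS codes.
TOKENS = [
    ("das", None),
    ("weiße", "inserted"),   # interlinear insertion
    ("Schloß", "deleted"),   # struck-through word
    ("Haus", None),
]

# A "style sheet": one rendering convention per marked-up feature.
# The conventions themselves are assumptions made for this sketch.
DIPLOMATIC_STYLE = {
    None: lambda text: text,
    "inserted": lambda text: "\\" + text + "/",
    "deleted": lambda text: "[" + text + "]",
}

def render(tokens, style):
    """Render every token according to the convention for its feature."""
    return " ".join(style[feature](text) for text, feature in tokens)

print(render(TOKENS, DIPLOMATIC_STYLE))
```

Keeping the conventions in a separate table means the same transcription can be rendered differently by swapping the table, which is what makes the visual comparison with the original a check on the markup rather than on the rendering.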
The criteria for ensuring that a transcription is suited for normalised reproduction were less easy to formulate, and were dealt with by means of a much more elaborate and formal approach, to be described in the next two sections.
Consider the following hypothetical example, which is a bit more complicated than the previous one:
Again, an obvious requirement for a normalised reproduction is that orthographic errors are corrected. In this example one will have to mark up the misspelling 'Vather' so that it can be rendered correctly as 'Vater' in the normalised version. Admittedly, orthographic rules are not always clear, and texts are frequently written according to idiosyncratic or inconsistent orthographies. Further complications are that orthographic variation is quite often a literary means of expression, and that orthography may in itself be an object of study. In electronic texts spelling affects not only readability, but also retrievability. Therefore, standardisation is much more important in electronic than in traditional editions. While there may be a need to retain the original orthography, as in diplomatic transcription, there is also a need to standardise orthography to some set of uniform spelling rules.
But what about the variant readings? In the current example, 'ein' seems to have been substituted for 'eine', 'großes' for 'große' and 'Haus' for 'Hütte'. At least the following readings seem to be possible:
On the assumption that the inserted 'weißes' is intended to apply to both a) and b), and that the author simply forgot that the insertion of the adjective 'weißes' in a) would require the inflected form 'weiße', perhaps even this is a possible reading:
As with the previous example, some kind of guidance is needed as to whether all or only some of these are to count as possible readings. However, not every combination of alternatives is admissible: 'Mein Vater hat ein sehr großes Hütte' is ungrammatical and not a possible reading. By mechanically selecting one out of every pair of substituenda, one can create a large number of obviously invalid readings. One needs to mark the text up in a way which does not include these.
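The effect of mechanical selection can be made concrete with a small sketch. It takes the three substitution pairs named in the example, ignores the inserted 'weißes' for simplicity, and enumerates every combination; the function name and the fixed sentence frame are assumptions made for illustration.

```python
from itertools import product

# Substitution pairs from the example: (first version, replacement).
PAIRS = [("eine", "ein"), ("große", "großes"), ("Hütte", "Haus")]

def mechanical_readings(pairs):
    """Pick one member of every pair independently of the others."""
    readings = []
    for article, adjective, noun in product(*pairs):
        readings.append(f"Mein Vater hat {article} sehr {adjective} {noun}")
    return readings

readings = mechanical_readings(PAIRS)
print(len(readings))
for reading in readings:
    print(reading)
```

Three independent pairs already yield eight candidate readings, most of which mix forms from different versions of the sentence; the number doubles with every further pair.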
We started out by making one basic decision: If among two transcriptions interpretational considerations do not decide clearly in favour of the one rather than the other, we would decide in favour of the one which came closest to an ideal of what we called a
A
Our next step was to define what we called an
An
There was one such procedure for each language present in the transcription. The procedures consisted in assigning functions such as inclusion, exclusion and case change to every element type.
Our final step was to define what we called
The
Thus, our basic requirement that transcriptions should be well-formed could be reformulated as the requirements that:
To avoid a possible misunderstanding: The set of betatexts derivable from one and the same transcription does not necessarily represent different interpretations of the manuscript in question. If the transcription yields more than one betatext, then that may just as well be regarded as a characteristic of the interpretation in question.
So far, we had formulated criteria which allowed us to identify some rather than others among possible transcriptions as acceptable. What remained to be done was to prescribe a procedure which helped the transcriber satisfy these criteria. To this end, we made use of a somewhat indirect strategy: We prescribed a procedure which, given the nature of the manuscripts we were dealing with, was in most cases almost certain to produce a transcription which would not satisfy the criteria. This basic procedure was:
The rule may seem too trivial to be of any interest whatsoever. It simply describes one of the most elementary features of the Western writing system. However, with manuscripts like ours, and if taken literally, this rule is almost guaranteed to produce transcriptions which are neither interpretationally acceptable nor well-formed.
Therefore, we also defined
Finally, we decided which kinds of deviations or modifications were allowed, and listed them in order of preference as follows:
Each of these deviations was in turn defined in terms of specific markup procedures. For example, 'exclusion' did not consist in leaving text out, but in marking it up with an element classified as a "beta-exclusion" or an "alpha-exclusion" code. We shall not go into further detail about these operations here. What is important to note, however, is that the deviations were given an order of priority. This meant that a lower-level deviation could only be applied if no combination of higher-level deviations would suffice to satisfy T1 and T2. For example, simple substitution should only be used if rearrangement was not enough; reiterative substitution only if neither rearrangement, nor simple substitution, nor any combination of rearrangement and simple substitution would suffice; and so on.
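The priority rule can be read as a search over combinations of deviations. The sketch below is a minimal illustration, not the Archives' procedure: it includes only the three deviations named in the paragraph above, and `suffices` is a hypothetical predicate standing for "a transcription using exactly these deviations satisfies T1 and T2".

```python
from itertools import combinations

# Only the deviations named in the text, in their stated order of preference.
DEVIATIONS = ["rearrangement", "simple substitution", "reiterative substitution"]

def minimal_deviations(suffices, deviations=DEVIATIONS):
    """Return the first sufficient combination, exhausting all combinations
    of higher-level deviations before admitting a lower-level one."""
    if suffices(frozenset()):
        return []  # the basic procedure alone is enough
    for level in range(1, len(deviations) + 1):
        higher, newest = deviations[:level - 1], deviations[level - 1]
        # Try the newly admitted deviation together with every
        # combination of the deviations ranked above it.
        for size in range(len(higher) + 1):
            for combo in combinations(higher, size):
                candidate = list(combo) + [newest]
                if suffices(frozenset(candidate)):
                    return candidate
    return None

# Hypothetical case: the criteria can only be met with simple substitution.
print(minimal_deviations(lambda used: "simple substitution" in used))
```

In the hypothetical case shown, rearrangement alone is tried and rejected before simple substitution is admitted, which mirrors the ordering described in the text.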
A transcription of the example from the previous section according to transcription rules defined by our project generates the following alphatext:
mein Vater hat eine ein sehr große großes weißes Hütte Haus
and the following betatexts:
Mein Vater hat eine sehr große Hütte
Mein Vater hat ein sehr großes weißes Haus
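How one transcription can yield both kinds of text can be sketched with a simple in-memory model. Everything about the model is an assumption made for illustration: it is not MECS syntax, the grouping of 'großes weißes' into one alternative is a modelling choice, and the capitalisation of the first word in the betatexts is treated here as a plain case change.

```python
# Hypothetical model: a plain string is unvaried text; a tuple holds the
# alternatives of one substitution, indexed by reading.
TRANSCRIPTION = [
    "mein", "Vater", "hat",
    ("eine", "ein"),
    "sehr",
    ("große", "großes weißes"),
    ("Hütte", "Haus"),
]

def alphatext(segments):
    """Concatenate everything, including all alternatives, in order."""
    words = []
    for seg in segments:
        words.extend(seg if isinstance(seg, tuple) else [seg])
    return " ".join(words)

def betatexts(segments, n_readings=2):
    """Build one text per reading by taking the same alternative everywhere."""
    texts = []
    for i in range(n_readings):
        words = [seg[i] if isinstance(seg, tuple) else seg for seg in segments]
        text = " ".join(words)
        texts.append(text[0].upper() + text[1:])  # assumed case change
    return texts

print(alphatext(TRANSCRIPTION))
for text in betatexts(TRANSCRIPTION):
    print(text)
```

Because every substitution is resolved with the same reading index, mixed forms from different versions of the sentence never arise; only the two readings listed above are produced.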
The alphatext satisfies T1, and the betatexts satisfy T2. It is worth noting, however, that while the procedure
One intriguing aspect of editing philosophical texts is that the editorial work itself exemplifies a number of classical philosophical problems, such as the relationships between representation and interpretation, the subjective and the objective.
Traditionally, it has been assumed that the responsibility of an editor is to provide an objectively correct representation of a text, and that as far as possible editors should avoid interpretation. The edited text is supposed to provide the
As can be gathered from the frequent references to "interpretational considerations" in the discussion above, the work of the Wittgenstein Archives was not based on such a view of interpretation. Even so, the very idea of a diplomatic transcription seems to presuppose the possibility of an objectively true and accurate representation of the original text. In particular, the whole motivation for some of the text encoding practices employed by the project was to ensure the accuracy of the diplomatic representation.
The TEI Guidelines, however, "define[s] markup ... as any means of making explicit an interpretation of a text" (TEI P3, 13). Interpretation, in turn, is described as "information which is felt to be non-obvious, contentious, or subject to disagreement" (TEI P3, 113).
We are used to thinking of representation and interpretation as a dichotomy. If text encoding is essentially interpretational, how can it possibly help in establishing accurate and correct representations of texts? In order to solve this problem we can take as our point of departure what I believe to be a common-sense view on representation and interpretation:
Against this background, I propose two steps to get us out of our problem. As a first step, let us imagine representation and interpretation as areas located towards opposite ends of a two-dimensional continuum. Then our task is to find somewhere along this continuum suitably clear demarcation lines, which allow us to decide, in particular cases and classes of cases, what is interpretation and what is representation.
By stating that something is a representation we are not excluding the possibility that it may, given some other demarcation line, legitimately be regarded as interpretive. Nor are we denying that representations and interpretations are, in some perspective, of the same kind. And we are not claiming that there are no difficult borderline cases. But we will clearly come to doubt the usefulness of a demarcation line placed at one extreme of the continuum.
At this point, we can observe that what was called representation above may be said to consist in the identification of the meaning of a text (reading, listening, deciphering). Methods for establishing such representations differ from mechanical methods of representation (such as bit maps, OCR etc.) in that they involve human symbol recognition and understanding, and are therefore sometimes felt to be less objective and reliable. This is probably why such activities are sometimes called interpretive. In accordance with the proposed view, however, we may safely position our demarcation line so that we regard them as matters of representation.
The second step on the way out of our difficulty is this: We should construe it not as a problem about the nature of text encoding, but as a question about the potential capacities of text encoding to create certain kinds of texts, namely representations and interpretations. This move is motivated by the common-sense view I formulated above, according to which representation and interpretation are names for relations, and derivatively, therefore, for certain texts which represent and interpret other texts.
By extending each of these dyadic relationships to a triadic one, between a
The method described here might easily be criticised for creating an illusion that traditional text-critical scholarship, based on philological knowledge and careful interpretation, can be replaced by mechanical procedures and an artificial definition of the one and only "correct" transcription.
However, the requirements T1 and T2 are
And as is often the case, the method as such does not in principle depend on the use of markup systems, or even on computers. But it is only by marking up texts and using computer tools that the method can be implemented in practice.