Electronic Textual Editing: Prose Fiction and Modern Manuscripts: Limitations and Possibilities of Text-Encoding for Electronic Editions [ Edward Vanhoutte ]


Thinking things out involves saying things to oneself, or to one's other companions, with an instructive intent.

(Gilbert Ryle, The Concept of Mind, 294-295)

The hypertext edition, the hypermedia edition, the multimedia edition, the computer edition, the digital edition, the electronic edition are all synonymous labels for a concept without a definition 1 . The use of ‘edition’ in these labels presupposes a conventional understanding of that word, and is based on the implicit assumption that there is general agreement on what an edition is. But there is no such thing. In textual studies we certainly mean ‘scholarly’ edition, whereas in strict bibliography ‘the term edition refers quite properly to the identifiably separate type-setting of any book, whether it is an edition in the text-critical sense or not’ (Greetham, Textual Scholarship 348). Any kind of available text qualifies as an ‘edition’, and any kind of electronically available text qualifies to be labelled as an ‘electronic edition’, just as any printed text can be called a ‘paper’ or a ‘print edition’. The words ‘digital’ and ‘electronic’ only describe the materiality (as opposed to ‘print’) by which data is presented to the reader, ‘computer’ refers to the distribution and access medium (as opposed to ‘book’ or ‘codex’), and ‘hypertext’, ‘hypermedia’, and ‘multimedia’ hint at the functionality of the electronic product (as opposed to ‘linear’). As John Lavagnino pointed out in a report on a lecture I gave in London in 2000, 2 ‘Ten years ago, it seemed sufficient to say that you were going to create such an edition in the form of a hypertext: often with very little elaboration on just what the result would be or why it would be significant, as though the medium itself would automatically make such an edition significant.’

To allow a functional debate on editing and editions in the electronic paradigm, editors should provide an explicit definition of an electronic edition as well as of the kind of scholarly edition they are presenting in electronic form. They should do so not only by using qualifying adjectives or labels in the titles or subtitles of their electronic products, for which conventional terminology could be used where applicable, but also by means of, for instance, a consistent argument in a textual essay that accompanies the edition proper. To avoid confusion between different meanings and types of edition, I sketch out my definition of an electronic scholarly edition in the first section of this essay, and formulate six requirements which editors could embrace to ensure that their edition is treated as such.

As the attentive reader will notice, the formulation and discussion of this definition is influenced by an interest in and concern for non-critical editing and genetic transcription of modern manuscript material, which will be two of the main topics covered in this essay. The case study of my own electronic-critical edition of Stijn Streuvels' De teleurgang van den Waterhoek in the second section will serve as a test case for my definition, and as an introduction to some limitations and possibilities of text encoding for electronic editions of modern prose texts. The assessment of this edition will be used as a starting point for elaboration on issues concerned with the production of literal (non-critical) transcriptions and genetic editions of modern manuscript material with the use of text-encoding as proposed by the Text Encoding Initiative (TEI). The third section covers all these issues and introduces the French school of critique génétique, which has developed a decades-long theory and practice of dealing with modern manuscript texts in a specific way. The attempt to apply their central interest in the internal dynamics of the modern manuscript and the study of the genetic continuum in a so-called dossier génétique to the transcription of manuscript material by means of text-encoding is the focus of the third section of this essay: the current inability to encode overlapping hierarchies and time (both absolute and relational) elegantly in (non-critical) manuscript transcriptions will take up most of this section. I finish with some concluding remarks.

Definition and aims of an electronic edition

My full working definition of an electronic (scholarly) edition has six parts. By electronic edition, I mean an edition (1) which is the immediate result or some kind of spin-off product of textual scholarship; (2) which is intended for a specific audience and designed according to project-specific purposes; (3) which represents at least one version of the text or the work; (4) which has been processed on a platform-independent and non-proprietary basis, i.e. it can be stored for archival purposes as well as made available for further research (Open Source Policy); (5) whose creation is documented as part of the edition; and (6) whose editorial status is explicitly articulated.

With respect to this definition, five immediate observations have to be made. First, by defining an electronic edition as a ‘result’ or a ‘product’ which has been ‘processed’, I plead for a practice in which the construction of a digital archive containing all data (encoded transcriptions, high-resolution image files, etc.) differs from and precedes the generation of the edition. I have called this the Archive/Museum model. 3 Espen Ore speaks in this case of Teksttilretteleggelse/arkivoppbyging (text preparation/archive building) and Utgaveproduksjon (edition production) (Publisering: 143). Second, in so far as I define an electronic edition in productive terms, it is indeed true that, as G. Thomas Tanselle writes in Textual Criticism at the Millennium, paraphrasing Peter Shillingsburg, ‘an electronic edition is a form of presentation and, as such, does not pose a different set of theoretical issues from the one faced by editors who present their work in a different form’ (33). What Tanselle and Shillingsburg seem to overlook here is that the practice of creating an edition with the use of text-encoding calls for explicit ontologies and theories of the text, which do generate new sets of theoretical issues. 4 Maybe they are not different sets of editorial issues, but they are certainly new sets of textual issues, such as problems of document architecture, the encoding of time, etc. Third, I do not consider it a requirement for an electronic edition to display textual variation in an apparatus or in any other way, for in my view this is a project-specific purpose. However, it follows from the first requirement that the study of textual variation—where it appears—is an essential part of the research involved in creating an electronic edition. Even a reading edition cannot be made without a serious examination of the textual variants. Fourth, the requirement for the presentation of at least one version of the text or the work is not a fundamental call for a critically established editor's text.
Diplomatic transcript editions (Greetham, Textual Scholarship 391) and facsimile editions, for instance, do not present a critically edited text, but they should not pretend to be non-editorial. The editor is always present in the organisation of the material and the transcription of source documents. Point three in my definition specifies that a database of textual variation cannot be considered an electronic edition. If one is forced to choose between an edited text or textual variation, the edited text must always take priority. The category of electronic edition includes more than critical, diplomatic, facsimile, and reading editions. Mixed-flavour editions (which have characteristics of all of them but are none of them exclusively), archives which present editions, and editions which are organized as archives all qualify as electronic editions. 5 Fifth, the working definition does not prescribe any hypertextual features to be included in the edition, because the emphasis on hypertext in the current debate on electronic editing is still often beside the point and is about to outgrow its hype. If the use of hypertext does not add any fundamental advantages to the electronic edition over the codex-based edition, it is better to stick to the book. Indeed, hypertext is just the visualization of linking, which DeRose and Van Dam define as ‘the ability to express relationships between places in a universe of information’ 6 (9), and which should be marked explicitly or generated automatically by making use of some sort of markup and path scheme. Consequently, the syntax of this markup and the markup language become essential in the design of an electronic edition with hypertext functionality. The presence of hypertext functionality in an electronic text, however, does not guarantee its scholarly integrity.

As G. Thomas Tanselle defines it in his essay The Varieties of Scholarly Editing, scholarly editing is the considered act of reproducing or altering texts. Whether the result of that act is published in print or as an electronic product does not matter to this general definition. Nor does it affect its primary aims. Above all, the scholarly edition is aimed at articulating the editors' notions, perspectives, or theories of the texts—or of what the texts should have been, in the case of critical editions. By doing so, the scholarly edition can stimulate the literary debate by providing the materials and tools to explore new ways of understanding and studying the text. Only the format of these materials and tools will differ between the print and the electronic edition, not their intentions. The ultimate, and for technical reasons the most problematic, aim, however, is to preserve our cultural heritage.

A case study

My electronic edition of the classic Flemish novel De teleurgang van den Waterhoek 7 by the Flemish author Stijn Streuvels 8 (1871-1969)—the first electronic scholarly edition of any kind of Dutch/Flemish literary work—was published as an ‘electronic-critical edition’ on CD-ROM by Amsterdam University Press in 2000. Unfortunately, at the time of publication, I had not defined ‘electronic edition’ in the general introduction, nor had I spelled out the ways in which the edition was ‘critical’ or provided an account of the ‘linkemic’ approach towards textual variation and the eclecticism towards conventional editorial theory the editors had taken. 9 As a result, in the absence of these explicit statements, some users and reviewers misunderstood the edition. Some criticized it as disorienting: the presentation of two critical texts, for instance, the mixed use of text transcriptions and digital facsimiles as representations of the documentary source material, or the absence of an apparatus variorum confused the reader, even though these features were actually the three pivotal theoretical statements of the edition. The following short survey of the edition's main editorial procedures and problems, and discussion of both the solutions suggested by the edition and the questions it raises, will serve, I hope, as an object lesson and case study, 10 introducing the issue of non-critical and genetic transcription/editing of modern manuscript material.

Editorial Principles and Markup

In Scholarly Editing in the Computer Age, Peter Shillingsburg defines five formal orientations of editing: documentary, aesthetic, authorial, sociological, and bibliographic (15-27). Except for the aesthetic orientation, all of these were applied in editing De teleurgang van den Waterhoek. 11 The aim of the editorial project was to explore different ways of dealing with textual instability, textual variation, the genetic reconstruction of the writing process, and the constitution of a critically restored reading text. Whereas the reading edition in book form, which was published together with the electronic edition, presents one text (the edited text of the first print edition), 12 this seemingly best-text approach is countered by the electronic-critical edition, where instability and versioning 13 are the governing principles in the presentation of many texts.

On the basis of their documentary status and appearance, the many texts in the edition could be divided into two groups:

Complex documentary sources:
- the complete holograph manuscript (1927)
- the corrected authorial copy of the prepublication, which served as printer's copy for typesetting the first print edition (1927)
- the corrected author's copy of the first print edition, which was used as printer's copy for typesetting the second print edition (1939)

Simple documentary sources:
- the prepublication of the novel, published in installments in the Dutch literary journal De Gids (1927)
- the first print edition (1927)
- the revised second print edition (1939)

In the electronic edition, the documentary sources of the complex group are represented by their digital facsimiles, whereas the texts of the simple group are TEI-compliant full-text representations. By including either a full-text version or a digital-facsimile version of a documentary source, the electronic edition takes a mixed approach to Peter Shillingsburg's (and Thomas Tanselle's) first requirement, namely to provide both ‘a full accurate transcription’ and a ‘full digital image of each source edition’ in an electronic edition (Principles 28). This mixed approach, however, is partly compensated for by the possibility of deducing the physical form of the prepublication from the facsimiles of the corrected prepublication, and that of the first print edition from the corrected first print edition. Only the holograph manuscript lacks a full-text counterpart, and the second print edition a facsimile counterpart.

For sociological reasons, the edition presents two critically edited texts: the versions of the first and the second print editions. The first print edition constitutes an important moment in the genetic history of the novel. It is the first finished version of the text—we consider the prepublication as one step in the writing process leading towards the first print edition—and its reception by literary critics is the reason for the drastic revision of the text in preparation of the second print edition. This second print edition is sociologically equally important because it is the version that generations of readers have read and which the author constituted in interaction with contemporary society. 14 In constituting these texts, we have applied the principles of the German (authorial) editorial tradition, which only allows justified corrections of manifest mistakes. In these two critical texts, the emendations 15 were documented by the use of the <corr> element, containing the correction, and a sic attribute whose value documents the original reading. The editor responsible for the correction (in this case EV for Edward Vanhoutte) was specified in a resp attribute: <corr sic="katin" resp="EV">kattin</corr>. The text of the prepublication was retained for historical reasons, and the inverse markup was used. Uncertain readings 16 were encoded with the aid of the <sic> element, containing a suggested correction in a corr attribute: <sic corr="kattin">katin</sic>. 17 Although both systems are equivalent for the computer, they do articulate different views on the text. In the corresponding use of the inverse markup, one can see the thin line between critical editing and non-critical or documentary editing.
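The two equivalent encodings can be sketched side by side. The fragment below is a minimal illustration built around the ‘katin’/‘kattin’ example quoted above; the surrounding paragraph content is invented for the sake of the example:

```xml
<!-- Critical text (first and second print editions):
     the emendation is element content, and the rejected
     original reading is recorded in the sic attribute -->
<p>... <corr sic="katin" resp="EV">kattin</corr> ...</p>

<!-- Retained text (prepublication), inverse markup:
     the original reading is element content, and the
     suggested correction sits in the corr attribute -->
<p>... <sic corr="kattin">katin</sic> ...</p>
```

For the processing software, the two encodings carry the same pair of readings; the choice of which reading becomes element content, and which is relegated to an attribute, is precisely the editorial statement.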

A linkemic approach to textual variation

The electronic edition of De teleurgang van den Waterhoek contains four sections: 18
  1. an account of the editorial principles, consisting of a genetic history of the work, an account of the constitution of the texts, and a description of the transmission and publication history;
  2. the actual edition, which presents the textual variation against the orientation text by making use of the linkemic approach (see below);
  3. the separate documentary sources;
  4. the diplomatic edition of 71 letters relevant to the reconstruction of the textual history, selected from the correspondence of the author. 19

The third section of the edition, the documentary sources, presents the six versions of the text chronologically, allowing the user to consult each of them separately. It is of course true that ‘every form of reproduction can lie, by providing a range of possibilities for interpretation that is different from the one offered by the original’ (Tanselle, Reproductions 33), and the process of imaging is a process of interpretation. 20 To enable the user of the edition to evaluate what they see, the facsimile editions are accompanied by a full account of the imaging procedure, including documentation of the software and hardware (and settings) used in the project, which I believe is an essential requirement. No facsimile can of course substitute for the original, but it is the best approximation we can offer the interested user.

The user of the edition can also read all six versions together in the second section. This part of the electronic-critical edition presents the edited text of the first print edition as the orientation text around which the hypertext presentation of textual variation is organized. Instead of linking the orientation text to an apparatus variorum, the editors opted for what I have called a linkemic approach to textual variation. I define a linkeme as the smallest unit of linking in a given paradigm. This unit can be structural (word, verse, sentence, stanza, etc.) or semantic. In the case of the glossary provided with the orientation text, the linkeme is of a semantic class, which can be defined as ‘the unit of language that needs explanation’. In the case of the presentation of textual variation, the linkeme is a structural unit, namely the paragraph. In the actual hypertext edition it is possible to display the variants of each paragraph in all six versions on the screen. This is made possible by a complicated architecture on the code side which allows for hypertext visualisation on the browser side. This architecture was ‘automagically’ generated from the digital archive by a suite of SED and AWK scripts. The linkemic approach provides the user with enough contextual information to study the genetic history of the text, and introduces new ways of reading the edition. Because a new document window, displaying a text version of the user's choice, can be opened alongside the hypertext edition, every user can decide which text to read as her own base text. The hypertext edition can then be used as a sort of apparatus with any of the versions included in the edition. In this way, hypertext and the linkemic approach enable the reading and study of multiple texts and corroborate the case for textual qualifications such as variation, instability and genetic (ontologic/teleologic) dynamism.
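What a paragraph-level linkeme might look like in the encoded archive can be sketched as follows. The id scheme and the use of the corresp attribute here are hypothetical illustrations of the principle, not the actual markup of the edition:

```xml
<!-- Orientation text (edited first print edition), paragraph 12,
     pointing at the corresponding paragraph in other versions -->
<p id="D1.p12" corresp="GIDS.p12 D2.p12">...</p>

<!-- The same linkeme in the revised second print edition -->
<p id="D2.p12" corresp="D1.p12 GIDS.p12">...</p>
```

From correspondences of this kind, a batch process can generate, for every paragraph of the orientation text, a hypertext view that collects its variants in all the encoded versions; in the edition's case this generation was done by a suite of SED and AWK scripts.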

Modern manuscripts

As we will see, despite its strengths this practice is problematic for a genetic edition based on modern manuscript material. In the following discussion, I will start with two observations which are inherently subjective and personal, but nonetheless—I think—true. I will then confront the reality these observations signal with the reality of transcribing modern manuscripts out of genetic interest, which will lead to a brief treatment of the goals of the mainly French school of critique génétique, the peculiarities of the modern manuscript, and the problems that result from trying to accomplish this work within the current TEI proposals. As an alternative, I will suggest that further research on a methodology and practice of non-critical editing or transcription of modern manuscript material may result in markup strategies which could be applied to the constitution, reading and analysis of a so-called dossier génétique. My approach to the manuscript as a filtered materialization of an internal creative process, which can be compared to the process of internal monologue or dialogue and can thus be considered a form of speech, might be helpful in this respect.

Observation 1: (Non-)Critical editing

In his article Literal Transcription: Can the Text Ontologist Help?, Allen Renear points out that the distribution of attention and resources between the study of critical editing and the study of non-critical editing is in inverse proportion to their relative practical importance (25). Apart from Mary-Jo Kline's very useful A Guide to Documentary Editing, it is, for instance, very difficult to find an extensive and coherent treatment of non-critical editing in handbooks and survey articles on scholarly editing; specialized journals focus mainly if not exclusively on the theory and practice of critical editing (24). The eight steps Wilhelm Ott outlined for the production of a critical edition in his Computers and Textual Editing, and which Susan Hockey took as the framework for her chapter on Textual Criticism and Electronic Editions in Electronic Texts in the Humanities (124-145), for instance, jump from the collection of witnesses (step 1) to the collation of the witnesses (step 2), and further to
  • the evaluation of the results of the collation (step 3)
  • the constitution of a copy text (step 4)
  • the compilation of apparatuses (step 5)
  • the preparation of indexes (step 6)
  • the preparation of the printer's copy (step 7)
  • and finally the publication of the edition (step 8).

The essential, difficult, and time-consuming step of the transcription of the primary textual sources is not explicitly mentioned in this outline. Instead it is silently folded into the second step, i.e. the collation of the witnesses, and is thus primarily oriented towards representations or captures of the linguistic text, whose collation produces a record of the inter-documentary variation. Optimally, transcription should precede the second step, for transcription or non-critical editing is the principal activity of critical editing and the epistemic foundation for all further textual criticism and textual editing. Only on the basis of such transcriptions can the editor proceed to automatic collation, stemma (re)construction, the creation of (cumulative) indexes and concordances, etc. by computer.

This neglect of non-critical editing in otherwise instructive outlines for the textual critic is all the more remarkable because in the not so distant past, as we may recall from reading, for instance, Ben Ross Schneider's report on the London Stage Information Bank in his Travels in Computerland, up to half the time, the effort—and the budget—of an electronic project was devoted to converting textual data into another format, in this case a machine-readable format. Electronic non-critical editing is concerned with a twofold transformation from one format into another: first, the transformation from the text of a physical document to the transcription of that text, and second, the transformation from one medium, the manuscript, to another, the machine-readable transcription.

Peter Robinson makes the important point about transcribing the Canterbury Tales manuscripts that there is no division between transcription and editing: ‘To transcribe a manuscript is to select, to amalgamate, to divide, to ignore, to highlight, to edit’ (Robinson, Politics 10). And if we consider ‘editing’ as short for scholarly editing, then both critical and non-critical editing should always have a scholarly result. The reason for the neglect of non-critical editing in the theory and practice of textual criticism, however, is frequently, as I will argue in a moment, the lack of a satisfactory ontology of the text on which a methodology of non-critical editing could be modelled.

In contrast with the production scheme for critical editions I have just discussed, the TEI Guidelines do seem to make an explicit distinction between non-critical and critical editing by devoting two different—but consecutive—chapters to Transcription of Primary Sources (chapter 18) and Critical Apparatus (chapter 19). Although the former chapter starts by explaining that ‘It is expected that this tag set will also be useful in the preparation of critical editions, but the tag set defined here is distinct from that defined in chapter 19 Critical Apparatus, and may be used independently of it’ (453), the DTD subset for transcription also allows the scholar ‘to include other editorial material within transcriptions, such as comments on the status or possible origin of particular readings, corrections, or text supplied to fill lacunae’ (453), which are strictly speaking not features of literal transcription. So the chapter that seemingly deals with non-critical editing in the TEI Guidelines addresses issues which are central to critical editing, and includes in its DTD subset tags to encode them. A provocative reading of the rhetoric of the opening paragraphs of chapter 18 could then be paraphrased as follows: the scholar who wants to transcribe primary source material in a non-critical way in fact uses only a small portion of the DTD subset and does not exploit its capabilities to the full. This is a typical sign of the influence of the traditional schools of textual editing on the work of the TEI. Whereas they overemphasize the alleged unimportance of non-critical editing in their theories, the French school of critique génétique works mainly with non-critical representations of the documents under study.

Observation 2: Methodology

According to Allen Renear, non-critical editions ‘transcribe’, ‘reproduce’, ‘present’—all these words are used—the text of a particular document, with no changes, no subtractions, no additions: ‘It gives us the text, the whole text and nothing but the text of an actual physical document’ (24). However, as I have already suggested, an answer to the basic question of what text is, and hence what to transcribe, is a prerequisite to the transcription of a primary source. Only when a project has a clear agreement on the ontology of the text can a methodology for text transcription be developed.

Although the TEI proposals for the transcription of primary source material have ‘not proved entirely satisfactorily’ for a number of problems (Driscoll 81), they do provide an extremely rich set of mechanisms for the encoding of older texts and of documents with a fairly ‘neat’ appearance, such as print editions, which are fairly static and stable. Although the TEI chapter on transcription does quote examples from modern manuscript material, the successful use of the TEI in critical and non-critical editing can mainly be observed in projects which deal with older material and which are interested in the record of inter-documentary variation.

The transcription of modern manuscript material using the TEI proves to be more problematic because of at least two essential characteristics of such complex source material: the notions of time and of overlapping hierarchies. Since SGML (and thus XML) was devised on the assumption that a document is a logical construct containing one or more trees of elements that make up the document's content (Goldfarb 18), several scholars began to theorize about the claim that text is an ordered hierarchy of content objects (the OHCO thesis), whose objects always nest properly and never overlap, 21 and about the difficulties attached to this claim. 22 The TEI Guidelines propose five possible methods to handle non-nesting information, 23 but state that:

Non-nesting information poses fundamental problems for any encoding scheme, and it must be stated at the outset that no solution has yet been suggested which combines all the desirable attributes of formal simplicity, capacity to represent all occurring or imaginable kinds of structures, suitability for formal or mechanical validation, and clear identity with the notations needed for simpler cases (i.e. cases where the textual features do nest properly). The representation of non-hierarchical information is thus necessarily a matter of choices among alternatives, of tradeoffs between various sets of different advantages and disadvantages. 24

The editor using an encoding scheme for the transmission of any feature of a modern manuscript text to a machine-readable format is essentially confronted with the dynamic concept of time, which constitutes non-hierarchical information. Whereas the simple representation of a prose text can be thought of as a logical tree of hierarchical and structural elements such as book, part, chapter, and paragraph, and an alternative tree of hierarchical and physical elements such as volume, page, column, and line—structures which can be applied to the wide majority of printed texts and of classical and medieval manuscripts—the modern manuscript shows a much more complicated web of interwoven and overlapping relationships between elements and structures.
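The conflict between the logical and the physical tree can be illustrated with a schematic fragment (not taken from the edition). A page break that falls inside a paragraph cannot be encoded as a properly nested element of both hierarchies at once; the common TEI workaround is to flatten one hierarchy into empty milestone elements such as <pb/>:

```xml
<div type="chapter" n="1">
  <p>This paragraph starts on one page
     <pb n="25"/>
     and ends on the next: the physical unit 'page' and the
     logical unit 'paragraph' overlap, so one of the two
     hierarchies has to be reduced to empty markers.</p>
</div>
```

The milestone solution keeps the document well-formed, but at a price: the page is no longer an element with content, and any processing that needs the page as a first-class unit has to reconstruct it from the markers.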

Modern manuscripts, as Almuth Grésillon defines them, are ‘manuscripts which are considered as forming part of the genesis of a text, evidence of which is given by several successive witnesses, and which show the writing process of an author’ (244). 25 Therefore, the structural unit of a modern manuscript is not the paragraph, nor the page or the chapter, but the temporal unit of writing. These units form a complex network which is often not bound to the chronology of the page.
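A schematic sketch may clarify what is at stake. Suppose an editor wants to record that a deletion and a marginal addition on the same page belong to two different writing campaigns; the stage attribute below is an invented illustration, not an existing TEI mechanism, and that absence is precisely the problem:

```xml
<p>the first version of the sentence
   <del rend="overstrike" stage="campaign-2">deleted in a later
   revision campaign</del>
   <add place="margin" stage="campaign-3">added later still,
   continuing a revision that runs across the page boundary</add>
</p>
```

Even with such an attribute, the temporal units would cut across the structural and physical units of the document, so the chronology of the writing process cannot be captured by the nesting of elements alone.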

The current inability to encode these temporal and genetic features of the manuscript, together with its overlapping hierarchies, in a single, elegant encoding scheme forces the editor to make choices which result in impoverished and partial representations of the complex documentary source. Further, even if such an encoding scheme existed and genetic transcriptions could be produced, current collation software would need to be redesigned to take these non-nesting structures into account. Therefore, in the electronic edition of De teleurgang van den Waterhoek, we opted to represent the complex documentary sources by means of digital facsimiles only, preserving the genetic context of the author's dynamic writing process. By mounting these digital images in a hypertext structure which confronts them with representations of the other witnesses in the edition, the instability of the text is emphasized, and the user is provided with a tool to reconstruct the writing process of the novel under study. This focus on hypertext functionality and image-based editing 26 could give the false impression that transcriptions have become superfluous; by astounding their audience with digital facsimiles, clickable image maps, and JavaScript-driven dynamic views of the digital images, the developers of such editions very cunningly avoid having to include a full transcription of the documentary sources—and I have been one of them. 27

A further example of this fear of testing systems against modern manuscript material is provided by a recent article by Eric Lecolinet, Laurent Robert and François Role on a sophisticated tool for Text-image Coupling for Editing Literary Sources. The article, which appeared in the thematic issue on Image-based Humanities Computing of Computers and the Humanities (Kirschenbaum), presents, I quote from the abstract,

a system devoted to the editing and browsing of complex literary hypermedia including original manuscript documents and other handwritten sources. Editing capabilities allow the user to transcribe images in an interactive way and to encode the resulting textual representation by means of a logical markup language[...]

(Lecolinet et al. 49)

The first figure in this article shows a page from Flaubert's manuscript that is complicated in terms of authorial interventions, but the five further figures which demonstrate the system at work show a simple medieval manuscript and a simple handwritten text, neither of which contains any authorial additions, deletions, substitutions or revisions. It seems that the texts used for demonstration have been chosen to illustrate the limited capabilities of the demonstrated system.

This fear, apparent in several projects, of testing existing transcription systems against modern manuscript material of a complicated nature may signal that a coherent system or methodology for the transcription of modern material still has to be developed and tested, and that an ontology of the text must still be agreed on.


There has never been a single standard convention for the transcription of manuscript texts, and it is not likely that there ever will be one, given the great variety of textual complications that manuscripts—from all times and places—can present.

(Vander Meulen and Tanselle 201)

Genetic Criticism — Critique Génétique

At the end of his previously mentioned article, Allen Renear announces that there are major challenges to his view of text ontology, which sees the literal transcription as a ‘literal representation of the linguistic text of a particular document’ (30). Therefore, he adds, ‘it presents the author's linguistic achievements, not the author's linguistic intentions.’ One such challenge is Jerome McGann's identification and treatment of ‘bibliographic codes’ (30). Critique génétique 28 is yet another major challenge to his view of text ontology, in that it is also interested in features such as time, ductus, and the topology of the page, which are, strictly speaking, not part of the text.

The problematic position of critique génétique amongst the other schools of textual criticism and textual editing is that its primary aim is to study the pre-text (avant-texte) not so much as the basis on which to set out editorial principles for textual representation, but as a means to understand the genesis of the literary work. Therefore, critique génétique does not aim to reconstitute the optimal text of a work, and is ultimately interested not in the text, but in the dynamic writing process which can be reconstructed by close study of the extant drafts, carnets, etc. Moreover, as Tanselle points out in a discussion of Antoine Compagnon's introduction to a special issue of Romanic Review, 29

French genetic critics are generally opposed to the construction of editions, on the grounds that an apparatus of variants, derived from the classical model in which variants are departures from an author's final text, is inappropriate for an authorial avant-texte and implies a subordination of it.

(Millennium 27)

Rather than producing editions, the généticiens constitute a dossier génétique by localising and dating, ordering, deciphering, and transcribing all pre-text witnesses. Only then can they read and interpret the dossier génétique. This does not mean, however, that the publication of genetic editions is not possible. The influential French généticien Pierre-Marc de Biasi defines three different categories of genetic editions (Van Hulle 375-376):

Transversal edition:
attempts to render ‘works’ that were left unfinished because of the author's sudden death or for whatever other reason.
Horizontal edition:
reconstructs one particular phase in the writing process, e.g. the author's notebooks of a certain period.
Vertical edition:
reconstitutes the complete textual history.

The computer has been used both in attempts to publish genetic editions and in constituting and reading a dossier génétique. The concept of hypertext has been used extensively to regroup a series of documents which are akin to each other on the basis of resemblance or difference in multiple ways, but the various endeavours to produce such hypertext editions or dossiers are too much oriented towards display. Both in the past and today, commercial software packages such as HyperCard, ToolBook, Macromedia, Adobe Acrobat and even PowerPoint have been used for genetic purposes, with data stored in proprietary formats. Where standards for markup have been used, we see HTML as the unbeaten champion for products in which every link, colour, frame and animation has been hardcoded, without a repository of SGML/XML-based transcriptions of the manuscript texts. This is partly, as we have seen, because of a lacking, or specifically non-text-based, ontology of the text.


The institutional and technical machinery of textual genetics should not blind us to the fact that the object that it purports to study will almost by definition escape ‘science’. What textual genetics studies is in effect something that cannot be observed, that cannot become an object: the origin itself of the literary work.

(Laurent Jenny) 30

Putting time back in manuscripts

I want to propose a methodology which might help us combine the study of ‘what cannot be observed’ with very observable markup.

I am arguing that a writing process, and hence any text resulting from it, by definition takes place in time, which immediately results in four complications for the non-critical editor and text-encoder:
  1. Its beginning and end may be hard to determine and its internal composition difficult to define (document structure vs. unit of writing): authors frequently interrupt writing, leave sentences unfinished and so on.
  2. Manuscripts frequently contain items such as scriptorial pauses which have immense importance in the analysis of the genesis of a text.
  3. Even non-verbal elements such as sketches, drawings, or doodles may be regarded as forming a component of the writing process for some analytical purposes.
  4. Below the level of the chronological act of writing, manuscripts may be segmented into units defined by thematic, syntactic, stylistic, etc. phenomena; no clear agreement exists, however, even as to the appropriate names for such segments.

These four complexities are exactly what the TEI Guidelines consider as ‘distinctive features of speech’. In chapter 11, on the Transcription of Speech, we read:

Unlike a written text, a speech event takes place in time. Its beginning and end may be hard to determine and its internal composition difficult to define. Most researchers agree that the utterances or turns of individual speakers form an important structural component in most kinds of speech, but these are rarely as well-behaved (in the structural sense) as paragraphs or other analogous units in written texts: speakers frequently interrupt each other, use gestures as well as words, leave remarks unfinished and so on. Speech itself, though it may be represented as words, frequently contains items such as vocalized pauses which, although only semi-lexical, have immense importance in the analysis of spoken text. Even non-vocal elements such as gestures may be regarded as forming a component of spoken text for some analytic purposes. Below the level of the individual utterance, speech may be segmented into units defined by phonological, prosodic, or syntactic phenomena; no clear agreement exists, however, even as to appropriate names for such segments.


If we consider, as I propose, any holograph witness as a filtered materialization of an internal creative process (thinking) which can roughly be compared to an internal monologue or dialogue, we may have a basis on which to build a methodology for the transcription of modern manuscript material. By combining the TEI DTD-subsets for the transcription of primary sources, the encoding of the critical apparatus, and the transcription of speech, we could try to transcribe a manuscript and analyse it with tools for the manipulation of corpora of spoken language. It is interesting in this respect to observe how critique génétique describes the authorial interventions, namely deletions, additions, Sofortkorrektur or currente calamo corrections, substitutions, and displacements, in terms of material or intellectual gestures, as if they were kinesic (non-verbal, non-lexical) phenomena.
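As a first, hypothetical sketch of what such a combined transcription might look like (all element choices, attribute values and the Dutch text are invented for the purpose of illustration, borrowing <u>, <pause> and <incident> from the TEI elements for the transcription of speech, and <add> and <del> from those for the transcription of primary sources):

```xml
<!-- One 'unit of writing' treated, like an utterance, as an event in time -->
<u who="Streuvels">
  zij stond aan den <del rend="strikethrough">oever</del>
  <add place="supralinear">waterkant</add> te wachten
</u>
<!-- A scriptorial pause, signalled e.g. by a change of ink or pen -->
<pause/>
<!-- A non-verbal element of the writing process -->
<incident><desc>sketch of the bridge in the left margin</desc></incident>
```

On this model, the chronology of the writing act, not the structure of the finished document, supplies the primary hierarchy of the transcription.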

This approach does not do away with the essential problem of non-nesting information, which is an inescapable fact of textual life and even results from a one-way analysis. 31


Rather than focusing the debate on the possibilities of electronic scholarly editions and their advantages over the print edition in terms of usability—as has too often been the case—I would rather see editorial theorists concentrate on the possibilities of the text which can be discovered by studying manuscript material very closely. Creating a non-critical edition/transcription of such a text with the use of text-encoding is the closest kind of reading one can undertake. Those who can reveal the writing out of the written work and discover the traces of the genesis of the text in its internal dynamics can say something sensible about the text and the intentions of the author. But modern manuscripts are complex and unwilling to obey the simple conventional ontologies of text and systems of text-encoding. Paraphrasing Louis Hay, one could say that the ‘principal merit of manuscripts is that they demonstrate the limitations and possibilities of text encoding. From the outset there are important material limitations: it is impossible to study a nonexistent manuscript.’ 32 Paradoxically, the existent and extant manuscripts generate, by their resistance to current systems of text-encoding, new ontologies of the text and new approaches towards text-encoding. Likewise, current systems of text-encoding lay bare new possibilities through their limited capacity to cater for such manuscripts.

This essay is for Alois Pichler, director of the Wittgenstein Archives, whom I'd like to thank for the many fruitful discussions on textual genetics and transcription procedures, and for Heli Jakobson, who managed to drag me from behind my desk every now and again during my stay in Norway. Research for this article has been made possible by a European Research Grant (5th Framework Programme, Improving the Human Research Potential and the Socio-economic Base: Access to Research Infrastructures (ARI)) for research at the Wittgenstein Archives at the University of Bergen (Norway, 01/06/2002-07/07/2002).
The Typopotamus called 'electronic edition'. Notes towards a definition and typology of electronic scholarly editions. King's College London, 8 December 2000.
Elsewhere I have suggested a model for electronic scholarly editing that unlinks the Archival Function from the Museum Function. ‘By Archival Function I mean the preservation of the literary artifact in its historical form and the historical-critical research of a literary work. Museum Function I define as the presentation by an editor of the physical appearance and/or the contents of the literary artifact in a documentary, aesthetic, sociological, authorial or bibliographical contextualization, intended for a specific public and published in a specific form and lay-out. The digital archive should be the place for the first function, showing a relative objectivity, or a documented subjectivity in its internal organization and encoding. The Museum Function should work in an edition—disregarding its external form—displaying the explicit and expressed subjectivity and the formal orientation of the editor. The relationship between these two functions is hierarchical: there is no Museum Function without an Archival Function and an edition should always be based on a digital archive.’ (Resistance 176).
See my and Ron Van den Branden's Describing, Transcribing, Encoding, and Editing Modern Correspondence Material: A Textbase Approach for a discussion on three ways to produce a scholarly edition with the use of text encoding, namely 1. Digitizing an existing print edition; 2. Creating an electronic edition e.g. by recording some or all of the known variations among different witnesses to the text in a critical apparatus of variants; and 3. Generating electronic editions from encoded transcriptions of the documentary source material.
For instance Jerome McGann's Rossetti Hypermedia Archive and Peter Robinson's The Wife of Bath's Prologue on CD-ROM.
‘A place should be any piece of information, or at least any that exists in a stable or recoverable form’ (DeRose and van Dam 9).
De teleurgang van den Waterhoek tells the story of the resistance of a rural community in a hamlet called de Waterhoek to the intrusion of industrial technology, i.e. the building of a bridge, and the consequences this new connection to the outside world brings forth. The plot of the novel is complicated by the passionate relationship between the frank village girl Mira and Maurice, the engineer supervising the works. The text of the first print edition of the novel is 297 pages long and counts 117,800 words.
Stijn Streuvels is a pseudonym for Frank Lateur.
These omissions were later addressed in a series of essays. See e.g. my Linkemic Approach and Argument, and De Smedt and Vanhoutte, Eclecticism. The introductory material to the edition does give an account, however, of the textual genesis and the history of the text, descriptions of its physical forms and physical appearance, the rationale for editorial decisions such as emendations, the treatment of spelling, punctuation etc. A separate technical documentation outlines the markup strategies, the digitization process, the hardware and software used, and the creation of the electronic product.
For this purpose I will make extensive use of the aforementioned essays (see note 9).
The editorial project ran from 1998 to 1999 and was funded by the Royal Academy of Dutch Language and Literature, Gent, Belgium (Koninklijke Academie voor Nederlandse Taal- en Letterkunde http://www.kantl.be ). The project was supervised by Prof. dr. Marcel De Smedt, who also functioned as a co-editor. Together with the ‘electronic-critical edition’ on CD-ROM, a text-critical reading edition in book form was published as a spin-off product of the electronic project (Antwerpen: Manteau, 1999).
Together with a glossary, an introductory article on the genesis of the novel, a description of the transmission history of the text including all of the documentary sources and existing editions, an account of the principles underlying the constitution of the reading text (spelling, punctuation, corrections) with a list of corrections and end-of-line hyphenations, an account of the principles underlying the creation of the glossary, and a couple of facsimiles from the several documentary sources involved in the research.
Cf. Reiman esp. Chapter 10: 'Versioning': The Presentation of Multiple Texts, 167-180.
After an incubation period of twenty-some years, Streuvels started writing his novel and published it in installments in the Dutch literary journal De Gids in 1927. At the end of the same year it was published in book form both in Flanders and the Netherlands by two different publishers using the same print. For the second print edition (1939) the text was drastically reduced by 26.6% because the publisher requested a shorter and hence cheaper version of the book. The author chose to cross out, amongst others, those passages that were too explicit and erotic in nature, and against which many Catholic critics had fulminated in their reviews of the first print edition. At the same time, Streuvels changed the ending of the novel. In the first edition the protagonist couple divorce, but the author suggests that the divorced couple can eventually reunite, and there is an allusion to an unborn child. In the second, drastically revised edition, the divorce is definitive and conclusive. Up to the publication of the 13th print edition of the novel in 1987, this drastically revised second edition was the basis for 11 successful reprints.
Although the second print edition retains only 73.4% of the text of the first print edition, more emendations had to be made to it (93, against 73 for the first).
76 in total.
These examples could be paraphrased as <corr sic="kiten" resp="EV">kitten</corr> and <sic corr="kitten">kiten</sic> respectively.
The electronic-critical edition of De teleurgang van den Waterhoek is an auto-executable application which launches itself when the CD-ROM is inserted in the CD drive, and comes with the MultiDoc Pro CD Browser software. No programmes need to be installed on the hard drive of the computer (PC only) on which the edition is consulted. The electronic edition is an autonomous closed package without links to the internet. This meets the requirements Paul Brians (Washington State University) voiced in a discussion on the Humanist list about CD-ROMs in libraries and at home: ‘CD-ROMs at the least should be self-contained, and not require that files be installed on hard drives or permanent links be available to the Internet.’
For the transcription and markup of the correspondence material, I developed a project-specific StreuLet DTD which allows the encoding of letter-specific elements such as the existence of the envelope and envelope information such as postmark, place of posting, sender, sender's address, recipient, recipient's address, etc. This work has developed further into the DALF guidelines for the description and encoding of modern correspondence material and the DALF DTD. DALF is an acronym for Digital Archive of Letters in Flanders and focusses on correspondence by Flemish authors and composers from the 19th and 20th centuries. It is envisioned as a growing textbase of correspondence material which can generate different products for both academia and a wider audience, and thus provide a tool for diverse research disciplines ranging from literary criticism to historical, diachronic, synchronic, and sociolinguistic research. The input of this textbase will consist of the materials produced in separate electronic edition projects. See the papers by Vanhoutte and Van den Branden and the DALF website: http://www.kantl.be/ctb/project/dalf/ .
The choice of hardware and software, and the parameters decided on when batch-converting a TIFF file to a lossy format such as JPEG (e.g. the application of an Unsharp Mask filter), are non-objective moments in the digitization process and strongly influence the eventual result.
Annex C of ISO 8879 introduces the optional CONCUR feature (not available in XML), which ‘supports multiple concurrent structural views in addition to the abstract view. It allows the user to associate element, entity, and notation declarations with particular document type names, via multiple document type declarations’ (Goldfarb 89).
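With CONCUR, a tag can be assigned to one of several declared document types by prefixing the element name with that document type's name in parentheses, so that each view forms its own well-formed tree. A hypothetical sketch (the DTD names, element names and Dutch text are invented for illustration):

```sgml
<!-- Two concurrent views of the same character data:
     'doc' holds the documentary (line-based) hierarchy,
     'gen' holds the genetic (revision-based) hierarchy. -->
<!DOCTYPE doc SYSTEM "doc.dtd">
<!DOCTYPE gen SYSTEM "gen.dtd">
<(doc)line>hij zag den <(gen)del>ouden brug</(doc)line>
<(doc)line>reeds</(gen)del> van verre</(doc)line>
```

Each view, taken on its own, nests properly; the overlap exists only between the views.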
Suggested further reading on overlap-related problems: Barnard et al., SGML-Based Markup; Barnard et al., Hierarchical Encoding; DeRose et al.; Durand, Mylonas, and DeRose; Huitfeldt, ‘Multi-Dimensional Texts’; Renear, Durand, and Mylonas; Sperberg-McQueen and Huitfeldt, Concurrent; Sperberg-McQueen and Huitfeldt, GODDAG; and chapter 31, Multiple Hierarchies, of the TEI Guidelines.
The suggested methods are CONCUR, milestone elements, fragmentation of an element, virtual joins, and redundant encoding of information in multiple forms. Cf. chapter 31, Multiple Hierarchies, of the TEI Guidelines. The Rossetti Archive based in Virginia and the Wittgenstein Archives at Bergen created their own encoding systems—respectively RAD (Rossetti Archive Document) and MECS (Multi-Element Code System)—out of dissatisfaction with the operationality of the options suggested by the TEI. Cf. McGann, Radiant 88-97; and Text Encoding at the Wittgenstein Archives http://www.aksis.uib.no/1990-99/textencod.htm .
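Two of these methods might be sketched as follows for a deletion that crosses a line boundary (hypothetical fragments with invented text; the element names and the part, to and id attributes follow TEI conventions for fragmentation and for spanning ‘milestone’ elements):

```xml
<!-- Fragmentation: the overlapping deletion is split into an
     initial (part="I") and a final (part="F") portion at the
     line boundary, so each portion nests inside one <line> -->
<line>hij zag den <del part="I">ouden brug</del></line>
<line><del part="F">reeds</del> van verre</line>

<!-- Milestone (span) elements: empty markers delimit the deletion,
     leaving the line hierarchy intact -->
<line>hij zag den <delSpan to="d1"/>ouden brug</line>
<line>reeds<anchor id="d1"/> van verre</line>
```

Both workarounds keep the encoding well-formed at the price of pushing part of the genetic structure out of the element hierarchy and into attribute values, which processing software must then reassemble.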
Cf. chapter 31, Multiple Hierarchies, of the TEI Guidelines.
‘...manuscripts which form part of a textual genesis attested by several successive witnesses and which manifest an author's work of writing’ (‘...manuscrits qui font partie d'une genèse textuelle attestée par plusieurs témoins successifs et qui manifestent le travail d'écriture d'un auteur’, Grésillon 244).
Which I by all means consider a valuable and valid form of editing.
One absurd exponent of image-based editing is the genetic edition of Flaubert's L'éducation sentimentale by Tony Williams and Allan Blunt ( http://www.hull.ac.uk/hitm/ ). The diplomatic transcriptions were ‘initially prepared as Word (.doc) files, which cannot be readily converted into HTML text files, since 'raw' HTML code is unable to render the refined layout, freeform lines, and interlinear additions and other features of the transcriptions achieved through advanced use of modern word-processing software’ (Williams and Blunt 198). Therefore, the editors contend, ‘Diplomatic transcriptions need [...] to be stored in the hypertext package as image files’ (199), which they create by printing out the Word files and scanning them on a flatbed scanner. Hyperlinks from the transcription image file to documentary notes are then established by means of image maps. But since hypertext is usually signalled by the browser by a different text colour, ‘the hotspots must be indicated (via words in red that do not corrupt the typographical integrity of the diplomatic rendering) in the word-processed transcription document before digitization’ (200).
Grésillon, Eléments, provides a good introduction to the methodology of critique génétique. A good introduction in English is Falconer, Genetic Criticism.
86(3), May 1995.
‘L'appareillage institutionel et technique de la génétique textuelle ne saurait faire oublier que l'objet qu'elle se donne échappe presque par définition à la 'science.' Ce que scrute la génétique textuelle, c'est en effet un inobservable, un inobjectivable: l'origine même de l'oeuvre littéraire’ (Laurent Jenny, Divagations généticiennes, cited in Lernout 124).
See my Display or Argument for a further discussion of non-nesting information.
‘The principal merit of manuscripts is that they demonstrate the limitations and possibilities of genetic criticism. From the outset there are important limitations: it is impossible to study a nonexistent manuscript’ (Hay 68).

Last recorded change to this page: 2007-10-31  •  For corrections or updates, contact web@tei-c.org