The world of electronic editing is a new one and it is going to take some time for standards and practices to settle down and become part of the established world of scholarship. As the editors point out in their introduction, the book has developed over 500 years, the print edition has been evolving over the past 150 years, while electronic tools have been used to assist in editing for perhaps 50 years (11). The reconceptualization of the edition as an innovative electronic artefact has happened only over the last 10-15 years, and this means that the librarians charged to acquire, deliver, and preserve this most important of scholarly tools are facing new challenges. This chapter discusses what the different issues are in preserving electronic editions, some of which derive from the newly conceptualized modes of instantiation of electronic text, and offers some suggestions to scholars that should enable them to take the necessary steps from the beginning of an editorial project in order to ensure that what is produced is preservable as far as it is possible.
As is shown by the essays in this volume, electronic editing offers
a plenitude of materials that represent a work in all its different
states of being. It also allows the situation of a work within a
nexus of social, contextual, and historical materials, all of which
contribute to the totality of its meaning. The electronic edition is
itself another
Conventional printed editions present few problems to libraries in
collecting and preserving them that they have not already been
grappling with for many years. Such problems as print editions may
present all derive from the physicality of the medium used to deliver
them. They may, for instance, be old and rare, and in need of careful
handling, controlled storage, conservation etc. They may be
nineteenth-century editions printed on acid paper, which is
disintegrating into yellow crumbs or may have pages coming loose from
bindings. Editions presenting these problems can either be repaired,
or their contents can be reformatted onto another medium: microfilm,
photographic facsimile, new print volume. While reformatting can
raise questions about the authenticity of the material object, it is
preferable to the alternative, which might be total loss of the object
and its valuable contents. Librarians are concerned mostly with the
format and construction of the material object, the
In the analogue world the editor’s roles and responsibilities in relation to the longevity of the work are clear. Editors are little concerned with the format or composition of the carrier, unless they are involved to some degree in font choice, cover design etc. Most responsibilities of the editor end on the day of publication, by which time he or she may already be engaged in a new piece of work. For publishers, their responsibility is to produce a printed volume containing the edition, ensuring as far as possible the quality of the scholarship it represents. This is a complex process involving many different tasks and subtasks of selecting, advising, reviewing, revising, copy-editing, typesetting, proofing, designing, printing, marketing and dissemination. On-going responsibilities might include reprinting or publishing a new edition, but any role in the long-term survival of the product is minimal. When an edition is out of print, that usually means unobtainable from the publishers, and it is not for the editor or publisher to ensure continuing access: that rests with libraries, especially the copyright and major research libraries. Copyright libraries as a nation’s library of record receive a copy of every publication produced in a country in any one year, so a published edition which is out of print and unavailable anywhere else will almost certainly be locatable in a copyright library somewhere, even if it is more than a hundred years old. And that edition should be perfectly usable and accessible, even if created according to different principles than the user expects: it should be self-describing and contain all the explanation needed for its use.
In his foreword to this volume, G. Thomas Tanselle has suggested
that the format in which books are delivered is irrelevant: the use
of the computer in editing does not change the questions, and the
varying temperaments of editors will continue to result in editions of
differing character
(page ref). This may be true for editors, and
it may also be true for users of editions, but it is not true for
publishers producing or libraries collecting and preserving electronic
editions; for them the changes are profound and far-reaching, and
involve a complete rethink of the economics of producing and handling
these works. Previously, these editions of differing character
would all have been treated in much the same way by both publishers
and libraries. Even uniform series of texts with strict style guides,
unchanging livery, immutable type styles could have very different
kinds of editions within them. Glance at the Early English Text
series, for instance, which has existed for a century and a half, and
you will find very different contents underneath the brown covers:
Zupitza’s facsimile of
Tanselle goes on to say that editors working in the electronic world need to confront the same issues that editors have struggled with for 2500 years (page ref). Again, this is of course true, but if their work is to survive beyond even the next five years, serious thought must be given right at the start to how to create the edition in such a way that libraries can collect and preserve it alongside many other electronic and conventional editions for the benefit of scholarship in the long term, as well as for the users of today; this is not something that scholars have ever had to do before. And while the traditional questions are still being asked, the very plenitude of new possibilities in the electronic medium means that many new questions will be asked too. In an edition of Yeats’ poem
For a scholar contemplating embarking on an electronic textual
edition, the first place to look for advice about creating a
yes to all those that apply to his or her projected work, he or she would be well on the way to producing a durable resource. The rest of this chapter expands upon some of the issues raised by the Guidelines.
Two of the key questions the Guidelines ask of potential editors
are: 'How important is permanence or fixity? How can these be
obtained?' and 'Is there a possible benefit to openness and
fluidity?' These are crucial but in some ways contradictory points.
Openness and fluidity are significant benefits that the digital world
offers to editors, and many of the contributions in this volume
discuss projects which have no final point of publication of the
edition or any of its parts. Some of the editions are purely
networked and it is anticipated that these will grow and change over
time. Indeed, van Hulle suggests that publication is a 'freezing
point' (105) and that the act of publication is an exercise in
petrification (110). Van Hulle is talking not about edited texts, but
about some of the fluid published texts of Samuel Beckett’s
the scholarly editor’s basic task is to present a
reliable text: scholarly editions make clear what they promise and
keep their promises
(31). It is also vital to provide a
One of the main points to stress about the fluidity of electronic editions is that it is both a strength and a weakness: it can be a strength for the editor who can adapt and change the edition as new information becomes available, but a weakness for the user who may not understand what changes have been made, and for the librarian who needs to deliver and preserve the materials: what version of something becomes the preservation version? Instability of citation is a critical problem; research and scholarship are based upon a fundamental principle of reproducibility. If an experiment is repeated, are the results the same? If a citation is followed by a scholar, does that lead to a stable referent, the same referent that every other scholar following that citation is led to? If these are not true then we do not have scholarship, we have anarchy, and it is an anarchy that is well-supported by the Internet.
Another key issue when planning electronic editions is to establish
standards and working practices that may enable editions to be
It is going to take some time for principles and practices to be established that will ensure the longevity of electronic scholarly editions, and those principles and practices will need to be much more precise in many ways than those that are needed for the production of conventional editions, elaborate and precise though these already are. The underlying scholarly practices are much the same, as many of the chapters in this volume make clear; it is the technical issues that need to be resolved. And that is difficult because at the moment electronic editing is more characterized by innovation, experimentation, and new developments than it is by established practices, and that is what makes it so exciting. Electronic scholarly editing is also caught up in the fast-moving world of hardware, software, applications, and standards, which change with dizzying speed. The editor is caught between taking advantage of all these new developments and trying to ensure that the work survives for the long term.
Before going on to discuss what the editor might do to ensure the
longevity of editions, it might be useful to look at some of the ways
in which digital data is being preserved by libraries and other memory
institutions.
potentially infinite
expansion of the traditional editorial apparatus
(77) and that
users might be able to have their own customized version of the
edition, based on their behaviour while interacting with the materials
(82). They also suggest that an important feature of online poetry
could be the contextural
relation of multiple texts on the
internet within hyperlinked clusters, where the poem interacts with
many other kinds of information (74). For Crane, electronic editions
are protean
and able to adapt themselves to multiple needs
and to recombine themselves in new configurations
(207). In all of
this experimentation and innovation, it is sometimes difficult to know
what is at the core of a complex, ever-changing edition that can be
collected and preserved by a library.
The preservation of digital data has two main components:
preserving the integrity of the bits and bytes, and preserving the
A new approach to the preservation of complex digital data is being
explored by the University of Virginia and Cornell University,
together with other academic partners. This is the Fedora project
(Flexible Extensible Digital Object Repository Architecture), one of a
number of repository architectures that have been proposed for use in
digital libraries.
There are a number of
projects too which are looking at the problems of preserving
information on web sites. The web is the largest and most prolific
source of information that there has ever been, much of which could be
lost if it is not actively preserved. Initiatives at a number of
national libraries, concerned with potential loss of considerable
portions of the national heritage now being produced in online form
(the Library of Congress, the British Library, National Library of
Australia, National Library of Sweden, among others), are looking
closely at the long-term preservation of web sites. The volumes of
data to be stored are enormous, so it is impossible to try to preserve
every state of a web site as a library might, for instance, preserve
every edition of a book. Some projects aim to harvest web sites every
six months and preserve them for historic purposes. Many of the
experiences in the preservation of web sites can offer insights into
the preservation of networked editions. For instance, if the
networked edition has many links to external sources of information
outside the control of the editors, these are going to be highly
vulnerable. Link checkers can automatically report that links are
broken and information missing, but rarely can anything be done about
this other than to remove the link. Another interesting web
harvesting initiative is the Internet
Archive (almost complete
snapshot of the World Wide Web every 60 days since 1996—that’s about
two billion pages
(Kahle, qtd. in
to surf more than 10 billion web pages. Try it on your favourite web site and see what problems this throws up.
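The vulnerability of external links noted above can be made concrete with a small inventory. The following is a minimal sketch, not a full link checker: it does no network access and only separates the links on one page into those that stay within the edition's own host and those that point elsewhere. The function name `classify_links` and the example addresses are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(html, base_url):
    """Return (internal, external) lists of absolute link targets.

    A link counts as internal when its host matches the host of
    base_url; everything else is treated as outside editorial control.
    """
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    internal, external = [], []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        if urlparse(absolute).netloc == base_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


# Hypothetical page from a networked edition.
page = (
    '<p><a href="notes/1.html">note</a> '
    '<a href="http://other.example.org/x">context</a></p>'
)
internal, external = classify_links(
    page, "http://edition.example.org/poem/index.html"
)
```

An inventory of this kind at least tells an editor how much of the edition depends on resources that may disappear; it cannot repair a broken link, only show where the risk lies.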
In perusing this volume it is obvious that there is a real tension
between the new possibilities offered by the electronic edition and
the need to preserve the scholarly record. As Eaves points out in
discussing the Blake Archive: We have no answer to the haunting
question of where and how a project like this one will live out its
useful life
(157). The TEI itself was established to address many
of the problems (7), but just using the TEI for preparing text is
not going to be the answer: there are many more issues than what text
encoding scheme to use. The electronic medium offers so much more
than textual processing to the editor, as is clear from many
contributions here. Each of these media has its own problems of
handling and long-term storage. But even more problematic than the
format, handling, markup, and storage of different media is the
maintenance of the links between and within the media, and, with the
networked edition, the links can be highly prolific and sometimes
uncontrollable, if they are linking to resources external to the
edition.
In the past there has been a threefold distinction made between
data, programs, and interface that it might be useful to discuss here,
though Fraistat and Jones warn us against the venerable dichotomy
that divides the world into opposing demesnes one focused on the front
end ... and one on the back end
(74)—for them, editors (especially
editors of poetry) in the digital world need to consider the
providing an effective, easily understood,
user interface (or GUI) is always an important issue in an electronic
edition.
Perhaps we need to expand the distinction to a fivefold
one: data, metadata, links, programs, and interface. The first three
of these contain the intellectual capital in an edition, the last two
are (should be?) external. However important the programs used to
create and deliver the edition and the interface through which it is
accessed, scholars must always remember that these are likely to be
the least durable part of any electronic edition, and plan for the
design and formatting of their intellectual assets in such a way that
they can be reused with different programs and interfaces. Easier
said than done.
What do we mean here by data, metadata, and links? Data is the raw
material deriving from the source itself and can be text, images,
sound files, video etc. Metadata is added symbols that describe some
features of the data, and links are strings of code that link pieces
of data to other pieces of data—either internal to the source itself
or external. Throughout this volume, the metadata that has been
mostly of concern is textual markup, and in particular TEI markup, but
there are other kinds of metadata that it can be useful for the editor
to use in an electronic textual edition, some of which are discussed
by Lavagnino above. In particular, it may be that the TEI is
There are a number of approaches that can be taken to the creation of editions that will survive for the long term. One relatively straightforward one is to produce a fixed edition on some stable medium at regular stages in the life of the edition. This is the approach that is being taken for the
distribute editorial power among the users (86). This is an interesting approach as the
rhyzomatic network of hypertext links. Fixing the text in one form and therefore being assured of its stability can perhaps allow for more experimentation in the electronic medium. It will be interesting to see in what way and over what period the editions diverge, and at what point there is felt to be the need for a new printed edition, if ever. If there ever
While this volume is mostly concerned with electronic
double editing, editing first in discrete, then
in integrated media (155). This is a good approach to creating
preservable assets, and it makes editors think very hard about what
the different components of the edition are likely to be. It is also
a valuable approach to editing in teams: different team members can
prepare different parts of the edition which can be integrated later.
Indeed, this perhaps provides the nucleus of a working method: one
should conceive of the individual components of an edition first, work
out how to capture, describe, and store them, and then work out what
the links between them might be.
Standards in this field are
changing constantly, and one of the key benefits the TEI has brought
to the world of textual editing is a standard that is durable yet
flexible. Standards
For text, the standard that should always be used is the ASCII
standard, with markup added that is also in ASCII; the markup can be
embedded or offset (see below), and, while there has been a great deal
of progress in the presentation of special characters through the
Unicode standard, it is preferable that characters are encoded as
entity references which can be displayed in Unicode rather than encoded as
Unicode itself: for historical reasons, some characters have multiple
representations, and there are some ancient and oriental languages
that do not yet have full Unicode support. The TEI is of course the
markup system of choice for most electronic editing projects starting
today. One choice to be made is whether markup (TEI or any other
kind) should be embedded in the text or
Image data should be captured at the best quality possible to reveal all significant information about the original, and then stored in a non-proprietary file format using only lossless compression (if compression is to be used at all). The appropriate settings will differ greatly depending on the source materials: modern printed documents might yield everything they have to offer scanned bitonally at 300 dpi and stored as (for example) TIFF images with lossless compression. These images can be stored for the long term and will yield up all the information necessary now and in the future. Complex and perhaps damaged manuscript materials will need different treatment: these may need to be captured with the best systems currently available, at the highest possible resolution, stored as archival masters (currently TIFFs with no compression), and then have lower quality surrogates available for access. Such original images scanned with current professional digital cameras (as at mid 2003) can be up to 350 MB each in size, which poses huge storage problems for projects and libraries. It is vital to consider right from the start what storage requirements a project has, and usually to double or even treble this to take account of all the extra files that will be produced along the way: delivery images, thumbnails, text files, metadata, etc. Think too about the networks that will have to move all this material around and the backup media that will also be needed.
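The advice to double or treble the initial estimate is easy to turn into arithmetic. The sketch below is illustrative only: the function name `estimate_storage_gb` and the figure of 1,000 manuscript pages are assumptions made for the example, and one gigabyte is taken as 1,000 megabytes. The 350 MB master size is the upper figure given above.

```python
def estimate_storage_gb(n_masters, master_mb, overhead_factor=2.0):
    """Rough storage estimate in gigabytes (1 GB taken as 1000 MB).

    n_masters       -- number of archival master images
    master_mb       -- size of each master in megabytes
    overhead_factor -- multiplier for derived files (delivery images,
                       thumbnails, text, metadata); 2.0 doubles the
                       estimate, 3.0 trebles it
    """
    if n_masters < 0 or master_mb < 0 or overhead_factor < 1:
        raise ValueError("counts and sizes must be non-negative, factor >= 1")
    return n_masters * master_mb * overhead_factor / 1000.0


# Hypothetical project: 1,000 pages captured at 350 MB each.
doubled = estimate_storage_gb(1000, 350, overhead_factor=2.0)
trebled = estimate_storage_gb(1000, 350, overhead_factor=3.0)
```

Under these assumptions the project needs between 700 and 1,050 gigabytes before any backup copies are made, which is why storage should be planned at the outset.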
Audio and video data pose even more problematic storage requirements, being more memory hungry than still images. Uncompressed high quality video, for instance, has file sizes of around 28 MB per second. Audio and video standards are currently moving very rapidly because of commercial developments in offering streaming audio and video over the internet. Any standards suggested here would be out of date immediately; it is vital that projects take the best expert advice that they can at the outset in order to establish formats and standards.
Other issues that greatly affect the long-term prospects of
electronic editions are the naming conventions used. A complex editing
project will likely produce many thousands of files that need to be
kept track of. These files may also be kept in different locations,
locations which may change over time. File-naming conventions should
be devised and documented at the early stages of an editing project.
It is often useful to have names assigned automatically as this can
reduce human error and can make batch renaming simpler and more
reliable if necessary later on. Bar codes, hashing functions, and
databases can all be used for assigning filenames. File locations can
be seriously problematic: the URL is notoriously prone to
difficulties, and some other way of indicating locations should be
found to avoid the dreaded ‘Error 404: FILE NOT FOUND’. Work is being
done on alternatives to URLs: Uniform Resource Names (URN) identify a
piece of information independent of its location: if the location
changes, the information can still be found. One particular type of
persistent identifier that has been adopted by a number of publishers
is the Digital Object Identifier (DOI). DOIs are persistent names
that link to some form of redirection, so that a digital object will
have the same DOI throughout its life, wherever it may reside. DOIs
need to be registered with an agency in order that the redirection
process can operate, but they offer good long-term prospects for
digital object naming.
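Two of the ideas above, automatic name assignment by hashing and persistent names that redirect to a current location, can be sketched briefly. This is a toy illustration under stated assumptions, not how DOIs or URNs are actually registered or resolved: `content_name`, the `Resolver` class, and the name `edition:ms-001` are all hypothetical, and the registry lives only in memory.

```python
import hashlib


def content_name(data, extension):
    """Derive a file name from the SHA-256 digest of the file's bytes.

    The same bytes always yield the same name, so names can be
    assigned (and re-derived) automatically without human input.
    """
    digest = hashlib.sha256(data).hexdigest()
    return f"{digest[:16]}.{extension.lstrip('.')}"


class Resolver:
    """Toy registry mapping a persistent name to its current location."""

    def __init__(self):
        self._locations = {}

    def register(self, name, location):
        # Registering an existing name again records a move:
        # the name stays fixed while the location changes.
        self._locations[name] = location

    def resolve(self, name):
        return self._locations[name]


# Hypothetical usage: name a file by its content, then move it.
file_name = content_name(b"abc", "tif")

resolver = Resolver()
resolver.register("edition:ms-001", "http://old.example.org/" + file_name)
resolver.register("edition:ms-001", "http://new.example.org/" + file_name)
current = resolver.resolve("edition:ms-001")
```

The point of the indirection is that citations and internal links record only the persistent name; when files move, one registry entry is updated instead of every reference to the file.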
All of the suggestions above relate to standards and conventions to
be used when producing the discrete components of an electronic
edition, but the whole point of editing in this new format is the
introduction of complexity in the interrelationships between the many
sub-parts of the edition, and it is this complexity that is likely to
be the most challenging problem in preserving electronic editions.
What creates most of the complexity in editions is the exponential
growth in the number of links, and in editions of even relatively short
works there can be millions of links created. Sometimes this complex
interlinking is managed by programs and interfaces—the most
vulnerable part of the edition. However, it is not necessary to rely
on programs and interfaces to provide linking capability: the TEI
specifies a number of mechanisms for the description and markup of
links
Finally, the most important thing an editor can do to ensure the development of preservable editions is to consult widely with experts in digital preservation, metadata, digital file formats, etc., and with other editors who have been grappling with the same problems. Anyone who has read this far in this volume is well-equipped to begin an editing project with most of the tools they need at their fingertips.