TEI Tite Digitization Benefit: Request for Proposals


TEI Tite: Request for Proposals

Corrections and Clarifications

Please note the following corrections and clarifications to the RFP. Vendors should subscribe to the TEI news feed to ensure they are alerted to any changes to this document.

Timetable

  • May 13, 2009 TEI RFP distributed to vendors
  • May 29, 2009 TEI deadline for receipt of vendor questions
  • June 12, 2009 TEI deadline to provide responses to vendor questions
  • June 22, 2009 TEI deadline for receipt of vendor proposals
  • July 14, 2009 TEI announces short list of vendor finalists, start of testing
  • July 31, 2009 TEI testing period ends for vendor finalists
  • August 30, 2009 TEI announces vendor selection

The TEI reserves the right to modify this schedule.

Questions and comments

Questions regarding this proposal should be directed by email to Daniel O'Donnell, Chair and CEO of the TEI, at digitization@tei-c.org.

Introduction

About TEI

The Text Encoding Initiative (TEI; additional information at: http://www.tei-c.org/index.xml) is a community-based consortium responsible for establishing and maintaining guidelines for encoding machine-readable texts for the purpose of literary and linguistic study. The history of TEI is documented on its website (http://www.tei-c.org/About/history.xml). In short, the initiative was established in 1987, incorporated as a nonprofit organization in 2000, and today represents 81 institutions in 19 countries (http://www.tei-c.org/Membership/current.xml).

Libraries, presses, and scholarly projects use TEI Guidelines (now in their fifth edition) to mark up the electronic text collections they produce out of their own holdings. For example, the University of Virginia (UVa), at http://www.tei-c.org/Membership/current.xml, used TEI Guidelines to encode machine-readable versions of a substantial subset of the UVa Libraries' holdings of first editions of American fiction published between 1789 and 1875. Currently, TEI members digitize, on average, approximately 176,000 pages of primary (print, handwritten, and microfilm) materials per year, drawn from book, serial, newspaper, and manuscript collections.

About TEI Tite

In 2007, the TEI Executive Board and Technical Council developed a customized schema out of existing TEI Guidelines — TEI Tite (http://www.tei-c.org/Membership/current.xml) — to benefit TEI members working with keyboarding vendors. TEI developed TEI Tite to allow many of its smaller members and scholarly projects to procure digitization services according to a standardized schema in a coordinated, discounted fashion.

About this Request for Proposals

In 2008, the TEI conducted an intensive and exhaustive survey of member needs. As a result of this poll, TEI identified access to high-quality digitization services as an attractive benefit for its membership. The goal of such a program would be to allow TEI's members to pool their digitization work in order to take advantage of volume-based discounts and workflow from preferred vendor(s). For their part, the vendor(s) selected by TEI would be able to reduce their own setup costs by grouping together smaller jobs that are to be digitized according to identical standards. Based on its polling of its member institutions, TEI estimates that the contract value of the TEI work described here may range from hundreds of thousands to millions of dollars per year.

Proposal Requirements and Evaluation Criteria

The purpose of this request for proposals (RFP) is to enable TEI to identify and select appropriate vendor(s) to work with TEI members' material. This material falls into several broad classes:

  • Printed material (primarily 19th and 20th Century books, serials, and newspapers) in Western character sets and modern typefaces
  • Printed material and material in non-Western character sets or unusual or obsolete typefaces (e.g. Pre-19th Century, "Gothic," etc.)
  • Handwritten material.

The proposed program will involve the aggregation of digitization jobs into lots that allow TEI members collectively to achieve volume discounts from the preferred vendor(s). Proposals should detail:

  • The work flow for aggregation, processing, quality assurance, and billing of projects, including minimum and maximum turnaround times
  • The volume thresholds (minimum and maximum) for aggregated job size that can be accommodated
  • The pricing structure for the program (including thresholds for incremental volume discounts if any)
  • The provisions for quality assurance and mechanisms of redress in case of output not meeting agreed quality standards

Price quotes for OCR should be at 99.95% accuracy (corrected from 95%); re-keying at 99.995% accuracy (corrected from 99.9%) [Note (2009-05-19 1800 UTC): Please note correction to accuracy figures].
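For orientation only, the following sketch converts the 99.95% and 99.995% accuracy figures into a maximum number of character errors. It assumes that accuracy is measured per character and that a page holds about 2,000 characters; neither assumption is stated in this RFP, and the function name `max_errors` is purely illustrative.

```python
from fractions import Fraction


def max_errors(total_chars: int, accuracy_percent: str) -> int:
    """Largest whole number of character errors that still meets the target.

    accuracy_percent is passed as a string (e.g. "99.95") so that Fraction
    can represent it exactly and avoid floating-point rounding.
    """
    allowed_error_rate = 1 - Fraction(accuracy_percent) / 100
    return int(total_chars * allowed_error_rate)  # round down to whole errors


# ASSUMPTION: characters per page is a hypothetical figure, not from the RFP.
CHARS_PER_PAGE = 2000

for label, accuracy in [("OCR", "99.95"), ("re-keying", "99.995")]:
    per_page = max_errors(CHARS_PER_PAGE, accuracy)
    per_100_pages = max_errors(100 * CHARS_PER_PAGE, accuracy)
    print(f"{label} at {accuracy}%: at most {per_page} error(s) per page, "
          f"{per_100_pages} per 100 pages")
```

Under these assumptions, the OCR figure allows one error in every 2,000 characters and the re-keying figure allows one error in every 20,000 characters.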

The key competitive parameters (among proposals meeting the basic technical requirements) will be:

  • Price per page to our members
  • Ease of administration for the TEI: i.e. workflow issues surrounding the collection, aggregation, processing, billing, and return of members' projects as part of this program, and of distinguishing and negotiating separate prices for unusual or difficult material
  • Minimum aggregate job size: i.e. the minimum number of pages our members would need to aggregate before a job would be run
  • Length of maximum promised turnaround time for any one member's submission: i.e. the maximum time an individual member would have to wait before a job would be run

From the TEI's perspective, we are looking for the program that offers the cheapest possible price to our members on the smallest possible job size with the fastest possible turnaround time and the least administrative effort for the TEI. Of these criteria, price and ease of administration are the most important. Minimum requirements for the program are:

  • that all TEI members with digitization jobs should be able to achieve some discount over directly negotiated prices through this program (even if these discounts are variably priced depending on the type of material submitted for digitization),
  • that this discount should be significant enough to make membership attractive enough to get institutions to join in numbers that make it feasible for the TEI to cover any costs or administrative time required to service the benefit: the more administrative work the proposal requires from the TEI, the deeper the discount will need to be in order to cover our costs.

Minimum aggregate job size and maximum promised delivery time are less crucial factors--provided they are not such as to make the program unpalatable to our membership. But these will help us decide among bids with comparable price and administrative advantages.

In support of this work, the TEI is able to supply real-time membership information to the vendor and is willing to work with the vendor on quality assurance and dispute resolution should problems arise. In the case of proposals that require the development of new portal- or webstore-based systems, the TEI is also willing to work with vendors in the initial design and implementation of the system.  The overall quality and completeness of the vendor’s response will also be taken into account, and successful candidates will be expected to provide assurances of their financial viability.

Instructions to Vendors

The competition will be run in two stages. In the first stage, vendors are invited to submit expressions of interest in which they address questions of price, program administration, minimum aggregated job sizes, maximum turnaround time, and differential practice for handling specific types of primary material (if necessary). Vendors should discuss their accuracy, quality control procedures, and financial viability at this stage, although no test of accuracy will be required at this point. For the second stage, selected vendors will be asked to complete a test digitization and encoding project using TEI Tite and a variety of sample materials. The TEI will be seeking from vendors full-text, machine-readable versions of the source materials. A set of representative sample files is included with this RFP for informational purposes; a second set will be issued for use in the second stage of the competition.

The TEI has a strong preference for working with a single supplier. At the same time, the TEI recognizes that the range of material TEI members are interested in digitizing may require some flexibility in policies on workflow, pricing, aggregation, and turnaround time. For this reason the TEI is prepared to consider bids that propose to implement different processes for different types of material (for example, a flat rate price and automated workflow for some types of material with parameters for negotiated contracts and custom workflows for others) or (less optimally) bids focused on one or more types of material. Regardless of the material in question, however, all vendor proposals should enable TEI members to achieve significant savings and/or other benefits on their digitization work as a result of their participation in this program and enable the TEI to cover the cost of the program by increasing its membership base.

In conducting the digitization program, the TEI will assess vendor work on the basis of accuracy of transcription and accuracy of tagging, using randomly chosen samples. Work that does not meet the specified levels of transcriptional accuracy or tagging will be sent back for redigitization. 
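As a rough illustration of how a sample-based check of transcriptional accuracy might work, the sketch below draws a random sample of pages and compares each vendor transcription with a trusted reference transcription. The RFP does not specify the measurement method: the use of character-level edit distance, the function names, and the data layout are all assumptions made for this example, and tagging accuracy is not covered.

```python
import random


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def sample_accuracy(pages: dict, sample_size: int, seed=None) -> float:
    """Estimate character accuracy from a random sample of pages.

    pages maps a page id to a (vendor_text, reference_text) pair.
    ASSUMPTION: a trusted reference transcription exists for sampled pages.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(pages), min(sample_size, len(pages)))
    errors = sum(edit_distance(*pages[p]) for p in chosen)
    total = sum(len(pages[p][1]) for p in chosen)
    return 1 - errors / total if total else 1.0


def meets_target(accuracy: float, target_percent: float) -> bool:
    """True if the sampled accuracy reaches the quoted target."""
    return accuracy * 100 >= target_percent


# Hypothetical two-page job; one character is wrong on page p2.
job = {"p1": ("abcd", "abcd"), "p2": ("abxd", "abcd")}
estimate = sample_accuracy(job, sample_size=2, seed=0)
print(f"sampled accuracy: {estimate:.3%}, "
      f"meets 99.95%: {meets_target(estimate, 99.95)}")
```

In practice the sample would need to be large enough for the estimate to be meaningful at targets as strict as 99.95% or 99.995%; this sketch does not address sample-size selection.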

Notwithstanding anything to the contrary set forth herein, TEI expressly reserves the right, in its sole discretion, to qualify, accept or reject any vendor or proposal.  After the completion of the testing phase, the TEI will announce the preferred vendor(s) and begin work on implementing the program for announcement at the November 2009 TEI Conference and Members' Meeting at the University of Michigan.

Appendix I Technical Requirements

Transcribed materials should be encoded according to the guidelines for TEI Tite. Documentation for Tite can be found at http://www.tei-c.org/release/doc/tei-p5-exemplars/html/tei_tite.doc.html

DTDs and schemas are available by following the Tite-related links at http://www.tei-c.org/Guidelines/Customization/. As reference, vendors may use the TEI Tite DTD available at http://www.tei-c.org/release/xml/tei/custom/schema/dtd/tei_tite.dtd. All transcriptions must at a minimum validate against this DTD. [Note (2009-05-19 1800 UTC): Please note addition of direct link to relevant DTD.]

Appendix II Sample Materials

The attached file (tite.samples.zip) contains samples of material similar to that commonly digitized by our membership. These or similar files will be used in the second phase of the RFP:

  • newspaper.jpg: a 1923 article from the Times (London) describing King Tut's tomb, to test ability to handle basic encoding of newspaper structure
  • early-modern-printing.pdf: The first Canto of the 1590 Faerie Queene, to test ability to encode simple verse structure & deal with Early Modern printing; please only transcribe pages 1 through 3 (the rest is given for context)
  • manuscript.jpg: A handwritten court document, to test nineteenth-century manuscript transcription
  • non-latin-charset.pdf: Excerpt from printed union publication, in Russian, to test ability to transcribe non-Latin material.