A Summary of a Report Published by the Council on Library and Information Resources
Preservation in the Age of Large-Scale Digitization
A White Paper
By Oya Y. Rieger
The digitization of millions of books under programs such as Google Book Search and Microsoft Live Search Books is dramatically expanding our ability to search and find information. The aim of these large-scale projects, to make content accessible, is interwoven with the question of how one keeps that content, whether digital or print, fit for use over time.
Preservation in the Age of Large-Scale Digitization, by Oya Y. Rieger, examines large-scale digital initiatives (LSDIs) to identify issues that will influence the availability and usability, over time, of the digital books these projects create. Ms. Rieger is interim assistant university librarian for digital library and information technologies at the Cornell University Library.
The paper describes four large-scale projects (Google Book Search, Microsoft Live Search Books, Open Content Alliance, and the Million Book Project) and their digitization strategies. It then discusses a range of issues affecting the stewardship of the digital collections they create: selection, quality in content creation, technical infrastructure, and organizational infrastructure. The paper also attempts to foresee the likely impacts of large-scale digitization on book collections.
Given the enormous investment that digitizing partners and participating libraries are making in LSDIs, how can we secure, or improve, a long-term return on that investment? Ms. Rieger addresses this question with 13 recommendations, summarized as follows.
- Reassess Digitization Requirements for Archival Images. Prevailing digitization standards and best practices were established 15 years ago. We need new digitization metrics that are based on current imaging technologies, quality assessment tools, archiving practices, and evolving user needs. To evaluate the suitability of digital objects for preservation, it may be useful to conduct a systematic image-quality study based on inspection of sample images and associated metadata.
- Develop a Feasible Quality Control Program. We need to reassess the quality control (QC) policies, tools, and workflows that were created to support small-scale digitization projects and to acknowledge that it is neither practical nor feasible to apply existing QC protocols to LSDIs. Creating good-quality images during the initial capture should be emphasized. The library community should negotiate rigorous technical specifications with digitization partners to ensure that QC is an assurance process rather than a frontline strategy for catching missing or unacceptable images.
- Seek Compromise to Balance Preservation and Access Requirements. Because of the scale of LSDIs, participating institutions are seeking compromises in digitizing practices. LSDI institutions must also implement space-efficient digitization strategies to reduce long-term storage costs and boost transmission efficiency. Our community needs to reach agreement about what is “good enough” quality in LSDIs and to clarify what future needs the digital collections are intended to serve.
- Enhance Access to Digitized Content. Digital content that is not used is prone to loss. Thus, archiving investments will be more worthwhile if efforts are made to improve discovery, access, and delivery. Some LSDI libraries plan to experiment with enhanced access and with discovery tools and text-mining techniques. This can be accomplished only if the libraries pool their resources and build on each other’s accomplishments.
- Understand the Impact of Contractual Restriction on Preservation Responsibilities. Commercial LSDI partners often restrict the sharing of full-text digitized content. Such restrictions may impede some preservation strategies, such as redundancy arrangements. The library community will benefit from forming a united front to address with commercial partners the limitations they place on their copies of digital materials.
- Support Shared Print-Storage Initiatives. Increasingly, research institutions will be pressured to justify investments in maintaining their legacy print collections. Consolidation of holdings in a shared storage environment can save space and offer better environmental controls. National and regional shared-storage efforts demonstrating strong leadership need firm support from the library community.
- Promote the Use of the DLF/OCLC Registry of Digital Masters. The DLF/OCLC Registry of Digital Masters is a central place for libraries to search for digitally preserved materials. The registry can record an array of information that will support the preservation of content and the planning of future digitization efforts. Rather than relying on LSDI libraries to register digitized content, it may be more effective for OCLC to work with Google, Microsoft, the Open Content Alliance, and the Million Book Project to automatically ingest and record such information, with pointers to the university’s digital copies.
- Outline a Large-Scale Digitization Initiative Archiving Action Agenda. A wide range of archival models and policies has been customized to suit institutional goals, resources, and content types. This diversity of preservation strategies has merit while the library community continues to learn about the various options and approaches. Although a joint archival solution is ideal, the collaboration agenda is not limited to providing a common preservation repository.
- Devise Policies for Designating Digital Preservation Levels. Organizationally and financially, we cannot keep and preserve all digital content at the same level of service and functionality. LSDI libraries must therefore determine the extent and type of their preservation efforts. There are two options: all files can be automatically preserved at the same level; or metrics may be used to make a decision on the basis of the material’s perceived value and use. This topic is worth exploring further by means of a risk analysis of cost-efficient preservation strategies for low-use content.
- Capture and Share Cost Information. Participating libraries invest much time in a range of tasks associated with LSDIs. It is important to document the expenses for all the partners associated with LSDIs. Often neglected or underestimated in cost analysis are the accumulated investments that libraries have made in selecting, purchasing, housing, and preserving their collections.
- Revisit Library Priorities and Strategies. Libraries are under increasing pressure to focus digital preservation efforts on unpublished and born-digital information. It is important to try to define the role of LSDIs within the broader scope of library activities and midterm strategies.
- Shift to an Agile and Open Planning Model. Traditional strategic planning and consensus models are unlikely to support the decision-making processes of research libraries in today’s fluid information technology environment. Libraries must develop scalable and flexible infrastructures that facilitate rapid execution, and they must be willing to take calculated risks.
- Reenvision Collection Development for Research Libraries. Research libraries must consider how future selection and acquisition decisions will be shaped in light of increased online content and worldwide access to core collections.
Many of the paper’s recommendations will require collaboration among cultural institutions. In her conclusion, the author asks, “What makes the LSDI agenda appealing enough to overcome the barriers to collaboration and what are the incentives to work together?” She then addresses this question from the perspectives of stewardship responsibility, enduring access, cost-effectiveness, and the future role of research libraries.
More About this Report
Preservation in the Age of Large-Scale Digitization. A White Paper.
Oya Y. Rieger.
February 2008. ISBN 978-1-932326-29-1. 52 pages.
Report text is available free at http://staging-clir.wordpress.clir.org/pubs/abstract/pub141. Print copies can be ordered at this URL for $20 per copy plus shipping.