Papers by Laurent Duplouy
La Gazette des archives, 2016
Archiving Conference, 2017

A number of the world's major libraries are embarking on large-scale projects to digitize books, journals, newspapers, and other printed materials. The main purpose of these mass digitization initiatives is to make published content visible to text indexing engines and accessible online for viewing and printing. The projects also present significant archiving challenges: capabilities must be developed within libraries to manage the hundreds of millions of files comprising their master volumes. Even with the cheap disks and fast networks available today, projects of this scale must implement space-efficient imaging strategies to minimize long-term storage costs and maximize efficiencies for processing tasks such as file transfer, dynamic generation of deliverables, and migration. Libraries have had to accept that page image masters must be compressed, and that lossless compression of grayscale and color data will not achieve the efficiencies they are seeking in mass digitization. In this paper, we present findings from studies coordinated by the California Digital Library, the Internet Archive, the Harvard University Library, and the Bibliothèque nationale de France to evaluate relationships between file size and perceived image quality for lossy compressed JPEG 2000 (JP2) images. We employed similar, but not identical, methods to create small test suites of source page images, which were then processed by four command-line JP2 codecs to produce images that observers rated from "perfect" to "unacceptable." We present viable technical profiles for lossy JP2 encoding of page image masters, with recommended settings for selected command-line codecs. We are maintaining test suites of digitized book pages and invite others to use them to extend efforts to develop robust image processing algorithms that balance quality and file size in a variety of page image products.