Carnegie Mellon University

The Value of Digital Libraries

The original vision of the Million Book Project was to create a free-to-read, searchable collection of one million books, available to everyone over the Internet. By any measure, the project succeeded: the collection now includes over 1.5 million books. Google Books, by comparison, contains over 40 million titles. But how are such large book collections actually used? Do people read the books?

Machines are more diligent readers than humans, and are able to ingest vastly more material. Even at one page per minute, 16 hours a day, for 100 years, a human can read only about 100,000 books in a lifetime, and very little would actually be retained at that pace. Several of the world’s largest bookstores have over ten times that number just on their shelves. At the same pace, reading all of Google Books would take 400 lifetimes.
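
The arithmetic behind the estimate above can be checked with a short sketch. The reading rate, hours per day, and time span come from the text; the average book length of 350 pages is an assumption introduced here (the abstract does not state one), chosen because it reproduces the figure of roughly 100,000 books.

```python
# Figures taken from the text.
PAGES_PER_MINUTE = 1
HOURS_PER_DAY = 16
DAYS_PER_YEAR = 365
YEARS = 100

# Assumption, not stated in the text: average book length in pages.
ASSUMED_PAGES_PER_BOOK = 350


def books_per_lifetime(pages_per_book: int = ASSUMED_PAGES_PER_BOOK) -> int:
    """Whole books read in one lifetime at the rate given in the text."""
    pages = PAGES_PER_MINUTE * 60 * HOURS_PER_DAY * DAYS_PER_YEAR * YEARS
    return pages // pages_per_book


def lifetimes_to_read(collection_size: int, books_in_lifetime: int) -> float:
    """Number of lifetimes needed to read an entire collection."""
    return collection_size / books_in_lifetime


if __name__ == "__main__":
    per_life = books_per_lifetime()
    print(per_life)  # close to the 100,000 quoted in the text
    # Using the rounded figure of 100,000 books per lifetime:
    print(lifetimes_to_read(40_000_000, 100_000))
```

A different assumed book length shifts the result proportionally, so the 400-lifetime figure is best read as an order-of-magnitude estimate.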

By far the largest consumers of large digital book collections are computer programs, not people. The primary function of these programs is indexing and retrieval, but APIs are sometimes offered to allow development of more sophisticated applications. Increasingly, these collections are also used as input to machine learning programs. This talk focuses on automated uses of digital libraries to enhance human efficiency and amplify the capabilities of information systems.