periodicals such as newspapers). This final database contains bibliographic information for each of these 129 million editions (Ref. $1). The country of publication is known for 85.3% of these editions, authors for 87.8%, publication dates for 92.6%, and the language for 91.6%. Of the 15 million books scanned, the country of publication is known for 91.5%, authors for 92.1%, publication dates for 95.1%, and the language for 98.6%.

1.2. Digitization

We describe the way books are scanned and digitized. For publisher-provided books, Google removes the spines and scans the pages with industrial sheet-fed scanners. For library-provided books, Google uses custom-built scanning stations designed to impose only as much wear on the book as would result from someone reading the book. As the pages are turned, stereo cameras overhead photograph each page, as shown in Figure $1.

One crucial difference between sheet-fed scanners and the stereo scanning process is the flatness of the page as the image is captured. In sheet-fed scanning, the page is kept flat, similar to conventional flatbed scanners. With stereo scanning, the book is cradled at an angle that minimizes stress on the spine of the book (this angle is not shown in Figure $1). Though less damaging to the book, a disadvantage of the latter approach is that it results in a page that is curved relative to the plane of the camera. The curvature changes every time a page is turned, for several reasons: the attachment point of the page in the spine differs, the two stacks of pages change in thickness, and the tension with which the book is held open may vary. Thicker books have more page curvature and more variation in curvature.

This curvature is measured by projecting a fixed infrared pattern onto each page of the book, subsequently captured by cameras. When the image is later processed, this pattern is used to identify the location of the spine and to determine the curvature of the page. Using this curvature information,