II. Construction of Historical N-grams Corpora

As noted in the paper text, we did not analyze the entire set of 15 million books digitized by Google. Instead, we:

1. Performed further filtering steps to select only a subset of books with highly accurate metadata.
2. Subdivided the books into ‘base corpora’ using metadata fields such as language, country of publication, and subject.
3. For each base corpus, constructed a massive numerical table that lists, for each n-gram (often a word or phrase), how often it appears in that base corpus in every single year between 1550 and 2008.

In this section, we describe these three steps. These additional steps ensure high data quality. They also make it possible to examine historical trends without violating the ‘fair use’ principle of copyright law: our objects of study are the frequency tables produced in step 3 (which are available as supplemental data), not the full text of the books.

II.1. Additional filtering of books

II.1A. Accuracy of Date-of-Publication metadata

Accurate date-of-publication data is a crucial component in the production of time-resolved n-grams data. Because our study focused most centrally on the English-language corpus, we applied more stringent inclusion criteria to make the date-of-publication data as accurate as possible.

We found that the lion's share of date-of-publication errors were due to so-called 'bound-withs': single volumes that contain multiple works, such as anthologies or the collected works of a given author. Among these bound-withs, the most inaccurately dated subclass was serial publications, such as journals and periodicals. For instance, many journals had publication dates that were erroneously set to the year in which the journal's first issue was published. These journals and serial publications also represented a different aspect of culture than books did. For these reasons, we decided to filter out all serial pub