II.3B. Generation of historical n-grams corpora

To generate a particular historical n-grams corpus, a subset of book editions is chosen to serve as the base corpus. The chosen editions are divided by publication year. For each publication year, total counts for each n-gram are obtained by summing the n-gram counts over all book editions published in that year. In particular, three counts are generated: (1) the total number of times the n-gram appears; (2) the number of pages on which the n-gram appears; and (3) the number of books in which the n-gram appears. We then generate tables showing all three counts for each n-gram, resolved by year.

To ensure that n-grams could not easily be used to identify individual text sources, we did not report counts for any n-gram that appeared fewer than 40 times in the corpus. (As a point of reference, the total number of 1-grams that appear in the 3.2 million books written in English with highest date accuracy ('eng-all', see below) is 360 billion: a 1-gram that appears fewer than 40 times occurs at a frequency of the order of 10⁻¹⁰.) As a result, rare spellings and OCR errors were also omitted. Since most n-grams are infrequent, this threshold also dramatically reduced the size of the n-gram tables. Of course, the most robust historical trends are associated with frequent n-grams, so our ability to discern these trends was not compromised by this approach.

By dividing the reported counts by the corpus size (measured in words, pages, or books), it is possible to determine the normalized frequency with which an n-gram appears in the base corpus. Note that the different counts can be used for different purposes. The usage frequency of an n-gram, normalized by the total number of words, reflects both the number of authors using the n-gram and how frequently they use it. It can be driven upward markedly by a single author who uses an n-gram very frequently, for instance in a biography of 'Gottlieb Daimler' whi