(“3.14159”) and typos (“excesss”). An n-gram is a sequence of 1-grams, such as the phrases “stock market” (a 2-gram) and “the United States of America” (a 5-gram). We restricted n to 5, and limited our study to n-grams occurring at least 40 times in the corpus.

Usage frequency is computed by dividing the number of instances of the n-gram in a given year by the total number of words in the corpus in that year. For instance, in 1861, the 1-gram “slavery” appeared in the corpus 21,460 times, on 11,687 pages of 1,208 books. The corpus contains 386,434,758 words from 1861; thus the frequency is 5.5×10⁻⁵. The frequency of “slavery” peaked during the Civil War (early 1860s) and then again during the civil rights movement (1955-1968) (Fig. 1B). In contrast, we compare the frequency of “the Great War” to the frequencies of “World War I” and “World War II.” The phrase “the Great War” peaks between 1915 and 1941. But although its frequency drops thereafter, interest in the underlying events had not disappeared; instead, they are referred to as “World

enormous growth: the addition of ~8500 words/year has increased the size of the language by over 70% during the last fifty years (Fig. 2A). Notably, we found more words than appear in any dictionary. For instance, the 2002 Webster’s Third New International Dictionary [W3], which keeps track of the contemporary American lexicon, lists approximately 348,000 single-word wordforms (10); the American Heritage Dictionary of the English Language, Fourth Edition (AHD4) lists 116,161 (11). (Both contain additional multi-word entries.) Part of this gap is because dictionaries often exclude proper nouns and compound words (“whalewatching”). Even accounting for these factors, we found many undocumented words, such as “aridification” (the process by which a geographic region becomes dry), “slenthem” (a musical instrument), and, appropriately, the word “deletable.” This gap between dictionaries and the lexicon results from a balance that every dictionary must strike: it mu