III.4C. Dictionary Coverage

To determine the coverage of the OED and Merriam-Webster's Unabridged Dictionary (MW), we performed the above analysis on randomly generated subsets of the lexicon in eight frequency deciles (ranging from $10^{-9}$–$10^{-8}$ to $10^{-2}$–$10^{-1}$). The samples contained 500 candidate words each for all but the top 3 deciles. The samples for the top 3 deciles ($10^{-4}$–$10^{-3}$, $10^{-3}$–$10^{-2}$, $10^{-2}$–$10^{-1}$) contained 100 candidate words each. A native speaker with no knowledge of the experiment determined which words from our random samples fell into the P, B, or R categories. To enable a fair comparison, we excluded the N category from our analysis, because both the OED and MW exclude such words. The annotator then attempted to find a definition for each word in either the online edition of the Merriam-Webster Unabridged Dictionary or the online version of the second edition of the Oxford English Dictionary. Notably, MW's performance was boosted appreciably by its inclusion of Merriam-Webster's Medical Dictionary. Results of this analysis are shown in the Appendix.

To estimate the fraction of dark matter in the English language, we applied the formula

$$\sum_{d} P_{\text{word}}(d)\, P_{\text{OED/MW}}(d)\, N_{\text{1gram}}(d),$$

summed over all deciles $d$, where:
- $N_{\text{1gram}}(d)$ is the number of 1grams in decile $d$;
- $P_{\text{word}}(d)$ is the proportion of words (R, B, or P) in decile $d$;
- $P_{\text{OED/MW}}(d)$ is the proportion of words in decile $d$ that are covered by the OED or MW.

This sum estimates the number of words listed in at least one of the two dictionaries. Dividing it by the estimated total number of words, $\sum_{d} P_{\text{word}}(d)\, N_{\text{1gram}}(d)$, gives the fraction that is covered. The complement of that fraction is the dark-matter fraction. We find that 52% of words are dark matter, meaning they are listed in neither MW nor the OED. Using the procedure above, we estimate the number of words, excluding proper nouns, at 572,000. This implies that roughly 297,000 words (52% of 572,000) are unlisted in even the most comprehensive commercial and historical dictionaries.

III.4D. Analysis of New and Obsolete Words in the American Heritage Dictionary

We obtained from the dictionary's editorial staff a list of the 4804 vocabulary items that were added to the AHD4 in 2000. These 4804 words were not in AHD3 (1992), although on rare occasions a word could have featured in earlier editions