HOUSE_OVERSIGHT_017022


a particular n-gram in year X, as shown in the plots, is the mean of the raw frequency values for that n-gram in year X-1, year X, and year X+1.

Note that for each n-gram in the corpus, we can provide three measures as a function of year of publication:

1- the number of times it appeared
2- the number of pages where it appeared
3- the number of books where it appeared

Throughout the paper, we make use only of the first measure, but the two others remain available. The three measures generally agree, but differences among them can reflect distinct cultural effects. These distinctions are not explored in this paper. For example, we give in the Appendix measures for the frequency of the word ‘evolution’. In the first three columns, we give the number of times it appeared, the normalized number of times it appeared (relative to the number of words that year), the normalized number of pages it appeared in, and the normalized number of books it appeared in, as a function of the date.

III.1B. Multiple Query/Cohort Timelines

Where indicated, timeline plots may reflect the aggregates of multiple query results, such as a cohort of individuals or inventions. In these cases, the raw data for each query was used to associate each year with a set of frequencies. The plot was generated by choosing a measure of central tendency (either mean or median) to characterize the set of frequencies and associating the resulting value with the corresponding year.

Such methods can be confounded by the vast frequency differences among the various constituent queries. For instance, the mean will tend to be dominated by the most frequent queries, which might be several orders of magnitude more frequent than the least frequent queries. If the absolute frequency of the various query results is not of interest, but only their relative change over time, then individual query results may be normalized so that the values for each query sum to 1. This results in a probability mass function for each query describing the likelihood that a rand
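
The procedures described in this section (three-year smoothing, aggregation of a cohort by mean or median, and per-query normalization to a total of 1) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the dictionary representation of a time series, the handling of the first and last years (averaging over whichever neighboring years exist), and the treatment of a missing year as frequency 0 are all assumptions made for illustration.

```python
from statistics import mean, median


def smooth_three_year(raw):
    """Map each year X to the mean of the raw values for X-1, X, and X+1.

    `raw` is a dict of year -> raw frequency. Assumption: at the edges of
    the series, only the years that are actually present are averaged.
    """
    smoothed = {}
    for year in raw:
        window = [raw[y] for y in (year - 1, year, year + 1) if y in raw]
        smoothed[year] = mean(window)
    return smoothed


def normalize(series):
    """Rescale one query's series so that its values sum to 1."""
    total = sum(series.values())
    if total == 0:
        # Assumption: an all-zero series is returned unchanged.
        return dict(series)
    return {year: value / total for year, value in series.items()}


def aggregate(cohort, how="mean"):
    """Combine several query series into one value per year.

    `cohort` is a list of year -> frequency dicts. Assumption: a query
    with no entry for a given year contributes 0 for that year.
    """
    stat = mean if how == "mean" else median
    years = sorted(set().union(*(s.keys() for s in cohort)))
    return {y: stat([s.get(y, 0.0) for s in cohort]) for y in years}


if __name__ == "__main__":
    # Hypothetical data: one frequent, rising query and one rare, falling query.
    frequent = {1900: 1000.0, 1901: 2000.0, 1902: 3000.0}
    rare = {1900: 3.0, 1901: 2.0, 1902: 1.0}

    # Without normalization the mean follows the frequent query almost exactly.
    print(aggregate([frequent, rare], how="mean"))

    # After normalization each query carries equal weight in the aggregate.
    print(aggregate([normalize(frequent), normalize(rare)], how="mean"))
```

In the hypothetical data, the unnormalized mean rises along with the frequent query, while the normalized aggregate gives the two opposing trends equal weight, which is the confound and the remedy the text describes.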

HOUSE_OVERSIGHT_017022 — Epstein Files

This document is part of the DOJ Epstein Files Transparency Act production (Public Law 119-38) — a corpus of 1,416,848 documents (2,915,593 pages) including prosecution files, FBI investigation records, court filings, and defense materials.
