HOUSE_OVERSIGHT_017020


The computations required to generate these corpora were performed at Google using the MapReduce framework for distributed computing (Ref S5). Many computers were used as these computations would take many years on a single ordinary computer.

Note that the ability to study the frequency of words or phrases in English over time was our primary focus in this study. As such, we went to significant lengths to ensure the quality of the general English corpora and their date metadata (i.e., Eng-all, Eng-1M, and Eng-Modern-1M). As a result, the accuracy of place-of-publication data in English is not as reliable as the accuracy of date metadata. In addition, the foreign language corpora are affected by issues that were improved and largely eliminated in the English data. For instance, their date metadata is not as accurate. In the case of Hebrew, the metadata for language is an oversimplification: a significant fraction of the earliest texts annotated as Hebrew are in fact hybrids formed from Hebrew and Aramaic, the latter written in Hebrew script.

The size of these base corpora is described in Tables S3-S6.

III. Culturomic Analyses

In this section we describe the computational techniques we use to analyze the historical n-grams corpora.

III.0. General Remarks

III.0.1 On Corpora.

There is significant variation in the quality of the various corpora during various time periods and their suitability for culturomic research. All the corpora are adequate for the uses to which they are put in the paper. In particular, the primary object of study in this paper is the English language from 1800-2000; this corpus during this period is therefore the most carefully curated of the datasets. However, to encourage further research, we are releasing all available datasets - far more data than was used in the paper. We therefore take a moment to describe the factors a culturomic researcher ought to consider before relying on results of new queries not highlighted in the paper.

1) Volume o
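The opening of this excerpt says the corpora were generated with MapReduce but gives no detail. As a rough illustration only, the sketch below shows how per-year n-gram counts could be computed in a map/reduce style on a single machine. It is not the authors' pipeline: the book records, the whitespace tokenizer, and the function names `map_book` and `reduce_counts` are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def map_book(book, n=1):
    """Map step (hypothetical): emit ((ngram, year), 1) per occurrence in one book."""
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    tokens = book["text"].lower().split()
    for gram in ngrams(tokens, n):
        yield (gram, book["year"]), 1


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the emitted values for each (ngram, year) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


# Toy input standing in for scanned books with date metadata (made-up data).
books = [
    {"year": 1900, "text": "the cat sat"},
    {"year": 1900, "text": "the dog sat"},
    {"year": 1950, "text": "the cat ran"},
]

# In a real distributed run, the map calls would execute on many machines and
# the framework would group the pairs by key before reducing.
counts = reduce_counts(chain.from_iterable(map_book(b) for b in books))
bigram_counts = reduce_counts(chain.from_iterable(map_book(b, n=2) for b in books))
```

Keying each count by (n-gram, year) is what makes frequency-over-time queries possible, and it is also why the accuracy of the date metadata matters so much for this kind of analysis.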
