home

epstein-data
Research ▼
🔍 SearchFull-text document search 🤖 Ask AIAI research assistant 🔎 Evidence MapFBI serial resolution 📷 Reverse Image SearchCLIP + face across 614K images 🧑 Find Face BETASearch 29K faces by photo 💻 Run Your OwnDownload & search locally
Explore ▼
📚 Full Text Corpus1.39M docs, 2.77M pages 🌎 Global Heatmap145 countries mentioned 📈 Coverage MapWhat's here 🌌 AtlasSemantic map · 1.29M docs ⚖ Cases53 federal & state cases · per-case briefings 🎤 DepositionsTranscribed audio & video 💬 Hear from the SurvivorsSurvivors in their own words 📖 Cover to Cover-Up24-hour public reading, synced to the video ✉ Wolff–Epstein Emails2,009 messages · 2009–2019
📷 Images92K analyzed photographs 🔍 Multi-DB SearchSearch all databases individually 🗃 All Databases14 searchable databases
Entities Reports
News ▼
📰 NewsCoverage & reporting ⚖ Justice MonitorArrests, charges, lawsuits, firings
Source ▼
🏛 DOJ ProductionOfficial EFTA disclosures 📜 EFTA Law TextPublic Law 119-38 📁 Source Data (GitHub)Open source databases
🌐 Community ResourcesCurated external projects ✉ ContactGeneral · privacy · DMCA · press
❤️ Donate 🎧 Podcast

Research

🔍 Search Documents 🤖 Ask AI 🔎 Evidence Map 📷 Reverse Image Search 🧑 Find Face BETA 💻 Run Your Own Investigator

Explore

📚 Full Text Corpus 🌎 Global Heatmap 📈 Coverage Map 🌌 Atlas ⚖ Cases 🎤 Depositions 💬 Hear from the Survivors 📖 Cover to Cover-Up ✉ Wolff–Epstein Emails 📷 Images 🔍 Multi-DB Search 🗃 All Databases

Entities

👥 Entity Directory

Reports

Browse All Reports 📰 News ⚖ Justice Monitor

Source

🏛 DOJ Production 📜 EFTA Law 📁 Source Data (GitHub) 🌐 Community Resources ✉ Contact
🎧 Podcast & Newsletter ❤️ Donate Privacy Policy

HOUSE_OVERSIGHT_017021

← Prev Next →
Loading document…

3) Quality of OCR. This varies from corpus to corpus as described above. For English, we spent a great deal of time examining the data by hand as an additional check on its reliability. The other corpora may not be as reliable. 4) Quality of Metadata. Again, the English language corpus was checked very carefully and systematically on multiple occasions, as described above and in the following sections. The metadata for the other corpora may not be equally reliable for all periods. In particular, the Hebrew corpus during the 19th century is composed largely of reprinted works, whose original publication dates farpredate the metadata date for the publication of the particular edition in question. This must be borne in mind for researchers intent on working with that corpus. In addition to these four general issues, we note that earlier portions of the Hebrew corpus contain a large quantity of Aramaic text written in Hebrew script. As these texts often oscillate back and forth between Hebrew and Aramaic, they are particularly hard to accurately classify. All the above issues will likely improve in the years to come. In the meanwhile, users must use extra caution in interpreting the results of culturomic analyses, especially those based on the various non- English corpora. Nevertheless, as illustrated in the main text, these corpora already contain a great treasury of useful material, and we have therefore made them available to the scientific community without delay. We have no doubt that they will enable many more fascinating discoveries. III.0.2 On the number of books published In the text, we report that our corpus contains about 4% of all books ever published. Obtaining this estimate relies on knowing how many books are in the corpus (5,195,769) and estimating the total number of books ever published. The latter quantity is extremely difficult to estimate, because the record of published books is fragmentary and incomplete, and because the definition of book is its

Suggest a category
Misclassified? Pick a better fit.
Community Notes
▸ People Mentioned
▸ Interest Level
Routine Notable Significant
▸ Dates Mentioned
▸ Related Topics
▸ Places & Organizations
▸ Transcription Correction
▸ Research Notes 0
No notes yet.
Related documents
Source Data Investigation Reports DOJ EFTA CC BY-NC-SA 4.0 Contact
Independent research project. Not affiliated with the U.S. Department of Justice, FBI, any government agency, or Anthropic. All analytical text on this site is AI-generated (Claude, Anthropic) and iteratively fact-checked against source documents, but may contain errors. Verify all claims against linked EFTA sources before citing.
Powered by Datasette  ·  ❤️ Buy me a coffee

You are leaving epstein-data.com

You are being redirected to an external website not operated by this project. We are not responsible for the content or privacy practices of external sites.

Powered by Datasette