The Impunity Index | Epstein Case NLP Analysis

Documents Analyzed | Individuals Tracked | ML Models Trained | Face No Consequences

Connection Network

Co-occurrence relationships between individuals across case documents. Node size reflects impunity index. Click any node for details.
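As a rough illustration of how co-occurrence edges like these could be derived, the sketch below counts how many documents each pair of people shares. The input shape (one list of mentioned names per document) and the function name `cooccurrence_edges` are assumptions made for the example, not the project's actual code.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(doc_mentions):
    """Count how many documents each unordered pair of people shares."""
    edges = Counter()
    for people in doc_mentions:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(set(people)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical input: one list of mentioned names per document.
docs = [
    ["Person A", "Person B", "Person C"],
    ["Person A", "Person B"],
    ["Person C"],
]
edges = cooccurrence_edges(docs)
```

Each key in `edges` would become a link in the network, with the count available as an edge weight.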

The Individuals

Sort

Geographic Distribution

Individuals mapped by country of origin. Node size reflects average impunity index. Click a country to filter individuals.

Semantic Space

t-SNE projection of 384-dimensional person embeddings built from 1.4M case documents. Each point is a person; proximity reflects similar document contexts. Color = impunity level.

Models & Evaluation

Three complementary ML approaches evaluated with multiple metrics and stress tests to understand model limitations and guide improvement.

ML Pipeline

01

Feature Extraction

7 tabular features per person: Epstein email/EFTA document count, DOJ corpus mentions, keyword co-occurrence with abuse-related terms, Epstein flight legs, person-to-person connections, black book inclusion, and a raw evidence index.
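One way to picture the seven features is as a fixed-order record that flattens into a numeric vector. The field names and the example values below are purely illustrative; only the seven quantities themselves come from the description above.

```python
from dataclasses import dataclass, astuple


@dataclass
class PersonFeatures:
    # Field names are hypothetical; the text only names the seven quantities.
    email_efta_doc_count: int
    doj_corpus_mentions: int
    keyword_cooccurrence: int
    flight_legs: int
    connections: int
    in_black_book: bool
    raw_evidence_index: float

    def to_vector(self):
        """Return the 7 features as floats, in field order."""
        return [float(v) for v in astuple(self)]


# Placeholder values, not real data.
example = PersonFeatures(12, 40, 3, 5, 17, True, 0.62)
vector = example.to_vector()
```

Keeping the field order fixed matters because downstream models see only positions, not names.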

02

Semantic Embeddings

all-MiniLM-L6-v2 (SentenceTransformer) encodes each person's full document evidence text into 384-dimensional vectors via sliding-window mean pooling, capturing contextual meaning across 1.4 million documents.
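Sliding-window mean pooling can be sketched as: split a long text into overlapping windows, embed each window, then average the window vectors element-wise. The example below uses a deterministic stand-in encoder (`toy_encode`) so it runs without downloading a model; it is not all-MiniLM-L6-v2, and the window size and stride are assumed values, not the project's settings.

```python
import hashlib

DIM = 384  # embedding size stated in the text


def toy_encode(text):
    """Stand-in encoder: hash-derived pseudo-embedding. NOT the real model."""
    vec = []
    counter = 0
    while len(vec) < DIM:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        vec.extend(b / 255.0 for b in digest)
        counter += 1
    return vec[:DIM]


def sliding_windows(tokens, size, stride):
    """Return overlapping token windows; the last window may be shorter."""
    windows = []
    start = 0
    while tokens:
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
        start += stride
    return windows


def embed_person(text, encode=toy_encode, size=200, stride=100):
    """Mean-pool window embeddings into a single vector for one person."""
    windows = sliding_windows(text.split(), size, stride)
    if not windows:
        return [0.0] * DIM
    vectors = [encode(" ".join(w)) for w in windows]
    n = len(vectors)
    # Element-wise mean across all window vectors.
    return [sum(column) / n for column in zip(*vectors)]


long_text = " ".join(str(i) for i in range(450))
person_vector = embed_person(long_text)
```

To use a real model, `encode` could be swapped for a function wrapping `SentenceTransformer("all-MiniLM-L6-v2").encode`, assuming the sentence-transformers package is installed.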

03

Classification

Two models are trained on labeled individuals. The first is a Logistic Regression on the 7 tabular features (class-balanced, with StandardScaler). The second, ST + SVC, concatenates the 384-dim embeddings with the tabular features into a 391-dim space, which is classified by a calibrated LinearSVC.
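The 391-dimensional input is simply the 384 embedding values followed by the 7 tabular values. The sketch below shows that concatenation together with column-wise z-scoring, which is the operation a standard scaler performs. Whether the tabular features are also scaled before concatenation is an assumption here; the text states scaling only for the Logistic Regression model.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score each column: (x - column mean) / column std (population std)."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # Constant columns have std 0; fall back to 1.0 to avoid division by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


def combine(embedding, tabular):
    """Concatenate an embedding with tabular features into one vector."""
    return list(embedding) + list(tabular)


# Placeholder inputs: a zero embedding and seven zero-valued features.
combined = combine([0.0] * 384, [0.0] * 7)
scaled = standardize_columns([[1.0, 10.0], [3.0, 10.0]])
```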

04

Consensus

Final probability is the mean of both models' calibrated scores. Inference runs over all 1,264 individuals, producing per-person consequence probabilities that surface patterns invisible to simple rule-based approaches.
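The consensus step described above is an unweighted average of two probabilities per person. A minimal sketch, with hypothetical argument names and placeholder scores:

```python
def consensus(p_logreg, p_svc):
    """Average two lists of calibrated probabilities, person by person."""
    if len(p_logreg) != len(p_svc):
        raise ValueError("both models must score the same individuals")
    return [(a + b) / 2 for a, b in zip(p_logreg, p_svc)]


# Placeholder scores for two people, not real model output.
final = consensus([0.25, 0.5], [0.75, 1.0])
```

Averaging only makes sense when both scores are on the same probability scale, which is why calibration of each model comes first.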

Feature Ablation Study

How much does each feature contribute? Bars show F1 score when each feature is removed. Dashed line = full model baseline.
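A leave-one-feature-out ablation can be sketched as a loop: score the full feature set, then drop one column at a time and score again. To stay self-contained, the example uses a nearest-centroid classifier as a stand-in for the project's models, along with toy data and an assumed train/test split. None of these come from the source.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def centroid(rows):
    return [sum(col) / len(col) for col in zip(*rows)]


def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in classifier: predict the class with the closer centroid."""
    c0 = centroid([x for x, y in zip(X_train, y_train) if y == 0])
    c1 = centroid([x for x, y in zip(X_train, y_train) if y == 1])

    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [1 if dist(x, c1) < dist(x, c0) else 0 for x in X_test]


def ablation(X_train, y_train, X_test, y_test, names):
    """Return F1 for the full model and with each feature removed."""
    def score(keep):
        train = [[row[i] for i in keep] for row in X_train]
        test = [[row[i] for i in keep] for row in X_test]
        return f1_score(y_test, nearest_centroid_predict(train, y_train, test))

    all_idx = list(range(len(names)))
    results = {"baseline": score(all_idx)}
    for i, name in enumerate(names):
        results[name] = score([j for j in all_idx if j != i])
    return results


# Toy data: "signal" separates the classes, "noise" does not.
X_train = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.0], [0.9, 1.0]]
y_train = [0, 0, 1, 1]
X_test = [[0.2, 3.0], [0.8, 3.0]]
y_test = [0, 1]
results = ablation(X_train, y_train, X_test, y_test, ["signal", "noise"])
```

In this toy setup, removing "signal" collapses F1 while removing "noise" leaves it unchanged, which is the kind of contrast the bars against the baseline are meant to show.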

The Gap

Does evidence of involvement predict real-world consequences?


Impunity Gap: Evidence vs. Consequences

Content Warning
