Project Charter – Epstein Files Network Analysis

1 Context and Scope

This project analyzes the publicly released Epstein Files – legal documents, emails, and records published by the U.S. Department of Justice and the U.S. House Oversight Committee. The dataset is sourced from HuggingFace (Nikity/Epstein-Files), which provides pre-extracted text with direct DOJ source URLs for provenance verification.

The project addresses the following research question:

Which network structure underlies the documented connections of Jeffrey Epstein, and which persons or institutions functioned as structural bridges between otherwise separate domains?

The visualization product will be published as a publicly accessible Streamlit web application and a Quarto-based data story on GitHub Pages.


2 Project Objectives and Success Criteria

The project enables users to explore the documented relationship network of key individuals in the Epstein Files. Specifically:

  • Identify which actors function as structural bridges between otherwise disconnected clusters (betweenness centrality / structural holes)
  • Explore the network interactively, filtered by domain, time period, and relationship type
  • Trace every visualized connection back to its source document
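
To make the bridge idea concrete, the following is a minimal sketch of betweenness centrality on a toy graph, using Brandes' algorithm in plain Python. The node labels (`a1`, `bridge`, `b1`, ...) and the graph itself are made up for illustration and are not taken from the dataset. In the actual pipeline, NetworkX's `betweenness_centrality` would likely be used instead of a hand-written function.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies back to front
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical toy network: two triangles joined only through "bridge".
edges = [
    ("a1", "a2"), ("a1", "a3"), ("a2", "a3"),
    ("a3", "bridge"), ("bridge", "b1"),
    ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = betweenness_centrality(adj)
top_bridge = max(scores, key=scores.get)  # "bridge", with a score of 9.0
```

All nine shortest paths between the two triangles pass through `bridge`, which is why it ranks highest. Its direct neighbours `a3` and `b1` follow with a score of 8.0 each.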

Success criteria:

  • Research question is answered visually and analytically
  • Dataset meets course requirements: 100+ observations, 6+ features, mix of numerical and categorical variables
  • Full pipeline is reproducible via GitHub repository and documented in Quarto
  • Visualization product is publicly deployed and accessible
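
The dataset criterion above can be checked automatically. The sketch below assumes the analysis table is available as a list of dictionaries. The column names (`source`, `target`, `domain`, `relationship_type`, `doc_count`, `year`) and all values are synthetic placeholders, not the project's actual schema.

```python
from numbers import Number


def check_course_requirements(rows, min_rows=100, min_features=6):
    """Check row count, feature count, and the numerical/categorical mix."""
    features = sorted({key for row in rows for key in row})
    numeric, categorical = [], []
    for name in features:
        values = [row[name] for row in rows if row.get(name) is not None]
        is_numeric = bool(values) and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        (numeric if is_numeric else categorical).append(name)
    return {
        "enough_rows": len(rows) >= min_rows,
        "enough_features": len(features) >= min_features,
        "has_numeric": bool(numeric),
        "has_categorical": bool(categorical),
    }


# Synthetic placeholder rows; none of these values come from the dataset.
rows = [
    {
        "source": f"entity_{i % 15}",
        "target": f"entity_{(i * 7 + 1) % 15}",
        "domain": ["finance", "academia", "politics"][i % 3],
        "relationship_type": "co-occurrence",
        "doc_count": 1 + i % 5,
        "year": 2000 + i % 20,
    }
    for i in range(120)
]

report = check_course_requirements(rows)
requirements_met = all(report.values())
```

A check of this kind could run as part of the reproducible pipeline, so that changing the document subset does not silently break the course requirements.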

Out of scope:

  • Full ingestion of the complete DOJ corpus – a defined, documented subset is used
  • LLM/RAG-based question answering
  • Speculation or unverified claims about individuals

3 Stakeholder Analysis

Stakeholder | Role | Goals | Relationship
Project team – Group 7 | Developers | Successfully complete the project, learn new methods, achieve a good grade | Internal – responsible for all deliverables
Course instructors | Evaluators | Assess technical quality, reproducibility, and visualization design | External – define requirements, provide feedback, grade the project
General public / researchers | End users | Explore documented connections in a transparent, data-driven way | External – secondary audience for the public platform
Journalists / investigators | End users | Cross-reference documented relationships with source material | External – potential future users of the platform

4 User Analysis

Personas


5 Situation Assessment

Available resources:

  • Dataset: Nikity/Epstein-Files on HuggingFace (4.1M rows, Parquet, pre-extracted text, DOJ source URLs)
  • Personnel: 2 team members, approx. 2–3 hours/week each over 8–10 weeks (roughly 16–30 hours per person, or 32–60 hours for the team)
  • Tools: Python, NetworkX, spaCy, Streamlit, Quarto, GitHub

Constraints and risks:

  • Limited time requires a strict scope: no large-scale data infrastructure will be built
  • Entity resolution (name disambiguation) is the main analytical risk; mitigated by alias mapping and manual validation of top entities
  • Dataset may not cover 100% of released documents; coverage will be documented as a limitation
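
The alias-mapping mitigation could look roughly like the sketch below: names are normalized first, then mapped to a canonical form through a hand-maintained dictionary. The names (`Jane Q. Example`, `John Sample`) and the `ALIASES` table are hypothetical placeholders. The real mapping would be built and validated manually for the most frequent entities.

```python
import re
import unicodedata
from collections import Counter


def normalize(name):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def resolve(name, aliases):
    """Map a raw mention to its canonical name, or keep the normalized form."""
    key = normalize(name)
    return aliases.get(key, key)


# Hypothetical, hand-maintained alias table (normalized variant -> canonical).
ALIASES = {
    "jane q example": "jane example",
    "j example": "jane example",
    "example jane": "jane example",
}

mentions = ["Jane Q. Example", "J. Example", "EXAMPLE, Jane", "John Sample"]
counts = Counter(resolve(m, ALIASES) for m in mentions)

# The most frequent canonical entities are the ones to validate by hand.
to_review = counts.most_common(1)  # [("jane example", 3)]
```

Unmapped names fall through unchanged (apart from normalization), which keeps the table small and makes missing aliases visible in the frequency counts.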

6 Visualization Concept

The product combines two components:

  • Interactive network dashboard (Streamlit): force-directed graph with entities as nodes and documented co-occurrences as edges. Node size encodes betweenness centrality. The graph can be filtered by domain, time period, and hop distance.
  • Guided data story (Quarto/GitHub Pages): narrative walkthrough of key findings with annotated charts and methodology documentation.
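
As a rough illustration of the dashboard's data model, the sketch below builds co-occurrence edges that keep their source URLs, then applies a hop-distance filter around a selected node. The entity names and `example.org` URLs are placeholders, and the input shape (a mapping from source URL to the set of entities found in that document) is an assumption about how the extraction step would deliver its results.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence_edges(doc_entities):
    """Return {(a, b): [source_url, ...]} for every entity pair sharing a document."""
    edges = defaultdict(list)
    for url, entities in doc_entities.items():
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)].append(url)
    return dict(edges)


def within_hops(edges, start, max_hops):
    """Breadth-first search: map each node within max_hops of start to its distance."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Placeholder documents; URLs and entity names are purely illustrative.
docs = {
    "https://example.org/doc-1": {"Person A", "Person B"},
    "https://example.org/doc-2": {"Person B", "Org X"},
    "https://example.org/doc-3": {"Person A", "Person B"},
    "https://example.org/doc-4": {"Org X", "Person C"},
}

edges = build_cooccurrence_edges(docs)
one_hop = within_hops(edges, "Person A", 1)
two_hops = within_hops(edges, "Person A", 2)
```

Because each edge carries its list of source URLs, the dashboard can show the supporting documents when a user selects a connection, and the list length can serve as an edge weight. In NetworkX, `ego_graph` with a `radius` argument offers a comparable hop-distance filter.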

[Detailed design decisions and value mapping (cognitive, communicative, experiential) – to be completed by team]


7 Project Plan

gantt
    title Project Plan – Epstein Files Network Analysis
    dateFormat YYYY-MM-DD
    tickInterval 7day

    section Setup & Data
        Repository & Quarto setup             :a1, 2026-02-17, 7d
        Dataset loading & EDA                 :a2, 2026-02-24, 14d

    section Analysis Pipeline
        Entity extraction tests               :a3, 2026-03-03, 14d
        Data model & feature engineering      :a4, 2026-03-17, 7d
        Entity resolution & alias mapping     :a5, 2026-03-31, 14d
        Network construction (NetworkX)       :a6, 2026-03-31, 14d
        Network metrics (betweenness, structural holes) :a7, 2026-04-07, 14d

    section Visualization
        Streamlit prototype                   :a8, 2026-03-10, 14d
        Dashboard refinement                  :a9, 2026-04-14, 14d
        Data story drafting                   :a10, 2026-04-14, 14d
        Data story finalized                  :a11, 2026-04-28, 7d
        Deployment (Streamlit / GitHub Pages) :a12, 2026-04-28, 7d

    section Coaching & Feedback
        Concept coaching                      :milestone, m1, 2026-03-24, 1d
        On-site coaching                      :milestone, m2, 2026-04-21, 1d
        Online coaching                       :milestone, m3, 2026-04-28, 1d
        On-site coaching                      :milestone, m4, 2026-05-05, 1d

    section Finalization
        Final testing & documentation         :a13, 2026-05-05, 14d
        Presentation slides & rehearsal       :a14, 2026-05-19, 7d
        Final presentation                    :milestone, m5, 2026-05-26, 1d
Figure 1: Preliminary project plan in the form of a Gantt chart.

8 Roles and Contact Details

Role | Name | Contact
Student | Sendogan Kulakci | kulaksen@students.zhaw.ch
Student | Lewis Birrer | birrelew@students.zhaw.ch
Lecturer | Dr. Manuel Dömer | doem@zhaw.ch