Is Simulated Evidence Still Evidence?
This project investigates how high-fidelity synthetic data challenges our concept of evidence across scientific, legal, and historical domains. As generative AI technologies increasingly produce data that mimic empirical records, foundational questions arise: What qualifies synthetic content as legitimate evidence? Can current domain-specific standards adapt through relativistic interpretation, or does synthetic data require a deeper conceptual renovation of evidence itself? Through conceptual analysis, exploratory interviews with scientists, and examination of the EU AI Act, the General Data Protection Regulation (GDPR), and the Digital Services Act (DSA), this project aims to surface philosophical distinctions that can inform governance design and preserve epistemic trust.

This research project examines a brewing philosophical crisis: the ability of artificial intelligence (AI) to generate high-fidelity synthetic data is challenging our basic understanding of what counts as “evidence” across domains, from legal proceedings to scientific research to historical documentation. Synthetic data is artificial data generated to reproduce the characteristics of original data (EDPS 2021). Evidence is popularly thought of as a fragment of objective reality, something tangible like blood at a crime scene or visible like a photograph. Within forensics and science, however, evidence functions as a symbolic representation that allows human minds to grasp complex realities: an astronomical scan represents the universe; economic figures represent market activity. Yet when data can be easily fabricated, how do we determine what legitimately functions as evidence? What conditions allow synthetic data to be trusted, and when does it threaten the foundations of knowledge?
While synthetic data has legitimate uses, such as augmenting datasets and protecting privacy (Jordan et al. 2024), AI has transformed both the scale and accessibility of its generation. Courts now confront the possibility of deepfaked evidence entering legal proceedings (Dixon 2024), scientists work with synthetic datasets whose provenance is increasingly uncertain (Ng 2025), and historians and journalists face questions about the authenticity of sources and documentation (Gibson 2021). With the EU’s AI Act requiring the labeling of synthetic content yet failing to adequately address the “black box” problem of synthetic evidence (Zambelli 2025), philosophical foundations are urgently needed.
My aim is to lay the philosophical groundwork for policy by determining whether the challenge posed by synthetic data can be met by developing a relativistic understanding of evidence, or whether a deeper conceptual renovation is required. On a relativistic understanding, something qualifies as evidence only relative to domain-specific conditions, so a synthetic dataset might count as evidence in scientific modeling but not in a courtroom; on a renovated understanding, the meaning-making function of evidence itself shifts. Answering this question will prove crucial for developing governance frameworks that can adapt to AI’s evolving capabilities.