Test material for the detectors your defense relies on.
A licensed dataset of labeled real and AI-generated face images, delivered by API, for teams that build or evaluate deepfake and face-detection systems. Use it to measure where a detector fails, broken down by demographic group and by platform compression.
Labeled real and synthetic faces, ready to score a detector against.
A curated collection of face images, each labeled as a real (bona-fide) face or an AI-generated face. You pull the data through an API and score your own detector or evaluation pipeline against it.
"Attack-data" means test material for detectors, not malicious content.
Because every item is labeled and grouped by demographic and by platform condition, the result is not a single accuracy number. It is a map of where a detector fails, by group and by condition.
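As a rough illustration of what such a map looks like, the sketch below groups detector results by (demographic, condition) cell and reports accuracy for each one. The record layout, the group names, and the condition names are placeholders made up for this example; they are not the dataset's actual schema.

```python
from collections import defaultdict

# Hypothetical records, one per scored image:
# (true label, demographic group, platform condition, detector said "synthetic")
records = [
    ("synthetic", "group_a", "clean", True),
    ("real", "group_a", "clean", False),
    ("synthetic", "group_a", "instagram", True),
    ("real", "group_a", "instagram", False),
    ("synthetic", "group_b", "clean", True),
    ("real", "group_b", "clean", False),
    ("synthetic", "group_b", "instagram", False),  # a missed synthetic face
    ("real", "group_b", "instagram", False),
]


def accuracy_by_cell(rows):
    """Return {(demographic, condition): accuracy} for the given rows."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for label, demographic, condition, predicted_synthetic in rows:
        cell = (demographic, condition)
        total[cell] += 1
        # A prediction is correct when it agrees with the true label.
        if predicted_synthetic == (label == "synthetic"):
            correct[cell] += 1
    return {cell: correct[cell] / total[cell] for cell in total}


cells = accuracy_by_cell(records)
worst_cell = min(cells, key=cells.get)

for cell, acc in sorted(cells.items()):
    print(cell, f"{acc:.2f}")
print("weakest cell:", worst_cell)
```

Keeping the cell as a tuple key makes it easy to add further dimensions later without changing the scoring loop.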
A sample of the synthetic faces (labels shown: Synthetic / Brown / Female and Synthetic / Light / Male).
Teams that build or evaluate face and deepfake detection.
One dataset, read three ways depending on what you are defending.
Find where your detector fails before a customer does.
Score your model against labeled real and synthetic faces, and track regressions across releases so a shipped model never quietly gets worse on a group that matters.
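One way to track regressions across releases is to compare per-cell accuracy between two model versions and flag any cell that dropped by more than a chosen tolerance. The sketch below assumes you already have a per-cell accuracy dictionary for each release; the cell names, the numbers, and the 0.01 tolerance are illustrative assumptions only.

```python
def find_regressions(previous, current, tolerance=0.01):
    """Return {cell: drop} for cells whose accuracy fell by more than `tolerance`."""
    regressions = {}
    for cell, old_acc in previous.items():
        new_acc = current.get(cell)
        if new_acc is None:
            continue  # cell was not scored in the newer release; review separately
        drop = old_acc - new_acc
        if drop > tolerance:
            regressions[cell] = round(drop, 4)
    return regressions


# Made-up per-cell accuracies for two releases of the same detector.
release_1 = {("group_a", "clean"): 0.98, ("group_b", "tiktok"): 0.91}
release_2 = {("group_a", "clean"): 0.99, ("group_b", "tiktok"): 0.84}

flagged = find_regressions(release_1, release_2)
print(flagged)
```

Here the newer release improves on one cell and gets worse on the other, which is the kind of change a check like this is meant to catch before shipping.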
What you get
A map of where a detector fails, not a single score.
Per-demographic breakdown
Results per demographic cell, not one pooled average that hides where a detector fails by group.
Platform conditions
Clean imagery plus platform re-encode variants for Facebook, Instagram, TikTok, and X.
Real and fake, paired
Every synthetic face traces back to a real source image, for like-for-like scoring.
Commercially licensed
Cleared for building and evaluating detection systems under the Margen data license.
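Because each synthetic face traces back to a real source image, one possible like-for-like check is pairwise: for each pair, does the detector give the synthetic face a higher "synthetic" score than its real source? The sketch below is only one reading of paired scoring. The field names (`pair_id`, `real_score`, `synthetic_score`) and the scores are hypothetical.

```python
# Hypothetical paired results: the detector's "synthetic" score for a real
# source image and for the synthetic face derived from it.
pairs = [
    {"pair_id": "p1", "real_score": 0.10, "synthetic_score": 0.95},
    {"pair_id": "p2", "real_score": 0.30, "synthetic_score": 0.70},
    {"pair_id": "p3", "real_score": 0.60, "synthetic_score": 0.40},  # ranked the wrong way
    {"pair_id": "p4", "real_score": 0.05, "synthetic_score": 0.80},
]


def pairwise_ranking_accuracy(rows):
    """Fraction of pairs where the synthetic face scores above its real source."""
    wins = sum(1 for r in rows if r["synthetic_score"] > r["real_score"])
    return wins / len(rows)


def misranked(rows):
    """Pair ids where the real source scored at least as 'synthetic' as the fake."""
    return [r["pair_id"] for r in rows if r["synthetic_score"] <= r["real_score"]]


ranking_accuracy = pairwise_ranking_accuracy(pairs)
bad_pairs = misranked(pairs)
print(ranking_accuracy, bad_pairs)
```

A pairwise check like this does not depend on a decision threshold, so it can complement threshold-based accuracy rather than replace it.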
A single accuracy number hides the failures that matter.
A detector can look strong on a clean, balanced test set and still fail on a specific demographic group, or fall apart once an image has been through a social platform's re-encoding.
This dataset is built to surface those gaps. It reports results per demographic cell and per platform condition, so you can see exactly where a detector fails rather than trusting an average that papers over it.
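A small worked example shows how an average can hide a failing cell. The counts below are invented for illustration and are not results from the dataset: a large cell where the detector does well outweighs a small cell where it does poorly.

```python
# Invented counts per cell: (items scored, items classified correctly)
cell_counts = {
    ("group_a", "clean"): (900, 891),
    ("group_b", "tiktok"): (100, 60),
}

total_items = sum(n for n, _ in cell_counts.values())
total_correct = sum(c for _, c in cell_counts.values())

# Pooled accuracy weights every item equally, so the large cell dominates.
pooled = total_correct / total_items

# Per-cell accuracy keeps the small, failing cell visible.
per_cell = {cell: c / n for cell, (n, c) in cell_counts.items()}

print(f"pooled: {pooled:.3f}")
for cell, acc in per_cell.items():
    print(cell, f"{acc:.2f}")
```

With these counts the pooled figure is 951 correct out of 1000, while the smaller cell sits at 60 correct out of 100.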
Licensed for commercial use, including building detection systems.
All imagery is obtained and licensed for commercial use, including building and evaluating detection systems. Real faces are licensed from commercial stock-media providers; synthetic faces are generated in-house. Each delivered item is licensed for your evaluation use under the Margen data license.
Questions about licensing terms or fit for your use case? Contact us and we will walk you through it.
Evaluate a fixed sample before you buy credits.
A free test tier returns a fixed sample of the dataset, so your team can run it through your own pipeline and confirm it fits your needs before committing.
When you are ready, credits unlock the full catalog through the same API.