Detectors collapse from near-perfect to near-random

Overview

We evaluated leading open-source deepfake detectors the way a buyer should: on attacks held out of training, under the conditions production imposes, and stratified so no group hides behind a strong average. The two detectors that scored a perfect clean number fell the furthest; six others lost less ground but still slipped. The pattern is the same across the field: clean-benchmark accuracy does not survive the test getting harder.

Detectors that score near-perfect on clean benchmarks fall to near-random once attacks are unseen and media is reprocessed the way platforms reprocess it.

[Figure: bar chart of detector ROC-AUC on SDXL and InstantID, colored by training-corpus family, with per-cell AUC range whiskers across 12 demographic cells.]
Fig. 1. Overall AUC by detector, with the black whisker showing the per-group range. The spread inside a single bar is the fairness story a pooled number hides.

The clean leaderboard

On a clean benchmark, the field looks healthy. A handful of detectors post near-perfect scores and a tidy ranking emerges. This is the table a buyer usually sees, and the number a vendor usually quotes.

Detector	Training family	Clean AUC	Clean-board tier
DMimageDetection	Diffusion-trained	1.0000	Top of clean board
Fusion	Diffusion-trained	1.0000	Top of clean board
SigLIP2	Mixed community	0.8373	Mid clean board
Smogy	Mixed community	0.7519	Mid clean board
Xception	Face-swap	0.7065	Mid clean board
F3Net	Face-swap	0.5996	Near random
UCF	Face-swap	0.4992	Near random
clipdet_latent10k	Diffusion-trained	0.3005	Near random
SBI (FF c23)	Face-swap	0.2731	Near random

Clean-benchmark ROC-AUC on leading open-source detectors. The rest of this piece is about what happens to these numbers once the test stops being clean.
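
For readers less familiar with the metric, ROC-AUC is the probability that a randomly chosen fake receives a higher score than a randomly chosen real, with ties counted as half. The sketch below is a minimal, dependency-free illustration of that definition. The function name `roc_auc`, the label convention (1 = fake, 0 = real), and the toy scores are assumptions made for the example, not part of the benchmark code.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) ROC-AUC. Assumes label 1 = fake, 0 = real."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # fake ranked above real: correct ordering
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy scores (illustrative only): three fakes followed by three reals
labels = [1, 1, 1, 0, 0, 0]
perfect = roc_auc(labels, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])   # every fake above every real
chance = roc_auc(labels, [0.5] * 6)                          # detector cannot separate classes
inverted = roc_auc(labels, [0.1, 0.2, 0.3, 0.7, 0.8, 0.9])  # every real above every fake
print(perfect, chance, inverted)
```

An AUC of 0.5 means the ranking is no better than chance, and a value below 0.5 means the detector ranks reals above fakes more often than the reverse.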

The format confound

A large share of reported detector skill comes from reading compression signatures rather than synthesis artifacts: the detector learns how a file was encoded, not whether its content was generated. When real and fake media are forced through one encoding pipeline so that both classes share a format, the apparent accuracy collapses.

[Figure: slope chart showing detector skill collapsing once real and fake media share one encoding pipeline.]
Fig. 2. Format parity. Apparent skill that came from compression signature disappears once both classes pass through the same pipeline.

The control is a few lines. Score the detector as-is, then re-encode both classes through one pipeline and score again. The gap is the share of accuracy that was reading format rather than synthesis.

format_parity.py (Python)

    from sklearn.metrics import roc_auc_score

    # Assumed to be defined by the caller: y_true (ground-truth labels),
    # images, a detector exposing score(...), and a reencode(...) helper.

    # Score without parity: real and fake arrive in different formats
    auc_raw = roc_auc_score(y_true, detector.score(images))

    # Re-encode every image through ONE pipeline, then score again
    parity = [reencode(img, codec="h264", quality=70) for img in images]
    auc_parity = roc_auc_score(y_true, detector.score(parity))

    # The gap is the "skill" that was reading format, not artifacts
    print(round(auc_raw - auc_parity, 3))

Per-group failure

Pooled accuracy hides subgroup failure. We break every result out across skin-tone and gender groups and report the maximum disparity, not the average, so a failing group cannot be averaged away.

[Figure: heatmap of detector accuracy across skin-tone and gender groups.]
Fig. 3. Per-group true-positive rate by skin tone and gender. Darker cells indicate lower detector confidence.
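
The per-group breakdown described above can be sketched in a few lines. This is a minimal illustration, not the evaluation code behind the figure: the function names, the 0.5 decision threshold, the group labels "A" and "B", and the toy scores are all assumptions made for the example.

```python
from collections import defaultdict


def per_group_tpr(labels, scores, groups, threshold=0.5):
    """True-positive rate per group: share of fakes (label 1) scored >= threshold."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for y, s, g in zip(labels, scores, groups):
        if y == 1:
            totals[g] += 1
            if s >= threshold:
                hits[g] += 1
    return {g: hits[g] / totals[g] for g in totals}


def max_disparity(rates):
    """Gap between the best-served and worst-served group."""
    return max(rates.values()) - min(rates.values())


# Toy data (illustrative only): eight fakes, four in each hypothetical group
labels = [1, 1, 1, 1, 1, 1, 1, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.9, 0.4, 0.3, 0.2]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = per_group_tpr(labels, scores, groups)
pooled_tpr = sum(s >= 0.5 for s in scores) / len(scores)
gap = max_disparity(rates)
print(rates, pooled_tpr, gap)
```

In this toy case the pooled true-positive rate is 0.625, while group B sits at 0.25 and the maximum disparity is 0.75. Reporting the maximum rather than the mean keeps the weaker group visible.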

Platform degradation

Real-world re-encoding, resizing, and recompression move numbers that clean-lab benchmarks never test. A detector validated only on pristine images is not validated for production.

[Figure: chart of detector performance dropping under platform re-encoding and recompression.]
Fig. 4. Platform degradation. Performance falls under the reprocessing a real platform applies.
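
One way to structure such a test is to score the same labeled set under each reprocessing pipeline and compare the resulting AUCs. The sketch below is a dependency-free toy under stated assumptions: `degradation_report`, `toy_score`, and `quantize` are hypothetical names, the "images" are short lists of integers, coarse quantization stands in for lossy recompression, and the toy detector simply measures sample-to-sample variation. A real harness would substitute actual re-encoding and resizing steps and a real detector.

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC with ties counted as half. Assumes label 1 = fake."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def toy_score(signal):
    """Toy detector: mean absolute difference between adjacent samples."""
    diffs = [abs(a - b) for a, b in zip(signal, signal[1:])]
    return sum(diffs) / len(diffs)


def identity(signal):
    """Clean condition: no reprocessing."""
    return list(signal)


def quantize(signal, step=8):
    """Stand-in for lossy recompression: snap each sample down to a coarse grid."""
    return [(v // step) * step for v in signal]


def degradation_report(score_fn, items, labels, pipelines):
    """AUC of score_fn on the same items under each named pipeline."""
    return {
        name: roc_auc(labels, [score_fn(transform(x)) for x in items])
        for name, transform in pipelines.items()
    }


# Toy data (illustrative only): fakes carry a fine alternating pattern
reals = [[100, 100, 101, 101], [120, 120, 120, 121]]
fakes = [[100, 103, 100, 103], [120, 123, 120, 123]]
items = reals + fakes
labels = [0, 0, 1, 1]

report = degradation_report(
    toy_score, items, labels, {"clean": identity, "quantized": quantize}
)
print(report)
```

In the toy data the fine pattern that separates fakes from reals is erased by quantization, so the AUC moves from 1.0 on the clean set to 0.5 after reprocessing.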

What this means for buyers

A published accuracy near 1.0 is not evidence a detector will hold against fraud. Before you rely on one, it should be tested on unseen generators, under platform-realistic conditions, and broken out by group. That is what a Margen evaluation produces.

Cite this paper

This benchmark is published openly with a permanent DOI. Cite the immutable record, not this page.

Babalola, D. (2026). Deepfake Detector Robustness Under Social-Media Re-encoding. Zenodo. https://doi.org/10.5281/zenodo.20781389

