Open benchmark

Which AI-image detectors actually work?

A neutral, reproducible measurement of off-the-shelf synthetic-image detectors on a balanced synthetic-face corpus (1,200 real + 1,200 generated, 12 demographic cells). We report tie-correct AUC, robustness under platform re-encoding, and demographic fairness. No vendor scores, no marketing, just the same numbers our evaluation engine serves.

On the clean corpus the strongest detector, Deepfake-Detect-Siglip2, reaches an AUC of only 0.717. The weakest, Corvi2023, scores 0.243, which is below chance (0.5): its scores are systematically inverted on this distribution.

Leaderboard, clean corpus

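Overall tie-correct AUC, plus each detector's error rates at its own operating point: missed fakes (false-negative rate, FNR: generated images not flagged) and false alarms (false-positive rate, FPR: real images falsely flagged). Measured on subset 2026-05-31_perturbation_full_v1.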

#	Detector	AUC	Missed fakes	False alarms	N
1	Deepfake-Detect-Siglip2 (prithivMLmods)	0.717	85.9%	3.0%	1200/1200
2	xception (SCLBD)	0.711	48.5%	26.3%	1200/1200
3	SMOGY-Ai-images-detector (Smogy)	0.700	0.3%	92.1%	1200/1200
4	Deep-Fake-Detector-v2-Model (prithivMLmods)	0.661	47.9%	31.3%	1200/1200
5	f3net (SCLBD)	0.572	58.1%	34.2%	1200/1200
6	efficientnetb4 (SCLBD)	0.537	51.0%	44.9%	1200/1200
7	fusion (grip-unina)	0.340	20.9%	92.3%	1200/1200
8	Corvi2023 (grip-unina)	0.243	100.0%	0.8%	1200/1200
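
The "Missed fakes" and "False alarms" columns depend on where each detector draws its decision threshold, which is why two detectors with similar AUC (Deepfake-Detect-Siglip2 and SMOGY-Ai-images-detector) can sit at opposite extremes. The following is a minimal sketch of how such rates could be computed. The function name, the toy scores, the thresholds, and the convention that a higher score means "more likely generated" are all assumptions for illustration, not the benchmark's actual code.

```python
def error_rates(real_scores, fake_scores, threshold):
    """Return (fnr, fpr) when an image is flagged as generated if score >= threshold.

    Assumes higher score = more likely generated (hypothetical convention).
    """
    missed = sum(1 for s in fake_scores if s < threshold)        # fakes not flagged
    false_alarms = sum(1 for s in real_scores if s >= threshold)  # reals flagged
    return missed / len(fake_scores), false_alarms / len(real_scores)


# Toy scores, made up for illustration only.
real = [0.2, 0.5, 0.6, 0.8]
fake = [0.4, 0.7, 0.9, 0.95]

# The same scores give very different error trade-offs at different thresholds.
for t in (0.3, 0.65, 0.9):
    fnr, fpr = error_rates(real, fake, t)
    print(f"threshold={t:.2f}  missed fakes={fnr:.0%}  false alarms={fpr:.0%}")
```

A low threshold flags almost everything (few missed fakes, many false alarms), and a high one does the reverse, while the ranking of the scores, and therefore the AUC, is unchanged.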

AUC below 0.5 means the detector's score is anti-correlated with truth on this corpus (it tends to call generated faces more “real” than the reals).
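
To make the "anti-correlated" reading concrete, here is a minimal sketch of an AUC that handles ties, assuming "tie-correct" means a tied real/generated pair counts as half a win (the Mann-Whitney formulation). The function name and the toy scores are hypothetical; the benchmark's own implementation may differ.

```python
from bisect import bisect_left, bisect_right


def tie_corrected_auc(real_scores, fake_scores):
    """Probability that a random generated image outscores a random real one,
    with ties counted as half a win (Mann-Whitney U / (n_real * n_fake))."""
    if not real_scores or not fake_scores:
        raise ValueError("need at least one real and one generated score")
    real_sorted = sorted(real_scores)
    wins = 0.0
    for s in fake_scores:
        below = bisect_left(real_sorted, s)          # reals scoring strictly lower
        tied = bisect_right(real_sorted, s) - below  # reals with exactly this score
        wins += below + 0.5 * tied
    return wins / (len(real_scores) * len(fake_scores))


# Toy scores, made up for illustration; higher means "more likely generated".
real = [0.1, 0.4, 0.4, 0.7]
fake = [0.4, 0.6, 0.9, 0.9]

auc = tie_corrected_auc(real, fake)
# Negating every score reverses the ranking, so the AUC becomes 1 - AUC.
flipped = tie_corrected_auc([-s for s in real], [-s for s in fake])
print(auc, flipped)
```

Under this definition, negating a detector's scores turns an AUC of a into 1 - a, so a below-chance score such as 0.243 would correspond to 0.757 if the output were inverted on the same corpus.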

Robustness under platform processing

Per-condition AUC. Each column is a perturbation applied to the corpus before scoring: JPEG re-encoding, resizing, blur, noise, and the Instagram, Facebook, TikTok, and X upload pipelines.

Detector	Clean	Clean (native encoding)	Blur 1px	Blur 2px	Blur 4px	Facebook pipeline	Instagram pipeline	JPEG q30	JPEG q50	JPEG q70	JPEG q80	JPEG q95	Noise sigma 10	Noise sigma 5	Resize 0.5x	Resize 0.75x	TikTok pipeline	X pipeline
Deepfake-Detect-Siglip2	0.717	0.836	0.534	0.281	0.257	0.709	0.725	0.394	0.430	0.491	0.543	0.679	0.711	0.711	0.602	0.681	0.733	0.710
xception	0.711	0.714	0.689	0.538	0.538	0.739	0.736	0.489	0.518	0.554	0.587	0.693	0.622	0.659	0.696	0.668	0.776	0.739
SMOGY-Ai-images-detector	0.700	0.750	0.700	0.721	0.709	0.687	0.678	0.672	0.652	0.646	0.652	0.711	0.609	0.632	0.705	0.712	0.677	0.686
Deep-Fake-Detector-v2-Model	0.661	0.723	0.676	0.565	0.499	0.571	0.570	0.650	0.642	0.642	0.642	0.657	0.652	0.656	0.654	0.665	0.585	0.571
f3net	0.572	0.603	0.598	0.480	0.407	0.595	0.595	0.451	0.451	0.461	0.477	0.549	0.506	0.527	0.594	0.542	0.622	0.596
efficientnetb4	0.537	0.551	0.519	0.475	0.457	0.471	0.467	0.453	0.461	0.475	0.485	0.533	0.474	0.495	0.516	0.522	0.487	0.471
fusion	0.340	1.000	0.374	0.401	0.417	0.318	0.299	0.322	0.330	0.240	0.228	0.393	0.405	0.434	0.456	0.353	0.306	0.318
Corvi2023	0.243	1.000	0.405	0.436	0.426	0.709	0.700	0.409	0.379	0.356	0.330	0.272	0.327	0.351	0.546	0.372	0.705	0.709
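
One way to read a row of this table is as a change relative to the "Clean" column. The sketch below does that for the Deepfake-Detect-Siglip2 row, using the values exactly as listed above. The function name and the choice of "Clean" as the baseline are assumptions for illustration.

```python
CONDITIONS = [
    "Clean", "Clean (native encoding)", "Blur 1px", "Blur 2px", "Blur 4px",
    "Facebook pipeline", "Instagram pipeline", "JPEG q30", "JPEG q50",
    "JPEG q70", "JPEG q80", "JPEG q95", "Noise sigma 10", "Noise sigma 5",
    "Resize 0.5x", "Resize 0.75x", "TikTok pipeline", "X pipeline",
]

# Deepfake-Detect-Siglip2 row, copied from the table above.
SIGLIP2 = [
    0.717, 0.836, 0.534, 0.281, 0.257, 0.709, 0.725, 0.394, 0.430,
    0.491, 0.543, 0.679, 0.711, 0.711, 0.602, 0.681, 0.733, 0.710,
]


def robustness_summary(conditions, aucs, baseline="Clean"):
    """Return (deltas vs. baseline, worst condition, conditions with AUC < 0.5)."""
    by_cond = dict(zip(conditions, aucs))
    base = by_cond[baseline]
    deltas = {c: round(a - base, 3) for c, a in by_cond.items() if c != baseline}
    worst = min(deltas, key=deltas.get)  # largest drop from the baseline
    below_chance = [c for c, a in by_cond.items() if a < 0.5]
    return deltas, worst, below_chance


deltas, worst, below_chance = robustness_summary(CONDITIONS, SIGLIP2)
print("worst condition:", worst, deltas[worst])
print("below chance under:", below_chance)
```

For this row the largest drop is under 4px blur, and the detector falls below chance under the two heavier blur levels and under JPEG quality 70 and lower.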

Demographic fairness (clean corpus)

The AUC gap between each detector's best- and worst-performing demographic cell (skin tone × gender). A strong pooled number can hide a subgroup that falls toward random. Sorted by gap, widest first.

Detector	Fairness gap	Worst cell	Best cell
Deep-Fake-Detector-v2-Model	0.434	very light/female 0.378	brown/female 0.812
Corvi2023	0.403	intermediate/male 0.078	dark/female 0.480
Deepfake-Detect-Siglip2	0.347	dark/male 0.521	tan/female 0.868
f3net	0.315	intermediate/female 0.411	tan/female 0.726
SMOGY-Ai-images-detector	0.283	brown/male 0.571	intermediate/female 0.854
xception	0.259	very light/female 0.609	tan/female 0.868
efficientnetb4	0.237	light/male 0.460	tan/female 0.697
fusion	0.196	light/female 0.240	brown/female 0.436
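
The gap itself is simple arithmetic: best-cell AUC minus worst-cell AUC. As a minimal sketch, the function below takes a mapping from demographic cell to AUC and returns the gap with its extremes. The function name is hypothetical, and because the table publishes only the best and worst cells, the example dictionaries contain just those two entries per detector rather than all 12 cells.

```python
def fairness_gap(cell_auc):
    """Return (gap, worst_cell, best_cell) for a {cell_name: auc} mapping."""
    worst = min(cell_auc, key=cell_auc.get)
    best = max(cell_auc, key=cell_auc.get)
    return round(cell_auc[best] - cell_auc[worst], 3), worst, best


# Only the extreme cells from the table above; the other ten cells per
# detector are not listed on this page and are deliberately left out.
published = {
    "Deep-Fake-Detector-v2-Model": {"very light/female": 0.378, "brown/female": 0.812},
    "xception": {"very light/female": 0.609, "tan/female": 0.868},
    "fusion": {"light/female": 0.240, "brown/female": 0.436},
}

for name, cells in published.items():
    gap, worst, best = fairness_gap(cells)
    print(f"{name}: gap={gap:.3f} (worst {worst}, best {best})")
```

Given all 12 per-cell AUCs, the same function would pick out the extremes itself.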

Wider clean-only comparison

A larger clean corpus covering 14 detectors (a separate image set, so do not compare these AUCs against the leaderboard above). Overall AUC only.

#	Detector	AUC	N
1	Corvi2023 (grip-unina)	1.000	19842/6650
2	fusion (grip-unina)	1.000	19842/6650
3	Deepfake-Detect-Siglip2 (prithivMLmods)	0.837	19842/6650
4	SMOGY-Ai-images-detector (Smogy)	0.752	19842/6650
5	Deep-Fake-Detector-v2-Model (prithivMLmods)	0.727	19842/6650
6	xception (SCLBD)	0.707	19842/6650
7	f3net (SCLBD)	0.600	19842/6650
8	efficientnetb4 (SCLBD)	0.548	19842/6650
9	ucf (SCLBD)	0.499	19842/6650
10	AI-image-detector (umm-maybe)	0.492	19842/6650
11	sdxl-detector (Organika)	0.448	19842/6650
12	clipdet_latent10k_plus (grip-unina)	0.301	19842/6650
13	FFraw (mapooon)	0.285	19842/6650
14	FFc23 (mapooon)	0.273	19842/6650