New: The detectors that scored perfectly collapsed the hardest under attack.
Open benchmark

Which AI-image detectors actually work?

A neutral, reproducible measurement of off-the-shelf synthetic-image detectors on a balanced synthetic-face corpus (1,200 real + 1,200 generated, 12 demographic cells). We report tie-correct AUC, robustness under platform re-encoding, and demographic fairness. No vendor scores, no marketing, just the same numbers our evaluation engine serves.

On the clean corpus the strongest detector, Deepfake-Detect-Siglip2, reaches an AUC of only 0.717. The weakest, Corvi2023, sits at 0.243, well below the chance level of 0.5, meaning its scores are systematically inverted on this distribution.

Leaderboard, clean corpus

Overall tie-correct AUC, plus the error rates at each detector's own operating point (FNR = fakes missed, FPR = real images falsely flagged), on subset 2026-05-31_perturbation_full_v1.

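| # | Detector (source) | AUC | Missed fakes (FNR) | False alarms (FPR) | N |
|---|---|---|---|---|---|
| 1 | Deepfake-Detect-Siglip2 (prithivMLmods) | 0.717 | 85.9% | 3.0% | 1200/1200 |
| 2 | xception (SCLBD) | 0.711 | 48.5% | 26.3% | 1200/1200 |
| 3 | SMOGY-Ai-images-detector (Smogy) | 0.700 | 0.3% | 92.1% | 1200/1200 |
| 4 | Deep-Fake-Detector-v2-Model (prithivMLmods) | 0.661 | 47.9% | 31.3% | 1200/1200 |
| 5 | f3net (SCLBD) | 0.572 | 58.1% | 34.2% | 1200/1200 |
| 6 | efficientnetb4 (SCLBD) | 0.537 | 51.0% | 44.9% | 1200/1200 |
| 7 | fusion (grip-unina) | 0.340 | 20.9% | 92.3% | 1200/1200 |
| 8 | Corvi2023 (grip-unina) | 0.243 | 100.0% | 0.8% | 1200/1200 |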

AUC below 0.5 means the detector's score is anti-correlated with truth on this corpus (it tends to call generated faces more “real” than the reals).
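
To make the two error columns concrete, here is a minimal sketch of how FNR and FPR can be read off a set of scores at one threshold. The function name `error_rates`, the 0.5 threshold, and the six toy scores are illustrative assumptions; the page does not say how each detector's own operating point is chosen, so this is not necessarily how the evaluation engine computes it.

```python
from typing import Sequence, Tuple


def error_rates(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (FNR, FPR) at a single threshold.

    labels: 1 = generated, 0 = real.
    An image is flagged as generated when score >= threshold (assumed convention).
    """
    n_fake = sum(1 for y in labels if y == 1)
    n_real = len(labels) - n_fake
    if n_fake == 0 or n_real == 0:
        raise ValueError("need at least one real and one generated sample")

    # Fakes that were not flagged: these are the "missed fakes".
    missed = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    # Reals that were flagged: these are the "false alarms".
    false_alarms = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return missed / n_fake, false_alarms / n_real


# Toy data, not taken from the benchmark: three fakes, then three reals.
toy_scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
fnr, fpr = error_rates(toy_scores, toy_labels, threshold=0.5)
print(f"FNR={fnr:.1%}  FPR={fpr:.1%}")
```

In the toy data one of the three fakes is missed and one of the three reals is flagged. Read the same way, Corvi2023 (100.0% missed, 0.8% false alarms) flags almost nothing at its operating point, while SMOGY-Ai-images-detector (0.3% missed, 92.1% false alarms) flags almost everything.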

Robustness under platform processing

Per-condition AUC. Each column is a perturbation the corpus was re-encoded through (JPEG, resize, blur, noise, and the Instagram/Facebook/TikTok/X upload pipelines).

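| Detector | Clean | Clean (native encoding) | Blur 1px | Blur 2px | Blur 4px | Facebook pipeline | Instagram pipeline | JPEG q30 | JPEG q50 | JPEG q70 | JPEG q80 | JPEG q95 | Noise sigma 10 | Noise sigma 5 | Resize 0.5x | Resize 0.75x | TikTok pipeline | X pipeline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Deepfake-Detect-Siglip2 | 0.717 | 0.836 | 0.534 | 0.281 | 0.257 | 0.709 | 0.725 | 0.394 | 0.430 | 0.491 | 0.543 | 0.679 | 0.711 | 0.711 | 0.602 | 0.681 | 0.733 | 0.710 |
| xception | 0.711 | 0.714 | 0.689 | 0.538 | 0.538 | 0.739 | 0.736 | 0.489 | 0.518 | 0.554 | 0.587 | 0.693 | 0.622 | 0.659 | 0.696 | 0.668 | 0.776 | 0.739 |
| SMOGY-Ai-images-detector | 0.700 | 0.750 | 0.700 | 0.721 | 0.709 | 0.687 | 0.678 | 0.672 | 0.652 | 0.646 | 0.652 | 0.711 | 0.609 | 0.632 | 0.705 | 0.712 | 0.677 | 0.686 |
| Deep-Fake-Detector-v2-Model | 0.661 | 0.723 | 0.676 | 0.565 | 0.499 | 0.571 | 0.570 | 0.650 | 0.642 | 0.642 | 0.642 | 0.657 | 0.652 | 0.656 | 0.654 | 0.665 | 0.585 | 0.571 |
| f3net | 0.572 | 0.603 | 0.598 | 0.480 | 0.407 | 0.595 | 0.595 | 0.451 | 0.451 | 0.461 | 0.477 | 0.549 | 0.506 | 0.527 | 0.594 | 0.542 | 0.622 | 0.596 |
| efficientnetb4 | 0.537 | 0.551 | 0.519 | 0.475 | 0.457 | 0.471 | 0.467 | 0.453 | 0.461 | 0.475 | 0.485 | 0.533 | 0.474 | 0.495 | 0.516 | 0.522 | 0.487 | 0.471 |
| fusion | 0.340 | 1.000 | 0.374 | 0.401 | 0.417 | 0.318 | 0.299 | 0.322 | 0.330 | 0.240 | 0.228 | 0.393 | 0.405 | 0.434 | 0.456 | 0.353 | 0.306 | 0.318 |
| Corvi2023 | 0.243 | 1.000 | 0.405 | 0.436 | 0.426 | 0.709 | 0.700 | 0.409 | 0.379 | 0.356 | 0.330 | 0.272 | 0.327 | 0.351 | 0.546 | 0.372 | 0.705 | 0.709 |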
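
One way to read the robustness table is to ask, for each detector, which condition hurts most and how far AUC falls from the Clean column. The sketch below does that for three rows copied from the table above. The helper name `worst_condition` and the choice of Clean as the baseline are assumptions made for illustration, not part of the benchmark's own tooling.

```python
CONDITIONS = [
    "Clean", "Clean (native encoding)", "Blur 1px", "Blur 2px", "Blur 4px",
    "Facebook pipeline", "Instagram pipeline", "JPEG q30", "JPEG q50",
    "JPEG q70", "JPEG q80", "JPEG q95", "Noise sigma 10", "Noise sigma 5",
    "Resize 0.5x", "Resize 0.75x", "TikTok pipeline", "X pipeline",
]

# Three rows copied from the per-condition AUC table, in column order.
AUC = {
    "Deepfake-Detect-Siglip2": [
        0.717, 0.836, 0.534, 0.281, 0.257, 0.709, 0.725, 0.394, 0.430,
        0.491, 0.543, 0.679, 0.711, 0.711, 0.602, 0.681, 0.733, 0.710,
    ],
    "xception": [
        0.711, 0.714, 0.689, 0.538, 0.538, 0.739, 0.736, 0.489, 0.518,
        0.554, 0.587, 0.693, 0.622, 0.659, 0.696, 0.668, 0.776, 0.739,
    ],
    "SMOGY-Ai-images-detector": [
        0.700, 0.750, 0.700, 0.721, 0.709, 0.687, 0.678, 0.672, 0.652,
        0.646, 0.652, 0.711, 0.609, 0.632, 0.705, 0.712, 0.677, 0.686,
    ],
}


def worst_condition(detector: str) -> tuple:
    """Return (condition with the lowest AUC, drop relative to Clean)."""
    row = dict(zip(CONDITIONS, AUC[detector]))
    worst = min(row, key=row.get)
    return worst, round(row["Clean"] - row[worst], 3)


for name in AUC:
    condition, drop = worst_condition(name)
    print(f"{name}: worst under {condition}, AUC drop {drop:.3f}")
```

For these three rows the weakest conditions are Blur 4px, JPEG q30, and Noise sigma 10 respectively, with the first detector losing far more AUC than the other two.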

Demographic fairness (clean corpus)

The AUC gap between each detector's best- and worst-performing demographic cell (skin tone × gender). A strong pooled number can hide a subgroup that falls toward random. Sorted by gap, widest first.

| Detector | Fairness gap | Worst cell | Best cell |
|---|---|---|---|
| Deep-Fake-Detector-v2-Model | 0.434 | very light/female 0.378 | brown/female 0.812 |
| Corvi2023 | 0.403 | intermediate/male 0.078 | dark/female 0.480 |
| Deepfake-Detect-Siglip2 | 0.347 | dark/male 0.521 | tan/female 0.868 |
| f3net | 0.315 | intermediate/female 0.411 | tan/female 0.726 |
| SMOGY-Ai-images-detector | 0.283 | brown/male 0.571 | intermediate/female 0.854 |
| xception | 0.259 | very light/female 0.609 | tan/female 0.868 |
| efficientnetb4 | 0.237 | light/male 0.460 | tan/female 0.697 |
| fusion | 0.196 | light/female 0.240 | brown/female 0.436 |
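
The fairness gap described above is simply the best cell's AUC minus the worst cell's AUC. A minimal sketch, assuming per-cell AUCs are available as a mapping from cell name to AUC; the function name `fairness_gap` is hypothetical, and only the two cells published in the table are filled in because the other ten are not listed on this page.

```python
from typing import Dict, Tuple


def fairness_gap(cell_auc: Dict[str, float]) -> Tuple[float, str, str]:
    """Return (gap, worst cell, best cell) for one detector.

    cell_auc maps a demographic cell such as "brown/female" to its AUC.
    """
    if not cell_auc:
        raise ValueError("need at least one demographic cell")
    worst = min(cell_auc, key=cell_auc.get)
    best = max(cell_auc, key=cell_auc.get)
    return round(cell_auc[best] - cell_auc[worst], 3), worst, best


# Deep-Fake-Detector-v2-Model: only the two cells shown in the table.
v2_cells = {"very light/female": 0.378, "brown/female": 0.812}
print(fairness_gap(v2_cells))
```

Recomputing from the rounded cell values reproduces most of the table exactly. Corvi2023 comes out at 0.402 rather than the listed 0.403, which would be consistent with the published gap being computed from unrounded cell AUCs.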

Wider clean-only comparison

A larger clean corpus covering 14 detectors (a separate image set, so do not compare these AUCs against the leaderboard above). Overall AUC only.

| # | Detector (source) | AUC | N |
|---|---|---|---|
| 1 | Corvi2023 (grip-unina) | 1.000 | 19842/6650 |
| 2 | fusion (grip-unina) | 1.000 | 19842/6650 |
| 3 | Deepfake-Detect-Siglip2 (prithivMLmods) | 0.837 | 19842/6650 |
| 4 | SMOGY-Ai-images-detector (Smogy) | 0.752 | 19842/6650 |
| 5 | Deep-Fake-Detector-v2-Model (prithivMLmods) | 0.727 | 19842/6650 |
| 6 | xception (SCLBD) | 0.707 | 19842/6650 |
| 7 | f3net (SCLBD) | 0.600 | 19842/6650 |
| 8 | efficientnetb4 (SCLBD) | 0.548 | 19842/6650 |
| 9 | ucf (SCLBD) | 0.499 | 19842/6650 |
| 10 | AI-image-detector (umm-maybe) | 0.492 | 19842/6650 |
| 11 | sdxl-detector (Organika) | 0.448 | 19842/6650 |
| 12 | clipdet_latent10k_plus (grip-unina) | 0.301 | 19842/6650 |
| 13 | FFraw (mapooon) | 0.285 | 19842/6650 |
| 14 | FFc23 (mapooon) | 0.273 | 19842/6650 |

AUC computed with tie-correct Mann-Whitney ranking, identical to our per-detector reports. Detectors are evaluated as published, with no fine-tuning. See why detectors collapse and our methodology.
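
For readers who want to reproduce an AUC figure from raw scores, the following is a minimal sketch of a Mann-Whitney AUC with midranks for tied scores, which is the standard way to make the statistic tie-correct. It assumes that a higher score means "more likely generated" and that labels are 1 for generated and 0 for real; the function name `tie_corrected_auc` is hypothetical, and the benchmark's own engine may differ in detail.

```python
from typing import Sequence


def tie_corrected_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC from the Mann-Whitney U statistic, using midranks for ties.

    labels: 1 = generated (positive class), 0 = real (negative class).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one real and one generated sample")

    # Sort indices by score, then give every run of equal scores the
    # average of the 1-based ranks that the run occupies (its midrank).
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1

    # U for the positive class, normalised by the number of (fake, real) pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy data, not taken from the benchmark.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(tie_corrected_auc(toy_scores, toy_labels))                # 0.75
print(tie_corrected_auc([-s for s in toy_scores], toy_labels))  # 0.25
```

With midranks, a tied real/generated pair contributes exactly half a win, so a detector that outputs the same score for every image lands on 0.5. The second call shows the inversion discussed for sub-0.5 detectors: negating the scores turns an AUC of a into 1 - a.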