Which AI-image detectors actually work?
A neutral, reproducible measurement of off-the-shelf synthetic-image detectors on a balanced synthetic-face corpus (1,200 real + 1,200 generated, 12 demographic cells). We report tie-corrected AUC, robustness under platform re-encoding, and demographic fairness. No vendor scores, no marketing, just the same numbers our evaluation engine serves.
On the clean corpus the strongest detector, Deepfake-Detect-Siglip2, reaches only AUC 0.717. The weakest, Corvi2023, sits at 0.243, below chance, i.e. systematically inverted on this distribution.
Leaderboard, clean corpus
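Overall tie-corrected AUC, plus error rates at each detector's own operating point: missed fakes (FNR, the share of generated images passed as real) and false alarms (FPR, the share of real images falsely flagged). Evaluated on subset 2026-05-31_perturbation_full_v1.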
| # | Detector | AUC | Missed fakes | False alarms | N |
|---|---|---|---|---|---|
| 1 | Deepfake-Detect-Siglip2 (prithivMLmods) | 0.717 | 85.9% | 3.0% | 1200/1200 |
| 2 | xception (SCLBD) | 0.711 | 48.5% | 26.3% | 1200/1200 |
| 3 | SMOGY-Ai-images-detector (Smogy) | 0.700 | 0.3% | 92.1% | 1200/1200 |
| 4 | Deep-Fake-Detector-v2-Model (prithivMLmods) | 0.661 | 47.9% | 31.3% | 1200/1200 |
| 5 | f3net (SCLBD) | 0.572 | 58.1% | 34.2% | 1200/1200 |
| 6 | efficientnetb4 (SCLBD) | 0.537 | 51.0% | 44.9% | 1200/1200 |
| 7 | fusion (grip-unina) | 0.340 | 20.9% | 92.3% | 1200/1200 |
| 8 | Corvi2023 (grip-unina) | 0.243 | 100.0% | 0.8% | 1200/1200 |
AUC below 0.5 means the detector's score is anti-correlated with truth on this corpus (it tends to call generated faces more “real” than the reals).
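To make the table's two kinds of numbers concrete, here is a minimal sketch of how a tie-corrected AUC and the FNR/FPR pair can be computed from raw scores. It is an illustration, not the evaluation engine's code. The function names `tie_corrected_auc` and `error_rates` are hypothetical. The toy scores are invented, and the conventions that a higher score means "more likely generated" and that a score at or above the threshold counts as a flag are assumptions.

```python
from typing import Sequence, Tuple


def tie_corrected_auc(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Probability that a random fake outscores a random real; ties count as 1/2.

    Assumes higher score = "more likely generated" (an assumption for this sketch).
    """
    if len(real_scores) == 0 or len(fake_scores) == 0:
        raise ValueError("need at least one real and one fake score")
    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5  # the tie correction: a tied pair is half a win
    return wins / (len(real_scores) * len(fake_scores))


def error_rates(
    real_scores: Sequence[float], fake_scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (FNR, FPR) at a fixed threshold; score >= threshold means "flagged"."""
    fnr = sum(s < threshold for s in fake_scores) / len(fake_scores)
    fpr = sum(s >= threshold for s in real_scores) / len(real_scores)
    return fnr, fpr


# Invented toy scores, not taken from the corpus.
real = [0.1, 0.2, 0.5, 0.5]
fake = [0.5, 0.7, 0.2, 0.9]

print(tie_corrected_auc(real, fake))                              # 0.78125
print(tie_corrected_auc([-s for s in real], [-s for s in fake]))  # 0.21875
print(error_rates(real, fake, threshold=0.5))                     # (0.25, 0.5)
```

Negating every score swaps wins and losses while leaving ties alone, so the AUC becomes one minus its previous value. That is the sense in which a sub-0.5 detector is inverted rather than uninformative: by the same arithmetic, an AUC of 0.243 corresponds to 0.757 with the score direction flipped.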
Robustness under platform processing
Per-condition AUC. Each column is one perturbation the corpus was re-encoded through: JPEG recompression, resizing, blur, noise, or the Instagram, Facebook, TikTok, or X upload pipeline.
| Detector | Clean | Clean (native encoding) | Blur 1px | Blur 2px | Blur 4px | Facebook pipeline | Instagram pipeline | JPEG q30 | JPEG q50 | JPEG q70 | JPEG q80 | JPEG q95 | Noise sigma 10 | Noise sigma 5 | Resize 0.5x | Resize 0.75x | TikTok pipeline | X pipeline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Deepfake-Detect-Siglip2 | 0.717 | 0.836 | 0.534 | 0.281 | 0.257 | 0.709 | 0.725 | 0.394 | 0.430 | 0.491 | 0.543 | 0.679 | 0.711 | 0.711 | 0.602 | 0.681 | 0.733 | 0.710 |
| xception | 0.711 | 0.714 | 0.689 | 0.538 | 0.538 | 0.739 | 0.736 | 0.489 | 0.518 | 0.554 | 0.587 | 0.693 | 0.622 | 0.659 | 0.696 | 0.668 | 0.776 | 0.739 |
| SMOGY-Ai-images-detector | 0.700 | 0.750 | 0.700 | 0.721 | 0.709 | 0.687 | 0.678 | 0.672 | 0.652 | 0.646 | 0.652 | 0.711 | 0.609 | 0.632 | 0.705 | 0.712 | 0.677 | 0.686 |
| Deep-Fake-Detector-v2-Model | 0.661 | 0.723 | 0.676 | 0.565 | 0.499 | 0.571 | 0.570 | 0.650 | 0.642 | 0.642 | 0.642 | 0.657 | 0.652 | 0.656 | 0.654 | 0.665 | 0.585 | 0.571 |
| f3net | 0.572 | 0.603 | 0.598 | 0.480 | 0.407 | 0.595 | 0.595 | 0.451 | 0.451 | 0.461 | 0.477 | 0.549 | 0.506 | 0.527 | 0.594 | 0.542 | 0.622 | 0.596 |
| efficientnetb4 | 0.537 | 0.551 | 0.519 | 0.475 | 0.457 | 0.471 | 0.467 | 0.453 | 0.461 | 0.475 | 0.485 | 0.533 | 0.474 | 0.495 | 0.516 | 0.522 | 0.487 | 0.471 |
| fusion | 0.340 | 1.000 | 0.374 | 0.401 | 0.417 | 0.318 | 0.299 | 0.322 | 0.330 | 0.240 | 0.228 | 0.393 | 0.405 | 0.434 | 0.456 | 0.353 | 0.306 | 0.318 |
| Corvi2023 | 0.243 | 1.000 | 0.405 | 0.436 | 0.426 | 0.709 | 0.700 | 0.409 | 0.379 | 0.356 | 0.330 | 0.272 | 0.327 | 0.351 | 0.546 | 0.372 | 0.705 | 0.709 |
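One way to read a row of this table is as a set of AUC changes relative to the clean baseline. The sketch below does that for the Deepfake-Detect-Siglip2 row, using the values printed above. The helper name `auc_deltas` is hypothetical, and choosing the "Clean" column (rather than "Clean (native encoding)") as the baseline is an assumption made for this example.

```python
from typing import Dict


def auc_deltas(baseline: float, per_condition: Dict[str, float]) -> Dict[str, float]:
    """AUC change of each condition relative to the baseline, rounded to 3 places."""
    return {name: round(auc - baseline, 3) for name, auc in per_condition.items()}


# Deepfake-Detect-Siglip2 row, copied from the table above.
siglip2_clean = 0.717
siglip2 = {
    "Blur 1px": 0.534, "Blur 2px": 0.281, "Blur 4px": 0.257,
    "Facebook pipeline": 0.709, "Instagram pipeline": 0.725,
    "JPEG q30": 0.394, "JPEG q50": 0.430, "JPEG q70": 0.491,
    "JPEG q80": 0.543, "JPEG q95": 0.679,
    "Noise sigma 10": 0.711, "Noise sigma 5": 0.711,
    "Resize 0.5x": 0.602, "Resize 0.75x": 0.681,
    "TikTok pipeline": 0.733, "X pipeline": 0.710,
}

deltas = auc_deltas(siglip2_clean, siglip2)
worst = min(deltas, key=deltas.get)  # condition with the largest AUC drop
best = max(deltas, key=deltas.get)   # condition with the largest AUC gain

print(worst, deltas[worst])  # Blur 4px -0.46
print(best, deltas[best])    # TikTok pipeline 0.016
```

For this detector the deltas show heavy blur and strong JPEG compression costing far more AUC than any of the four platform pipelines, which stay within about 0.02 of the clean figure.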
Demographic fairness (clean corpus)
The AUC gap between each detector's best- and worst-performing demographic cell (skin tone × gender). A strong pooled number can hide a subgroup that falls toward random. Sorted by gap, widest first.
| Detector | Fairness gap | Worst cell | Best cell |
|---|---|---|---|
| Deep-Fake-Detector-v2-Model | 0.434 | very light/female 0.378 | brown/female 0.812 |
| Corvi2023 | 0.403 | intermediate/male 0.078 | dark/female 0.480 |
| Deepfake-Detect-Siglip2 | 0.347 | dark/male 0.521 | tan/female 0.868 |
| f3net | 0.315 | intermediate/female 0.411 | tan/female 0.726 |
| SMOGY-Ai-images-detector | 0.283 | brown/male 0.571 | intermediate/female 0.854 |
| xception | 0.259 | very light/female 0.609 | tan/female 0.868 |
| efficientnetb4 | 0.237 | light/male 0.460 | tan/female 0.697 |
| fusion | 0.196 | light/female 0.240 | brown/female 0.436 |
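The fairness gap is the best cell's AUC minus the worst cell's AUC. The sketch below computes it from a mapping of cell name to AUC and re-sorts detectors widest first. The function name `fairness_gap` is hypothetical. Only the best and worst cells are published above, so each mapping holds just those two values; with all 12 cells available, the same function would apply unchanged.

```python
from typing import Dict, Tuple


def fairness_gap(cell_aucs: Dict[str, float]) -> Tuple[float, str, str]:
    """Return (gap, worst_cell, best_cell), where gap = best AUC - worst AUC."""
    worst = min(cell_aucs, key=cell_aucs.get)
    best = max(cell_aucs, key=cell_aucs.get)
    return round(cell_aucs[best] - cell_aucs[worst], 3), worst, best


# Best and worst cells copied from the table above; the other ten cells
# per detector are not published here, so they are left out.
published = {
    "xception": {"very light/female": 0.609, "tan/female": 0.868},
    "fusion": {"light/female": 0.240, "brown/female": 0.436},
    "Deep-Fake-Detector-v2-Model": {"very light/female": 0.378, "brown/female": 0.812},
}

gaps = {name: fairness_gap(cells) for name, cells in published.items()}

# Sort widest gap first, matching the table's ordering.
for name, (gap, worst, best) in sorted(gaps.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: gap={gap} worst={worst} best={best}")
```

Because the gap depends only on the two extreme cells, it says nothing about how the remaining cells are distributed between them.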
Wider clean-only comparison
A larger clean corpus covering 14 detectors (a separate image set, so do not compare these AUCs against the leaderboard above). Overall AUC only.
| # | Detector | AUC | N |
|---|---|---|---|
| 1 | Corvi2023 (grip-unina) | 1.000 | 19842/6650 |
| 2 | fusion (grip-unina) | 1.000 | 19842/6650 |
| 3 | Deepfake-Detect-Siglip2 (prithivMLmods) | 0.837 | 19842/6650 |
| 4 | SMOGY-Ai-images-detector (Smogy) | 0.752 | 19842/6650 |
| 5 | Deep-Fake-Detector-v2-Model (prithivMLmods) | 0.727 | 19842/6650 |
| 6 | xception (SCLBD) | 0.707 | 19842/6650 |
| 7 | f3net (SCLBD) | 0.600 | 19842/6650 |
| 8 | efficientnetb4 (SCLBD) | 0.548 | 19842/6650 |
| 9 | ucf (SCLBD) | 0.499 | 19842/6650 |
| 10 | AI-image-detector (umm-maybe) | 0.492 | 19842/6650 |
| 11 | sdxl-detector (Organika) | 0.448 | 19842/6650 |
| 12 | clipdet_latent10k_plus (grip-unina) | 0.301 | 19842/6650 |
| 13 | FFraw (mapooon) | 0.285 | 19842/6650 |
| 14 | FFc23 (mapooon) | 0.273 | 19842/6650 |