Check detectors from our latest benchmark
Pick a detector and a condition and see how it scores on our corpus, down to a single demographic cell. This is the same measurement we run in an evaluation. Full results are on our benchmark page.
Query the benchmark
AUC is a 0 to 1 score for how well a detector tells fake images from real ones. Picture showing it one fake photo and one real photo: AUC is how often it rates the fake as the more likely fake. 1.0 means it always gets it right, 0.5 is a coin flip (no better than guessing), and below 0.5 means it is getting them backwards.
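The pairwise reading of AUC can be written out directly. This is a minimal sketch, not the benchmark's own code: the function name `pairwise_auc` and the example scores are made up for illustration. It assumes a higher score means "more likely fake" and that ties count as half a win.

```python
def pairwise_auc(fake_scores, real_scores):
    """Share of (fake, real) pairs in which the fake image gets the higher score.

    Assumes higher score = "more likely fake". Ties count as half a win.
    """
    if not fake_scores or not real_scores:
        raise ValueError("need at least one fake score and one real score")
    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0   # detector ranked the fake above the real
            elif f == r:
                wins += 0.5   # tie: no better than a coin flip on this pair
    return wins / (len(fake_scores) * len(real_scores))


# Illustrative scores only (not benchmark data).
fake = [0.9, 0.8, 0.4]
real = [0.7, 0.3, 0.2]
print(pairwise_auc(fake, real))  # 8 of the 9 pairs are ranked correctly
```

A detector that scores every fake above every real returns 1.0, and one that scores every fake below every real returns 0.0, which is the "backwards" case described above.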
Put in a request
Public model
We'll run it and put it up on the interactive.
Private model
Share it privately, as weights or an API key.
Data and methodology
Self-service on your own model.
Request a generator
Suggest one to add to the interactive.
Want the full report on your detector?
The interactive shows one score at a time. A full evaluation runs your model across the entire sequestered corpus and returns a per-group report card: AUC with DeLong confidence intervals, how it holds up under each platform condition, and, where it fails, the recipe that broke it.
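To give a sense of what a per-group AUC with a DeLong interval involves, here is a rough sketch. It is not the evaluation pipeline: the function names, the `(group, is_fake, score)` record layout, and the group labels are hypothetical. It also assumes a plain normal-approximation 95% interval clipped to [0, 1]. The report may construct its intervals differently.

```python
import math


def _psi(fake_score, real_score):
    # 1 if the fake is ranked above the real, 0.5 on a tie, 0 otherwise.
    if fake_score > real_score:
        return 1.0
    if fake_score == real_score:
        return 0.5
    return 0.0


def delong_auc_ci(fake_scores, real_scores, z=1.96):
    """Return (auc, lower, upper) using the DeLong variance estimate."""
    m, n = len(fake_scores), len(real_scores)
    if m < 2 or n < 2:
        raise ValueError("need at least two fake and two real scores")
    # Structural components: per-fake and per-real average win rates.
    v10 = [sum(_psi(f, r) for r in real_scores) / n for f in fake_scores]
    v01 = [sum(_psi(f, r) for f in fake_scores) / m for r in real_scores]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)
    # Assumption: normal approximation, clipped to the valid AUC range.
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)


def per_group_report(records):
    """records: iterable of (group, is_fake, score) tuples (hypothetical layout)."""
    groups = {}
    for group, is_fake, score in records:
        fake, real = groups.setdefault(group, ([], []))
        (fake if is_fake else real).append(score)
    return {g: delong_auc_ci(f, r) for g, (f, r) in groups.items()}


# Illustrative records only (not benchmark data).
records = [
    ("group_a", True, 0.9), ("group_a", True, 0.8), ("group_a", True, 0.4),
    ("group_a", False, 0.7), ("group_a", False, 0.3), ("group_a", False, 0.2),
]
for group, (auc, low, high) in per_group_report(records).items():
    print(f"{group}: AUC {auc:.3f} (95% CI {low:.3f} to {high:.3f})")
```

With only a handful of images in a group the interval is very wide, which is why a single AUC number for a small demographic cell should be read alongside its interval.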