Detection accuracy measured against the attacks your funnel actually sees.
Synthetic candidates are already interviewing their way into real jobs. We evaluate your detection layer against the personas, accents, and lighting conditions that real interview traffic carries.
A synthetic candidate only has to clear one live interview.
Hiring funnels move thousands of candidates, and no recruiter verifies that each face on the call is real. The detection layer is the check that has to hold across every one of these flows.
A real-time face swap on the call.
A proxy candidate runs a real-time face swap during the live interview, so the person on screen is not the person who will take the job. At interview volume, no recruiter is verifying that each face is genuine. The detection layer is the only check.
One strong number can hide a group where the detection system does worse than guessing.
A detection system can post a healthy overall accuracy while specific groups perform far worse. We report performance per group so the weakest one is visible, not averaged away.
The average hides the gap
one overall number
A single strong accuracy figure across all candidates.
split by group
Groups where the detector does worse than guessing, hidden inside a healthy-looking average.
Our per-group breakdown shows some groups falling below a coin flip while the overall number still looks healthy.
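To make the arithmetic concrete, here is a minimal sketch of how an average can hide a below-chance group. The group names, sample counts, and the `accuracy_by_group` and `make_records` helpers are hypothetical, chosen only for illustration; they are not figures from our benchmark or part of any real tool.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, is_fake, flagged_fake) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, is_fake, flagged_fake in records:
        total[group] += 1
        correct[group] += int(is_fake == flagged_fake)
    per_group = {g: correct[g] / total[g] for g in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_group

def make_records(group, n, n_correct):
    """Build n hypothetical samples (half fake, half real), n_correct of them judged correctly."""
    records = []
    for i in range(n):
        is_fake = i % 2 == 0
        flagged_fake = is_fake if i < n_correct else not is_fake
        records.append((group, is_fake, flagged_fake))
    return records

# Hypothetical data: a large group the detector handles well,
# and a small group it gets wrong more often than right.
records = make_records("group_a", 900, 880) + make_records("group_b", 100, 40)

overall, per_group = accuracy_by_group(records)
worst = min(per_group, key=per_group.get)

print(f"overall: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"{group}: {acc:.2f}")
print(f"worst group: {worst}")
```

In this made-up example the overall figure is 0.92 while group_b sits at 0.40, below the 0.50 a coin flip would score on a balanced set. Reporting the minimum alongside the average is what keeps that group visible.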
Read the benchmark
The deliverables you get back
Interview-realistic conditions
Evaluated against what real interview traffic carries.
Geographic and accent drift
Performance reported per group.
Vendor-selection evidence
Reports that hold up in procurement conversations.
A high average is not the same as protection for every candidate.
The core challenge
Why a strong-looking detector still lets fraud through.
Trained on yesterday
Detectors learn from the generators that existed when they were built. New models ship every month, and the detector has never seen them.
Tested in a lab
Vendor numbers come from pristine images. Real fraud arrives compressed, resized, and re-encoded by the platforms it passes through.
Measured on the average
A strong overall score can hide groups the detector barely catches. The average looks fine while a whole subgroup is an open lane.
Everything procurement asks for is in the box.
Per-group performance
Results broken out by demographic and platform, with the worst case shown.
Both kinds of mistake
Fakes let through and real users wrongly blocked, with the margin of error on each.
Bypass recipes
Every failure annotated with the recipe that surfaced it.
Methodology, documented
Public, versioned, and signed by the lead researcher.
Platform-realistic conditions
Scored under the re-encoding your deployment actually applies.
Audit-ready exhibits
Findings packaged to hand to a board or a regulator.
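As an illustration of reporting both kinds of mistake with a margin of error, here is a minimal sketch. It assumes a 95% Wilson score interval as the margin of error, which is one common choice and not necessarily the method used in our reports; the counts and the `wilson_interval` and `error_rates` helpers are hypothetical.

```python
import math

def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for an error rate (z=1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def error_rates(records):
    """records: iterable of (is_fake, flagged_fake) pairs."""
    fake_flags = [flagged for is_fake, flagged in records if is_fake]
    real_flags = [flagged for is_fake, flagged in records if not is_fake]
    missed = sum(1 for flagged in fake_flags if not flagged)   # fakes let through
    blocked = sum(1 for flagged in real_flags if flagged)      # real users wrongly blocked
    return {
        "fakes_let_through": (missed / len(fake_flags),
                              wilson_interval(missed, len(fake_flags))),
        "real_users_blocked": (blocked / len(real_flags),
                               wilson_interval(blocked, len(real_flags))),
    }

# Hypothetical counts: 200 fakes with 20 missed, 800 real users with 8 blocked.
records = (
    [(True, False)] * 20 + [(True, True)] * 180
    + [(False, True)] * 8 + [(False, False)] * 792
)

rates = error_rates(records)
for name, (rate, (low, high)) in rates.items():
    print(f"{name}: {rate:.3f} (95% interval {low:.3f} to {high:.3f})")
```

Keeping the two rates separate matters because a detector can reduce one by inflating the other; the interval shows how much of each number is sampling noise.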
One measurement layer, every side of it.
Whichever side you are on, the same arms race runs underneath. See how we serve the rest of the market, or go straight to scoping your own evaluation.
See if your detection catches the synthetic candidate.
Tell us how candidates reach you. We will scope an evaluation against interview-realistic attacks, broken out by group.