Vendor · Builders that need third-party validation.

The third-party red team that helps you close the deal.

Your buyers ask for proof that goes beyond your own benchmark. We are the independent red team that supplies it: an evaluation grounded in a corpus your team did not assemble and a method your team did not design, so the number holds up in the room where the deal is won.

The evidence

Your benchmark vs an independent one

What you were told

~1.00

on your own benchmark

The score your team measured and put in the deck.

What holds in your funnel

in front of a buyer

What that number is worth once the buyer knows your team picked the test, the data, and the threshold.

Buyers increasingly discount a vendor's own benchmark. A score from a party that did not build the model, on attacks it has never seen, is the one that moves a deal.

Read the benchmark

How an evaluation unfolds

Six weeks. Five milestones. One report on the other side.

W1
Kickoff and scope lock
Attack classes, demographic axes, and the bar for a finding are agreed and frozen.
W2
Integration and calibration
Your detector is wired into the harness. Baseline performance captured before any perturbation.
W3 to W4
Adversarial runs
Benchmark and perturbation pipeline executed. Every decision logged for replay.
W5
Analysis and bypass recipes
Per-group margins of error computed. Failures annotated with the recipe that surfaced them.
W6
Verdict and engineering debrief
Evaluation report delivered. Working session with engineering. Verdict signed.

Co-delivered engagements are matched to the host's scope and may run shorter or longer.

The key deliverable

We hand back the recipe that broke your model, ready for your next fine-tune.

Proof that closes deals

Independent evidence buyers trust, because your team did not pick the test, the data, or the threshold.

Re-evaluate fast

Re-run against fresh attacks as they appear, and see exactly where the detector starts to slip.

The exact attack, not just a score

For every miss, the attack that produced it, formatted to drop straight into your next training set.

Before you ship

A second, independent read on the model before it reaches a customer.

What the report looks like

Every miss comes with the recipe to fix it.

We rank what we find by severity and hand back the exact attack behind each one, so your team can reproduce it and fold it into the next fine-tune.

Severity	Finding	The recipe
Critical	Platform compression	Re-compress uploads at H.264 quality 70, the way real platforms do. The score fell from 1.00 to 0.34, below a coin flip.
Critical	Weakest group	Break results out per group: one cell fell below a coin flip while the overall average still looked healthy.
High	Format tell	Re-encode real and fake identically, removing the file-format signal the model had been leaning on instead of content.

Example findings drawn from our open benchmark, shown to illustrate the report format. Your evaluation returns findings specific to the detector you submit.
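
The "Weakest group" finding comes from scoring each group separately instead of trusting the pooled number. The sketch below shows one way to do that breakout. It is an illustration under stated assumptions, not the harness itself: the function names, the (group, is_fake, score) record shape, and the sample scores are all hypothetical, and AUC is assumed as the metric because a coin flip sits at 0.5 on that scale.

```python
from collections import defaultdict


def auc(fake_scores, real_scores):
    """Chance that a random fake scores above a random real (ties count half)."""
    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fake_scores) * len(real_scores))


def auc_by_group(samples):
    """samples: iterable of (group, is_fake, score) tuples (hypothetical schema).

    Returns (overall_auc, {group: auc}) so the pooled number can be
    compared against the worst individual cell.
    """
    by_group = defaultdict(lambda: ([], []))
    all_fakes, all_reals = [], []
    for group, is_fake, score in samples:
        fakes, reals = by_group[group]
        (fakes if is_fake else reals).append(score)
        (all_fakes if is_fake else all_reals).append(score)
    # Skip groups that lack either class: AUC is undefined for them.
    per_group = {g: auc(f, r) for g, (f, r) in by_group.items() if f and r}
    return auc(all_fakes, all_reals), per_group


# Synthetic scores for illustration only, not measured results.
samples = [
    ("a", True, 0.90), ("a", True, 0.80), ("a", True, 0.95),
    ("a", False, 0.10), ("a", False, 0.20), ("a", False, 0.15),
    ("b", True, 0.40), ("b", True, 0.50),
    ("b", False, 0.45), ("b", False, 0.55),
]
overall, per_group = auc_by_group(samples)
worst_group = min(per_group, key=per_group.get)
```

On this synthetic data the pooled AUC is 0.88 while group "b" sits at 0.25, which is the pattern the finding describes: a healthy average hiding one cell below a coin flip.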

What you get

Everything procurement asks for is in the box.

Per-group performance
Results broken out by demographic and platform, with the worst case shown.
Both kinds of mistake
Fakes let through and real users wrongly blocked, with the margin of error on each.
Bypass recipes
Every failure annotated with the recipe that surfaced it.
Methodology, documented
Public, versioned, and signed by the lead researcher.
Platform-realistic conditions
Scored under the re-encoding your deployment actually applies.
Audit-ready exhibits
Findings packaged to hand to a board or a regulator.
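
As a rough sketch of how "both kinds of mistake" with a margin of error on each can be reported per group, the example below counts fakes let through and real users wrongly blocked, then attaches a Wilson score interval to each rate. The record shape, function names, and the choice of the Wilson interval are assumptions made for illustration. The report's actual interval method is not stated here.

```python
import math
from collections import defaultdict


def wilson_interval(errors, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if total == 0:
        return (0.0, 1.0)  # no data: the rate is unconstrained
    p = errors / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def error_rates_by_group(records):
    """records: iterable of (group, is_fake, flagged_as_fake) tuples (hypothetical schema)."""
    counts = defaultdict(lambda: {"fakes": 0, "missed": 0, "reals": 0, "blocked": 0})
    for group, is_fake, flagged in records:
        c = counts[group]
        if is_fake:
            c["fakes"] += 1
            c["missed"] += int(not flagged)  # fake let through
        else:
            c["reals"] += 1
            c["blocked"] += int(flagged)  # real user wrongly blocked
    report = {}
    for group, c in counts.items():
        report[group] = {
            "miss_rate": c["missed"] / c["fakes"] if c["fakes"] else None,
            "miss_interval": wilson_interval(c["missed"], c["fakes"]),
            "false_block_rate": c["blocked"] / c["reals"] if c["reals"] else None,
            "false_block_interval": wilson_interval(c["blocked"], c["reals"]),
        }
    return report
```

The Wilson interval is used here because it stays sensible for small groups and for rates near zero, where the simpler normal approximation collapses to a zero-width interval.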

The open benchmark

Put your detector on the public benchmark.

We publish an open, independent benchmark of deepfake detectors, all measured against the same attacks. Submit yours to be ranked alongside the field. If it holds, that is third-party proof you can put in front of buyers. The methodology is fixed and public, so a place on the board is earned by the result, never bought.

Submit to the benchmark
See the current benchmark

The rest of the market

One measurement layer, every side of it.

Whichever side you are on, the same arms race runs underneath. See how we serve the rest of the market, or go straight to scoping your own evaluation.

02 Identity verification and KYC
Buyer · Liveness-flow operators under continuous attack.

03 Hiring and interview platforms
Buyer · High-volume detection, real candidates, real funnels.

04 Red team and security-awareness firms
Partner · Extend social-engineering into the technical layer.

05 Enterprise security leadership
Buyer · CISOs procuring detection technology directly.

Put an independent number on your detector.

Submit your detector and we return where it holds, where it breaks, and the recipe behind every failure, ready for your next fine-tune.

Request an evaluation
See evaluation services