Attack-Data API

API reference

Attack-data is labeled real and AI-generated media built to stress-test deepfake detectors. Pull it here, organized into benchmarks you select and filter by type. Access is authenticated with an API key and metered in credits; test keys pull a free, fixed sample. For the why behind attack-data, see the attack-data overview.

Getting started

The base URL is https://www.margensoftware.com and all endpoints live under /api/v1/data. Responses are JSON. Image bytes are never returned inline: a download returns a short-lived signed URL you fetch directly. The unversioned /api/data prefix still works as a permanent alias, so older integrations keep running.

You need a Margen account to get a key. Sign up at /eval/signup or log in at /eval/login, then manage keys at /keys.

HTTP

https://www.margensoftware.com/api/v1/data

Client SDK

The official Python SDK is on PyPI. Install it and construct a client with your key. The typed operations are list_benchmarks, get_catalog, list_items, download_item, and get_usage; a list value for any filter is sent as a comma-separated list. list_items returns a paginated wrapper you read through .result (so .result.data), while list_benchmarks, get_catalog, get_usage, and download_item return the object directly. For notebook workflows, margen.ergonomics adds iter_items / iter_lineages (paginate the whole result set) and download_selection (one-call bulk download to a folder). Other languages can call the REST endpoints directly (see the curl tab on each endpoint below).

pip install margen

Authentication

Generate a key in the portal at /keys. The raw key is shown once; store it. Send it as a bearer token on every request.

HTTP

Authorization: Bearer mgn_live_xxxxxxxxxxxxxxxx

Test keys (mgn_test_...): free, no credits needed, open to any account. A test key sees only the test sample, a fixed subset, so its catalog counts and available items differ from a live key.
Live keys (mgn_live_...): the full corpus, credit-metered. Open to any account, no approval. Buying credits is what grants access: a live pull debits one credit per image. Enterprise or volume terms are arranged separately via a volume order.
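
For callers not using the SDK, the following is a minimal sketch of attaching the key as a bearer token with the Python standard library. The helper names key_tier and authed_request are illustrative and not part of the API; the key prefixes and base URL are the ones given above. The request is built but not sent.

```python
import urllib.request

BASE_URL = "https://www.margensoftware.com/api/v1/data"


def key_tier(api_key: str) -> str:
    """Infer the tier from the documented key prefix (hypothetical helper)."""
    if api_key.startswith("mgn_test_"):
        return "test"
    if api_key.startswith("mgn_live_"):
        return "live"
    raise ValueError("unrecognized key prefix")


def authed_request(path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the bearer token."""
    req = urllib.request.Request(BASE_URL + path, method="GET")
    req.add_header("Authorization", f"Bearer {api_key}")
    return req


req = authed_request("/benchmarks", "mgn_test_xxxxxxxxxxxxxxxx")
```

Passing req to urllib.request.urlopen would perform the call; checking key_tier first is one way to avoid spending credits by accident in a test script.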

Credits and pricing

One credit pulls one image. Credits are prepaid and debited in real time at download. Buy credits in the portal at /keys (pick any amount; a flat per-image rate is shown at purchase).
Credit usage is per unique image, per account. Your account is charged once per item id, and credits and ownership are account-scoped, so they survive a key rotation. Re-downloading an item your account already owns is free (the response carries charged: false and already_owned: true), so pulling the same selection twice never double-charges.
To pull only images you do not already own, pass exclude_owned=true to /items; it also reports remaining/owned and a subset_exhausted message when you own the whole matching subset.
A download is charged only on success; a retry with the same Idempotency-Key is never charged twice.
With a zero balance a live-tier download returns 402 (code insufficient_credits) and delivers nothing. Top up in the portal.
Test downloads are free and never touch your balance.

NOTE

Downloads use a sign-then-debit flow: the credit is reserved, the signed URL is issued, and the balance is debited only once the URL is returned. A repeated request with the same Idempotency-Key returns the original result without a second charge.
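
The retry rule above can be sketched as follows: generate one Idempotency-Key per item and reuse it on every retry of that item. This is an illustration under assumptions, not the SDK's implementation. The text does not specify a key format, so a UUID is used here as an arbitrary unique string, and the helper names are hypothetical.

```python
import uuid
import urllib.request

BASE_URL = "https://www.margensoftware.com/api/v1/data"


def new_idempotency_key() -> str:
    # Assumption: any unique string is accepted; no format is documented.
    return str(uuid.uuid4())


def download_request(item_id: str, api_key: str, idempotency_key: str) -> urllib.request.Request:
    """Build the download request; pass the same key again when retrying."""
    req = urllib.request.Request(f"{BASE_URL}/download/{item_id}", method="GET")
    req.add_header("Authorization", f"Bearer {api_key}")
    req.add_header("Idempotency-Key", idempotency_key)
    return req


# Generate the key once per item, then reuse it for the first attempt and any retry.
key = new_idempotency_key()
first = download_request("8f3c1d2e-...", "mgn_live_xxxxxxxxxxxxxxxx", key)
retry = download_request("8f3c1d2e-...", "mgn_live_xxxxxxxxxxxxxxxx", key)
```

Generating a fresh key inside the retry loop would defeat the purpose, since each attempt would then look like a new request.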

Rate limits

Each key has its own per-minute limit (set when you create the key, default 60). Exceeding it returns 429. Set a low limit to protect a credit balance from a runaway script.
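
One way a client might stay under the limit and recover from a 429 is sketched below. The doubling backoff schedule is a choice made for this example, not documented behavior, and the text does not say whether the API sends a Retry-After header, so none is assumed. The call argument is a hypothetical callable that performs one request and returns (status, body).

```python
import time
from typing import Callable, Tuple


def min_interval(per_minute_limit: int) -> float:
    """Seconds to wait between requests to stay at or under the key's limit."""
    return 60.0 / per_minute_limit


def call_with_backoff(
    call: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Retry `call` while it reports HTTP 429, doubling the delay each time."""
    status, body = call()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = call()
    return status, body
```

The sleep parameter is injectable so the pacing logic can be exercised in tests without waiting.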

Benchmarks

The data is organized into benchmarks. A benchmark is a versioned dataset (for example synthetic-face-v1) with its own set of queryable dimensions, the same way different models expose different parameters. You choose a benchmark, discover its dimensions, then query items within it. Every request that touches data takes a benchmark parameter.

List the benchmarks your key can query with /api/v1/data/benchmarks, then pass ?benchmark=<id> to /catalog (to see its dimensions and values) and /items (to select images). The dimensions differ per benchmark; /catalog is always the source of truth for what a given benchmark supports.

NOTE

Choosing a benchmark. If your key sees exactly one benchmark you may omit the benchmark parameter and it is used by default. Once your key can see more than one, the parameter is required and omitting it returns a 400 listing the available ids. New benchmarks are added without changing this contract, so integrations built against one benchmark keep working.
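
A client can mirror this default rule locally to fail early instead of waiting for a 400. The function below is a hypothetical helper written for illustration; it takes the ids returned by /benchmarks and an optional requested id.

```python
from typing import List, Optional


def resolve_benchmark(visible: List[str], requested: Optional[str] = None) -> str:
    """Client-side mirror of the default-benchmark rule described above."""
    if requested is not None:
        if requested not in visible:
            raise ValueError(f"unknown benchmark {requested!r}; available: {visible}")
        return requested
    if len(visible) == 1:
        return visible[0]  # a single visible benchmark is used by default
    raise ValueError(f"benchmark is required; available: {visible}")
```

Passing the benchmark explicitly in every call remains the safer habit, since a key that sees one benchmark today may see more later.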

Available benchmarks

For full dataset specs (composition, image spec, labeling) see the Synthetic Face Image benchmark page.

Synthetic Face Image (Available)

Real (genuine, unmodified) vs AI-generated face crops across demographic cells (each cell is one skin-tone x gender combination) and generator models, each re-encoded through platform pipelines and image perturbations. Real and generated images are linked by lineage (a real image plus everything derived from it).

Product: Faces · Tiers: test (free sample), live (full corpus, credit-metered)

Dimensions: skin_tone, gender, kind, generator, perturbation, layer, base_id, source_real_id

Face-swap, puppeteering, and liveness (In development)

Additional benchmarks for face-swap video, image-to-video puppeteering, and active-liveness presentation attacks are in development. Each will expose its own dimensions and appear here when released; no code change is needed to query a new benchmark.

Product: Swaps / Puppets / Liveness · Tiers: not yet available

Selecting images

Pulling data is three steps: discover a benchmark's dimensions with /catalog, select the items you want with /items, then fetch each with /download. Selection happens entirely in the /items query string, so a request fully describes the set you are pulling. The recipes below are copy-ready against synthetic-face-v1.

Two rules cover every query: a comma-separated value matches any of the listed values (OR within a dimension), and separate parameters must all hold (AND across dimensions). Omit a dimension to include all of its values.

Two terms used throughout: a cell is one skin-tone x gender combination, and a lineage is a sourced real image plus every fake and perturbed variant derived from it, all sharing one source_real_id.
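
The two query rules can be modeled in a few lines. This is a sketch for intuition, not the server's implementation: build_items_query shows how a list value serializes to a comma-separated parameter, and matches is a local model of OR within a dimension and AND across dimensions. Both function names are illustrative.

```python
from typing import Dict, List, Union
from urllib.parse import urlencode

FilterValue = Union[str, List[str]]


def build_items_query(filters: Dict[str, FilterValue]) -> str:
    """Serialize filters: a list becomes one comma-separated value."""
    params = {
        key: ",".join(value) if isinstance(value, list) else value
        for key, value in filters.items()
    }
    return urlencode(params, safe=",")  # leave commas unescaped for readability


def matches(item: Dict[str, str], filters: Dict[str, FilterValue]) -> bool:
    """Local model of the semantics: OR within a dimension, AND across."""
    for key, value in filters.items():
        allowed = value if isinstance(value, list) else [value]
        if item.get(key) not in allowed:
            return False
    return True


query = build_items_query(
    {"benchmark": "synthetic-face-v1", "kind": "fake", "skin_tone": ["dark", "brown"]}
)
```

Here query is the string to append after /api/v1/data/items?, and a dimension left out of filters places no constraint on the item.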

One specific type

The finest-grained selection: a single cell, one condition. Every filter is a single value, so exactly one type of image comes back.

Request

from margen.ergonomics import iter_items  # helper named in the Client SDK section

items = list(iter_items(
    client, benchmark="synthetic-face-v1",
    kind="fake", skin_tone="dark", gender="female", perturbation="clean",
))

Response

JSON

{
  "object": "list",
  "total": 12,
  "has_more": true,
  "benchmark": "synthetic-face-v1",
  "data": [
    { "object": "attack_data_item", "id": "8f3c1d2e-...",
      "kind": "fake", "skin_tone": "dark", "gender": "female",
      "generator": "diffusion-v1", "perturbation": "clean",
      "layer": "clean", "base_id": "b2d4e6f8-...", "source_real_id": "real_0001" }
  ]
}

Several values at once

A list value builds a set in one call: dark or brown, at JPEG q70 or q80, that are fakes. The client sends a list as a comma-separated value; applied_filters echoes exactly what the query understood.

Request

from margen.ergonomics import iter_items  # helper named in the Client SDK section

items = list(iter_items(
    client, benchmark="synthetic-face-v1", kind="fake",
    skin_tone=["dark", "brown"], perturbation=["jpeg_q70", "jpeg_q80"],
))

Response

JSON

{
  "object": "list",
  "total": 64,
  "has_more": false,
  "benchmark": "synthetic-face-v1",
  "applied_filters": {
    "skin_tone": ["dark","brown"], "kind": ["fake"],
    "perturbation": ["jpeg_q70","jpeg_q80"], "gender": null
  },
  "data": [ /* dark+brown x q70+q80 fakes */ ]
}

A matched lineage (real + everything from it)

Pull a sourced real image and every fake and perturbed variant derived from it, all sharing one source_real_id. Use this to build matched real/fake pairs for paired evaluation. To page over whole lineages at once, add lineage="true".

Request

from margen.ergonomics import iter_items  # helper named in the Client SDK section

items = list(iter_items(
    client, benchmark="synthetic-face-v1", source_real_id="real_0001",
))

Response

JSON

{
  "object": "list",
  "benchmark": "synthetic-face-v1",
  "data": [
    { "object": "attack_data_item", "id": "r1a2b3c4-...", "kind": "real",
      "perturbation": "clean", "base_id": "img_r1a2", "source_real_id": "real_0001" },
    { "object": "attack_data_item", "id": "f5d6e7f8-...", "kind": "fake",
      "perturbation": "clean", "generator": "diffusion-v1",
      "base_id": "img_f5d6", "source_real_id": "real_0001" },
    { "object": "attack_data_item", "id": "f9a0b1c2-...", "kind": "fake",
      "perturbation": "fb_pipeline", "generator": "diffusion-v1",
      "base_id": "img_f5d6", "source_real_id": "real_0001" }
  ]
}

Other conditions of the SAME image

Hold an item's base_id and change perturbation to pull another condition of the exact same image. Every perturbation of one image shares a base_id (distinct from source_real_id, which spans the whole real-source family).

Request

from margen.ergonomics import iter_items  # helper named in the Client SDK section

# you have an item; pull every perturbation of that same base image
variants = list(iter_items(
    client, benchmark="synthetic-face-v1", base_id=item.base_id,
))
# or jump straight to one condition of the same image:
one = client.list_items(
    benchmark="synthetic-face-v1", base_id=item.base_id, perturbation="jpeg_q70",
).result.data[0]

Response

JSON

{
  "object": "list",
  "benchmark": "synthetic-face-v1",
  "data": [
    { "object": "attack_data_item", "id": "f7c8d9e0-...", "kind": "fake",
      "perturbation": "jpeg_q70", "generator": "diffusion-v1",
      "base_id": "img_f5d6", "source_real_id": "real_0001" }
  ]
}
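
To make the difference between the two keys concrete, the sketch below groups already-fetched items first by source_real_id (the lineage) and then by base_id (one image and its conditions). The function name is illustrative, and the sample rows reuse the ids from the example responses above, trimmed to the fields that matter here.

```python
from collections import defaultdict
from typing import Dict, List


def group_lineages(items: List[dict]) -> Dict[str, Dict[str, List[str]]]:
    """Map source_real_id -> base_id -> perturbations seen for that base image."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for item in items:
        grouped[item["source_real_id"]][item["base_id"]].append(item["perturbation"])
    return {src: dict(bases) for src, bases in grouped.items()}


items = [
    {"kind": "real", "perturbation": "clean", "base_id": "img_r1a2", "source_real_id": "real_0001"},
    {"kind": "fake", "perturbation": "clean", "base_id": "img_f5d6", "source_real_id": "real_0001"},
    {"kind": "fake", "perturbation": "fb_pipeline", "base_id": "img_f5d6", "source_real_id": "real_0001"},
    {"kind": "fake", "perturbation": "jpeg_q70", "base_id": "img_f5d6", "source_real_id": "real_0001"},
]
lineages = group_lineages(items)
```

One lineage (real_0001) contains two base images: the real photo with a single clean condition, and one generated image seen under three conditions.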

Then fetch, and page

Fetch each item with GET /api/v1/data/download/<id>. It returns a signed URL that delivers one JPEG image and expires after 300 seconds (5 minutes), so fetch it promptly and send no auth header on that request. On the live tier downloads are credit-metered; check your balance first with /api/v1/data/usage to avoid a mid-run 402.

Three ways to page, by size and use:

offset + limit for one-shot small pulls (returns an exact total).
cursor for large or repeated pulls: stable if items are added while you page (pass the response next_cursor back in). This is the only mode a generated SDK auto-pages; offset and lineage modes are paged manually.
lineage=true to page by matched sets rather than rows (each page is whole lineages).
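
Cursor mode can be driven by a small loop like the one below. It is a sketch under assumptions: fetch_page is a hypothetical callable that performs one GET on /items with the given cursor and returns the decoded envelope, and the first page is assumed to be requested with no cursor value, which the text does not spell out.

```python
from typing import Callable, Iterator, Optional


def iter_cursor_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item by following next_cursor until has_more is false."""
    cursor: Optional[str] = None  # assumption: the first request carries no cursor
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        cursor = page.get("next_cursor")
        if not page.get("has_more") or cursor is None:
            break


# Stand-in for the network: two canned pages keyed by the cursor that requests them.
fake_pages = {
    None: {"data": [{"id": "a"}, {"id": "b"}], "has_more": True, "next_cursor": "c1"},
    "c1": {"data": [{"id": "c"}], "has_more": False, "next_cursor": None},
}
ids = [item["id"] for item in iter_cursor_pages(lambda c: fake_pages[c])]
```

Stopping on either has_more being false or a missing next_cursor keeps the loop from spinning if the envelope is incomplete.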

Endpoints

GET /api/v1/data/benchmarks

The benchmarks your key can query, each with its id, product, title, and the dimension parameters it exposes. Use a benchmark id as the benchmark parameter on the other endpoints.

benchmarks = client.list_benchmarks().data

GET /api/v1/data/catalog?benchmark=<id>

The filter dimensions a benchmark exposes, each with its allowed values (labeled where the raw value is opaque, e.g. conditions and layers), plus the total item count for your tier. The filters block maps each /api/v1/data/items query parameter to the values allowed for your key, so you can build a valid query without memorizing slugs. Omit benchmark only if your key sees a single benchmark.

catalog = client.get_catalog(benchmark="synthetic-face-v1")

GET /api/v1/data/items

A filtered list of items for a benchmark (ids and attributes, no storage paths). The filterable dimensions are defined by the benchmark, so you select images with exactly the discrimination it supports. The table below is the synthetic-face-v1 benchmark; call /api/v1/data/catalog?benchmark=<id> for any benchmark's parameters and values. Every parameter is optional; omit a parameter to include all values for that dimension, and each accepts a comma-separated list matching any of the given values (OR within the dimension), e.g. skin_tone=dark,brown or perturbation=jpeg_q70,jpeg_q80. Unknown values for a fixed dimension return 400 with the allowed set.

Parameter	Meaning	Allowed values
`benchmark`	Which benchmark to query (see /api/v1/data/benchmarks). Omit only if your key sees one benchmark	e.g. synthetic-face-v1
`skin_tone`	Skin-tone band on a 6-level light-to-dark scale	very_light, light, intermediate, tan, brown, dark
`gender`	Perceived gender of the face	female, male
`kind`	Real (a genuine, unmodified photo) or fake (AI-generated)	real, fake
`generator`	Model that produced the image (fake only; null for real)	see /catalog generators
`perturbation`	Image condition applied after generation. Alias: condition. jpeg_q* = JPEG at that quality; blur_/noise_/resize_* = that transform; *_pipeline = a re-encode through that platform's upload pipeline (fb=Facebook, ig=Instagram, tt=TikTok, x=X)	clean, jpeg_q30/50/70/80/95, blur_1/2/4, noise_5/10, resize_0.5/0.75, fb_pipeline, ig_pipeline, tt_pipeline, x_pipeline
`layer`	Coarse grouping of conditions: clean = no perturbation; layer1 = one lossy transform (jpeg/blur/noise/resize); layer2 = a platform pipeline; layer2_recropped = a platform pipeline then re-detected and re-cropped to the face	clean, layer1, layer2, layer2_recropped
`base_id`	Pull every perturbation of ONE base image. Take an item's base_id, add perturbation=... to fetch a specific condition of the same image	any item's base_id
`source_real_id`	Pull the full lineage descended from one sourced real image (the real, its fakes, and their perturbed variants)	any item's source_real_id
`limit`	Page size (values above 500 are clamped; response sets limit_clamped:true)	1-500 (default 100)
`offset`	Pagination offset (order: created_at ascending)	>=0
`cursor`	Stable keyset pagination over a growing table (use instead of offset). Pass the response next_cursor to get the next page	opaque string from next_cursor
`lineage`	Page over whole lineages: filters select which lineages match, and every row of each matched lineage is returned (limit/offset count lineages, not rows)	true
`exclude_owned`	Offset mode only. Omit items you already own (credits are used per unique image). Response adds remaining/owned/total_matching and subset_exhausted with a message when you own the whole matching subset	true

NOTE

Pulling everything from a selection is not an error. With exclude_owned=true, once you own every item matching a filter, /items returns a normal 200 with data: [], remaining: 0, and subset_exhausted: true plus a message. Check remaining (or subset_exhausted), not the status code: it means there is nothing new to pull for that selection. Broaden the filter (another cell, generator, or perturbation) to get more. Likewise, downloading an item you already own is not an error: it returns the URL for free with charged: false and already_owned: true.

# fake, dark or brown cell, JPEG q70 or q80, first 2
page = client.list_items(
    benchmark="synthetic-face-v1",
    kind="fake",
    skin_tone=["dark", "brown"],
    perturbation=["jpeg_q70", "jpeg_q80"],
    limit=2,
).result
for item in page.data:
    print(item.id, item.skin_tone, item.perturbation)
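
For the exclude_owned case described in the note above, a pull loop needs a stop condition that reads the body rather than the status. The check below is a minimal sketch that works on the decoded JSON envelope as a plain dict; the function name is illustrative.

```python
def nothing_new_to_pull(page: dict) -> bool:
    """True when an exclude_owned=true response says the subset is fully owned."""
    return bool(page.get("subset_exhausted")) or page.get("remaining") == 0


exhausted = {"object": "list", "data": [], "remaining": 0, "subset_exhausted": True}
still_open = {"object": "list", "data": [{"id": "8f3c1d2e-..."}], "remaining": 11}
```

A response without either field, for example one made without exclude_owned, is treated as not exhausted.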

NOTE

To assemble matched real/fake sets, pull a lineage with source_real_id: it returns the sourced real image plus every fake and perturbed variant derived from it, all sharing that id.

GET /api/v1/data/download/:itemId

Returns a short-lived signed URL for one item. For live keys a successful call debits one credit; under the sign-then-debit flow described in Credits and pricing, the debit lands only once the URL is returned. Sending an Idempotency-Key header is optional but recommended: it de-duplicates retries so a repeated request returns the original result without a second charge. Without it, a retried download is not de-duplicated and could be charged again. Fetch the returned url directly (no auth header on that request).

import urllib.request
dl = client.download_item(item_id="8f3c1d2e-...")   # Idempotency-Key set for you
# dl.url is a short-lived signed URL; fetch it with no auth header
urllib.request.urlretrieve(dl.url, "image.jpg")

GET /api/v1/data/usage

Your current credit balance and tier. Check before a large pull to avoid a mid-run 402.

usage = client.get_usage()   # usage.tier, usage.balance
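
A pre-flight check along these lines can guard a bulk pull. It is a hypothetical helper, not part of the SDK: it takes the tier and balance reported by /usage and the number of images you do not yet own, and applies the one-credit-per-new-image rule.

```python
from typing import Optional


def can_afford(tier: str, balance: Optional[float], new_items: int) -> bool:
    """One credit per newly pulled image on live; test pulls are free."""
    if tier == "test":
        return True
    return balance is not None and balance >= new_items
```

Counting only unowned items matters because re-downloads of owned items are free, so a selection you partly own costs less than its size.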

Objects

The resources returned by the API. Every object carries an object discriminator. The paginated /items list wraps results in the standard envelope { object: "list", data: [...], has_more, next_cursor, total }. /benchmarks is a simple list ({ object: "list", data: [...] }) and is not paginated, so it carries no has_more or next_cursor. Fields shared across benchmarks are typed; benchmark-specific fields are carried in attributes.

The /items list also echoes the query back at the top level: benchmark, mode (offset | cursor | lineage), applied_filters, and limit / offset / limit_clamped. In cursor and lineage modes total is null (the full set is not counted); lineage mode adds total_lineages and lineages (the count on the current page). For future benchmarks, an attributes-backed dimension appears in /catalog with source: attribute and is queried by its key like any other dimension; for synthetic-face-v1 there are none, so attributes is always {}.

The benchmark object

Returned by /api/v1/data/benchmarks and /api/v1/data/catalog. Describes a benchmark and the dimensions it exposes.

Field	Type	Description
`object`	string	Always "benchmark".
`id`	string	Versioned benchmark id, used as the benchmark parameter (e.g. synthetic-face-v1).
`product`	string	Portfolio grouping (faces, swaps, puppets, liveness).
`title`	string	Human-readable name.
`description`	string	What the benchmark contains.
`dimensions`	array	The queryable dimensions. Each has key (the query param), label, source (column | attribute), and either values [{value,label}] for a fixed set or lineage:true for a lineage key.

The item object

Returned by /api/v1/data/items and /api/v1/data/download/:itemId. One deliverable image. Fields that do not apply to a benchmark are null.

Field	Type	Description
`object`	string	Always "attack_data_item".
`id`	string	Item id. Pass to /api/v1/data/download/:itemId to fetch the image.
`benchmark`	string	The benchmark this item belongs to.
`kind`	string	real (a genuine, unmodified photo) or fake (AI-generated).
`skin_tone`	string | null	Skin-tone band (6-level light-to-dark scale).
`gender`	string | null	Demographic cell gender.
`generator`	string | null	Generator model (fake items only).
`perturbation`	string | null	Condition applied (e.g. clean, jpeg_q70, fb_pipeline).
`layer`	string | null	Perturbation layer (clean, layer1, layer2, layer2_recropped).
`base_id`	string | null	The base image this variant derives from. Hold base_id and change perturbation to pull another condition of the SAME image; all perturbations of one image share it.
`source_real_id`	string | null	Lineage key: the sourced real image this item descends from. All variants of one source share it.
`attributes`	object	Benchmark-specific fields as key/value pairs; empty {} when the benchmark has none.

The download object

Returned by /api/v1/data/download/:itemId. Carries the short-lived signed URL plus the item and updated balance.

Field	Type	Description
`object`	string	Always "download".
`url`	string	Short-lived signed URL that delivers one JPEG image. Fetch it directly with no auth header.
`expires_in`	number	Seconds until the signed URL expires (e.g. 300).
`item`	object	The item object for the downloaded image.
`balance`	number | null	Credit balance after this download (live tier). null on the test tier, which is free.
`charged`	boolean	true if this pull debited a credit. false for free test items and for re-downloads of an item you already own.
`already_owned`	boolean	true if you had already pulled this item; the URL is returned again for free, no debit.

Errors

Every error body carries a stable machine-readable code alongside the human-readable error message. Branch on code, not on the message text or the HTTP status alone (one status can map to more than one code).

Two things are deliberately not errors: owning every item in a selection (a 200 with subset_exhausted: true on /items?exclude_owned=true) and re-downloading an item you already own (a 200 with charged: false). Neither returns an error code.

Status	Code	Meaning
400	invalid_param	An unknown value for a fixed dimension; the response gives param + allowed.
400	invalid_cursor	The cursor passed for keyset paging is malformed or expired.
400	ambiguous_benchmark	Benchmark omitted while the key sees more than one; the response lists available.
401	unauthorized	Missing, invalid, or revoked API key.
402	insufficient_credits	Out of credits (live tier). Top up in the portal.
403	forbidden_tier	Key not permitted for this item (e.g. a test key requesting a live-tier item).
403	forbidden_scope	The item is outside this key's content scope (a scoped/siloed key requested content it may not pull).
404	not_found	Item not found (or not visible to this key).
404	unknown_benchmark	The requested benchmark id does not exist for this key; the response lists available.
429	rate_limited	Per-key rate limit exceeded.
500	server_error	Unexpected server error.
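
Branching on code might look like the sketch below. The three outcomes and the grouping of codes into them are a policy chosen for this example (for instance, treating server_error as retryable), not something the API prescribes; only the code strings come from the table above.

```python
# Assumed policy: which documented codes are worth retrying or need the user.
RETRYABLE = {"rate_limited", "server_error"}
NEEDS_USER_ACTION = {
    "insufficient_credits", "unauthorized", "forbidden_tier", "forbidden_scope",
}


def classify_error(body: dict) -> str:
    """Decide what to do from the error body's `code`, not the HTTP status."""
    code = body.get("code")
    if code in RETRYABLE:
        return "retry"
    if code in NEEDS_USER_ACTION:
        return "stop"
    return "fix_request"  # invalid_param, invalid_cursor, not_found, and so on
```

Two 403 responses (forbidden_tier and forbidden_scope) or two 404 responses (not_found and unknown_benchmark) share a status but call for different fixes, which is why the lookup keys on code.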

Quickstart

Create a key at /keys, then pull with the SDK (pip install margen).

import urllib.request
from margen import Margen

client = Margen(bearer_auth="mgn_test_...")   # your key from /keys

# one dark female fake, clean, from synthetic-face-v1
item = client.list_items(
    benchmark="synthetic-face-v1",
    kind="fake", skin_tone="dark", gender="female",
    perturbation="clean", limit=1,
).result.data[0]

# download it (debits 1 credit on the live tier; free on test)
dl = client.download_item(item_id=item.id)
urllib.request.urlretrieve(dl.url, "image.jpg")   # signed URL, no auth header
print("saved image.jpg, balance:", dl.balance)