API reference
Attack-data is labeled real and AI-generated media built to stress-test deepfake detectors. Pull it here, organized into benchmarks that you select and filter by type. Access is authenticated with an API key and metered in credits; test keys pull a free, fixed sample. For the why behind attack-data, see the attack-data overview.
Getting started
The base URL is https://www.margensoftware.com and all endpoints live under /api/v1/data. Responses are JSON. Image bytes are never returned inline: a download returns a short-lived signed URL you fetch directly. The unversioned /api/data prefix still works as a permanent alias, so older integrations keep running.
You need a Margen account to get a key. Sign up at /eval/signup or log in at /eval/login, then manage keys at /keys.
    https://www.margensoftware.com/api/v1/data
Client SDK
The official Python SDK is on PyPI. Install it and construct a client with your key. The typed operations are list_benchmarks, get_catalog, list_items, download_item, and get_usage; a list value for any filter is sent as a comma-separated list. list_items returns a paginated wrapper you read through .result (so .result.data), while list_benchmarks, get_catalog, get_usage, and download_item return the object directly. For notebook workflows, margen.ergonomics adds iter_items / iter_lineages (paginate the whole result set) and download_selection (one-call bulk download to a folder). Other languages can call the REST endpoints directly (see the curl tab on each endpoint below).
    pip install margen
Authentication
Generate a key in the portal at /keys. The raw key is shown once; store it. Send it as a bearer token on every request.
    Authorization: Bearer mgn_live_xxxxxxxxxxxxxxxx
- Test keys (mgn_test_...): free, no credits needed, open to any account. A test key sees only the test sample, a fixed subset, so its catalog counts and available items differ from a live key.
- Live keys (mgn_live_...): the full corpus, credit-metered. Open to any account, no approval. Buying credits is the access: a live pull debits one credit per image. Enterprise or volume terms are arranged separately via a volume order.
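For callers that use the REST endpoints directly rather than the SDK, the bearer scheme above can be sketched with the standard library. This is a minimal sketch, not the SDK's implementation: `build_request` and `key_tier` are hypothetical helper names, and the request is only built, never sent.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://www.margensoftware.com/api/v1/data"


def build_request(path, api_key, params=None):
    """Build (but do not send) an authenticated GET request. Hypothetical helper."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, method="GET")
    # The key travels as a bearer token on every API request.
    request.add_header("Authorization", f"Bearer {api_key}")
    return request


def key_tier(api_key):
    """Infer the tier from the documented key prefixes. Hypothetical helper."""
    if api_key.startswith("mgn_test_"):
        return "test"
    if api_key.startswith("mgn_live_"):
        return "live"
    raise ValueError("unrecognized key prefix")


req = build_request("benchmarks", "mgn_test_example")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing the request to `urllib.request.urlopen` would send it; the signed download URL is the one request that should carry no Authorization header.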
Credits and pricing
- One credit pulls one image. Credits are prepaid and debited in real time at download. Buy credits in the portal at /keys (pick any amount; a flat per-image rate is shown at purchase).
- Credit usage is per unique image, per account. Your account is charged once per item id, and credits and ownership are account-scoped, so they survive a key rotation. Re-downloading an item your account already owns is free: the response carries charged: false and already_owned: true. So pulling the same selection twice never double-charges.
- To pull only images you do not already own, pass exclude_owned=true to /items; it also reports remaining / owned and a subset_exhausted message when you own the whole matching subset.
- A download is charged only on success; a retry with the same Idempotency-Key returns the original result and is never charged twice.
- With a zero balance a live-tier download returns 402 (code insufficient_credits) and delivers nothing. Top up in the portal.
- Test downloads are free and never touch your balance.
Rate limits
Each key has its own per-minute limit (set when you create the key, default 60). Exceeding it returns 429. Set a low limit to protect a credit balance from a runaway script.
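A client-side throttle is one way to stay under the per-minute limit instead of reacting to 429s. The sketch below spaces calls evenly; `MinuteThrottle` is a hypothetical helper, and how the server actually windows its limit is not documented here, so treat the spacing as a conservative assumption.

```python
import time


class MinuteThrottle:
    """Space calls at least 60 / per_minute seconds apart (hypothetical helper)."""

    def __init__(self, per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock      # injectable so the throttle can be tested without waiting
        self.sleep = sleep
        self.next_allowed = None

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Call `throttle.wait()` immediately before each API request. Matching `per_minute` to the limit set on the key keeps the average request rate at or below it.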
Benchmarks
The data is organized into benchmarks. A benchmark is a versioned dataset (for example synthetic-face-v1) with its own set of queryable dimensions, the same way different models expose different parameters. You choose a benchmark, discover its dimensions, then query items within it. Every request that touches data takes a benchmark parameter.
List the benchmarks your key can query with /api/v1/data/benchmarks, then pass ?benchmark=<id> to /catalog (to see its dimensions and values) and /items (to select images). The dimensions differ per benchmark; /catalog is always the source of truth for what a given benchmark supports.
If your key can see only one benchmark, you can omit the benchmark parameter and it is used by default. Once your key can see more than one, the parameter is required and omitting it returns a 400 listing the available ids. New benchmarks are added without changing this contract, so integrations built against one benchmark keep working.
Available benchmarks
For full dataset specs (composition, image spec, labeling) see the Synthetic Face Image benchmark page.
Real (genuine, unmodified) vs AI-generated face crops across demographic cells (each cell is one skin-tone x gender combination) and generator models, each re-encoded through platform pipelines and image perturbations. Real and generated images are linked by lineage (a real image plus everything derived from it).
Product: Faces · Tiers: test (free sample), live (full corpus, credit-metered)
Dimensions: skin_tone, gender, kind, generator, perturbation, layer, base_id, source_real_id
Additional benchmarks for face-swap video, image-to-video puppeteering, and active-liveness presentation attacks are in development. Each will expose its own dimensions and appear here when released; no code change is needed to query a new benchmark.
Product: Swaps / Puppets / Liveness · Tiers: not yet available
Selecting images
Pulling data is three steps: discover a benchmark's dimensions with /catalog, select the items you want with /items, then fetch each with /download. Selection happens entirely in the /items query string, so a request fully describes the set you are pulling. The recipes below are copy-ready against synthetic-face-v1.
Two rules cover every query: a comma-separated value matches any of the listed values (OR within a dimension), and separate parameters must all hold (AND across dimensions). Omit a dimension to include all of its values.
Two terms used throughout: a cell is one skin-tone x gender combination, and a lineage is a sourced real image plus every fake and perturbed variant derived from it, all sharing one source_real_id.
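The two rules can be sketched as a query builder plus a local matcher. Both function names are hypothetical; the matcher only illustrates the OR-within, AND-across semantics and is not how the server evaluates a query.

```python
from urllib.parse import urlencode


def build_items_query(benchmark, **filters):
    """Build the /items query string; a list becomes a comma-separated value."""
    params = {"benchmark": benchmark}
    for name, value in filters.items():
        if value is None:
            continue  # omitted dimension: all of its values are included
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        params[name] = value
    # safe="," keeps the commas readable instead of percent-encoding them
    return urlencode(params, safe=",")


def matches(item, **filters):
    """True when the item satisfies every filter (AND), any listed value (OR)."""
    for name, wanted in filters.items():
        if wanted is None:
            continue
        allowed = wanted if isinstance(wanted, (list, tuple)) else [wanted]
        if item.get(name) not in allowed:
            return False
    return True


query = build_items_query(
    "synthetic-face-v1", kind="fake", skin_tone=["dark", "brown"], gender=None
)
print(query)
```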
One specific type
The finest-grained selection: a single cell, one condition. Every filter is a single value, so exactly one type of image comes back.
Request
    from margen.ergonomics import iter_items

    # client is a Margen client constructed with your key
    items = list(iter_items(
        client, benchmark="synthetic-face-v1",
        kind="fake", skin_tone="dark", gender="female", perturbation="clean",
    ))
Response
    {
      "object": "list",
      "total": 12,
      "has_more": true,
      "benchmark": "synthetic-face-v1",
      "data": [
        { "object": "attack_data_item", "id": "8f3c1d2e-...",
          "kind": "fake", "skin_tone": "dark", "gender": "female",
          "generator": "diffusion-v1", "perturbation": "clean",
          "layer": "clean", "base_id": "b2d4e6f8-...", "source_real_id": "real_0001" }
      ]
    }
Several values at once
A list value builds a set in one call: dark or brown, at JPEG q70 or q80, that are fakes. The client sends a list as a comma-separated value; applied_filters echoes exactly what the query understood.
Request
    from margen.ergonomics import iter_items

    items = list(iter_items(
        client, benchmark="synthetic-face-v1", kind="fake",
        skin_tone=["dark", "brown"], perturbation=["jpeg_q70", "jpeg_q80"],
    ))
Response
    {
      "object": "list",
      "total": 64,
      "has_more": false,
      "benchmark": "synthetic-face-v1",
      "applied_filters": {
        "skin_tone": ["dark","brown"], "kind": ["fake"],
        "perturbation": ["jpeg_q70","jpeg_q80"], "gender": null
      },
      "data": [ /* dark+brown x q70+q80 fakes */ ]
    }
A matched lineage (real + everything from it)
Pull a sourced real image and every fake and perturbed variant derived from it, all sharing one source_real_id. Use this to build matched real/fake pairs for paired evaluation. To page over whole lineages at once, add lineage="true".
Request
    from margen.ergonomics import iter_items

    items = list(iter_items(
        client, benchmark="synthetic-face-v1", source_real_id="real_0001",
    ))
Response
    {
      "object": "list",
      "benchmark": "synthetic-face-v1",
      "data": [
        { "object": "attack_data_item", "id": "r1a2b3c4-...", "kind": "real",
          "perturbation": "clean", "base_id": "img_r1a2", "source_real_id": "real_0001" },
        { "object": "attack_data_item", "id": "f5d6e7f8-...", "kind": "fake",
          "perturbation": "clean", "generator": "diffusion-v1",
          "base_id": "img_f5d6", "source_real_id": "real_0001" },
        { "object": "attack_data_item", "id": "f9a0b1c2-...", "kind": "fake",
          "perturbation": "fb_pipeline", "generator": "diffusion-v1",
          "base_id": "img_f5d6", "source_real_id": "real_0001" }
      ]
    }
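Once a lineage is pulled, building matched pairs is a grouping step on source_real_id. The sketch below works on plain dicts shaped like the items above; `real_fake_pairs` is a hypothetical helper and the sample ids are made up for illustration.

```python
from collections import defaultdict


def real_fake_pairs(items):
    """Pair each real item with every fake that shares its source_real_id."""
    lineages = defaultdict(lambda: {"real": [], "fake": []})
    for item in items:
        lineages[item["source_real_id"]][item["kind"]].append(item["id"])

    pairs = []
    for source_id, group in lineages.items():
        for real_id in group["real"]:
            for fake_id in group["fake"]:
                pairs.append((source_id, real_id, fake_id))
    return pairs


# Illustrative items only; real ids come from /items.
sample = [
    {"id": "r1", "kind": "real", "perturbation": "clean", "source_real_id": "real_0001"},
    {"id": "f1", "kind": "fake", "perturbation": "clean", "source_real_id": "real_0001"},
    {"id": "f2", "kind": "fake", "perturbation": "fb_pipeline", "source_real_id": "real_0001"},
    {"id": "f3", "kind": "fake", "perturbation": "clean", "source_real_id": "real_0002"},
]
pairs = real_fake_pairs(sample)
print(pairs)
```

A lineage with no real item in the selection (real_0002 here) yields no pairs, which is one reason to filter on source_real_id alone when pairing.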
Other conditions of the SAME image
Hold an item's base_id and change perturbation to pull another condition of the exact same image. Every perturbation of one image shares a base_id (distinct from source_real_id, which spans the whole real-source family).
Request
    from margen.ergonomics import iter_items

    # you have an item; pull every perturbation of that same base image
    variants = list(iter_items(
        client, benchmark="synthetic-face-v1", base_id=item.base_id,
    ))
    # or jump straight to one condition of the same image:
    one = client.list_items(
        benchmark="synthetic-face-v1", base_id=item.base_id, perturbation="jpeg_q70",
    ).result.data[0]
Response
    {
      "object": "list",
      "benchmark": "synthetic-face-v1",
      "data": [
        { "object": "attack_data_item", "id": "f7c8d9e0-...", "kind": "fake",
          "perturbation": "jpeg_q70", "generator": "diffusion-v1",
          "base_id": "img_f5d6", "source_real_id": "real_0001" }
      ]
    }
Then fetch, and page
Fetch each item with GET /api/v1/data/download/<id>. It returns a signed URL that delivers one JPEG image and expires after 300 seconds (5 minutes), so fetch it promptly and send no auth header on that request. On the live tier downloads are credit-metered; check your balance first with /api/v1/data/usage to avoid a mid-run 402.
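Because the signed URL expires, a batch job that queues URLs before fetching them may want to check freshness first. This is a minimal sketch: `seconds_left` and `is_usable` are hypothetical helpers, `issued_at` is a timestamp you record yourself when the download response arrives, and the 10-second safety margin is an arbitrary assumption.

```python
import time


def seconds_left(issued_at, expires_in, now=None, safety_margin=10.0):
    """Seconds of usable lifetime remaining, minus a safety margin."""
    now = time.monotonic() if now is None else now
    return (issued_at + expires_in) - now - safety_margin


def is_usable(issued_at, expires_in, now=None, safety_margin=10.0):
    """True while the signed URL should still be safe to fetch."""
    return seconds_left(issued_at, expires_in, now, safety_margin) > 0


# A URL issued at t=0 with expires_in=300, checked at t=100 and t=295.
print(is_usable(0.0, 300, now=100.0))
print(is_usable(0.0, 300, now=295.0))
```

If a URL has gone stale, requesting the download again returns a fresh one; for an item your account already owns that repeat is free.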
Three ways to page, by size and use:
- offset + limit for one-shot small pulls (returns an exact total).
- cursor for large or repeated pulls: stable if items are added while you page (pass the response next_cursor back in). This is the only mode a generated SDK auto-pages; offset and lineage modes are paged manually.
- lineage=true to page by matched sets rather than rows (each page is whole lineages).
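For callers paging over REST by hand, cursor mode reduces to a loop that feeds next_cursor back in until has_more is false. In this sketch `fetch_page` is a hypothetical callable you supply; it performs the /items request and returns the parsed envelope. How the first cursor-mode request is issued is left to that callable, since it is not spelled out here.

```python
def iter_cursor_pages(fetch_page, limit=100):
    """Yield every item across cursor pages.

    fetch_page(cursor=..., limit=...) is a caller-supplied function (hypothetical)
    that returns a parsed /items envelope as a dict.
    """
    cursor = None
    while True:
        page = fetch_page(cursor=cursor, limit=limit)
        yield from page.get("data", [])
        cursor = page.get("next_cursor")
        # Stop when the server reports no more pages or hands back no cursor.
        if not page.get("has_more") or not cursor:
            break
```

Passing a stub `fetch_page` that serves canned envelopes is an easy way to exercise the loop without spending credits or rate limit.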
Endpoints
GET /api/v1/data/benchmarks
The benchmarks your key can query, each with its id, product, title, and the dimension parameters it exposes. Use a benchmark id as the benchmark parameter on the other endpoints.
    benchmarks = client.list_benchmarks().data
GET /api/v1/data/catalog?benchmark=<id>
The filter dimensions a benchmark exposes, each with its allowed values (labeled where the raw value is opaque, e.g. conditions and layers), plus the total item count for your tier. The filters block maps each /api/v1/data/items query parameter to the values allowed for your key, so you can build a valid query without memorizing slugs. Omit benchmark only if your key sees a single benchmark.
    catalog = client.get_catalog(benchmark="synthetic-face-v1")
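The filters block can be used to validate a query locally before sending it. The sketch below assumes the block has been flattened to a plain mapping of parameter name to a list of allowed values; the exact response shape is an assumption, and `invalid_values` is a hypothetical helper.

```python
def invalid_values(catalog_filters, query):
    """Return {param: [bad values]} for values the catalog does not allow.

    catalog_filters: assumed shape {param: [allowed, ...]}.
    Params absent from catalog_filters (e.g. free-form ids) are not checked.
    """
    problems = {}
    for param, wanted in query.items():
        allowed = catalog_filters.get(param)
        if allowed is None:
            continue
        values = wanted if isinstance(wanted, (list, tuple)) else [wanted]
        bad = [value for value in values if value not in allowed]
        if bad:
            problems[param] = bad
    return problems


# Allowed values copied from the synthetic-face-v1 parameter table.
filters = {
    "skin_tone": ["very_light", "light", "intermediate", "tan", "brown", "dark"],
    "kind": ["real", "fake"],
}
print(invalid_values(filters, {"skin_tone": ["dark", "olive"], "kind": "fake"}))
```

An empty result means every fixed-dimension value is one the catalog lists.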
GET /api/v1/data/items
A filtered list of items for a benchmark (ids and attributes, no storage paths). The filterable dimensions are defined by the benchmark, so you select images with exactly the discrimination it supports. The table below is the synthetic-face-v1 benchmark; call /api/v1/data/catalog?benchmark=<id> for any benchmark's parameters and values. Every parameter is optional; omit a parameter to include all values for that dimension, and each accepts a comma-separated list matching any of the given values (OR within the dimension), e.g. skin_tone=dark,brown or perturbation=jpeg_q70,jpeg_q80. Unknown values for a fixed dimension return 400 with the allowed set.
| Parameter | Meaning | Allowed values |
|---|---|---|
| benchmark | Which benchmark to query (see /api/v1/data/benchmarks). Omit only if your key sees one benchmark | e.g. synthetic-face-v1 |
| skin_tone | Skin-tone band on a 6-level light-to-dark scale | very_light, light, intermediate, tan, brown, dark |
| gender | Perceived gender of the face | female, male |
| kind | Real (a genuine, unmodified photo) or fake (AI-generated) | real, fake |
| generator | Model that produced the image (fake only; null for real) | see /catalog generators |
| perturbation | Image condition applied after generation. Alias: condition. jpeg_q* = JPEG at that quality; blur_*/noise_*/resize_* = that transform; *_pipeline = a re-encode through that platform's upload pipeline (fb=Facebook, ig=Instagram, tt=TikTok, x=X) | clean, jpeg_q30/50/70/80/95, blur_1/2/4, noise_5/10, resize_0.5/0.75, fb_pipeline, ig_pipeline, tt_pipeline, x_pipeline |
| layer | Coarse grouping of conditions: clean = no perturbation; layer1 = one lossy transform (jpeg/blur/noise/resize); layer2 = a platform pipeline; layer2_recropped = a platform pipeline then re-detected and re-cropped to the face | clean, layer1, layer2, layer2_recropped |
| base_id | Pull every perturbation of ONE base image. Take an item's base_id, add perturbation=... to fetch a specific condition of the same image | any item's base_id |
| source_real_id | Pull the full lineage descended from one sourced real image (the real, its fakes, and their perturbed variants) | any item's source_real_id |
| limit | Page size (values above 500 are clamped; response sets limit_clamped:true) | 1-500 (default 100) |
| offset | Pagination offset (order: created_at ascending) | >=0 |
| cursor | Stable keyset pagination over a growing table (use instead of offset). Pass the response next_cursor to get the next page | opaque string from next_cursor |
| lineage | Page over whole lineages: filters select which lineages match, and every row of each matched lineage is returned (limit/offset count lineages, not rows) | true |
| exclude_owned | Offset mode only. Omit items you already own (credits are used per unique image). Response adds remaining/owned/total_matching and subset_exhausted with a message when you own the whole matching subset | true |
With exclude_owned=true, once you own every item matching a filter, /items returns a normal 200 with data: [], remaining: 0, and subset_exhausted: true plus a message. Check remaining (or subset_exhausted), not a status code: it means there is nothing new to pull for that selection. Broaden the filter (another cell, generator, or perturbation) to get more. Likewise, downloading an item you already own is not an error: it returns the URL for free with charged: false, already_owned: true.
    # fake, dark or brown cell, JPEG q70 or q80, first 2
    page = client.list_items(
        benchmark="synthetic-face-v1",
        kind="fake",
        skin_tone=["dark", "brown"],
        perturbation=["jpeg_q70", "jpeg_q80"],
        limit=2,
    ).result
    for item in page.data:
        print(item.id, item.skin_tone, item.perturbation)
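Since an exhausted selection arrives as a 200, a pull loop has to branch on the envelope fields rather than the status. A minimal sketch over a parsed /items response as a dict; `next_action` and its return labels are hypothetical.

```python
def next_action(envelope):
    """Decide what a pull loop should do with an exclude_owned=true response."""
    if envelope.get("subset_exhausted") or envelope.get("remaining") == 0:
        return "broaden_filter"   # you already own everything that matches
    if envelope.get("data"):
        return "download_page"    # there are unowned items on this page
    return "stop"                 # empty page without an exhaustion signal


exhausted = {"data": [], "remaining": 0, "subset_exhausted": True}
print(next_action(exhausted))
```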
To pull a matched lineage, filter on source_real_id: it returns the sourced real image plus every fake and perturbed variant derived from it, all sharing that id.
GET /api/v1/data/download/:itemId
Returns a short-lived signed URL for one item. For live keys this debits one credit before the URL is returned. Sending an Idempotency-Key header is optional but recommended: it de-duplicates retries so a repeated request returns the original result without a second charge. Omit it and a retried download is not de-duplicated, so it could be charged again. Fetch the returned url directly (no auth header on that request).
    import urllib.request

    dl = client.download_item(item_id="8f3c1d2e-...")  # Idempotency-Key set for you
    # dl.url is a short-lived signed URL; fetch it with no auth header
    urllib.request.urlretrieve(dl.url, "image.jpg")
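The SDK sets the header for you; a raw REST caller has to generate one key per logical download and reuse it on every retry of that download. The sketch below shows that pattern with a caller-supplied `send_download` function (hypothetical) that performs the request with the given Idempotency-Key and returns the parsed body. Retrying only on `ConnectionError` is an assumption made for the example.

```python
import uuid


def download_with_retry(send_download, item_id, attempts=3):
    """Retry one download while reusing a single Idempotency-Key.

    send_download(item_id, idempotency_key) is supplied by the caller (hypothetical).
    """
    key = str(uuid.uuid4())  # one key for the whole logical download
    last_error = None
    for _ in range(attempts):
        try:
            return send_download(item_id, key)
        except ConnectionError as exc:  # assumption: only transport failures are retried
            last_error = exc
    raise last_error
```

Generating the key outside the loop is the point of the design: a fresh key per attempt would defeat the de-duplication.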
GET /api/v1/data/usage
Your current credit balance and tier. Check before a large pull to avoid a mid-run 402.
    usage = client.get_usage()  # usage.tier, usage.balance
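A pre-flight check can compare the balance with the number of credits a selection will actually cost, given that credits are charged once per unique item and owned items are free. A minimal sketch: both helpers are hypothetical, and the tier strings "test" and "live" are assumed to be what usage.tier reports.

```python
def credits_needed(item_ids, owned_ids=()):
    """Unique items not already owned; each costs one credit on the live tier."""
    return len(set(item_ids) - set(owned_ids))


def can_afford(tier, balance, item_ids, owned_ids=()):
    """True when the planned pull should not hit a mid-run 402."""
    if tier == "test":
        return True  # test downloads are free and never touch the balance
    return balance is not None and balance >= credits_needed(item_ids, owned_ids)


print(credits_needed(["a", "b", "a", "c"], owned_ids=["c"]))  # 2
```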
Objects
The resources returned by the API. Every object carries an object discriminator. The paginated /items list wraps results in the standard envelope { object: "list", data: [...], has_more, next_cursor, total }. /benchmarks is a simple list ({ object: "list", data: [...] }) and is not paginated, so it carries no has_more or next_cursor. Fields shared across benchmarks are typed; benchmark-specific fields are carried in attributes.
The /items list also echoes the query back at the top level: benchmark, mode (offset | cursor | lineage), applied_filters, and limit / offset / limit_clamped. In cursor and lineage modes total is null (the full set is not counted); lineage mode adds total_lineages and lineages (the count on the current page). For future benchmarks, an attributes-backed dimension appears in /catalog with source: attribute and is queried by its key like any other dimension; for synthetic-face-v1 there are none, so attributes is always {}.
The benchmark object
Returned by /api/v1/data/benchmarks and /api/v1/data/catalog. Describes a benchmark and the dimensions it exposes.
| Field | Type | Description |
|---|---|---|
| object | string | Always "benchmark". |
| id | string | Versioned benchmark id, used as the benchmark parameter (e.g. synthetic-face-v1). |
| product | string | Portfolio grouping (faces, swaps, puppets, liveness). |
| title | string | Human-readable name. |
| description | string | What the benchmark contains. |
| dimensions | array | The queryable dimensions. Each has key (the query param), label, source (column or attribute), and either values [{value,label}] for a fixed set or lineage:true for a lineage key. |
The item object
Returned by /api/v1/data/items and /api/v1/data/download/:itemId. One deliverable image. Fields that do not apply to a benchmark are null.
| Field | Type | Description |
|---|---|---|
| object | string | Always "attack_data_item". |
| id | string | Item id. Pass to /api/v1/data/download/:itemId to fetch the image. |
| benchmark | string | The benchmark this item belongs to. |
| kind | string | real (a genuine, unmodified photo) or fake (AI-generated). |
| skin_tone | string or null | Skin-tone band (6-level light-to-dark scale). |
| gender | string or null | Demographic cell gender. |
| generator | string or null | Generator model (fake items only). |
| perturbation | string or null | Condition applied (e.g. clean, jpeg_q70, fb_pipeline). |
| layer | string or null | Perturbation layer (clean, layer1, layer2, layer2_recropped). |
| base_id | string or null | The base image this variant derives from. Hold base_id and change perturbation to pull another condition of the SAME image; all perturbations of one image share it. |
| source_real_id | string or null | Lineage key: the sourced real image this item descends from. All variants of one source share it. |
| attributes | object | Benchmark-specific fields as key/value pairs; empty {} when the benchmark has none. |
The download object
Returned by /api/v1/data/download/:itemId. Carries the short-lived signed URL plus the item and updated balance.
| Field | Type | Description |
|---|---|---|
| object | string | Always "download". |
| url | string | Short-lived signed URL that delivers one JPEG image. Fetch it directly with no auth header. |
| expires_in | number | Seconds until the signed URL expires (e.g. 300). |
| item | object | The item object for the downloaded image. |
| balance | number or null | Credit balance after this download (live tier). null on the test tier, which is free. |
| charged | boolean | true if this pull debited a credit. false for free test items and for re-downloads of an item you already own. |
| already_owned | boolean | true if you had already pulled this item; the URL is returned again for free, no debit. |
Errors
Every error body carries a stable machine-readable code alongside the human-readable error message. Branch on code, not on the message text or the HTTP status alone (one status can map to more than one code).
Two things are deliberately not errors: owning every item in a selection (a 200 with subset_exhausted: true on /items?exclude_owned=true) and re-downloading an item you already own (a 200 with charged: false). Neither returns an error code.
| Status | Code | Meaning |
|---|---|---|
| 400 | invalid_param | An unknown value for a fixed dimension; the response gives param + allowed. |
| 400 | invalid_cursor | The cursor passed for keyset paging is malformed or expired. |
| 400 | ambiguous_benchmark | Benchmark omitted while the key sees more than one; the response lists available. |
| 401 | unauthorized | Missing, invalid, or revoked API key. |
| 402 | insufficient_credits | Out of credits (live tier). Top up in the portal. |
| 403 | forbidden_tier | Key not permitted for this item (e.g. a test key requesting a live-tier item). |
| 403 | forbidden_scope | The item is outside this key's content scope (a scoped/siloed key requested content it may not pull). |
| 404 | not_found | Item not found (or not visible to this key). |
| 404 | unknown_benchmark | The requested benchmark id does not exist for this key; the response lists available. |
| 429 | rate_limited | Per-key rate limit exceeded. |
| 500 | server_error | Unexpected server error. |
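Branching on code rather than status can be sketched as a small dispatcher over the table above. The action labels and `classify_error` are hypothetical, and treating rate_limited and server_error as retryable is an assumption, not documented behavior.

```python
FIX_REQUEST = {"invalid_param", "invalid_cursor", "ambiguous_benchmark", "unknown_benchmark"}
CHECK_KEY = {"unauthorized", "forbidden_tier", "forbidden_scope"}
RETRY_LATER = {"rate_limited", "server_error"}  # assumption: treated as transient


def classify_error(status, body):
    """Map an error response to a coarse action, keyed on the stable code."""
    code = (body or {}).get("code")
    if code in RETRY_LATER:
        return "retry_later"
    if code == "insufficient_credits":
        return "top_up"
    if code in FIX_REQUEST:
        return "fix_request"
    if code in CHECK_KEY:
        return "check_key"
    if code == "not_found":
        return "skip_item"
    # No recognized code: fall back to the status class.
    return "retry_later" if status >= 500 else "unknown"


print(classify_error(404, {"code": "unknown_benchmark"}))
print(classify_error(404, {"code": "not_found"}))
```

The two 404 cases show why the status alone is not enough: one calls for fixing the benchmark id, the other for skipping a single item.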
Quickstart
Create a key at /keys, then pull with the SDK (pip install margen).
    import urllib.request
    from margen import Margen

    client = Margen(bearer_auth="mgn_test_...")  # your key from /keys

    # one dark female fake, clean, from synthetic-face-v1
    item = client.list_items(
        benchmark="synthetic-face-v1",
        kind="fake", skin_tone="dark", gender="female",
        perturbation="clean", limit=1,
    ).result.data[0]

    # download it (debits 1 credit on the live tier; free on test)
    dl = client.download_item(item_id=item.id)
    urllib.request.urlretrieve(dl.url, "image.jpg")  # signed URL, no auth header
    print("saved image.jpg, balance:", dl.balance)