Integrating File Scans
Learn how to design and implement a robust file scan integration.
Choosing a Scan Mode
Blazelock offers two ways to submit a file scan:
| Mode | Endpoint | Best for | Response behavior |
|---|---|---|---|
| async | POST /file-scans | Most production integrations, background workflows, and larger files | Returns immediately with a scan in processing state |
| sync | POST /file-scans/sync | Request-response flows that need a result during the same client request | Returns a terminal result or a timeout error |
When to Prefer async
Use asynchronous scans by default when:
- Your application can continue work in the background
- You want to track status through webhooks or later polling
- You want to avoid holding an HTTP request open while the scan runs
- You need to scan files larger than the synchronous upload limit
The asynchronous endpoint accepts files up to 100 MB and returns the created scan in processing state. You can then track progress through webhooks or the status endpoints.
When sync Is a Good Fit
Use synchronous scans when all of the following are true:
- The caller benefits from an inline result in the same request-response cycle
- Waiting for the scan is acceptable for the user experience
- The file stays within the synchronous upload limit
The synchronous endpoint accepts files up to 10 MB. It waits inside a bounded polling window for a terminal result and then returns either completed or failed.
If the scan does not finish in time, the endpoint returns 408 sync_scan_timeout together with file_scan_id. The scan itself still exists, so your integration should continue with GET /file-scans/{id} or webhooks instead of treating the timeout as a lost submission.
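The timeout handling described above can be sketched as a small decision function. This is a minimal illustration, not the official client: the function name `next_step_after_sync`, the returned dictionary shape, and the assumption that `file_scan_id` appears as a top-level key of the 408 response body are all hypothetical.

```python
def next_step_after_sync(status_code: int, body: dict) -> dict:
    """Decide what to do with a response from POST /file-scans/sync.

    Assumption: a 408 body exposes `file_scan_id` as a top-level key.
    """
    if 200 <= status_code < 300:
        # The scan reached a terminal state within the polling window.
        return {"action": "use_result", "scan": body}
    if status_code == 408:
        # The scan still exists; keep tracking it instead of resubmitting.
        scan_id = body.get("file_scan_id")
        if scan_id is None:
            raise ValueError("408 response did not include file_scan_id")
        return {"action": "track", "status_path": f"/file-scans/{scan_id}"}
    raise RuntimeError(f"Unexpected status code: {status_code}")
```

Returning a "track" action rather than raising keeps the timeout on the normal control path, which matches the guidance that a timeout is not a lost submission.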
In most integrations, async is the better default. Choose sync only when an immediate result meaningfully changes the calling flow.
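As a rough sketch, the selection rule can be expressed in a few lines. The helper name `choose_endpoint` and the `needs_inline_result` flag are hypothetical, and the byte limits assume 1 MB = 1,000,000 bytes, which this guide does not specify; adjust them if the API documents a different definition.

```python
# Assumption: 1 MB is 1,000,000 bytes; the guide only states "10 MB" and "100 MB".
SYNC_LIMIT_BYTES = 10 * 1000 * 1000
ASYNC_LIMIT_BYTES = 100 * 1000 * 1000


def choose_endpoint(size_bytes: int, needs_inline_result: bool = False) -> str:
    """Return the scan endpoint path for a file of the given size."""
    if size_bytes > ASYNC_LIMIT_BYTES:
        raise ValueError("File exceeds the asynchronous upload limit")
    # Use sync only when the caller needs an inline result and the file fits.
    if needs_inline_result and size_bytes <= SYNC_LIMIT_BYTES:
        return "/file-scans/sync"
    # Async is the default in every other case.
    return "/file-scans"
```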
Polling vs. Webhooks
There are two common ways to track asynchronous scans:
| Approach | How it works | Strengths | Tradeoffs |
|---|---|---|---|
| Polling | Your system repeatedly requests the latest data until the status changes. | Simple to understand and easy to test at the beginning. | Creates unnecessary requests, adds latency, and shifts more coordination logic into your application. |
| Webhooks | Blazelock sends an event to your endpoint when the status changes. | Near real-time updates, fewer API calls, and a cleaner asynchronous architecture. | Requires a public HTTPS endpoint and proper request validation. |
If you only need an occasional status check, polling can be enough. If you want an event-driven integration that scales better and reacts faster, webhooks are usually the better choice.
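If you start with polling, a backoff between requests limits the unnecessary calls mentioned in the table. The sketch below is one possible approach, not a prescribed one: `fetch_status` is a hypothetical callable that you supply (for example, a wrapper around GET /file-scans/{id} that returns the scan status as a string), and the delay and timeout values are illustrative.

```python
import time
from typing import Callable

# Terminal states named in this guide.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_until_terminal(
    fetch_status: Callable[[], str],
    *,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll with exponential backoff until the scan reaches a terminal status."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError("Scan did not reach a terminal status in time")
        sleep(delay)
        # Double the wait each round, capped at max_delay.
        delay = min(delay * 2, max_delay)
```

The `sleep` parameter is injectable so the loop can be tested without real waiting.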
Using external_reference_id
external_reference_id is an optional identifier that you can send in attributes.external_reference_id when submitting a scan.
Example:
    {
      "attributes": {
        "file_name": "invoice.pdf",
        "external_reference_id": "invoice-4711"
      }
    }

If you provide it:
- The value is returned in scan responses
- The value is included in webhook payloads
- You can look up the scan through GET /file-scans/external-reference/{external_reference_id}
The value is resolved within the authenticated API integration. In practice, that means it must be unique per integration. If you submit the same value again for the same integration, the API returns 409 duplicate_external_reference_id.
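Because the allowed characters include `/`, `#`, `:`, and `@`, the value generally needs percent-encoding before it is placed in the lookup path. A `#`, for instance, would otherwise start a URL fragment. The following is a minimal sketch with a hypothetical helper name; that the API expects a percent-encoded path segment (including an encoded `/`) is an assumption, so verify it against the API reference.

```python
from urllib.parse import quote


def external_reference_path(external_reference_id: str) -> str:
    """Build the lookup path with the reference percent-encoded as one segment."""
    # safe="" also encodes "/", so the value stays a single path segment.
    encoded = quote(external_reference_id, safe="")
    return "/file-scans/external-reference/" + encoded
```

For example, `external_reference_path("invoice/4711")` yields `/file-scans/external-reference/invoice%2F4711`.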
When It Is Useful
external_reference_id is useful when you want to connect the scan to an ID that already exists in your own system, for example:
- An upload record ID
- A document or invoice ID
- A job, workflow, or queue item ID
This lets your application and webhook consumers work with a business identifier they already know instead of depending only on the Blazelock scan ID.
Format
external_reference_id is optional, can be up to 255 characters, and may contain only letters, digits, ., _, :, /, #, @, and -.
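Validating the value on your side before submission avoids a rejected request. The check below is a sketch of the format rule above; it assumes that "letters" means ASCII letters and that an empty string is not a valid value, neither of which this guide states explicitly. The function name is hypothetical.

```python
import re

# Letters, digits, and . _ : / # @ - with a length of 1 to 255 characters.
# Assumption: "letters" means ASCII A-Z and a-z.
_EXTERNAL_REFERENCE_RE = re.compile(r"[A-Za-z0-9._:/#@-]{1,255}")


def is_valid_external_reference_id(value: str) -> bool:
    """Return True if the value matches the documented format."""
    return _EXTERNAL_REFERENCE_RE.fullmatch(value) is not None
```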
Recommendations
- If you decide to use external_reference_id, choose a stable value that already identifies the related business object in your system.
- Namespace values when multiple workflows share one integration, for example invoice/4711 or upload:01HXYZ.
- Treat the value as an identifier, not as a secret.
- If the same business object can be scanned more than once, version the value explicitly, for example invoice/4711/scan/2.
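One way to apply the namespacing and versioning recommendations consistently is a small builder. This is only an illustrative convention: the helper name and the `namespace/object_id/scan/n` layout are assumptions modeled on the examples above, not a format the API requires.

```python
def build_external_reference_id(namespace: str, object_id: str, scan_number: int = 1) -> str:
    """Build a namespaced reference; later scans get an explicit version suffix."""
    if scan_number < 1:
        raise ValueError("scan_number must be 1 or greater")
    base = f"{namespace}/{object_id}"
    # The first scan uses the plain value; repeat scans are versioned.
    return base if scan_number == 1 else f"{base}/scan/{scan_number}"
```

Versioning repeat scans this way keeps each submitted value unique within the integration, which avoids a 409 duplicate_external_reference_id response.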