GC AI’s inference endpoints run asynchronously. When you call one, GC AI creates a background job, runs the model, and returns a job envelope: a small object that describes the job and carries its result once it’s ready.
The endpoint holds your request open for a short window. If the job finishes within it, the envelope comes back with the result already filled in; if it doesn’t, the envelope comes back describing a job that’s still running, with a job_id you use to fetch the result later. Once a job is created you always get a job envelope, never a separate synchronous response, and this holds for every inference endpoint. (Calls that fail before the job starts return a standard HTTP error instead of an envelope; see Job failures vs. request errors.) You control how long that window is. Waiting inline vs. fire-and-forget covers it in full.
Why inference is asynchronous
Legal AI inference is slow and variable. Reviewing a long contract against a playbook, or answering a question grounded in several uploaded files, can take anywhere from a couple of seconds to well over a minute. A plain request/response call that blocks for that entire time is fragile: it runs into edge-network and proxy timeouts, it ties up a connection, and it gives you nothing to hold onto if the connection drops mid-flight.
Modeling every inference call as a job solves this:
- The work has a stable identity. Each job has a job_id you can come back to. Once you hold that id, a dropped connection costs you nothing: you poll for the result instead of losing it. The job_id arrives with the response, so to survive the initial request dropping, use wait=0 to receive the id immediately (see Waiting inline vs. fire-and-forget).
- You choose how long to wait. You can block for a result inline (up to 90 seconds, the default), or return immediately and collect the result later (fire-and-forget). Both use the same machinery.
- The response shape is uniform. Every inference endpoint, today and in the future, returns the same envelope, so one client integration handles all of them.
Today, two endpoints create jobs:
| Endpoint | kind | Reference |
|---|---|---|
| POST /chat/completions | chat/completions | Create a Chat Completion |
| POST /playbooks/{id}/run | playbooks/run | Run a Playbook |
Each one creates a job and returns the same envelope. The only things that change per endpoint are the shape of result and the value of kind. As we add more inference endpoints over time, they will all work this way, and the rest of this guide applies to all of them.
Non-inference endpoints (uploading a file, creating a folder, listing playbooks, materializing a chat) are ordinary synchronous calls and do not return a job envelope. Asynchronous jobs are specific to inference.
The job envelope
Every inference endpoint returns the same top-level object:
{
  "job_id": "123e4567-e89b-12d3-a456-426614174000",
  "kind": "chat/completions",
  "status": "succeeded",
  "result": {
    "result": "When reviewing a software license agreement, key terms to examine include...",
    "chat_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  },
  "error": null,
  "created_at": "2026-04-30T20:00:00.000Z",
  "completed_at": "2026-04-30T20:00:02.000Z"
}
| Field | Type | Description |
|---|---|---|
| job_id | string (UUID) | Stable identifier for the job. Use it to poll for the result. |
| kind | string | Which inference produced this job: chat/completions or playbooks/run. Lets one client branch on the result shape. |
| status | string | Lifecycle state. One of pending, running, succeeded, failed, canceled. |
| result | object \| null | The endpoint-specific payload. null until the job succeeds; the shape depends on kind. |
| error | object \| null | { "code": string, "message": string } when the job has failed; otherwise null. |
| created_at | string (ISO 8601) | When the job was created. |
| completed_at | string (ISO 8601) \| null | When the job reached a terminal state, or null while still in flight. |
The result field is the only part that differs between endpoints. For chat/completions it carries the answer text and a chat_id; for playbooks/run it carries the review payload. See each endpoint’s reference for its exact result schema.
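One way a client might model the envelope is a small typed wrapper that branches on kind. The sketch below is illustrative rather than an official SDK: the names JobEnvelope, parse_envelope, and answer_text are invented for this example, and it reads only the fields documented above.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class JobEnvelope:
    """Hypothetical client-side view of the documented job envelope."""
    job_id: str
    kind: str
    status: str
    result: Optional[Dict[str, Any]]
    error: Optional[Dict[str, Any]]
    created_at: str
    completed_at: Optional[str]


def parse_envelope(raw: str) -> JobEnvelope:
    """Build a JobEnvelope from a JSON response body."""
    data = json.loads(raw)
    return JobEnvelope(
        job_id=data["job_id"],
        kind=data["kind"],
        status=data["status"],
        result=data.get("result"),
        error=data.get("error"),
        created_at=data["created_at"],
        completed_at=data.get("completed_at"),
    )


def answer_text(envelope: JobEnvelope) -> Optional[str]:
    """Return the chat answer, or None if this is not a finished chat job."""
    if envelope.kind == "chat/completions" and envelope.result is not None:
        return envelope.result.get("result")
    return None
```

Because only result varies by kind, a helper like answer_text is the only place that needs endpoint-specific knowledge; everything else in the client can stay generic.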
Job status values
| Status | Terminal? | Meaning |
|---|---|---|
| pending | No | Created and queued, not yet picked up. |
| running | No | Currently executing. |
| succeeded | Yes | Finished successfully; result is populated. |
| failed | Yes | Finished with an error; error is populated. |
| canceled | Yes | Stopped before completion. |
A job is done when it reaches a terminal state (succeeded, failed, or canceled). Until then it is either pending or running, and you should keep waiting or polling.
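The terminal/non-terminal split can be captured in a few lines. This is a minimal sketch; JobStatus and is_terminal are names chosen for illustration, and rejecting unknown status strings is a design choice of the sketch, not documented API behavior.

```python
from enum import Enum


class JobStatus(str, Enum):
    """The five documented lifecycle states."""
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELED = "canceled"


# States after which the envelope no longer changes.
TERMINAL_STATUSES = frozenset(
    {JobStatus.SUCCEEDED, JobStatus.FAILED, JobStatus.CANCELED}
)


def is_terminal(status: str) -> bool:
    """True when the job is done; raises ValueError on an unknown status."""
    return JobStatus(status) in TERMINAL_STATUSES
```

Raising on an unrecognized value makes a future status addition fail loudly instead of being polled forever; a client that prefers leniency could treat unknown values as non-terminal instead.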
Waiting inline vs. fire-and-forget
When you call an inference endpoint, you decide how long GC AI should hold the connection open waiting for the job to finish before it returns the envelope. This is the wait window, the one control that drives the async model: a single mechanism that gives you both “block until done” and “return immediately” behavior.
You control it two ways, which must agree if you supply both:
- A wait query parameter: POST /chat/completions?wait=30
- A Prefer: wait=30 request header (RFC 7240)
The value is in seconds. The default is 90, and 90 is also the maximum; values above it are clamped down. (90 seconds sits just under the edge-network timeout, leaving headroom to flush the response.) If you pass both the query parameter and the header and they disagree, the request is rejected with 400.
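The rules above (default 90, clamp at 90, reject disagreement) can be mirrored client-side to catch mistakes before a request is sent. The sketch below is an assumption about how a client might pre-check its inputs, not a description of the server's parser: the function names are invented, disagreement is checked on the raw values before clamping, and rejecting negative values is a guess.

```python
from typing import Optional

MAX_WAIT_SECONDS = 90  # documented default and maximum


def parse_prefer_wait(prefer: str) -> Optional[int]:
    """Extract N from a Prefer header value such as 'wait=30'; None if absent."""
    for preference in prefer.split(","):
        # Drop any ';'-separated parameters that follow the preference.
        token = preference.split(";")[0].strip()
        name, _, value = token.partition("=")
        if name.strip().lower() == "wait" and value.strip():
            return int(value.strip())
    return None


def effective_wait(query_wait: Optional[int] = None,
                   prefer: Optional[str] = None) -> int:
    """Return the wait window, in seconds, that the documented rules imply."""
    header_wait = parse_prefer_wait(prefer) if prefer else None
    if (query_wait is not None and header_wait is not None
            and query_wait != header_wait):
        # The API is documented to reject this combination with 400.
        raise ValueError("wait query parameter and Prefer header disagree")
    requested = query_wait if query_wait is not None else header_wait
    if requested is None:
        return MAX_WAIT_SECONDS
    if requested < 0:
        # Assumption: negative waits are treated as invalid by this sketch.
        raise ValueError("wait must be zero or positive")
    return min(requested, MAX_WAIT_SECONDS)
```

For example, effective_wait(query_wait=300) yields 90 because of clamping, while effective_wait(query_wait=30, prefer="wait=60") raises before any request is made.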
The endpoint behaves like a long poll. It returns as soon as either of these happens:
- The job reaches a terminal state, and you get the finished envelope, including the result.
- The wait window elapses, and you get the envelope in whatever non-terminal state it is in (pending or running), with result still null.
So the two common modes are just two values of the same knob:
- Wait inline (default). Omit wait, or set it up to 90. Most calls finish inside the window and return 200 with the result already populated. It feels synchronous, but it is still a job under the hood.
- Fire-and-forget. Set wait=0 (or Prefer: wait=0). The endpoint enqueues the job and returns immediately with a pending envelope. You collect the result later by polling. Use this when you do not want to hold a connection, for example kicking off many reviews in a batch.
The HTTP status code tells you whether the job is done, not whether it succeeded:
- 200 OK: the job is in a terminal state (succeeded, failed, or canceled). The envelope is final; check status to see which.
- 202 Accepted: the job is still pending or running. Come back for the result.
A job that ran and failed still returns 200, because it is done, with status: "failed" and a populated error. See Job failures vs. request errors.
Polling for the result
When a call returns 202 (the wait window elapsed, or you used wait=0), the job is still in flight: either pending or running. Use the Get Async Job Status endpoint to retrieve it by job_id:
curl https://app.gc.ai/api/external/v1/jobs/123e4567-e89b-12d3-a456-426614174000 \
  -H "Authorization: gcai_your_api_key_here"
This returns the exact same envelope shape. As with the inference endpoints, you get 200 once the job is terminal and 202 while it is still in flight.
Long-poll instead of busy-poll
The jobs endpoint also accepts the wait parameter, and this is the recommended way to poll. Rather than requesting repeatedly in a tight loop, ask the jobs endpoint to hold the connection until the job finishes:
# Blocks up to 90s, returning the moment the job reaches a terminal state
curl "https://app.gc.ai/api/external/v1/jobs/123e4567-e89b-12d3-a456-426614174000?wait=90" \
  -H "Authorization: gcai_your_api_key_here"
If the job is still running after the window, you get a 202 and simply request again. This keeps the result latency low while using far fewer requests than fixed-interval polling. If you do poll on a fixed interval instead (for example with wait=0), leave a few seconds between requests.
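The long-poll loop described above might look like this in Python. Treat it as a sketch under stated assumptions: wait_for_job, http_fetch, and the overall deadline are inventions of this example, the 15-second client timeout margin is arbitrary, and the Authorization header simply mirrors the curl samples. The fetch function is injectable so the loop can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Tuple

BASE_URL = "https://app.gc.ai/api/external/v1"
Fetch = Callable[[str, int], Tuple[int, Dict[str, Any]]]


def http_fetch(api_key: str) -> Fetch:
    """Return a fetcher that long-polls GET /jobs/{job_id}?wait=N."""
    def fetch(job_id: str, wait: int) -> Tuple[int, Dict[str, Any]]:
        request = urllib.request.Request(
            f"{BASE_URL}/jobs/{job_id}?wait={wait}",
            headers={"Authorization": api_key},
        )
        # Client timeout sits above the wait window so the server answers first.
        with urllib.request.urlopen(request, timeout=wait + 15) as response:
            return response.status, json.load(response)
    return fetch


def wait_for_job(job_id: str, fetch: Fetch, wait: int = 90,
                 deadline_s: float = 600.0) -> Dict[str, Any]:
    """Long-poll until the job is terminal (HTTP 200) or the deadline passes."""
    give_up_at = time.monotonic() + deadline_s
    while time.monotonic() < give_up_at:
        status_code, envelope = fetch(job_id, wait)
        if status_code == 200:
            return envelope  # terminal: succeeded, failed, or canceled
        if status_code != 202:
            raise RuntimeError(f"unexpected HTTP status {status_code}")
        # 202: the wait window elapsed; request again immediately.
    raise TimeoutError(f"job {job_id} not terminal after {deadline_s}s")
```

A caller would use it as wait_for_job(job_id, http_fetch("gcai_your_api_key_here")). No sleep is needed between iterations because each request already blocks server-side for up to the wait window; with wait=0 you would add a pause of a few seconds instead.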
A complete fire-and-forget flow
# 1. Enqueue the job and return immediately.
JOB_ID=$(curl -s -X POST "https://app.gc.ai/api/external/v1/chat/completions?wait=0" \
  -H "Authorization: gcai_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"message": "Summarize the indemnification terms in this contract.", "file_ids": ["123e4567-e89b-12d3-a456-426614174000"]}' \
  | jq -r '.job_id')

# 2. Long-poll for the result until it is terminal.
while true; do
  RESPONSE=$(curl -s "https://app.gc.ai/api/external/v1/jobs/$JOB_ID?wait=90" \
    -H "Authorization: gcai_your_api_key_here")
  STATUS=$(echo "$RESPONSE" | jq -r '.status')
  case "$STATUS" in
    succeeded) echo "$RESPONSE" | jq '.result'; break ;;
    failed|canceled) echo "$RESPONSE" | jq '.error'; break ;;
    pending|running) ;; # the wait window elapsed, poll again
    *) echo "Unexpected response:" >&2; echo "$RESPONSE" >&2; break ;;
  esac
done
Most integrations do not need fire-and-forget at all. If a single inline call with the default wait finishes inside 90 seconds, as the large majority do, you get the result in one request and never touch the jobs endpoint. Reach for the polling flow when you expect long-running jobs, are running many in parallel, or cannot hold a connection open.
Job failures vs. request errors
There are two distinct ways an inference call can go wrong, and they surface differently.
Request errors are problems with the call itself; the job never starts. These come back as standard HTTP error codes with an error body, not a job envelope:
| Status | Cause |
|---|---|
| 400 | Invalid request body, or conflicting wait controls. |
| 401 | Missing or malformed API key. |
| 402 | Billing isn’t set up (BILLING_NOT_CONFIGURED) or the account is out of credits (INSUFFICIENT_CREDITS). The body carries a code. |
| 403 | Invalid API key, or the feature is not enabled for your account. |
| 404 | A referenced file_id, playbook_id, or job_id was not found. |
| 422 | A referenced file failed text extraction or isn’t ready yet (playbook runs). Poll GET /files/{id} until its status is ready. |
| 500 | An unexpected internal error before the job was created. Safe to retry; if it persists, contact support. |
| 503 | Service temporarily unavailable. Includes a Retry-After header (in seconds). |
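The table suggests a simple retry policy: only 500 and 503 are worth retrying as-is, and 503 tells you how long to wait. The helper below is a hedged sketch of that policy; retry_delay is a made-up name, the exponential backoff capped at 60 seconds is this example's assumption (the table only says 500 is safe to retry), and the header lookup expects the exact key Retry-After.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str],
                attempt: int) -> Optional[float]:
    """Seconds to wait before retrying a request error, or None to not retry.

    `attempt` is zero-based: 0 for the first retry, 1 for the second, and so on.
    """
    backoff = min(2.0 ** attempt, 60.0)  # assumed policy, not documented
    if status == 503:
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.strip().isdigit():
            return float(retry_after.strip())  # documented as seconds
        return backoff
    if status == 500:
        return backoff
    # 400/401/402/403/404 need a changed request or account state.
    # 422 calls for polling GET /files/{id} until ready, not a blind retry.
    return None
```

A caller would sleep for the returned number of seconds and resend the same request, stopping when the function returns None or after a bounded number of attempts.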
Job failures happen after the job has started and the model runs into a problem. The job reaches the terminal failed state, so the call returns 200 with a job envelope: status: "failed" and a populated error object:
{
  "job_id": "123e4567-e89b-12d3-a456-426614174000",
  "kind": "playbooks/run",
  "status": "failed",
  "result": null,
  "error": {
    "code": "extraction_failed",
    "message": "Could not extract text from one or more files."
  },
  "created_at": "2026-04-30T20:00:00.000Z",
  "completed_at": "2026-04-30T20:00:05.000Z"
}
The rule of thumb: a 4xx/5xx status means your request didn’t run; a 200 with status: "failed" means it ran and failed. A 202 is neither: the job was created and is still in flight, so poll for the result. Handle all three: branch on the HTTP status first, then on the envelope status.
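That two-level branch can be written as one small function. This is an illustrative sketch, not part of any official client: interpret_response, RequestError, and JobNotSucceeded are hypothetical names, and grouping canceled with failed under one exception is a simplification made here.

```python
from typing import Any, Dict, Optional


class RequestError(Exception):
    """The call was rejected (4xx/5xx); no job ran."""


class JobNotSucceeded(Exception):
    """The job reached a terminal state other than succeeded."""


def interpret_response(http_status: int,
                       body: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the result on success, or None while the job is in flight."""
    # First branch: the HTTP status.
    if http_status == 202:
        return None  # pending or running: poll by job_id
    if http_status != 200:
        raise RequestError(http_status, body)
    # Second branch: the envelope status of a finished job.
    status = body["status"]
    if status == "succeeded":
        return body["result"]
    error = body.get("error") or {}
    raise JobNotSucceeded(status, error.get("code"), error.get("message"))
```

Returning None for 202 keeps the polling decision with the caller, which already holds the job_id from the envelope.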