Asynchronous Requests

GC AI’s inference endpoints run asynchronously. When you call one, GC AI creates a background job, runs the model, and returns a job envelope: a small object that describes the job and carries its result once it’s ready. The endpoint holds your request open for a short window, whose length you control (see Waiting inline vs. fire-and-forget). If the job finishes within that window, the envelope comes back with the result already filled in; if it doesn’t, the envelope describes a job that’s still in flight, with a job_id you use to fetch the result later. Once a job is created you always get a job envelope, never a separate synchronous response, and this holds for every inference endpoint. (Calls that fail before the job starts return a standard HTTP error instead of an envelope; see Job failures vs. request errors.)

Why inference is asynchronous

Legal AI inference is slow and variable. Reviewing a long contract against a playbook, or answering a question grounded in several uploaded files, can take anywhere from a couple of seconds to well over a minute. A plain request/response call that blocks for that entire time is fragile: it runs into edge-network and proxy timeouts, it ties up a connection, and it gives you nothing to hold onto if the connection drops mid-flight. Modeling every inference call as a job solves this:

The work has a stable identity. Each job has a job_id you can come back to. Once you hold that id, a dropped connection costs you nothing: you poll for the result instead of losing it. The job_id arrives with the response, so to survive the initial request dropping, use wait=0 to receive the id immediately (see Waiting inline vs. fire-and-forget).
You choose how long to wait. You can block for a result inline (up to 90 seconds, the default), or return immediately and collect the result later (fire-and-forget). Both use the same machinery.
The response shape is uniform. Every inference endpoint, today and in the future, returns the same envelope, so one client integration handles all of them.

Today, two endpoints create jobs:

| Endpoint | `kind` | Reference |
| --- | --- | --- |
| `POST /chat/completions` | `chat/completions` | Create a Chat Completion |
| `POST /playbooks/{id}/run` | `playbooks/run` | Run a Playbook |

Each one creates a job and returns the same envelope. The only things that change per endpoint are the shape of result and the value of kind. As we add more inference endpoints over time, they will all work this way, and the rest of this guide applies to all of them.

Non-inference endpoints (uploading a file, creating a folder, listing playbooks, materializing a chat) are ordinary synchronous calls and do not return a job envelope. Asynchronous jobs are specific to inference.

The job envelope

Every inference endpoint returns the same top-level object:

{
  "job_id": "123e4567-e89b-12d3-a456-426614174000",
  "kind": "chat/completions",
  "status": "succeeded",
  "result": {
    "result": "When reviewing a software license agreement, key terms to examine include...",
    "chat_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  },
  "error": null,
  "created_at": "2026-04-30T20:00:00.000Z",
  "completed_at": "2026-04-30T20:00:02.000Z"
}

| Field | Type | Description |
| --- | --- | --- |
| `job_id` | string (UUID) | Stable identifier for the job. Use it to poll for the result. |
| `kind` | string | Which inference produced this job: `chat/completions` or `playbooks/run`. Lets one client branch on the result shape. |
| `status` | string | Lifecycle state. One of `pending`, `running`, `succeeded`, `failed`, `canceled`. |
| `result` | object \| null | The endpoint-specific payload. `null` until the job succeeds; the shape depends on `kind`. |
| `error` | object \| null | `{ "code": string, "message": string }` when the job has `failed`; otherwise `null`. |
| `created_at` | string (ISO 8601) | When the job was created. |
| `completed_at` | string (ISO 8601) \| null | When the job reached a terminal state, or `null` while still in flight. |

The result field is the only part that differs between endpoints. For chat/completions it carries the answer text and a chat_id; for playbooks/run it carries the review payload. See each endpoint’s reference for its exact result schema.
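
As a sketch of how one client could handle both kinds, the Python below parses an envelope into a small dataclass and branches on kind. The names JobEnvelope, parse_envelope, and summarize are illustrative and not part of any GC AI SDK. The playbooks/run branch treats result as an opaque dictionary, since its schema is defined in that endpoint's reference rather than here.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class JobEnvelope:
    """Illustrative container for the envelope fields listed above."""
    job_id: str
    kind: str
    status: str
    result: Optional[Dict[str, Any]]
    error: Optional[Dict[str, str]]
    created_at: str
    completed_at: Optional[str]


def parse_envelope(raw: str) -> JobEnvelope:
    """Parse a JSON response body into a JobEnvelope."""
    data = json.loads(raw)
    return JobEnvelope(
        job_id=data["job_id"],
        kind=data["kind"],
        status=data["status"],
        result=data.get("result"),
        error=data.get("error"),
        created_at=data["created_at"],
        completed_at=data.get("completed_at"),
    )


def summarize(envelope: JobEnvelope) -> str:
    """Branch on kind to read the endpoint-specific result."""
    if envelope.result is None:
        return f"{envelope.kind} job {envelope.job_id} has no result (status: {envelope.status})"
    if envelope.kind == "chat/completions":
        # Per the example envelope: answer text under "result", plus a "chat_id".
        return envelope.result["result"]
    if envelope.kind == "playbooks/run":
        # Schema lives in the Run a Playbook reference; kept opaque in this sketch.
        return f"playbook review with {len(envelope.result)} top-level fields"
    raise ValueError(f"unknown kind: {envelope.kind}")
```

Raising on an unknown kind is a design choice of this sketch: it makes a newly added inference endpoint visible to the client instead of silently mishandling its result.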

Job status values

| Status | Terminal? | Meaning |
| --- | --- | --- |
| `pending` | No | Created and queued, not yet picked up. |
| `running` | No | Currently executing. |
| `succeeded` | Yes | Finished successfully; `result` is populated. |
| `failed` | Yes | Finished with an error; `error` is populated. |
| `canceled` | Yes | Stopped before completion. |

A job is done when it reaches a terminal state (succeeded, failed, or canceled). Until then it is either pending or running, and you should keep waiting or polling.
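
A minimal sketch of that rule in Python. The names TERMINAL_STATUSES, IN_FLIGHT_STATUSES, and is_terminal are illustrative. Treating an unrecognized status as an error, rather than as in flight, is an assumption of this sketch, chosen so a client does not poll forever on a value it does not understand.

```python
TERMINAL_STATUSES = frozenset({"succeeded", "failed", "canceled"})
IN_FLIGHT_STATUSES = frozenset({"pending", "running"})


def is_terminal(status: str) -> bool:
    """Return True when the job is done, False when it is still in flight."""
    if status in TERMINAL_STATUSES:
        return True
    if status in IN_FLIGHT_STATUSES:
        return False
    # Assumption: surface unknown values instead of guessing.
    raise ValueError(f"unrecognized job status: {status!r}")
```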

Waiting inline vs. fire-and-forget

When you call an inference endpoint, you decide how long GC AI should hold the connection open waiting for the job to finish before it returns the envelope. This is the wait window, the one control that drives the async model: a single mechanism that gives you both “block until done” and “return immediately” behavior. You control it two ways, which must agree if you supply both:

A wait query parameter: POST /chat/completions?wait=30
A Prefer: wait=30 request header (RFC 7240)

The value is in seconds. The default is 90, and 90 is also the maximum; values above it are clamped down. (90 seconds sits just under the edge-network timeout, leaving headroom to flush the response.) If you pass both the query parameter and the header and they disagree, the request is rejected with 400.

The endpoint behaves like a long poll. It returns as soon as either of these happens:

The job reaches a terminal state, and you get the finished envelope, including the result.
The wait window elapses, and you get the envelope in whatever non-terminal state it is in (pending or running), with result still null.

So the two common modes are just two values of the same knob:

Wait inline (default). Omit wait, or set it up to 90. Most calls finish inside the window and return 200 with the result already populated. It feels synchronous, but it is still a job under the hood.
Fire-and-forget. Set wait=0 (or Prefer: wait=0). The endpoint enqueues the job and returns immediately with a pending envelope. You collect the result later by polling. Use this when you do not want to hold a connection, for example kicking off many reviews in a batch.
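
One way to keep the two controls from disagreeing is to derive both from a single value. The sketch below does that in Python; wait_controls and MAX_WAIT_SECONDS are hypothetical names. The client-side clamp only mirrors the server behavior described above, and rejecting negative values is an assumption, since this guide does not say how the server treats them. Sending either control on its own is enough; the helper returns both so the caller can pick.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

MAX_WAIT_SECONDS = 90  # default and maximum wait window


def wait_controls(seconds: int) -> Tuple[str, Dict[str, str]]:
    """Return a (query_string, headers) pair expressing the same wait window."""
    if seconds < 0:
        # Assumption: negative waits are treated as a caller bug in this sketch.
        raise ValueError("wait must be >= 0")
    clamped = min(seconds, MAX_WAIT_SECONDS)  # mirrors the server-side clamp
    query = urlencode({"wait": clamped})
    headers = {"Prefer": f"wait={clamped}"}
    return query, headers
```

For example, wait_controls(0) yields the fire-and-forget pair ("wait=0", {"Prefer": "wait=0"}), and wait_controls(300) is clamped to the 90-second maximum.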

The HTTP status code tells you whether the job is done, not whether it succeeded:

200 OK: the job is in a terminal state (succeeded, failed, or canceled). The envelope is final; check status to see which.
202 Accepted: the job is still pending or running. Come back for the result.

A job that ran and failed still returns 200, because it is done, with status: "failed" and a populated error. See Job failures vs. request errors.

Polling for the result

When a call returns 202 (the wait window elapsed, or you used wait=0), the job is still pending or running. Use the Get Async Job Status endpoint to retrieve it by job_id:

curl https://app.gc.ai/api/external/v1/jobs/123e4567-e89b-12d3-a456-426614174000 \
  -H "Authorization: gcai_your_api_key_here"

This returns the exact same envelope shape. As with the inference endpoints, you get 200 once the job is terminal and 202 while it is still in flight.

Long-poll instead of busy-poll

The jobs endpoint also accepts the wait parameter, and this is the recommended way to poll. Rather than requesting repeatedly in a tight loop, ask the jobs endpoint to hold the connection until the job finishes:

# Blocks up to 90s, returning the moment the job reaches a terminal state
curl "https://app.gc.ai/api/external/v1/jobs/123e4567-e89b-12d3-a456-426614174000?wait=90" \
  -H "Authorization: gcai_your_api_key_here"

If the job is still running after the window, you get a 202 and simply request again. This keeps the result latency low while using far fewer requests than fixed-interval polling. If you do poll on a fixed interval instead (for example with wait=0), leave a few seconds between requests.
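
The same loop can be sketched in Python. To keep the example independent of any HTTP library, the request itself is passed in as fetch_job, a hypothetical callable that performs one GET on the jobs endpoint with the given wait and returns the HTTP status and the parsed body. The overall deadline_seconds cap is also an assumption of this sketch, not an API feature.

```python
import time
from typing import Any, Callable, Dict, Tuple

# One GET /jobs/{job_id}?wait=N request: returns (http_status, parsed_body).
FetchJob = Callable[[str, int], Tuple[int, Dict[str, Any]]]


def wait_for_job(fetch_job: FetchJob, job_id: str, wait: int = 90,
                 deadline_seconds: float = 600.0) -> Dict[str, Any]:
    """Long-poll until the job is terminal (HTTP 200) or the deadline passes."""
    deadline = time.monotonic() + deadline_seconds
    while True:
        http_status, envelope = fetch_job(job_id, wait)
        if http_status == 200:
            return envelope  # terminal: succeeded, failed, or canceled
        if http_status != 202:
            raise RuntimeError(f"request error {http_status} while polling {job_id}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {envelope.get('status')} after deadline")
        # 202: the wait window elapsed. The server already held the connection
        # open, so the next request goes out without an extra sleep.
```

With wait=0 the server returns immediately, so a caller using this loop for fixed-interval polling would need to add its own pause of a few seconds between requests.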

A complete fire-and-forget flow

# 1. Enqueue the job and return immediately.
JOB_ID=$(curl -s -X POST "https://app.gc.ai/api/external/v1/chat/completions?wait=0" \
  -H "Authorization: gcai_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{"message": "Summarize the indemnification terms in this contract.", "file_ids": ["123e4567-e89b-12d3-a456-426614174000"]}' \
  | jq -r '.job_id // empty')

# Stop if the request failed before a job was created (no job_id in the body).
if [ -z "$JOB_ID" ]; then
  echo "No job_id returned; the request was rejected." >&2
  exit 1
fi

# 2. Long-poll for the result until it is terminal.
while true; do
  RESPONSE=$(curl -s "https://app.gc.ai/api/external/v1/jobs/$JOB_ID?wait=90" \
    -H "Authorization: gcai_your_api_key_here")
  STATUS=$(echo "$RESPONSE" | jq -r '.status')
  case "$STATUS" in
    succeeded) echo "$RESPONSE" | jq '.result'; break ;;
    failed) echo "$RESPONSE" | jq '.error'; break ;;
    canceled) echo "Job was canceled." >&2; break ;; # error is null unless the job failed
    pending|running) ;; # the wait window elapsed, poll again
    *) echo "Unexpected response:" >&2; echo "$RESPONSE" >&2; break ;;
  esac
done

Most integrations do not need fire-and-forget at all. If a single inline call with the default wait finishes inside 90 seconds, as the large majority do, you get the result in one request and never touch the jobs endpoint. Reach for the polling flow when you expect long-running jobs, are running many in parallel, or cannot hold a connection open.

Job failures vs. request errors

There are two distinct ways an inference call can go wrong, and they surface differently.

Request errors are problems with the call itself; the job never starts. These come back as standard HTTP error codes with an error body, not a job envelope:

| Status | Cause |
| --- | --- |
| `400` | Invalid request body, or conflicting `wait` controls. |
| `401` | Missing or malformed API key. |
| `402` | Billing isn’t set up (`BILLING_NOT_CONFIGURED`), the organization hasn’t started a usage-based trial (`TRIAL_NOT_STARTED`), or the account is out of credits (`INSUFFICIENT_CREDITS`). The body carries a `code`. |
| `403` | Invalid API key, or the feature is not enabled for your account. |
| `404` | A referenced `file_id`, `playbook_id`, or `job_id` was not found. |
| `422` | A referenced file failed text extraction or isn’t ready yet (playbook runs). Poll `GET /files/{id}` until its status is `ready`. |
| `429` | Too many requests: you’ve hit a rate limit. Carries `Retry-After` and `RateLimit-*` headers; wait and retry. |
| `500` | An unexpected internal error before the job was created. Safe to retry; if it persists, contact support. |
| `503` | Service temporarily unavailable. Includes a `Retry-After` header (in seconds). |
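
The retry guidance in this table can be sketched as a small Python helper. The name retry_delay and the 5-second fallback are assumptions of this sketch, not values from the API. It treats only 429, 500, and 503 as retryable, honors a numeric Retry-After when one is present, and returns None for the other codes, where the request, account, or file state has to change first.

```python
from typing import Mapping, Optional

RETRYABLE_STATUSES = frozenset({429, 500, 503})


def retry_delay(http_status: int, headers: Mapping[str, str],
                fallback_seconds: float = 5.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if a retry will not help."""
    if http_status not in RETRYABLE_STATUSES:
        return None  # 400, 401, 402, 403, 404, 422: fix the cause before retrying
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    retry_after = normalized.get("retry-after")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # e.g. an HTTP-date value; this sketch does not parse those
    return fallback_seconds  # assumption: modest fixed pause when no hint is given
```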

Job failures happen after the job has started, when the model runs into a problem. The job reaches the terminal failed state, so the call returns 200 with a job envelope that carries status: "failed" and a populated error object:

{
  "job_id": "123e4567-e89b-12d3-a456-426614174000",
  "kind": "playbooks/run",
  "status": "failed",
  "result": null,
  "error": {
    "code": "extraction_failed",
    "message": "Could not extract text from one or more files."
  },
  "created_at": "2026-04-30T20:00:00.000Z",
  "completed_at": "2026-04-30T20:00:05.000Z"
}

The rule of thumb: a 4xx/5xx status means your request didn’t run; a 200 with status: "failed" means it ran and failed. A 202 is neither: the job was created and is still in flight, so poll for the result. Handle all three: branch on the HTTP status first, then on the envelope status.
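
That two-level branch might look like the following in Python. The function name classify and the outcome labels it returns are hypothetical; only the HTTP codes and envelope status values come from this guide.

```python
from typing import Any, Mapping, Optional


def classify(http_status: int, body: Optional[Mapping[str, Any]]) -> str:
    """Branch on the HTTP status first, then on the envelope status."""
    if http_status == 202:
        return "in_flight"  # job created, still pending or running: poll for it
    if http_status == 200:
        status = (body or {}).get("status")
        if status == "succeeded":
            return "succeeded"  # read body["result"]
        if status == "failed":
            return "job_failed"  # read body["error"]["code"] and ["message"]
        if status == "canceled":
            return "canceled"
        raise ValueError(f"200 response with unexpected status: {status!r}")
    if 400 <= http_status < 600:
        return "request_error"  # error body, not a job envelope
    raise ValueError(f"unexpected HTTP status: {http_status}")
```

Checking the HTTP status before touching the body matters because request errors do not carry an envelope, so reading status from them would fail or mislead.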

Related

Create a Chat Completion: the chat/completions inference endpoint and its result shape.
Run a Playbook: the playbooks/run inference endpoint and its result shape.
Get Async Job Status: poll a job by job_id.
API Introduction: base URL, authentication, and a first request.
