Rate Limits - GC AI

GC AI rate-limits the API in two tiers. Inference endpoints (the model-backed calls) are limited tightly. Everything else (listing, creating, and updating projects, playbooks, files, and so on) gets a far more forgiving limit.

Limits are intentionally conservative during the free beta, where the rate limit is the main guardrail on usage. They will loosen as the API moves toward general availability.

The two tiers

Tier	Endpoints	Limit (beta)
Inference	`POST /chat/completions`, `POST /playbooks/{id}/run`	~1 request / minute per organization, with a burst of 3
Everything else	All other endpoints (files, folders, projects, playbooks CRUD, profiles, …)	120 requests / minute per organization

The inference limit is a window of 3 requests per 180 seconds: you can spend a burst of 3 right away, after which it averages out to roughly one per minute.
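To stay under the inference limit without ever hitting a 429, a client can pace itself locally. The following is a minimal sketch, not GC AI's implementation: the page does not say whether the server uses a fixed window, a sliding window, or a token bucket, so the sketch assumes the most conservative reading (no more than 3 sends in any 180-second span). `WindowPacer` and its method names are hypothetical.

```python
import time
from collections import deque


class WindowPacer:
    """Client-side pacer: at most `limit` sends in any `window` seconds.

    Assumption: 3 requests per 180 seconds, taken from the beta limits above.
    """

    def __init__(self, limit=3, window=180.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of recent sends, oldest first

    def wait_time(self):
        """Seconds to wait before the next send is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget sends that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        # The oldest remembered send must leave the window first.
        return self.window - (now - self.sent[0])

    def record(self):
        """Call immediately after sending a request."""
        self.sent.append(self.clock())
```

Before each inference call, sleep for `wait_time()` seconds, send, then call `record()`. Local pacing only reduces throttling; the server's count is authoritative, and other keys in the same organization draw on the same budget.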

What counts against the limit

Limits apply per organization, and within that per API key. GC AI checks both on every request, and whichever is exhausted first blocks it:

The per-organization limit is the binding ceiling. It is shared across every key the organization holds, so minting extra keys does not raise your total throughput.
The per-key limit keeps one integration from monopolizing the organization’s budget. During the beta it matches the org limit; as limits rise toward GA it becomes the per-integration sub-limit.
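The interaction of the two checks can be illustrated with a toy model. This is only a sketch of the rule described above (a request needs room under both the organization limit and the key limit), not GC AI's actual implementation. The class name, the key names, and the choice not to count rejected requests are all assumptions.

```python
from collections import Counter


class TwoLevelQuota:
    """Toy model of a single window: a request needs room under both limits."""

    def __init__(self, org_limit, key_limit):
        self.org_limit = org_limit
        self.key_limit = key_limit
        self.org_used = 0
        self.key_used = Counter()  # requests counted per API key

    def allow(self, key):
        """Return True if the request fits, False if it would be throttled."""
        if self.org_used >= self.org_limit or self.key_used[key] >= self.key_limit:
            return False  # this is where the API would answer 429
        self.org_used += 1
        self.key_used[key] += 1
        return True


# Beta settings for inference: the key limit equals the org limit (3 per window).
quota = TwoLevelQuota(org_limit=3, key_limit=3)
results = [quota.allow(k) for k in ["key-a", "key-a", "key-b", "key-b"]]
print(results)  # [True, True, True, False]
```

The fourth request is refused even though `key-b` has used only one of its three slots, because the organization budget is already spent. That is why a second key adds no throughput.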

When you exceed a limit

A throttled request returns 429 Too Many Requests:

{
  "error": "Rate limit exceeded",
  "code": "RATE_LIMITED"
}

It also carries headers describing the limit and when to retry:

Header	Meaning
`Retry-After`	Seconds to wait before retrying.
`RateLimit-Limit`	The request quota for the window that was hit.
`RateLimit-Remaining`	Requests remaining in the current window.
`RateLimit-Reset`	Seconds until the quota resets.

Honor `Retry-After`: wait that many seconds before sending the next request. For batch workloads, submit jobs fire-and-forget and pace your enqueues rather than retrying in a tight loop.
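A retry loop that honors the header might look like the sketch below. It is transport-agnostic: `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`, so it can wrap whatever HTTP client you use. The function name, the attempt cap, and the 60-second fallback for a missing or unparsable header are assumptions, not documented GC AI behavior.

```python
import time


def send_with_retry(send, max_attempts=5, sleep=time.sleep, default_wait=60.0):
    """Call `send()` until it is not throttled or attempts run out.

    `send` (hypothetical) returns (status, headers, body), with `headers`
    given as a dict of header name to string value.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # HTTP header names are case-insensitive, so normalize before lookup.
        lowered = {name.lower(): value for name, value in headers.items()}
        try:
            wait = float(lowered.get("retry-after", default_wait))
        except (TypeError, ValueError):
            wait = default_wait  # assumed fallback when the value is not a number
        sleep(max(wait, 0.0))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

Capping the attempts keeps a batch job from blocking indefinitely on one request. For large batches, pacing submissions up front is still preferable to relying on retries.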

Related

Asynchronous Requests: inference calls are jobs; pace batches with fire-and-forget.
API Introduction: base URL, authentication, and a first request.
