GC AI rate-limits the API in two tiers. Inference endpoints (the model-backed calls) are limited tightly. Everything else (listing, creating, and updating projects, playbooks, files, and so on) gets a far more forgiving limit.
Limits are intentionally conservative during the free beta, where the rate limit is the main guardrail on usage. They will loosen as the API moves toward general availability.

The two tiers

| Tier | Endpoints | Limit (beta) |
| --- | --- | --- |
| Inference | POST /chat/completions, POST /playbooks/{id}/run | ~1 request / minute per organization, with a burst of 3 |
| Everything else | All other endpoints (files, folders, projects, playbooks CRUD, profiles, …) | 120 requests / minute per organization |
The inference limit is a window of 3 requests per 180 seconds: you can spend a burst of all 3 right away, after which your throughput averages out to roughly one request per minute.
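If you would rather stay under the inference limit than react to 429s, you can pace requests on the client. The sketch below is a hypothetical helper, `SlidingWindowPacer` (not part of the GC AI API), that allows at most 3 requests in any 180-second span. It makes no assumption about how GC AI aligns its window; a sliding-window log is at least as strict as a fixed 3-per-180-seconds window. It only tracks your own process, so other keys and integrations sharing the organization's quota can still push you over.

```python
from collections import deque


class SlidingWindowPacer:
    """Client-side pacer: at most `limit` requests in any `window`-second span.

    Hypothetical helper for illustration; not part of the GC AI API.
    """

    def __init__(self, limit: int = 3, window: float = 180.0) -> None:
        self.limit = limit
        self.window = window
        self._sent = deque()  # send times of recent requests, oldest first

    def _expire(self, now: float) -> None:
        # Forget requests sent a full window or more before `now`.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()

    def delay(self, now: float) -> float:
        """Seconds to wait before the next request may be sent."""
        self._expire(now)
        if len(self._sent) < self.limit:
            return 0.0
        # Window is full: wait until the oldest request ages out of it.
        return self._sent[0] + self.window - now

    def record(self, now: float) -> None:
        """Note that a request was sent at time `now`."""
        self._expire(now)
        self._sent.append(now)


pacer = SlidingWindowPacer()
for t in (0.0, 1.0, 2.0):  # a burst of 3 is allowed immediately
    pacer.record(t)
print(pacer.delay(3.0))  # 177.0: the 4th request waits for the first to age out
```

In real code you would pass `time.monotonic()` as `now` and call `time.sleep(pacer.delay(...))` before each inference request. Injecting the clock keeps the pacer deterministic and easy to test.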

What counts against the limit

Limits apply per organization, and within that per API key. GC AI checks both on every request, and whichever is exhausted first blocks it:
  • The per-organization limit is the binding ceiling. It is shared across every key the organization holds, so minting extra keys does not raise your total throughput.
  • The per-key limit keeps one integration from monopolizing the organization’s budget. During the beta it matches the org limit; as limits rise toward GA it becomes the per-integration sub-limit.
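To see why extra keys do not add throughput, here is a toy model of one rate-limit window checked at both levels. `TwoLevelQuota` and its parameters are hypothetical: it models the behavior described above, not GC AI's actual implementation, and it ignores time by counting a single window's worth of requests.

```python
from collections import Counter


class TwoLevelQuota:
    """Toy model of one window checked per organization and per API key.

    Hypothetical; illustrates the documented behavior only.
    """

    def __init__(self, org_limit: int, key_limit: int) -> None:
        self.org_limit = org_limit
        self.key_limit = key_limit
        self.org_used = 0
        self.key_used = Counter()  # requests counted per API key

    def allow(self, api_key: str) -> bool:
        # Both budgets must have room; whichever is exhausted first blocks.
        if self.org_used >= self.org_limit or self.key_used[api_key] >= self.key_limit:
            return False  # the real API would answer 429 here
        self.org_used += 1
        self.key_used[api_key] += 1
        return True


# Beta values: both limits at 120. Two keys each try to send 100 requests.
quota = TwoLevelQuota(org_limit=120, key_limit=120)
accepted = sum(quota.allow(k) for k in ["key-a"] * 100 + ["key-b"] * 100)
print(accepted)  # 120: the second key only gets what the org budget has left
```

Setting `key_limit` below `org_limit` would model a per-integration sub-limit of the kind described for GA, where no single key can drain the whole organization budget.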

When you exceed a limit

A throttled request receives a 429 Too Many Requests response with this body:
```json
{
  "error": "Rate limit exceeded",
  "code": "RATE_LIMITED"
}
```
It also carries headers describing the limit and when to retry:
| Header | Meaning |
| --- | --- |
| Retry-After | Seconds to wait before retrying. |
| RateLimit-Limit | The request quota for the window that was hit. |
| RateLimit-Remaining | Requests remaining in the current window. |
| RateLimit-Reset | Seconds until the quota resets. |
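A client can read these headers into one structure before deciding what to do. The sketch below assumes the values are integer seconds or counts, as the table describes; `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical names. HTTP also allows Retry-After to be a date, which this sketch treats as missing, and it matches header names case-insensitively because HTTP header names are case-insensitive.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass
class RateLimitInfo:
    retry_after: Optional[int]  # seconds to wait before retrying
    limit: Optional[int]        # quota for the window that was hit
    remaining: Optional[int]    # requests left in the current window
    reset: Optional[int]        # seconds until the quota resets


def _int_header(headers: Mapping[str, str], name: str) -> Optional[int]:
    # Header names are case-insensitive, so compare them lowercased.
    for key, value in headers.items():
        if key.lower() == name.lower():
            try:
                return int(value.strip())
            except ValueError:
                return None  # e.g. a date-form Retry-After, ignored in this sketch
    return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Collect the documented rate-limit headers; absent ones become None."""
    return RateLimitInfo(
        retry_after=_int_header(headers, "Retry-After"),
        limit=_int_header(headers, "RateLimit-Limit"),
        remaining=_int_header(headers, "RateLimit-Remaining"),
        reset=_int_header(headers, "RateLimit-Reset"),
    )


# Illustrative header values, not captured from a real response.
info = parse_rate_limit_headers({
    "retry-after": "42",
    "RateLimit-Limit": "3",
    "RateLimit-Remaining": "0",
    "RateLimit-Reset": "42",
})
print(info.retry_after, info.remaining)  # 42 0
```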
Honor Retry-After: wait that many seconds before sending the next request. For batch workloads, submit work fire-and-forget and pace how quickly you enqueue it, rather than retrying in a tight loop.
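A retry wrapper that honors Retry-After might look like the sketch below. It is client-agnostic: `send` is a hypothetical callable you supply that performs one request and returns `(status, headers, body)`, and `sleep` is injectable so the logic can be tested without waiting. The 60-second fallback, used when Retry-After is missing or not a number of seconds, is an assumption based on the roughly one-per-minute inference rate, not a documented value.

```python
import time
from collections.abc import Callable, Mapping
from typing import Tuple

# (status, headers, body) as produced by whatever HTTP client you use.
Response = Tuple[int, Mapping[str, str], str]


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    default_wait: float = 60.0,  # assumed fallback, not a documented value
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, waiting out 429 responses as directed by Retry-After.

    `send`, `max_attempts`, and `default_wait` are hypothetical names for this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        # Header names are case-insensitive; fall back if Retry-After is unusable.
        wait = default_wait
        for key, value in headers.items():
            if key.lower() == "retry-after":
                try:
                    wait = int(value.strip())
                except ValueError:
                    pass
        sleep(wait)  # wait the full interval instead of retrying in a tight loop
    raise ValueError("max_attempts must be at least 1")
```

Capping attempts keeps a long throttle from stalling a worker indefinitely; for batch jobs, pairing this with client-side pacing means the retry path should rarely run.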