Burst & Rate Limits
How burst capacity, 429 retries, and parallel requests work on OpenAdapter.
Special offering
Per-key customisation is unique to OpenAdapter — no other gateway ships it by default. Burst capacity scales with your plan tier.
OpenAdapter enforces quotas as rolling windows at 5h, 7d, and 30d intervals, with a short-term burst allowance on top. This page explains the model and shows the recommended client code.
How the windows work
| Window | Purpose |
|---|---|
| Burst | Refills within a couple of minutes. Lets you fire parallel requests without hitting 429. |
| Sustained | After the burst is spent, requests flow at the plan's steady rate. |
| 5h / 7d / 30d | Long-term caps. Once any window is exhausted, requests return 429 with Retry-After. |
Higher plans have larger burst and sustained allowances.
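The burst-plus-sustained behaviour can be pictured as a token bucket: the bucket's capacity is the burst allowance, and its refill rate is the sustained rate. The sketch below is an illustrative model only — the class, the numbers, and the refill formula are assumptions for intuition, not OpenAdapter's published algorithm:

```javascript
// Illustrative token-bucket model of burst + sustained limits.
// Hypothetical numbers: a 20-request burst refilling at 1 request per 6 s
// would be `new TokenBucket(20, 1 / 6)`.
class TokenBucket {
  constructor(capacity, refillPerSec, now = Date.now()) {
    this.capacity = capacity;     // burst allowance
    this.refillPerSec = refillPerSec; // sustained rate
    this.tokens = capacity;       // start with a full burst
    this.last = now;
  }

  // Returns true if a request may be sent now, false if it would be throttled.
  tryTake(now = Date.now()) {
    const elapsedSec = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Under this model, a burst of parallel requests drains the bucket quickly, after which requests succeed only as fast as the refill rate — matching the burst-then-sustained behaviour in the table above.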
Handling 429
When the gateway throttles you, it returns:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 5
```

Honor the header. Don't retry aggressively without backoff; you'll just compound the throttle.
JavaScript
```javascript
async function callAPI(body, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const res = await fetch('https://api.openadapter.in/v1/chat/completions', {
      method: 'POST',
      headers: {
        Authorization: 'Bearer sk-cv-...',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    });

    if (res.status === 429) {
      // Respect Retry-After (seconds); fall back to 5 s if the header is missing.
      const wait = parseInt(res.headers.get('Retry-After') || '5', 10);
      await new Promise((r) => setTimeout(r, wait * 1000));
      continue;
    }

    if (!res.ok) throw new Error(`API error: ${res.status}`);
    return res.json();
  }
  throw new Error('Max retries exceeded');
}
```

Python
```python
import time

import requests


def call_api(body, max_retries=3):
    for _ in range(max_retries):
        res = requests.post(
            'https://api.openadapter.in/v1/chat/completions',
            headers={
                'Authorization': 'Bearer sk-cv-...',
                'Content-Type': 'application/json',
            },
            json=body,
        )
        if res.status_code == 429:
            # Respect Retry-After (seconds); fall back to 5 if the header is missing.
            time.sleep(int(res.headers.get('Retry-After', 5)))
            continue
        res.raise_for_status()
        return res.json()
    raise RuntimeError('Max retries exceeded')
```

Parallel requests
Burst capacity is designed for fan-out workloads. Fire requests in parallel with Promise.allSettled so one rate-limited call doesn't tank the whole batch:
```javascript
async function parallelCalls(prompts) {
  const results = await Promise.allSettled(
    prompts.map((prompt) =>
      callAPI({
        model: 'glm-4.7',
        messages: [{ role: 'user', content: prompt }],
        max_tokens: 500,
      }),
    ),
  );

  return results.map((r, i) => ({
    prompt: prompts[i],
    success: r.status === 'fulfilled',
    data: r.status === 'fulfilled' ? r.value : r.reason.message,
  }));
}
```

Best practices
- Use streaming for long completions — keeps connections warm.
- Cap parallel requests at roughly 8 concurrent on typical plans.
- Always read `Retry-After` instead of guessing a sleep duration.
- Don't retry 4xx (auth, validation) errors. Only 429 and 5xx are retry-safe.
- Track per-window usage from the dashboard — you'll see when you're trending toward a hard cap.
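The concurrency guideline above can be enforced with a small worker pool rather than firing every request at once. This is a generic sketch — the helper name and the default limit of 8 are assumptions, so tune the limit to your plan tier:

```javascript
// Run an async worker over `items` with at most `limit` tasks in flight.
// Default limit of 8 is an assumed value for typical plans.
async function mapWithConcurrency(items, worker, limit = 8) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until the queue is empty.
  // Safe without locks: JS is single-threaded, and `next++` has no await
  // between the read and the increment.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runners = Array.from(
    { length: Math.min(limit, items.length) },
    () => runner(),
  );
  await Promise.all(runners);
  return results;
}
```

Combined with the `callAPI` helper above, `mapWithConcurrency(prompts, (p) => callAPI({...}), 8)` keeps at most 8 requests in flight, so a large batch drains the burst allowance gradually instead of all at once.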