OpenAdapter
Guides

Burst & Rate Limits

How burst capacity, 429 retries, and parallel requests work on OpenAdapter.

Special offering

Per-key customisation is unique to OpenAdapter — no other gateway ships it by default. Burst capacity scales with your plan tier.

OpenAdapter enforces quotas as rolling windows at 5h, 7d, and 30d intervals, with a short-term burst allowance on top. This page explains the model and shows the recommended client code.

How the windows work

Window         Purpose
Burst          Refills within a couple of minutes. Lets you fire parallel requests without hitting 429.
Sustained      After the burst is spent, requests flow at the plan's steady rate.
5h / 7d / 30d  Long-term caps. Once any window is exhausted, requests return 429 with Retry-After.

Higher plans have larger burst and sustained allowances.
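One way to picture the burst-plus-sustained model is a token bucket: the bucket size is the burst allowance and the refill rate is the sustained rate. The sketch below is illustrative only; it is not OpenAdapter's actual implementation, and the capacity and rate numbers are made up.

```python
import time

class TokenBucket:
    """Illustrative burst + sustained model:
    bucket size = burst allowance, refill rate = sustained requests/sec."""

    def __init__(self, burst_capacity, sustained_rate):
        self.capacity = burst_capacity   # max tokens available at once (burst)
        self.rate = sustained_rate       # tokens refilled per second (sustained)
        self.tokens = burst_capacity
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # in gateway terms: this request would get a 429

# A burst of 10 back-to-back requests succeeds; the 11th is throttled
# until the sustained rate refills the bucket.
bucket = TokenBucket(burst_capacity=10, sustained_rate=2)
results = [bucket.try_acquire() for _ in range(11)]
```

This is why a short parallel burst sails through while a tight unthrottled loop does not: the loop drains the bucket and then runs at the refill rate.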

Handling 429

When the gateway throttles you, it returns:

HTTP/1.1 429 Too Many Requests
Retry-After: 5

Honor the header. Don't retry aggressively without backoff; you'll just compound the throttle.

JavaScript

async function callAPI(body, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const res = await fetch('https://api.openadapter.in/v1/chat/completions', {
      method: 'POST',
      headers: {
        Authorization: 'Bearer sk-cv-...',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    });

    if (res.status === 429) {
      const wait = parseInt(res.headers.get('Retry-After') || '5', 10);
      await new Promise((r) => setTimeout(r, wait * 1000));
      continue;
    }

    if (!res.ok) throw new Error(`API error: ${res.status}`);
    return res.json();
  }
  throw new Error('Max retries exceeded');
}

Python

import time, requests

def call_api(body, max_retries=3):
    for _ in range(max_retries):
        res = requests.post(
            'https://api.openadapter.in/v1/chat/completions',
            headers={
                'Authorization': 'Bearer sk-cv-...',
                'Content-Type': 'application/json',
            },
            json=body,
        )
        if res.status_code == 429:
            time.sleep(int(res.headers.get('Retry-After', 5)))
            continue
        res.raise_for_status()
        return res.json()
    raise Exception("Max retries exceeded")

Parallel requests

Burst capacity is designed for fan-out workloads. Fire requests in parallel with Promise.allSettled so one rate-limited call doesn't tank the whole batch:

async function parallelCalls(prompts) {
  const results = await Promise.allSettled(
    prompts.map((prompt) =>
      callAPI({
        model: 'glm-4.7',
        messages: [{ role: 'user', content: prompt }],
        max_tokens: 500,
      }),
    ),
  );
  return results.map((r, i) => ({
    prompt: prompts[i],
    success: r.status === 'fulfilled',
    data: r.status === 'fulfilled' ? r.value : r.reason.message,
  }));
}
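If your fan-out can exceed your plan's burst, it's worth capping concurrency client-side rather than letting every request race. A minimal Python sketch using asyncio.Semaphore; the limit value and the call/fake_call helpers are assumptions for illustration, not part of the OpenAdapter API:

```python
import asyncio

async def bounded_fan_out(prompts, call, limit=8):
    """Run call(prompt) for every prompt, with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(prompt):
        async with sem:
            return await call(prompt)

    # return_exceptions=True mirrors Promise.allSettled: one failure
    # doesn't cancel the rest of the batch.
    return await asyncio.gather(
        *(guarded(p) for p in prompts), return_exceptions=True
    )

# Demo with a stub in place of a real API call.
async def fake_call(prompt):
    await asyncio.sleep(0)
    return f"echo: {prompt}"

out = asyncio.run(bounded_fan_out(["a", "b", "c"], fake_call, limit=2))
```

Results come back in prompt order, so you can zip them with the inputs just like the allSettled example above.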

Best practices

  1. Use streaming for long completions — keeps connections warm.
  2. Fire parallel requests up to ~8 concurrent for typical plans.
  3. Always read Retry-After instead of guessing a sleep duration.
  4. Don't retry 4xx (auth, validation) errors. Only 429 and 5xx are retry-safe.
  5. Track per-window usage from the dashboard — you'll see when you're trending toward a hard cap.
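Practice 4 boils down to a single predicate. A hedged sketch based on status-code classes only; a production client may also retry transport-level network errors:

```python
def should_retry(status_code):
    """Retry only on 429 (rate limit) and 5xx (server-side) responses.
    Other 4xx codes (auth, validation) will fail identically on retry."""
    return status_code == 429 or 500 <= status_code <= 599
```

Drop this check into the retry loops above in place of the bare `status == 429` test if you also want to retry transient server errors.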
