Embeddings
Create vector embeddings for search, RAG, clustering, and semantic similarity. The gateway exposes a standard **OpenAI Embeddings** API at `POST /v1/embeddings`.
Base URL & authentication
Base URL: https://api.openadapter.in
All requests need an `Authorization: Bearer sk-cv-...` header. Generate or copy your API key from the Dashboard → API Keys page. Use your normal OpenAdapter API key; requests count toward your plan quota like chat completions.
POST https://api.openadapter.in/v1/embeddings

Discover models
Call GET /v1/models and use entries where model_type is "embedding". Your plan may restrict which models appear (for example, curated lists on specific plans).
Aliases are case-sensitive — use the exact id from the models list.
Some aliases (for example qwen3-embedding) may show model_type: "chat" in the models list for historical routing reasons but still work with POST /v1/embeddings. If the embeddings call succeeds, the alias is valid.
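A minimal sketch of this discovery step, using only Python's standard library. It assumes `API_KEY` is set in your environment; `model_type` is the gateway field described above, and the helper name `embedding_ids` is ours, not part of any SDK:

```python
import json
import os
import urllib.request


def embedding_ids(models):
    """Keep only the /v1/models entries whose model_type is "embedding"."""
    return [m["id"] for m in models if m.get("model_type") == "embedding"]


# Only call the gateway when a key is actually configured.
if os.environ.get("API_KEY"):
    req = urllib.request.Request(
        "https://api.openadapter.in/v1/models",
        headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        print(embedding_ids(json.load(resp)["data"]))
```

Remember the caveat above: an alias missing from this filtered list may still work with `POST /v1/embeddings`, so treat a successful embeddings call as the final word.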
Request body (OpenAI-compatible)
| Field | Required | Notes |
|---|---|---|
| `model` | Yes | Embedding alias, e.g. `qwen3-embedding-small` |
| `input` | Yes | A string or an array of strings to embed in one call |
| `encoding_format` | No | `float` (default) |
| `dimensions` | No | Reduces vector size only if the upstream model supports it |
| `user` | No | Optional end-user id for logging (string) |
The response shape matches OpenAI: `object`, `data[]` with `embedding` and `index`, `model`, and usually `usage` with `total_tokens`.
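An illustrative response with invented values (the `embedding` array is truncated for brevity; real vectors are much longer):

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.013, -0.042, 0.088]
    }
  ],
  "model": "qwen3-embedding-small",
  "usage": { "total_tokens": 4 }
}
```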
```shell
curl https://api.openadapter.in/v1/embeddings \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-embedding-small",
    "input": ["hello world", "second line"]
  }'
```

```python
from openai import OpenAI

client = OpenAI(
    api_key="$API_KEY",
    base_url="https://api.openadapter.in/v1",
)
resp = client.embeddings.create(
    model="qwen3-embedding-small",
    input=["Hello from OpenAdapter", "Batch item two"],
)
for item in resp.data:
    print(len(item.embedding), "dimensions")
```

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: '$API_KEY',
  baseURL: 'https://api.openadapter.in/v1',
});
const resp = await client.embeddings.create({
  model: 'jina-embeddings-v5',
  input: 'Single string input is also valid.',
});
console.log(resp.data[0].embedding.length);
```

Choosing a model
Use one model per index or vector store — mixing embeddings from different models breaks similarity search.
| Model alias | Role | Vector size (typical) | Notes |
|---|---|---|---|
| `qwen3-embedding-small` | General text (default workhorse) | 1024 | Alias for Qwen3 0.6B embedding on OpenAdapter Edge |
| `Qwen3-Embedding-0.6B` | Same family as above | 1024 | Alternative display name for tooling that expects this id |
| `qwen3-embedding` | Same stack | 1024 | Convenience alias |
| `jina-embeddings-v5` | Strong multilingual / general retrieval | 1024 | Hosted on OpenAdapter Edge (Jina v5 text small) |
| `bge-m3` | Dense retrieval, popular for RAG | 1024 | OpenAdapter Edge (BAAI bge-m3) |
| `nv-embedqa-e5-v5` | NVIDIA embedding model | 1024 | Routed via NVIDIA |
Sizes above are what the gateway returns today; always verify with a test call if you rely on exact dimensionality for your DB schema.
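Once you have vectors from a single model, similarity search comes down to comparing them, most commonly with cosine similarity. A minimal, dependency-free sketch (in practice the vectors come from the embeddings endpoint, both from the same model):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors for illustration; real ones have e.g. 1024 dims.
print(cosine([1.0, 0.0, 1.0], [1.0, 0.0, 0.0]))
```

This is also why mixing models in one index breaks: vectors from different models live in unrelated spaces, so their cosine similarity is meaningless even when the dimensions happen to match.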
RAG and the dashboard
The Vector DB page in the dashboard uses embeddings for ingest and query. Pick an embedding model in the collection settings and keep it consistent for that collection.
Limits and behavior
- Quota: Each call consumes request quota (and token-based accounting where applicable), similar to other `/v1` routes.
- Rate limits: Your plan RPM and per-model tier limits apply unless your plan explicitly disables them.
- Routing: The gateway picks a healthy provider key for the alias; you do not pass provider credentials.
- Input size: Very long inputs may be truncated or rejected by the upstream model — stay within reasonable chunk sizes for your use case.
If the requested `model` is not allowed on your plan, the API returns an error listing the allowed models (when your plan uses model restrictions).
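A common way to stay within input limits is to split long documents into overlapping chunks before embedding. A rough sketch; the character budget and overlap here are arbitrary assumptions, not gateway limits:

```python
def chunk(text, max_chars=2000, overlap=200):
    """Split text into overlapping character windows for embedding."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Overlap keeps context that straddles a chunk boundary retrievable.
        start = end - overlap
    return chunks
```

Each chunk can then go into one `input` array entry, and the resulting vectors index back to their source chunks. Tune chunk size to your retrieval quality, not just the model's maximum.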