Zhipuai API: hosted inference and integration patterns
A technical reference for the Zhipuai API — covering the OpenAI-compatible contract, base_url override, Python and Node examples, rate-limit ranges, error codes, and billing tiers.
OAI compatible · 128K max context · REST protocol · SSE streaming
The Zhipuai API contract
The Zhipuai API mirrors the OpenAI chat-completions contract closely enough that most existing integrations require only a base URL change and a key swap to begin working.
The Zhipuai API is the hosted inference endpoint for the GLM model family, managed through the BigModel open platform. At the chat-completions level, the request and response shapes are OpenAI-compatible: the same message array format, the same role labels (system, user, assistant), the same streaming flag, and the same response envelope. This compatibility is intentional — the Zhipu AI team designed the API surface to minimise switching friction for teams already operating against the OpenAI endpoint.
The practical implication is that any code using the openai Python package or the openai Node package can be redirected to the Zhipuai API with two changes: base_url set to the BigModel endpoint and api_key set to a key obtained from the BigModel console. All method calls, parameter names, response parsing, and error handling patterns remain identical for the basic chat-completions use case. Teams that have built wrapper layers around the OpenAI SDK for logging, retry logic, or prompt templating typically find that the wrapper continues to work without modification.
Top Considerations
The Zhipuai API key is distinct from any OpenAI key — generate it in the BigModel console under API Keys, not through any other provider. Keep the key in an environment variable, never in source code. The free trial credit is applied automatically to new accounts; no payment method is required to test the API, but rate limits on the trial tier are conservative enough that you will hit them quickly on any non-trivial test suite.
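The environment-variable pattern above can be sketched as a small helper that fails loudly when the key is missing. The variable name ZHIPUAI_API_KEY is a convention chosen for this example, not a name the platform mandates:

```python
import os

def load_zhipuai_key(var_name: str = "ZHIPUAI_API_KEY") -> str:
    """Read the API key from the environment, raising if it is unset."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; generate a key in the BigModel console "
            "and export it before running."
        )
    return key
```

Failing at startup with a named variable is easier to diagnose than an invalid_api_key error surfacing mid-request.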
Base URL and SDK configuration
A one-line change to base_url is all that most existing OpenAI SDK integrations need to start hitting the Zhipuai API endpoint.
The BigModel endpoint base URL follows the pattern https://open.bigmodel.cn/api/paas/v4/ for the current API version. The trailing slash is required by the SDK's path construction; omitting it produces a malformed endpoint URL on some SDK versions. The Python and Node SDK snippets below illustrate the complete configuration for chat completions. Note that these snippets use placeholder key strings — substitute your actual key from the BigModel console.
Python example
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZHIPUAI_KEY_HERE",
    base_url="https://open.bigmodel.cn/api/paas/v4/"
)

response = client.chat.completions.create(
    model="glm-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain rate limiting in one paragraph."}
    ],
    temperature=0.7,
    max_tokens=512
)

print(response.choices[0].message.content)
Node.js example
import OpenAI from "openai";

const client = new OpenAI({
    apiKey: process.env.ZHIPUAI_KEY,
    baseURL: "https://open.bigmodel.cn/api/paas/v4/"
});

const response = await client.chat.completions.create({
    model: "glm-4",
    messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Explain streaming in one paragraph." }
    ],
    stream: false
});

console.log(response.choices[0].message.content);
Rate limits
Rate limits on the Zhipuai API are applied at the request-per-minute and token-per-minute level, with values that scale across billing tiers.
The Zhipuai API enforces two parallel rate limit dimensions: requests per minute (RPM) and tokens per minute (TPM). The free trial tier applies conservative limits on both axes — enough for feature testing and small prompt evaluations, but not enough for batch workloads or concurrent multi-user applications. Paid tiers raise both limits substantially; the current values for each tier are published in the BigModel console and are also readable from the X-RateLimit-Requests-Limit, X-RateLimit-Requests-Remaining, and X-RateLimit-Requests-Reset response headers on each API call. Implementing header-driven back-off in your client is more reliable than hardcoding sleep intervals, because tier limits can change without notice.
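The header-driven pacing described above reduces to a small pure function over the response headers. This sketch assumes the reset header carries seconds until the window resets — verify the exact semantics against the BigModel documentation before relying on it:

```python
def pacing_delay(headers: dict) -> float:
    """Seconds to wait before the next request, derived from the
    X-RateLimit-* headers; 0.0 while request budget remains."""
    remaining = int(headers.get("X-RateLimit-Requests-Remaining", 1))
    reset = float(headers.get("X-RateLimit-Requests-Reset", 0))
    if remaining > 0:
        return 0.0
    # Budget exhausted: wait out the remainder of the window.
    return max(reset, 0.0)
```

Because the function only inspects headers, it works the same whether the request was made with the OpenAI SDK's raw-response mode or a plain HTTP client.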
When a rate limit is exceeded, the Zhipuai API returns HTTP 429 with a structured error body. The retry_after field in the error response specifies the number of seconds to wait before the next attempt is likely to succeed. Clients should respect this value rather than retrying immediately, which would keep triggering 429 responses and extend the effective blackout window.
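The retry_after-driven retry can be sketched generically. The RateLimited class below is a stand-in for illustration — the real SDK raises its own typed exception and the exact location of the retry_after value varies by client — but the retry shape is the same:

```python
import time

class RateLimited(Exception):
    """Stand-in for an SDK 429 exception; carries retry_after seconds."""
    def __init__(self, retry_after: float):
        super().__init__(f"429, retry after {retry_after}s")
        self.retry_after = retry_after

def call_with_retry(request_fn, max_retries: int = 3, sleep=time.sleep):
    """Call request_fn(); on a 429, wait the server-specified retry_after
    and retry, up to max_retries times before re-raising."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except RateLimited as exc:
            if attempt == max_retries:
                raise
            sleep(exc.retry_after)
```

Injecting the sleep function keeps the helper testable without real delays.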
Error codes
The Zhipuai API error taxonomy follows the OpenAI convention closely, making it straightforward to handle in existing error-handling middleware.
The Zhipuai API returns standard HTTP status codes with structured JSON error bodies. The error object in the response body includes a code string, a message string, and a type string. Common codes include invalid_api_key (401), model_not_found (404), rate_limit_exceeded (429), and internal_server_error (500). Errors in the 500 class are typically transient; a short exponential back-off with two or three retries resolves the majority of them without human intervention.
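The triage implied by that taxonomy — retry 429 and the 500 class, fail fast on auth and model-name errors — fits in one function. This is a sketch over raw HTTP status codes; an SDK client would catch typed exceptions instead:

```python
def is_retryable(status: int) -> bool:
    """Classify an error response by HTTP status: rate limits (429) and
    the transient 500 class are worth retrying with back-off; credential
    and model-name errors (401, 404) will fail identically on retry."""
    return status == 429 or 500 <= status < 600
```

Wiring this into existing middleware usually means one branch: retryable errors go to the back-off path, everything else surfaces immediately.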
Billing tiers
The Zhipuai API bills per token at rates that vary by model variant — the pricing page on this site documents the current tiers.
Zhipuai API billing is per token, with separate input and output token rates for each model variant. The general-purpose GLM-4.5+ tier is priced higher than the smaller GLM variants; the code-specialised variant sits between them. The BigModel console shows a live usage dashboard with daily and monthly breakdowns by model and project. Spend alerts can be configured to notify at a threshold before the hard limit is reached. The Z.ai pricing page on this site walks through the per-token tiers for each model class in more detail.
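The per-token arithmetic is simple enough to pre-compute for budgeting. The rates in this sketch are parameters, not real prices — substitute whatever the pricing page lists for your model variant, and note that rates here are assumed to be quoted per 1,000 tokens:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Estimated spend for one request, with rates per 1,000 tokens."""
    return (input_tokens / 1000) * input_rate + (output_tokens / 1000) * output_rate
```

Running this over logged usage counts gives a sanity check against the console's daily breakdown before a spend alert fires.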
Guidance from AI.gov on responsible AI procurement is a useful reference for teams formalising their vendor evaluation and API usage governance processes before committing to production.
Zhipuai API key parameters
Six parameters cover the majority of practical API usage — a quick reference for parameter name, type, default value, and notes.
| Parameter | Type | Default | Notes |
|---|---|---|---|
| model | string | required | GLM model ID (e.g. "glm-4"). Must match a model available on your account tier. |
| messages | array | required | Array of role/content objects. Supports system, user, assistant roles in the same format as OpenAI. |
| temperature | float | 0.95 | Range 0.0–1.0. Lower values produce more deterministic output; higher values increase variety. |
| max_tokens | integer | 1024 | Output token cap. Does not affect input token billing. Set explicitly to avoid unexpectedly long responses. |
| stream | boolean | false | Set to true for SSE streaming. Client must handle the data: prefixed event lines and the [DONE] sentinel. |
| top_p | float | 0.7 | Nucleus sampling threshold. Use either temperature or top_p, not both, for predictable generation behaviour. |
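The stream parameter's data: prefix and [DONE] sentinel imply a small amount of client-side parsing when SSE is consumed without the SDK's streaming helper. A sketch of that line handling, assuming the chunk payload follows the OpenAI-style choices/delta shape the document describes:

```python
import json

def parse_sse_lines(lines):
    """Yield content deltas from 'data: '-prefixed SSE lines, stopping
    at the [DONE] sentinel. Blank and non-data lines are skipped."""
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta
```

With the SDK, setting stream=True and iterating the returned object performs this parsing for you; the sketch is mainly useful for raw HTTP clients and for debugging malformed streams.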
Practitioner note
"Migrating our internal toolchain from the OpenAI endpoint to the Zhipuai API took an afternoon. The base_url swap was the entire change; the rest of the integration layer — retry logic, response parsing, prompt templates — ran without a single modification."
Zhipuai API — frequently asked questions
Five questions covering the API contract, migration, rate limits, error codes, and billing.
What is the Zhipuai API?
The Zhipuai API is the hosted inference endpoint for the GLM model family, managed through the BigModel open platform. It is OpenAI-compatible at the chat-completions level, meaning an existing OpenAI SDK can be redirected to it with a base_url change and a new API key — no other code changes required for basic usage.
How do I switch from the OpenAI API to the Zhipuai API?
Set base_url to the BigModel endpoint URL and replace your OpenAI API key with a Zhipuai API key from the BigModel console. All method calls, parameter names, response parsing, and error handling patterns remain identical for the basic chat-completions use case. Most wrapper layers and middleware continue to work without modification.
What rate limits does the Zhipuai API enforce?
Rate limits vary by tier across requests per minute and tokens per minute. The free trial tier applies conservative limits sufficient for feature testing. Paid tiers raise both limits substantially. Current values are in the BigModel console and in the X-RateLimit-* response headers on every API call. Implement header-driven back-off rather than hardcoded sleep intervals.
What error codes does the Zhipuai API return?
Standard HTTP status codes aligned with the OpenAI error taxonomy: 400 for malformed requests, 401 for invalid credentials, 429 for rate limit exceeded, 500 for upstream model errors. The response body includes a structured error object with code, message, and type fields. The 500 class is transient and resolves with short exponential back-off retries.
How is Zhipuai API billing structured?
The Zhipuai API meters per token with separate input and output rates that vary by model variant. The BigModel console shows a live usage dashboard with daily and monthly breakdowns. Spend alerts can be configured before the hard limit is reached. A free trial credit applies to new accounts without requiring a payment method upfront. The pricing page documents current per-token rates by model class.
How the Zhipuai API connects to the broader Z.ai ecosystem
The API is one of three main access paths into the GLM model family — knowing how it relates to the other surfaces prevents duplication and configuration confusion.
The Zhipuai API sits at the centre of the developer-facing Z.ai surface. It is the programmatic layer that powers the chat Z AI browser interface and the Z AI chatbot integration patterns — both of those surfaces call the same underlying model through the same BigModel infrastructure. Keys are managed through the Zhipu AI open platform console, and the overall billing and project dashboard lives there as well. For teams arriving from the Zhipu AI chat product interface and wanting to automate their workflows, the API is the natural next step. Account setup and key generation require a Zhipuai login, which accepts international email registration in most regions. The pricing reference on this site documents the current per-token rates across the full model catalog.