Zhipu AI pricing: API tiers and per-token cost classes

A reference breakdown of Zhipu AI pricing on the BigModel platform — how per-token billing works, which cost class applies to each model tier, how the free trial credit is structured, and how to estimate your monthly API spend before committing to a workload.

At a glance

- Billing model: per-token
- Cost classes: 5 tiers
- Free trial: credit included
- Billing currency: CNY base

Reader Brief

Zhipu AI pricing is usage-based and per-token. Input tokens are cheaper than output tokens at every tier. The free trial credit activates automatically once the account is fully verified. The flagship GLM-4.5+ tier costs more than the small or mid variants, but for most workloads the mid-tier represents the practical performance-per-cost optimum.

How Zhipu AI pricing is structured

The BigModel platform meters every API call at the token level, with separate input and output rates per model tier and a free trial credit applied at sign-up.

The Zhipu AI pricing model on the BigModel platform is usage-based and billed per token. Every request to a GLM model consumes input tokens (the prompt plus any system message) and output tokens (the generated response). These two token classes are billed at different rates; output is typically priced higher because generation requires more compute than prefill. The billing granularity is fine enough that even very small requests appear on the usage dashboard within minutes of being made.
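As a concrete illustration of the two token classes, the sketch below computes a per-request cost from the token counts an API response reports. The rates are placeholders, not live BigModel prices, and the field names follow the common `prompt_tokens` / `completion_tokens` usage convention.

```python
# Per-request cost under per-token billing: input (prompt) and output
# (completion) tokens are metered at separate rates. The rates below are
# PLACEHOLDERS, not live BigModel prices -- read the current numeric
# rates from the platform console.

RATE_CNY_PER_1K = {
    "input": 0.05,   # hypothetical mid-tier input rate
    "output": 0.10,  # output is typically priced higher than input
}

def request_cost(prompt_tokens: int, completion_tokens: int,
                 rates: dict = RATE_CNY_PER_1K) -> float:
    """Cost of one API call in CNY from its reported token counts."""
    return (prompt_tokens / 1000 * rates["input"]
            + completion_tokens / 1000 * rates["output"])

# A 600-token prompt that generates a 200-token response:
print(f"{request_cost(600, 200):.4f} CNY")
```

Because output carries the higher rate, the 200 generated tokens here cost two-thirds as much as the 600-token prompt — a ratio worth keeping in mind for generation-heavy workloads.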

The cost varies across the model catalog. The small-tier variants carry the lowest rate and suit high-volume workloads where quality requirements are moderate — classification, entity extraction, short summarisation. The mid-tier covers the majority of production use cases at a rate that most teams find acceptable for everyday assistant and document-processing workloads. The flagship tier, covering the latest GLM-4.5+ generation, carries a higher rate that is justified for tasks where quality gains translate directly to user or product outcomes.

Beyond the standard chat-completions billing, the platform has separate rates for the code-specialised and multimodal variants. The code variants are typically priced at a rate close to the mid-tier chat models; multimodal calls that include image tokens are priced at a different schedule that accounts for the vision processing step.

Free trial credit and account activation

Every new BigModel account receives a free trial credit that covers meaningful evaluation volume without requiring a payment method upfront.

The free trial credit is applied automatically when an account is fully verified — email confirmation plus phone SMS. No payment method is needed to activate it. The credit amount is denominated in CNY tokens and is sufficient for a thorough evaluation run across the model catalog: testing the small, mid, and flagship tiers with the same set of prompts, running a few hundred requests through the code variants, and exploring the multimodal surface with a batch of sample images.

Once the trial credit is exhausted, the console surfaces a payment prompt. The platform accepts major international credit cards — Visa, Mastercard, and similar networks — as well as domestic Chinese payment methods. The billing cycle is monthly, and the dashboard shows a running tally of token spend in real time so a team can see exactly when the trial is approaching exhaustion and plan a payment method before it runs out.

Cost classes by model tier

Five model tiers on the BigModel platform each carry distinct input and output token cost classes.

The table below maps each model tier to its relative cost class. The figures here are reference classes rather than live prices — the BigModel platform console shows the current numeric rates, which the Zhipu AI team adjusts periodically. The relative ordering (small cheapest, multimodal highest per image token) has been consistent across recent pricing revisions.

Zhipu AI pricing — model tier vs. input and output token cost classes
| Model tier | Input cost class | Output cost class |
|---|---|---|
| Small (GLM-4-flash, GLM-4-air) | Lowest — suitable for high-volume, latency-sensitive calls | Low — still below mid-tier output even at volume |
| Mid (GLM-4, GLM-4-plus) | Moderate — the practical optimum for most production workloads | Moderate — covers the bulk of assistant and document tasks |
| Flagship (GLM-4.5+) | Higher — justified where quality uplift is measurable in outcomes | Highest in the text-only range — significant for long generations |
| Code (GLM-Coder variants) | Close to mid-tier — code-specialised but not flagship priced | Close to mid-tier — output length for code completions tends to be shorter |
| Multimodal (GLM-4V) | Image tokens billed separately at a higher rate than text | Text output billed at mid-tier equivalent; image token ingestion is the main cost driver |

Payment methods and billing currency

The BigModel platform bills in CNY; charges on international cards are converted at the payment gateway.

Domestic Chinese accounts pay in CNY directly. International accounts pay through a gateway that converts from CNY at the prevailing rate at the time of billing. For procurement teams that need a hard currency budget, the practical approach is to pre-load a CNY credit balance in an amount that represents your expected monthly spend, rather than relying on per-month gateway conversion. Most teams that adopt this approach find their finance team's internal cost-center reconciliation aligns more cleanly with a pre-loaded balance than with a variable monthly card charge in a foreign currency.

The platform does not currently offer contractual USD or EUR invoicing for standard accounts. Enterprise accounts with high volume can negotiate billing terms through the BigModel team directly. For evaluation purposes and moderate-volume production, the gateway conversion path is the standard route and works reliably for most international cards. Budget guidance from ai.gov on responsible AI procurement is a useful starting point for teams working through an internal approval process for ongoing API spend.

Estimating your monthly spend

A simple three-variable formula covers the spend estimate for any GLM API workload.

The three variables are: average request volume per day, average token count per request (input plus output combined), and the per-token rate for your model tier. Multiply them together and scale to a month. For a team sending 500 requests per day with an average of 800 tokens per request on the mid-tier, the daily token volume is 400,000 tokens (roughly 12 million tokens per month), which at the mid-tier cost class resolves to a modest, predictable monthly figure. The flagship tier at the same volume would be roughly two to three times higher.
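The three-variable formula can be written out directly. The 0.08 CNY per 1,000 tokens used below is a placeholder blended mid-tier rate, chosen only to make the arithmetic concrete; substitute the current console rate for your tier.

```python
def monthly_spend(requests_per_day: int, tokens_per_request: int,
                  rate_cny_per_1k: float, days: int = 30) -> float:
    """Estimated monthly spend in CNY from the three calibration variables."""
    daily_tokens = requests_per_day * tokens_per_request
    return daily_tokens / 1000 * rate_cny_per_1k * days

# The example from the text: 500 requests/day at 800 tokens each is
# 400,000 tokens/day. 0.08 CNY per 1k tokens is a placeholder rate.
print(f"{monthly_spend(500, 800, 0.08):.2f} CNY/month")
```

To compare tiers, rerun with the flagship rate: at two to three times the mid-tier rate, the same volume yields two to three times the monthly figure.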

The most common calibration mistake is underestimating output length. Prompts that produce long-form outputs — summaries of long documents, multi-step reasoning traces, or code generation with explanation — can consume three to five times more output tokens than the prompt itself. Running a batch of 100 representative requests through the API and checking the token counts in the BigModel dashboard is a faster path to an accurate estimate than modelling from word counts alone.
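A calibration pass over a representative batch reduces to averaging the reported token counts. The records below are hypothetical stand-ins for the usage figures read from the dashboard (or from the API response's usage field) after running the test batch.

```python
# Averaging token counts from a calibration batch. These records are
# hypothetical stand-ins for the per-request usage figures collected
# after running ~100 representative requests.

records = [
    {"prompt_tokens": 420, "completion_tokens": 1310},
    {"prompt_tokens": 380, "completion_tokens": 1520},
    {"prompt_tokens": 455, "completion_tokens": 1205},
]

avg_in = sum(r["prompt_tokens"] for r in records) / len(records)
avg_out = sum(r["completion_tokens"] for r in records) / len(records)

# Output tokens dominating input is the typical pattern for
# generation-heavy workloads, per the 3-5x range noted above.
print(f"avg input: {avg_in:.0f}, avg output: {avg_out:.0f}, "
      f"ratio: {avg_out / avg_in:.1f}x")
```

Feed `avg_in + avg_out` into the monthly-spend formula as the per-request token count; the measured ratio also tells you whether the higher output rate will dominate your bill.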

Zhipu AI pricing frequently asked questions

Five questions on billing structure, cost tiers, payment methods, the free trial, and spend estimation.

How does Zhipu AI pricing work?

Zhipu AI pricing on the BigModel platform is per-token and usage-based. Input tokens and output tokens are billed at separate rates, with output typically priced higher. A free trial credit is applied automatically once the account is verified, without requiring a payment method upfront.

Is there a free tier for the Zhipu AI API?

Every new BigModel account receives a free trial credit that activates automatically after full verification. Once the trial credit is exhausted, a payment method is required to continue generating tokens through the API.

Which Zhipu AI model tier is cheapest?

The small-tier GLM variants — GLM-4-flash and GLM-4-air — carry the lowest per-token cost class. They suit classification, summarisation, and shorter-context workloads where the quality difference versus the flagship tier is not material to the product outcome.

How do I estimate my monthly Zhipu AI API spend?

Multiply your expected daily request volume by the average token count per request, then by the per-token rate for your chosen model tier. Run 100 representative requests first to calibrate average token length — output tokens are often longer than expected and dominate the cost for generation-heavy workloads.

What payment methods does BigModel accept?

The BigModel platform accepts major international credit-card networks alongside domestic Chinese payment options. Billing is denominated in CNY; international cards are processed through a gateway that converts at the prevailing rate. Pre-loading a credit balance in CNY is the recommended approach for teams that need predictable foreign-currency spend.

Zhipu AI pricing in the broader platform context

How the pricing reference connects to adjacent pages on billing, model selection, and API access.

Understanding Zhipu AI pricing is most useful when read alongside the BigModel AI platform reference, which explains how the billing dashboard surfaces usage data, and the API reference, which shows how token counts are reported in the response object. Choosing the right model tier for a workload is the single highest-leverage pricing decision — the GLM model family overview maps capability to tier so the trade-off is visible before the first paid call. Teams that want to reduce spend through self-hosted inference can consult the zhipuai download reference for the open-weight path. The Z.ai vs ChatGPT comparison puts Zhipu AI pricing in a competitive context for teams evaluating multiple providers simultaneously.