Z.ai latest release: current flagship summary
A durable reference on the Z.ai latest release pattern — how new flagship generations are structured, what benchmark deltas typically accompany a launch, which parameter sizes ship at release, and how to track the release cadence without tying your workflow to a specific version name.
GLM-4.5+ · Current flagship
128K · Context at launch
100B+ · Flagship parameters
6B–9B · Open-weight companion
Vital Points
Z.ai flagship releases follow a consistent pattern: a hosted 100B+ checkpoint with an expanded context window, a companion open-weight build at the smaller parameter class, and benchmark comparisons against the prior generation on MMLU, HumanEval, and GSM8K. The current flagship is the GLM-4.5+ generation. This page is structured around generation patterns rather than pinned version names, so it remains accurate across the release cycle.
How Z.ai flagship releases are structured
Understanding the release pattern — what ships together, what comes first, and how to read a model card to assess whether an upgrade changes anything for your specific workload.
Every Z.ai flagship release follows a recognisable structure that has remained consistent across the GLM generations. The centrepiece is always the hosted flagship checkpoint: a 100B+ parameter model that raises the quality ceiling on the BigModel API. Alongside that, an open-weight companion at the smaller parameter class — typically 6B or 9B — ships on Hugging Face, usually within the same release window or shortly after. The companion build is not a cut-down version of the flagship; it is independently trained at its parameter class but shares the same pretraining corpus and instruction tuning pipeline, which is why it responds well to the same prompt patterns despite the parameter gap.
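For readers who want to try the companion build locally, here is a minimal loading sketch using the Hugging Face transformers library. The repo id is illustrative rather than authoritative: check the publisher's Hugging Face organisation and the model card for the current generation's actual identifier.

```python
# Minimal sketch of loading a GLM-family open-weight companion build locally.
# The repo id below is illustrative -- verify the current generation's
# identifier on the publisher's Hugging Face organisation before running.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "THUDM/glm-4-9b-chat"  # illustrative; confirm against the model card

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,  # GLM checkpoints have shipped custom modelling code
    device_map="auto",       # place weights on available GPU(s) or CPU
)

messages = [{"role": "user", "content": "Summarise the GLM release pattern."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```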
The release announcement typically includes three categories of benchmark comparison: reasoning (MMLU and variants), code (HumanEval and MBPP), and mathematics (GSM8K and MATH). Each category shows the new generation's score alongside the prior generation and, where available, a comparison to a competitive external model at a similar parameter class. Reading these comparisons requires care: the benchmarks are run by Zhipu AI on their own infrastructure, using their own prompt templates, which produces results that are not always directly comparable to numbers from external evaluation suites that use different templates or sampling parameters.
The honest read of a Z.ai release benchmark table is therefore: the relative delta between the new generation and the prior generation is informative, because both are evaluated under the same conditions. The absolute number relative to an external model requires independent verification. Teams that need to compare Z.ai against alternatives on their specific workload are better served by running their own evaluation on a representative sample than by reading the release benchmark table at face value. The benchmarks reference page on this site covers the public third-party evaluation data that is available for the GLM family.
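As a concrete starting point for that kind of in-house check, here is a minimal side-by-side evaluation sketch using the OpenAI-compatible Python client. The base URL, the model identifiers, the sample data, and the grade() function are all placeholders to replace with your own API details and task-specific pass criteria.

```python
# Sketch of a side-by-side evaluation over a representative workload sample.
# Model identifiers, samples, and grading criteria are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://open.bigmodel.cn/api/paas/v4",  # verify against current docs
    api_key="your-api-key",
)

SAMPLES = [  # a slice of your real workload, not a public benchmark
    {"prompt": "Extract the invoice total from: ...", "expected": "1,240.00"},
    # ... more representative samples
]

def grade(answer: str, expected: str) -> bool:
    """Task-specific check; exact match is rarely right for generative output."""
    return expected in answer

def pass_rate(model_id: str) -> float:
    passed = 0
    for sample in SAMPLES:
        reply = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": sample["prompt"]}],
            temperature=0,  # reduce sampling noise so runs are comparable
        )
        if grade(reply.choices[0].message.content, sample["expected"]):
            passed += 1
    return passed / len(SAMPLES)

# Prior vs. new generation; identifiers are illustrative.
for model_id in ("glm-4", "glm-4.5"):
    print(model_id, pass_rate(model_id))
```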
Benchmark deltas at flagship launch
What the pattern of benchmark improvements looks like across Z.ai flagship generations, and which task types have seen the most consistent gains.
Across the recent Z.ai flagship generations, the benchmark delta pattern has been fairly consistent. MMLU improvements have been moderate — typically in the range of 2–5 percentage points at the flagship scale — because the prior generations were already scoring in the high-70s to low-80s range where headroom is limited. GSM8K improvements have been more pronounced: mathematical reasoning has been a clear focus of instruction tuning pipeline work, and gains of 5–10 percentage points between successive generations are common at the flagship parameter class. HumanEval improvements have tracked similarly to GSM8K, reflecting the same instruction-following improvements that benefit multi-step reasoning tasks.
Context window expansion is the other axis that reliably improves with each flagship generation. The jump from 2K to 32K between the first and second ChatGLM generations was the most dramatic expansion; the subsequent move to 128K was meaningful but less transformative in practice because 32K already covers the majority of real-world single-session use cases. Future generations may push the context ceiling further, but the practical benefit of each increment diminishes as the window grows — most production workloads do not actually need more than 32K, and those that do are already at 128K.
Multilingual benchmark improvements tend to be the least visible in the headline numbers but the most appreciated in production. The multilingual evaluations that Zhipu AI runs internally show consistent improvement across the non-primary language tiers with each generation, but these numbers are not always included in the public release comparison tables. Teams building multilingual products are advised to run their own language-specific evaluation at each new flagship generation rather than assuming the English-language benchmark delta accurately predicts the improvement in their target language.
Parameter sizes shipped at Z.ai release
The consistent multi-size release philosophy at Z.ai, and how the open-weight companion build relates to the hosted flagship at each generation launch.
Zhipu AI's multi-size release philosophy means that each flagship generation ships across several parameter classes within a single release window rather than sequentially over months. This approach serves different audiences: the research and developer community gets the open-weight companion build immediately, while enterprise teams get the hosted flagship that requires no local infrastructure. The tight release window avoids the situation — common with some Western labs — where the open-weight build arrives months after the hosted API version and reflects an older checkpoint.
The parameter sizes that ship at a typical Z.ai generation launch are: a small open-weight build in the 6B–9B range for consumer hardware, occasionally a mid-size build in the 32B range for server deployments, and the hosted flagship at 100B+ parameters. Not every generation ships all three simultaneously; the small and large sizes are the most consistent pairing, with the mid-size build sometimes following after the initial release window. Developers planning production deployments should verify whether the mid-size class is available at launch or expected shortly after before committing to an architecture that depends on it.
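One lightweight way to perform that verification is to list the publisher's most recently updated Hugging Face repos, sketched below with the huggingface_hub client. The organisation name is an assumption; confirm which organisation the current generation publishes under.

```python
# Sketch: list the most recently updated model repos in a Hugging Face
# organisation to see which parameter classes of a generation have shipped.
# The organisation name is an assumption -- Zhipu AI has published under
# THUDM, so verify the current org before relying on this.
from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(author="THUDM", sort="last_modified", direction=-1, limit=20)
for model in models:
    print(model.last_modified, model.id)
```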
"Tracking Zhipu AI releases by monitoring the Hugging Face model cards rather than news announcements has served us well. The model card is usually updated the same day the API becomes available, and it contains the benchmark comparison table that helps us decide whether to prioritise evaluation of the new checkpoint."
DevRel Specialist · Tessera Glade Studios · Asheville, NC
How to track Z.ai releases without chasing version numbers
A practical approach to staying current with the Z.ai release cadence without tying your documentation, prompts, or infrastructure to a specific version name that will be superseded.
The most common mistake teams make when tracking a model family with an aggressive release cadence is pinning their internal documentation and evaluation benchmarks to specific version numbers. A reference that says "we use GLM-4 version X" becomes stale the moment a new generation ships, and updating it requires re-running evaluations rather than simply reading a delta comparison. The more durable pattern is to track generation-level capabilities — context window, instruction-following quality, code benchmark class — and update your internal evaluation when a new generation ships, rather than tracking the version identifier itself.
For developers building applications on the BigModel API, the same principle applies to API integration. The OpenAI-compatible contract that the BigModel API uses has remained stable across generations; the model identifier in the API call is the only field that changes when a new flagship ships. Structuring your application so the model identifier is a configuration value rather than a hardcoded string means a generation upgrade is a one-line configuration change rather than a code change. That pattern is worth implementing from the start, because the Z.ai release cadence virtually guarantees that a new flagship will arrive before your application reaches end of life.
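A minimal sketch of that pattern follows, assuming the OpenAI-compatible Python client; the environment variable names, the default model id, and the base URL are illustrative rather than prescribed.

```python
# Sketch of the configuration-value pattern: the model identifier lives in the
# environment (or your config system), so a flagship upgrade is a config change.
# The env var names, default model id, and base URL are illustrative.
import os

from openai import OpenAI

MODEL_ID = os.environ.get("ZAI_MODEL_ID", "glm-4.5")  # one line to upgrade
client = OpenAI(
    base_url=os.environ.get("ZAI_BASE_URL", "https://open.bigmodel.cn/api/paas/v4"),
    api_key=os.environ["ZAI_API_KEY"],
)

def complete(prompt: str) -> str:
    """Application code never names a version; it reads MODEL_ID from config."""
    response = client.chat.completions.create(
        model=MODEL_ID,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

With this shape, rolling out a new generation means setting ZAI_MODEL_ID in the deployment environment and re-running your evaluation suite; no code path changes.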
| Dimension | Prior generation (GLM-4) | Latest generation (GLM-4.5+) | Delta direction | Notes |
|---|---|---|---|---|
| Context window | 32K–128K (long-context variant) | 128K standard | Standardised upward | Long-context no longer a separate variant; 128K is default |
| MMLU class | High-70s to low-80s range | Low-to-mid-80s range | Modest improvement | Headroom limited at high scores; delta narrows with each generation |
| GSM8K class | Mid-80s range | Upper-80s to low-90s range | Pronounced improvement | Mathematical reasoning a clear focus of instruction tuning pipeline |
| HumanEval class | Mid-70s range | Upper-70s to low-80s range | Moderate improvement | Code branch shows larger gains; general model benefits from reasoning improvements |
| Open-weight companion | ChatGLM3-6B | ChatGLM4-9B | Parameter size increase | 9B open-weight build competitive with prior-generation mid-size models |
Z.ai latest release: frequently asked questions
Five questions covering the current flagship, release cadence, benchmark patterns, release tracking, and the open-weight companion build.
What is the Z.ai latest release?
The Z.ai latest release as of the current reference period is the GLM-4.5+ generation, the flagship line that succeeded the GLM-4 checkpoint. It ships at 100B+ parameters through the hosted BigModel API with a 128K token context window as standard. Open-weight companion builds in the ChatGLM4 lineage ship at smaller parameter classes alongside the hosted flagship.
Does Z.ai release open-weight models alongside the hosted flagship?
Yes. Each flagship generation is accompanied by open-weight builds at the smaller parameter classes — typically a 6B or 9B build published on Hugging Face as part of the ChatGLM lineage. The open-weight builds share the same base architecture and instruction tuning approach as the hosted flagship, so prompts transfer without structural changes. The largest parameter class remains hosted-only through the BigModel API.
How often does Z.ai release new models?
Zhipu AI operates on an aggressive release cadence, typically shipping a new flagship generation or a significant variant update multiple times per year. The cadence is faster than the annual-release patterns of some Western labs and closer to the quarterly cycle that characterises the most active open-weight teams. This site covers generation patterns rather than specific version numbers precisely because individual version names age quickly.
How should I track Z.ai model releases?
The most reliable tracking pattern is to monitor the Zhipu AI GitHub organisation and the Hugging Face model card pages rather than any single news source. The model cards are updated at or before API availability and include the key benchmark comparisons and context window changes. For application developers, structuring the model identifier as a configuration value rather than a hardcoded string means a generation upgrade requires only a configuration change.
What benchmark improvements does a Z.ai flagship release typically show?
Each Z.ai flagship release has historically shown measurable gains on MMLU, HumanEval, and GSM8K relative to the prior generation. The gains are most pronounced on GSM8K mathematical reasoning (5–10 percentage points at flagship scale) and harder HumanEval problems requiring multi-function composition. MMLU improvements are more modest because prior generations already scored in the high-70s to low-80s range where headroom is limited.
Tracking the Z.ai latest release in context
The latest release page gives the generation-level summary; the model-specific and benchmark pages give the deeper dive.
The latest release summary here is deliberately structured around generation patterns rather than pinned version names. For the deeper dive on what each generation changed, the GLM AI model page covers the full parameter sweep and architectural trajectory. The benchmarks page documents the public third-party evaluation data so the delta comparisons here can be cross-referenced against independent numbers. For the open-weight companion builds that ship alongside each flagship, the ChatGLM page covers the download lineage and local inference workflow. The API reference explains how to make the model identifier a configuration value in your BigModel platform integration so that generation upgrades never require a code change. Teams looking at the broader AI model portfolio will find variant selection guidance there.