Artificial intelligence

Zhipu AI Releases GLM-4.7-Flash: A 30B-A3B MoE Model for Efficient Coding and Agents

GLM-4.7-Flash is a new member of the GLM-4.7 family, aimed at developers who want strong coding and reasoning capability that they can run in their own environment. Zhipu AI (Z.ai) describes GLM-4.7-Flash as a 30B-A3B MoE model and positions it as the strongest model in the 30B class, designed for lightweight deployment where performance and efficiency are paramount.

Model category and position within the GLM-4.7 family

GLM-4.7-Flash is a text generation model with 31B parameters, BF16 and F32 tensor types, and the architecture tag glm4_moe_lite. It supports English and Chinese and is optimized for chat-style use. GLM-4.7-Flash sits in the GLM-4.7 collection alongside the larger GLM-4.7 and GLM-4.7-FP8 models.
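
For orientation, the checkpoint should load with the standard Transformers pattern. The following is a minimal sketch; the repository id `zai-org/GLM-4.7-Flash` is an assumption based on the collection naming, not a confirmed identifier.

```python
# Minimal loading sketch with Hugging Face Transformers.
# The repo id below is an assumption based on the collection naming.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zai-org/GLM-4.7-Flash"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint ships BF16 tensors
    device_map="auto",           # shard across available devices
)
```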

Z.ai positions GLM-4.7-Flash as the free-tier, lightweight deployment option relative to the full GLM-4.7 model, while still handling coding, reasoning, and general text generation tasks. This makes it attractive to developers who cannot run a 358B-class model but still want a modern MoE design and strong benchmark results.

Architecture and context length

In a Mixture of Experts architecture of this type, the model stores far more parameters than it activates for any single token: roughly 31B parameters in total, with only a few billion active per token, which is what the A3B designation refers to. That allows specialization across experts while keeping the compute cost per token close to that of much more compact dense models.
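
To make the total-versus-active distinction concrete, here is a toy top-k router in Python. The expert count and dimensions are illustrative, not GLM-4.7-Flash's real configuration.

```python
# Toy top-k MoE layer: many experts are stored, but only k of them run
# for each token, so compute per token stays close to a small dense model.
# Sizes here are illustrative, not GLM-4.7-Flash's real configuration.
import torch
import torch.nn.functional as F

num_experts, k, d = 64, 4, 16
router = torch.nn.Linear(d, num_experts)
experts = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(num_experts))

def moe_forward(x):                          # x: (tokens, d)
    scores = router(x)                       # route each token to experts
    weights, idx = scores.topk(k, dim=-1)    # pick top-k experts per token
    weights = F.softmax(weights, dim=-1)     # normalize mixing weights
    out = torch.zeros_like(x)
    for t in range(x.size(0)):               # naive per-token dispatch for clarity
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[int(e)](x[t])
    return out

print(moe_forward(torch.randn(2, d)).shape)  # torch.Size([2, 16])
```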

GLM-4.7-Flash supports a context length of 128k tokens and achieves strong performance on code benchmarks among models of the same scale. This context size is suitable for large codebases, multi-file repositories, and long technical documents, where many existing models would require aggressive truncation or chunking.

GLM-4.7-Flash exposes a standard causal language model interface and chat template, which allows integration into existing LLM stacks with minimal changes.
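
Because the model ships a chat template, prompt construction can follow the usual Transformers pattern. This sketch reuses the tokenizer and model from the loading example above.

```python
# Sketch: chat-style prompting through the bundled chat template,
# reusing the tokenizer/model loaded in the earlier sketch.
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that reverses a string."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```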

Benchmark performance in the 30B class

The Z.ai team compares GLM-4.7-Flash with Qwen3-30B-A3B-Thinking-2507 and GPT-OSS-20B. GLM-4.7-Flash leads or stays competitive across math, reasoning, long-horizon agent, and coding benchmarks.

This comparison shows why GLM-4.7-Flash counts as one of the strongest models in the 30B class, at least among the models included in it. The important point is that GLM-4.7-Flash is not just a lighter deployment option within the GLM lineup, but a model that holds up on standardized coding and agent benchmarks.

Evaluation parameters and thinking mode

For most tasks, the default settings are temperature 1.0, top-p 0.95, and max new tokens 131072. This defines an open sampling regime with a large generation budget.

On Terminal-Bench and SWE-bench Verified, the configuration uses temperature 0.7, top-p 1.0, and max new tokens 16384. For τ²-Bench, the configuration uses temperature 0 and a maximum of 16384 new tokens. These tighter settings reduce randomness for tasks that require consistent tool use and multi-step interactions.
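
Mapped onto Transformers `generate()` arguments, the documented settings would look roughly like this, reusing `model` and `inputs` from the earlier sketches.

```python
# The documented evaluation settings, expressed as generation kwargs.
default_cfg = dict(do_sample=True, temperature=1.0, top_p=0.95,
                   max_new_tokens=131072)   # most tasks
agentic_cfg = dict(do_sample=True, temperature=0.7, top_p=1.0,
                   max_new_tokens=16384)    # Terminal-Bench, SWE-bench Verified
greedy_cfg  = dict(do_sample=False,
                   max_new_tokens=16384)    # τ²-Bench (temperature 0)

out = model.generate(inputs, **agentic_cfg)
```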

The Z.ai team also recommends turning on preserved thinking mode for many agent tasks such as τ²-Bench and Terminal-Bench 2. This mode keeps the model's internal reasoning trace across turns instead of discarding it. That is useful when building agents that depend on long chains of tool calls and persistent state.
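
One way to approximate this behavior in a custom agent loop is simply to keep the reasoning content in the running message history rather than stripping it between turns. The sketch below uses a hypothetical `call_model` helper, and the `reasoning_content` field is an assumed message shape, not Z.ai's documented schema.

```python
# Sketch of the idea behind preserved thinking: keep the model's
# reasoning content in the running history instead of stripping it
# between turns. "reasoning_content" is an assumed field name.
history = []

def run_turn(user_msg, call_model):
    history.append({"role": "user", "content": user_msg})
    reply = call_model(history)  # hypothetical helper returning answer + reasoning
    history.append({
        "role": "assistant",
        "content": reply["answer"],
        "reasoning_content": reply.get("reasoning", ""),  # preserved across turns
    })
    return reply["answer"]
```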

How GLM-4.7-Flash fits into a developer's workflow

GLM-4.7-Flash brings together several features that matter for agentic, coding-focused applications:

  • 30B-A3B MoE architecture with 31B total parameters and 128k context length.
  • Strong benchmark results on AIME 25, GPQA, SWE-bench Verified, τ²-Bench, and BrowseComp compared to the other models in the same comparison.
  • Documented evaluation parameters and a preserved thinking mode for multi-turn agent tasks.
  • First-class support for vLLM, SGLang, and Transformers-based inference, with ready-to-use commands (see the sketch after this list).
  • A growing set of finetunes and quantizations, including MLX conversions, in the Hugging Face ecosystem.
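
As one concrete example of the inference-engine support, offline batch inference with vLLM might look like the following. This is a sketch, again assuming the `zai-org/GLM-4.7-Flash` repo id.

```python
# Sketch: offline batch inference with vLLM, assuming the repo id
# used in the earlier examples.
from vllm import LLM, SamplingParams

llm = LLM(model="zai-org/GLM-4.7-Flash", max_model_len=131072)
params = SamplingParams(temperature=0.7, top_p=1.0, max_tokens=16384)
outputs = llm.generate(
    ["Explain mixture-of-experts routing in two sentences."], params
)
print(outputs[0].outputs[0].text)
```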



Michal Sutter is a data science expert with a Master of Science in Data Science from the University of Padova. With a strong foundation in statistical analysis, machine learning, and data engineering, Michal excels at turning complex data sets into actionable insights.
