The Qwen Team Releases Qwen3-Coder-Next: An Open-Weight Language Model Designed Specifically for Coding and Development Agents

The Qwen team recently released Qwen3-Coder-Next, an open-weight language model designed for coding agents and local development. Built on the backbone of Qwen3-Next-80B-A3B, the model uses a Mixture-of-Experts (MoE) architecture with hybrid attention. It has 80B total parameters, but only 3B parameters are activated per token. The goal is to match the performance of models with far more active parameters while keeping compute costs low for long coding sessions and agentic workflows.

The model targets coding agents, browser-based tools, and IDE copilots rather than simple code completion. Qwen3-Coder-Next is trained on a large corpus of executable tasks with reinforcement learning, so it learns to plan, invoke tools, run code, and recover from runtime failures across long horizons.

Architecture: Hybrid Attention Plus Sparse MoE

The research team describes the model as a hybrid architecture that combines Gated DeltaNet, Gated Attention, and MoE.

The key specifications are:

  • Type: causal language model, pre-trained and post-trained.
  • Parameters: 80B total, 79B non-embedding.
  • Active parameters: 3B per token.
  • Layers: 48.
  • Hidden dimension: 2048.
  • Structure: 12 repetitions of 3 × (Gated DeltaNet → MoE) followed by 1 × (Gated Attention → MoE).

The Gated Attention block uses 16 query heads and 2 key-value heads with a head dimension of 256 and a rotary position embedding of dimension 64. The Gated DeltaNet block uses 32 linear attention heads for values and 16 for queries and keys, with a head dimension of 128.

The MoE layer has 512 experts, with 10 routed experts and 1 shared expert active per token. Each expert has an intermediate size of 512. This design provides high specialization capacity while keeping the active compute close to that of a dense 3B model.
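To make the arithmetic concrete, the short sketch below reconstructs the published layout. It is a back-of-the-envelope illustration, not official code, and the block names are shorthand for the layer types listed above:

```python
# Back-of-the-envelope sketch of the published layout (not official code).
# 12 repetitions of [3 x (Gated DeltaNet -> MoE), 1 x (Gated Attention -> MoE)]
# yields the 48 layers reported on the model card.
pattern = (["gated_deltanet"] * 3 + ["gated_attention"]) * 12
assert len(pattern) == 48

# Sparse-MoE activation: 10 routed experts plus 1 shared expert fire
# per token, out of 512 experts in each MoE layer.
experts_total, experts_active = 512, 10 + 1
print(f"Expert activation ratio: {experts_active / experts_total:.1%}")  # ~2.1%

# Most expert weights stay idle on any given step, which is why an
# 80B-total model can run with only ~3B active parameters per token.
```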

Agentic Training: Executable Tasks and RL

The Qwen team describes Qwen3-Coder-Next as trained at scale on top of Qwen3-Next-80B-A3B-Base. The training pipeline combines large-scale executable tasks, environment interaction, and reinforcement learning.

The team highlights 800K verified tasks with executable environments used during training. These tasks provide grounded signals for long-horizon reasoning, tool sequencing, test execution, and recovery from failed runs. This corresponds to a SWE-Bench-style workflow rather than pure static code modeling.
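To make that training signal concrete, here is a purely illustrative sketch of an execution-grounded reward of the kind such a pipeline implies. This is not the Qwen team's training code; the apply-patch-then-run-tests scoring is an assumption for illustration:

```python
import subprocess

def execution_reward(repo_dir: str, patch: str) -> float:
    """Illustrative execution-grounded reward (assumption, not Qwen's code):
    apply a model-generated patch, run the test suite, score the result."""
    # A patch that does not apply cleanly is penalized outright.
    applied = subprocess.run(
        ["git", "-C", repo_dir, "apply", "-"],
        input=patch, text=True, capture_output=True,
    )
    if applied.returncode != 0:
        return -1.0
    # The reward hinges on whether the repository's tests actually pass.
    tests = subprocess.run(
        ["python", "-m", "pytest", "-q"],
        cwd=repo_dir, capture_output=True,
    )
    return 1.0 if tests.returncode == 0 else 0.0
```

A reward of this shape is what lets reinforcement learning favor trajectories that recover from failed runs, rather than ones that merely produce plausible-looking diffs.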

Benchmarks: SWE-Bench, Terminal-Bench, and Aider

On SWE-Bench Verified using the SWE-Agent framework, Qwen3-Coder-Next scores 70.6; DeepSeek-V3.2, at 671B parameters, gets 70.2, and GLM-4.7, at 358B parameters, gets 74.2. On SWE-Bench Multilingual, Qwen3-Coder-Next scores 62.8, very close to DeepSeek-V3.2 at 62.3 and GLM-4.7 at 63.7. On the more challenging SWE-Bench Pro, Qwen3-Coder-Next scores 44.3, ahead of DeepSeek-V3.2 at 40.9 and GLM-4.7 at 40.6.

On Terminal-Bench 2.0 with the Terminus-2 JSON scaffold, Qwen3-Coder-Next scores 36.2, remaining competitive with much larger models. On the Aider benchmark, it reaches 66.2, close to the best models in its class.

These results support the Qwen team's claim that Qwen3-Coder-Next achieves performance comparable to models with 10–20× more active parameters, especially in coding and agentic settings.

Tool Usage and Agent Integration

Qwen3-Coder-Next ships with tool calling enabled and integrates with coding agents. The model is designed to plug into IDE and CLI environments such as Qwen Code, Claude Code, Cline, and other agent front ends. The native 256K-token context window lets these programs keep large codebases, logs, and dialogue history in a single session.

Qwen3-Coder-Next supports only non-thinking mode. Both the official model card and the Unsloth documentation emphasize that it does not generate <think> blocks. This simplifies integration with agents that expect direct calls and responses without hidden reasoning segments.

Usage: SGLang, vLLM, and Local GGUF

For server deployment, the Qwen team recommends SGLang and vLLM. With SGLang, users run sglang>=0.5.8 with --tool-call-parser qwen3_coder and a default context length of 256K tokens. With vLLM, users run vllm>=0.15.0 with --enable-auto-tool-choice and the same tool parser. Both setups expose an OpenAI-compatible /v1 endpoint.
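As a minimal client-side sketch, the snippet below talks to either server through the OpenAI Python SDK. The host, port, served model name, and the run_tests tool definition are all assumptions for illustration, not values from the Qwen documentation:

```python
from openai import OpenAI

# Point the SDK at the locally served OpenAI-compatible /v1 endpoint
# (host/port are assumptions; match whatever your server prints at startup).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A hypothetical tool, just to exercise the qwen3_coder tool-call parser.
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return the output.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-Next",  # assumed served model name
    messages=[{"role": "user", "content": "Run the tests under ./src and fix any failure."}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```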

For local use, Unsloth provides GGUF quantizations of Qwen3-Coder-Next and complete workflows for llama.cpp and llama-server. The 4-bit quantization requires about 46 GB of RAM or unified memory, while the 8-bit requires about 85 GB. The Unsloth guide recommends context sizes of up to 262,144 tokens, with 32,768 tokens as the practical default on smaller machines.

The Unsloth guide also shows how to connect Qwen3-Coder-Next to local agents that emulate OpenAI Codex and Claude Code. These examples rely on llama-server exposing an OpenAI-compatible interface and reuse the agents' existing prompt templates while swapping the model name to Qwen3-Coder-Next.
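A minimal sketch of that pattern, assuming a llama-server instance running on its default port with the GGUF weights loaded (the model name and prompt are placeholders):

```python
from openai import OpenAI

# llama-server exposes an OpenAI-compatible /v1 API; 8080 is its default port.
local = OpenAI(base_url="http://localhost:8080/v1", api_key="EMPTY")

reply = local.chat.completions.create(
    model="Qwen3-Coder-Next",  # the guide swaps in this name for existing agents
    messages=[{"role": "user", "content": "Refactor this function for clarity."}],
)
print(reply.choices[0].message.content)
```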

Key Takeaways

  • Sparse MoE with low active compute: Qwen3-Coder-Next packs 80B total parameters into a sparse MoE structure, but only 3B parameters are active per token, which reduces inference cost while preserving the high capacity of specialized experts.
  • Hybrid attention stack for long-horizon coding: The model uses a hybrid architecture of Gated DeltaNet, Gated Attention, and MoE blocks across 48 layers with a hidden size of 2048, optimized for long-horizon reasoning in coding and agentic workflows.
  • Agentic training with executable tasks and RL: Qwen3-Coder-Next is trained on large-scale executable tasks with reinforcement learning on top of Qwen3-Next-80B-A3B-Base, so it can plan, run tools, run tests, and recover from failures instead of just completing short code snippets.
  • Competitive performance on SWE-Bench and Terminal-Bench: Benchmarks show Qwen3-Coder-Next achieving strong scores on SWE-Bench Verified, SWE-Bench Pro, SWE-Bench Multilingual, Terminal-Bench 2.0, and Aider, often matching or outperforming larger MoE models with 10–20× more active parameters.
  • Practical deployment for agents and local use: The model supports a 256K-token context window, non-thinking mode, OpenAI-compatible APIs through SGLang and vLLM, and GGUF quantizations for llama.cpp, making it suitable for IDE agents, CLI tools, and private local coding setups under Apache-2.0.
