Qwen3 was apparently posted early, then quickly pulled from HuggingFace and Modelscope. Per screenshots from Reddit, the large ones are MoEs, including a 235B model with 22B active parameters and a 30B model with 3B active.
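For context on those two numbers: an MoE stores many FFN experts per layer but only routes each token through a few, so the "active" count is far smaller than the total. A back-of-the-envelope sketch; all shapes and counts below are illustrative assumptions picked to land near the leaked headline figures, not values from any real Qwen3 config:

```python
# Rough MoE parameter arithmetic: why "235B total / 22B active" is plausible.
# Every number here is an illustrative assumption, not a leaked Qwen3 config.

def moe_params(n_layers: int, d_model: int, d_ff: int,
               n_experts: int, top_k: int, vocab: int = 150_000):
    """Crude estimate ignoring norms, biases, and router weights."""
    attn = n_layers * 4 * d_model * d_model             # Q, K, V, O projections
    expert = 3 * d_model * d_ff                         # gated FFN: gate, up, down
    embed = 2 * vocab * d_model                         # input + output embeddings
    total = attn + n_layers * n_experts * expert + embed
    active = attn + n_layers * top_k * expert + embed   # only top-k experts run per token
    return total, active

# Hypothetical shapes chosen to land near the leaked headline numbers:
total, active = moe_params(n_layers=94, d_model=4096, d_ff=1536,
                           n_experts=128, top_k=8)
print(f"total ~{total / 1e9:.0f}B, active ~{active / 1e9:.0f}B")
# -> total ~235B, active ~22B
```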
Context appears to ‘only’ be 32K, unfortunately: https://huggingface.co/qingy2024/Qwen3-0.6B/blob/main/config_4b.json
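For anyone who wants to verify rather than trust a screenshot, the context window is readable straight from the config file linked above. A minimal sketch, assuming the repo and filename from that URL are still up and that the config follows the usual Qwen-style field names:

```python
# Minimal sketch: fetch the linked config and read the context window.
# Assumes the repo/file from the URL above still exist, which they may not.
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="qingy2024/Qwen3-0.6B", filename="config_4b.json")
with open(path) as f:
    cfg = json.load(f)

# 32768 here would confirm the 32K claim.
print(cfg.get("max_position_embeddings"))
```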
But it’s possible they’re still training them to 256K.
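If they do extend context after release, Qwen2.5 did exactly that via YaRN-style RoPE scaling declared in the config, so a 32K-to-256K stretch for Qwen3 might be declared something like the following. Purely illustrative values, not anything from the leak:

```python
# Hypothetical, illustrative only: Qwen2.5 extends context with a YaRN
# rope_scaling block in config.json; a 32K -> 256K stretch could look like this.
rope_scaling = {
    "type": "yarn",
    "factor": 8.0,                              # 32768 * 8 = 262144 (~256K)
    "original_max_position_embeddings": 32768,
}
```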
Take it all with a grain of salt; configs could change with the official release, but it appears the launch is happening today.
Seems that there are both dense and sparse models with this launch, like the 1.5 release. This “leak”, for instance, references what appears to be a real Qwen3 32B: