Alibaba's Qwen3.5-Omni Teaches Itself to Code From Video — Without Being Trained To
Alibaba has released Qwen3.5-Omni, a fully multimodal model that processes text, images, audio, and video in a single architecture. The model outperforms Gemini 3.1 Pro on audio benchmarks. It also unexpectedly developed the ability to write code directly from spoken instructions and video input, a capability the training pipeline never explicitly targeted.