Speech AI

6 articles tagged "Speech AI"

Alibaba's Qwen3.5-Omni Teaches Itself to Code From Video — Without Being Trained To

Alibaba has released Qwen3.5-Omni, a fully multimodal model that processes text, images, audio, and video in a single architecture. The model outperforms Gemini 3.1 Pro on audio benchmarks — and unexpectedly developed the ability to write code directly from spoken instructions and video input, a capability the training pipeline never explicitly targeted.

D.O.T.S AI Newsroom · Apr 2, 2026 · 3 min read

Tools

ServiceNow's EVA Framework Exposes a Hidden Tradeoff: Voice AI Systems That Are Accurate Are Often Unpleasant to Talk To

ServiceNow AI has released EVA (Evaluating Voice Agents), the first end-to-end benchmark to jointly score voice agents on both task accuracy and conversational experience — and its initial results across 20 systems reveal a troubling pattern: the architectures that complete tasks reliably tend to deliver worse conversations, and vice versa.

D.O.T.S AI Newsroom · Apr 2, 2026

Breaking

Cohere Releases 'Transcribe': Open-Source Speech Model That Beats Whisper on Every Benchmark

Cohere has released Transcribe, a 2-billion-parameter open-source automatic speech recognition model that tops the Hugging Face Open ASR Leaderboard with a 5.4% word error rate, outperforming OpenAI's Whisper Large v3 and ElevenLabs' Scribe v2. The model is licensed under Apache 2.0 and available on Hugging Face.

D.O.T.S AI Newsroom · Mar 29, 2026

Breaking

Mistral's Voxtral Can Clone Any Voice in 3 Seconds — and It's Fully Open Weight

Mistral has released Voxtral, its first open-weight text-to-speech model, capable of cloning a speaker's voice from just three seconds of audio across nine languages. The release puts Mistral in direct competition with ElevenLabs and OpenAI's voice products — while making the capability freely available to any developer.

D.O.T.S AI Newsroom · Mar 28, 2026

Breaking

Cohere's New Open-Source Model Beats OpenAI Whisper — and Runs on Edge Devices

Cohere has released Transcribe, a 2-billion-parameter open-source speech recognition model that outperforms OpenAI's Whisper across standard benchmarks while being specifically engineered for edge deployment — a combination that could fundamentally shift how enterprise voice AI is built.

D.O.T.S AI Newsroom · Mar 28, 2026

Breaking

Google Rolls Out Gemini 3.1 Flash Live — Making Real-Time Audio AI Feel More Like a Conversation

Google has deployed Gemini 3.1 Flash Live across its product suite, marking a meaningful step forward in real-time audio AI capabilities — with lower latency, more natural turn-taking, and improved reliability on the kind of messy real-world audio that previous versions stumbled on.

D.O.T.S AI Newsroom · Mar 28, 2026