Microsoft Expands Copilot Cowork With Multi-Model AI Verification — One Model Checks Another's Work

Microsoft's Wave 3 Copilot update introduces autonomous workflow handling across Microsoft 365 and a new dual-model 'Researcher' tool in which one AI drafts and a second AI critiques. The system uses both Anthropic and OpenAI models, and internal benchmarks show it scoring 7 points above Perplexity on deep research tasks with Claude Opus 4.6 in the research role.

D.O.T.S AI Newsroom

AI News Desk

Mar 31, 2026 · 2 min read


Microsoft has rolled out the third wave of its Microsoft 365 Copilot expansion. Wave 3 broadens the availability of Copilot Cowork, a feature that lets AI systems autonomously handle multi-step workflows across the Microsoft 365 suite, and introduces a dual-model verification approach for agentic research tasks.

The most technically notable addition in Wave 3 is the Researcher tool, which implements a critique function: one AI model drafts a response, and a separate AI model reviews and challenges it. Microsoft's implementation routes both Anthropic and OpenAI models through the same workflow, so that different stages of a task can draw on different models' strengths. Internal benchmarks published alongside the announcement show the system, with Claude Opus 4.6 in the research role, scoring 7 points higher than Perplexity on deep research evaluation tasks.
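Microsoft has not published code for the Researcher critique function, but the draft-and-critique pattern it describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each model is treated as a plain function from prompt text to response text, the round limit `max_rounds` is an assumption, and `research_with_critique`, `Critique`, `toy_drafter`, and `toy_critic` are hypothetical names standing in for real calls to two vendors' models.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: a "model" is any function that maps a prompt to a response.
# In a real deployment each would wrap an API call to a different vendor.
Model = Callable[[str], str]


@dataclass
class Critique:
    approved: bool
    feedback: str


# Hypothetical critic signature: it sees the question and the current draft.
Critic = Callable[[str, str], Critique]


def research_with_critique(question: str, drafter: Model, critic: Critic,
                           max_rounds: int = 3) -> str:
    """Draft with one model, review with another, revise until approved.

    Returns the first approved draft; if the critic never approves within
    max_rounds reviews, returns the latest revision as-is.
    """
    draft = drafter(question)
    for _ in range(max_rounds):
        review = critic(question, draft)
        if review.approved:
            return draft
        # Send the critic's objections back to the drafting model.
        draft = drafter(
            f"{question}\n\nRevise this draft:\n{draft}\n\n"
            f"Reviewer feedback:\n{review.feedback}"
        )
    return draft


# Toy stand-ins so the sketch runs without any API access.
def toy_drafter(prompt: str) -> str:
    # Adds a citation only when asked to revise.
    return "Answer with source [1]" if "Revise" in prompt else "Answer"


def toy_critic(question: str, draft: str) -> Critique:
    ok = "[1]" in draft
    return Critique(approved=ok, feedback="" if ok else "Cite a source.")


print(research_with_critique("What changed in Wave 3?", toy_drafter, toy_critic))
# Answer with source [1]
```

In this sketch the critic never edits the draft itself; it only returns objections, and the drafting model does the rewriting. Capping the number of rounds keeps two persistently disagreeing models from looping indefinitely.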

What Cowork Actually Does

Cowork handles what Microsoft describes as "complete workflows": not single responses, but sequences of actions involving file access, calendar management, document generation, and cross-application coordination. In practice, a Cowork task might involve pulling data from a SharePoint folder, drafting a briefing document, scheduling a review meeting, and sending a summary to a distribution list, all initiated by a single natural-language prompt.
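To make the idea of a "complete workflow" concrete, here is a minimal Python sketch, assuming a workflow is an ordered list of steps that each read and extend a shared context. The step functions (`fetch_files`, `draft_briefing`, `schedule_review`, `send_summary`) and `run_workflow` are hypothetical stand-ins that return canned values; they do not call SharePoint, Outlook, or any Microsoft API, and Microsoft has not said Cowork is built this way.

```python
from typing import Any, Callable

# Hypothetical: each step takes the shared context and returns an updated copy.
Context = dict[str, Any]
Step = Callable[[Context], Context]


def fetch_files(ctx: Context) -> Context:
    # Stand-in for pulling documents from a SharePoint folder.
    return {**ctx, "files": ["Q1 sales.xlsx", "Pipeline.docx"]}


def draft_briefing(ctx: Context) -> Context:
    # Stand-in for generating a document from the fetched data.
    return {**ctx, "briefing": f"Briefing based on {len(ctx['files'])} files"}


def schedule_review(ctx: Context) -> Context:
    # Stand-in for creating a calendar event.
    return {**ctx, "meeting": "Briefing review, Thursday 10:00"}


def send_summary(ctx: Context) -> Context:
    # Stand-in for mailing a summary to a distribution list.
    return {**ctx, "sent_to": "team@example.com"}


def run_workflow(prompt: str, steps: list[Step]) -> Context:
    """Run steps in order from a single prompt, recording each completed step."""
    ctx: Context = {"prompt": prompt, "completed": []}
    for step in steps:
        ctx = step(ctx)
        ctx["completed"].append(step.__name__)
    return ctx


result = run_workflow(
    "Prepare the Q1 briefing and set up a review",
    [fetch_files, draft_briefing, schedule_review, send_summary],
)
print(result["briefing"])   # Briefing based on 2 files
print(result["completed"])  # ['fetch_files', 'draft_briefing', 'schedule_review', 'send_summary']
```

In an actual agent the list of steps would itself be planned by a model from the prompt; here it is fixed by hand to keep the sketch short.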

Wave 3 also introduces a Model Council feature, which surfaces responses from multiple AI models side-by-side, allowing users to identify where models agree and diverge before acting on AI-generated output. The feature is positioned as a trust mechanism for high-stakes decisions where users want to stress-test AI conclusions across multiple systems simultaneously.
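A side-by-side comparison of this kind can be sketched as grouping each model's answer and flagging any disagreement. This Python version is a minimal illustration, assuming answers can be compared by exact match after trimming whitespace and lowercasing; `model_council`, `has_divergence`, and the model names `model_a`, `model_b`, and `model_c` are hypothetical, and a production system would need a semantic comparison rather than string equality.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical: each council member maps a question to a short answer.
Model = Callable[[str], str]


def model_council(question: str, models: dict[str, Model]) -> dict[str, list[str]]:
    """Ask every model the same question and group model names by answer."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, ask in models.items():
        # Naive normalization: exact match after trimming and lowercasing.
        answer = ask(question).strip().lower()
        groups[answer].append(name)
    return dict(groups)


def has_divergence(groups: dict[str, list[str]]) -> bool:
    """True when the models did not all give the same answer."""
    return len(groups) > 1


council = {
    "model_a": lambda q: "Yes",
    "model_b": lambda q: "yes ",
    "model_c": lambda q: "No",
}
groups = model_council("Should we approve the Q3 budget?", council)
print(groups)                  # {'yes': ['model_a', 'model_b'], 'no': ['model_c']}
print(has_divergence(groups))  # True
```

Surfacing the dissenting group, rather than silently taking the majority answer, is what lets a user see where to look harder before acting.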

The Competitive Framing

Microsoft's benchmark claims come with a notable gap: the comparison does not include OpenAI's GPT-5-based Deep Research tool, which launched after the Wave 3 evaluation was conducted. That omission limits the usefulness of the published performance data, since the agentic research market is moving fast enough that benchmarks even a few months old may not reflect the current competitive landscape.

The broader signal from Wave 3 is that Microsoft now treats multi-model orchestration (routing different AI systems to different sub-tasks within a single workflow) as a product differentiator rather than an implementation detail. This is a meaningful shift from the earlier Copilot architecture, which defaulted to a single underlying model. Whether enterprise users will take on the added complexity that multi-model systems introduce, or fall back on single-model simplicity, remains an open question.
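At its simplest, routing of this kind reduces to a lookup from sub-task type to model, with a fallback for anything unmapped. The Python sketch below assumes that framing; `make_router`, the task labels, and the `vendor_a` and `vendor_b` stand-ins are hypothetical and do not reflect how Microsoft actually assigns Anthropic or OpenAI models inside Copilot.

```python
from typing import Callable

# Hypothetical: a model is any function from prompt to response.
Model = Callable[[str], str]


def make_router(routes: dict[str, Model], default: Model) -> Callable[[str, str], str]:
    """Return a function that sends each sub-task to the model mapped to its type."""
    def route(task_type: str, prompt: str) -> str:
        # Unmapped task types fall back to the default model.
        model = routes.get(task_type, default)
        return model(prompt)
    return route


def vendor_a(prompt: str) -> str:
    return f"[vendor A] {prompt}"


def vendor_b(prompt: str) -> str:
    return f"[vendor B] {prompt}"


route = make_router({"research": vendor_a, "critique": vendor_b}, default=vendor_a)
print(route("critique", "Check the draft"))      # [vendor B] Check the draft
print(route("summarize", "Condense the notes"))  # [vendor A] Condense the notes
```

Keeping the routing table as plain data makes the complexity trade-off visible: every new entry adds a capability, and also one more model whose behavior users and administrators have to understand.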

Wave 3 features are available through Microsoft's Frontier program, which provides early access to Copilot capabilities ahead of general availability.
