Key Highlights –
- Microsoft has added two multi-model features to Microsoft 365 Copilot Researcher: Critique, a dual-model draft-and-review system, and Council, which runs Anthropic and OpenAI models concurrently and has a third judge model compare their outputs
- Critique outperforms Perplexity’s Claude Opus 4.6 by 13.88% on the DRACO research benchmark, according to Microsoft’s own analysis; this is the first time Microsoft has published a direct comparative benchmark against a named competitor in this context
- Both features are currently available through the Microsoft 365 Copilot Frontier program, targeted at enterprise users; broader rollout timing has not been confirmed
Microsoft announced Critique and Council today via Satya Nadella’s X account, with supporting documentation published simultaneously on the Microsoft Tech Community blog. Both features sit inside Microsoft 365 Copilot Researcher, the tool aimed at enterprise professionals who need structured, sourced research outputs rather than conversational responses.
As you may know, Copilot has historically run on a single model at a time, usually one from OpenAI’s GPT series, with Microsoft gradually integrating other vendors. Critique and Council represent a structural shift in how Copilot approaches research tasks, moving from a single model producing a response to multiple models working in sequence or in parallel.
How Critique Works
Critique is built on a two-model pipeline. The first model handles the research itself: planning the approach, sourcing relevant material, and synthesising a draft. The second model then reviews that draft, specifically checking source reliability, completeness, and whether claims are grounded in evidence rather than inference.
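The draft-then-review flow can be illustrated with a minimal Python sketch. This is not Microsoft's implementation: the `Model` type, `run_critique`, the prompt wording, and the stub models are all hypothetical, and each "model" is simply a function from prompt text to response text.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "model" is any callable that maps a prompt to a text response.
Model = Callable[[str], str]


@dataclass
class CritiqueResult:
    draft: str   # output of the first (research) model
    review: str  # output of the second (review) model


def run_critique(question: str, researcher: Model, reviewer: Model) -> CritiqueResult:
    """Run a two-stage pipeline: one model drafts, a second model reviews the draft."""
    # Stage 1: the research model plans, gathers sources, and writes a draft.
    draft = researcher(f"Research and draft a sourced report on: {question}")
    # Stage 2: the review model receives the draft itself, not the original question.
    review = reviewer(
        "Review this draft for source reliability, completeness, "
        "and claims not grounded in evidence:\n" + draft
    )
    return CritiqueResult(draft=draft, review=review)


# Hypothetical stand-ins so the sketch runs without any model API.
def stub_researcher(prompt: str) -> str:
    return f"[draft] {prompt}"


def stub_reviewer(prompt: str) -> str:
    return "[review] checked sources, completeness, grounding"


if __name__ == "__main__":
    result = run_critique("EU battery regulation", stub_researcher, stub_reviewer)
    print(result.draft)
    print(result.review)
```

The point of the structure is that the reviewer only ever sees the finished draft, so its checks are independent of how the first model reached its conclusions.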
The architecture is designed to catch the class of errors that single-model research consistently produces: confident-sounding claims that are either unsupported or drawn from low-quality sources. Whether a two-model check eliminates that problem in practice is an open question, but the DRACO benchmark result, a 13.88% improvement over Perplexity’s Claude Opus 4.6, gives Microsoft a concrete, named data point to stand behind. That specificity is unusual. Most AI product announcements cite internal benchmarks without naming the competitor being surpassed.
How Council Works
Council takes a different approach. Rather than models working in sequence, Council runs both Anthropic and OpenAI models on the same research prompt simultaneously. A third model then acts as a judge, reviewing both outputs and producing a summary that flags where they agree, where they diverge, and what each adds uniquely.
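As a rough illustration of this parallel-then-judge pattern, the Python sketch below sends one prompt to several models concurrently and passes the labelled outputs to a judge. It is a sketch under stated assumptions, not Microsoft's design: `run_council`, `CouncilResult`, the prompt wording, and the stub models are hypothetical, and each "model" is just a function from prompt text to response text.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict

# Assumption: a "model" is any callable that maps a prompt to a text response.
Model = Callable[[str], str]


@dataclass
class CouncilResult:
    outputs: Dict[str, str]  # one report per council member, keyed by name
    summary: str             # the judge model's comparison of those reports


def run_council(prompt: str, members: Dict[str, Model], judge: Model) -> CouncilResult:
    """Send the same prompt to every member in parallel, then ask a judge to compare."""
    if not members:
        raise ValueError("at least one council member is required")
    # Each member gets the identical prompt and runs in its own worker thread.
    with ThreadPoolExecutor(max_workers=len(members)) as pool:
        futures = {name: pool.submit(model, prompt) for name, model in members.items()}
        outputs = {name: future.result() for name, future in futures.items()}
    # The judge sees every report, labelled by member, in a single prompt.
    labelled = "\n\n".join(f"[{name}]\n{text}" for name, text in outputs.items())
    summary = judge(
        "Compare these reports. Flag where they agree, where they diverge, "
        "and what each adds uniquely.\n\n" + labelled
    )
    return CouncilResult(outputs=outputs, summary=summary)


if __name__ == "__main__":
    # Hypothetical stand-ins so the sketch runs without any model API.
    members = {
        "model_a": lambda p: f"A's report on: {p}",
        "model_b": lambda p: f"B's report on: {p}",
    }
    result = run_council("EU battery regulation", members, lambda p: "judge summary")
    print(result.outputs)
    print(result.summary)
```

Threads suit this sketch because real model calls would be network-bound; the members never see each other's output, and only the judge sees both.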
For perspective, this is a meaningful product decision beyond technical architecture. Microsoft is now actively positioning Claude and GPT as complementary tools within the same enterprise workflow rather than as competing alternatives. The practical implication for users: they can, for the first time, see where two leading AI models produce different conclusions on the same research brief, and make an informed call on which to use.
What It Means for Enterprise AI
The broader signal here is that Microsoft is moving toward a multi-vendor AI stack at the product layer, not just the infrastructure layer. Building Council into Copilot acknowledges that no single model is definitively better across all research tasks, and that showing users the difference is more useful than hiding it behind a single interface.
The risk is complexity. Enterprise users who adopted Copilot for simplicity may find a side-by-side model comparison harder to act on than a single clear answer. Whether Frontier program participants find Council’s judge-model summaries genuinely useful, or even just interesting, will determine how far this feature travels beyond the early-access cohort.
Wrapping Up
Critique and Council are not incremental feature updates. They represent a change in how Microsoft thinks Copilot should operate: less as a single AI assistant and more as a structured research process with built-in verification. The question the Frontier program will answer is whether enterprise users want their AI to show its working, or whether they just want the answer.
