Tools

Microsoft's Bing Team Open-Sources 'Harrier' — a 27B Embedding Model That Tops the Multilingual MTEB Benchmark

Microsoft's Bing team has released Harrier under the MIT license on Hugging Face: a 27-billion-parameter multilingual embedding model that outperforms proprietary embedding models from OpenAI and Amazon on the MTEB v2 benchmark while supporting over 100 languages and a 32,000-token context window. Two smaller variants (0.6B and 270M parameters) make the technology accessible for resource-constrained deployments.

D.O.T.S AI Newsroom

AI News Desk

3 min read

Microsoft's Bing team has open-sourced Harrier, a multilingual embedding model trained on over two billion examples plus synthetic data generated by GPT-5. The model achieves state-of-the-art performance on the multilingual MTEB v2 benchmark, outperforming proprietary embedding models from OpenAI and Amazon, and is available in three sizes on Hugging Face under the MIT license — one of the most permissive licenses used in AI model releases.

What Harrier Does

Embedding models convert text into numerical vectors that capture semantic meaning, enabling AI systems to search, retrieve, compare, and organize information by conceptual similarity rather than exact keyword match. They are foundational infrastructure for retrieval-augmented generation (RAG) systems, semantic search, document clustering, and the "grounding" mechanisms that help AI agents access relevant context before generating responses. As agentic AI systems take on more complex, multi-step tasks — autonomously browsing documents, querying knowledge bases, and synthesizing information across sources — the quality of the underlying embedding model directly affects how accurately the agent retrieves what it needs.
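The retrieval idea above can be sketched in a few lines. This is a toy illustration only: real embeddings from a model like Harrier have hundreds or thousands of dimensions, while the 4-dimensional vectors and the document texts here are hand-made stand-ins.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: the first two sentences share a topic,
# the third does not — but none of them share exact keywords.
docs = {
    "The cat sat on the mat.":    [0.90, 0.10, 0.00, 0.20],
    "A kitten rests on a rug.":   [0.85, 0.15, 0.05, 0.25],
    "Quarterly revenue rose 8%.": [0.05, 0.90, 0.30, 0.00],
}

# Embedding of a query like "Where is the cat?" (also hand-made).
query_vec = [0.88, 0.12, 0.02, 0.22]

# Rank documents by semantic similarity to the query, not keyword overlap.
ranked = sorted(docs, key=lambda d: cosine_similarity(query_vec, docs[d]),
                reverse=True)
print(ranked[0])  # the cat sentence ranks first despite no shared query keywords
```

This nearest-vector lookup is the core operation behind RAG retrieval: the agent embeds its query, finds the closest document vectors, and feeds those documents to the generator as context.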

Harrier's 32,000-token context window is particularly significant for enterprise use cases. Most embedding models have context windows in the 512–8,192 token range, which means long documents must be chunked before embedding — a process that can lose context across chunk boundaries and degrade retrieval quality for content that requires understanding document-level structure. A 32K context window allows Harrier to embed much longer document segments as coherent units, improving retrieval precision for legal documents, technical reports, and research papers.
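To make the chunking trade-off concrete, here is a minimal sketch of the splitting step that small-context embedding models force on long documents. Token counts are approximated by whitespace-separated words (production systems use the model's actual tokenizer), and the overlap parameter is a common mitigation for context loss at chunk boundaries:

```python
def chunk_words(text, max_tokens=512, overlap=64):
    """Split text into word-based chunks, overlapping adjacent chunks
    to limit context loss at the boundaries."""
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break
    return chunks

doc = "word " * 1200  # stand-in for a ~1,200-token document

# A 512-token window forces the document into overlapping pieces...
small_ctx = chunk_words(doc, max_tokens=512, overlap=64)
# ...while a 32K window embeds it as a single coherent unit.
large_ctx = chunk_words(doc, max_tokens=32000, overlap=0)

print(len(small_ctx), len(large_ctx))  # multiple chunks vs. one
```

Every boundary in the small-window case is a place where cross-chunk structure (a clause continuing a definition, a table row referencing its header) can be cut apart, which is why a larger context window can improve retrieval precision on long, structured documents.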

Why Open-Source MIT Licensing Matters

The MIT license imposes essentially no restrictions on commercial use, modification, or redistribution. This is a more permissive release than Meta's Llama licenses (which restrict certain commercial deployments) and contrasts with the typical approach of proprietary API-only embedding models from OpenAI and Cohere. MIT licensing means any company can deploy Harrier internally, fine-tune it on proprietary data, or build commercial products on top of it without royalties or usage restrictions — including at scales that would be economically prohibitive through a per-token API.

Integration Plans

Microsoft plans to integrate Harrier into Bing search and into new "grounding services" for AI agents — the infrastructure that connects agents to external knowledge sources. Making the model's underlying architecture publicly available while deploying it in production at Bing scale gives Microsoft a feedback mechanism between research and deployment that benefits both the open-source community and the company's own product roadmap. The two smaller variants (0.6B and 270M) are designed for edge and on-device scenarios where the full 27B model is impractical, making Harrier a complete model family rather than a single release.


Related Stories

Astropad's Workbench Turns a Mac Mini Into an AI Agent Server You Control From Your Phone
Tools

Astropad, the company behind the Luna Display hardware that lets iPads function as Mac monitors, has built a new product for a new era: Workbench lets users remotely monitor and control AI agents running on Mac Minis from an iPhone or iPad. It is remote desktop software reimagined not for IT support but for the AI agent operator — the person who needs to check on autonomous workflows without being at their desk.

D.O.T.S AI Newsroom
Microsoft's Bing Team Open-Sources Harrier, a Multilingual Embedding Model That Tops the MTEB v2 Benchmark
Tools

Microsoft's Bing search team has released Harrier, an open-source embedding model that tops the multilingual MTEB v2 benchmark while supporting over 100 languages. The release is significant not just for the benchmark numbers but for the source: a search team that has spent decades optimizing retrieval systems has built an embedding model for the exact use case — semantic search and retrieval — that underpins most production RAG applications.

D.O.T.S AI Newsroom
Stability AI Pivots to Enterprise With Brand Studio — a Platform for Brand-Consistent AI Image Generation
Tools

Stability AI, the company that made open-source image generation mainstream with Stable Diffusion, is repositioning for enterprise with Brand Studio. The platform lets creative teams train brand-specific image models, automate visual production workflows, and route tasks to the best-suited AI model — a commercial play from a company that built its name on open access.

D.O.T.S AI Newsroom