Microsoft's Bing Team Open-Sources Harrier, a Multilingual Embedding Model That Tops the MTEB v2 Benchmark
Microsoft's Bing search team has released Harrier as an open-source embedding model, and it tops the multilingual MTEB v2 benchmark while supporting over 100 languages. The release is significant not just for the benchmark numbers but for the source: a search team that has spent decades optimizing retrieval systems has built an embedding model for the exact use case — semantic search and retrieval — that underpins most production RAG applications.

D.O.T.S AI Newsroom
AI News Desk
Microsoft's Bing team published Harrier, an open-source embedding model that achieves state-of-the-art performance on the Multilingual MTEB v2 benchmark and supports over 100 languages. The model is available on Hugging Face for immediate use. Embedding models are the unsung infrastructure of the AI application stack: they convert text into numerical vector representations that can be compared mathematically, enabling semantic search, retrieval-augmented generation, clustering, and deduplication at scale. Harrier's release from Bing's search engineering team — rather than from a frontier model lab — is notable because search is the domain where embedding model quality has the most direct, measurable production impact.
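To make "compared mathematically" concrete, the sketch below computes cosine similarity, the standard comparison metric for embeddings, over toy three-dimensional vectors. The vectors and example texts are invented for illustration; real embeddings would come from a model like Harrier and have far higher dimensionality.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
    # Semantically similar texts should yield vectors with similarity near 1.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models emit hundreds of dimensions).
query    = [0.9, 0.1, 0.2]   # "how do I reset my password"
doc_hit  = [0.8, 0.2, 0.1]   # password-reset help article
doc_miss = [0.1, 0.9, 0.7]   # unrelated shipping FAQ

print(cosine_similarity(query, doc_hit) > cosine_similarity(query, doc_miss))  # True
```

Semantic search, clustering, and deduplication all reduce to variations of this comparison applied at scale, typically via an approximate nearest-neighbor index rather than a brute-force loop.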
What MTEB v2 Measures
The Massive Text Embedding Benchmark, now in its second version, evaluates embedding models across a comprehensive set of retrieval and classification tasks in multiple languages. It has become the standard leaderboard for the embedding model ecosystem, used to compare models from Cohere, OpenAI, Google, Voyage AI, and a range of open-source projects. Topping the multilingual portion of MTEB v2 is a meaningful achievement: multilingual embedding is technically harder than monolingual embedding because the model must learn representations that are semantically consistent across languages with very different morphology, syntax, and script systems. A query in Spanish and a matching document in English should produce similar vector representations; achieving that cross-lingual alignment across 100+ languages requires substantial training data and architectural investment.
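The cross-lingual alignment property described above can be sketched with toy vectors standing in for what a well-aligned multilingual model would produce: a Spanish query lands near both the English document and its Spanish translation, and far from unrelated text. The sentences and vectors here are invented for illustration, not output from Harrier.

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy vectors mimicking a cross-lingually aligned embedding space:
# translations of the same sentence map to nearby points.
corpus = {
    "Climate change is accelerating.":  [0.81, 0.12, 0.05],  # English
    "El cambio climatico se acelera.":  [0.79, 0.15, 0.07],  # Spanish translation
    "The recipe calls for two eggs.":   [0.05, 0.11, 0.88],  # unrelated English
}

query_es = [0.80, 0.13, 0.06]  # Spanish query about climate change
ranked = sorted(corpus, key=lambda doc: cosine(query_es, corpus[doc]), reverse=True)

# Both climate sentences, regardless of language, outrank the unrelated recipe.
print(ranked[-1])  # "The recipe calls for two eggs."
```

MTEB's multilingual retrieval tasks effectively measure how reliably this ranking behavior holds across language pairs and domains.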
Why the Source Matters
Embedding models designed for benchmark performance and embedding models optimized for production search at scale are not always the same thing. Benchmark tasks tend to be clean, well-structured, and drawn from a specific distribution of academic and web documents. Production search in Bing's context involves billions of documents, highly variable query intent, multiple languages simultaneously, and real-time latency constraints. The Bing team has been optimizing for that production environment, not the benchmark. The fact that Harrier achieves top benchmark performance suggests the two optimization targets are more aligned than skeptics might expect — or that the Bing team has found a way to transfer production retrieval insights into a general-purpose embedding model without sacrificing benchmark generalizability.
Implications for RAG Applications
Most production retrieval-augmented generation deployments, in which an LLM retrieves from an external knowledge base based on the user's query, depend critically on embedding model quality. A better embedding model retrieves more relevant documents, which gives the LLM better context to work with and yields fewer hallucinations and more accurate answers. This is the part of the AI application stack that receives the least public attention relative to its importance. Because Harrier is open-source, developers can use it without API costs, fine-tune it for their specific domain, and run it on their own infrastructure for privacy-sensitive applications. For multilingual RAG applications in particular, a model that supports 100+ languages with top-tier performance addresses a gap that has been a persistent limitation for global deployments.
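The retrieval half of a RAG pipeline can be sketched in a few lines: embed the query, rank the document index by similarity, and paste the top hits into the LLM's prompt. The documents, vectors, and prompt format below are illustrative placeholders; in practice the vectors would be produced by the embedding model and the index would be an approximate nearest-neighbor store.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query_vec, index, k=2):
    # Rank all documents by similarity to the query and keep the top k
    # to serve as grounding context for the LLM.
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Precomputed toy vectors; in a real pipeline these come from the embedding model.
index = [
    ("Password resets require email verification.", [0.9, 0.1, 0.1]),
    ("Accounts lock after five failed attempts.",   [0.7, 0.3, 0.2]),
    ("Our offices close on public holidays.",       [0.1, 0.2, 0.9]),
]

query_vec = [0.85, 0.2, 0.1]  # embedding of "why is my account locked?"
context = retrieve(query_vec, index, k=2)
prompt = "Answer using only this context:\n" + "\n".join(context)
```

Swapping in a stronger embedding model changes nothing structurally in this pipeline, which is why embedding quality improvements translate so directly into better RAG answers.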