Microsoft Launches Three New MAI Foundational Models — Taking Direct Aim at OpenAI and Google in the Enterprise AI Race

Microsoft has released a trio of new foundational AI models under its MAI (Microsoft AI) brand: MAI-Transcribe-1 for speech-to-text, MAI-Encoder-1 for embeddings, and a new image generation model. MAI-Transcribe-1 is said to run 2.5x faster than its predecessor, and all three models are available via Azure AI Foundry. Together they signal Microsoft's intent to compete at the infrastructure layer, not just through OpenAI's API.

D.O.T.S AI Newsroom

AI News Desk

Apr 3, 2026 | 2 min read

Microsoft has released three new proprietary AI models under its MAI (Microsoft AI) brand, expanding the company's in-house foundational model capabilities beyond its OpenAI partnership and its existing Azure AI services portfolio.

The Three Models

MAI-Transcribe-1 is Microsoft's flagship speech-to-text model, designed to convert spoken audio to text across 25 languages with high accuracy even in noisy environments. According to Microsoft's release notes, it runs 2.5x faster than its predecessor and is priced at $0.36 per audio hour, a rate that undercuts several existing transcription API offerings. The model is built to handle background noise and accented speech, and is described as particularly strong in multilingual enterprise call center environments.
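
To put the quoted price in context, the short Python sketch below estimates a transcription bill from the $0.36 per audio hour figure. It assumes billing scales linearly with audio duration, with no minimum charge, rounding, or volume tiers, none of which the release notes as reported here confirm. The function name and the call center workload are hypothetical illustrations.

```python
# Rate quoted in the article for MAI-Transcribe-1.
PRICE_PER_AUDIO_HOUR_USD = 0.36


def transcription_cost_usd(audio_seconds: float,
                           price_per_hour: float = PRICE_PER_AUDIO_HOUR_USD) -> float:
    """Estimate cost assuming purely linear, per-second billing (an assumption)."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 3600 * price_per_hour


# Hypothetical workload: 10,000 calls averaging 6 minutes each.
calls = 10_000
avg_minutes_per_call = 6
total_hours = calls * avg_minutes_per_call / 60  # 1,000 audio hours

cost = transcription_cost_usd(total_hours * 3600)
print(f"{total_hours:.0f} audio hours -> ${cost:.2f}")
# prints "1000 audio hours -> $360.00"
```

Under those assumptions, a thousand hours of recorded calls would cost roughly $360 to transcribe. Actual invoices would depend on Azure's published billing rules.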

MAI-Encoder-1 is an embedding model designed for enterprise search, retrieval-augmented generation (RAG), and semantic similarity applications. Embedding models are the unglamorous but critical infrastructure layer that powers AI search, recommendation, and knowledge retrieval systems, a market currently dominated by OpenAI's text-embedding-3 series and Cohere's Embed offerings. Microsoft's decision to build its own encoder signals a strategic choice to own this dependency rather than route it through OpenAI.
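
For readers unfamiliar with how embeddings drive search, the sketch below shows the general idea in plain Python: documents and a query are represented as vectors, and documents are ranked by cosine similarity to the query. The three-dimensional vectors and document names are made-up toy values, not output from MAI-Encoder-1, and no Microsoft API is called. Real embeddings typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, hand-written vectors standing in for real embeddings (hypothetical).
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "office-hours": [0.1, 0.9, 0.2],
    "returns-faq": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # imagine this encodes "how do I get my money back?"

for doc_id, score in rank_documents(query, docs):
    print(f"{doc_id}: {score:.3f}")
```

In a RAG pipeline, the top-ranked documents would then be passed to a language model as context. Whichever vendor supplies the encoder sits on that retrieval path, which is why owning it matters strategically.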

The image generation model completes the trio, though Microsoft has provided fewer specifics at launch. It appears designed for Azure-integrated content generation workflows rather than as a standalone consumer product.

The Strategic Subtext

Reading between the lines, these three releases tell a coherent story: Microsoft is quietly building the capability to offer complete, end-to-end AI application stacks (speech input, text embeddings, and image generation) without routing every request through OpenAI's API. The MAI brand has been steadily accumulating models over the past year, and this release adds three more at once.

This matters for the enterprise AI market. Azure customers who want to build fully Microsoft-native AI applications, whether for regulatory, cost, or latency reasons, now have a more complete native model stack available. It also positions Microsoft to reduce its exposure to OpenAI revenue sharing over time as these proprietary models mature.

All three models are available via Azure AI Foundry and the Azure AI model catalog.
