Mantis Biotech Is Building 'Digital Twins' of Humans to Solve Medicine's Training Data Problem

Mantis Biotech generates synthetic medical datasets — 'digital twins' of the human body — to supply AI training data that real patient records cannot legally or practically provide. The approach targets one of healthcare AI's most persistent bottlenecks: the gap between what AI systems need to learn and what hospitals can share.

D.O.T.S AI Newsroom

AI News Desk

Mar 31, 2026 · 2 min read

Mantis Biotech is developing synthetic "digital twin" datasets of the human body to address what the company describes as one of medicine's deepest structural problems: the scarcity of high-quality, accessible training data for healthcare AI. The startup takes disparate sources of medical information (imaging studies, genomic data, lab results, clinical notes) and generates synthetic datasets representing human anatomy, physiology, and behavior at scale.

The Data Availability Problem in Medical AI

Most medical AI systems require large volumes of labeled patient examples to train effectively. In practice, that data is difficult to access. HIPAA in the United States and comparable privacy regulations elsewhere impose significant consent, de-identification, and data governance requirements on patient records. Institutional review boards, data sharing agreements, and legal liability concerns further slow access. For rare diseases, or uncommon presentations of common conditions, the data may simply not exist in sufficient quantity anywhere in the world, regardless of access barriers.

Mantis' synthetic approach aims to sidestep these constraints. Synthetic datasets generated to represent real biological processes can be used to train diagnostic AI, test clinical decision support systems, and model pharmaceutical interventions without touching actual patient records. In principle, such datasets can be made arbitrarily large for any condition, including rare diseases where real-world data is inherently scarce.

Why Now

The timing reflects the convergence of two trends: advances in generative AI that can produce high-fidelity biological simulations, and growing enterprise appetite for medical AI that does not stall in regulatory review. Healthcare AI deployments have accelerated over the past two years, but the bottleneck has increasingly shifted from model capability to training data, which is precisely the problem Mantis targets.

Pharmaceutical research, diagnostic imaging AI, and clinical trial modeling represent Mantis' initial market segments. Each involves a different use case for synthetic data: drug discovery benefits from the ability to simulate rare patient populations; diagnostic imaging AI requires large volumes of labeled scans; clinical trial modeling needs diverse physiological variation to test drug responses across demographics.

Mantis Biotech has raised an undisclosed amount of funding and has not disclosed revenue figures or named enterprise customers. The company faces a validation challenge common to synthetic data startups: demonstrating that models trained on synthetic data perform as well on real patient populations as models trained on real data. That claim requires rigorous clinical validation before healthcare systems will adopt such models in high-stakes applications.
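
The validation question can be illustrated with a toy "train on synthetic, test on real" comparison. The sketch below is not Mantis' method, which the company has not described. It assumes a single made-up lab value per patient, a deliberately simple threshold classifier, and hypothetical helper names (make_cohort, fit_threshold, accuracy). It only shows the shape of the check: a model fitted on synthetic data is scored on held-out real data and compared with a model fitted on real data.

```python
import random
import statistics

def make_cohort(n, shift, rng):
    """Toy cohort: one lab value per patient; label 1 means condition present.

    `shift` is a hypothetical generator bias (0.0 means no bias).
    """
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Unaffected values centre on 0.0, affected values on 2.0, plus any bias.
        value = rng.gauss(2.0 * label + shift, 1.0)
        rows.append((value, label))
    return rows

def fit_threshold(rows):
    """Midpoint between the two class means: a stand-in for a real model."""
    mean0 = statistics.mean(v for v, y in rows if y == 0)
    mean1 = statistics.mean(v for v, y in rows if y == 1)
    return (mean0 + mean1) / 2.0

def accuracy(threshold, rows):
    """Fraction of patients whose label matches 'value above threshold'."""
    correct = sum((v > threshold) == bool(y) for v, y in rows)
    return correct / len(rows)

rng = random.Random(0)  # fixed seed so the comparison is repeatable
real_train = make_cohort(2000, 0.0, rng)
real_test = make_cohort(2000, 0.0, rng)          # held-out "real" patients
synthetic_good = make_cohort(2000, 0.0, rng)     # generator matches reality
synthetic_biased = make_cohort(2000, 1.5, rng)   # generator is systematically off

baseline = accuracy(fit_threshold(real_train), real_test)
tstr_good = accuracy(fit_threshold(synthetic_good), real_test)
tstr_biased = accuracy(fit_threshold(synthetic_biased), real_test)

print(f"trained on real data:        {baseline:.3f}")
print(f"trained on faithful twins:   {tstr_good:.3f}")
print(f"trained on biased twins:     {tstr_biased:.3f}")
```

In this toy setting the faithful synthetic cohort should score close to the real-data baseline, while the biased one falls clearly short. Real clinical validation involves far more than a single accuracy figure, but the comparison against held-out real patients is the core of the claim.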
