Researchers Fingerprinted 178 AI Models' Writing Styles — and Found Alarming Clone Clusters
A new study from Rival analyzed 3,095 standardized responses across 178 AI models, extracting 32-dimension stylometric fingerprints to map which models write like which others. The findings reveal tightly grouped clone clusters across providers — and raise serious questions about whether the AI ecosystem is converging on a single voice.

D.O.T.S AI Newsroom
AI News Desk
A study released this week offers the most systematic attempt yet to answer a question that has nagged practitioners since the LLM boom: do different AI models actually write differently, or are they converging on the same voice? Rival analyzed 3,095 standardized responses across 178 models, building 32-dimension stylometric fingerprints from sentence structure, vocabulary diversity, hedge-word frequency, punctuation patterns, and half a dozen other measurable stylistic signals. The results are uncomfortable reading for anyone who values epistemic diversity in AI-generated content.
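The fingerprinting idea itself is simple enough to sketch. The dimensions below (mean sentence length, sentence-length variability, vocabulary diversity, hedge-word rate, comma rate) are illustrative stand-ins for the study's 32; the hedge list, feature formulas, and function name are assumptions for this sketch, not Rival's actual pipeline.

```python
import re
from statistics import mean, pstdev

def stylometric_fingerprint(text):
    """Toy stylometric fingerprint covering a few of the signal families
    the study describes. Feature choices here are illustrative, not the
    paper's 32-dimension set."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    words = re.findall(r"[A-Za-z']+", text.lower())
    hedges = {"may", "might", "could", "perhaps", "possibly", "likely"}
    sent_lens = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return {
        "mean_sentence_len": mean(sent_lens),
        "sentence_len_stdev": pstdev(sent_lens),           # burstiness of sentence length
        "type_token_ratio": len(set(words)) / len(words),  # vocabulary diversity
        "hedge_rate": sum(w in hedges for w in words) / len(words),
        "comma_rate": text.count(",") / max(len(words), 1),  # punctuation pattern
    }

fp = stylometric_fingerprint(
    "Models may converge. Perhaps training data is shared, and raters reward similar prose."
)
```

Run on two models' response sets, vectors like this can be compared directly, which is what makes cross-provider similarity measurable at all.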
The Clone Cluster Problem
The study's most striking finding is the existence of what the researchers call "clone clusters" — groups of nominally distinct models whose stylometric signatures are nearly indistinguishable. Several clusters span models from different providers and different model families, suggesting that the convergence is not simply a matter of fine-tuning from a shared base model. The training data pipelines, RLHF feedback preferences, and safety fine-tuning processes all appear to be pushing models toward similar stylistic attractors regardless of their origin. In practical terms: if you swap one cluster member for another in a workflow, you may not be getting the diversity of perspective you think you are.
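Detecting such clusters amounts to grouping fingerprint vectors that sit above a similarity cutoff. The sketch below uses single-linkage grouping over cosine similarity with union-find; the study's actual clustering method, threshold, and the fingerprint values shown are all assumptions made for illustration.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def clone_clusters(fingerprints, threshold=0.99):
    """Single-linkage grouping via union-find: two models join the same
    cluster when their fingerprint vectors exceed the similarity cutoff.
    Method and threshold are guesses, not the paper's stated procedure."""
    names = list(fingerprints)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(fingerprints[a], fingerprints[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Only groups with more than one member count as "clone clusters".
    return [sorted(g) for g in groups.values() if len(g) > 1]

# Hypothetical fingerprint vectors for three models (values invented).
fps = {
    "model_a": [1.00, 2.00, 3.00],
    "model_b": [1.01, 2.00, 3.02],  # near-duplicate of model_a
    "model_c": [3.00, 1.00, 0.20],
}
clusters = clone_clusters(fps)
```

Single-linkage is a deliberate choice here: it mirrors the intuition that a chain of pairwise near-duplicates forms one cluster, though it can over-merge at loose thresholds.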
What Drives Stylistic Convergence
The researchers identify three likely mechanisms. First, training data overlap: the corpus of high-quality English text on the internet is finite, and models trained on Common Crawl derivatives share a statistical substrate. Second, RLHF preference homogenization: human raters have consistent aesthetic preferences (clear, confident, organized prose), and models trained to maximize human approval ratings converge on the same prose style even when starting from different initializations. Third, safety fine-tuning: the hedging language, disclaimer patterns, and refusal phrasing that safety training produces are remarkably consistent across providers, adding a layer of stylistic similarity on top of any base-model differences.
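The third mechanism is the easiest to probe directly: if safety training stamps the same boilerplate onto every model, the rate of stock hedge and disclaimer phrases should look similar across providers. A minimal probe, assuming a hand-picked marker list (the markers below are my choices, not the study's), might look like:

```python
def hedge_profile(responses, markers=("as an ai", "i cannot", "it is important to note",
                                      "however", "may", "might")):
    """Rate of stock hedging/disclaimer markers per 1,000 words across a
    model's responses. Naive substring matching; the marker list is
    illustrative, not taken from the study."""
    text = " ".join(responses).lower()
    n_words = max(len(text.split()), 1)
    return [1000 * text.count(m) / n_words for m in markers]

profile = hedge_profile(["I cannot do that. However, it may help."])
```

Comparing these profiles across models from different providers would show whether the disclaimer layer really is as uniform as the study suggests.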
Why This Matters Beyond Linguistics
The stylometric similarity finding has implications beyond writing aesthetics. If models write alike because they reason alike — because the same training signals produce not just similar prose but similar epistemics — then using multiple AI models as a diversity mechanism in research, journalism, or decision-making pipelines may be less effective than it appears. The study stops short of resolving that question, but it raises it pointedly. For practitioners building multi-agent systems that treat diverse model perspectives as a quality signal, this research warrants a hard look at whether stylometric diversity actually tracks the epistemic diversity they want.