Can Frontier AI Write Formally Verified Graduate Math Proofs? A New Benchmark Has the Answer.
FormalProofBench is a new private benchmark that tests whether AI models can produce graduate-level mathematical proofs that pass formal verification: proofs a machine can check for correctness, not ones that merely sound plausible. The results expose a gap between how fluently AI models write mathematics and how rigorously they prove it.