Frontier AI capability has advanced faster in 2024–2026 than in any prior two-year period. Three benchmarks illustrate the pace, and the acceleration is real.
SWE-bench (coding), GPQA Diamond (scientific reasoning), and AIME (math competition) scores each show consistent upward progression from March 2024 to March 2026.
METR measures the length of software engineering tasks (by how long they take human professionals) that an AI can complete with a 50% success rate. On the log scale the exponential trend appears as a straight line; the linear view shows how sharply that trend is accelerating.
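A minimal sketch of how a 50% time horizon can be estimated, assuming a METR-style approach: fit a logistic curve of success probability against log2 of task length (in human-expert minutes), then read off the length at which predicted success crosses 50%. The task results and the helper names fit_horizon and doubling_time are hypothetical illustrations, not METR's code or data. The second helper shows why a log scale suits this chart: under exponential growth the horizon doubles at a fixed interval, which plots as a straight line.

```python
import math

# Hypothetical task results: (human_minutes, succeeded) for one model.
# Illustrative only; these are NOT METR's data or code.
RESULTS = [
    (1, 1), (1, 1), (1, 1), (1, 1),
    (2, 1), (2, 1), (2, 1), (2, 0),
    (4, 1), (4, 1), (4, 1), (4, 0),
    (8, 1), (8, 1), (8, 0), (8, 0),
    (16, 1), (16, 0), (16, 0), (16, 0),
    (32, 1), (32, 0), (32, 0), (32, 0),
    (64, 0), (64, 0), (64, 0), (64, 0),
]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids exp overflow)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_horizon(results, iterations: int = 25) -> float:
    """Hypothetical helper: fit P(success) = sigmoid(a * log2(minutes) + b)
    by Newton's method and return the task length (minutes) at 50% success."""
    xs = [math.log2(minutes) for minutes, _ in results]
    ys = [float(ok) for _, ok in results]
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Gradient of the log-likelihood (g) and Hessian of its negative (h).
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a * x + b)
            w = p * (1.0 - p)
            g_a += (y - p) * x
            g_b += y - p
            h_aa += w * x * x
            h_ab += w * x
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system H @ step = g in closed form.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    # sigmoid(a*x + b) = 0.5 exactly when a*x + b = 0, i.e. x = -b / a.
    return 2.0 ** (-b / a)


def doubling_time(t1: float, h1: float, t2: float, h2: float) -> float:
    """Hypothetical helper: doubling interval implied by exponential growth
    from horizon h1 at time t1 to horizon h2 at time t2."""
    return (t2 - t1) * math.log(2) / math.log(h2 / h1)


print(round(fit_horizon(RESULTS), 3))  # 8.0 for this symmetric toy data
print(doubling_time(0, 1.0, 12, 8.0))  # roughly 4.0: 8x is three doublings in 12 months
```

Fitting against log2(minutes) rather than raw minutes treats each doubling of task length as one unit, matching the log-scaled axis. Newton's method is used here only because, with two parameters, the 2x2 solve is trivial without external libraries.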
ARC-AGI-2 launched in March 2025 with every frontier model scoring below 5%. Within a year, scores surged past the human baseline of 60%, the sharpest capability inflection on record.
Six observations from two years of frontier AI development.
The models that moved the benchmarks, in order.