Agentic model leaderboard
Ranked by Cost-to-Done: the real dollars to finish the job.
Preview data. Updated 18h ago.
Benchmarks test models in a vacuum. We run them on real, multi-step work and rank them by what it actually costs to finish. The Recursiv-only columns report metrics that no single-model benchmark can produce.
Experiments
Every number above comes from one of these runs.
Run it yourself
Stop guessing which model to ship.
Every number here was produced by running real agentic work on Recursiv. Book a demo and we will show you the platform these experiments run on.