Coding Benchmarks

OpenAI Declares SWE-bench Verified Obsolete for Measuring Frontier Coding Capabilities

OpenAI publishes a blog post officially declaring that SWE-bench Verified has become saturated and can no longer effectively differentiate the coding abilities of frontier AI models.