📰 News Report
Since 2023, the UK's AI Security Institute (AISI) has been running a variety of frontier AI models through 95 different Capture the Flag challenges designed to test capabilities on cybersecurity tasks, including reverse engineering, web exploitation, and cryptography.
Test Results
The latest test results show that GPT-5.5 performs on par with the heavily hyped Mythos Preview in the highest-level cybersecurity evaluations. In the “TLO” test range, which simulates a 32-step data extraction attack on a corporate network, GPT-5.5 succeeded in 3 out of 10 attempts, compared to 2 out of 10 for Mythos Preview.
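With only 10 attempts per model, a 3-versus-2 split is a very small sample. As a rough illustration of why such a gap reads as "on par" rather than as a lead, the sketch below runs a two-sided Fisher exact test on the reported counts. This is not part of AISI's methodology; the function name `fisher_exact_two_sided` is hypothetical, and the test treats each attempt as an independent trial, which is an assumption.

```python
from math import comb


def fisher_exact_two_sided(successes_a, attempts_a, successes_b, attempts_b):
    """Two-sided Fisher exact test for two success counts.

    Hypothetical helper for illustration only. Assumes every attempt is an
    independent pass/fail trial.
    """
    total_successes = successes_a + successes_b
    total_attempts = attempts_a + attempts_b
    total_failures = total_attempts - total_successes

    def weight(k):
        # Number of ways model A could end up with exactly k of the successes
        # when the overall success total is held fixed.
        return comb(total_successes, k) * comb(total_failures, attempts_a - k)

    lowest = max(0, attempts_a - total_failures)
    highest = min(attempts_a, total_successes)
    observed = weight(successes_a)

    # Add up every outcome that is no more likely than the observed one.
    # Integer weights avoid floating-point ties.
    as_extreme = sum(
        weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed
    )
    return as_extreme / comb(total_attempts, attempts_a)


if __name__ == "__main__":
    # Counts reported for the TLO range: GPT-5.5 3/10, Mythos Preview 2/10.
    p_value = fisher_exact_two_sided(3, 10, 2, 10)
    print(f"two-sided p-value: {p_value}")
```

For the reported counts the function returns 1.0, meaning a one-attempt gap at this sample size is entirely consistent with chance. That fits the article's framing of the two models as roughly equal on this range.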
The new results for GPT-5.5 suggest that, when it comes to cybersecurity risk, Mythos Preview was likely not “a breakthrough specific to one model” but rather “a byproduct of more general improvements in long-horizon autonomy, reasoning, and coding.”
OpenAI’s Response
In a recent interview with the Core Memory podcast, OpenAI CEO Sam Altman criticized what he calls “fear-based marketing” in promoting limited releases for certain AI models. While he said he’s “sure Mythos is a great model for cybersecurity,” he added: “There will be a lot more rhetoric about models that are too dangerous to release. There will also be very dangerous models that will have to be released in different ways.”
On Thursday, Altman said on social media that the initial release of GPT-5.5-Cyber would similarly be limited “to critical cyber defenders in the next few days.”
Industry Impact
In February, OpenAI rolled out its Trusted Access for Cyber pilot program, letting security researchers and enterprises verify their identities and register their interest in studying OpenAI’s frontier models for “legitimate defensive work.”
Taken together, the test results suggest that gains in cybersecurity capability are becoming a general trend across frontier AI models rather than an isolated phenomenon. That gives regulators and enterprises reason to reassess how they evaluate the cybersecurity risk posed by AI models.
Source: Ars Technica