During its big GPT-5 livestream on Thursday, OpenAI showed off a few charts that made the model seem quite impressive, but if you look closely, some of the graphs were a little bit off.
In one, ironically showing how well GPT-5 does in “deception evals across models,” the scale is all over the place. For “coding deception,” for example, the chart shown onstage says GPT-5 with thinking apparently gets a 50.0 percent deception rate, but that sits next to o3's smaller 47.4 percent score, which somehow has a larger bar. OpenAI appears to have accurate numbers for this chart in its GPT-5 blog post, however, where GPT-5’s deception rate is labeled as 16.5 percent.
With this chart, OpenAI showed onstage that one of GPT-5’s scores is lower than o3’s but is shown with a bigger bar. In this same chart, o3 and GPT-4o’s scores are different but shown with equally sized bars. It was bad enough that CEO Sam Altman commented on it, calling it a “mega chart screwup,” though he noted that a correct version is in OpenAI’s blog post.
An OpenAI marketing staffer also apologized, saying, “We fixed the chart in the blog guys, apologies for the unintentional chart crime.”
OpenAI didn’t immediately respond to a request for comment. And while it’s unclear whether OpenAI used GPT-5 to actually make the charts, it’s still not a great look for the company on its big launch day, especially when it’s touting the “significant advances in reducing hallucinations” with its new model.
