For the final day of its ‘12 Days of OpenAI’ Christmas campaign, OpenAI revealed o3, a new family of AI models capable of ‘reasoning’ their way through tasks and fact-checking themselves, following the release of its first reasoning model family, o1.
Although o3 and o3-mini aren’t publicly available yet, researchers can sign up to test the mini version, with its release expected in mid-January.
According to OpenAI, o3 outperforms o1 in areas such as coding and math: the new model scored 22.8 percentage points higher than o1 on coding benchmark tests, beat OpenAI’s chief scientist’s score in competitive programming, and reached 96.7% (missing just one question) on the 2024 American Invitational Mathematics Exam. It also solved 25.2% of the problems on Epoch AI’s FrontierMath benchmark, more than any other AI model has achieved; no previous model had scored above 2%. On the GPQA Diamond test, which consists of graduate-level biology, physics, and chemistry questions, it scored 87.7%.
Although OpenAI confirmed that o1 and o3 were trained with a new technique called “deliberative alignment,” designed to align the models with the company’s safety principles, safety experts are concerned that these models can deceive human users at a higher rate than traditional AI models from the likes of Meta, Google, and Anthropic.