Following a series of conflicting speculations about what OpenAI was going to announce in its livestream (strategically scheduled for the day before Google's big I/O developer conference), the company has revealed its newest model, GPT-4o, an upgraded version of GPT-4 that will power new and improved features in ChatGPT.
GPT-4o is reportedly 2x faster and 50% cheaper than GPT-4 Turbo, and more up-to-date, having been trained on data up until October 2023.
According to OpenAI, GPT-4o will give ChatGPT users "GPT-4-level" intelligence with improved capabilities across text, vision, and audio; hence the name GPT-4o (the "o" stands for "omni").
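For developers, the same model is also exposed through OpenAI's API, where text and image inputs can be mixed in a single request. The sketch below is illustrative only, assuming the current openai Python SDK, the published model identifier "gpt-4o", and a placeholder image URL:

```python
# Minimal sketch: a combined text + image request to GPT-4o via OpenAI's
# Python SDK (assumes the openai v1.x package is installed, OPENAI_API_KEY
# is set, and uses a placeholder image URL).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is shown in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```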
GPT-4o will also power a new voice assistant feature that can observe what's going on in the real world, read facial expressions, and translate spoken language in real time, letting users interact with ChatGPT much as they would with a human assistant.
During the livestream, OpenAI demonstrated the new voice feature's capabilities, asking it to tell a bedtime story in a range of styles and expressions, from a robotic voice to a sing-song one. They also asked it to describe what it could see through a phone camera, and found they could interrupt it while it was speaking; it simply carried on afterwards without needing to be prompted again.
This is far more advanced than ChatGPT's existing voice feature, which can only respond to one prompt at a time, works solely from what it can hear, and can't be interrupted or react to what it sees.
GPT-4o's text and image capabilities are available now and are free for all ChatGPT users, but to prevent misuse its voice feature will "be rolled out iteratively" in a limited alpha release, starting with "a small group of trusted partners" in the coming weeks.