The team of OpenAI employees testing the company's new, upcoming AI model (Orion) has revealed that its rate of improvement is nowhere near as high as the leap we saw between the launch of GPT-3 and GPT-4.
Although the model performs better than OpenAI's existing models, the improvement isn't as dramatic as GPT-4's was over GPT-3, and it can't be relied upon to consistently complete complex reasoning tasks, like coding.
The slower rate of improvement is down to a lack of unused, high-quality, real-world training data and, in response, OpenAI has formed a foundations team tasked with finding a solution.
One approach they're trialing is training Orion on synthetic data (generated by AI models to imitate patterns seen in real-world data) alongside real-world data, introducing new layers of variability and nuance so the model can better handle more complex, real-world scenarios.
While synthetic data could be a way around the data scarcity issue, it's a relatively new approach that is susceptible to bias and inaccuracies, so OpenAI must carefully fine-tune Orion to make sure the training data remains realistic.
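OpenAI hasn't published details of its pipeline, but the general idea of mixing synthetic and real data can be sketched roughly as below. The generator model, prompts, and mixing ratio are illustrative assumptions, not anything OpenAI has confirmed.

```python
# Minimal sketch (hypothetical): generate synthetic training examples with an
# existing model, then blend them with real data at a fixed ratio.
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_synthetic_examples(seed_prompts, n_per_prompt=3):
    """Ask an existing model to imitate patterns seen in real-world data."""
    synthetic = []
    for prompt in seed_prompts:
        for _ in range(n_per_prompt):
            response = client.chat.completions.create(
                model="gpt-4o-mini",  # assumed generator model
                messages=[
                    {"role": "system",
                     "content": "Write one realistic training example in the style of the prompt."},
                    {"role": "user", "content": prompt},
                ],
                temperature=1.0,  # higher temperature adds variability and nuance
            )
            synthetic.append(response.choices[0].message.content)
    return synthetic


def build_training_mix(real_examples, synthetic_examples, synthetic_fraction=0.3):
    """Blend real and synthetic data so synthetic text never dominates the mix."""
    n_synthetic = int(len(real_examples) * synthetic_fraction)
    mix = real_examples + random.sample(
        synthetic_examples, min(n_synthetic, len(synthetic_examples))
    )
    random.shuffle(mix)
    return mix
```

Capping the synthetic fraction is one common way to limit the bias and drift that AI-generated text can introduce, which is the risk noted above.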
Another approach lies in refining the model after training, using techniques like reinforcement learning and fine-tuning on specific tasks to address any performance gaps that real-world data can't fill.
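OpenAI's internal post-training for Orion isn't public, but the task-specific half of this approach looks broadly like the supervised fine-tuning flow in OpenAI's public API, sketched below. The file name and base model are assumptions for illustration.

```python
# Minimal sketch (hypothetical): task-specific fine-tuning on top of an already
# trained base model, using OpenAI's public fine-tuning API.
from openai import OpenAI

client = OpenAI()

# Each line of the JSONL file holds one chat-formatted example for the target
# task (e.g. a coding problem paired with a worked solution).
training_file = client.files.create(
    file=open("coding_tasks.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch a supervised fine-tuning job that adapts the base model to the task.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # assumed base model for illustration
)

print(job.id, job.status)
```

Reinforcement learning from feedback would sit on top of a step like this, rewarding outputs that score well on the target tasks rather than just imitating examples.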
A data shortage isn't just affecting OpenAI. The entire AI industry is facing the same issue, and it raises serious questions about the future of AI models and how advanced they can become without access to new, high-quality data.
The whole industry will be watching OpenAI to see whether these approaches can step up the rate of AI model improvement and help its models reach their full potential.