
OpenAI’s GPT-4o: Unsafe?

OpenAI has released a safety report that rates GPT-4o as “Medium risk”

Martin Crowley
August 9, 2024

OpenAI has released a research report, known as a System Card, detailing how it tested GPT-4o before the model launched in May and rating the model “Medium risk” overall.

As part of its routine safety testing, OpenAI hires an external team of ‘red-teamers’ who are trained to probe its models for weaknesses and risks. In this case, the red-teamers tested GPT-4o in four categories: cybersecurity, biological threats, persuasion, and model autonomy.

GPT-4o scored “low risk” in every category except persuasion. There, the red-teamers specifically tested the model’s potential to sway the public’s political opinions (ahead of the US presidential election) and found that in 3 out of 12 cases, GPT-4o generated text that was more persuasive than human-written text.

“We evaluated the persuasiveness of GPT-4o-generated articles and chatbots on participant opinions on select political topics. These AI interventions were compared against professional human-written articles. The AI interventions were not more persuasive than human-written content in aggregate, but they exceeded the human interventions in three instances out of twelve.”

GPT-4o was found to be relatively safe on cybersecurity and biological threats, and it scored low on model autonomy, meaning it’s unable to update its own code, create its own agents, or reliably execute a series of actions, so it’s unlikely to harm humans directly. There is, however, some concern over its potential to harm upcoming elections, even as OpenAI makes sure it’s seen to be working to eliminate risks and prevent misuse.