Seven months after releasing its debut text-to-video model, Veo, Google has launched an updated version, Veo 2, which is reportedly better than OpenAI’s Sora, released just last week.
Veo 2 is currently available only to a select few via a waitlist, though Google plans to admit more waitlist users this week. It can generate video clips at resolutions of up to 4K, which means the footage is crystal clear, and at lengths of more than two minutes. That is over 6x longer than Sora’s videos, which are capped at 20 seconds.
Plus, Google claims Veo 2 “brings an improved understanding of real-world physics and the nuances of human movement and expression,” which should make it less prone than many other AI video generators, Sora included, to hallucinated details such as extra fingers or warped features. That said, early demos still show lifeless eyes, unrealistic surfaces, and gravity-defying buildings in the background.
Veo 2 also has updated camera controls that let users request different lenses, cinematic effects, camera angles, and depth of field, and it can realistically replicate fluid dynamics (such as pouring water) and light properties (such as shadows and reflections).
While Google has sung Veo 2’s praises, it has acknowledged that the model still needs work:
“Veo can consistently adhere to a prompt for a couple of minutes, but [it can’t] adhere to complex prompts over long horizons. Similarly, character consistency can be a challenge. There’s also room to improve in generating intricate details, fast and complex motions, and continuing to push the boundaries of realism.”
Google is therefore relying on feedback from users, artists, and producers to fine-tune the model before a general public release sometime next year.