OpenAI, the Microsoft-backed creator of ChatGPT, has released an initial look at Sora — a new AI model that can create video content from text prompts.
Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. OpenAI has made the tool available to “red teamers”, specially selected developers who probe the model for critical areas of harm and risk.
The prompt for this video was: A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colours.
The company is also granting access to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals.
ChatGPT and text-to-image generators such as Dall-E have been heralded by some in the creative industries as huge time savers.
Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.
The prompt for this video was: A gorgeously rendered papercraft world of a coral reef, rife with colourful fish and sea creatures.
The company said that Sora has a “deep understanding” of language, making it capable of creating “compelling” characters that express “vibrant” emotions. Sora can also create multiple shots within a single generated video that accurately persist characters and visual style.
The firm did acknowledge, however, that the model has a number of flaws.
“It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark,” it explained.
“The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.”