
Sora: OpenAI unveils latest tool that converts text prompts to videos

Sora takes a text-to-video approach, promising high-quality visuals that follow the user's prompt. Beyond text prompts, Sora can animate existing images and extend existing videos or fill in missing frames.

Social Samosa

16 Feb 2024 13:18 IST


OpenAI has unveiled Sora, a text-to-video model that the company says can produce remarkably lifelike videos up to a minute long from text prompts. The AI startup, led by CEO Sam Altman, has put Sora into a red-teaming phase, in which testers probe the model to identify and address potential weaknesses. OpenAI is also collaborating with visual artists and filmmakers, seeking feedback to improve the model's performance.

Altman introduced Sora on his X account. The model takes a text-to-video approach, promising high-quality visuals that follow the user's prompt. OpenAI says Sora can generate intricate scenes with multiple characters and nuanced movement, understanding both what the user asked for and how those things exist in the physical world. Altman showcased Sora's capabilities through diverse examples, from playful dolphins to fantastical scenarios like a squirrel riding a dragon.

here is sora, our video generation model: https://t.co/CDr4DdCrh1

today we are starting red-teaming and offering access to a limited number of creators.

@_tim_brooks @billpeeb @model_mechanic are really incredible; amazing work by them and the team.

remarkable moment.
— Sam Altman (@sama) February 15, 2024


Sora is a diffusion model built on a transformer architecture similar to the one behind GPT models. It represents videos and images as collections of small units called patches, each playing a role similar to a token in GPT, which OpenAI says allows the model to scale well.
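
To make the patch idea concrete, here is a minimal Python sketch of cutting a tiny video into "spacetime" patches, each covering a few pixels across a few consecutive frames. It is an illustration only, not OpenAI's implementation: the function name `patchify`, the patch sizes, and the toy grayscale video are all assumptions made for this example, and the sketch works on raw pixel values for simplicity.

```python
def patchify(video, pt=2, ph=2, pw=2):
    """Split a video, given as nested lists indexed [frame][row][col],
    into spacetime patches (hypothetical helper, for illustration only).

    Each patch is returned as a flat list of pt * ph * pw pixel values.
    """
    t, h, w = len(video), len(video[0]), len(video[0][0])
    if t % pt or h % ph or w % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    # Walk over the top-left-front corner of every patch.
    for t0 in range(0, t, pt):
        for y0 in range(0, h, ph):
            for x0 in range(0, w, pw):
                # Flatten the small block of frames x rows x columns into one list.
                patch = [
                    video[f][y][x]
                    for f in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ]
                patches.append(patch)
    return patches

# A toy 4-frame, 4x4 grayscale "video" whose pixel values are just counters.
video = [[[f * 16 + y * 4 + x for x in range(4)] for y in range(4)] for f in range(4)]
patches = patchify(video)
print(len(patches), len(patches[0]))  # 8 8
```

Each flat list plays the role a token plays in a language model: the video becomes a sequence of fixed-size units that a transformer could process one position at a time.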


Built upon research from DALL-E and GPT models, Sora uses the recaptioning technique from DALL-E 3, which generates highly descriptive captions for the visual training data. Beyond text prompts, Sora can animate existing images and extend existing videos or fill in missing frames. OpenAI emphasizes Sora's comprehension of language, which it says enables the model to interpret prompts accurately and create emotionally expressive characters. However, the model struggles to depict complex physics and cause-and-effect relationships accurately, occasionally leading to inaccuracies in scene details.

On safety, OpenAI says it is taking rigorous measures, including working with domain experts in areas such as misinformation, hateful content, and bias, who will test the model adversarially. The company is also developing tools to detect Sora-generated video.
