OpenAI Releases Sora: A Revolutionary Text-To-Video AI Model

OpenAI has launched Sora, an AI model that generates realistic and imaginative video scenes from text prompts. Sora marks a significant advance in AI video generation, letting users create high-quality videos simply by describing what they want to see.

Access is currently limited to ChatGPT Plus and Pro users, with plans for wider availability and tailored pricing.

What is Sora?

OpenAI's Sora is a new text-to-video AI model. OpenAI's research preview showed it generating realistic videos up to one minute long, while the released product currently supports clips of up to 20 seconds. The model combines a diffusion model with a transformer architecture and represents visual data as patches, which allows scalable training across varying resolutions and aspect ratios.

Sora's release includes safety measures such as watermarks and content filtering to mitigate misuse. The technology, while impressive, still has limitations in accurately simulating physics and complex actions.

Sora's Capabilities and Features

Sora comes with several impressive capabilities:

Generates high-fidelity videos with various characters, movements, and detailed backgrounds. The underlying model can produce clips of up to one minute, although the dedicated interface currently limits clips to 20 seconds.
Understands and simulates the physical world. Sora doesn't just process the words in a prompt; it grasps how those elements interact in reality.
Accurately interprets language and generates characters with realistic emotions.
Creates multi-shot videos that maintain visual style and character consistency.
Offers various input options beyond text prompts. Sora can also generate videos from still images or existing videos, allowing users to animate pictures, extend video clips, or fill in missing frames.

OpenAI has also developed a dedicated interface for Sora with user-friendly features:

Support for 1080p resolution videos up to 20 seconds long. Users can choose between widescreen, vertical, or square aspect ratios.
Integration of user-provided assets for remixing and blending with generated content.
A storyboard tool for specifying inputs for each frame.
Featured and Recent feeds showcasing community creations.

Sora's Availability and Subscription Plans

Sora is being rolled out at sora.com and is currently accessible to ChatGPT Plus and Pro users.

Plus users can generate up to 50 videos per month at 480p resolution, with reduced quantities at higher resolutions (e.g., 720p).
Pro users receive ten times the usage allowance, higher resolutions, and longer video durations.
OpenAI plans to introduce tailored pricing for different users in early 2025.
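
As a rough worked example of the allowances above, the snippet below assumes the tenfold Pro allowance applies directly to the monthly count of 480p videos. The article does not spell out how usage is measured, so treat the variable names and the result as illustrative only.

```python
PLUS_VIDEOS_PER_MONTH_480P = 50  # figure quoted above for Plus users
PRO_MULTIPLIER = 10              # "ten times the usage allowance"

# Assumption: the multiplier applies directly to the 480p video count.
pro_videos_per_month_480p = PLUS_VIDEOS_PER_MONTH_480P * PRO_MULTIPLIER
print(pro_videos_per_month_480p)  # 500
```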

Currently, Sora is not available to ChatGPT Team, Enterprise, or Edu users, nor to those under 18 years old.

Access is also restricted in certain regions, including the UK, Switzerland, and the European Economic Area. OpenAI is working to expand access in the future.

How to Create Videos with Sora

Creating videos with Sora is quite simple!

This is what the Sora interface looks like.

Here are the steps to create videos from a text prompt using OpenAI Sora.

Step 1: Open the Composer

At the bottom of the Sora interface, locate the Composer. This is where you will input your text description of the desired video.

Step 2: Write Your Prompt

Enter a detailed description of the video you want to create. For example: "A family of woolly mammoths in an open desert."

Step 3: Review Settings

Add a style preset to influence the artistic look of your video.
Adjust the aspect ratio, resolution, and duration.
Specify the number of variations you want from the same prompt.
Hover over the help icon to see the credit cost for your selected settings.
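
The settings in this step can be pictured as a small configuration record. The sketch below is purely illustrative: Sora is used through its web interface, and the GenerationSettings class, its field names, its defaults, and the validate helper are hypothetical rather than part of any Sora API. The allowed values come only from figures quoted in this article: three aspect ratios, resolutions of 480p, 720p, and 1080p, and clips of up to 20 seconds.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values taken from figures quoted in this article.
ASPECT_RATIOS = {"widescreen", "vertical", "square"}
RESOLUTIONS = {"480p", "720p", "1080p"}
MAX_DURATION_SECONDS = 20


@dataclass
class GenerationSettings:
    """Hypothetical record of the Step 3 choices; not a real Sora API."""

    prompt: str
    style_preset: Optional[str] = None  # optional artistic preset
    aspect_ratio: str = "widescreen"
    resolution: str = "480p"
    duration_seconds: int = 5  # illustrative default, not from the article
    variations: int = 1

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if none)."""
        problems = []
        if not self.prompt.strip():
            problems.append("prompt is empty")
        if self.aspect_ratio not in ASPECT_RATIOS:
            problems.append(f"unknown aspect ratio: {self.aspect_ratio}")
        if self.resolution not in RESOLUTIONS:
            problems.append(f"unknown resolution: {self.resolution}")
        if not 1 <= self.duration_seconds <= MAX_DURATION_SECONDS:
            problems.append(f"duration must be 1 to {MAX_DURATION_SECONDS} seconds")
        if self.variations < 1:
            problems.append("at least one variation is required")
        return problems


settings = GenerationSettings(
    prompt="A family of woolly mammoths in an open desert.",
    aspect_ratio="square",
    resolution="720p",
    duration_seconds=10,
    variations=2,
)
print(settings.validate())  # [] when every setting is within the quoted limits
```

Collecting the choices in one place like this makes it easy to see which combinations fall outside the limits the article mentions before a prompt is submitted.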

Step 4: Create the Video

Once satisfied with the settings, click the Create button. Your video will appear in the library and start generating.

Step 5: Preview Variations

Hover over the generated videos in your library to preview them in real time.
Use your mouse to scrub through the videos at different speeds.
Click on a video to open it in a lightbox and use arrow keys to navigate through variations.

Step 6: Edit Your Video

You can use the editing toolbar at the bottom of the video for advanced controls.

The toolbar has the following options:

Modify Prompt/Storyboard: Adjust the text description or storyboard for the clip.
Recut: Trim or extend sections of the video.
Remix: Use natural language to make additional changes.
Blend: Combine elements of your video with another.
Loop: Create a seamlessly repeating section.

Step 7: Quick Actions

At the top of the screen, you can favorite, share, or download your video.

Step 8: Explore for Inspiration

Visit the Explore section to view community creations, gather inspiration, and collaborate by using others’ clips.

Step 9: Learn More

Access video tutorials in the user account menu for additional guidance on Sora features.

OpenAI's Approach to Sora's Deployment

OpenAI acknowledges that this version of Sora has limitations. It can sometimes generate unrealistic physics and struggles with complex actions over extended periods. Additionally, while this version is significantly faster than the model previewed in February, OpenAI is still working to make the technology more affordable.

OpenAI is releasing Sora early to give society time to understand its potential and to collaboratively develop norms and safeguards for its responsible use.

Addressing Safety Concerns

To ensure responsible use and mitigate potential harms, OpenAI is implementing several safety measures:

All Sora-generated videos include C2PA metadata, identifying them as originating from Sora and allowing for origin verification.
Visible watermarks are included by default.
An internal search tool, utilising technical attributes of the generated videos, aids in content verification.
OpenAI is actively blocking harmful content such as child sexual abuse materials and sexual deepfakes.
Uploads featuring people are initially limited but will be expanded as deepfake mitigations improve.
OpenAI has engaged red teamers, experts in areas like disinformation and illegal content, to test the model and identify potential risks.
Robust safety systems developed for ChatGPT, DALL·E, and OpenAI's API are being leveraged for Sora.
A detection classifier identifies Sora-generated videos, and text and image classifiers filter harmful content.
OpenAI plans to engage policymakers, educators, and artists globally to address concerns and explore positive use cases.

Behind Sora's Technology

Sora's technological foundation is built upon several innovative approaches:

Diffusion model: Sora generates videos by starting with a noise-filled video and progressively removing the noise until a clear video emerges.
Transformer architecture: Similar to GPT models, this allows for superior scaling performance.
Unified data representation using patches: Video and image data are broken down into smaller units, allowing Sora to handle varying durations, resolutions, and aspect ratios.
Recaptioning technique: As with DALL·E 3, highly descriptive captions are generated for the visual training data, which helps the model follow text prompts more faithfully and improves video quality.
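
To see why a patch-based representation copes with different durations, resolutions, and aspect ratios, it helps to count patches. The sketch below is a simplified illustration, not OpenAI's implementation: the function name count_patches, the patch size of 4 frames by 16 by 16 pixels, and the frame rate of 24 frames per second are assumptions chosen for the example, and it works on raw pixel dimensions for simplicity.

```python
import math


def count_patches(frames, height, width, patch=(4, 16, 16)):
    """Count spacetime patches when a video is cut into a grid of blocks.

    patch = (frames, pixels high, pixels wide) per patch; hypothetical sizes.
    Edges that do not fill a whole patch are assumed to be padded, hence ceil.
    """
    pt, ph, pw = patch
    return math.ceil(frames / pt) * math.ceil(height / ph) * math.ceil(width / pw)


# Three shapes of the same 5-second clip (120 frames at an assumed 24 fps).
widescreen = count_patches(frames=120, height=480, width=854)
vertical = count_patches(frames=120, height=854, width=480)
square = count_patches(frames=120, height=480, width=480)
print(widescreen, vertical, square)  # 48600 48600 27000
```

Each shape simply yields a sequence of a different length, and a transformer can process sequences of varying length, so no cropping or resizing to a single fixed format is required.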

Sora's Potential Impact

Sora could change how videos are created and stories are told. It aims to let individuals and professionals bring their creative visions to life with far less effort. The ability to simulate reality opens up new possibilities in fields such as filmmaking, design, and education.

Sora is still under active development. As OpenAI refines the model, addresses its limitations, and works to ensure responsible use, Sora has the potential to reshape how we create and interact with video content.

Resource:

Sora is here
