OpenAI Jumps Into Text-to-Video Fray With Sora, Challenging Meta, MidJourney and Pika Labs

The leader in the AI chatbot space finally adds the ability to conjure moving pictures that are longer and higher quality than those made with more established tools.

Feb 15, 2024

3 min read

Image created by Decrypt using AI

OpenAI today unveiled Sora, a new artificial intelligence model that can take text-based instructions and create long, captivating videos. Well, one-minute long videos.

It’s currently a closed beta that's available only to invited developers, and represents a somewhat late entry by the global leader in AI. Text-to-video isn't exactly uncharted territory. Companies like RunwayML and Pika Labs have been in the game for a while and currently dominate the scene with models capable of creating stunning visuals in seconds.

But there's always a catch: these videos tend to be short, the story losing focus and coherence the longer they run.

With Sora, OpenAI aims to achieve consistency, generating highly detailed, minute-long videos that can seamlessly flow and evolve. It’s not a simple goal, as AI models effectively improvise every frame from scratch. A tiny flaw in a single frame can snowball into a cascade of hallucinations and unrealistic imagery.

OpenAI seems to have made headway though, with Sora demonstrating smooth, captivating visuals that are so far unmatched by current players in the space. Example videos were posted online by OpenAI, and some have been republished unofficially on YouTube.

OpenAI is going head-to-head with other AI companies that are also testing the waters of generative video. Popular text-to-image generator Midjourney recently announced that it is working on a text to video generator, but didn’t offer a release date. Also, Stability AI recently made waves with Stable Video Diffusion, its open-source offering capable of generating videos of 25 frames at resolution 576x1024.

Even Meta is showing off its EMU video generator, part of its push to weave AI into social media and the metaverse.

Sora—which is in limited release for now, with OpenAI giving access to “visual artists, designers, and filmmakers” for feedback—distinguishes itself by how it understands language. It generates vibrant, highly detailed images while interpreting the nuances of written prompts. Need a specific camera motion? Multiple characters with realistic emotions? No problem.

Sora even generates seamless transitions between different shots within the same video, mimicking what some video edition tools already do today. Here is another enthusiast video posted today to YouTube:

Even so, AI-powered creativity comes with its quirks. Sora isn't quite a cinematic maestro yet. Struggles with physics or intricate cause-and-effect may occur, in other words, and while it is already one of the most consistent video generators, it doesn’t achieve levels of absolute fidelity, so hallucinations are to be expected.

Also, coming from OpenAI, Sora will undoubtedly be a heavily censored model. The company emphasized its focus on safety tests and detection tools to flag potentially harmful and misleading content. OpenAI is working with its red team to polish its model and hopes its early release strategy will lead to collaboration in building increasingly secure AI in the coming years.

No immediate release date has been announced for Sora's wider implementation.

Edited by Ryan Ozawa.

OpenAI Jumps Into Text-to-Video Fray With Sora, Challenging Meta, MidJourney and Pika Labs

The leader in the AI chatbot space finally adds the ability to conjure moving pictures that are longer and higher quality than those made with more established tools.

Decrypt’s Art, Fashion, and Entertainment Hub.

Stay on top of crypto news, get daily updates in your inbox.

Coin Prices