OpenAI’s new tool for video generation looks better than those of competitors

For a while now, text-to-image artificial intelligence has been a popular topic in technology. While text-to-image generators like Midjourney keep gaining users, companies like Runway and Pika have been developing text-to-video models.

OpenAI, an important player in the AI industry, has been causing quite a stir lately, particularly with the introduction of ChatGPT. In less than two months, the AI tool gained 100 million users, a faster growth rate than either Instagram or TikTok achieved. OpenAI released DALL-E, its text-to-image model, before ChatGPT. The company released DALL-E 2 in 2022; however, access was initially restricted because of concerns over explicit and biased images. OpenAI eventually addressed these problems, opening DALL-E 2 to everyone.

Images created with DALL-E 3 carried watermarks applied by OpenAI; however, the company acknowledged that these could be easily removed. In the meantime, Meta announced that it would use tiny hidden markers to detect and label images posted on its platforms that were generated by other companies’ AI services. Aware of the opportunities and risks associated with AI-generated video and audio, Meta is also exploring this area.

One of DALL-E 3’s greatest strengths was creating accurate and realistic images that closely matched the given prompts. Its integration with ChatGPT blends linguistic and visual creativity and adds another level of versatility to the product.

Conversely, Midjourney, an established player in the AI art field, demonstrated its prowess in producing quirky and inventive images. It may not have captured the details of the prompt as consistently as DALL-E 3, but it prevailed in terms of visual appeal and subtlety. It’s important to keep in mind, though, that the comparison relied on particular prompts and criteria, and that assessments may differ under other circumstances or standards.

In the end, the choice depends on the user’s preferences and particular needs. Based on this comparison, DALL-E 3 may be the better option if speed, accuracy, and ease of use matter most. Midjourney, however, may be preferable if more sophisticated features and a more aesthetically pleasing result are required.

Recently, OpenAI unveiled Sora, named after the Japanese word for “sky,” an AI tool that can produce videos up to a minute long from short text prompts. In essence, you tell it what you want, and Sora transforms your concepts into visual reality. In a recent blog post, OpenAI described how Sora works, stating that it transforms these inputs into scenes complete with people, activities, and backgrounds.

Before the release of OpenAI’s tool, products like Runway (Runway ML), which debuted in 2018, dominated the market and gained traction in the amateur and professional video editing sectors for several years.

Runway’s Gen-2 update has brought numerous new features over the past year, including Director Mode (a feature that lets you move the perspective as if you were operating a camera). Pika Labs, by contrast, has primarily run on its own Discord server, so it has evolved along a route more similar to Midjourney’s, and it was considered one of the most promising AI applications for generative video. Most importantly, with the release of the Pika 1.0 update, its Camera Control features (pan, zoom, and rotate) made it one of the best idea-to-video AI solutions available until the release of OpenAI’s tool.

In addition to creating videos, Sora can also enhance still photos, extend existing videos, and even fill in missing frames. Examples from OpenAI’s demonstration included a virtual train ride in Tokyo and scenes from the California gold rush. Additionally, CEO Sam Altman posted a few video clips on X that Sora created in response to user requests. Currently, OpenAI is making Sora available only to researchers, visual artists, and filmmakers. The tool will be tested to ensure that it complies with OpenAI’s guidelines, which prohibit excessive violence, sexual content, and celebrity lookalikes.

“The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world,” said OpenAI in a blog post.

“Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions,” said OpenAI on X.

“One obvious use case is within TV: creating short scenes to support narratives,” said Reece Hayden, a senior analyst at market research firm ABI Research. “The model is still limited, though, but it shows the direction of the market.”

Sure, it looks amazing at first, but if you pay close attention to how the woman in one minute-long demo clip moves her legs and feet, several major issues become clear. Between the 16- and 31-second marks, her legs and feet subtly swap positions. Her left and right legs trade places entirely, demonstrating the AI’s poor grasp of human anatomy.

To be fair, Sora’s capabilities are light years beyond those of previous AI-generated video examples. Do you recall that awful AI clip in which Will Smith was eating a dish of pasta and, horrifyingly, merging with it? Less than a year has passed since then.

Still, even though the company’s most recent demonstration stunned some viewers, generative AI’s limits remain evident.

Over the next few years, we will see AI’s ability to generate accurate videos steadily improve. Cinema could gain new tools as a result, and new possibilities could open up for audiobooks, which could be accompanied by a visual representation of the narration. As we have previously discussed, though, there are also many problems related to the creation of fake videos, which could be used to fabricate evidence of events that never happened.