The next step to art creation by AIs
Recently, we have been seeing more and more of what AI can accomplish across different fields, not always perfectly but often with amazing results. Just think of the latest AI image generators spreading across the internet: day after day they produce beautiful images, even copying famous artists’ styles.
Meta is now trying to take a step forward with a tool that generates videos through Artificial Intelligence. Its new tool, called Make-A-Video, has been showcased via Twitter. Although the results may look pretty weird, it would be no surprise if AI video-generation tools overtook AI image-generation tools as the new trend.
However, achieving good results is not as easy as it is for images. An animation needs a higher degree of coherence between frames, and its subjects must interact and move consistently, so the error rate rises. In addition, video generation needs much more data to draw from.
According to the research paper, the Meta team used an evolved version of a diffusion-based text-to-image generation model to animate images. The lack of large datasets with high-quality text-video pairs remains a problem: modeling higher-dimensional video data is complex, and text-to-video AI models would need training datasets far larger than those used for images.
To generate images, diffusion models begin with randomly generated noise and then gradually adjust it to get closer to the goal prompt; the quality of the training data has a significant impact on how accurate the outcomes are.
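The idea can be sketched in a few lines. This is a toy illustration only: in a real diffusion model, a trained neural network conditioned on the text prompt predicts what to remove at each step, whereas here a hand-written pull toward a fixed target vector stands in for that network, and the target itself is an assumption standing in for "what the prompt describes".

```python
import random

random.seed(0)

# Stand-in for the content the text prompt describes (purely illustrative).
target = [1.0, -1.0, 0.5, 0.0]

# Step 1: begin with randomly generated noise.
x = [random.gauss(0, 1) for _ in target]

steps = 50
for t in range(steps):
    # Step 2: gradually adjust the sample toward the goal. Pull x a fraction
    # of the way to the target, then re-inject a shrinking amount of noise,
    # mimicking a denoising schedule.
    x = [xi + 0.1 * (ti - xi) + 0.05 * (1 - t / steps) * random.gauss(0, 1)
         for xi, ti in zip(x, target)]

# After the loop, the sample has converged close to the target.
error = sum((xi - ti) ** 2 for xi, ti in zip(x, target)) ** 0.5
print(error < 0.5)
```

The same loop scales up in a real system: the vector becomes an image (or a stack of video frames), and the hand-written nudge becomes a learned denoiser, which is why the quality of its training data matters so much.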
But the amazing thing about Meta’s algorithm is that it doesn’t need paired text-video data, and therefore doesn’t require as much data to work.
Currently, Make-A-Video generates silent clips made up of 16 frames generated at 64 x 64 pixels, which are subsequently upscaled to 768 x 768 pixels using another AI model. They barely last for five seconds and only show one action or scene.
According to Meta, Make-A-Video’s AI learned “what the world looks like from paired text-image data and how the world moves from video footage with no associated text”. It was trained using more than 2.3 billion text-image pairs from the LAION-5B database and millions of videos from the WebVid-10M and HD-VILA-100M databases.
Meta claims that static images with paired text are enough for training text-to-video models since they may be used to infer movements, activities, and events. In a similar way, even without any text describing them, “unsupervised videos are sufficient to learn how different entities in the world move and interact”.
The researchers acknowledged that, like “all large-scale models trained on data from the web, [their] models have learned and likely exaggerated social biases, including harmful ones”, but claimed to have done what they could to control the quality of the training data by filtering out of the LAION-5B dataset all text-image pairs that contained NSFW content or toxic words. Preventing AIs from producing insulting, false, or dangerous content is one of the industry’s main problems.
For now, the results look like stop-motion videos, with glitches that make them seem surreal or dreamy.
The tool can be applied in a few different ways, such as to give motion to a single image, to fill in the gaps between two photos, or to create new iterations of a video based on the original.
It’s not hard to imagine a future where our stories come to life in a movie generated entirely by an AI, where not only the images but also the music and dialogue are created by an algorithm. That would be amazing for anyone who would like to see what their stories could look like. Some creators may worry that this technology could steal their creativity; however, these tools could also integrate with existing creative processes, adding new styles. Even so, once the quality becomes hyperrealistic, the bigger problem will be dealing with media that look so realistic they could be taken for real, with all the associated risks.