Ever more realistic images

Google introduced a new text-to-image generator called Parti, which stands for Pathways AutoRegressive Text-to-Image. The model uses a technique that helps it generate images that more closely match the user's text description.

A portrait of a statue of the Egyptian god Anubis wearing aviator goggles, white t-shirt and leather jacket. The city of Los Angeles is in the background.

In Parti's approach, an image tokenizer first transforms each training image into a sequence of discrete code entries, or tokens, that resemble puzzle pieces. A Transformer model then treats generation as translation: it converts the text prompt into a sequence of these tokens, which are decoded back into a new image. Because this framing mirrors machine translation, Parti can build on existing research and infrastructure for large language models, and that is key to handling long, complex prompts and producing high-quality images.
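To make the two stages more concrete, here is a toy sketch. It is not Parti's code: the hand-picked 2-D codebook, the integer prompt tokens, and the `toy_scores` function are all hypothetical stand-ins. In Parti the codebook is learned by the image tokenizer and the next-token scores come from a large Transformer conditioned on the text; greedy decoding is used here only for simplicity.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]

def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(patches: List[Vector], codebook: List[Vector]) -> List[int]:
    """Tokenize: replace each patch vector by the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda i: sq_dist(p, codebook[i])) for p in patches]

def dequantize(tokens: List[int], codebook: List[Vector]) -> List[Vector]:
    """Detokenize: look each token up in the codebook to rebuild the patch vectors."""
    return [codebook[t] for t in tokens]

def generate_image_tokens(
    prompt_tokens: List[int],
    next_token_scores: Callable[[List[int], List[int]], List[float]],
    length: int,
) -> List[int]:
    """Autoregressive decoding: pick one image token at a time, conditioned on
    the prompt and on every image token generated so far (greedy choice)."""
    generated: List[int] = []
    for _ in range(length):
        scores = next_token_scores(prompt_tokens, generated)
        generated.append(max(range(len(scores)), key=lambda i: scores[i]))
    return generated

# Hypothetical 4-entry codebook of 2-D "patch" vectors.
CODEBOOK = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

def toy_scores(prompt: List[int], generated: List[int]) -> List[float]:
    """Hypothetical stand-in for the Transformer: favors one token per step."""
    target = (sum(prompt) + len(generated)) % len(CODEBOOK)
    return [1.0 if i == target else 0.0 for i in range(len(CODEBOOK))]

if __name__ == "__main__":
    print(quantize([[0.1, 0.2], [0.9, 0.8], [0.2, 0.9]], CODEBOOK))  # [0, 3, 2]
    tokens = generate_image_tokens([5, 2], toy_scores, length=3)
    print(tokens)                        # [3, 0, 1]
    print(dequantize(tokens, CODEBOOK))  # [[1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
```

The nearest-neighbor lookup and the toy scorer are the easy parts here; in the real system, learning a good codebook and a good next-token model is where all of the difficulty lies.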

DALL-E 2, by contrast, uses a diffusion model, which starts from an image of random noise and progressively denoises it into an image that matches our description.
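For comparison, the sketch below shows the general shape of diffusion sampling: start from random noise and repeatedly remove predicted noise. It assumes a DDIM-style deterministic update with a linear noise schedule, which is one common way to sample from diffusion models and not necessarily what DALL-E 2 uses. The `oracle_noise` predictor and the `TARGET` vector are hypothetical: the oracle cheats by knowing the target, whereas a real model uses a trained neural network conditioned on the text.

```python
import math
import random
from typing import Callable, List

Vec = List[float]

def make_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def ddim_sample(x_T: Vec, predict_noise: Callable[[Vec, int], Vec], alpha_bars: List[float]) -> Vec:
    """Deterministic DDIM sampling (eta = 0) from step T-1 down to step 0."""
    x = list(x_T)
    for t in reversed(range(len(alpha_bars))):
        a_t = alpha_bars[t]
        eps = predict_noise(x, t)
        # Clean signal implied by the current noisy sample and the predicted noise.
        x0_hat = [(xi - math.sqrt(1 - a_t) * ei) / math.sqrt(a_t) for xi, ei in zip(x, eps)]
        if t == 0:
            return x0_hat
        a_prev = alpha_bars[t - 1]
        # Re-noise that estimate to the slightly lower noise level of step t-1.
        x = [math.sqrt(a_prev) * ci + math.sqrt(1 - a_prev) * ei for ci, ei in zip(x0_hat, eps)]
    return x

ALPHA_BARS = make_alpha_bars(50)
TARGET = [0.2, -0.5, 0.9]  # hypothetical "image" the oracle steers toward

def oracle_noise(x: Vec, t: int) -> Vec:
    """Hypothetical perfect noise predictor that knows TARGET (a real model learns this)."""
    a = ALPHA_BARS[t]
    return [(xi - math.sqrt(a) * ti) / math.sqrt(1 - a) for xi, ti in zip(x, TARGET)]

if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in TARGET]
    print(ddim_sample(noise, oracle_noise, ALPHA_BARS))  # approximately TARGET
```

With a perfect predictor the sampler lands on the target; with a learned one, the accuracy of the noise prediction at each step is what determines how well the final image matches the prompt.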

Parti generates high-fidelity, photorealistic images, and its quality improves as the model scales from 350 million to 20 billion parameters.

It can also render scenes that have never been seen before from long, complex prompts, placing many participants and objects with fine-grained details and interactions while adhering to a specific image format and style.

A teddy bear wearing a motorcycle helmet and cape is driving a speed boat near the Golden Gate Bridge.

However, like other text-to-image generators, Parti still has a number of limitations, including inaccurate object counts, features blended between objects, incorrect relative placement or size, and improper handling of negation.

Still, these models are improving quickly, so we may soon see highly accurate photos and drawings created from a text description alone. The same could then happen with video: a script could become a movie directly. It may take different tools, but it would be a completely new approach to creativity.


Text-to-image models are inspiring resources for creativity, but they also carry risks related to bias, misinformation, and safety. There are ongoing debates about ethical A.I. practices and the steps needed to advance this technology safely. That's why Google adds an easy-to-spot watermark to generated images as a first step, so that anyone can tell when an image was created with Parti.

Art will inevitably change in the near future. What will happen to artists? Will they become skilled prompt writers? Or will they combine this new technology with their own work to create a new kind of art? It's hard to tell, but these algorithms will surely be useful for people who can't draw or paint. A writer, for example, could illustrate their book with accurate images without hiring an illustrator. Will that be a good thing?