Ever more realistic images

Google has introduced a new text-to-image generator called Parti, which stands for Pathways AutoRegressive Text-to-Image. The model uses a technique that helps generate images that more closely match the user's text description.

A portrait of a statue of the Egyptian god Anubis wearing aviator goggles, white t-shirt and leather jacket. The city of Los Angeles is in the background.

In Parti’s approach, a collection of images is first transformed into a sequence of code entries that resemble puzzle pieces. A new image is then created by translating a given text prompt into these code entries. Because it builds on existing research and infrastructure for large language models, this method is well suited to handling long, complicated text prompts and producing high-quality images.
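The two-stage idea described above can be sketched in a toy form. This is not Parti's actual code: the codebook, the stand-in "model" function, and the prompt are all invented for illustration, and the real system uses a learned image tokenizer and a large transformer.

```python
# Toy sketch of Parti's two-stage approach (illustration only):
# 1) an image tokenizer maps image patches to entries in a discrete codebook,
# 2) an autoregressive model predicts those code entries one by one from the text.

CODEBOOK = {0: (0, 0, 0), 1: (255, 255, 255), 2: (255, 0, 0)}  # code -> RGB patch

def toy_autoregressive_model(prompt, prev_tokens):
    """Stand-in for the transformer: deterministically picks the next code
    from the prompt text and the tokens generated so far."""
    score = sum(ord(c) for c in prompt) + sum(prev_tokens)
    return score % len(CODEBOOK)

def generate_image_tokens(prompt, n_tokens=16):
    """Autoregressive loop: each new code depends on all previous ones."""
    tokens = []
    for _ in range(n_tokens):
        tokens.append(toy_autoregressive_model(prompt, tokens))
    return tokens

def detokenize(tokens):
    """Image detokenizer: replace each code with its codebook patch."""
    return [CODEBOOK[t] for t in tokens]

tokens = generate_image_tokens("a teddy bear on a speed boat")
patches = detokenize(tokens)
```

The key property the sketch preserves is that image generation becomes next-token prediction, the same setup used for large language models.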

DALL-E 2, by contrast, uses a diffusion-based model, which starts from pure noise and then progressively organizes pixels into an image that matches the description.
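The diffusion idea can also be sketched in a toy form. This is not a real diffusion model: here the denoising step is hardcoded to nudge pixels toward a stand-in target, whereas real models learn the denoising step from data and condition it on the text prompt.

```python
# Toy sketch of diffusion-style generation (illustration only):
# start from random noise and repeatedly denoise toward the image
# implied by the description.
import random

def toy_denoise_step(pixels, target, strength=0.2):
    """One denoising step: move every pixel a fraction toward the target."""
    return [p + strength * (t - p) for p, t in zip(pixels, target)]

def toy_diffusion(target, steps=50, seed=0):
    rng = random.Random(seed)
    pixels = [rng.uniform(0.0, 1.0) for _ in target]  # begin from pure noise
    for _ in range(steps):
        pixels = toy_denoise_step(pixels, target)
    return pixels

target = [0.0, 0.5, 1.0]  # stand-in for "the image described by the prompt"
result = toy_diffusion(target)
```

After enough steps the noise converges to the target, which mirrors how diffusion models gradually turn noise into a coherent image.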

Parti delivers high-fidelity, photorealistic image generation, with model sizes scaling from 350M to 20B parameters.

It can also generate scenes that have never been seen, using long and complex prompts. These prompts can involve many participants and objects, with fine-grained details and interactions, and can specify a particular image format and style.

A teddy bear wearing a motorcycle helmet and cape is driving a speed boat near the Golden Gate Bridge

However, like all other text-to-image generators, Parti suffers from a variety of issues: inaccurate object counts, blended features, wrong relational placement or size, improper handling of negation, and so on.

Still, these models are improving quickly, so we may soon see extremely accurate photos and drawings generated perfectly from a description. The same could then happen with video: a script could directly become a movie. It may take different tools, but it will be a completely new approach to creativity.


Text-to-image models are inspiring resources for creativity, but they also carry risks relating to bias, misinformation, and safety. There are ongoing debates about ethical A.I. practices and the steps needed to advance this technology safely. That's why, as a first step, Google applies easy-to-spot watermarks, so that anyone can always tell when an image was created with this tool.

Art will inevitably change in the near future. What will happen to artists? Will they become skilled prompt writers? Or will they mix this new technology with their own art to create something new? It's hard to tell, but these algorithms will surely be useful for those who can't draw or paint. A writer, for example, will be able to illustrate their book with appealing, accurate images without hiring an illustrator. Will that be a good thing?