The new OpenAI algorithm is amazingly scary

In January 2021, OpenAI introduced DALL·E, an amazing algorithm able to generate drawings and pictures from simple text prompts. In 2022, their newest algorithm, DALL·E 2, can generate more realistic and accurate images with 4x greater resolution.

The DALL-E 2 algorithm from OpenAI can turn a simple text into a full-fledged work of art. There’s no limit, only the user’s imagination.

What’s impressive is that the A.I. system recognizes where to position an element in an image and what a photorealistic image is, including the ability to add and remove components while taking into account shadows, reflections, and textures but also making variations of the same image.

In this first example, DALL-E created these images from the following descriptions, combining concepts, attributes, and styles: “An astronaut lounging in a tropical resort in space in a vaporwave style” and “An astronaut playing basketball with cats in space as a children’s book illustration”.

In this second example, DALL-E positioned the flamingo in three different places taking into account shadows, reflections and textures for each position.

In the third example, DALL-E took the first original image as an inspiration to make a variation of it in the second one.

The name of this algorithm is a combination of the name of the Spanish artist Salvador Dalí and the Pixar character WALL-E. Anyway, DALL·E 2 is arousing mixed feelings: some are amazed and others are scared about its potential. The tool’s ability to accurately convert text prompts into graphics is very remarkable.

How is it possible? According to OpenAI, a process called “diffusion” starts with a pattern of random dots and gradually alters that pattern towards an image when it detects specific characteristics of that image. Through this process, DALL·E 2 has “learned the relationship between images and the text used to describe them”.

It’s also interesting to see how users can try to deceive the model’s recognition capabilities by identifying one object (such as a Granny Smith apple) with a name that denotes something else (like an iPod). Despite the high relative predicted probability of this caption, the algorithm nevertheless generates photos of apples with a high probability, even when using a mislabeled image, and never produces pictures of iPods.

DALL-E apple labeled ipod

At the moment, the tool unavailable to the public but you can join a waitlist to request to be included to a group of selected users that can test the algorithm.

However, OpenAI is worried about a possible bad use of this tool, therefore they made it unable to generate real faces, or create NSFW (not safe for work) images like violent, hate, or adult images.

Shiba Inu dog wearing a beret and black turtleneck
Image from the description: “Shiba Inu dog wearing a beret and black turtleneck”. Photo by OpenAI.

These A.I. tools can be seen as an opportunity to create new forms of art, mixing existing techniques with technology but for some these algorithms represent a threat to artists because the risk is to be completely replaced by an A.I. In addition, another fear lies in the potential of deception of photos that could represent something fake but that you can’t recognise as such.

DALL-E 3

dall-e 3

The most recent version of the algorithm, DALLE 3, produces images that are both more visually appealing and more clear in detail when compared to its predecessor. Text, hands, and faces can all be rendered accurately by DALLE 3 in even the most complex scenes. Additionally, it can accommodate both landscape and portrait aspect ratios and excels at reacting to lengthy, comprehensive directions. To provide richer textual descriptions for the images that served as the training data for our models, they trained a cutting-edge image captioner. After training DALLE-3 on these enhanced captions, a model that pays substantially more attention to user-supplied captions was created.

Here are the 10 most important differences between DALL-E 2 and DALL-E 3:

  1. Resolution: DALL-E 3 produces images with a resolution of 1024×1024 pixels, which is twice as high as the 512×512 pixel resolution of DALL-E 2’s images;
  2. Diffusion: Whereas DALL-E 2 uses a discrete variational autoencoder (VAE) for image compression and decompression into discrete latent codes, DALL-E 3 uses a diffusion model to produce images from noise;
  3. ChatGPT Integration: The OpenAI conversational AI system ChatGPT is integrated with DALL-E 3. ChatGPT will create specially crafted prompts for DALL-E 3 to help bring those ideas to life;
  4. Prompt Adherence: Unlike its predecessor, DALL-E 3, it is excellent at adhering to challenging prompts;
  5. Text Generation: Creating text for labels, signs, logos, or captions within images has been improved in DALL-E 3;
  6. Diffusion Model: DALL-E 3’s diffusion model increases flexibility and expressive potential and makes it capable of handling complicated landscapes and textures;
  7. Human Details: Compared to DALL-E 2, DALL-E 3 produces more realistic human details;
  8. Engaging Images: Compared to DALL-E 2, DALL-E 3 produces images that are more interesting;
  9. Safety Mitigations: Compared to its predecessor, DALL-E 3 offers better safety mitigations;
  10. Creative Control: Compared to its predecessor, DALL-E 3, users have more creative control over the images produced.

Source creativityblog.org