When machine learning meets surreal art meets Reddit, you get DALL-E mini
DALL-E mini is the AI that brings to life all the goofy “what if” questions you’ve ever asked yourself: what if Voldemort was a member of Green Day? What if there was a McDonald’s in Mordor? What if scientists sent a Roomba to the bottom of the Mariana Trench?
You no longer have to wonder what a Roomba cleaning the bottom of the Mariana Trench would look like. DALL-E mini can show you.
DALL-E mini is an online text-to-image generator that has exploded in popularity on social media in recent weeks.
The program takes a text phrase – like “sunset over the mountain”, “Eiffel Tower on the moon”, “Obama making a sandcastle” or anything you could imagine – and creates an image of it.
The results can be strangely beautiful, like “synthwave buddha,” or “a chicken nugget smoking a cigarette in the rain”. Others, like “Teletubbies in a nursing home”, are truly terrifying.
DALL-E mini gained internet notoriety after social media users began using the program to smash recognizable pop culture icons into bizarre, photorealistic memes.
Boris Dayma, a computer engineer based in Texas, originally created DALL-E mini as part of a coding contest. Dayma’s program takes its name from the AI on which it is based: inspired by the incredibly powerful DALL-E from artificial intelligence company OpenAI, DALL-E mini is essentially a web-based application that applies similar technology in a more easily accessible way. (Dayma has since renamed DALL-E mini to Craiyon at OpenAI’s request.)
While OpenAI restricts most access to its models, Dayma’s model can be used by anyone on the internet, and it was developed in conjunction with the AI research communities on Twitter and GitHub.
“I would have great feedback and suggestions from the AI community,” Dayma told NPR over the phone. “And it got better and better” at generating images, until it hit what Dayma called “a viral threshold.”
While the images produced by DALL-E mini may still appear distorted or unclear, Dayma says the images are now good enough, and the audience large enough, that the conditions came together for the project to go viral.
Learning from the past and a complicated future
While DALL-E mini is unique in its widespread accessibility, this isn’t the first time AI-generated art has made the news.
In 2018, art auction house Christie’s sold an AI-generated portrait for over $400,000.
Ziv Epstein, a researcher at the MIT Media Lab’s Human Dynamics Group, says the advancement of AI image generators is complicating notions of ownership in the art industry.
In the case of machine learning models like DALL-E mini, there are many stakeholders to consider when determining who should be credited for creating an artwork.
“These tools are these diffuse socio-technical systems,” Epstein told NPR. “[AI art generation is a] complicated arrangement of human actors and computational processes interacting in this crazy way.”
First, there are the coders who created the model.
For DALL-E mini, this is mainly Dayma, but also members of the open source AI community who collaborated on the project. Then there are the owners of the images the AI was trained on – Dayma used an existing image library to train the model, essentially teaching the program how to translate text into images.
Finally, there’s the user who came up with the text prompt – like “CCTV footage of Darth Vader flying a unicycle” – for DALL-E mini to use. It is therefore difficult to say who exactly “owns” an image of, say, Gumby performing an NPR Tiny Desk concert.
Some developers also worry about the ethical implications of AI media generators.
Deepfakes, which apply machine learning models to render often-convincing fake images of politicians or celebrities, are a major concern for software engineer James Betker.
Betker is the creator of Tortoise, a text-to-speech program that implements some of the latest machine learning techniques to generate speech from a reference voice.
Betker started Tortoise as a side project, but said he is reluctant to keep developing it because of its potential for misuse.
“That’s what worries me the most – people trying to get politicians to say things they didn’t really say, or even to make affidavits that you take to court… [that are] completely wrong,” Betker told NPR.
But the accessibility of open-source AI projects like those by Dayma and Betker has also produced positive effects. Tortoise has given developers who can’t afford to hire voice actors a way to create realistic voiceovers for their projects. Likewise, Dayma said small businesses used DALL-E mini to generate graphics when they could not afford to hire a designer.
The growing accessibility of AI tools could also help people learn about potential threats from AI-generated media. For Dayma and Betker, the accessibility of their projects clearly shows people the rapidly evolving capabilities of AI and its ability to spread misinformation.
MIT’s Epstein said the same thing: “If people are able to interact with AI and be a creator themselves, in some way, that inoculates them, maybe, against misinformation.”
Copyright 2022 NPR. To learn more, visit https://www.npr.org.