On April 6, 2022, OpenAI released the image “An Astronaut Riding a Horse” on Instagram (see Figure 1) and threw open the debate about whether artificial intelligence (AI) is about to make graphic design a dying art. According to OpenAI, “DALL-E 2 can make realistic edits to existing images from a natural language caption. It can add and remove elements while taking shadows, reflections, and textures into account.” DALL-E 2 can produce images from nothing more than the text and image prompts it receives. As OpenAI puts it, DALL-E 2 understands the association between images and the text we use to describe them by utilizing “diffusion,” a method that begins with a pattern of random dots and gradually alters that pattern toward a recognizable image as it identifies specific aspects of that image. Graphic designers should be concerned. Text-to-image AI opens up a world of imagination to anyone with access to DALL-E 2 or any of the many other text-to-image generators popping up on the internet, like MidJourney, which I recently used to generate ideas for the cover of my upcoming novel.
Source: OpenAI
Transcendent Beauty as a Service
Text-to-image AI uses natural language processing (NLP) and machine learning algorithms to glean the knowledge that humans have about their visual world and build AI systems with similar capabilities. In his Fanatical Futurist article “This astronaut riding a horse shows AI is getting better at creating synthetic content,” Matthew Griffin explains that DALL-E 2 uses an OpenAI language model that “can pair written descriptions with images, to translate the text prompt into an intermediate form that captures the key characteristics that an image should have to match that prompt.” DALL-E 2 then runs a diffusion model to generate an image that satisfies the language model. Diffusion models are trained on images that have been completely distorted with random pixels, and they learn to convert those images back into their original form. In DALL-E 2, Griffin explains, “there are no existing images,” so the diffusion model takes the random pixels and transforms them into a brand-new image, built from scratch but matching the original text prompt.
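To make the noise-and-restore idea concrete, here is a minimal sketch in Python. It is a toy, not OpenAI's implementation: it uses the noise-mixing formula common to denoising diffusion models, and the names `add_noise`, `remove_noise`, and `signal_kept`, along with the five-pixel "image," are assumptions made up for illustration. A real system trains a neural network to predict the noise; this sketch cheats by handing the true noise back in place of that prediction.

```python
import math
import random


def add_noise(pixels, signal_kept, rng):
    """Forward step: blend an image with Gaussian noise.

    signal_kept is in [0, 1]: 1.0 leaves the image intact,
    0.0 replaces it with pure noise.
    Returns the noisy pixels and the noise that was mixed in.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [
        math.sqrt(signal_kept) * p + math.sqrt(1.0 - signal_kept) * n
        for p, n in zip(pixels, noise)
    ]
    return noisy, noise


def remove_noise(noisy, predicted_noise, signal_kept):
    """Reverse step: recover the image from a prediction of the noise.

    Requires signal_kept > 0, otherwise nothing of the image is left.
    """
    return [
        (x - math.sqrt(1.0 - signal_kept) * n) / math.sqrt(signal_kept)
        for x, n in zip(noisy, predicted_noise)
    ]


rng = random.Random(0)
image = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical 5-pixel grayscale "image"

# Keep only 10% of the signal: the result is mostly random dots.
noisy, noise = add_noise(image, signal_kept=0.1, rng=rng)

# Assumption: the true noise stands in for a trained network's prediction.
restored = remove_noise(noisy, noise, signal_kept=0.1)
print([round(p, 2) for p in restored])
```

With a perfect noise prediction the original pixels come back, which is why training focuses on predicting the noise well. When generating a new image there is no original to recover, so the model starts from pure noise and lets the text prompt steer each denoising step.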
With text-to-image generators, a neural network learns how to predict the appearance of an object given its features. “One way you can think about this neural network is transcendent beauty as a service,” says Ilya Sutskever, cofounder and chief scientist at OpenAI. “Every now and then it generates something that just makes me gasp,” he says. OpenAI currently limits access to DALL-E 2, which it calls a research project it wants to deploy responsibly, but DALL-E mini is available on Hugging Face.
MidJourney to the Center of Creativity
Another text-to-image AI system, MidJourney, also creates images from user prompts, but this one runs through Discord, so you need an account there to access MidJourney’s system. It’s well worth the trouble for what you get. MidJourney calls itself “An independent research lab. Exploring new mediums of thought. Expanding the imaginative powers of the human species.” Venturing onto the platform is like going down a rabbit hole of creativity that can suck away hours of your life. Five minutes on the MidJourney Discord thread reveals an endless array of photorealistic landscapes, robot dreams, cool-looking steampunk contraptions, fantasy dragons flying through fiery skies, colorful, almost photorealistic flowers, and rain-drenched, neon-lit, Blade Runner-esque street scenes.
As a novelist who is soon to have his first novel traditionally published, I’ve been worried that the publisher won’t be able to capture the essence of my story, which is set in a place few people visit – Macau – and in an industry few people know anything about – casino junket operators. So I ran some ideas through MidJourney. I used the prompt: “Macau cityscape in a dark moody landscape with the red neon lights of a casino and a man with a gun and a beautiful woman in a red dress, matte painting, daytime, atmospheric, highly realistic, Blade Runner dark mood, style baroque.” The system provided four options (see Figure 2). MidJourney lets you delve deeper into any one of the options or request variations on it.
Figure 2: MidJourney images
I had MidJourney touch up version four (bottom right), and it came up with Figure 3.
MidJourney completes its process in about a minute. A feature also lets you add images to the prompt, and the system will take inspiration from any image you point to. Creating my potential book covers took about five minutes in all, half of which was the rendering time the AI system needs to build the images. None of these images will be my final book cover, but the designs capture the mood, style, and imagery I want in the final version, so they should help me avoid going down too many creative roads that are destined to lead nowhere.
In one sense, MidJourney acts as a graphic designer’s mood board or a costume designer’s lookbook, offering up clichéd ideas in some cases but, in many others, far-flung and obscure ideas that a designer might never have come up with or even been able to verbalize. It’s one thing to say, “I want a Macau cityscape in a dark moody landscape with the red neon lights in an atmospheric, highly realistic, Blade Runner baroque style,” and something entirely different to point to Figure 3 and say, “These are the elements I am looking for.”
“An astronaut riding a horse,” “A Renaissance painting of a cat working at McDonald’s,” “Two skydivers play a game of Jenga,” “Mosh pit in a daycare,” “A cinematic photo of an astronaut nun” – all of these are text prompts that have produced highly accurate pictures that are stunning in one sense, amusing in another. As one Twitter user put it, “I’m grateful for dall-e because now people don’t have to start their own infant fight clubs to get images like this.” This is in jest, of course, but it does bring up a good point about these text-to-image systems: there are no boundaries to what can be created beyond the typical standards of decency that websites impose. Space, time, age, fame, game, century, and location have no meaning when the act of creation is simply a few words away.
Edward Steichen, the man whom the New York Times credited with transforming photography into an art form, once said, “A portrait is not made in the camera but on either side of it.” This is a good way to look at what DALL-E 2 and MidJourney are doing. His life story is rather instructive too. In his obituary, the New York Times wrote that his first photographs were so bad that they were almost his last. He had purchased a Kodak and taken 50 pictures, mostly of subjects around his house. When they were developed, only one picture was deemed clear enough to print. Steichen’s father thought one picture out of 50 was a money-losing proposition, but his mother said the picture was so beautiful it was worth all the other failures.
Is text-to-image AI the end of graphic design as we know it?
Maybe, maybe not. It’s hard to tell at this point, but it’s inarguable that with systems like DALL-E 2, DALL-E mini, and MidJourney, the machines are gunning for the job. AI tends to improve as its datasets grow, and with DALL-E mini offered for free and MidJourney priced at a very reasonable $10/month for its low-tier offering, these systems will get plenty of use. The more people use these systems, the better they will get at understanding what their human overlords want.
At the very least, MidJourney is a wonderful tool for students to learn about unique concepts of art. One user requested a “steampunk tardigrade airship with hanging paddleboat, gouache, jean giraud, william stout, yves chaland, isabelle beaumenay-joannet, akira toriyama, geof darrow, chiarascuro, yoji shinkawa, ross trans, teal, lemon, bluetiful, cadmium, carmine, lavendar, lilac, tan, detailed, delicate, ivory, coral, comic art” and, in this case, the result was actually quite impressive. These AI tools are wondrous things, but we should always keep in mind the author Jane Lynch’s warning that “Artists are free to push boundaries to make art. But when pushing boundaries is their only aim, the result is usually bad art.”