Mastering DALL-E 2. A Complete Guide to Writing Precise Prompts
DALL-E 2 is a generative AI that creates imagery from whatever prompt you throw at it. You can send DALL-E anything, from an emoji to a detailed description, and it will create an image that looks reasonably good. There are plenty of “happy little accidents” in which the AI creates imagery that makes little sense and has little to do with the prompt. That’s the way of new and experimental tech. When you are looking for an artistic touch, these mishaps are refreshing. When you are trying to deliver on a client’s brief, AIs will have you grinding your teeth into a fine paste.
The challenge, though, is getting precisely what you’re looking for, not just a generally intriguing image. While AIs understand natural language, their interpretation of prompts is opaque enough that results are often hit-and-miss.
As any commercial creator knows, the briefs from clients can be very picky about the fine details of their material. If you’ve ever been sent back to the drawing board over slight variations in palette or how curved the lines feel, you know what I mean.
AIs have a hard time generating images that are true to a brief. You can end up spending up to 100 dollars trying to get the precise image that you want. That’s where prompt engineering comes in. So far, I’ve personally never had every single aspect of a prompt reflected in the generated image. The issue is not the prompting itself but the stage of the tech. There is some amount of miscommunication with the AI that feels unavoidable. The community is just now figuring out how the AI wants to be briefed.
That said, you don’t need to start from scratch. Some of the stuff has already been figured out.
The Issue of Miscommunication That You Need to Understand Before Moving Forward
AIs are made to understand natural language, which is precisely the innovative aspect of this technology. ChatGPT, the chit-chatty product of OpenAI, is fluent enough that people talk about it passing the Turing test with flying colors. Its visual counterpart understands language well enough, too. Here’s where the misfire happens.
DALL-E has not explicitly been “taught” anything, like what a banana looks like, who Leonardo da Vinci is or what defines pop art.
The AI has studied 650 million images with and without captions and was left to draw its own conclusions. Programmers gave the AI a lot of resources, left it to its own devices and came back to find the AI had “learned” to recreate what it had seen. Precisely how it did that is not a complete mystery, but it’s not entirely clear either. Some of the data that the AI was trained on was labeled, but a lot of it was unlabeled, which means the AI had to connect the dots.
This means the software might have developed some biases.
What Biases?
Imagine I label most photographs of cats as “majestic.” The AI doesn’t necessarily know which part of the photograph is majestic, but is as smart as any toddler its age and will find its own patterns. The next time you try to make your client’s logo look majestic, it will have whiskers and knock stuff off your desk just to piss you off.
This is the issue of AIs developing biases. They learn by association. We are not precisely sure what kinds of associations they make. The great majority of them are obviously very on-point. Others are slightly misplaced. Sometimes you will discover codewords that the AI has misassociated and that trigger it to produce higher-quality images.
“For instance, defining a camera or lens (Sigma 75mm) doesn’t just create that specific look, it more broadly alludes to the kind of photo where the lens/camera appears in the description, which tend to be professional and hence higher-quality,” Dallery Gallery explains in an 82-page presentation on prompting.
Get creative and explore what other biases you might discover. Think of it in terms of commonalities and agreed-upon standards.
The entire AI community is throwing test prompts at DALL-E to explore what it can do. There is no cut-and-dried recipe. The best way to deal with these amazing tools is to explore on your own.
The Essence of a Prompt
Engineering a prompt requires you to steer the AI in the right direction with cue words and a bit of structure. Most users will tell the AI which elements they want depicted, which is great, but there are infinite ways to skin a cat. Or draw it.
Here are a few guiding elements you should have in your prompt:
- What is the destination of your image? Is it a product image? An ad banner? A portrait with photographic qualities? A logo? DALL-E knows how to differentiate between the different destinations of a graphic element.
- To capture the essence of the photograph, add cues that go beyond the practicalities of what the picture shows. You will be surprised at how well DALL-E makes sense of mood, emotion, vibe. Pictures can be blissful, lively, harmonious, zen, chilling, refreshing, ominous. Dig deep for emotional cues and test out multiple versions.
- Tell the AI how to layer your image, foreground to background. What are the main elements at the center? What does the viewer notice first? Do you want a white plain background or an entire setup?
- Give the AI technical requirements, from format to lens specifications to composition and art style.
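One way to keep those four elements from getting lost is to fill them in as separate fields and only then join them into a prompt. The snippet below is a minimal sketch of that idea, not an official DALL-E format: the `PromptBrief` class, its field names, and the comma-separated ordering are all my own assumptions, and the mug example is made up.

```python
from dataclasses import dataclass, field


@dataclass
class PromptBrief:
    """Hypothetical helper: one field per guiding element from the list above."""
    destination: str                                     # e.g. "product photo", "logo", "ad banner"
    subject: str                                         # what the image actually shows
    mood: list[str] = field(default_factory=list)        # emotional cues
    layers: list[str] = field(default_factory=list)      # foreground-to-background notes
    technical: list[str] = field(default_factory=list)   # lens, format, composition, style

    def render(self) -> str:
        # Assumed ordering: destination and subject first, then mood,
        # layering, and technical cues, joined by commas.
        parts = [f"{self.destination} of {self.subject}"]
        parts.extend(self.mood)
        parts.extend(self.layers)
        parts.extend(self.technical)
        return ", ".join(p.strip() for p in parts if p.strip())


brief = PromptBrief(
    destination="product photo",
    subject="a ceramic coffee mug",
    mood=["refreshing", "harmonious"],
    layers=["mug centered in the foreground", "plain white background"],
    technical=["soft natural light", "sharp focus", "eye-level angle"],
)
prompt = brief.render()
print(prompt)
```

Keeping the fields separate makes it easy to swap a single element, say the mood, and rerun the prompt without rewriting the rest.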
Unlocking Technical Codewords. Language That Helps
The labels are where the AI learned to recognize and associate imagery with moods, cues about image quality, style and the actions described.
DALL-E was trained on artistic material - both images and text captions. It speaks the lingo of the artistic field. This means it will understand artistic styles, art movements, techniques, all the technical language in general.
Borrow technical language from the art world to reproduce a specific aesthetic.
- Explore photography technical language: exposure, depth of field, types of composition. Basically learn what the settings of a camera are and go down a rabbit hole from there. Deep dive into painting: brushwork, color palettes, genres and styles, textures. Understand lighting, perspective, color.
- Research and understand art movements. Learn to associate historical periods with their styles.
- Have a portfolio of favorite artists, classical and modern. Look for artists that inspire you and match the style you want.
- Get inspired by other visual mediums: film, gaming art, sculpture, costume design and fashion, architecture, home design, city planning, theater, arts and crafts.
Creating Commercial Imagery
Codewords for Logos
Logos tend to be simple and clean, with clearly defined areas and a silhouette that you can easily identify. Most AIs, DALL-E included, will have trouble delivering simple, clean imagery unless they are specifically prompted to simplify. The whole purpose of a logo is that it can be discerned from a distance and stays recognizable at all sizes. For that reason, logos don’t mix well with shading and complexity. The important thing is to get your AI to dial down that complexity.
Test the following cues: symmetrical, flat, isometric, sleek, simple, 2D, minimalistic, uncluttered, negative space, minimal color palette, clean lines, iconic, monochromatic, scalable, contrast, understated, memorable, stands out, clear hierarchy, geometric, restrained color palette, sharp shadows.
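If you pile all of these cues into one prompt, you can’t tell which one did the work. A rough way to test them is one cue at a time against a baseline prompt. The sketch below only builds the prompt strings; it does not call DALL-E. The function name `one_cue_at_a_time` and the bakery brief are hypothetical, chosen just for illustration.

```python
def one_cue_at_a_time(base: str, cues: list[str]) -> list[tuple[str, str]]:
    """Return (label, prompt) pairs: the bare baseline, then the baseline plus one cue each."""
    seen = set()
    runs = [("(baseline)", base)]
    for cue in cues:
        key = cue.strip().lower()
        if not key or key in seen:
            continue  # skip blanks and repeated cues so each is tested only once
        seen.add(key)
        runs.append((key, f"{base}, {key}"))
    return runs


# Hypothetical brief and a short cue list (with a repeat, as collected lists often have).
logo_cues = ["flat", "minimalistic", "sleek", "negative space", "sleek"]
runs = one_cue_at_a_time("logo for a bakery called Rise", logo_cues)

for label, prompt in runs:
    print(f"{label:>15} -> {prompt}")
```

Generate the baseline first, then compare each single-cue result against it. Cues that visibly change the output are the ones worth combining later.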
Codewords for Product Pictures
There is a certain protocol that we have with our product pictures, and it works. We want the product to be depicted clearly, in a flat light, on a neutral (if not white) background. These are very functional pictures and we don’t want them to be misleading. You will want the AI to accomplish four things: make a high-quality and detailed photograph, have a good light setup (preferably natural-looking noon light), put the product front and center, and use a background that doesn’t distract from the product.
Test the following cues: natural light, soft light, arrange the product in an aesthetically pleasing way, considering the rule of thirds and visual balance, clean background, {{color}} background, sharp focus, accurate colors, depth of field, angle {{type: eye level, close-up, high angle, low angle}}, macro, lifestyle, attention to scale.
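The placeholders in that cue list, such as the background color and the angle type, are slots you fill in per product. A small script can expand one template into every combination so that you can test them side by side. This is a sketch under my own assumptions: the `expand` function, the simplified `{{name}}` placeholder syntax (options are passed in a dictionary, not written inline), and the wallet example are illustrative, not a DALL-E feature.

```python
import re
from itertools import product

# Hypothetical template using simple {{name}} slots.
TEMPLATE = (
    "product photo of {{subject}}, natural light, "
    "clean {{color}} background, sharp focus, {{angle}} angle"
)


def expand(template: str, options: dict[str, list[str]]) -> list[str]:
    """Fill every {{name}} slot with each combination of the supplied options."""
    # Slot names in order of first appearance, without repeats.
    names = list(dict.fromkeys(re.findall(r"\{\{(\w+)\}\}", template)))
    missing = [n for n in names if n not in options]
    if missing:
        raise KeyError(f"no options given for: {', '.join(missing)}")

    prompts = []
    for combo in product(*(options[n] for n in names)):
        text = template
        for name, value in zip(names, combo):
            text = text.replace("{{" + name + "}}", value)
        prompts.append(text)
    return prompts


variants = expand(
    TEMPLATE,
    {
        "subject": ["a leather wallet"],
        "color": ["white", "light gray"],
        "angle": ["eye level", "high", "low"],
    },
)
for v in variants:
    print(v)
```

The number of prompts is the product of the option counts (here 1 × 2 × 3 = 6), so keep the option lists short if you are paying per generation.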
Codewords for Photorealism
A big part of an advertiser’s work is finding or creating beautiful background imagery that sets up the right mood. Businesses want to give their products context. Generally, smaller businesses will buy general-purpose imagery from Shutterstock or snap a few pictures at the office. AIs give every mom-and-pop store a great chance to generate unique and stunning photographs that differentiate their business. To get good commercial background photography, prompt DALL-E to deliver two things: a high level of quality and the sentiment of the photo.
Test the following cues to set a mood: blissful, optimistic, hopeful, cheery, lively, vibrant, upbeat, playful, festive, charming, serene, tranquil, peaceful, calm, harmonious, zen, refreshing, radiant, captivating, mesmerizing, animated, dynamic, effervescent, bubbly, charismatic.
Test the following cues to get quality photographs: photorealistic, high quality, detailed, lifelike, authentic, natural, accurate, sharp, clear, crisp, precise, vivid, true-to-life, professional photograph, polished, expertly-captured, enhanced, dynamic shot, realistic textures, shadows and reflections, micro-details, convincing depth-of-field, realistic environmental context, convincing sense of scale.
In Closing
AIs of all types do a great job of understanding context. When I first generated an AI image, I took a break just to hold on to the feeling of awe. I can only be grateful that my first AI picture did not turn out to be an irrelevant abstract blob, as sometimes happens.
When AIs fail, which they do, the results are absolutely hilarious. I think it’s a great reminder that beyond the mind-blowing sophistication, AIs are still machines with buttons that you need to push right. We are in the process of learning those buttons, throwing things at the AI to see what sticks. With the ever-evolving algorithm, recipes are a rough approximation of what might taste good. Enjoy it!