Exploring prompt-based generative AI
Like thousands of others, over the last few weeks I've been exploring the possibilities of prompt-based generative AI systems as a creative medium. The main tools I've been working with are MidJourney and Stable Diffusion (via DiffusionBee). I've also played with RunwayML, Leonardo and DALL·E 2.
Each system has its own benefits and quirks, but the way they all work is essentially the same: you enter a text description of the image you want generated, wait a short time (from 10 seconds to a minute or so) and the image appears. Tools such as MidJourney have additional features that let you include images in the prompt or blend different images together.
The results produced using these tools are, at first, impressive. In particular, the technical quality and fidelity of MidJourney make it almost impossible to generate a poor quality image (though the image may not be what you imagined when you wrote the prompt). It feels quite empowering to be able to summon up an image of anything you can express in words in just an instant.
MidJourney also has a "describe" command that lets an AI describe any image as a prompt. You get four alternatives that you can then use to synthesise new images based on the prompts generated. Using an image from the Megaforms series as an example, here is the original image and some examples of what was synthesised from the resultant prompts:
Here are the AI's descriptions, expressed as prompts:
- a black and white image of a sunflower root, in the style of hyperrealistic marine life, darkroom photography, large-scale sculpture, radiant clusters, spiky mounds, bentwood, human anatomy
- a close up of a flower with many sticks in it, in the style of darkroom printing, human anatomy, sculpted, spiky mounds, tabletop photography, ambient occlusion, biomorphic
- a close view of a rotting sunflower, in the style of ambient occlusion, biomorphic abstraction, darkroom printing, kinetic installations, human anatomy, infinity nets, shot on 70mm
- san lorenzo cactus t, black & white, 3d, in the style of reduction of canine anatomy, darkroom photography, scattered composition, patrick dougherty, detailed anatomy, rusty debris, willem claesz. heda
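Each description above follows the same template: a literal caption, then "in the style of" followed by a comma-separated list of style keywords. As a rough sketch of that pattern (the helper function is hypothetical and my own, not part of any MidJourney API):

```python
def build_prompt(caption: str, styles: list[str]) -> str:
    """Assemble a /describe-style prompt: a caption followed by
    'in the style of' and a comma-separated list of style keywords.

    A hypothetical helper illustrating the template the outputs above
    appear to follow; it is not part of MidJourney itself.
    """
    return f"{caption}, in the style of {', '.join(styles)}"

# Recombining fragments of the generated descriptions:
prompt = build_prompt(
    "a close up of a flower with many sticks in it",
    ["darkroom printing", "human anatomy", "spiky mounds"],
)
# prompt == "a close up of a flower with many sticks in it, in the style of darkroom printing, human anatomy, spiky mounds"
```

Once you notice the template, it becomes easy to mix and match caption and style fragments from different descriptions when writing your own prompts.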
An interesting feature is what I would call "style theft": the AI suggests visual styles by naming popular artists, photographers, illustrators and designers.
It also seems incapable of any biological accuracy, unable to determine the species or even the common name of the subject.
The images below are examples generated from these AI-interpreted prompts:
While each of the AI's interpretations is interesting and captures something of the original photograph's appearance, they seem more like fantasy versions that lack the intent and qualities of the original image.
MidJourney uses Discord as its interface and, unless you pay for the top tier, all your prompts and the images they produce are public. Once you've generated an image, you can search for "similar" images – quite a useful feature for learning the language necessary to achieve certain visual styles or compositional effects. After some more experimentation, I managed to change the original prompts into something quite visually different.
As someone who has spent much of my working life involved in the computer synthesis of images, I find the complexity and "otherworldly" nature of these images very impressive. The fact that I could "find" them so easily made me wonder how hard it is to generate any subject in an image. However, my feelings of uniqueness quickly fell away when I looked at the "similar images" produced by other users of the system. There were pages and pages of similar – and often better – images than I had managed to produce from my simple prompting. Getting what you want through a simple language description is often difficult. The initial feeling of empowerment soon changed to disappointment with the opportunities these systems present as a new creative medium. If anyone can generate visually rich and complex imagery just by knowing a few keywords, where does that leave the techne, skill and embodied knowledge of making?
After more experimentation, I managed to generate a series of images for which "similar images" did not surface visually or conceptually similar results made by other users of the system. I called the series The World We Made, more a reflection on the nature of prompt-based imagery and its biases than any serious use of these systems as an artistic medium.
As it turned out, it is relatively easy to make something different from the vast majority of the banal, sexist and derivative imagery shown on MidJourney's showcase. However, any uniqueness is largely inconsequential – all images made by generative AI are derivative: derivative of billions of hours of past human work in art and culture, rendered flat into pixels by a neural network. In this sense, I find them morally and ethically questionable as creative tools.
We are living in a golden age, a new gilded age where words can conjure up images of almost anything, provided someone else has already imagined it and probably laboured to make it real. There is no need to spend your life learning a skill, training your hand and eye, understanding the nuances of lighting, cinematography, tone, value or representation in the hope that you may one day produce something truly unique and innovative. Sit back and let the machines do the work for you as they parasitically feed on prior human labour.
What is the meaning of an image and the role of image makers if we can synthesise any image via machine learning? AI systems know the cost of everything but the value of nothing. If it costs virtually nothing to create any image previously imagined, what value can an image possibly have anymore?