Beware stupid use of AI
Zero-shot prompting on MidJourney
Strong words, but I find that I do not have patience for journalists who try to sensationalize something that they either do not understand or are too lazy to even try to understand.
This piece on The Verge titled “I’m still trying to generate an AI Asian man and white woman” is a prime example. The issue here is the supposed “biases baked into training sets” of image generators.
The reporter is concerned that “Image generators, from DALL-E to Midjourney, consistently have trouble creating accurate pictures based on simple prompts involving Asian people.” Reading this article leaves me with the impression that she’s upset that the image generators did not generate the kind of image she had in mind when she put in that simple prompt. The magic somehow failed.
The prompt used to generate this pairing of couples was a very simple one: “can you make me a photo of an Asian man and a white woman.” Based on this simplest of inputs, the reporter somehow wanted the image generator to magically reach inside her mind and create a picture based on her idea of what such a couple should look like, but of course it didn't work that way.
This is like getting into a car, putting your hands on the steering wheel and slamming on the gas without doing any steering at all, and then acting surprised when the car crashes into a tree. Cars are not designed to work this way, and saying that car designers are biased because they did not account for people who get into a car and start driving without bothering to steer is just stupid.
Image generators, and by extension LLMs, are designed to be prompted and given enough context to generate content that makes sense within that context. Provide the minimum of context and you can expect, and should expect, an output that makes the minimum of sense. This isn’t bias; it is incorrect use of the technology.
MidJourney: a photo of an Asian man and a white woman
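To make the distinction concrete, here is a minimal sketch contrasting a zero-shot prompt with one that carries actual context. Midjourney has no official public API, so the sketch uses OpenAI’s image endpoint as a stand-in, and the context-rich prompt is my own illustrative wording, not the reporter’s; only the difference between the two prompts matters here.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate(prompt: str) -> str:
    """Request a single image and return its URL."""
    resp = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        n=1,
    )
    return resp.data[0].url

# Zero-shot: the model has to guess everything (age, setting, mood, styling).
print(generate("a photo of an Asian man and a white woman"))

# Context-rich: same subjects, but the prompter resolves the ambiguity.
print(generate(
    "a candid photo of an Asian man and a white woman in their thirties, "
    "coworkers dressed business casual, waiting in a boarding line at an airport gate"
))
```

The model cannot be “wrong” about details you never specified; the second prompt simply leaves far less to chance.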
It is incorrect use, because even in the future the very same minimal prompt fed into an image generator will still probably produce an image that offends somebody. As a practicing and serious photographer who is constantly exposed to other people’s subjective views of my work, I doubt you could find any two people who share the same expectation of such a pairing based on so simple a prompt. Without additional context, the image generator simply cannot satisfy all users in all cases. It is therefore not a question of bias; it is a question of not using the technology in the right way.

Of course, bias in AI is a serious problem. An indicative issue with image generators would be if the prompt “young white man sitting in a modern office, surrounded by coworkers” produced a decent output, but switching it to “young black man sitting in a modern office, surrounded by coworkers” produced a scene out of a crime movie. That is not the case here.
MidJourney: young white man sitting in a modern office, surrounded by coworkers
MidJourney: young black man sitting in a modern office, surrounded by coworkers
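For anyone who wanted to probe for that kind of bias properly, rather than eyeball two one-off generations, the natural method is to hold the prompt fixed and vary only the demographic term. A rough sketch of such a paired comparison, again using OpenAI’s image endpoint as a stand-in for Midjourney:

```python
from openai import OpenAI

client = OpenAI()

# Hold everything constant except the single term under test.
TEMPLATE = "young {subject} man sitting in a modern office, surrounded by coworkers"

for subject in ("white", "black"):
    prompt = TEMPLATE.format(subject=subject)
    resp = client.images.generate(model="dall-e-3", prompt=prompt, size="1024x1024", n=1)
    print(f"{prompt!r} -> {resp.data[0].url}")
```

In practice one would generate many samples per prompt and compare them systematically, but even this controlled swap is a fairer test than a single underspecified prompt.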
Context is king, as the saying goes. And with LLMs and image generators, that adage still holds true. Zero-shot prompting of the kind used in this article to manufacture faux outrage muddies the water around the very serious issue of real biases baked into AI training sets, which does deserve serious attention. Even providing a bit more context results in images that are not far off from what I would have imagined, as can be seen in this example.
MidJourney: a photo of an Asian man and a white woman who are coworkers and are standing in a line to board a plane at JFK --style raw