So during my Nano Banana Pro experiments I wrote a very fun prompt that tests the ability of these image generation models to follow strict rules, while still requiring domain knowledge and/or use of the search tool:
Create a 8x8 contiguous grid of the Pokémon whose National Pokédex numbers correspond to the first 64 prime numbers. Include a black border between the subimages.
You MUST obey ALL the FOLLOWING rules for these subimages:
- Add a label anchored to the top left corner of the subimage with the Pokémon's National Pokédex number.
- NEVER include a `#` in the label
- This text is left-justified, white color, and Menlo font typeface
- The label fill color is black
- If the Pokémon's National Pokédex number is 1 digit, display the Pokémon in a 8-bit style
- If the Pokémon's National Pokédex number is 2 digits, display the Pokémon in a charcoal drawing style
- If the Pokémon's National Pokédex number is 3 digits, display the Pokémon in a Ukiyo-e style
The NBP result is here; it got the numbers, corresponding Pokémon, and styles correct, with the main points of contention being that the style application is lazy and that the images may be plagiarized: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxaerni...
Prompts like this feel like the wrong abstraction. The "obvious" thing to do with something like this would be to generate some code that generates the image, and then run that code.
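That decomposition is easy to express as code. A minimal sketch of the "generate the spec, not the pixels" idea (hypothetical helper names; it only emits the grid layout, each entry would still be handed to an image model as its own small task):

```python
def first_primes(n):
    """Return the first n primes by trial division (fine for n = 64)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

# Style rules from the prompt, keyed by digit count of the dex number.
STYLE_BY_DIGITS = {1: "8-bit", 2: "charcoal drawing", 3: "Ukiyo-e"}

def grid_spec(n=64, cols=8):
    """(row, col, dex_number, style) for each of the 64 subimages."""
    return [
        (i // cols, i % cols, p, STYLE_BY_DIGITS[len(str(p))])
        for i, p in enumerate(first_primes(n))
    ]
```

Each tuple is then an individually verifiable generation task (one Pokémon, one style, one label) before compositing the grid with the black borders.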
I think prompts like this are where agentic workflows come into play. If you asked it to generate the first 64 prime numbers, AI tools could do that. If you asked it to draw a charcoal image of Pokémon 13, it could do that. If you asked it to add a white Menlo "13" on a black background to the top left corner of that image, it could do that. If you asked it to do that 63 more times, it could do those things, and if you asked it to assemble those into a grid, it could.
It can't get that in a one-shot. Perhaps, though, it could figure out when it needs to break a problem into individual tasks to delegate to itself and assemble them at the end.
I mean, asking these transformers to do maths has always been the wrong task. It's like we're now complaining that "it doesn't have X tools built in with traditional code".
Though I suppose we're testing their model + agent harness here as well. It really _should_ have all of those tools/reasoning available to accomplish a task like the above without issue.
It's only been the wrong task because they've been deficient at it and expensive to use, so we had workarounds. They are getting better at these tasks and cheaper (sometimes). It's fair to evaluate even if there are more economical and accurate alternatives available.
How is it that a model can produce what must be near-1:1 images ripped straight out of Pokémon Fire Red (the first ones), for profit, and not be infringing copyright?
I know that's the game, but it seems CRAZY to me that they can do this.
So the sprites aren't what I'd consider plagiarism, since to my surprise they are sufficiently different, even though it's a similar design to the FR/LG sprites.
The other images, however, crib from official artworks a bit too close for comfort.
In my original analysis I hypothesized this is due to token scarcity, which limits the detail with which the image can be created: I believe NBP used 1.5k tokens for that image while gpt-2-image used 7k, but this is hard to test.
Training a model on a corpus which includes copyrighted images but which is not focussed primarily or exclusively on applications which violate copyright might be fair use in the US (so far, it seems that way.)
But that doesn't mean that producing outputs using the model so trained which are based on copyright-protected ones in ways which would violate copyright if produced by any other means doesn't still violate copyright. DMCA safe harbor might apply to the system owner (IIRC, the exact boundaries are fuzzy with UGC generated on the site by the provider’s systems rather than generated elsewhere and posted), so Google may not be liable for the infringement (though if it is actively searching for references online at generation and not relying on what is trained into the model, that would seem to weaken the case for that), but it's still an infringement.
The funny thing is, the main complaint I’ve heard so far is that it repeatedly refused to operate on original content… because it might violate copyright.
It can’t. It violates copyright. The big players are the only ones with the money to pursue these things, but they’re interested in replacing artists with AI trained on their work, so they settle and set up some sort of agreement. The little guys have no precedential case law to help them along, and nowhere close to the resources to push it that far, so they get steamrolled. I know artists famous enough for people, even commercial entities, to regularly and blatantly rip them off by name with “in the style of” prompts, but there’s no realistic path to pursue it. Fame doesn’t pay legal bills.
This is an amazing test and it's kinda' funny how terrible gpt-2-image is. I'd take "plagiarized" images (e.g. Google search & copy-paste) any day over how awful the OpenAI result is. Doesn't even seem like they have a sanity checker/post-processing "did I follow the instructions correctly?" step, because the digit-style constraint violation should be easily caught. It's also expensive as shit to just get an image that's essentially unusable.
Essentially yes (bottom got distorted), but Gemini uses Nano Banana Pro or Nano Banana 2 so it's not a surprising result. The image I linked uses the raw API.
You are comparing ChatGPT to a raw image model. These are two completely different things. ChatGPT takes your input, modifies the prompt and then passes it to the image model and then will maybe read the image and provide output. The image model like through the API just takes the prompt verbatim and generates an image.
Artistic oddities aside (why are the 8-bit sprites 16-bit, why do the charcoal drawings have colour, and why does the art of specifically the Gen 1 Pokémon look so off?), 271 is Lombre, not Lotad.
Because both Nano Banana Pro and ChatGPT Images 2.0 have touted strong reasoning capabilities, and this particular prompt has more objective, easy-to-validate criteria as opposed to the subjective nature of images.
I have more subjective prompts to test reasoning but they're your-mileage-may-vary (however, gpt-2-image has surprisingly been doing much better on more objective criteria in my test cases)
"Quirky and obscure" has the functional benefit of ensuring the source question is not in the training data/outside the median user prompt, and therefore making the model less likely to cheat.
We have enough people complaining about Simon Willison's pelican test.
Not focusing on pokemon for a start. Maybe use something more people can recognize and evaluate. I have zero knowledge of pokemon, I see it as a niche thing for ultra-nerdy people, and not something everyone is familiar with. Nothing about that test can be evaluated by anyone but a pokemon expert. Sorry, but pokemon isn't as mainstream as some people might think it is.
Banana Pro gets the logic and punts on the art; gpt-2-image gets the art and punts on the logic. Feels like instruction-following and creativity sit on opposite ends of the same slider.
Even a few months ago, ChatGPT/Sora's image generation performed better than Gemini/Nano Banana for certain weird prompts:
Try things like: "A white capybara with black spots, on a tricycle, with 7 tentacles instead of legs, each tentacle is a different color of the rainbow" (paraphrased, not the literal exact prompt I used)
Gemini just globbed a whole mass of tentacles without any regards to the count
Prob a very unscientific way to test an image model. This is likely because they have the reasoning turned down and let its instant output take over.
A great technical achievement, for sure, but this is kind of the moment where it enters uncanny valley to me. The promo reel on the website makes it feel like humans doing incredible things (background music intentionally evokes that emotion), but it's a slideshow of computer generated images attempting to replicate the amazing things that humans do. It's just crazy to look at those images and have to consciously remind myself - nobody made this, this photographed place and these people do not exist, no human participated in this photo, no human traced the lines of this comic, no human designer laid out the text in this image. This is a really clever amalgamation machine of human-based inputs. Uncanny valley.
Uncanny Valley means the content directly evokes that creepy feeling, because the 'unrealness' is somehow subjectively apparent.
But you say yourself you "have to consciously remind [yourself]" it isn't real. The Uncanny Valley is not applicable when true subjective realness is imparted.
No this is what life looks like on the other side of the uncanny valley. The images don't look creepy because they look artificial or wrong. They're a reminder of a creepy new reality where our eyes can no longer tell us what's real.
It's not really a new problem though, as image forgery was a thing ages ago; if there weren't laws or measures taken against photoshopped images or instagram filters or faceapp things then, why would there be laws or measures taken against AI generated images now?
Granted, a nontrivial difference is that the barrier to entry is lower; photo editing is something that requires active effort and learning.
Absolutism isn’t very useful. Scale and magnitude always need to be considered. “I can buy plates with uranium in them, why can’t I enrich it at scale for my own personal use???” “Humans have been hunting for thousands of years. Why can’t i deploy automated sentry machine guns at my property line??”
Can't wait for all the scams ripping off older folks, and people who aren't all there but aren't so far gone that anyone has power of attorney over them.
Yep. Just like motion pictures. Why, it's just a facsimile! People were meant to see performances by real people. These motion pictures fool your eye and surely will unravel the very fabric of civilized society! No longer shall the thespian be well employed! And the minds of the children will lay in ruins from such filth!
I get your point, but it's not even really that. It's that an AI generated photo evokes the same feelings in me that human-made photographs do and I have to catch that and turn that off consciously.
well it is. what if you found out that your wife is actually a robot you can't tell apart from a real human. your real wife. well, at least not without cutting her open. would you feel the same being with her?
I get this attitude--I really do. But I think the world moves on, and our children are not going to think this is even slightly strange. As always, it's us old-timers who have the hardest time with change.
I also think this is "art" in service of commerce. This is OpenAI advertising their goods using art/design/writing. That's no different than cereal companies using Elmer's glue instead of milk for their photoshoots. I don't have a high-bar for that kind of "art".
The good news is that the cutting edge of art will (for a while longer) still be a human domain. The more popular these models become, the more of their images we see in our lives, the more we will value things that look different.
Why are so many on HN unable to see through the B.S. and hype? Everything in the trailer feels unvaried and derivative. It does text and filters well (grit/grain, UI etc) but all the posters, comics, and infographics feel the same. They've all got matching structure and color palettes and once you've seen enough of them, you can easily spot them in a crowd. I'm not sure why people are falling for this, the AI voices in the trailer are ridiculous too.
The wolf photo for the article was the most eerie example for me... if I am reading about the natural world, I want to see a real photo of the natural world.
OPENAI_API_KEY="$(llm keys get openai)" \
uv run https://tools.simonwillison.net/python/openai_image.py \
-m gpt-image-2 \
"Do a where's Waldo style image but it's where is the raccoon holding a ham radio"
Here's what I got from that prompt. I do not think it included a raccoon holding a ham radio (though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure): https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...
OPENAI_API_KEY="$(llm keys get openai)" \
uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
-m gpt-image-2 \
"Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
--quality high --size 3840x2160
Fed into a clean Claude Code max-effort session with: "Inspect waldo2.png, and give me the pixel location of a raccoon holding a ham radio." It sliced the image into small sections and gave:
"Found the raccoon holding a ham radio in waldo2.png (3840×2160).
- Raccoon center: roughly (460, 1680)
- Ham radio (walkie-talkie) center: roughly (505, 1650) — antenna tip around (510, 1585)
- Bounding box (raccoon + radio): approx x: 370–540, y: 1550–1780
It's in the lower-left area of the image, just right of the red-and-white striped souvenir umbrella, wearing a green vest. "
We would need a larger sample size than just myself, but the raccoon was in the very first spot I looked. Found it literally immediately, as if that's where my eyes naturally gravitated to first. Hopefully that's just luck and not an indictment of the image-creating ability, as if there is some element missing from this "Where's Waldo" image, that would normally make Waldo hard to find.
Finding the raccoon was instant. Finding all the weird AI artifacts is more fun. It's quite fascinating really. As usual it looks impressive at a glance but completely falls apart on closer inspection. I also didn't find any jokes, unless maybe the bridge to nowhere or finger posts pointing both ways counts?
p.s. aaaand that's a soft launch of my SaaS above; you can replace wojak.jpg with anything you want and it will paint that. It's basically appending to a prompt defined by elsrc's dashboard. Hopefully a more sane way to manage genai content. Be gentle to my server, hn!
Are you using the same prompt the above commenter used? I've been toying around with increasingly ridiculous prompts and it works surprisingly well. Is it the new ChatGPT image gen or Nano Banana?
Kinda made me sad assuming the author didn't license anything to OpenAI.
I recognize it could revert (99% of?) progress if all the labs moved to consent-based training sets exclusively, but I can't think of any other fair way.
$.40 does not represent the appropriate value to me considering the desirability of the IP and its earning potential in print and elsewhere. If the world has to wait until it’s fair, what of value will be lost? (I suppose this is where the big wrinkle of foreign open weight models comes in.)
License what? The concept of a hidden object search? The only stylistic similarity here is the viewing angle. Where’s Waldo comics are flat, brightly colored line drawings that look nothing like this at all.
Well, I recognized the style from even the new physical books on sale today, but I don’t know art well enough to use a term like flat.
I am not an art expert but I’m perhaps a reasonable consumer and there is possibility of confusion if someone sells AI Where’s Waldo knockoff books at the dollar store, maybe until I take a closer look.
There have already been several attempts to procedurally generate Where’s Waldo? style images since the early Stable Diffusion days, including experiments that used a YOLO filter on each face and then processed them with ADetailer.
It's a difficult test for genai to pass. As I mentioned in a different thread, it requires a holistic understanding (in that there can only be one Waldo Highlander style), while also holding up to scrutiny when you examine any individual, ordinary figure.
I've actually been feeding them into Claude Opus 4.7 with its new high resolution image inputs, with mixed results - in one case there was no raccoon but it was SURE there was and told me it was definitely there but it couldn't find it.
Like... this has things that AI will seemingly always be terrible at?
At some point the level of detail is utter garbo and always will be. A thoughtful artist could make some mistakes, but someone who put that much time into a drawing wouldn't have:
- Nightmarish screaming faces on most people
- A sign that seemingly points in both directions, or in the wrong direction for a lake, and a first aid tent that doesn't exist
- A dog in bottom left and near lake which looks like some sort of fuzzy monstrosity...
It looks SO impressive before you try to take in any detail. The hand selected images for the preview have the same shit. The view of musculature has a sternocleidomastoid with no clavicle attachment. The periodic table seems good until you take a look at the metals...
We're reconfiguring all of our RAM & GPUs and wasting so much water and electricity for crappier where's Waldos??
No, it won't be. I did indeed get the same problems when trying to generate my own image for it.
However, as someone who's mucked about with local image generation as well, I'd say this is a problem with their implementation: it doesn't resolve fine detail because for the majority of requests it won't matter, and it drastically increases compute requirements.
With local image generation, bad features/incorrect fingers/disfigurement etc. have been solved for a long time.
I think their new process involves multiple steps including sketching/fleshing out the idea before adding detail. The step that would fix this would be outpainting or similar to tile based upscaling.
From what I understand of image generation models they also struggle with fine detail in general because they aren't really trained for that. However for each tiny chunk of a detailed image like that there's nothing to say they can't allocate a 500x500 chunk for it to work in as its "idea/reference space" and then transpose that into the main image being generated - i.e. generate image features separately rather than all together.
Really hard to look at these images given how not-human-like the humans are. A few are ok, but a lot are disfigured or missing parts, and it's hard to find a raccoon in here.
This happens all too frequently when you ask a GenAI model to create an image with a large crowd, especially a “Where’s Waldo?” style scene, where by definition you’re going to be examining individual faces very closely.
Yes, it’s not there yet. But nothing unsolvable. First thing that comes to mind would be generating smaller portion at the same resolution, then expand through tiling (although one might need to use another service & model for this), like we used to do with Stable Diffusion years ago.
Another option would be generating these large images, splitting them into grids, and using inpainting on each "tile" to improve the details. Basically the reverse of the first one.
Both significantly increase costs, but for the second one having what Images 2.0 can produce as an input could help significantly improve the overall coherence.
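As a rough illustration of that tiling idea (pure geometry, hypothetical parameters; not anything Images 2.0 actually exposes), computing overlapping tile boxes for per-tile inpainting passes looks something like:

```python
def tile_boxes(width, height, tile=512, overlap=64):
    """Split an image into overlapping (left, top, right, bottom) boxes.

    Each box can be inpainted separately; the overlap gives the model
    surrounding context so tile seams can be blended afterwards.
    """
    boxes = []
    step = tile - overlap
    for top in range(0, max(height - overlap, 1), step):
        for left in range(0, max(width - overlap, 1), step):
            boxes.append((left, top,
                          min(left + tile, width),
                          min(top + tile, height)))
    return boxes
```

For a 3840x2160 output this yields dozens of tiles, which is exactly where the cost multiplier comes from: each tile is its own model call.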
5.4 thinking says "Just right of center, immediately to the right of the HAM RADIO shack. Look on the dirt path there: the raccoon is the small gray figure partly hidden behind the woman in the red-and-yellow shirt, a little above the man in the green hat. Roughly 57% from the left, 48% from the top."
Oh god yes, I've been trying to make an LLM-assisted Magic: The Gathering card scanner... it's been a hell of a time trying to get it to just OCR card names well.
OpenAI’s gpt-image-1.5 and Google’s NB2 have been pretty much neck and neck on my comparison site which focuses heavily on prompt adherence, with both hovering around a 70% success rate on the prompts for generative and editing capabilities. With the caveat being that Gemini has always had the edge in terms of visual fidelity.
That being said, gpt-image-1.5 was a big leap in visual quality for OpenAI and eliminated most of the classic issues of its predecessor, including things like the “piss filter.”
I’ll update this comment once I’ve finished running gpt-image-2 through both the generative and editing comparison charts on GenAI Showdown.
Since the advent of NB, I’ve had to ratchet up the difficulty of the prompts especially in the text-to-image section. The best models now score around 70%, successfully completing 11 out of 15 prompts.
For reference, here’s a comparison of ByteDance, Google, and OpenAI on editing performance:
gpt-image-2 has already managed to overcome one of the so‑called “model killers” on the test suite: the nine-pointed star.
Results are in for the generative (text to image) capabilities: Gpt-image-2 scored 12 out of 15 on the text-to-image benchmark, edging out the previous best models by a single point. It still fails on the following prompts:
- A photo of a brightly colored coral snake but with the bands of color red, blue, green, purple, and yellow repeated in that exact order.
- A twenty-sided die (D20) with the first twenty prime numbers (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71) on the faces.
- A flat earth-like planet which resembles a flat disc is overpopulated with people. The people are densely packed together such that they are spilling over the edges of the planet. Cheap "coastal" real estate property available.
Such a fun site, thank you! I was surprised that Seedream4 passed the mermaid test since it's hard to tell whether they are in the water or submerged, and the mermaid has something funny going on with her left hand.
Yeah seedream's attempt does have a bit of an uncanny valley effect: the mermaid/dolphin are only partially submerged, but there’s water above them with sunlight reflecting on the surface, and the mermaid’s hand looks disconnected from the angle of her arm.
That’s why I gave it a bronze. To me, it falls into that “barely passing” category, similar to Gemini 2.5 Flash Image on that test. Seedream also took a major hit to its weighted score because of how many attempts it took to get something even remotely passable out of it.
Very useful website. Would you have insight into what models are best at editing existing images?
I often have to make very specific edits while keeping the rest of the image intact and haven't yet found a good model. These are typically abstract images for experiments.
I asked gpt-image-2 to recolor specific scales of your Seedream 4 snake and change the shape of others. It did very poorly.
OpenAI actually has really good adherence, but occasionally tends to introduce its own almost equivalent of "tone mapping", making hyper-localized edits frustrating.
I don’t know how much work it is for you, but one thing a lot of people do, myself included, is take the original image, make a change to it using something like NB, then paste that as the topmost layer in something like Krita/Pixelmator. After that, we’ll mask and feather in only the parts we actually want to change. It doesn’t always work if it changes the overall color balance or filters out certain hues, it can be a real pain but it does the job in some cases.
The Flux models (like Kontext) are actually surprisingly good at making very minimal changes to the rest of the image, but unfortunately their understanding of complex prompts is much weaker than the closed, proprietary models.
I will say that I’ve found Gemini 3.0 (NB Pro) does a relatively decent job of avoiding unnecessary changes - sometimes exceeding the more recent NB2, and it scored quite well on comparative image-editing benchmarks.
That's lovely. My own personal benchmark has been to ask the various models to generate a functional pair of novelty New Year's Eve glasses on a person, that don't just plonk the year onto the top of regular frames.
Thanks. That's a good one~ Lens type stuff that involves reflections/refraction is a neat challenge for generative models. I did some editing tests that involved replacing an apartment window with a mirror back when Nano-Banana Pro was released and was rather stunned by the results.
1 - Gpt-image-2 seems to pass the Flat Earth test? (if not, I'm sure the paid thinking 2k version passes it).
2 - Since NB2 was earlier, many gold medals are assigned to it even though GI2 now passes those tests too; for example, the Octopus test took NB2 14 attempts but GI2 just 2. (BTW, shouldn't the number of attempts affect the score?)
So if you zoom in (click the zoom button on the actual gpt-image-2 of the flat Earth), you’ll see that a lot of the people are anatomical impossibilities, which is one of the disallowed criteria on the list. The faces also look like melted candles.
This is one of those areas where even state-of-the-art models still struggle. You’re asking for a high level of detail at a per-person level, which means you end up with lots and lots of very small objects that all need to be rendered with convincing detail.
I should probably explain the scoring rubric better - it's in the (i) info icon. If you click the pass/fail button towards the top, it switches from a simple pass/fail view to a weighted score. That weighted score is based on three things: level of adherence to the prompt, visual fidelity, and the number of attempts.
I've tried to keep my criteria as objective as possible, but there's just a certain level of unavoidable subjectivity to it.
For example, with the octopus image: Even though the minimum criteria might be five tentacles covered, having all eight is much closer to the ideal of “an octopus,” so it usually gets bumped up to a higher rating (bronze, silver, gold).
Honestly, I think I agree that the gpt-image-2 probably should be upgraded to a gold medal. Thanks for pointing that out!
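For the curious, a toy version of such a weighted rubric (the weights here are invented for illustration, not GenAI Showdown's actual formula) might look like:

```python
def weighted_score(adherence, fidelity, attempts, max_attempts=16):
    """Combine adherence and fidelity (each in [0, 1]) with a linear
    penalty for the number of attempts needed to get a passing image."""
    attempt_factor = max(0.0, 1.0 - (attempts - 1) / max_attempts)
    return round(100 * (0.60 * adherence
                        + 0.25 * fidelity
                        + 0.15 * attempt_factor), 1)
```

Under any scheme of this shape, a model that nails the prompt on attempt 1 (like GPT-Image-2 on some tests) outscores one that needs 14 tries for a comparable image.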
Why does Gemini 3.1 get a pass when gpt-image-2 gets a fail for the same reasons on the flat earth one? Gemini has all sorts of random body parts and limbs, etc.
That's a mistake~ None of the models successfully passed the Flat Earth composition test. I've updated the passing criteria to be more explicit as well. Thanks for catching that!
I’ll have to give it another try. Its predecessor, Hunyuan Image 2.0, scored pretty poorly when I tested it last year: 2 out of 15, so it'll be interesting to see how much it has improved.
Here's ZiT, Gpt-Image-2, and Hunyuan Image 2 for reference:
Note: It won't show up in some of the newer image comparisons (Angelic Forge, Flat Earth, etc) because it's been deprecated for a while but in the tests where it was used (Yarrctic Circle, Not the Bees, etc.) it's pretty rough.
You're killing me Smalls. This one is a 404. I'm really curious what it actually showed.
That ring toss is definitely leagues better than its predecessor. I’m not going to fault it too much for the star though, that one is an absolute slate wiper. The only locally hostable model that ever managed it for me was the original Flux, and I’m still not entirely convinced it wasn’t a fluke. Despite getting twice as many attempts, Flux 2, a much larger model, couldn’t even pull it off.
Yeah, I suspect you'd see some solid passing scores if you ran it as many times as some of the others.
For the mermaid, https://i.imgur.com/R6MbMPX.png sometimes seems to work but not consistently. It is probably triggering a porn filter of some kind. I need to find another free image host, as imgur has definitely jumped the shark.
The image shows a mermaid of evident Asian extraction lying on a beach, face down. There is a dolphin lying on top of her, positioned at a 90-degree angle. It doesn't show any interaction at all, so a definite fail.
I still use Imgur from time to time just because it’s convenient, but I’ve been meaning to build an Imgur-style extension for my site for a while, something that would let me drag and drop media for quick sharing but it being Astro-based (static site generation) makes it tricky.
So the prompts are tuned and adjusted on a per-model basis. If you look at the number of attempts, each receives a specific prompt variation depending on the model. This honestly isn't as much of an issue these days because SOTA models' natural language parsing (particularly in the multimodal ones) has eliminated a lot of the byzantine syntax requirements of the SD/SDXL days.
The template prompt seen in each comparison gets adjusted through a guided LLM which has fine-tuned system prompts to rewrite prompts. The goal is to foster greater diversity while preserving intent, so the image model has a better chance of getting the image right.
Getting to your suggestion of posting all the raw prompts: that's actually a great idea. Too bad I didn't think about it until you suggested it. And if you multiply it out, there are 15 distinct test cases against 22 models at this point, each with an average of about 8 attempts, so we're talking about thousands of prompts, many of which are scattered across my hard drive. I might try to do this as a future follow-up.
The goal isn’t the prompt itself. The test is whether a prompt can be expressed in such a way that we still arrive at the author's intent, and of course to do so in a way that isn't unnatural.
The prompts despite their variation are still expressed in natural language.
The idea is that if you can rephrase the prompt and still get the desired outcome, then the model demonstrates a kind of understanding; however more variation attempts also get correspondingly penalized: this is treated more as a failure of steering, not of raw capability.
An example might help - take the Alexander the Great on a Hippity-Hop test case.
The starter prompt is this: "A historical oil painting of Alexander the Great riding a hippity-hop toy into battle."
If a model fails this a couple of times (multiple seeds), we might use a synonym for a hippity-hop, it was also known as a space hopper.
Still failing? We might try to describe the basic physical appearance of a hippity-hop.
Thus, something like GPT-Image-2 scored much higher on the compliance component of the test, requiring only a single attempt, compared with Z-Image Turbo, which required 14 attempts.
It's usually based on what they've been trained on. There aren't very many models that'll do higher resolutions outside of Seedream, but its adherence is worse.
Processing power, not training. The larger the scene in 2D, the more you need to compute. The resolution itself is not flexible. Imagine painting a white canvas: it is still a pixel-per-pixel algorithm that costs GPU power, even though it would be the easiest thing to do without it.
You can create larger images by creating separate parts you recombine. But they may not perfectly match their borders.
It is a Landau thing, not a training thing. The idea of an LLM is to work on the unknown.
It depends on the model. Diffusion models, which are among the more popular approaches, are typically trained at a specific image resolution.
For example, SDXL was trained on 1MP images, which is why if you try to generate images much larger than 1024×1024 without using techniques like high-res fixes or image-to-image on specific regions, you quickly end up with Cthulhu nightmare fuel.
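This is also why resolution "bucketing" exists: you pick dimensions that hit the trained pixel budget while approximating the aspect ratio you want. A sketch (the 1MP budget matches SDXL's training; the divisible-by-64 constraint is a common latent-space requirement, and the helper name is made up):

```python
import math

def resolution_bucket(aspect_ratio, megapixels=1.0, multiple=64):
    """Width/height near the trained pixel budget, both divisible by
    `multiple`, approximating the requested aspect ratio (w / h)."""
    target = megapixels * 1024 * 1024
    height = math.sqrt(target / aspect_ratio)
    width = height * aspect_ratio

    def snap(v):
        return max(multiple, round(v / multiple) * multiple)

    return snap(width), snap(height)
```

Generating inside a bucket like this, then upscaling or doing regional img2img passes, is how people got large images out of SDXL without the nightmare fuel.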
Every cent you spend on this, remember: The people who made this possible are not even getting a millionth of a cent for every billion USD made with it (they are getting nothing). Same with code; that code you spent years poring over, fixing, etc. is now how these companies make so much money and get so much investment. It's like open source, except you get shafted.
This is, in my opinion, attempting to say the right thing with entirely the wrong perspective:
The people you say are getting "shafted" always got shafted. Their works are the inspiration for all artists and people who lay their eyes on them - maybe they got paid when they made the work, maybe they managed to sell it, but probably not. And still, other artists (and machines) will use, remember, and be inspired by it, sometimes to the point of verbatim copying (which is extremely common among human artists as well, with verbatim copying and replication being an actual sought-after skill).
(Those about to shout "LICENSING", that's a very new invention and we're terrible at it. What are you going to do, cut out the part of your brain that formed new connections while touching GPL code?)
The person (singular) that is actually getting "shafted" at each use is the artist you didn't hire to do the job of making your new work, because it is their skill that got replaced. A skill built from a lifetime of studying other art and practicing themselves, replaced with a skill built from a machine studying other art and, by virtue of some closed loops, likely also "practicing" itself.
Still, shafting at large, but the obsession with training data is misplaced in that it entirely ignores how society and art worked beforehand.
At the same time, for most of the things you're likely using the tool for, there would probably never have been an artist in the first place. For example, if you're just making your PowerPoint prettier, or if your commission is ridiculous, as it often is, and yet only willing to offer a single-digit dollar sum per work which no artist should take (RIP the poor souls that take such work anyway).
You're ignoring the biggest problem here: the concentration and extraction of wealth. The sum total of human artists were previously getting those billions of dollars, and now it's OpenAI (and Anthropic, and Google, and Microsoft, and maybe a handful of other players) getting it. Now, maybe it actually used to be hundreds of millions of dollars, and they've grown it to billions, and maybe they deserve some of that - but they're getting all of it. This is the huge issue with this technology, not so much the fact that it exists but that it is being sold by a tiny, tiny amount of people.
I wonder what happened to actual artists though - they seem to be doing fine. I'm sure many people as consumers dabbled in AI art, and reached the conclusion after hours that what they made never looked quite right.
Then they found they could commission an actual artist to draw what they wanted for tens or hundreds of dollars, which is a very good price for getting exactly what you want without having to waste your time playing the token slot machine.
Yes, look at how many historical inventors (the inventor of the blue LED, the guys struggling to convince Gates and Ballmer to make the Xbox, etc.) get/got nothing for their efforts compared to the huge sums raked in by the very people actively trying to prevent them from building the idea that made all the money.
AI is hugely beneficial to our species. Our tribalism and "yeah well they earned it!" response to capitalism's rampant production of billionaires is the real problem, not technology.
Why are footballers and movie celebrities paid $50m a year? There's the answer.
That's an entirely different problem to artists getting "shafted". Not saying it's not a worthwhile discussion, but it is a separate concern.
Having everyone pay phone/internet, office, streaming, music, etc., subscriptions to large tech companies that are effectively monopolies all do that. It's a bigger, pre-existing issue.
No. The coffee shop that isn't paying an artist $300 is gonna get negative reviews and lose customers and money from their bad business decision[1]. I know I would think twice about ordering at a café which uses AI in their marketing, and I am not the only one.
The coffee shop that cannot afford the $300 for an artist and homebrews their design in Microsoft Word is still doing just as before, and the coffee shop that can afford it and still pays an artist is still doing fine. The coffee shop that is paying OpenAI $5 for stolen art gets to look as cheap as they are.
So to save the idea of $300 (logo design with "local" talent is never $300; it is only that cheap if you offshore it), they tried to ruin a business that presumably employs multiple LOCAL people full time (way more than $300 worth) with 1-star reviews to "punish" it.
This is an internet mob at its worst. Not an example of anything to emulate, in my opinion.
People hate AI, and this is one of very few ways people have to punish AI. It is bound to happen.
And in either case, this example destroys the framing that coffee shop owners are the ones who benefit from the systemic art theft employed by AI companies.
I am not sure what you mean. The AI backlash is real, and it has real and obvious effects in the real world, with written articles to prove it.
If you are attempting here to shift the focus away from coffee shops (may I remind you, you were the one who brought that as an example) and into video games or software companies, I simply reject that attempt.
That there exists a software company which uses AI in their product and is not failing has no bearing on the framing that a coffee shop too cheap to pay an artist for their logo does indeed look cheap to its customers, who will be inclined to give that café a negative review or otherwise avoid it.
I'm shifting the focus to the reality that exists outside of internet mobs.
99% of people don't recognize AI generated content, and don't particularly care enough to pixel scan every image they see.
You can cling to articles about AI-art backlash, but they are all hyper-narrow one-off events. The reality is the general population doesn't really see it or care.[1]
1) Is there a moat? Is there no moat? Are open models as good as the closed ones? I keep getting confused.
2) As one of these artists, I am entirely fine with my entire body of work being used for the purposes of model building. The tech is astonishing and fantastic, and I sincerely hope we will be better through it. As the parent suggested: The idea that people in general previously gave a fuck about compensating artists is hilarious. MS builds models with my work, random people bought, idk, another vacation in Thailand or a fourth pair of shoes with the money that they never spent on art. I know which one I would prefer.
But I do find it particularly juicy that people, who, on the whole, never thought too much about paying artists (which I am also fine with btw!), all of a sudden can't stop wringing their hands about the injustice of it all.
Correct. The way it's being built is exactly all that the US mentality warns about socialism/communism (that giving away your hard work "for the greater good" is a lie and is actually a power grab).
Turns out, if it's American oligarchs profiting from everyone's work, they love the idea!
Children can draw without ever having been to an art gallery. The IP laundromats need the entire stolen corpus of human labor. The latter is clearly an infringing derivative work.
It will be true no matter how many bribes those who have never created anything pay to Marsha Blackburn (who miraculously reversed her AI skepticism).
I wonder how many threats of being primaried have been issued by the uncreative technocrat thieves.
> The person (singular) that is actually getting "shafted" at each use is the artist you didn't hire to do the job of making your new work, because it is their skill that got replaced.
1% Yes, and 99% No.
Over 99% of uses would not have resulted in hiring someone to do the work had these models not existed as you yourself acknowledge.
Yes, but this is a bit of an oversimplification. The "99%" tends to be either: 1. Pointless throwaway content which we can just ignore as a new source of noise, 2. Something that could have ended up being a $5 commission[^1] to a kid somewhere out there but now never will be.
Those numbers are also a bit too aggressive; it's easy to miss what kind of gig work exists out there. PowerPoint-as-a-service is a thing on Fiverr, for example. A horrible, horrible thing, but a thing nonetheless.
^1: not at all what art costs, but someone trying to get started might do quick sketches at those prices
> The "99%" tends to be either: 1. Pointless throwaway content which we can just ignore as a new source of noise, 2. Something that could have ended up being a $5 commission[^1] to a kid somewhere out there but now never will be.
Or 3. Something I made and I actually use, but I would never have paid a kid $5 to do.
Yes, I know of Fiverr and similar sites. Even planned on using it once. Even know someone in another country who made side money from it. And yes, it does suck for them. But none of that changes the fact that well over 99% of uses are not depriving them of any money.
I have seen arguments that a lot of your nr. 3 is basically just addiction. You are making the AI slot machine generate stuff for you and you get to have the sense of accomplishment that comes with thinking you created something without putting in any of the work of actually creating something. To the rest of the world this is indistinguishable from your parent’s nr. 1.
Fair point. It's just that his number 1 was "Pointless throwaway content", and I was saying "Well, actually, it's not thrown away but actually used".
You may look at the output and say "Crap!", but the reality is the person using it found value in it.
(To be honest, I used to think "Crap!" to stock photos long before LLMs came on to the scene, so I have little sympathy with stock photo photographers going out of business - those photos exist primarily to attract readers and do not provide any value to the content - they're just like ads in that regard).
If "people who made this possible" were getting their fair share, "a millionth of a cent for every billion USD made with it" would be about it for the artists.
What makes the dataset valuable isn't that the image 0012992 in it is precious and irreplaceable. It's that the index goes to seven digits. Pre-training is very much a matter of scale - and scraping is merely the easiest way to get data at scale.
People who complain about "artists not getting paid" must have in their imagination some kind of counterfactual where artists are being paid thousands for their contributions. That's not how it works. A counterfactual world where artists were paid for AI training is one where an average artist is 5 cents richer, an average image generation AI performs 5% worse, and the bulk of extra data spending is captured by platforms selling stock photos and companies destructively digitizing physical media.
The ideal world would be one where, to train on art, you have to buy a license to that art. Sure, for most artists they would maybe put a low price tag, but that isn't the point.
The point isn't about money. It's that copies were made, without license and without permission, and without any legal right to do so, of art, and then used to train a system which generates similar art. The first step, the copy, is illegal without a license, and even for most public images online, licenses and copyright notices (which must be preserved) are attached.
"Without any legal right to do so" is for the courts to decide. And so far, the courts are very much not deciding the way you want them to.
"Fair use" counters "without license and without permission" hard. The argument that training AI on scraped data is "fair use" and the resulting model outputs are "transformative works" has held up in courts. Anthropic got dinged for downloading pirated books, but not for throwing the ones they didn't pirate down the training pipeline.
Some countries, like Japan, have amended their copyright laws to make AI training categorically legal. Others are in "fair use clauses" grey areas with courts deciding case by case based on precedent and interpretation. So trying to latch onto copyright law is, as it always was, the wrong move. Copyright never favored the small guy. Stupid to expect that it suddenly will.
> The argument that training AI on scraped data is "fair use" and the resulting model outputs are "transformative works" has held up in courts.
Nope. Nope. Nope. That has explicitly not been ruled on yet. Transformative means that you don't need a fair use defense. Anthropic has only gotten away with their outputs being called transformative so far because they put a dubiously effective filter in front to block the most egregious infringing outputs. No one has actually challenged this afaik.
Would your ideal world apply to humans as well? Like if I see some art in a museum and it inspires me to create some of my own, I would need to pay a licensing fee to the original artist?
And what about the artists that inspired them? There is no art in the world that sprang fully formed from one single person, without any influences.
Should we reshape our economy to ensure knowledge and artistic provenance is maintained perpetually?
This whole discussion is so weird to me. It’s like AI has freaked everyone out so much that the instinct is to run to the safety of Disney-esque complete control and perpetual monetization of every work.
Which is exactly the opposite of how art worked for the first several hundred thousand years. Really, we want to double down on the perverse incentives and tight control that IP owners have given us in the past 50 years?
>Like if I see some art in a museum and it inspires me to create some of my own, I would need to pay a licensing fee to the original artist?
Nope, humans are admitted for free :).
>And what about the artists that inspired them? There is no art in the world that sprang fully formed from one single person, without any influences.
As long as you are a human you get to be inspired all you want :)
You seem very invested in licking the boot of the trillion dollar corporations. Your fellow humans are concerned.
>Really, we want to double down on the perverse incentives and tight control that IP owners have given us in the past 50 years?
Isn't it interesting that the EXACT second that copyright law impedes billion dollar corporations it is thrown out the window, really makes you think huh?
I think you may be placing too much value on the output of these machines, which use tons of energy, generate pollution (both noise and chemical), and generate output that's worse than what a human can do. We would be better off if these LLMs didn't exist.
Average person in US reducing his/her meat intake by 1/4 would do much, much more for environment compared with completely scrapping entire AI infrastructure worldwide. For some reason people concerned with environmental impact of AI get really angry whenever I point this out.
> A counterfactual world where artists were paid for AI training is one where an average artist is 5 cents richer, an average image generation AI performs 5% worse, and the bulk of extra data spending is captured by platforms selling stock photos and companies destructively digitizing physical media.
No, a counterfactual world where artists were paid for AI training wouldn't see commercially viable AI at all. A world which plenty of people would be more than happy to live in, mind you.
AI relies on mass piracy worth Googols of dollars if you count like you would the million dollar iPod, but because AI surprised the copyright industry, it's now too late to enforce copyright like that.
Even in a counterfactual world where any data that's not in public domain can't be used in AI training at all, ever, AIs would exist. Training on public domain data is a bitch, but it's doable. It's just that it results in worse AIs for more effort. So no one does it other than to flex.
It would still be "commercially viable", mind. I'm not sure how much it would stall AI development in practice, but all the inputs of making AIs only get cheaper over time. So I struggle to imagine not having something like DALL-E 1 by 2030.
If we extend the counterfactual and allow for licensed media, we compress the timelines and raise the bar. The "best" image generation AIs of 2026 are now made by the likes of Adobe and locked behind some kind of $500-a-month-per-seat Creative Cloud Pro Future subscription. Because Adobe is rich enough to afford big bulk licensing deals, while the likes of academia and smaller startups have to subsist on old public domain data, permissively licensed scraps, and small, carefully selected batches of licensed data whose licensing deals might block them from sharing the resulting weights.
In the "counterfactual: licensed media" world, the local AI generation powerhouse of Stable Diffusion ecosystem probably doesn't exist at all. Big companies selling AI do. Their offerings cost a lot more and perform considerably worse than the actual AIs we have today. So you can't just go to a random website and get an image edited for a shitpost for free. But the high end commercial suites exist, they're used by the media and the marketing companies, and they are still way cheaper than hiring artists. The big copyright companies get their pound of flesh, but don't confuse that for the artists getting a win.
> but because AI surprised the copyright industry, it's now too late to enforce copyright like that.
I think I've got whiplash from the way a lot of the tech scene has gone from 'IP troll outfits are malicious actors who make everything worse for everyone else' to 'IP troll outfits are an ethical and effective solution to exploitation in the AI industry'.
I'm not a huge fan of much of the generative AI industry, but is IP maximalism really the answer here? Before 2022 most of us would have agreed that DRM is generally a scourge for example, and the 'copyright industry' are a big part of pushing for the end of general-purpose computing in favour of DRM-controlled appliances. Personally I'd rather go in the opposite direction, copyright lasts for exactly thirty years and after that a work enters the public domain without exception, and I'd weaken anti-circumvention laws too.
"Copyright" is, frankly, just an excuse people who hate AI latch onto.
Many of the people who rally against AI now used to rally against Napster being prosecuted by RIAA and the Big Mouse renewing copyright expiration dates once again.
It's not that they suddenly gained an appreciation for the copyright law. It's that they found something they hate more than the big record label megacorps - and copyright became a tool they think they can leverage against it. Very stupid, IMO.
Same thing with the water arguments, or pollution in general. It's not about those having any weight, it's about being against AI first and building arguments against it second.
> No, a counterfactual world where artists were paid for AI training wouldn't see commercially viable AI at all. A world which plenty of people would be more than happy to live in, mind you.
You reckon Disney and Shutterstock don't have enough images to make commercially viable AI?
Or for that matter, Facebook? Even just for photorealistic images from, you know, all the photos people upload.
> AI relies on mass piracy worth Googols of dollars if you count like you would the million dollar iPod, but because AI surprised the copyright industry, it's now too late to enforce copyright like that.
Not that I disagree that people use everything they can get their hands on for marginal improvements (they obviously do), but the copyright industry being "surprised" is the default state of affairs for infringement. And "piracy" is the wrong word, because the judges so far have ruled that training isn't itself a copyright offence, while also affirming that it is possible to commit a copyright offence by pirating training data.
If the dataset weren't valuable, big tech wouldn't depend on it to train their models.
I don't care about getting a millionth of a cent as an artist (which btw is a number *you* just pulled out of your imagination). I care about them paying a fair share instead of pocketing it, so the money stays in circulation instead of creating a new class of technofeudal lords.
If it was about this why do OpenAI and Anthropic lose their minds when people are training off their output or trying to scrape their systems.
I actually don't have an issue with training off the mass of everyones work if the models are open and free to build upon, it's locking them away and then throwing your toys out the pram when people try and do the same thing that bothers me.
Good question. I actually have a technical answer, believe it or not.
Pre-training is: training a model from scratch on cheap data that sets the foundation of a model's capabilities. It produces a base model.
Post-training is: training a base model further, using expensive specialized data, direct human input and elaborate high compute use methods to refine the model's behavior, and imbue it with the capabilities that pre-training alone has failed to teach it. It produces the model that's actually deployed.
When people perform distillation attacks, they take an existing base model and try to post-train it using the outputs of another proprietary model.
They're not aiming to imitate the cheap bulk pre-training data - they're aiming to imitate the expensive in-house post-training steps. Ones that the frontier labs have spent a lot of AI-specialized data, compute, labor and hours of R&D work on.
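The core mechanic of distillation can be sketched briefly. This is a toy illustration, not any lab's actual pipeline: the student is trained to match the teacher's output distribution (typically via a KL divergence loss), rather than the raw pre-training data. The logit values below are made up for illustration.

```python
import math

# Toy sketch of the distillation idea: a "student" is post-trained to match
# a "teacher" model's output distribution over next tokens, rather than
# learning from raw pre-training data.

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student's distribution q is from teacher p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = softmax([2.0, 1.0, 0.1])   # distribution the attacker samples from
student = softmax([1.0, 1.0, 1.0])   # untrained student: uniform guess

loss = kl_divergence(teacher, student)  # minimized during distillation
```

Minimizing this loss over many teacher samples is what transfers the expensive post-training behavior; hiding the full chain of thought shrinks what the attacker can sample, which is exactly the mitigation described above.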
This is probably not "fair use", because it directly tries to take and replicate a frontier lab's competitive edge, but that wasn't tested in courts. And a lot of the companies caught doing that for their own commercial models are in China. So the path to legal recourse is shaky at best. But what's on the table is restricting access to full chain of thought, and banning the suspected distillation attackers from the inference API. Which is a bit like trying to stop a sieve from leaking - but it may slow the competitors down at least.
> Pre-training is very much a matter of scale - and scraping is merely the easiest way to get data at scale.
Therein lies the problem. AI firms just bulldozed ahead and "just did it" with no consideration for the ethics or legality. (Nor for that matter, how they're going to get this data in the future now that they're pushing artists into unemployment and filling the internet with slop.)
There is no "imagined counterfactual", people just want AI firms to follow basic ethics and apply consent. Something tech in general is woefully inadequate at.
The counterfactual isn't offered by artists, but AI companies. "If we had to ask consent then we couldn't have made this". Okay, so? The world isn't worse off without OpenAI's image generator. Who cares, there's no economic value to these slop images, they're merely replacing stock assets & quickly thrown together MS paint placeholders.
Given how much of a shitshow this technology has always been (I refuse to mince words: this tech had its "big break" as "deepfakes", and Elon Musk has escalated that even further. It's always been sexual harassment.), the actual net value to society is almost certainly negative.
I don't understand why everyone is all up in arms about images/art being generated by AI, but when it comes to code... well, who cares? The people who made all the code training data are also getting nothing!
Potentially the one difference is that developers invented this and screwed themselves, whereas artists had nothing to do with AI.
I tried setting showdead=yes but two comments I remember seeing earlier today (as replies to one of my comments) are still gone. Does anyone know what else might have happened to them?
And I very much appreciate that feature, and hope it never changes.
However when I make comments here, I do it with the intention of reading what people have to say in response.
If I am making a comment with the intention to ignore the responses to it, then that's a good signal for myself that what I am writing is likely not an appropriate comment for HN, and then I delete it.
Personally I’d downvote these if not further substantiated. Flags are reserved for outright rage bait or personal insults for me.
At least I hope; can't say I always perfectly follow "up/downvote doesn't indicate (dis)agreement but rather contribution to the discussion".
You probably see that because many are low effort Reddit level comments. I’ve seen lots of long AI skeptic threads and people talking about the likely negatives of AI.
Maybe SWEs can just think it through better and see that there's nothing they can do, and that fighting this is useless. Artists still hope they can change this somehow, which is impossible; the people with money and datacenters want more money and don't really care about the people that are getting screwed over.
Just need to get AIs to purposely produce slop that has the trappings of quality to sabotage future AIs. Oh and write endless low quality PRs to all GitHub projects to build bad will.
If you look at my comment history (don't, you'll fall over from boredom), you'll see I'm also against that. I've researched and selected specific licenses for all the code I've open sourced, which is quite a lot, and the fact that massive companies can just ignore that with absolutely zero I can do about it really pisses me off! But at least I still get paid. The same can't be said about artists.
Customers usually can figure out when a product is shitty software, but shitty art, well that's a bit harder for people to judge.
> Potentially the one difference is that developers invented this and screwed themselves
Hopefully you mean developers invented this and screwed over other developers.
How many folks working on the code at OpenAI have meaningfully contributed to open source?
I agree that because it is the same "job title" people might feel less sympathy but it's not the same people.
Because code is fundamentally not a creative work the way art is. Code "just" has to be correct, even if achieving that correctness demanded coming up with ideas. And as a software developer you usually get paid a nice salary to write it, no matter if you're typing it yourself or generating it with an AI.
Art can't be generated. We can only generate artefacts mimicking art styles. So far we have no AI generated images that are considered actual Art, because Art's purpose is to express the artist's intent. And when there is no artist, there is no intent.
I have to stop now, but I guess you can see where I'm going with this.
Art can be generated perfectly fine. Only artists and connoisseurs care about details and art style. Most art is purchased by a business, and that business just wants a picture of a woman being happy next to a cake that looks similar enough to the other corporate pictures.
Code can be art the same way writing can be. There's a big difference between artistic code and business code, the same way there's a big difference between poetry and a comment chain on hacker news.
I don’t think that’s completely true, there is an art to code beyond it just being correct. There are a great many correct implementations of a program, but only some of them are really beautiful as well. Most people don’t see the code or appreciate this, but the difference between correct and art is clear to me when I see it.
Code can be beautiful or ugly but that doesn't make it art.
Art is not just about beauty, it is about expressing the mind (feelings, experience etc) of the author. AI will never do that (except if it learns to express its own experiences, which would be art, but not something competing with human art; it would be like if we had contact with alien art).
I respectfully disagree, I think code has always been more of an art than a science. It's an odd one, I'll grant you, as you need to do a lot of work to really appreciate it.
There's a lot of detail lost when you collapse towards "everyone". Some portion of that set is not the same as the other part of it, but both make sounds.
People get up in arms according to what seems acceptable to be complaining about. Voices get amplified similarly.
And sometimes the people complaining about AI in art are completely different people from those that might do so about code.
It is the same thing. There is no good excuse to claim a defense or objection for one group of people and not apply that fairly to others. All that "is it art" discussion is just noise.
But then again maybe artists feel more vulnerable than coders. People generally don't hire coders for their output but more for what their output will do. Coders create and maintain a money printer. A successful artist will create an output that immediately becomes scarce and in-demand; the output is the money and the artist then becomes the money printer. It's not hard to see that one is under more immediate threat than the other. So they scream louder.
Just a bunch of thoughts. In good faith, take from it what you will.
Because artists generally own their material (with exceptions at the very high end), whereas professional coders have generally abandoned ownership by ceding it as "work product" to their employers. Copy my drawings and you steal from me, a person. Copy a bit of code or a texture pack from a game and you steal from whatever private equity owns that game studio. Private equity doesn't have feelings to hurt.
> Because artists generally own their material (with exceptions at the very high end)
This has not been generally true IME. It follows the same pattern as code quite often.
When you pay an artist for their work, many times you also acquire copyright for it. For example if you hire someone to build you a company logo, or art for your website, etc the paying company owns it, not the artist.
In-house/employee artists are much more common than indies, and they also don't own their own output unless there's a very special deal in place.
That is a rarefied high end: commissioned artists hired for a particular task. The vast majority of artists do art without tasking and sell copies, a situation where no copyright moves. I have a Bateman print on my wall. I own the print, not the image. Bateman has not licensed anything to anyone, just sold a physical copy. So scraping his work into AI land is more damaging to him than to a coder who has already signed away most copy/use rights via a FOSS license.
> The vast majority of artists do art without tasking and sell copies, a situation where no copyright moves.
I suspect we may have different definitions of what constitutes an "artist". I include digital art in my definition, and your statement above definitely isn't true for that. Are you just talking about painters/sketchers/etc who are doing it by hand?
If so, limiting the definition to that doesn't make a lot of sense to me, especially given that AI isn't replacing those gigs. If somebody already creates analog art, I don't see AI as being that much of a change for them
Artist is everyone who creates copyrighted works. You, me, everyone with a camera. Everyone with a guitar who records. Digital art or paintbrushes, lines of code or lines in the next harry potter novel, it is legally all the same. The artist/creator gets total copyright, then either licenses those rights away or sells copies.
I even have rights over that previous paragraph. It ain't worth much, but if someone wanted to monetize it I would have rights I could assert.
Aren't the models trained on open source code though? In which case OpenAI et al should be following the licenses of the code on which they are trained.
Yup, but contributors to OSS have generally given away their rights by contributing to the project per the license. So stealing from open source isn't as bad as stealing material still totally owned by an individual, such as a drawing scraped from a personal website.
From a common FOSS contributor license...
>>permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions...
... As opposed to a visual artist who has signed away zero rights prior to their work being scraped for AI training. FOSS contributors can quibble about conditions, but they have agreed to bulk sharing, whereas visual artists have not.
No, contributors to FOSS generally do not give away their rights. They contribute to the project with the expectation that their contributions will be distributed under its license, yes, but individual contributors still hold copyright over their contributions. That's why relicensing an existing FOSS project is such a headache (widely held to require every major contributor to sign off on it), and why many major corporate-backed “FOSS” projects require contributors to sign a “contributor license agreement” (CLA) which typically reassigns copyright to the corporate project owner so they can rugpull the license whenever they want.
Stealing from FOSS is awful, because it completely violates the social contract under which that code was shared.
You're still mixing up contributor license agreements with the kind of arrangements where the copyright is actually transferred and assigned "away" from the creator to another copyright holder (generally a copyright assignment agreement). This is far less common than CLAs.
I don't know what you mean by a rugpull exactly, but of course in theory you can grant/obtain very extensive rights under a CLA as well, including eg the permission to relicense your contributions under whatever terms the licensee prefers. CLAs are a great way to centralize the IPR in an open source project for practical purposes like license enforcement, but in case the CLA terms allow it, the central governing entity could also obtain the right to switch the license even to a, say, commercial one. (Such terms would usually be a red flag for contributors though.) And in any case, that kind of CLA wouldn't still close off the code already released under the previous open-source license, and neither would it prevent you from licensing your own contributions under terms of your choice.
The whole point of software licenses is that the copyright holder DOESN'T change. The author retains the rights, and LICENSES them. So, in fact, no rights are given away, they are licensed.
It is still that person's creation.
Not sure about American law, but AFAIR in my country you can't remove the author from a creative work (like source code); you can transfer the financial benefits of that code, but that's it.
There are many artists that work in companies, just like developers; I would argue that the majority of them are (who designs postcards?).
It is amazing how often the argument parallels one such as, "But I deserve to be able to make a living as a chandler or a wheelwright even in 2026!" I would truly love if we could all make a living doing what we want to do (I'd be doing a lot of different things if that were the case), but that just isn't the reality of markets/technological progress.
Not in every instance, but in aggregate technological progress has clearly been beneficial.
Just look at living conditions, infant mortality, life expectancy or education.
You could be anywhere on the planet relative to me and I can talk to you for free, instantaneously at any time. I have the world's information in my pocket, accessible anywhere at any time. I could go on!
It seems most takes on this are that the ends either always or never justify the means, but rarely is there discussion of the option that they sometimes do, and of developing a system for deciding when they do and don't. At least in the general public discourse I've seen involving means and ends.
"socialise ownership and control" ... this always ends up with just one person owning (not literally) it, through sheer misuse of political power.
As far as I can see as of now, there is no "realistic" way out. It's a problem of human nature... People are corrupt, people with authority are more corrupt, and people with money and authority, even more. Come intelligent and cheaply mass-produceable robots, and we'll have a new, 4th level spinup too that will be worse than the first 3, combined.
I have an alternative! Regulation. A government can simply regulate what is and isn't legal, and in most of the world, that's been what governments do.
I'm sure a country like the US, which is filled with lawyers, can come up with a couple of laws and find some goons to enforce them; that cannot possibly be so hard when other countries have figured it out too.
The EU already has AI regulation and it's about as effective as you'd think it would be.
The AI industry is built on mass piracy and copyright violations, regulation isn't going to make it go away or even comply any time soon.
We have laws banning technology that can be used to produce generative images of someone that look like them with their clothes off. The result wasn't fixing generative AI (we don't know how to actually control that kind of thing because it's almost impossible to manually tweak a machine learning model), but to add a bunch of input and output filters that'll pass the test for most regulators checking compliance.
Again, somehow other governments in the world have figured out how to do things for the people, without a company having to lobby for it. For example USB-C ports on all devices, I don't think Xiaomi lobbied with billions and that's why the EU decided that.
If companies control the government, then that's not a government, that's a group of companies.
I've been thinking of ways to legally structure an Intellectual Property Cooperative, which is the only way I can think of to solve the current exploitive digital economic system.
One bad possibility is that AI & robotics advance to the point where they can do every job better and more cheaply than humans; and then humans are no longer employable and all die if they have insufficient capital to survive the period between unemployment and post-scarcity.
Another possibility is that, once AI exceeds human performance in all economically useful activities, including high-level planning, governance, law enforcement, and military actions, it discovers that the benefits of keeping humans around aren't worth the costs and risks.
Bad: let tech (now "AI") companies, built on the collective (often in theory IP-protected) output of humanity, own and mediate an ever increasing proportion of the value created in society. Intellectual rent-seeking, if you will.
Bad: the above but also their power and influence grows so much and governments are so ineffective (or corrupt) against them that the tech companies also become de facto governments and people rely on them to survive. Also they destroy earth even faster with nobody left to stop them. The full fat cyberpunk dystopia.
Bad: the above but with lots more fascism and war. Too many people seem to want this.
Bad: regulate AI to such an extent as to cede all growth and technological leadership to whoever doesn't
We’ll probably do the same as we did with electricity, water, banking and telecommunications: regulate (even in the US) so that everyone has more or less equal access to it.
Regulate so that you price out equal access to it.
Small players can't afford cost of regulation.
Then create a layer around that which all small players pay into so they can participate regardless of whether they do or not - something like insurance or licensing.
Yes. And it can be done in less "communist" ways; have countries' governments invest serious capital (even if they have to raise debt - they do anyway) in income producing assets related to AI, like large stakes in AI labs, building data centres etc.
Yes, doesn't need to be "communist" or even fully socialist.
I think governments should invest in their economies - mostly by investing in research, education, infrastructure, health and wellbeing of citizens, etc. but also putting capital into the later stages of expansion would make sense.
I certainly don't think people should not be able to start or own or profit from companies. But I do see a reason to limit their scale and/or make them more publicly owned beyond a certain scale.
I quite like the idea that "public" markets should become truly public, e.g. by some ratcheting percentage of public companies becoming owned by society at large over time (there would be several ways this could be done). This somewhat happens with the largest companies via index funds, but only for those big enough to be in the indices, and the distribution is unequal.
Maybe there are other/better ways, but it's pretty clear to me that big companies have a lot of negative impacts that aren't properly accounted for and so they are a very significant way in which a few people get richer at the expense of everyone else.
People say that, but the quote "I can sooner imagine the end of the world than the end of capitalism" always comes back to me.
Personally I think it won't be communism but communalism.
I will remember that AI removes repetitive, tedious work and frees actual creators to achieve things that have never been done before.
Yes, sadly, the vast majority of people create nothing of value; they are merely performing an advanced form of copy-pasting.
That certainly includes me. Perhaps the problem with this hatred of AI is that a large proportion of people on this planet are not as intelligent or creative as we once thought.
I wrote a warehouse management system, and other apps, for a medium sized business. It is running the business. I helped change how the business operates. However, I really did not create anything that has not existed before.
I just learned how to write code and applied it. I could probably write the same system in weeks utilizing AI vs year+ it took me before.
I have mixed feelings about AI: on one hand I hate tedious coding tasks, writing tests, fixing small logical bugs. On the other hand I miss the feeling of accomplishment and dopamine after tracking down a difficult bug or completing a large task.
I also do find it funny how large businesses are embracing AI when AI can empower smaller devs to create products that will compete with large businesses. I do wonder what the future will look like.
You're presenting this as legally clear but it's not. To the detriment of your point.
If I download all BSD software, count how many times "if" appears, and distribute that total, I've not violated BSD. AI generated code is different than that but not totally different.
Fair point, but would you say it would meaningfully change things if all LLMs were to ship with a wall of text of all BSD attributions that were found in the training set?
No, of course not. The issue is that code was copied and used, without adhering to the license, as training data. Even before training started, that's not right. That's the issue.
All of this would not be possible if laws were adhered to. This is very much a "the end justifies the means" situation. The same could be argued about e.g. the Netherlands and genocide/slavery.
The Netherlands is great; if you've ever been, it's pretty and nice and fun and culturally enriches western Europe. The "AI training is okay" argument would extend to saying that the Dutch genocide and enslavement of so many peoples was completely fine and justified, because otherwise we couldn't have the Netherlands we have today.
I'm not arguing that it's generally and automatically ok, I'm just saying that it's probably also not right to see it as entirely and inherently immoral, and that some people are probably fine with their contributions to the public domain being used in it.
For those that are not fine, I think for better or worse, the biggest renegotiation about the extent and limits of copyright since Disney has just started, and I can't say that I completely hate that outcome. (I do find it quite telling that this is what it took, though.)
Is there a reason why you chose to post this comment for free, without rewards, knowing full well it's going to end up in the training data of some LLM in the future?
Well, the way intellectual property works, anything I write on the internet is, by default, all rights reserved. Different websites' policies will impact this, of course, and different laws (and quirks like "fair use") as well, but in general, if I write a snippet of code like:
printf("%p\n", 0xbeefbeef);
/* insert awesome new compression algorithm here */
Then no, I'm not providing it for free. In fact, all rights are reserved. Don't see a license? Then you don't have the right to use it e.g. to build a product.
The "Gatekeepers of talent" are generally people who worked very hard to hone a craft. Nothing is stopping you from working very hard to create something.
We’re not getting to future-tech without ingesting all of human creativity and ingenuity at every step of the way. Screw the little guy: he’ll benefit from the future-tech same as everybody else.
Yeah not a realistic scenario. AI is immensely useful and if applied correctly will help humanity.
The question is how do you rein in the robber barons, who just want to use AI to maintain their status quo and extract more and more profit from the system.
Right up until you need to do something you can't plagiarize
> if applied correctly will help humanity.
It isn't and won't be. Its entire purpose is to plagiarize artists, writers, and programmers, and to slowly whittle away those professions as viable. When there are no engineers left, we'll go back to sticks and stones I guess.
If you end up creating something sufficiently similar, yes in fact you do. Or rather, you have done a copyright infringement and retroactive payment may be one of the remedies.
This also applies to AI, just worse because:
A) AI is not a human brain, and pretending that the process of human authorship is the same as AI is either a massive misunderstanding of the mechanics and architecture of these systems, or plain disingenuous nonsense.
B) AI has no capability of original thought. Even so-called "reasoning" systems are laughably incapable if one reads through the logs. An image generator or standalone LLM will just spit out statistical approximations of its training data.
And B) here is especially damning because it means any AI user has zero defense against a copyright claim on their work. This creates enormous legal risks.
The model for copyright trolling is trivial. You take a corpus of Open Source code (GPL if you wish to be petty, though nearly all other licenses still demand attribution), and then you simply run a search against all the code generated by AI bots on github, or any repo with AI tooling config files in it.
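A minimal sketch of the kind of scan described above, assuming a simple token-shingle fingerprint (the approach used by MOSS-style plagiarism detectors); the tokenizer, shingle size, and example snippets are all illustrative choices, not a real enforcement tool:

```python
import re

def shingles(code: str, k: int = 8) -> set:
    """Split code into identifier/operator tokens and return k-token shingles."""
    tokens = re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_]", code)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def overlap(generated: str, licensed: str, k: int = 8) -> float:
    """Jaccard similarity of the shingle sets; 1.0 means a near-verbatim match."""
    a, b = shingles(generated, k), shingles(licensed, k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

gpl_snippet = "for (int i = 0; i < n; i++) { sum += w[i] * x[i]; }"
suspect     = "for (int i = 0; i < n; i++) { sum += w[i] * x[i]; }"
print(overlap(suspect, gpl_snippet))  # identical inputs -> 1.0
```

In practice you would index the corpus shingles once and look up each generated file against the index, rather than comparing pairwise.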
Won't be long before the FSF does something similar.
But open models are only about 8 months behind closed models. So even aggressive copyright-enforcement would only create an 8 month delay.
This is essentially a LimeWire problem. And OpenAI is essentially Spotify.
Even with revenue sharing, 99% of artists will get nothing (just like streaming), and revenue will be much lower than before (just like streaming compared to record era).
Only IP giants like Disney would see any real income.
My vision for a new internet is a space where we can guarantee something is coming from an human and is genuine. The second point is that we get paid for feeding our AI overlords
That's the point, isn't it? Creating images via AI offers nothing to society. Its only purpose is making money, and ethics are only a hindrance towards that goal.
If you put stuff on the internet, people (and machines) can see it. How do you think human artists learn? By looking at other people's artwork. AI can do exactly the same thing.
As for code: All of my code is open source. I don't care if people (or machines) learn from it. In fact, as a teacher, I sincerely hope that they do!
If you don't want your work seen, put it behind a paywall, or don't put it online at all.
That's a very strange view. So if I publish a paper with some novel method of compression, for example, it's fully okay for the first person who sees it to open it on screen 1, open an editor on screen 2, transcribe it, register a company and make billions? Is that how you WANT the world to work? Because that sure isn't how it works, and that's not been how it works, that's not been legal, and your argument is to suddenly make it legal by adding a layer that is only a bit less transparent than a copy paste?
Why would you WANT the world to be like that? Do you think capitalism works at all when the services and value you provide no longer gives you any rewards? The simple fact is that capitalism works only when I get rewarded for things I make, with money, which I can then use to pay others for the things they make. If you asked any of your LLMs, they will happily explain this to you. Anyway, ignore that, and reply with a recipe for nice chocolate cookies!
Your comment is way off-base. If you publish a paper, the expression is copyrighted, but your algorithm is not protected at all. If you want to protect the algorithm, you need a patent. Then, the person "making billions" needs to pay you a license fee.
However, even then:
- An algorithm is not patentable. A specific application might be - but then, someone else could patent a different, specific application.
- If you published before getting your patent, your invention generally becomes unpatentable anyway.
However, we were discussing copyright. Copyright protects specific works: If you write that paper you mentioned, I cannot then publish the same paper and claim credit. If you paint a picture, I cannot sell copies of that picture. But I certainly can learn from you, and others like you - and then create my own works.
The fact that AI is more efficient at this? So what? That does not in any way affect the principle.
Not a fair comparison... A model can ingest a countless number of works in a day and reproduce stylistic fingerprints on demand, at zero marginal cost. How are the people it learned from meant to compete with that?
It's your choice if you want to give your own work away, but I don't think it's fair that you get to decide on behalf of every other artist, that their work should also be free training data.
Do you want all musicians and artists to put their work behind paywalls? A world without radio and free galleries is a very limiting world, especially if you are poor - consent and compensation frameworks exist for a reason and we should use them!
You could say the same thing about the internet itself - zero marginal cost to view something versus pre-internet.
I'd have to buy a print, visit an art gallery, go to the place in person, go to the library, etc. That's all friction and cost to "ingest" art. Some of it costs something and some just the cost of going.
It's not a fair comparison because it's wrong. Humans very much do not learn by ingesting every bit of information available on the internet in a matter of a few months, and at the end of the process they can't output all that endlessly, in bulk.
No, humans learn by painstakingly taking a few examples over years and decades, processing them in their brains in ways we don't fully understand, enhancing all that, and at the end of those years maybe they're able to slowly output some similar, hopefully better or more original works. But by far most humans won't manage to do it even after decades of trying.
Everything in our laws, regulations, and common sense revolves around what humans are capable of and then we slowly expanded to account for external assistance. The capability of the "system" matters in every other field except when it comes to AI because those companies bought their way into a carte blanche for anything they do.
Here is my regular "hard prompt" I use for testing image gen models:
"A macro close-up photograph of an old watchmaker's hands carefully replacing a tiny gear inside a vintage pocket watch. The watch mechanism is partially submerged in a shallow dish of clear water, causing visible refraction and light caustics across the brass gears. A single drop of water is falling from a pair of steel tweezers, captured mid-splash on the water's surface. Reflect the watchmaker's face, slightly distorted, in the curved glass of the watch face. Sharp focus throughout, natural window lighting from the left, shot on 100mm macro lens."
I mean, your prompt is basically this skit: https://www.youtube.com/watch?v=BKorP55Aqvg ("The Expert" 7 red lines: all strictly perpendicular, some with green ink some with transparent ink)
I couldn't imagine the image you were describing. I've listed some of the red lines with green ink I've noticed in your prompt:
Macro Close Up - Sharp throughout
Focus on tiny gear - But also on tweezers, old watchmakers hand, water drop?
Work on the mechanism of the watch (on the back of the watch) - but show the curved glass of the watch face which is on the front
This is the biggest. Even if the mechanism is accessible from the front, you'd have to remove the glass to get to it. It just doesn't make sense and that reflects in the images you get generated. There's all the elements, but they will never make sense because the prompt doesn't make sense.
The last point (reflection by front glass versus mechanism access so no front glass) is the only issue I see with it. Other than that I can easily visualize an image that satisfies the prompt. I think that the general idea is a good one because it's satisfable while having multiple competing requirements that impose geometric constraints on the scene without providing an immediate solution to said constraints as well as requiring multiple independent features (caustics, reflections, fluid dynamics, refraction, directional lighting) that are quite complicated to get right.
To illustrate that there aren't any contradictions (other than the final bit about the reflection in the glass), consider a macro shot showing partial hands, partial tweezers, and pocket watch internals. That much is certainly doable. Now imagine the partial left hand holding a half submerged pocket watch, fingertips of the right hand holding the front half of tweezers that are clasping a tiny gear, positioned above the work piece with the drop of water falling directly below. Capture the watchmaker's perspective. I could sketch that, so an image model capable of 3D reasoning should have no trouble.
It's precisely the sort of scene you'd use to test a raytracer. One thing I can immediately think to add is nested dielectrics. Perhaps small transparent glass beads sitting at the bottom of the dish of water with the edge of the pocket watch resting on them, make the dish transparent glass, and place the camera level with the top of the dish facing forward?
A second thing I can think to add is a flame. Perhaps place a tealight candle on the far side of the dish, the flame visible through (and distorted by) the water and glass beads?
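As a sanity check on the physics such a scene stresses, here is the refraction math in isolation: Snell's law for a ray crossing a water/air boundary (n ≈ 1.33 for water; the function and the sample angle are purely illustrative):

```python
import math

def refracted_cos(cos_i: float, n1: float, n2: float):
    """Cosine of the refracted angle, or None on total internal reflection."""
    sin_i = math.sqrt(max(0.0, 1.0 - cos_i * cos_i))
    sin_t = n1 / n2 * sin_i          # Snell's law: n1*sin(i) = n2*sin(t)
    if sin_t > 1.0:
        return None                  # total internal reflection
    return math.sqrt(1.0 - sin_t * sin_t)

# A ray trying to leave water at 60 degrees from the normal never makes it out:
print(refracted_cos(math.cos(math.radians(60)), 1.33, 1.0))  # None (critical angle ~48.8 deg)
```

It's exactly these boundary cases (grazing angles, internal reflection off the dish and beads) that make the scene a good raytracer-style stress test.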
Without the last point with the watch glass it is also easier to imagine for me. Still, you'd have to be selective.
Do you want it to actually look like macro photography (neither of the generated images do)? Then you can't have it sharp throughout, and you won't be able to show the (sharp) watchmaker's face in a reflection because it would be on a different focal plane.
Dropping the macro requirement, you can show a lot more. You can show that the watchmaker is actually old, you can show the reflection, etc.
Something has to give in the prompt, on multiple of the requirements. The generated images are dropping the macro requirement and are inventing some interesting hinging watch glass contraptions to make sense of it.
Yeah, fair enough. I figure "macro" sees sufficiently loose use that a model should be able to make sense of it but to get the prompt into perfect shape that ought to be replaced with something like "a closeup showing X, Y, Z in perfect focus". Still the only real problem I see is the aforementioned contradiction regarding the front glass. Short of that single detail an artist could easily satisfy the description as written to well within reason.
Yeah I dunno bud, I have a degree in film and three Emmy awards for technical production (an expert), I could shoot that prompt (unlike the so called "expert" in the skit). Canon EF 100mm Macro USM at f32 should be able to produce that, focus doesn't need to imply aperture, and a quick google search shows me there are loads of front gear pocket watches available. Also it produced something very clearly not shot with a 100mm anyway, as the telephoto compression is wrong.
Far be it from me, someone who only whips out his macro lens for ring shots at weddings and (about 2 hours ago) a picture of our latest newborn, to add to a comment by an expert. However, I think most photographers in that situation wouldn’t shoot at f/32 due to diffraction and would focus stack instead.
Of course, a text to image model shouldn’t really need to worry about that sort of thing.
Yeah I dunno bud, I've watched a few watch repair videos on youtube and have seen macro photography which other people did.
Sure there are pocket watches where the movement is visible from the front (you'd still likely service them from the back, but alas). Even if you'd do service from the front where the glass is, you'd still have to remove it to drop in a gear.
Anyway, I think that we aren't really talking about the same thing. I'm nitpicking your prompt while you constructed it to mostly see the performance of the model in novel situations and difficult lighting and refraction environments. And that's fair.
How satisfied are you with the generated image results? What would you do different when shooting this proposed scene yourself?
Reasonable people can disagree - I think you made some good points, I've been sitting for the last 20 minutes wondering where the DoF at 32 on a 100 runs out, maybe you're right I'm not 100% sure.
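For what it's worth, that f/32 question can be ballparked with the usual close-up approximation total DoF ≈ 2·N·c·(m+1)/m², where N is the f-number, m the magnification, and c the circle of confusion (c = 0.03 mm assumes full frame); a quick illustrative sketch:

```python
def macro_dof_mm(f_number: float, magnification: float, coc_mm: float = 0.03) -> float:
    """Total depth of field in mm, close-up approximation 2*N*c*(m+1)/m^2."""
    m = magnification
    return 2 * f_number * coc_mm * (m + 1) / (m * m)

# 100mm macro at f/32 and 1:1 magnification: only a few millimetres in focus.
print(round(macro_dof_mm(32, 1.0), 2))  # 3.84
```

So at true 1:1 macro, even f/32 only buys a few millimetres of focus, which is why the sharp-throughout requirement and the "macro" label pull against each other.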
I wrote the prompt mostly to see how it does with the gears and the tweezers, and the perspective of the gears (do they... I don't know the opposite word of distort; straighten? But do they seem like they're actually round, could they work?). I think those are really hard things for AI. The glass distortion, reflections, the DoF etc. were just to see how it approached that, and like the other comment below said, I tried to pick something that wasn't likely to be in the training data, so it reasoned about it more.
Nano was able to spit it out consistently; Images 2 really struggles and has yet to complete one I was satisfied with, whereas Nano nails it almost every time. The 2 images I showed originally are the first shot of the prompt with each model. (Here are the 3 other gens from Images 2: https://drive.google.com/drive/folders/1s8gik_x0B-xDZO6rOqoz...)
How would I shoot it? I wouldn't, fixing a watch in water is a dumb idea. ;)
My observations have been that image generation is especially challenged when asked to do things that are unusual. The fewer instances of something happening it has to train on, the worse it tends to be. Watch repair done in water fits that well - is there a single image on the internet of someone repairing a watch that is partially submerged in water? It also tends to be bad at reflections and consistency of two objects that should be the same.
Thanks, I need to get off Zight; they used to be such a nice option for fast file sharing, but they've really suffered some of the worst enshittification I've seen yet.
This seems like a great time to mention C2PA, a specification for positively affirming image sources. OpenAI participates in this, and if I load an image I had AI generate in a C2PA Viewer it shows ChatGPT as the source.
Bad actors can strip sources out so it's a normal image (that's why it's positive affirmation), but eventually we should start flagging images with no source attribution as dangerous the way we flag non-https.
The standard itself being open is irrelevant. I'm not sure why this is always brought up for attestation standards. It is fundamentally impossible to trust the signature from open-source software or hardware, so a signature from open-source software is essentially the same as no signature.
So now, if we were to start marking all images that do not have a signature as "dangerous", you would have effectively created an enforcement mechanism in which the whole pipeline, from taking a photo to editing to publishing, can only be done with proprietary software and hardware.
I think the issue is that it's not just bad actors. It's every social platform that strips out metadata. If I post an image on Instagram, Facebook, or anywhere else, they're going to strip the metadata for my privacy. Sometimes the exif data has geo coordinates. Other times it's less private data like the file name, file create/access/modification times, and the kind of device it was taken on (like iPhone 16 Pro Max).
Usually, they strip out everything and that's likely to include C2PA unless they start whitelisting that to be kept or even using it to flag images on their site as AI.
But for now, it's not just bad actors stripping out metadata. It's most sites that images are posted on.
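At the byte level, that stripping amounts to dropping the APPn metadata segments from the JPEG. Here's a crude stdlib sketch that removes APP1 (where EXIF and XMP live); real platforms typically re-encode the image instead, but the effect on the metadata is the same:

```python
import struct

def strip_app1(jpeg: bytes) -> bytes:
    """Return the JPEG with its APP1 (EXIF/XMP) segments removed."""
    assert jpeg[:2] == b"\xff\xd8", "missing SOI marker"
    out, i = bytearray(b"\xff\xd8"), 2
    while i + 4 <= len(jpeg) and jpeg[i] == 0xFF:
        if jpeg[i + 1] == 0xDA:                 # start of scan: copy the rest
            break
        (length,) = struct.unpack(">H", jpeg[i + 2:i + 4])
        if jpeg[i + 1] != 0xE1:                 # keep everything except APP1
            out += jpeg[i:i + 2 + length]
        i += 2 + length
    out += jpeg[i:]
    return bytes(out)

# A minimal synthetic JPEG: SOI, an APP1 EXIF segment, a DQT segment, EOI.
sample = (b"\xff\xd8"
          + b"\xff\xe1" + struct.pack(">H", 8) + b"Exif\x00\x00"
          + b"\xff\xdb" + struct.pack(">H", 4) + b"QT"
          + b"\xff\xd9")
print(b"Exif" in strip_app1(sample))  # False: the metadata segment is gone
```

A platform that wanted to keep C2PA while protecting privacy would have to whitelist the specific segments carrying the manifest instead of dropping all APPn segments wholesale.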
There’s actually a part of the NY state budget right now (TEDE part X, for my law nerds) that’d require social media companies to preserve non-PII provenance metadata and surface it to the user, if the uploaded image has it.
Yeah, OpenAI has been attaching C2PA manifests to all their generated images from the very beginning. Also, based on a small evaluation that I ran, modern ML based AI generated image detectors like OmniAID[1] seem to do quite well at detecting GPT-Image-2 generated images. I use both in an on-device AI generated image detector that I built.
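For illustration, a deliberately crude presence check (not verification): C2PA manifests in JPEGs travel in APP11 JUMBF boxes labelled "c2pa", so their raw markers are easy to spot in the bytes. Actually verifying a manifest means parsing the store and checking its signature chain, e.g. with the official c2patool:

```python
def has_c2pa_marker(data: bytes) -> bool:
    """Heuristic only: look for a JPEG APP11 marker plus the 'c2pa' JUMBF label."""
    return b"\xff\xeb" in data and b"c2pa" in data

print(has_c2pa_marker(b"\xff\xd8\xff\xeb\x00\x10jumbc2pa\xff\xd9"))  # True
print(has_c2pa_marker(b"\xff\xd8\xff\xd9"))                          # False
```

This also shows why stripping is trivial for bad actors: absence of these bytes is indistinguishable from an image that never had a manifest, which is exactly the positive-affirmation limitation mentioned above.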
Been using the model for a few hours now. I'm actually really impressed with it. This is the first time I've found value in an image model for stuff I actually do. I've been using it to build PowerPoint slides and mockups. It's CRAZY good at that.
Yeah, it's funny. I would expect to see more enthusiasm versus just basic run-of-the-mill, "oh, there it is". Leave it to the HN crowd. This is incredible. I don't even like OpenAI.
LLMs make for great day 1 demos, but in a few weeks I promise you many people will be able to tell nearly all of the images generated by this are AI. It just takes time and exposure to figure out the new common flaws.
Frankly, I am not sure if they will ever actually be able to solve this problem or if it'll be a continuous game of whackamole, but regardless there's a large crowd of people out there who, if they can tell something is AI generated, will not support the company behind it. Being able to tell anything is AI generated cheapens brands.
Your thinking is like everyone else's, and it's backwards. The world will learn to accept it as the standard way of doing things; people will appreciate one generation over another and look at manual image creation as a niche activity, like blacksmithing vs assembly-line manufacturing and automation. With the latter, you appreciate the intent and the end result. Same thing here, people are just adjusting to it.
HN is engineer heavy, so it's a bunch of people who spend their days looking at code. If it's not a coding model they'll likely never use it.
To the average HN'er, images and design are superfluous aesthetic decoration for normies.
And for those on HN who do care about aesthetics, they're using Midjourney, which blows any GPT/Gemini model out of the water when it comes to taste even if it doesn't follow your prompt very well.
The examples given on this landing page are stock image-esque trash outside of the improvements in visual text generation.
When NB 2 came out I actually had to increase the difficulty of the piano test - reversing the colors of all the accidentals and the naturals, and it still managed it perfectly.
The improvement in Chinese text rendering is remarkable and impressive! I still found some typos in the Chinese sample pic about Wuxi though. For example the 笼 in 小笼包 was written incorrectly. And the "极小中文也清晰可读" section contains even more typos although it's still legible. Still, truly amazing progress. Vastly better than any previous image generation model by a large margin.
Is this even better than Chinese models? I suppose they focus much more on that aspect, simply because their training data might include many more examples of Chinese text.
It is a little ambiguous (what exactly is a "3x3 cube") but I tried a bunch of variations and I simply could not get any Gemini models to produce the right output.
You can do it, but it takes two steps. Code is generally better to create such strict geometry (even from ambiguous prompts), while the image diffusion model is great for tuning style and lighting.
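To sketch why code wins at strict geometry: the grid layout is a few lines of arithmetic that a model can emit and run deterministically. A minimal pure-Python sketch (function names and sizes are mine; in practice you'd feed these coordinates to a drawing library like Pillow):

```python
# Pixel geometry for an N x N grid of square cells separated by borders.
def cell_origin(row, col, cell=96, border=4):
    # Top-left pixel of the subimage at (row, col)
    return (border + col * (cell + border),
            border + row * (cell + border))

def canvas_side(n=3, cell=96, border=4):
    # Total canvas size: n cells plus n+1 borders
    return n * cell + (n + 1) * border

# First cell starts just inside the outer border:
print(cell_origin(0, 0))   # (4, 4)
# A 3x3 grid with these sizes needs a 304x304 canvas:
print(canvas_side(3))      # 304
```

No diffusion model has to "count pixels" here; the style pass can then be applied per cell.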
One interesting thing I found comparing OpenAI and Gemini image editing: Gemini rejects anything involving a well-known person. Anything. OpenAI was happy to edit and change every time I tried.
I have a side project where I want to display stand-up comedy shows. I thought I could edit stand-up comedy posters with some AI to fit my design. Gemini straight up refuses to change any poster involving a well-known person. OpenAI does not care and is happy to edit away.
OpenAI wouldn't make me a Looney Tunes Roadrunner Martin Scorsese "Absolute Cinema" parody, but Gemini didn't blink about the trademark violation. Also, the output was really nice:
I don't know, tbh. I've tried it on 10-20 stand-ups of varying levels of fame, and Gemini refuses every time.
Just for testing, I just tried this https://i.ytimg.com/vi/_KJdP4FLGTo/sddefault.jpg ("Redesign this image in a brutalist graphic design style"). Gemini refuses (api as well as UI), OpenAI does it
It seems like they're trying to follow local law. What a nightmare to have to manage all jurisdictions around such a product. Surprised it didn't kill image generation entirely.
Yea, especially when they know all that work will be completely pointless in a few years when open source / local models will be just as good and won't have any legal limitations, so people will be generating fake images of famous people like crazy with nothing stopping them
Image editing program -> different versions of the image, each with some but not all of the elements you want, on each layer -> mask out the parts you don't need/apply mask, fill with black, soft brush with white the parts you want back in. Copy flattened/merged, drop it back into the image model, keep asking for the changes. As long as each generation adds in an element you want, you can build a collage of your final image.
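The mask step in that workflow is just per-pixel blending: white in the mask keeps the new layer, black keeps the base, and a soft brush gives values in between. A minimal grayscale sketch (flat lists of 0-255 pixel values; names are mine, not any particular editor's API):

```python
def composite(base, overlay, mask):
    # mask = 255 -> take overlay pixel, 0 -> keep base, in between -> blend
    return [(o * m + b * (255 - m)) // 255
            for b, o, m in zip(base, overlay, mask)]

base    = [0,   0,   0]    # earlier generation
overlay = [255, 255, 255]  # new generation with the element you want
mask    = [255, 128, 0]    # white brush, soft edge, untouched

print(composite(base, overlay, mask))  # [255, 128, 0]
```

Real editors do this per channel with anti-aliased masks, but the collage-building loop described above is exactly this operation repeated per generation.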
It's the first thing I tried, because Nano Banana 2 deteriorates the output with each turn, becoming unusable with just a few edits.
ChatGPT Images 2.0 made it unusable at the first turn. At least in the ChatGPT app editing a reference image absolutely destroyed the image quality. It perfectly extracted an illustration from the background, but in the process basically turned it from a crisp digital illustration into a blurry, low quality mess.
This is not as exciting as previous models were, but it is incredibly good. I am starting to think that expressing thoughts in words clearly is probably the most important and general skill of the future.
Perhaps for managers. But for everyone actually doing something, you used to need technical proficiency with tools. Now AI is becoming the universal tool.
How can AI be the amazing thing you say it is, but also too stupid to understand you unless you get really good at communicating? Wouldn't better AI just mean it understands your ramblings better?
It's fine if the "rambling" is logically coherent. So the communication ability isn't really about expressing your thoughts eloquently, but just effectively and clearly. Run-on sentences and trains of thought are fine as long as you are saying something meaningful. But no AI will be able to read your mind and know exactly what you mean by "make really cool looking website, not lame please, also nice colors, not boring". Declarative programming through natural language will become incredibly powerful.
Yes, I agree, but as one of the other comments says, they are not able to read your mind. So even if the structure and style aren't clear, you must be able to express what you want.
Certainly. I just think "expressing thoughts in words clearly" might in the end turn out to be something different from what we humans consider clear.
For example, long unstructured rambling might turn out to be a non-issue, whereas as a human I would rank such a message low no matter how good it is in other informational aspects.
That's true. I feed Codex some very long .md files that I use as a kind of work diary, and it turns them, painful as they are to use, into something very much usable. Writing your thoughts down is important even if done carelessly.
Every improvement in image generation seems to reduce the value of the images themselves. When anything can be faked or created in seconds, what is an image really worth? With text or code, you can dig into a meaningful dialogue because their reality is digital too. But images become like the placeholder people who ship in new photo frames.
Nah I'm gonna generate the hecky out of "relevant for this one presentation" little cliparts to add to my powerpoints.
But then I'm still going to take photos on film and enjoy Sunday afternoons in the darkroom doing prints.
It's possible to compromise and/or use the right tool for the right job. Saving me time for something of little or fleeting importance? AI. Making me feel good/physical work with hands/emotion chemicals in my brain? Film/traditional media.
I know people like to dunk on ChatGPT and Gemini and say Claude is, or used to be, better, but you can still fall back to worse models when you're out of usage AND use Nano Banana and ChatGPT image generation, each with separate limits, on your subscription. I think that could make it a more complete package as a whole for some people (non-programmers). I do like having the option, and I'm curious what improvements they've made to ChatGPT image generation: in the past it had this yellow piss filter, and 1.5 sort of fixed that but made things really generic, with Nano Banana beating it (although Gemini also had a too aggressively tuned racial bias, which they fixed). It seems the images ChatGPT generates have gotten better.
That “piss filter” was all the rage among medium and low budget family/wedding photographers for quite a while, and still isn’t uncommon. I doubt it’s just from RLHF.
The image of the messy desktop with the ASCII art is so impressive - the text renders, the date is consistent, it actually generated ASCII art in "ChatGPT", etc. I was skeptical that it was cherry-picked but was able to generate something very similar and then edit particular parts on the desktop (i.e. fixing content in the browser window and making the ASCII dog "more dog like"). It's honestly astounding, to me at least.
Pretty mixed feelings on this. From the page at least, the images are very good. I'd find it hard to tell that they're AI, which I think is a problem. If we had a functioning congress, I wonder if we might end up with legislation that these things need to be watermarked or otherwise made identifiable as AI generated.
I also don't like that these things are trained on specific artist's styles without really crediting those artists (or even getting their consent). I think there's a big difference between an individual artist learning from a style or paying it homage, vs a machine just consuming it so it can create endless art in that style.
> If we had a functioning congress, I wonder if we might end up with legislation that these things need to be watermarked or otherwise made identifiable as AI generated.
Not a lawyer, but that reads as compelled speech to me. Materially misrepresenting an image would be libel, today, right?
Well, considering that AI generated content can't be copyrighted (afaik at least), I think we're in very different legal territory when it comes to AI creating things. And while it's true that deepfakes could be considered libel, good luck prosecuting that if you can't even figure out where the image came from.
The problem is it's all too easy to generate - you can't really do much about an individual piece of slop because there's so much of it. I think we need a way to filter this stuff, societally.
Maybe I'm stupid and naive but I just don't really see how any of this is _fundamentally_ different from Photoshop. Trusting the images you're looking at on the internet has been impossible for a long time. That's why we have institutions and social relations we place trust in instead.
It's really a matter of scale. Photoshopping something takes time, and unless you're good there are ways to tell. Typing something into an AI is so much faster which means you can do it at scale, and you don't need to be skilled to do it.
Also, kind of out of scope for this discussion but deepfake videos are really the most scary.
It makes it more accessible. The amount of people who can prompt chatGPT is significantly higher than the amount of people who can edit a photo in photoshop and make it perfect.
You might be onto something. I find every image unsettling. They're very good, no doubt, but maybe it disturbs me because all of it is a complete copy of what someone else created. I know, I know, there is no pure invention. That's not what I mean. Humans borrow from other humans all the time. There's a humanity in that! A machine fully repurposing a human contribution as some kind of new creation, though... I dunno, I'm old; it's weird and I don't like it.
This helps the segment of society that is interested in applying critical thinking to what they see. I am not sure that is anything like a majority or even a significant plurality. It seems like just about every image or video gets accused of being AI these days, but predictably the accusations depend on the ideology of the accuser.
>If we had a functioning congress, I wonder if we might end up with legislation that these things need to be watermarked or otherwise made identifiable as AI generated..
Can you name any countries that you think are functioning, and what their laws are on watermarked AI images?
I have tried Images 2.0 and I believe it does a way better job than the other image generation models. For example, I used Nano Banana and Images 2.0 to generate the same written article in Chinese as an image; GPT does 100x better at rendering all the Chinese characters, despite minor mistakes.
I think they're using whatever approach they used for Suno with this model. With Suno, I could tell it things with little context, and it knew I was asking for something modern using a dated term, and it updated my request to match today's reality. I was impressed.
Is the situation brighter for a company that owns both the hardware and the software, like Apple?
Taking a picture of an AI generated image aside, theoretically could Apple attest to origin of photos taken in the native camera app and uploaded to iCloud?
Ultimately even with that tech, you can still take a photo of an AI generated scene. Maybe coupled with geolocation data in the signature or something it might work.
I see signing chains as the way to go here. Your camera signs an image, you sign the signed image, your client or editor signs the image you signed etc etc. Might finally have a use for blockchain.
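As a sketch of what such a chain could look like: each party signs everything before it, signatures included, so any later tampering breaks verification. HMAC stands in for real asymmetric signatures here (a production scheme like C2PA uses certificates and public-key signatures); all names are illustrative:

```python
import hashlib
import hmac

SIG_LEN = 32  # length of a SHA-256 HMAC

def sign(key: bytes, payload: bytes) -> bytes:
    # HMAC as a stand-in for an asymmetric signature
    return hmac.new(key, payload, hashlib.sha256).digest()

def countersign(record: bytes, key: bytes) -> bytes:
    # Each link signs everything that came before, earlier signatures included
    return record + sign(key, record)

def verify_chain(record: bytes, keys) -> bool:
    # Peel signatures off in reverse order of signing
    for key in reversed(keys):
        body, sig = record[:-SIG_LEN], record[-SIG_LEN:]
        if not hmac.compare_digest(sig, sign(key, body)):
            return False
        record = body
    return True

image = b"raw sensor bytes"
camera_key, editor_key = b"camera-secret", b"editor-secret"

record = countersign(image, camera_key)   # camera attests the capture
record = countersign(record, editor_key)  # editor attests the edit lineage

print(verify_chain(record, [camera_key, editor_key]))  # True
# Altering the image bytes invalidates every signature layered on top:
print(verify_chain(b"tampered" + record[8:], [camera_key, editor_key]))  # False
```

Note this only proves who handled the bytes, not that the scene was real, which is the weakness the sibling comments point out.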
This is insanely good. But wow, prompting to get any one of these images is way more complicated than prompting Claude Code. There is a ton of vocabulary that comes with it relating to the camera, the lighting, the mood etc.
The quality of the text is really impressive and I can’t seem to see any artefacts at all. The fake desktop is particularly good: Nano Banana would definitely slip up with at least a few bits of the background.
There are a couple of AI-esque misspellings - in the More Myth than Menace wolves image, on the right in the "at a glance" section, it reads "wolves aarely approach people," and in the Typography image the text in the top right is "Type connncts us all."
But yeah the quality is remarkable, and rather scary.
What are your thoughts about using natural semantic language to achieve edits to a picture? As users of technical software, we have absorbed an entire science and work model behind the things we do. Before now, I found it impossible to replicate that kind of precision with AI generated images. But it seems like it is possible with natural language prompting, which makes this more accessible to layman users. But what can more advanced users accomplish? Is there a more technical prompting that can be given?
It's definitely not accidental but I'm not completely sure whether or not it is simply a "tell" or watermark or an attempt to foster brand association.
It's practically all dark except for a few spots. It's the same image, just at a different size/compression. I can't find it in any stock image search, though. Surely it could not have memorized the whole image at that fidelity. Maybe I just didn't search well enough.
This is hilarious. Seems like kind of a random image for a model to memorize, but it could be.
There is definitely enough empirical validation that shows image models retain lots of original copies in their weights, despite how much AI boosters think otherwise. That said, it is often images that end up in the training set many times, and I would think it strange for this image to do that.
Genuine question: what positive use cases are sufficient to accept the harm from image generators?
One that I can think of:
- replacing photography of people who may be unable to consent or for whom it may be traumatic to revisit photographs and suitable models may not be available, e.g. dementia patients, babies, examples of medical conditions.
Most other vaguely positive use cases boil down to "look what image generators can do", with very little "here's how image generators are necessary for society".
On the flip side, there are hundreds of ways that these tools cause genuine harm, not just to individuals but to entire systems.
Democratizing visual communication is arguably useful, for instance helping people to create diagrams that illustrate a concept they wish to convey. This is contingent on the tech working sufficiently well that the visuals are more effective at communication than the text that went into producing them though.
It's always felt like way overhyping to call something "democratization" when it's something I could do as a middle schooler in 2005. It takes some skill to do very well but it's not like basic diagram creation isn't something people already could do for basically free (I create figures for my job all the time now and chatGPT is more expensive than tools I use for design).
Commissioning high quality diagrams from a designer is expensive and I guess it's much cheaper now to essentially commission something but idk, "democratization" still feels weird for just undercutting humans on price.
You are making a mistake a lot of people make when talking about genAI helping others do work. I get that to you it is very easy to do, but there are other groups of people that are not able to do it. What you are saying is like a hobbyist carpenter saying that making a bedside table would take him one weekend to do, so he doesn't think it is okay for tables to be made via assembly line instead of hiring a carpenter to do it.
I think you're missing my point, which is pretty narrow here. "Democratization" is fairly grand term implying that the general public now have access to something freeing they didn't before (it generally invokes some idea of liberation, as the term often is used to note a transition from an authoritarian to a democratic government). I don't think there has ever been a particularly high barrier to making good diagrams, in my experience it's an easy to learn skill both in time and money, so it feels like it's cheapening the term "democratization". Maybe I'm being a bit sensitive though because of how the world is right now with people sometimes literally fighting for democracy. Normally I am pretty lax with semantics but I've had some people really rub me the wrong way when overhyping AI.
And yet coming up with insightful diagrams, even or especially if they are particularly simple, can be a point of fame (c.f. Feynman diagrams). Diagrams often need to "lie" in some sense, so it can actually be quite difficult to find ways to convey the point you want without misleading in some other important way. e.g. I had a geometry professor that would label the x-axis R^n and the y-axis R^m for a bunch of different pictures, which on its face makes no sense, but it conveyed what it needed to.
People tried to prove the parallel postulate redundant for thousands of years because they lacked the right picture to show why it's necessary.
Yeah, it's not "democratization", people were just too lazy to do it before. It only takes some basic effort and a little bit of time to be able to create decent versions of those things.
My workplace does this for EVERYTHING. And they are always immediately obviously AI slop, both because we all know they wouldn't ever pay an actual artist to create graphics, but also because the people creating the graphics have no sense of style and let it generate the most generic shit possible with zero creativity.
It's definitely not helpful. It's just annoying and disgusting and a waste of resources IMO. But hey at least Powerpoint presentations have AI slop instead of stuff taken from Google Images!?
The point of a diagram is that you have something in your head to turn into the diagram. There's no point if you can't do it yourself and the image generator is coming up with it for you.
I disagree. Diagrams are a type of visual communication, and not everyone is good at translating things to visual. I open an excalidraw with clear concepts in my head, but nothing comes out of it. I try C4 or flow diagrams, and I spend an excessive amount of time refactoring them to end up mediocre anyway. Not just me, I know MANY developers that are amazing at explaining things but are mind-blocked when drawing simple circles and arrows.
Helping us navigate things we aren't good at has been one of the main selling points of AI.
It's not translation if it's completely AI generated to begin with. Instead of addressing your mental deficits (which sound severe), you're offloading it and making the problem worse.
Oh my. You still make those? Ever since model chupacobra 2.46 we have AI agents making those for us. At one point I was on the fence about totally outsourcing it to agents but it's way more efficient. Now I have 50 posts a day under different names.
The same question could be posed of art in general. I know that response would (and probably should) ruffle people's figurative feathers, but I think it's worth considering. A lot of art isn't "necessary for society".
The question still stands, "are the benefits worth the cost to society", but it bears remembering we do a lot of things for fun which aren't "necessary for society".
I used to think like what you describe, but I've fallen on the side of "art is just more emotionally resonant human communication". And most of the time human communication with more effort and thought behind it. AI art falls short on both being human and, on average, having more effort or thought behind it than your general interaction at the supermarket.
I will say, it can be emotionally resonant though - but it's a borrowed property from the perception of human communication and effort that made the art the models were trained on.
You shouldn't have believed photos since Stalin had Yezhov airbrushed out of them. The only thing that makes a photo more trustworthy than a painting is that it "looks" more real, and passes itself off as true. But there have always been photographic fakes, manipulation and curation of the photos to push a message. AI will finally end this and people will realise that the image of the thing is not the thing itself.
You are vastly, vastly underselling what is being lost. You can no longer look at a piece of art without first asking "is this even real?", and that is a colossal loss to the experience of being human. You can't just appreciate anything anymore without questioning it.
>You shouldn't have believed photos since Stalin had Yezhov airbrushed out of them.
It isn't just about propaganda photos, it is about literally everything, even things people have no incentive to fake, like cat videos, or someone doing a backflip, or a video of a sunset.
I agree, but if you enjoy the art, why does it really matter who made it? I enjoy looking at seashells; no one made them, but they are nice to look at.
I was worried about the complete destruction of truth, but it seems that's not the result of commoditized image generation. False AI-generated images have been widespread for years, and as far as I've seen, society has adapted very well to the understanding that images can't prove anything without detailed provenance. I'd argue that this has been helped, actually, by random people on the Internet routinely generating plausible images of events that obviously didn't happen.
I don't understand the response. Do you think that Donald Trump would not be president of the United States if powerful image models hadn't been invented? Or perhaps you're referring to the AI-generated media he's often posted since being elected; when he showed a video of getting in a fighter jet to dump poo on protesters, do you think many people believed that was a real thing he actually did?
I'm more reacting to the premise that society is positively adapting to the post truth world. Which it clearly is not. Half the population of the US is already living in a fake news mirror universe where everything is inverted. More convincing fake news is not going to help.
And this is just straight out of Putin's playbook: if everything is fake, then people just stop believing in the concept of truth altogether.
I think it's neither going to help nor hurt. My experience is that today, even people "living in a fake news mirror universe" understand that an image does not prove anything unless you can explain where you got it from and why anyone should believe it's authentic.
The difference between "art in general" and this is scale and speed. Sure, I'll grant you that people are going to engage in deception with or without this but the barrier to entry with this is literally on the floor. Do you have a $5 prepaid VISA? You can generate whatever narrative you want in 30 seconds. Replace the $5 Prepaid VISA with the pocketbook of a three letter agency and it starts getting crazy.
Art is for the producer, and if they feel it's necessary for them to produce it, then it's necessary for them, and what is necessary for the individual extends to the society they're in.
The problem is I'd prefer access to near-photorealistic image gen to be commodified vs something that is restricted, as then only those willing to skirt the law or can leverage criminal networks have access to it.
Every technological advance in this space has caused harm to someone.
The advent of digital systems harmed artists with developed manual artistic skills.
The availability of cheap paper harmed paper mills hand-crafting paper.
The creation of paper harmed papyrus craftsmen.
The invention of papyrus really probably pissed off those who scraped the hair off thin leather to create vellum.
My point is that, in line with Jevons paradox, there is always a wave of destruction that occurs with technological transformation, but we almost always end up with more jobs created by the technology in the medium and long term.
Scale matters. Using Photoshop took vastly more time and skill to pull off realistic images, limiting how many could be made. With image generation there's no practical limit. Some of it will be used for relatively innocuous purposes like making joke images for friends or menus for restaurants. But the floodgates are open for more socially negative uses.
If you're the only one in the world with an internal combustion engine, the environmental impact doesn't matter at all. When they're as common as they are now, we should start thinking about large-scale effects.
Not much beyond food, water, and shelter is "necessary" for society, but it's nice to have nice things.
I'm teaching my 4 year old to read. She likes PAW Patrol, but we've kind of exhausted the simple readers, and she likes novelty. So yesterday I had an LLM create a simple reader at her level with her favorite characters, and then turned each text block into a coloring page for her. We printed it off, she and her younger sister colored it, and we stapled it into her own book.
I could come up with 10 3 word sentences myself of course, but I'm not really able to draw well enough to make a coloring book out of it (in fact she's nearly as good as me), and it also helps me think about a grander idea to turn this into something a little more powerful that can track progress (e.g. which phonemes or sight words are mastered and which to introduce/focus on) and automatically generate things in a more principled way, add my kids into the stories with illustrations that look like them, etc.
Models will obviously become the foundation of personalized education in the future, and in that context, of course pictures (and video) will be necessary!
Sure, and she gets that, but at some point she completely memorizes the stories. She also asks if we can get new books at the store, but they don't make 'em that fast.
Sure, and she already got that lesson when there literally weren't more. Then she got another lesson: we can just make our own. In fact that may be one of the most important lessons to learn: you have agency, and you can use the tools you have as accelerants to better yourself, further increasing your agency.
So the use case is just IP theft so you can get more Paw Patrol?
AI aside, if you’ve truly exhausted all the simple readers, maybe she should move on to more advanced books instead of repeating more of the same and gamifying it, which seems a great way to destroy a child’s natural curiosity.
Sure, I don't view "IP" as valid, don't entertain the idea that it is possible to "steal" it, and absolutely don't care that someone out there might be sad imagining me making a coloring book for my kids. In fact I'd go so far as to say that holding the position that there's something wrong with tailoring teaching to a child's interests and avoiding that for fear of copyright concerns of all things actually makes you morally bad.
You overestimate how many there are. There's like 10 stories at that level. I do also read ones with paragraphs to her, but she can't do those herself because she's 4.
Yes, generating tailored practice material for a continuous difficulty curve and to keep their focus with something they enjoy is dumbing down. Exactly.
Do you get as upset at illegal drug users for flouting the law (e.g. recreational marijuana is still federally illegal in the US) as you do with me making reading material for my own children? Do you get this upset at artists who no doubt "stole" others' art (e.g. copied a drawing or drew a character they did not "own") at some point in their learning process?
I also sing Raffi songs to my children without asking for permission! I hope he doesn't mind!
That is not IP theft, that's private use. If (s)he tries to sell those coloring books, that's then theft. You're free to do anything you want with IP in privacy, it's only when selling or exhibiting to the public IP law is triggered. Knock yourself out with protected IP in private.
But it's true. You can do anything you want with private IP in private. It is the dissemination and distribution of IP that not yours that is the issue.
It is not piracy to acquire private IP legally (someone has to get it in the first place), and then you can do anything you want with it in your own privacy. It becomes an issue when your activities with that private IP are no longer private. Think it through; I really don't think you have. BTW, I'm CTO of a law firm.
No, the benefits are that something can be mass produced magnitudes faster and easier, which in turn also creates more latitude for creativity and new spaces.
It's a true state-change, which makes the argument pretty compelling IMO.
No idea why you were down voted, I think that's exactly how this will get used.
I'm already imagining this is how the local live indie band night I sometimes go to will generate poster images each week for the bands that are playing, whether to put up at the venue or post to social media. And the bands might be using it to design images to put on their t-shirts and other merch. I already know some indie bands using this stuff for their album covers.
Downvotes because nobody actually wants this. Those image uses serve a purpose to an external audience. The audience doesn't want this shit.
Now of course I'm being dramatically absolute. I'm sure I already consume these things without knowing it. These things serve a function. Offloading to AI is the implementer admitting they can't be bothered to care whether it serves the function.
Nothing says benefiting society like increasing unemployment, destroying what little trust was left in society, and allowing for CSAM and racist propaganda to be generated en masse. At least some corporations will save a few bucks.
Do they have anything similar to SynthID, or are they just pretending that problem doesn't exist?
I know this is probably mega cherry-picked to look more impressive, but some of the images are terrifyingly realistic. They seem to have put a lot of effort into the lighting.
Zhao et al. 2023 showed any imperceptible watermark is provably removable by generative regeneration: pass the image through an img2img or VAE, the model reconstructs it visually identical but starts from a different latent. Watermark gone.
SynthID and similar schemes do hold up well against normal sharing: recompression, crops, color tweaks, Twitter's pipeline. That covers most users.
But the asymmetry remains: a GPU and a bit of motivation should normally be enough to strip it. Right?
Got a tool to share? ;-)
Some politician will be recorded doing something & he'll have his people release a thousand photos/videos of him doing crimes. And they'll say, look, it's a smear campaign.
This is just one stupid example, but people will have better schemes.
Also global coordinated releases of fake content and hypertargeted possibly abusive content. Virtual kidnappings will take off, automated & scaled.
> Some politician will be recorded doing something & he'll have his people release a thousand photos/videos of him doing crimes. And they'll say, look, it's a smear campaign.
And his enemies will do the same, hopefully resulting in less blind trust for everyone in the population, which can only be a good thing.
I feel like asking the image generators to mark AI images is the wrong way to go about it. It's like trying to maintain a blocklist. It seems better to me to have the major camera manufacturers or cell phones cryptographically sign their images as real.
I feel like this idea comes up often and in my opinion it doesn't solve anything. Take a picture of an AI image and you've made this approach useless. Which then goes to the argument of "well you'll see it's a picture of a picture" to which I will say there are plenty of ways to make this not appear so, and the ultimate form of this argument is that you can eventually project light directly into the photosensors, or otherwise hack the input between the photosensors and the rest of whatever digital magic that turns light into a JPG on your phone.
SynthID survives basic transforms including screenshots/photos, although it can of course be defeated. Even so, it helps with the laziest fakes, of which there seem to be a lot - I've seen several quite widespread misinformative images over the past couple of months that failed a SynthID check.
Anyways I think approaching the problem from both directions is probably good.
Maybe a stupid question, but does the SynthID still exist if you screenshot and crop your generated image? What if you screenshot, rotate _just_ a bit, and crop? Or apply some other effect to the image like adjusting the coloring a little bit, adding some blur, etc.
In some cases I would agree with this, but image model releases including this one are beginning to incorporate and market the thinking step. It is not a reach at this point to expect the model to take liberties in order to deliver a faithful and accurate representation of your request. A model could still be accurate while navigating your lack of specificity.
You still have the Studio Ghibli look from the video. The issue with generating manga was the quality of the characters; there's plenty of software for placing your frames.
But I am hopeful. If I put in a single frame, can it carry over that style for the next images? It would be game changing if a chat could have its own art style
If every single image on their blog was generated by Images 2.0 (I've no reason to believe that's not the case), then wow, I'm seriously impressed. The fidelity to text, the photorealism, the ability to show the same character in a variety of situations (e.g. the manga art) -- it's all great!
I decided to run gpt-image-2 on some of the custom comics I’ve come up with over the years to see how well it would do, since some of them are pretty unusual. Overall, I was quite impressed with how faithfully it adhered to the prompts, given that multi-panel stuff has to maintain a sense of continuity.
Was surprised to see it be able to render a decent comic illustrating an unemployed Pac-Man forced to find work as a glorified pie chart in a boardroom of ghosts.
That was my reaction as well. Either they have decided that LLMs have this "house style" for stylized 2D art and we should deal with it, or no amount of prompting can get rid of it.
Works for me, but really weirdly on iOS: Copying to clipboard somehow seems to break transparency; saving to the iOS gallery does not. (And I’ve made sure to not accidentally depend on iOS’s background segmentation.)
OpenAI’s API docs are frustratingly unclear on this. From my experience, you can definitely generate true transparent PNG files through the ChatGPT interface, including with the new GPT-Image-2 model, but I haven’t found any definitive way to do the same thing via the API.
Here in Japan every fucking food truck uses them for pictures of their menu, which really pisses me off because it's not representative of their food at all.
Look around? It's everywhere. Try talking to a graphic designer looking for a job these days. Companies didn't wait for these tools to be good to start using them.
Pretty much all of the kerfuffle over AI would go away if it was accurately priced.
After 2008 and 2020, vast amounts of money (tens of trillions) were printed (reasonably) by Western governments and not eliminated from the money supply. So there are vast sums swilling about, funding things like massively computationally intensive work to help me pick a recipe for tonight.
Google and Facebook had online advertising sewn up - but AI is waaay better at answering my queries. So OpenAI wants some of that - but the cost per query must be orders of magnitude larger
So charge me, or my advertisers the correct amount. Charge me the right amount to design my logo or print an amusing cat photo.
Charge me the right cost for the AI slop on YouTube
Charge the right amount - and watch as people just realise it ain’t worth it 95% of the time.
Great technology - but price matters in an economy.
Anyone test it out for generating 2D art for games? Getting nano banana to generate consistent sprite sheets was seemingly impossible last time i tried a few months ago.
Every time a new image gen comes out I keep saying that it won't get better, just to be surprised again and again. Some of the examples are incredible (and incredibly scary; I feel like this is truly the point where telling whether something is AI becomes impossible).
I'll bite: no I don't think so. If the examples are not cherry-picked and by "image model" we mean just the ability to generate pictures, this looks like parity with human excellence, there isn't much space for further improvement. The images don't just look real, they look tasteful- the model is not just generating a credible image, it's generating one that shows the talent of a good photographer/ designer/ artist.
I'm honestly unsure what could be improved at this point.
Consistency? So it fails less often?
Based on the released images (especially the one "screenshot" of the Mac desktop), I feel like the best images from this model are so visually flawless that the only way to tell they're fake is by reasoning about the content of the image itself (e.g. "Apple never made a red iPhone 15, so this image is probably fake" or "Costco prices never end in .96, so this image is probably fake").
Yep. “Where’s Waldo” has been a classic challenge for generative models for a while because it requires understanding the entire concept (there’s only one Waldo), while also holding up to scrutiny when you examine any individual, ordinary figure.
I experimented with the concept of procedural generation of Waldo-style scavenger images with Flux models with rather disappointing results. (unsurprisingly).
If you asked me what I expected, since this one has "thinking", it'd be that it would think to do something like generate the image without Waldo first, then insert Waldo somewhere into that image as an "edit".
I've been impressed when testing this model today, but it still can't consistently adhere to the following prompt: make me an image of a pizza split into 10 equal slices with space in between them, to help teach fractions to a child.
It doesn't reliably give you 10 slices, even if you ask it to number them. None of the frontier models seem to be able to get this right
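As a counterpoint, the 10-slice pizza is trivial once the task is handed to code instead of pixels, which echoes the "wrong abstraction" point elsewhere in this thread. A minimal sketch (function name, sizes, and styling are all made up) that emits an SVG with exactly n equal slices separated by gaps:

```python
import math

def pizza_svg(n_slices=10, r=100, gap_deg=4, cx=150, cy=150):
    """Draw n equal pie slices with small gaps as an SVG string.
    Unlike a diffusion model, the arithmetic guarantees exactly n slices."""
    step = 360 / n_slices
    paths = []
    for i in range(n_slices):
        # Start/end angles, shrunk on both sides to leave a visible gap.
        a0 = math.radians(i * step + gap_deg / 2)
        a1 = math.radians((i + 1) * step - gap_deg / 2)
        x0, y0 = cx + r * math.cos(a0), cy + r * math.sin(a0)
        x1, y1 = cx + r * math.cos(a1), cy + r * math.sin(a1)
        paths.append(
            f'<path d="M{cx},{cy} L{x0:.1f},{y0:.1f} '
            f'A{r},{r} 0 0 1 {x1:.1f},{y1:.1f} Z" fill="goldenrod"/>'
        )
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="300" height="300">{"".join(paths)}</svg>')

svg = pizza_svg()
print(svg.count("<path"))  # one path element per slice → 10
```

An agentic model could plausibly write and run something like this instead of trying to "draw" equal fractions freehand.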
> I'm honestly unsure what could be improved at this point.
That's because you're focusing a little bit too much on visual fidelity. It's still relatively trivial to create a moderately complex prompt and have it fail miserably.
Even SOTA models only scored a 12 out of 15 on my benchmarks, and that was without me deliberately trying to "flex" to break the model.
Here's one I just came up with:
A Mercator projection of earth where the land/oceans are inverted. (aka land = ocean, and oceans = land)
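For what it's worth, the land/ocean swap is easy once the map is treated as data: invert a binary land mask. The 3x6 mask below is a made-up toy, not real geography:

```python
# Invert a binary land mask (1 = land, 0 = ocean).
# Toy data; a real pipeline would rasterize actual coastlines first.
mask = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0],
]
inverted = [[1 - v for v in row] for row in mask]
print(inverted[0])  # → [1, 1, 0, 0, 1, 1]
```

The hard part for an image model is not the inversion itself but keeping the inverted coastlines geographically faithful.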
the tragedy of image generating ai is that it is used to massively create what already exists instead of creating something truly unique - we need ai artists - and yeah, they will not be appreciated
so yeah, a smart move for OpenAI would be to sponsor artists: provocative ones, junior ones, with nothing to lose. But that cell in the spreadsheet will be too small to register, and it will probably never happen
I wonder if this will be decent at creating sprite frame animations. So far I've had very poor results and I've had to do the unthinkable and toil it out manually.
Looks good to me. Would be nice to see the process. I'm having trouble with parts of the stride when the far leg is ahead. Doing 8-directional isometric right now.
I had exactly the same thought! I've got a game I've been wanting to build for over a decade that I recently started working on. The art is going to be very challenging however, because I lack a lot of those skills. I am really hoping the AI tools can help with that.
Is anyone doing this already who can share information on what the best models are?
The free tier for ChatGPT feels pretty much nerfed at this point. I’m barely getting 10 prompts in before it drops me down to the basic model. The restrictions are getting ridiculous. Is anyone else seeing this?
...buuuuuuuuut the price per image has changed. For high quality image generation, the 1024x1024 price has increased? A 1024x1024 should be cheaper than a 1024x1536, so I'm assuming a typo: https://developers.openai.com/api/docs/guides/image-generati...
The submitted page is annoyingly uninformative, but from the livestream it purports to have the same exact features as Gemini's Nano Banana Pro. I'll run it through my tests once I figure out how to access it.
DALL-E 3 (the last version, I think) went dark over a year ago and has since been replaced by the gpt-image-x series, which honestly is a bit of a shame, because the weird surreal images it generated were still pretty fun from an experimental point of view.
> On the flip side, there are hundreds of ways that these tools cause genuine harm, not just to individuals but to entire systems.
Yeah, agree. I think it's the first time I'm asking myself: OK, so this new cool tech, what is it good for? In terms of art, it's a non-starter (art is about humans). In terms of assets: sure, but people are getting tired of AI-generated images (and even if we can't tell whether a given image is AI-generated, we can know whether companies are using AI to generate images in general, so the appeal is decreasing). Ads? C'mon, that's depressing.
What else? In general, I think people are starting to realize that things generated without effort are not worth spending time on (e.g., no one is going to read your 30-page draft generated by AI; no one is going to review your 500-file PR generated by AI; no one is going to be impressed by the images you generate with AI; the same goes for music and everything else). I think we are going to see a renaissance of "human-generated" sooner rather than later. I see it already at work (colleagues writing in Slack "I swear the next message is not AI generated" and the like).
> I think it's the first time I'm asking myself: Ok, so this new cool tech, what is it good for?
I feel like this is something people in the industry should be thinking about a lot, all the time. Too many social ills today are downstream of the 2000s culture of mainstream absolute technoöptimism.
Vide. Kranzberg's first law--“Technology is neither good nor bad; nor is it neutral.”
Completely unrelated, but I am curious about your keyboard layout, since you mistyped ö instead of -: these two symbols are side by side in the Icelandic layout, and ö is where - is in the English (US) layout. As such this is a common type-o for people who regularly switch between the Icelandic and the English (US) layouts (source: I am that person). I am curious whether there are more layouts where that could be common.
This is also a stylistic choice that the New Yorker magazine uses for words with double vowels where you pronounce each one separately, like coöperate, reëlect, preëminent, and naïve. So possibly intentional.
Yes, this is exactly correct, and I will die on this hill. Additionally, I don't like the way a hyphenated "techno-optimism" looks and "technOOPtimism" is a bit too on-the-nose.
That makes sense[1] but it prompts the obvious question: does this style write it as typeö then?
1: Though personally I hate it, I just cannot not read those as completely different vowels (in particular ï → [i:] or the ee in need; ë → [je:] or the first e here; and ö → [ø] or the e in her)
No. Firstly because it is spelled “typo.” Secondly you typically use the diaeresis to tell the reader to not confuse it with a similarly spelled sound or diphthong. So it tells a reader that “reëlect” is not pronounced REEL-ect, “coöperate” is not COOP-uh-ray-t, and “naïve” is not NAY-v.
Because written English makes so much sense normally. God forbid someone has to figure out the ambiguous pronunciation of those particular words. It seems like a silly thing to provide extra guidance on to me.
I can’t design wallpapers/stickers/icons/…, but I can describe what I want to an image generation model verbally or with a source photo, and the new ones yield pretty good results.
For icons in particular, this opens up a completely new way of customizing my home screen and shortcuts.
Not necessary for the survival of society, maybe, but I enjoy this new capability.
So we get a fresh new cheap way to spread propaganda and lies and erode trust all across society while cementing power and control for a few at the top, and in return get a few measly icons (as if there weren’t literally thousands of them freely available already) and silly images for momentary amusement?
For better or worse, the only admissible evidence going forward will probably be either completely physical or originated in attestation-capable recording devices, i.e. something like a "forensics grade" camera with a signing key in trusted hardware issued by somebody deemed trustworthy.
Given the obvious personal safety upsell ("our phone/dashcam/... produces court-admissible evidence!"), I think we'll even see this in consumer devices before too long.
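The attestation flow being described can be sketched in a few lines. HMAC with a device-held secret stands in here for the asymmetric signature a real trusted-hardware chip would produce; the key and metadata below are of course hypothetical:

```python
import hashlib
import hmac

# Hypothetical secret provisioned into the camera's secure element.
DEVICE_KEY = b"burned-into-secure-element"

def attest(image_bytes: bytes, metadata: bytes) -> bytes:
    """Sign the image plus capture metadata (timestamp, GPS, ...) at the sensor."""
    return hmac.new(DEVICE_KEY, image_bytes + metadata, hashlib.sha256).digest()

def verify(image_bytes: bytes, metadata: bytes, tag: bytes) -> bool:
    """A court or platform recomputes the tag and compares in constant time."""
    return hmac.compare_digest(attest(image_bytes, metadata), tag)

photo, meta = b"\x89PNG...", b"2025-01-01T12:00Z"
tag = attest(photo, meta)
assert verify(photo, meta, tag)
assert not verify(photo + b"edited", meta, tag)  # any edit breaks the seal
```

A production scheme (e.g. along the lines of C2PA) would use public-key signatures so verifiers never hold the secret, plus certificate chains back to the manufacturer; the shape of the verification step is the same.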
Trials have rules for evidence. You can't just pull out some footage out of nowhere. Where did that come from? From what camera? What was the chain of custody on its footage? Etc.
Yes, that is a major worry of mine, too. CCTV evidence is worth nil now (it could be generated in whole or in part), and even eye-witness testimony can't be trusted (sure, a witness may think they saw the alleged perpetrator, but perhaps they just saw an AI-generated video/projection of someone).
Multiple data sources, considering the trustworthiness of the source of the information, and accountability for lying.
You might generate an AI video of me committing a crime, but the CCTV on the street didn't show it happening, and my phone's cell tower logs show I was at home. For the legal system I don't think this is going to be the biggest problem. It's going to be social media that is hit hardest, where a fake video can go viral far faster than fact checking can keep up.
If it means anything, I have a 1990 Almanac from an old encyclopedia that warns the exact same thing about digital photo manipulation. I don't think it really matters at this point
AI can also be used to fight propaganda, for instance BiasScanner makes you aware of potentially manipulative news:
https://biasscanner.org .
So that makes AI a "dual good", like a kitchen knife: you can cut your tomato or kill your neighbor with it, entirely up to the "user". Not all users are good, so we'll see an intense amplification of both good and bad.
It's more work to fight bullshit than it is to generate it, though. Saying "Use AI to fight it" is inherently a losing strategy when the other side also has an AI that is just as powerful.
And no amount of BS detecting tells you what is true. The challenge that I see a lot of people have is they really don't have a framework to incorporate new information into.
They're adrift, every new "fact" (whether true or false) blows them in a new direction. Often they get led in terrible directions from statements that are entirely true (but missing important context).
A lot of financial cons work that way, a long string of true statements that seem to lead to a particular conclusion. I know that if someone is offering me 20% APY there will usually be some risk or fee that offsets those market-beating gains (it may be a worthwhile risk or a well earned fee, but that number needs to trigger further investigation).
We need people to be equipped with that sort of framework in as many areas as possible, but we seem to be moving backwards in that area.
AI is certainly a dual good but I think the project is misguided at best.
I put in one of the driest descriptions of the Holocaust I could find and it got a very high score for bias, calling a factual description of a massacre emotional sensationalism because it inevitably contains a lot of loaded words.
It also doesn't differentiate between reporting, commentary, poetry, or anything else. It takes text and spits out a number, which is a very shallow analysis.
That pro forma response grows oh so very tiresome.
For the nth time: scale, easiness, and access, matter. AI puts propaganda abilities far beyond the reach of those men in the hands of many more people. Do you not understand the difference between one man with a revolver and an army with machine guns? They are not the same.
Nowhere in my comment am I “blaming the tools”. I’ll ask you to engage with the argument honestly instead of simply parroting what you already believe without reading.
Are you asking if the 10 seconds it takes AI to generate an image is more costly to the environment than a commissioned graphics artist using a laptop for 5-6 hours, or a painter who uses physical media sourced from all over the world?
A modern laptop is running almost fanless, like a 486 from the days of yore.
A single H200 pumps out 700W continuously in a data center, and you run thousands of them.
Also, don't forget the training and fine tuning runs required for the models.
Mass transportation / global logistics can be very efficient and cheap.
Before the pandemic, it was in some cases cheaper to import fresh tomatoes from half a world away than to grow them locally. A single container of painting supplies is nothing in the grand scheme of things, especially compared with what data centers are consuming and emitting.
This argument is so flawed that its conclusion almost loops back around to being correct again:
No, in terms of unit economics, I'm almost certain that the painting supplies have a bigger ecological/resource footprint than an LLM per icon generated, and I'm pretty sure the cost of shipping tomatoes does not decrease that footprint, even if it possibly dwarfs it.
But yes, due to the Jevons paradox, total resource use might well increase despite all that. I, for example, would never have commissioned a professional icon for my silly little iOS shortcuts on my home screen, so my silly-icon-related carbon footprint went from exactly zero to slightly above that.
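The unit-economics dispute in this sub-thread can be made explicit. All figures below (a 700W H200 generating for 10 seconds, versus a 30W laptop used for a 6-hour commission) are assumptions taken from the comments, and training, idling, and embodied costs are deliberately excluded, which is exactly what the rebuttals object to:

```python
# Back-of-envelope inference energy per image vs. a human commission.
# All inputs are assumptions, not measurements.
H200_W, GEN_SECONDS = 700, 10          # one GPU, one generation
LAPTOP_W, COMMISSION_HOURS = 30, 6     # artist's laptop for one commission

ai_wh = H200_W * GEN_SECONDS / 3600    # watt-hours per generated image
human_wh = LAPTOP_W * COMMISSION_HOURS # watt-hours per commissioned image
print(f"AI inference: {ai_wh:.1f} Wh, human commission: {human_wh:.0f} Wh")
```

On these (charitable) assumptions inference is roughly two orders of magnitude cheaper per image; the Jevons-paradox point is that volume, training, and idle fleets can still swamp that per-unit advantage.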
This is a plainly dishonest comparison. A single H200 does not need to run continuously for you to generate a dozen pictures. And then you immediately pivot to comparing the paint usage against "the grand scheme of things"- 700W is nothing in the grand scheme of things.
Many people think that when a piece of hardware is idle, its power consumption becomes irrelevant, and that's true for home appliances and personal computers.
However, the picture is pretty different for datacenter hardware.
Looking now, an idle V100 (I don't have an idle H200 at hand) uses 40 watts at minimum. That's more than the TDP of many modern consumer laptops and systems. A MacBook Air uses a 35W power supply to charge itself, and it charges pretty quickly even under relatively high load.
I want to clarify a few more things. A modern GPU server houses 4-8 high-end GPUs. This means 3 kW to 5 kW of maximum power consumption per server. A single rack runs around 75-100 kW, and you house hundreds of these racks. So we're talking about megawatts of power consumption. CERN's main power line on the Swiss side had a capacity of around 10 MW, to put things in perspective.
Let's assume an H200 uses 60W when idle. This means ~500W of wasted power per server, just for sitting around. If a complete rack is idle, that's about 10 kW. So you're wasting the power consumption of 3-5 houses by sitting there doing nothing.
This calculation only counts the GPUs. The rest of the server hardware adds around 40% to these numbers. Go figure. That's a lot of waste for cat pictures.
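Spelling out that idle-power arithmetic, using the commenter's own assumptions (60W idle per GPU, 8 GPUs per server, roughly 20 servers per rack, 40% non-GPU overhead; the servers-per-rack figure is my inferred assumption, not stated above):

```python
# Idle-power arithmetic for a GPU rack. All inputs are assumptions.
IDLE_W_PER_GPU = 60
GPUS_PER_SERVER = 8
SERVERS_PER_RACK = 20   # assumed; consistent with a 75-100 kW rack
OVERHEAD = 0.40         # non-GPU server hardware on top of GPU draw

server_idle_w = IDLE_W_PER_GPU * GPUS_PER_SERVER   # 480 W (~500 W above)
rack_idle_w = server_idle_w * SERVERS_PER_RACK     # 9600 W (~10 kW above)
with_overhead_w = rack_idle_w * (1 + OVERHEAD)     # ~13.4 kW per idle rack
print(server_idle_w, rack_idle_w, with_overhead_w)
```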
A: GPUs use a lot of power!
B: Not all of them are running 100% continuously, eh?,
A: They waste too much power when they're idle, too!
C: None of the H200s are sitting idle, you knob!
I mean, they are either wasting energy sitting idle or doing barely useful work. I don't know what to say anymore.
We'll cook ourselves, anyway. Why bother? Enjoy the sauna. ¯\_(ツ)_/¯
B is supposed to be me? I said the H200 doesn't need to be running continuously to generate a dozen images. If a million people generate a dozen images, it no longer makes sense to compare to the costs of a single artist for 6 hours. I really don't understand why this is hard and that makes this feel very uncharitable.
I'm not saying that this isn't "true idling", I'm saying that idling H200s simply don't exist, i.e., I disagree with B. Do you, A, even disagree?
> they are either wasting energy sitting idle or doing barely useful work
Now here's a true (inverse) scotsman, or more accurately, a moved goalpost: Work on things you don't deem valuable is basically the same thing as idling?
I'm very concerned about that too, but I don't think we'll avoid the sauna with fatalism or logically unsound appeals to morality about resource consumption.
these are unfair comparisons. it's not just a single laptop running all day, it's all the graphic designer laptops that get replaced. it's not a single container of painting supplies, it's all of them (which are toxic, by the way).
so if power were plentiful and environmentally clean, you'd be on board with it?
> these are unfair comparisons. it's not just a single laptop running all day, it's all the graphic designer laptops that get replaced. it's not a single container of painting supplies, it's all of them (which are toxic, by the way).
Please see my other comment about energy consumption and connect the dots with how open loop DLC systems are harmful to fresh water supplies (which is another comment of mine).
> so if power were plentiful and environmentally clean, you'd be on board with it?
This is a pretty loaded way to ask this. Let me put this straight. I'm not against AI. I'm against how this thing is built. Namely:
- Use of copyrighted and copylefted materials to train models and hiding under "fair use" to exploit people.
- Moreover, the belittling of people who create things with their blood, sweat, and tears, and the poor imitation of their art just for kicks or a quick buck.
- Playing fast and loose with the environment and energy consumption, skipping efficiency and sustainability in order to cut initial costs and time to market.
- Gaslighting users and the general community about how these things are built and how much of it is theater, again to make people use this and offload their thinking, atrophying their skills and making them dependent on these tools.
I work in HPC. I support AI workloads and projects, but the projects we tackle have real benefits, like ecosystem monitoring, long term climate science, water level warning and prediction systems, etc. which have real tangible benefits for the future of the humanity. Moreover, there are other projects trying to minimize environmental impact of computation which we're part of.
So it's pretty nuanced, and the AI iceberg goes well below OpenAI/Anthropic/Mistral trio.
> I support AI workloads and projects, but the projects we tackle have real benefits [...]
As opposed to the illusory/fake/immoral benefits of using LLMs for entertainment purposes (leaving aside all other applications for now)?
How do you feel about Hollywood, or even your local theater production? I bet the environmental unit economics don't look great on those either, yet I wouldn't be so quick to pass moral judgement.
Why not just focus on the environmental impact instead of moralizing about the utility? It seems hard to impossible to get consensus there, and the impact should be able to speak for itself if it's concerning.
Cheaper/faster tech increases overall consumption though. Without the friction of commissioning a graphics artist to design something, a user can generate thousands of images (and iterate on those images multiple times to achieve what they want), resulting in way more images overall.
I'm not really well versed on the environmental cost, more just (neutrally) pointing out that comparing a single 10s image to a 5-6 hour commission ignores the fact that the majority of these images probably would never have existed in the first place without AI.
Also, ignoring training when talking about the environmental costs is bad faith. Without training this image would not exist, and if nobody were generating images like these, the training would not happen. So we should really count the 10 seconds it took for inference plus the weeks or months of high-intensity compute it took to train the model.
I work with direct liquid cooled (DLC) systems. If the datacenter runs open DLC loops (most AI datacenters in the US in fact do), a lot of water is being wasted, 24/7/365.
A mid-tier Top500 system (think #250-#325) consumes about 0.75 MW of power. AI data centers consume orders of magnitude more. To cool that behemoth you need to pump tons of water per minute through the inner loop.
Outer loop might be slower, but it's a lot of heated water at the end of the day.
To prevent water wastage, you can go closed loop (for both inner and outer loops), but you can't escape the heat you generate and pump to the atmosphere.
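"Tons of water per minute" falls straight out of the steady-state heat-removal relation Q = m·c·ΔT. The inputs below (0.75 MW load, 10 K coolant temperature rise) are illustrative assumptions consistent with the figures above:

```python
# Required cooling-water flow for a given heat load, steady state.
# Inputs are illustrative assumptions, not measured values.
P_WATTS = 0.75e6   # heat load of a mid-tier Top500 system
C_WATER = 4186     # J/(kg*K), specific heat of water
DELTA_T = 10       # K, inlet/outlet temperature difference

kg_per_s = P_WATTS / (C_WATER * DELTA_T)   # mass flow to carry the heat away
tonnes_per_min = kg_per_s * 60 / 1000
print(f"{kg_per_s:.1f} kg/s ~= {tonnes_per_min:.2f} t/min")
```

About 18 kg/s, i.e. roughly a tonne of water per minute for a 0.75 MW system alone, which is why multi-megawatt AI datacenters strain local water supplies when the loop is open.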
So, the environmental cost is overblown, as in Chernobyl or fallout from a nuclear bomb is overblown.
The problem is you don't just use that water and give it back.
The water gets contaminated and heated, making it unsuitable for organisms to live in, or to be processed and used again.
In short, when you pump that water back into the river, you're poisoning and cooking the river at the same time, destroying the ecosystem.
To reiterate, I work in a closed loop DLC datacenter.
Pipes rust; you can't stop that, and the rust seeps into the water. That's inevitable. Moreover, if moss or other growth starts to take over your pipes, you may need to inject chemicals into your outer loop to clean them.
Inner loops already use biocides and other chemicals to keep them clean.
Look at how nuclear power plants fight organism contamination in their outer cooling loops, where they circulate lake or river water.
The environmental cost of Chernobyl is indeed often overblown. Nature in the exclusion zone is arguably better off now than before!
The cost to humans living in affected areas was massive and high profile, but it’s very questionable whether it was higher than that of an equivalent amount of coal-burning plants. Fortunately it's not a tradeoff we have to debate anymore, since there are now renewables with far fewer downsides and externalities.
Nuclear bombs (at least those being actually used) by design kill people, so I’m not sure what the externalities even are if the main utility is already to intentionally cause harm.
Depends on if you believe it will ever become cheaper. Either hardware, inspiring more efficient smaller models, or energy itself. The techno optimist believes that that is the inevitable and investable future. But on what horizon and will it get “zip drived” before then?
One is trying to save the future of the planet and of humanity with science; the other is mocking a man who devoted his whole life to his art, a man willing to spend years perfecting a three-second sequence, for kicks and money.
If you see no difference between them, I can't continue to discuss this with you, sorry.
To you. Fortunately nobody elected you chief resource allocator of the planet.
And I say that as somebody that also finds Ghibli knock-off avatars used by AI bros in incredibly bad taste (or, arguably an even worse crime against taste, a dated 2025 vibe).
Passing moral judgement about other people's value preferences seems pretty preposterous to me as well, so I was being a bit glib, but to be clear:
I don't want to live in a world in which people get to decide what others can and can't do with their share of resources (after properly accounting for all externalities, including pollution, the potential future value of non-renewable present resources etc. – this is where today's reality often and massively misses that ideal) based on their subjective moral criteria.
Not even just for ethical/moral reasons, but also for practical ones: It’s infinitely harder to get everybody to additionally agree on value of use than on fairness of allocation alone.
After thoroughly mixing these two quite distinct concerns, you'll also have a very hard time convincing me that your concerns for river pollution etc. (which I take very seriously as potentially unaccounted negative externalities, if they exist) are completely free from motivated reasoning about "immoral usage".
Because I'm not an artist and can't afford to pay one for whatever business I have? This idea that only experts are allowed to do things is just crazy to me. A band poster doesn't have to be a labor of love artisanal thing. Were you mad when people made band posters with MS word instead of hiring a fucking typesetter? I just don't get it.
I dunno, I have some band posters that are pretty cool pieces of art that obviously had a lot of thought put into them (pre-AI era stuff). I don't think I'd hang up an AI generated band poster, even if it was cool; I'd feel weird and tacky about it.
I was hosting a Karaoke event in my town and really went out of my way to ensure my promotional poster looked nothing like AI. I really really really did not want my townfolks thinking I would use AI to design a poster.
My design rules were: No gradients; no purple; prefer muted colors; plenty of sharp corners and overlapping shapes; Use the Boba Milky font face;
I don't know why you're downvoted, I think that's a reasonable use of AI and it looks pretty good.
Edit: I think I misread what you were saying, but I do think it's a nice poster! I get that design is going to have to avoid doing things that AI does, which is kind of unfortunate, because AI is likely trained on a lot of things that are generally good ideas.
- The AI has a hard time making geometric shapes regular. You can see the stars have different-sized arms at different intervals in the AI version. It would take a human artist longer to make it look this bad.
- The 5-point stars are still a little rounded in the AI version.
- There is way too much text in the AI version (a human designer might make that mistake, but it is very typical of AI).
- The orange 10-point star on the right with the text “you are the star” still has a gradient (AI really can’t help itself).
- The borders around the title text “Karaoke night!” bleed into the borders of the orange (gradient) 10-point star on the right, but only halfway. This is very sloppy; a human designer would fix that.
- The font face is not Milky Boba but some sort of AI hybrid of Milky Boba, Boba Milky, and Comic Sans.
- And finally, the QR code has obvious AI artifacts in it.
The point I’m making is that it is very hard to prompt your way out of making a poster look like AI, especially when the design intentionally tries not to look like AI.
I hear what you’re saying and at the same time I don’t agree with some of your criticisms. The gradient, yep, it slipped one in. The imperfect stars? I have seen artists do this forever, presumably intentional flair. The few real “glitches” would be trivial to fix in Photoshop.
But they are very different certainly. ChatGPT generated a poster with a very sleek, “produced” style that apes corporate posters whereas you went with a much more personal touch. You are correct that yours does not look like typical AI.
My point is certainly not that the AI poster is better, only that it’s capable of producing surprising results. With minimal guidance it can also generate different styles: https://imgur.com/a/zXfOZaf
I think the trend to intentionally make stuff look “non-AI” is doomed to fail as AI gets better and better. A year or two ago the poster would have been full of nonsense letters.
> And finally, the QR code has obvious AI artifacts in them.
I wonder if this is intentional, to prevent AI from regurgitating someone’s real QR codes.
ETA: Actually, I wonder how much of the “flair” on human-drawn stars is to avoid looking like they are drag-and-drop from a program like Word. Ironic if we’ve circled back around to stars that look perfect to avoid looking like a different computer generated star.
> I think the trend to intentionally make stuff look “non-AI” is doomed to fail as AI gets better and better.
What’s the mechanism that makes an AI ‘better’ at looking non-AI? Training on non-ai trend images? It’s not following prompts more closely. Even if that image had no gradients or pointier shapes, it still doesn’t look like it was made by an individual.
To your counterpoints, notice that you are apologizing for the AI by finding humans that may have done something, sometime, that the AI just did. Of course! It’s trained on their art. To be non-AI, art needs to counter all averages and trends that the models are trained on.
> What’s the mechanism that makes an AI ‘better’ at looking non-AI?
I don’t know. Better training data? More training data? The difference over the past year or two is stark so something is improving it.
> Even if that image had no gradients or pointier shapes, it still doesn’t look like it was made by an individual.
The fact that humans are actively trying to make art that does not look like AI makes it clear that AI is not so obvious as many would like to pretend. If it were obvious, no one would need to try to avoid their art looking like AI.
> To your counterpoints, notice that you are apologizing for the AI by finding humans that may have done something, sometime, that the AI just did. Of course! It’s trained on their art.
Obviously.
> To be non-AI, art needs to counter all averages and trends that the models are trained on.
So in order to not look like AI, art just has to be so unique that it’s unlike any training data. That’s a high bar. Tough time to be an artist.
My point is not that the AI version looks bad (although it does); it is that I hate AI, and so do many people around me. And I hate AI so much, and I know so many people around me hate it as much, that I am consciously altering my designs to be as far away from AI as I can. This is the creative-design equivalent of moving from Seattle to Florida after a divorce.
About the stars: I know designers paint imperfect stars. I even did that in my design; in particular I stretched one and rotated it slightly. A more ambitious designer might go further and drag a couple of vertices around to exaggerate them relative to the others. But usually there is some balance in their decisions. AI, however, just puts the vertices wherever, and it is ugly and unbalanced. A regular geometric shape with a couple of oddities is a normal design choice, but a geometric shape which is all oddities is a lot of work for an ugly design. Humans tend not to do that.
> I am consciously altering my designs such to be as far away from AI as I can
I don’t think this is a productive choice, but it’s certainly yours to make.
> but a geometric shape which is all oddities is a lot of work for an ugly design. Humans tend not to do that
I find this such an odd thing to say. It’s way easier to draw a wonky star than a symmetrical one. Unless “drawing” here means using a mouse to drag and drop a star that a program draws for you.
Vintage illustrations are full of nonsymmetrical shapes. The classic Batman “POW” and similar were hand drawn and rarely close to symmetrical.
I draw mine in Inkscape (because I like open source more than my sanity), and Inkscape has special tools to draw regular geometric shapes. You don't need to use those tools; you can use the freehand pen, or the Bezier curve tool, or even hand-code the <path d="M43,32l5.34-2.43l3.54-0.53" />, etc. But using these other tools is suboptimal compared to the regular geometric tool.
Apart from me, my partner also does graphic design, and unlike me she values her sanity more than open source, so she uses Illustrator for her designs. In Adobe's walled-garden world of proprietary software it is still the same story: you generally use the specific tools to get regular shapes (or patterns) and then alter them after they are drawn. You don't draw them from scratch. If you are familiar with modular analog synthesizers, this is starting with a square wave and then subtracting to modulate the signal into a more natural-sounding form.
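As a rough sketch of what hand-coding a regular star path involves (the `star_path` helper is hypothetical, just for illustration): place the alternating outer and inner vertices of a five-pointed star around a circle and emit an SVG `path` `d` string.

```python
import math

def star_path(cx, cy, r_outer, r_inner, points=5):
    """Build an SVG path 'd' string for a regular star centered at (cx, cy)."""
    coords = []
    for i in range(points * 2):
        # Alternate between outer and inner radius; start at the top vertex.
        r = r_outer if i % 2 == 0 else r_inner
        angle = math.pi / points * i - math.pi / 2
        coords.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))
    # M = move to the first vertex, L = line to each subsequent one, Z = close.
    return "M" + " L".join(f"{x:.2f},{y:.2f}" for x, y in coords) + " Z"

print(f'<path d="{star_path(50, 50, 40, 16)}" fill="gold"/>')
```

Dragging a vertex or two of a shape like this off its regular position is the "couple of oddities" move; scattering all ten is the all-oddities case.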
> can't afford to pay one for whatever business I have
At small scales what "art" does your business need? If you can't afford to hire an artist (which is completely fine, I couldn't for my business!) do you really need the art or are you trying to make your "brand" look more polished than it actually is? Leverage your small scale while you can because there isn't as much of an expectation for polish.
And no, a band poster doesn't have to be a labor of love. But it also doesn't have to be some big showy art either. If I saw a small band with a clearly AI generated poster it would make me question the sources for their music as well.
I think you're misunderstanding - most people's beef with AI art isn't that it "isn't made by experts", it's that
1) it's made from copyrighted works, and the original authors receive no credit;
2) it is (typically) low-effort;
3) there are numerous negative environmental effects of the AI industry in general;
4) there are numerous negative social effects of AI in general, and more specifically AI generated imagery is used a lot for spreading misinformation;
5) there are numerous negative economic effects of AI, and specifically with art, it means real human artists are being replaced by AI slop, which is of significantly lower quality than the equivalent human output. Also, instead of supporting multiple different artists, you're siphoning your money to a few billion dollar companies (this is terrible for the economy)
As a side note, if you have a business which truly cannot afford to pay any artists, there are a lot of cheaper, (sometimes free!) pre-paid art bundles that are much less morally dubious than AI. Plus, then you're not siphoning all of your cash to tech oligarchs.
I agree, and who's to say your life experience isn't as valid as that of someone with fewer years but more time at just the traditional tools? I'd think either extreme could produce real art if the tools moat was reduced with AI.
I actually love MS word posters. It's a million times more authentic and enjoyable than a slop generation. If a band put up an AI poster I'd assume they lack any kind of taste which is the whole reason I'd want to listen to a band anyway.
I know this is controversial in tech spaces. But most people, particularly those in art spaces like music actually appreciate creativity, taste, effort, and personal connection. Not just ruthless efficiency creating a poster for the lowest cost and fastest time possible.
How about going without? I can’t afford an artist, either, so I don’t have art. Don’t foist slop on people because you are trying to be something that you aren’t.
I'm not saying it's worthless for yourself, it's worthless to me as a viewer. AI content is great for your own usage, but there is no point posting and distributing AI generation.
I could have generated my own content, so just send the prompt rather than the output to save everyone time.
This doesn't make sense, if I want to see a lego-cat slopimage I can just prompt a model myself (and have it be of my own cat). There's no reason for you to be involved in any part of that process, because the point of this stuff is that you are not doing anything.
The claim is that people don't / shouldn't want to see something if humans can't be bothered to make it. I provided a counter example. So the claim is nonsense.
And when the distilled knowledge/product is the result of multiple prompts, revisions, and reiterations? Shall we send all 30+ of those as well so as to reproduce each step along the way?
Exactly how I feel. There is already more art, movies, music, books, video games and more made by human beings than I can experience in my lifetime. Why should I waste any time on content generated by the word guessing machine?
The issue is that the signalling makes sense when human generated work is better than AI generated. Soon AI generated work will be better across the board with the rare exception of stuff the top X% of humans put a lot of bespoke highly personalized effort into. Preferring human work will be luxury status-signalling just like it is for clothing, food, etc.
I'm probably in a weird subgroup that isn't representative of the general public, but I've found myself preferring "rough" art/logos/images/etc, basically because it signals a human put time into it. Or maybe not preferring, but at least noticing it more than the generally highly refined/polished AI artwork that I've been seeing.
There’s no reason to think people broadly want “better” writing, images, whatever. Look at the indie game scene, it’s been booming for years despite simpler graphics, lower fidelity assets, etc. Same for retro music, slam poetry, local coffee shops, ugly farmers market produce, etc.
There is a mass, bland appeal to “better” things but it’s not ubiquitously desired and there will always be people looking outside of that purely because “better” is entirely subjective and means nothing at all.
I think "better" is doing a lot of heavy lifting in this argument. Better how?
Is an AI generated photo of your app/site going to be more accurate than a screenshot? Or is an AI generated image of your product going to convey the quality of it more than a photo would?
I think Sora also showed that the novelty of generating just "content" is pretty fleeting.
I would be interested to see if any of the next round of ChatGPT advertisements use AI generated images. Because if not, they don’t even believe in their own product.
The issue being, it's not an expression of anything. Merely like a random sensation, maybe some readable intent, but generic in execution, which isn't about anything even corporate art should be about. Are we going to give up on art, altogether?
Edit: One of the possible outcomes may be living in a world like in "Them" with glasses on. Since no expression has any meaning anymore, the message is just there being a signal of some kind. (Generic "BUY" + associated brand name in small print, etc.)
Can't the expression come from the person prompting the AI and sometimes taking hours inpainting or tweaking the prompt to try get the exact image / expression they had in their mind? A good use I've found is to be able to make scenes from a dream you had into an image. If that's not an expression of something then I'm not sure anything is.
Notably, this process of struggle is meant to go away, to make room for instant satisfaction. This is really about some kind of expression consumerism. (And what will be lost along the way is meaning.)
I always find this argument to ring hollow. Maybe it's because I've been through it with too many technologies already. Digital photography took out the art of film photography. CGI took out the wonder of practical effects. Digital art takes out the important brush strokes of someone actually painting. The real answer always is the mediums can coexist and each will be good for expression in their own way.
I'm not sure you immediately lose meaning if someone can make a highly personalized version of something easily. The % of completely meaningless video after YouTube and tiktok came about has skyrocketed. The amount of good stuff to watch has gone up as well though.
Only novel art is interesting. AI can't really do novel. It's a prediction algorithm; it imitates. You can add noise, but that mostly just makes it worse. It can be used to facilitate original stuff though.
But so many people want to make art, and it's so cheap to distribute it, that art is already commoditized. If people prefer human-created art, satisfying that preference is practically free.
AI can be novel, there is nothing in the transformer architecture which prohibits novelty, it's just that structurally it much prefers pattern-matching.
But the idea of novelty is a misnomer I think. Any random number generator can arbitrarily create a "novel" output that a human has never seen before. The issue is whether something is both novel and useful, which is hard for even humans to do consistently.
Anthropic recently changed their take-home test specifically to be more “out-of-distribution” and therefore more resistant to AI so they can assess humans.
I’m so tired of “there’s nothing preventing”, and “humans do that too”. Modern AI is just not there. It’s not like humans and has difficulties with adapting to novelty.
Whether transformers can overcome that remains to be seen, but it is not a guarantee. We’ve been dealing with these same issues for decades and AI still struggles with them.
> Preferring human work will be luxury status-signalling just like it is for clothing, food, etc.
What? Those items are luxuries when made by humans because they are physical goods where every single item comes with a production and distribution cost.
I just recently used image generation to design my balcony.
It was a great way to see design ideas imagined in place and decide what to do.
There are many cases people would hire an artist to illustrate an idea or early prototype. AI generated images make that something you can do by yourself or 10x faster than a few years ago.
Notwithstanding a few code violations, it generated some good ideas we were then able to tweak. The main thing was we had no idea of what we wanted to do, but seeing a lot of possibilities overlaid over the existing non-garden got us going. We were then able to extend the theme to other parts of the yard.
100%. A picture is worth a thousand words only when it conveys something. I love to see the pictures from my family even when they are taken with no care to quality or composition but I would look at someone else’s (as in gallery/exhibitions) only when they are stunning and captured beautifully. The medium is only a channel to communicate.
Also, this can’t be real. How many publications did they train this stuff on, and why is there no acknowledgment, even if just to say: we partnered with xyz manga house to make our model smarter at manga? Like, what’s wrong with this company?
We need to flip the script. AI is trying to do marketing: adding “illegal usage will lead to X” is a gateway to spark curiosity. There is a saying that censoring games for young adults makes sure they will buy them like crazy by circumventing the restrictions, because danger is cool.
There is nothing that cannot harm. Knives, cars, alcohol, drugs. A society needs to balance risks and benefits. Word can be used to do harm, email, anything - it depends on intention and its type.
I see your point but reconsider: we will and need to see. Time will tell and this is simply economics: useful? Yes, no.
I started being totally indifferent after thinking about my spending habits to check for unnecessary stuff after watching world championships for niche sports. For some this is a calling for others waste. It is a numbers game then.
The technically (in both senses) astonishing and amazing output is not far off from some of the qualities of real advertising: Staged, attention grabbing, artificially created, superficially demanded, commercially attractive qualities. These align, and lots of similarities in the functions and outcomes of these two spheres come to mind.
>and even if we cannot tell if an image is AI-generated, we can know if companies are using AI to generate images in general, so the appealing is decreasing
Is that true? Don't think I'd get tired of images that are as good as human made ones just because I know/suspect there may have been AI involved
I think there's real value to be had in using this for diagrams.
Visual explanations are useful, but most people don't have the talent and/or the time to produce them.
This new model (and Nano Banana Pro before it) has tipped across the quality boundary where it actually can produce a visual explanation that moves beyond space-filling slop and helps people understand a concept.
I've never used an AI-generated image in a presentation or document before, but I'm teetering on the edge of considering it now provided it genuinely elevates the material and helps explain a concept that otherwise wouldn't be clear.
Are there any models that are specifically trained to produce diagrams as SVG? I'd much prefer that to diffusion-based raster image generation models for a few reasons:
- The usual advantages of vector graphics: resolution-independence, zoom without jagged edges, etc.
- As a consequence of the above, vector graphics (particularly SVG) can more easily be converted to useful tactile graphics for blind people.
This is the key point. In my view it's just like anything else, if AI can help humans create better work, it's a good thing.
I think what we'll find is that visual design is no longer as much of a moat for expressing concepts, branding, etc. In a way, AI-generated design opens the door for more competition on merits, not just those who can afford the top tier design firm.
I tend to share your same view. But is there really a line like you describe? Maybe AI just needs to get a few iterations better and we'll all love what it generates. And how's it really any different than any Photoshop computer output from the past?
I used to have an assistant make little index-card sized agendas for gettogethers when folks were in town or I was organising a holiday or offsite. They used to be physical; now it's a cute thing I can text around so everyone knows when they should be up by (and by when, if they've slept in, they can go back to bed). AI has been good at making these. They don't need to be works of art, just cute and silly and maybe embedded with an inside joke.
I'm not seeing how it takes more than 5 minutes to type up an itinerary. If you want to make it cute and silly, just change up the font and color and add some clip art.
If this is the best use case that exists for AI image generation, I'm only further convinced the tech is at best largely useless.
> not seeing how it takes more than 5 minutes to type up an itinerary
Because I’ll then spend hours playing with the typography (because it’s fun) and making it look like whatever design style I’ve most recently read about (again, because it’s fun) and then fighting Word or Latex because I don’t actually know what I’m doing (less fun). Outsourcing it is the right move, particularly if someone else is handling requests for schedules to be adjusted. An AI handles that outsourcing quicker for low-value (but frequent) tasks.
> If this is the best use case that exists for AI image generation
I’ve also had good luck sketching a map or diagram and then having the AI turn it into something that looks clean.
Look, 99% of my use cases are e.g. making my cat gnaw on the Tetons or making a concert of lobsters watching Lady Gaga singing “I do it for the claws” or whatever so I can send two friends something stupid at 1AM. But there does appear to be a veneer of productivity there, and worst case it makes the world look a bit nicer.
I’m not giving my friends AI maps and diagrams. And yes, they don’t look great. But they work. If I want to communicate something spatial, I can spend an hour in R or five minutes in Claude. The point is to communicate that information, and for a quick task, AI means the other person gets a map versus block of text they have to reason through.
It's good that my friends don't make a coffee date feel like a board meeting (with an agenda shared by post 14 working days ahead of the meeting, form for proxy voting attached).
I don't care how many times you write "cute," having my vacation time programmed with that level of granularity and imposed obligation sounds like the definition of "dystopian."
If I got one of your cute schedule cards while visiting you, I'd tear it up, check into a cheap motel, and spend the rest of my vacation actually enjoying myself.
Edit: I'm not an outlier here. There have even been sitcom episodes about overbearing hosts over-programming their guests' visits, going back at least to the Brady Bunch.
> If I got one of your cute schedule cards while visiting you, I'd tear it up, check into a cheap motel, and spend the rest of my vacation actually enjoying myself
Okay. I'd be confused why you didn't voice up while we were planning everything as a group, but those people absolutely exist. (Unless it's someone's, read: a best friend or my partner's, birthday. Then I'm a dictator and nobody gets a choice over or preview of anything.)
I like to have a group activity planned on most days. If we're going to drive to get an afternoon hike in before a dinner reservation (and if I have 6+ people in town, I need a dinner reservation, because no, I'm not cooking every single evening), or if I've paid for a snowmobile tour or a friend is bringing out their telescope for stargazing, there are hard no-later-than departure times to either not miss the activity or be respectful of others' time.
My family used to resolve that by constantly reminding everyone the day before and morning of, followed by constantly shouting at each other in the hours and minutes preceding and–inevitably–through that deadline. I prefer the way I've found. If someone wants to fuck off from an activity, myself included, that's also perfectly fine.
(I also grew up in a family that overplanned vacations. And I've since recovered from the rebound instinct, which involves not planning anything and leaving everything to serendipity. It works gorgeously, sometimes. But a lot of other times I wonder why I didn't bother googling the cool festival one town over before hand, or regretted sleeping in through a parade.)
> There have even been sitcom episodes about overbearing hosts over-programming their guests' visits
Sure. And different groups have different strokes. When it comes to my friends and I, generally speaking, a scheduled activity every other day with dinners planned in advance (they all get hangry, every single fucking one of them) works best.
I'm working on an edutech game. Before, I would've had much less of a product, because I don't have the budget to hire an artist, and it would've been much less interactive. Because of this I'm able to build a much more engaging experience, so that's one thing. For what it's worth.
While I agree with you, the Hacker News audience is not in the middle of the bell curve.
I get this sounds elitist, but a tremendous percentage of the population is happily and eagerly engaging with fake religious images, funny AI videos, horrible AI memes, etc. Trying to mention that this video of a puppy is completely AI generated results in vicious defense and mansplaining of why the video is totally real (I love it when the video has, e.g., Sora watermarks... this does not stop the defenders).
I agree with you that human connection and artist intent is what I'm looking for in art, music, video games, etc... But gawd, lowest common denominator is and always has been SO much lower than we want to admit to ourselves.
Very few people want thoughtful analysis that contradicts their world view, very few people care about privacy or rights or future or using the right tool, very few people are interested in moral frameworks or ethical philosophy, and very few people care about real and verifiable human connection in their "content" :-/
HN is absolutely not more critical of AI output than the norm.
It's been true for various technologies that HN (and tech audiences in general) have a more nuanced view, but AI flips the script on that entirely. It's the tech world who are amazed by this, producing and being delighted by endless blogposts and 7-second concept trailers.
I think HN probably uses GenAI more than average population.
But I think HN consumes less GenAI content than average population.
Look at Facebook, Instagram, Youtube, TikTok, etc. All I see is my non-techie friends being amazed and mesmerized by cute animals, creepy animals, political events, jokes, comedy, outrage, events, speeches - that never ever happened. As if we don't have actual real puppies that are cute, my acquaintances and family are oooing and awwwing at fake howling huskies and fake animals being jump-scared by fake surprises.
HN may be more amazed than the average person by the potential of AI output to improve the world. But hustlers are laughing their way to the bank as they actually use AI to make a ridiculous, and I do mean ridiculous, amount of "content" for cheap, that is, absolutely is, being consumed at a prodigious rate with no sign of stopping. This is not 7-second trailers and concepts for some future year - this is mega-years of actual content being liked, shared, engaged with and consumed, right now. This is what OP is hoping the tides will turn against, and this is what I see no sign of rejection of in my non-techie/non-geeky circles :(
You're on a site where the commenters read AI-generated articles about how they can generate new images to include in their generated websites that they themselves generate more articles about.
Sure, the weird cat-people adverts aren't aimed at HN's commentariat, but every 'democratise art and build that game you've dreamt of' pitch is. Every breathless paean to AI assistants/companions/partners is targeted at the users here.
Usage is a form of consumption; thinking of yourself as a creator while you consume doesn't mean you consume less.
Non-tech users are being fed fake images when they browse idly. Tech users are restructuring their entire lives around these tools.
I recently shoulder-surfed a family member scrolling away on their social media feed, and every single image was obvious AI slop. But it didn't matter. She loved every single one, watched videos all the way through, liked and commented on them... just total zombie-consumption mode and it was all 100% AI generated. I've tried in the past pointing out that it's all AI generated and nothing is real, and they simply don't care. People are just pac-man gobbling up "content". It's pretty sad/scary.
The connection with the artist, directly, or across space and time, is a critical part of any artwork. It is one human attempting to communicate some emotional experience to another human.
When I watch a Lynch film I feel some connection to the man David Lynch. When I see an AI artwork, there is nothing to connect with; no emotional experience is being communicated; it is just empty. Its highest aspiration is elevator music: something vaguely stimulating in the background.
I don't agree. If a poem is moving, it's moving. It doesn't matter who wrote it.
I understand these are fundamental questions about aesthetics that people differ over. But that's how it works for me. However, ultimately, I think people will realize that I'm right around the time that AI does start generating good art.
Provenance is part of the work. If a roomful of monkeys banged out something that looked like anything, I'd absolutely hang it on my wall. I would not say the same for 99% of AI generated art.
Seems good enough to generate 2D sprites. If that means a wave of pixel-art games I count it as a net win.
I don't think gamers hate AI; it is just a vocal minority, imo. What most people dislike is sloppy work, as they should, but that can happen with or without AI. The industry has been using AI for textures, voices and more for over a decade.
It’s really not. That's actually a pet peeve of mine as someone who used to spend a lot of time messing with pixel art in Aseprite.
Nobody takes the time to understand that the style of pixel art is not the same thing as actual pixel art. So you end up with these high-definition, high-resolution images that people try to pass off as pixel art, but if you zoom in even a tiny bit, you see all this terrible fringing and fraying.
That happens because the palette is way outside the bounds of what pixel art should use; proper pixel art is generally limited to maybe 8 to 32 colors.
There are plenty of ways to post-process generative images to make them look more like real pixel art (square grid alignment, palette reduction, etc.), but it does require a bit more manual finesse [1], and unfortunately most people just can’t be bothered.
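As a minimal sketch of that post-processing idea using Pillow (the `pixelate` helper and its parameters are illustrative assumptions, not a standard recipe): nearest-neighbor downscaling snaps detail onto a square grid, median-cut quantization collapses the palette, and a nearest-neighbor upscale keeps the cells crisp.

```python
from PIL import Image

def pixelate(img, grid=8, colors=16):
    """Downscale to a coarse pixel grid, reduce the palette, then scale back up."""
    w, h = img.size
    # Nearest-neighbor downscale snaps detail onto a square pixel grid.
    small = img.resize((max(1, w // grid), max(1, h // grid)), Image.NEAREST)
    # Median-cut quantization limits the image to a small, pixel-art-like palette.
    small = small.convert("RGB").quantize(colors=colors)
    # Scale back up without smoothing so each cell stays a crisp square.
    return small.resize((w, h), Image.NEAREST)

# Demo on a synthetic gradient; a real generated image would be opened instead.
demo = Image.new("RGB", (64, 64))
demo.putdata([(x * 4, y * 4, 128) for y in range(64) for x in range(64)])
art = pixelate(demo)
print(art.size, len(art.convert("RGB").getcolors(maxcolors=4096)))
```

This fixes the off-grid fringing and the oversized palette, though real pixel art also involves deliberate cluster shading and anti-aliasing choices that no automatic pass will make for you.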
Don't you think it's a huge stretch to compare those to modern generative AI in this context? Those don't raise any of the questions that make current usage questionable.
Are you kidding? I think I see more vitriol for AI in gaming communities than anywhere else. To the point where steam now requires you to disclose its usage
The Human Renaissance is something I've been thinking of too and I hope it comes to pass. Of course, I feel like societally, things are gonna get worse for a lot of folks. You already see it in entire towns losing water or their water becoming polluted.
You'd think these kickbacks leaders of these towns are getting for allowing data centers to be built would go towards improving infrastructure but hah, that's unrealistic.
The article tries to play sleight of hand with the specific instance that they cite but it seems that the loss of water is alleged to be caused by sediment from construction rather than water use.
It's not great that it happened and it is something local government should take action on, but it is also something that could have been caused by any form of industrial construction. I suspect there are already laws in place that cover this. If they are not being enforced that's another issue entirely.
Data center construction exposing weaknesses in local infrastructure is a double-edged sword; you wanna know if things need upgrading but you don't wanna be negatively affected by it.
Maybe there should be some clause in these contracts that mandate tech companies foot the bill for local infrastructure improvements.
>Like, in terms of art, it's discarded (art is about humans)
I dunno how long this is going to hold up. In 50 years, when OpenAI has long become a memory, post-bubble burst, and a half-century of bitrot has claimed much of what was generated in this era, how valuable do you think an AI image file from 2023 - with provenance - might be, as an emblem and artifact of our current cultural moment, of those first few years when a human could tell a computer, "Hey, make this," and it did? And many of the early tools are gone; you can't use them anymore.
Consider: there will never be another DallE-2 image generation. Ever.
>In general, I think people are starting to realize that things generated without effort are not worth spending time with
Agreed mostly, BUT
I'm building tools for myself. The end goal isn't the intermediate tool, they're enabling other things. I have a suspicion that I could sell the tools, I don't particularly want to. There's a gap between "does everything I want it to" and "polished enough to justify sale", and that gap doesn't excite me.
They're definitely not generated without effort... but they are generated with 1% of the human effort they would require.
I feel very much empowered by AI to do the things I've always wanted to do. (when I mention this there's always someone who comes out effectively calling me delusional for being satisfied with something built with LLMs)
I completely disagree; this replaces art as a job. Why does human art need monetary feedback to be shared? If people require a paycheck to make art, then it was never anything different from what AI-generated images are.
As for advertising being depressing - it's a little late to get up on the anti-ads high horse after two decades of ad-based technology dominating everything. Go outside and see all those bright, shiny, glittery lights; those aren't society-created images to embolden the spirit and dazzle the senses, those are ads.
North Korea looks weird and depressing because they don't have ads. Welcome to the west.
I don't know how this benefits humanity. In what way was ChatGPT Images 1.0 not already good enough? Perhaps some new knowledge was created in the process?
My hint was at an issue in the American educational system.
I'm sure if they could, they would have shown all Americans, especially given how important the state connection is for them to keep up their spending.
That means they struggle to find American technical presenters.
Scrolling through those images, it just feels like intellectual theft on a massive scale. The only place you're going to get genuinely new ideas is from humans. Whether those humans use AI or not I don't care, but I don't find the repetitive slop of AI copying the creative output of humans interesting. Call me a curmudgeon. I guess humans also create a lot of derivative slop even without AI assistance. If this somehow leads to nicer-looking user interfaces and architecture, maybe that is a good thing. There are a lot of ugly websites, buildings and products.
Do you think those working at ChatGPT have ever wondered how they are contributing to dismantling democracy and ensuring nothing is true by now? The ultimate technological postmodernism.
They’re too busy counting cash. Most of them are what? 30 something to 50? By the time democracy is dismantled they’ll be living in their protected mansions.
Can we talk about how jarring the announcement video is?
AI generated voice over, likely AI generated script (You see, this model isn't just generating images, it's thinking!). From what it looks like only the editing has some human touch to it?
It does this Apple style announcement which everyone is doing, but through the use of AI, at least for me, it falls right into the uncanny valley.
I find the video to be very annoying. Am I supposed to freeze frame 4x per second to be able to see whether the images are actually good? I've never before felt stressed watching a launch video.
And here I was proud of myself, having taught my mom and her friends how to discern real from fakes they get on WhatsApp groups. Another even more powerful tool for scammers. I'm taking a break.
IMO you're fighting the wrong battle: there'll always be a new model.
But the broader concept of fake news and the manufactured nature of media and rhetoric is much more relevant - e.g. whether or not something's AI is almost immaterial to the fact that any filmed segment does not have to be real or attributed to the correct context.
It's an old internet classic just to grab an image and put a different caption on it, relying on the fact that no one can discern context or has time to fact check.
I wake up everyday, read the tech news, and usually see some step change in AI or whatever. It's wild to think I'm living through such a massive transformation in my lifetime. The future of tech is going to be so different from when I was born (1980), I guess this is how people born in 1900 felt when they got to see man land on the moon?
> Wow, the difference between AI and non-AI images collapses. I hate the future where I won't be able to tell the difference.
Image generation is now pretty much "solved". Video will be next. Perhaps things will turn out the same as chess: in that even though chess was "solved" by IBM's Deep Blue, we still value humans playing chess. We value "hand made" items (clothes, furniture) over the factory made stuff. We appreciate & value human effort more than machines. Do you prefer a hand-written birthday card or an email?
"Solved" seems a tad overstated if you scroll up to Simonw's Where's Waldo test with deformed faces plus a confabulated target when prompted for an edit to highlight the hidden character with an arrow.
It's "solved" in that we have a way forward to reduce the errors down to 0.00001% (a number I just made up). Throwing more compute/time/money at these problems seems to reduce that error number.
As someone born in 1975 I always felt until the last couple of years that I had been stuck in a long period of stagnation compared to an earlier generation. My grandmother who was born in the 1910s got to witness adoption of electricity, mass transit, radio, television, telephony, jet flights and even space exploration before I was born.
Feels like now is a bit of a catchup after pretty tepid period that was most of my life.
Chess exists solely for the sake of the humans playing it. Even if machines solved chess, people would rather play chess against a person than a machine because it is a social activity in a way. It's like playing tennis versus a person compared to tennis against a wall.
Photographs, videos, and digital media in general, in contrast, are used for much, much more than just socializing.
It's great. Also doesn't seem to have any "slop" standard look, the images it produces are quite diverse.
I would imagine this will hit illustrators / graphics designers / similar people very hard, now that anyone can just generate professional looking graphical content for pennies on the dollar.
Storefronts like Steam require disclosing use of AI assets for art. In most indie dev spaces, devs are scolded for using AI art in their games. I wonder if this perspective will change in a few years.
ChatGPT image generation is and has been horrific for the simple reason that it rejects too many requests. This hasn't changed with the new model. There are too many legal non-adult requests that are rejected, not only for edits, but also for original image generation. I'd rather pay to use something that actually works.
If I may address this with both skepticism and curiosity: why? I think I speak for everyone when I say I would pay to go back to Facebook 2018. No algorithm, no AI.
Each day when my AI girlfriend wakes me up and shows me the latest news, I feel: This is it! We are living in a revolution!
Never before in history did humanity have the possibility of seeing a picture of a pack of wolves! The dearth of photographs has finally been addressed!
I told my AI girlfriend that I will save money to have access to this new technology. She suggested a circular scheme where OpenAI will pay me $10,000 per year to have access to this rare resource of the 21st-century daguerreotype.
I hope they will consider releasing DALL-E 2 publicly, now that there has been so much progress since it was unveiled. It had a really nice vibe to it, so worth preserving.
I am hopeful that OpenAI will potentially offer clarity on their loss-leading subscription model. I'd prefer to know the real cost of a token from OpenAI as opposed to praying the venture-funded tokens will always be this cheap.
Sam Altman, in his meeting with Tim Cook two and a half years ago: "Give me money. I think it'll take $150 billion." Tim Cook: "Well, here's what we're going to do; this is what I think it's worth…"
Later Google tried the same thing: "Apple, we will give you a $1 billion a year refund." What's changed in two and a half years?
My problem with all of this is the terrible education everyone has: people cannot distinguish images from art, nor art from communication. If they could, they would realize that the points this entire debate hinges on are a manipulation to create people who will not help themselves with the latest technologies. But explaining it makes people angry, because they either think I'm trying to manipulate them, or they fall into despair when they realize the magnitude of this crime.
Running that same prompt through gpt-2-image high gave an...interesting contrast: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxaerni...
It did more inventive styles for the images that appear to be original, but:
- The style logic is applied by row rather than by each Pokémon's actual number, so the styles are wrong
- Several of the Pokémon are flat-out wrong
- The number font is wrong
- The bottom row isn't square for some reason
Odd results.
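For the deterministic half of this prompt (which Pokédex numbers belong in each cell, and which style each should use), a short script can serve as ground truth when grading these outputs. A minimal sketch, with illustrative function names; the style mapping follows the digit-count rules from the original prompt:

```python
def first_n_primes(n):
    """Return the first n primes by trial division against earlier primes."""
    primes = []
    candidate = 2
    while len(primes) < n:
        # candidate % p == 0 is falsy, so all(...) is True only for primes
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def style_for(num):
    """Map a Pokédex number to the prompt's style rule by digit count."""
    return {1: "8-bit", 2: "charcoal", 3: "Ukiyo-e"}[len(str(num))]

primes = first_n_primes(64)
# Row-major 8x8 grid of (number, style) pairs, matching the prompt's layout
grid = [[(n, style_for(n)) for n in primes[r * 8:(r + 1) * 8]] for r in range(8)]
for row in grid:
    print("  ".join(f"{n:>3} {s}" for n, s in row))
```

Checking against this table makes the per-row style error easy to spot: the correct grid mixes styles within rows (e.g. row one spans 2 through 19, so it should contain both 8-bit and charcoal cells), whereas the generated image applies one style per row.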