I've been trying out the new model like this: OPENAI_API_KEY="$(llm keys get ope...

simonw · 2026-04-21T19:52:34 1776801154

I just got a much better version using this command instead, which uses the maximum image size according to https://github.com/openai/openai-cookbook/blob/main/examples...

  OPENAI_API_KEY="$(llm keys get openai)" \
    uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
    -m gpt-image-2 \
    "Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
    --quality high --size 3840x2160

https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a... - I found the raccoon!

I think that image cost 40 cents.

makira · 2026-04-21T20:13:31 1776802411

Fed into a clear Claude Code max effort session with: "Inspect waldo2.png, and give me the pixel location of a raccoon holding a ham radio." It sliced the image into small sections and gave:

"Found the raccoon holding a ham radio in waldo2.png (3840×2160).

  - Raccoon center: roughly (460, 1680)
  - Ham radio (walkie-talkie) center: roughly (505, 1650) — antenna tip around (510, 1585)
  - Bounding box (raccoon + radio): approx x: 370–540, y: 1550–1780

  It's in the lower-left area of the image, just right of the red-and-white striped souvenir umbrella, wearing a green vest."

Which is correct!
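
For anyone wanting to reproduce the "slice it into small sections" approach by hand, here is a minimal sketch of the coordinate bookkeeping. The 512-pixel tile size and the helper names are assumptions made for illustration; the thread does not say how Claude Code actually cut the image up.

```python
from typing import Iterator, NamedTuple


class Tile(NamedTuple):
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def iter_tiles(width: int, height: int, tile: int) -> Iterator[Tile]:
    """Yield non-overlapping tiles, row by row, covering a width x height image."""
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            yield Tile(left, top, min(left + tile, width), min(top + tile, height))


def to_global(t: Tile, x: int, y: int) -> tuple[int, int]:
    """Map a point located inside a tile back to full-image pixel coordinates."""
    return t.left + x, t.top + y


# 3840x2160 is the size reported for waldo2.png; 512 is an arbitrary tile size.
tiles = list(iter_tiles(3840, 2160, 512))

# Locate the tile containing the reported raccoon center (460, 1680).
hit = next(t for t in tiles if t.left <= 460 < t.right and t.top <= 1680 < t.bottom)
print(len(tiles), hit, to_global(hit, 460, 144))
```

Each tile can then be cropped and inspected separately, and any hit is reported in full-image coordinates by adding the tile's offset back.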

cwillu · 2026-04-21T20:32:23 1776803543

I had one problem: finding the raccoon. Now I have two: finding the red-and-white striped souvenir umbrella, and finding the raccoon.

makira · 2026-04-21T20:34:52 1776803692

simonw posted 2 different images: make sure to look at the second one.

cwillu · 2026-04-21T20:35:27 1776803727

Yeah, I noticed that just now, but too late to delete the comment :p

jaggederest · 2026-04-21T22:45:21 1776811521

You had a meta problem, and three, in total: find the raccoon, find the umbrella, find the right link in the comments.

bombcar · 2026-04-22T12:15:05 1776860105

To find Waldo you must first create the Universe.

M3L0NM4N · 2026-04-21T23:20:18 1776813618

We would need a larger sample size than just myself, but the raccoon was in the very first spot I looked. Found it literally immediately, as if that's where my eyes naturally gravitated to first. Hopefully that's just luck and not an indictment of the image-creating ability, as if there is some element missing from this "Where's Waldo" image, that would normally make Waldo hard to find.

nerdsniper · 2026-04-22T01:00:52 1776819652

There seemed to be more space around the raccoon than most other subjects. Zoomed out it appears as almost a “halo” highlighting the raccoon.

wewtyflakes · 2026-04-21T23:18:58 1776813538

A startling number of people either have no arms, one arm, a half of an arm, or a shrunken arm; how odd!

rattlesnakedave · 2026-04-22T01:58:56 1776823136

To be fair, the average person has fewer than two arms.

cozzyd · 2026-04-22T05:35:12 1776836112

Most people have an ARM in their pockets, nowadays. And possibly on their wrist.

floodfx · 2026-04-22T02:21:21 1776824481

Haha. Underrated comment!

globular-toast · 2026-04-22T05:46:24 1776836784

Finding the raccoon was instant. Finding all the weird AI artifacts is more fun. It's quite fascinating really. As usual it looks impressive at a glance but completely falls apart on closer inspection. I also didn't find any jokes, unless maybe the bridge to nowhere or finger posts pointing both ways counts?

cozzyd · 2026-04-22T02:17:13 1776824233

This is why they're congregating around the first aid and the lost and found

ehnto · 2026-04-22T07:51:41 1776844301

There is a leg that sprouts into part of a bush, perhaps that's where people's legs are disappearing to.

prmoustache · 2026-04-22T07:00:57 1776841257

Funny how it can look convincing from far away but once you zoom in you find out most characters have a mix of leprosy and skin cancer.

davebren · 2026-04-21T20:04:06 1776801846

The faces...that's nice that it turned a kid's book into an abomination

Filligree · 2026-04-22T00:56:17 1776819377

By image generation standards this is a ridiculously good result. No surprise that people instantly find the new limits, but they are new limits.

davebren · 2026-04-22T01:22:27 1776820947

It could already copy the art styles from its training data, what is the advancement here?

globular-toast · 2026-04-22T06:01:34 1776837694

But it's also straight up plagiarism and still ridiculously bad on so many levels.

vaulstein · 2026-04-22T04:19:46 1776831586

It's interesting that the raccoon is well defined because it was a part of the request. But none of the other fauna are.

keithnz · 2026-04-22T02:53:27 1776826407

it's interesting, zoomed out it kind of looks ok, zoomed in.... oh my.

jdironman · 2026-04-22T03:16:58 1776827818

The real NFTs were the images we generated along the way

louiereederson · 2026-04-21T19:59:46 1776801586

The people in this image remind me of early "This Person Does Not Exist", in the best way

dfee · 2026-04-21T22:44:57 1776811497

fair point, also "this raccoon does not exist"

gpt5 · 2026-04-21T23:04:26 1776812666

I tried it on the ChatGPT web UI and it also worked, although the ham radio looks like a handbag to me.

https://postimg.cc/wyxgCgNY

luxpir · 2026-04-22T06:23:46 1776839026

Nice, enjoyed the image as someone who has been to the events. But also easy raccoon placement :)

djmips · 2026-04-22T05:17:14 1776835034

mmmm yummy OSLS?

mirekrusin · 2026-04-22T00:57:18 1776819438

Can it generate a non-Halloween version though?

This lower-is-better danse macabre, nightmare-inducing ratio feels like an interesting proxy for model capability.

ireadmevs · 2026-04-21T20:05:40 1776801940

I found it on the 2nd image! On the 1st one not yet...

dzhiurgis · 2026-04-22T04:14:43 1776831283

Cost me < 1 cent - https://elsrc.com/elsrc/waldo/wojak.jpg

And this medium quality, high resolution one https://elsrc.com/elsrc/waldo/10_wojaks.jpg was 13 cents

p.s. aaaand that's a soft launch of my SaaS above, you can replace wojak.jpg with anything you want and it will paint that. It's basically appending to a prompt defined in elsrc's dashboard. Hopefully a more sane way to manage genai content. Be gentle to my server, hn!

botanrice · 2026-04-22T14:29:26 1776868166

Some pretty funny but good examples:

https://elsrc.com/elsrc/waldo/10_schoolsofthought.jpg

https://elsrc.com/elsrc/waldo/10_anthropomorphizedcomputermo...

https://elsrc.com/elsrc/waldo/10_breathoffreshairsittingonad...

https://elsrc.com/elsrc/waldo/10_drizzydrakesdoingthedrakeme...

https://elsrc.com/elsrc/waldo/10_sashringingtrashsingingmash...

Ok i promise I'm done xD

wordpad · 2026-04-22T14:56:41 1776869801

That's way more than 10, around 50

botanrice · 2026-04-22T14:03:52 1776866632

Are you using the same prompt the above commenter used? I've been toying around with increasingly ridiculous prompts and it works surprisingly well. Is it the new ChatGPT image gen or Nano Banana?

It's pretty good tbh, even with absurd prompts

Barbing · 2026-04-22T03:45:51 1776829551

>I think that image cost 40 cents.

Kinda made me sad assuming the author didn't license anything to OpenAI.

I recognize it could revert (99% of?) progress if all the labs moved to consent-based training sets exclusively, but I can't think of any other fair way.

$.40 does not represent the appropriate value to me considering the desirability of the IP and its earning potential in print and elsewhere. If the world has to wait until it’s fair, what of value will be lost? (I suppose this is where the big wrinkle of foreign open weight models comes in.)

rafram · 2026-04-22T04:06:57 1776830817

License what? The concept of a hidden object search? The only stylistic similarity here is the viewing angle. Where’s Waldo comics are flat, brightly colored line drawings that look nothing like this at all.

Barbing · 2026-04-22T05:16:04 1776834964

Well, I recognized the style from even the new physical books on sale today, but I don’t know art well enough to use a term like flat.

I am not an art expert but I’m perhaps a reasonable consumer and there is possibility of confusion if someone sells AI Where’s Waldo knockoff books at the dollar store, maybe until I take a closer look.

makira · 2026-04-21T19:33:04 1776799984

> though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure

I see an opportunity for a new AI test!

vunderba · 2026-04-21T20:44:53 1776804293

There have already been several attempts to procedurally generate Where’s Waldo? style images since the early Stable Diffusion days, including experiments that used a YOLO filter on each face and then processed them with ADetailer.

It's a difficult test for genai to pass. As I mentioned in a different thread, it requires a holistic understanding (in that there can only be one Waldo, Highlander-style), while also holding up to scrutiny when you examine any individual, ordinary figure.

simonw · 2026-04-21T19:37:24 1776800244

I've actually been feeding them into Claude Opus 4.7 with its new high resolution image inputs, with mixed results - in one case there was no raccoon but it was SURE there was and told me it was definitely there but it couldn't find it.

marricks · 2026-04-22T00:06:32 1776816392

Like... this has things that AI will seemingly always be terrible at?

At some point the level of detail is utter garbo and always will be. An artist who was thoughtful could have some mistakes but someone who put that much time into a drawing wouldn't have:

- Nightmarish screaming faces on most people

- A sign that seemingly points in both directions, or in the wrong one for a lake, and toward a first aid tent that doesn't exist

- A dog in bottom left and near lake which looks like some sort of fuzzy monstrosity...

It looks SO impressive before you try to take in any detail. The hand selected images for the preview have the same shit. The view of musculature has a sternocleidomastoid with no clavicle attachment. The periodic table seems good until you take a look at the metals...

We're reconfiguring all of our RAM & GPUs and wasting so much water and electricity for crappier where's Waldos??

fennecfoxy · 2026-04-22T17:01:26 1776877286

No, it won't be. I did indeed get the same problems when trying to generate my own image for it.

However, as someone who's mucked about with local image generation as well, I'd say that this is a problem with their implementation: it doesn't resolve fine detail because for the majority of requests it won't matter, and it drastically increases compute requirements.

With local image generation, bad features, incorrect fingers, disfigurement, etc. have been solved for a long time.

I think their new process involves multiple steps including sketching/fleshing out the idea before adding detail. The step that would fix this would be outpainting or something similar to tile-based upscaling.

From what I understand of image generation models they also struggle with fine detail in general because they aren't really trained for that. However for each tiny chunk of a detailed image like that there's nothing to say they can't allocate a 500x500 chunk for it to work in as its "idea/reference space" and then transpose that into the main image being generated - i.e. generate image features separately rather than all together.

p1esk · 2026-04-22T00:57:06 1776819426

> AI will seemingly always be ...

You do realize that the whole image generation field is barely 10 years old?

I remember how I was able to generate mnist digits for the first time about 10 years ago - that seemed almost like magic!

pants2 · 2026-04-21T20:07:17 1776802037

The second 4K image definitely has a raccoon on the left there! Nice.

halamadrid · 2026-04-22T04:12:56 1776831176

Really hard to look at these images given how un-humanlike the humans are. A few are ok, but a lot are disfigured or missing parts, and it's hard to find a raccoon in here.

vova_hn2 · 2026-04-22T01:28:09 1776821289

Thanks for the image, I will see their faces in my nightmares.

vunderba · 2026-04-22T01:30:56 1776821456

This happens all too frequently when you ask a GenAI model to create an image with a large crowd, especially “Where’s Waldo?” style scenes, where by definition you’re going to be examining individual faces very closely.

hackable_sand · 2026-04-22T03:46:33 1776829593

What about the faces of the people ChatGPT killed?

ritzaco · 2026-04-21T19:40:58 1776800458

haha took me a while to notice that one of the buildings is labelled 'Ham radio'

nerdsniper · 2026-04-22T00:59:03 1776819543

That is a devilishly difficult prompt for current diffusion tasks. Kudos.

arealaccount · 2026-04-21T19:43:34 1776800614

I see the raccoon

ElFitz · 2026-04-21T20:06:48 1776802008

Damn. There’s a fun game app to make here ^^

dymk · 2026-04-22T00:20:27 1776817227

Is there? The moment you look closely at the puzzle (which is... the whole point of Where's Waldo), you notice all the deformities and errors.

ElFitz · 2026-04-22T06:54:21 1776840861

Yes, it’s not there yet. But nothing unsolvable. First thing that comes to mind would be generating a smaller portion at the same resolution, then expanding through tiling (although one might need to use another service & model for this), like we used to do with Stable Diffusion years ago.

Another option would be generating these large images, splitting them into grids, and using inpainting on each "tile" to improve the details. Basically the reverse of the first one.

Both significantly increase costs, but for the second one having what Images 2.0 can produce as an input could help significantly improve the overall coherence.
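
A rough sketch of the grid step for that second option, under stated assumptions: the 1024-pixel tile size, the 128-pixel overlap and the function names are all made up for illustration, and the actual inpainting call is left out because it depends on whichever model or service is used. The overlap is there so neighbouring tiles share context and seams can be blended.

```python
def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets along one axis so tiles of size `tile` cover `length`,
    with neighbouring tiles sharing at least `overlap` pixels."""
    if tile >= length:
        return [0]
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than tile")
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the far edge
    return starts


def overlapping_tiles(width: int, height: int, tile: int = 1024,
                      overlap: int = 128) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) boxes, row by row."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


boxes = overlapping_tiles(3840, 2160)
print(len(boxes), boxes[0], boxes[-1])
```

Each box would be cropped, refined with inpainting, and pasted back, ideally with feathered edges inside the overlap region. The number of boxes also gives a quick feel for how much the extra passes multiply the cost.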

amelius · 2026-04-22T08:10:02 1776845402

Yes sounds more like a fun research project instead.

tptacek · 2026-04-21T19:39:29 1776800369

5.4 thinking says "Just right of center, immediately to the right of the HAM RADIO shack. Look on the dirt path there: the raccoon is the small gray figure partly hidden behind the woman in the red-and-yellow shirt, a little above the man in the green hat. Roughly 57% from the left, 48% from the top."

(I don't think it's right).
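
To check a percentage-style answer like that against the image, it helps to convert it to pixels. A small sketch follows; the function names are hypothetical, and the 3840x2160 size is only the one reported for waldo2.png elsewhere in the thread, so it is an assumption for the image this answer refers to.

```python
def percent_to_pixel(pct_left: float, pct_top: float,
                     width: int, height: int) -> tuple[int, int]:
    """Convert 'N% from the left, M% from the top' to pixel coordinates."""
    return round(pct_left / 100 * width), round(pct_top / 100 * height)


def pixel_to_percent(x: int, y: int, width: int, height: int) -> tuple[float, float]:
    """Inverse conversion, rounded to one decimal place."""
    return round(x / width * 100, 1), round(y / height * 100, 1)


# Assumed image size; substitute the real dimensions of the image in question.
WIDTH, HEIGHT = 3840, 2160

print(percent_to_pixel(57, 48, WIDTH, HEIGHT))      # the "57% / 48%" guess
print(pixel_to_percent(460, 1680, WIDTH, HEIGHT))   # the (460, 1680) center reported for waldo2.png
```

Putting both styles of answer in the same units makes it easier to compare what different models claim.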

ritzaco · 2026-04-21T19:43:23 1776800603

I tried

> please add a giant red arrow to a red circle around the raccoon holding a ham radio or add a cross through the entire image if one does not exist

and got this. I'm not sure I know what a ham radio looks like though.

https://i.ritzastatic.com/static/ffef1a8e639bc85b71b692c3ba1...

jackpirate · 2026-04-21T19:47:57 1776800877

Also, the raccoon it circled isn't in the original.

Aurornis · 2026-04-21T19:59:48 1776801588

I love how perfectly this captures the difficulties of using generative AI for detection tasks.

jetbalsa · 2026-04-22T01:21:45 1776820905

Oh god yes, I've been trying to make an LLM-assisted Magic: The Gathering card scanner... it's been a hell of a time trying to get it to just OCR card names well....

what · 2026-04-22T03:03:15 1776826995

Why would you use an LLM for OCR?

fennecfoxy · 2026-04-22T16:56:12 1776876972

Because if it's multimodal, oops all transformers and they're pretty much best in class for ocr now, afaik?

jetbalsa · 2026-04-23T20:10:39 1776975039

Yep, it's pretty damn good compared to classic OCR, and even to the more lightweight ones that I can run locally. The cards just vary too much over time.

jubilanti · 2026-04-22T13:53:07 1776865987

Because apparently that's what programming is and can only be these days...

angiolillo · 2026-04-21T19:55:12 1776801312

Indeed. I suppose one way to ensure you can find Waldo in any image is to add it yourself.

simonw · 2026-04-21T20:54:37 1776804877

That's excellent. I added it to my post: https://simonwillison.net/2026/Apr/21/gpt-image-2/#update-as...

davecahill · 2026-04-22T02:33:28 1776825208

hilarious - i tried and got the same thing.

there was a very large bear in the first image; when asked to circle the raccoon it just turned the bear into a giant raccoon and circled it.