taylorfinley · 2026-05-16T04:31:59 1778905919

I had a similar experience recently while helping my 5-year-old daughter vibe code a sandcastle-themed tower defense game (https://sandcastles.finley.lol).

I ended up thinking it might be easier to generate rigged models, animate them, and capture from an iso perspective, then do some kind of pixel art style transfer on the masked sprite sheet. Eventually I realized my kid didn't really care too much about the visuals so I didn't get too far with it.

agentifysh · 2026-05-16T17:26:15 1778952375

That's a cute-looking game! I have considered using 3D mesh models, but generating a highly detailed, textured 3D mesh still costs quite a bit, especially when you need to do it at scale.

taylorfinley · 2026-04-29T03:52:36 1777434756

Don't have too much fun with this: https://en.wikipedia.org/wiki/EICAR_test_file

tetha · 2026-04-29T05:20:48 1777440048

Do have way too much fun with EICAR:

https://www.youtube.com/watch?v=cIcbAMO6sxo

This guy put the EICAR test string into a barcode and started to scan it on various systems, with rather funny effects.

taylorfinley · 2026-04-16T18:54:42 1776365682

Surely they are testing their optimizations against common benchmarks internally? I bet the "real world task" degradation is some multiple larger than it appears when measured through a benchmark that is itself part of the optimization target.

taylorfinley · 2026-04-16T18:28:28 1776364108

I've noticed this and thought about it as well. I have a few suspicions:

Theory 1: An increasingly large share of inference compute is moving over to serving the new model for internal users (or partners that are trialing the next models). This leaves less compute for the previous model while demand for it keeps growing. Providers may respond by using quantizations or distillations, compressing the KV cache, tweaking parameters, and/or changing system prompts to try to use fewer tokens.
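
To make the quantization part concrete, here is a minimal sketch of symmetric int8 quantization on a toy weight list. The weights and function names are made up for illustration, and this is not a claim about what any provider actually does; it only shows why rounding to fewer bits loses information.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: returns (ints, scale)."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0  # map the largest magnitude to +/-127
    return [round(v / scale) for v in values], scale


def dequantize(ints, scale):
    """Map the int8 codes back to approximate float values."""
    return [q * scale for q in ints]


# Hypothetical weights, chosen only for illustration.
weights = [0.8123, -0.0042, 0.3301, -1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Per-weight round-trip error; each is bounded by about scale / 2.
errors = [abs(a - b) for a, b in zip(weights, restored)]
print(q, errors)
```

Note that the small weight (-0.0042) rounds to 0 here, so it is lost entirely after the round trip.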

Theory 2: Internal evals are obviously done using full strength models with internally-optimized system prompts. When models are shipped into production, the system prompt will inherently need changes. Each time a problematic issue rises to the attention of the team, there is a solid chance it results in a new sentence or two added to the system prompt. These grow over time as bad shit happens with the model in the real world. But it doesn't even need to be a harmful case or buggy behavior of the model: even newer models with enhanced capabilities (e.g. mythos) may get protected against in prompts used in agent harnesses (CC) or in system prompts, resulting in a more and more complex system prompt. This imposes something like a "cognitive burden" on the model, whose production behavior diverges further and further from the eval.

taylorfinley · 2026-04-14T06:34:50 1776148490

I can see a market for virtual copies of incredibly unpopular CEOs, but I don't think Mark would like how people would likely choose to use these digital effigies.

taylorfinley · 2026-04-06T23:19:02 1775517542

I've actually switched back to the web chat UI and copying Python files for much of my work because CC has been so nerfed.

taylorfinley · 2026-04-06T23:15:37 1775517337

I've seen this frequently also

withinboredom · 2026-04-07T07:54:26 1775548466

I suspect it happens when the model's adaptive thinking was too conservative and it could have thought more, but didn't.

taylorfinley · 2026-01-04T19:08:31 1767553711

Right? Just add this to .bashrc:

alias yt-pl='yt-dlp -o "%(channel)s/%(playlist_title)s/%(title)s.%(ext)s" -a playlists.txt'
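
As far as I know, the -o template uses Python-style %(field)s formatting, so a rough sketch of where files end up looks like this. The metadata values below are made up, and real yt-dlp also sanitizes filenames and supports extra template syntax that this ignores.

```python
# Same output template as in the alias above.
template = "%(channel)s/%(playlist_title)s/%(title)s.%(ext)s"

# Hypothetical metadata for one video; yt-dlp fills these in from the site.
video = {
    "channel": "ExampleChannel",
    "playlist_title": "Example Playlist",
    "title": "Episode 1",
    "ext": "mp4",
}

# Plain Python %-formatting with a dict gives the relative output path.
path = template % video
print(path)  # ExampleChannel/Example Playlist/Episode 1.mp4
```

So each line in playlists.txt ends up in its own channel/playlist folder.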

taylorfinley · 2025-10-01T19:20:15 1759346415

For both of these scenarios, it seems to happen when the context window is getting full and the context is summarized. I've found it usually works to respond with the right file, i.e. "great, let's apply those changes in @path/to/file", but it may also be a good time to return to an earlier conversation point by editing one of your previous messages. You might edit the message that got you the response with changes not linked to a specific file; including the file path in that prompt will usually get you back on track.

taylorfinley · 2025-08-08T18:56:00 1754679360

probably skips the step where you say "take a look at path/to/file" and the model converts that to a tool call