No one is producing one output token though.

And using up GPUs for that cache is a pretty big opportunity cost. I highly doubt it's kept in VRAM. That would be insane for the one-hour caches.

So it's memory + the time it takes to unload/load into VRAM + the extra cost per output token.
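
Rough sketch of that sum, just to make the shape concrete. Every name and number here is a made-up placeholder, not anything a provider has published, and I'm assuming all three pieces are expressed in the same cost unit:

```python
def cache_overhead(memory_cost, transfer_cost, extra_per_output_token, output_tokens):
    """Overhead of serving one request from a cached prompt.

    All arguments are hypothetical placeholders in one shared cost unit:
      memory_cost            - cost of holding the cache somewhere off-GPU
      transfer_cost          - cost of the time spent loading it into VRAM and back out
      extra_per_output_token - added cost on each generated token
      output_tokens          - how many tokens the request generates
    """
    return memory_cost + transfer_cost + extra_per_output_token * output_tokens


# Placeholder values only. The last term scales with output length,
# which is why the "one output token" case is not the realistic one.
one_token = cache_overhead(2.0, 1.0, 0.5, 1)
four_tokens = cache_overhead(2.0, 1.0, 0.5, 4)
```

The first two terms are fixed per request; only the last one grows as the response gets longer.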

Is it a scam? Idk