And using up GPUs for that cache is a pretty big opportunity cost. I highly doubt it's done in VRAM; that would be insane for the one-hour caches.
So it's memory + the time it takes to unload/load into VRAM + the extra cost per output token.
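Here's a rough back-of-envelope sketch of that breakdown. It assumes a standard transformer KV cache, which takes 2 × layers × KV heads × head dim × tokens × bytes per element. The model shape (80 layers, 8 KV heads, head dim 128, fp16) and every price below are made-up placeholders, not any provider's real numbers.

```python
# Back-of-envelope cost model for a cached prompt.
# All model shapes and prices are hypothetical placeholders.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem=2):
    # Keys + values (factor of 2) for every layer, KV head, and cached token.
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem

def cache_cost(gb, hours, storage_per_gb_hour, load_gb_per_s, gpu_cost_per_s,
               extra_output_cost):
    # 1) Keeping the cache somewhere (host RAM / SSD) for the cache lifetime.
    storage = gb * hours * storage_per_gb_hour
    # 2) GPU time spent waiting while the cache is copied back into VRAM.
    reload = (gb / load_gb_per_s) * gpu_cost_per_s
    # 3) Whatever extra is charged per output token on top.
    return storage + reload + extra_output_cost

if __name__ == "__main__":
    # Hypothetical 100k-token prompt on an 80-layer, 8-KV-head, fp16 model.
    gb = kv_cache_bytes(100_000, layers=80, kv_heads=8, head_dim=128) / 1e9
    print(f"{gb:.1f} GB")  # prints "32.8 GB"
    print(cache_cost(gb, hours=1, storage_per_gb_hour=0.01,
                     load_gb_per_s=10, gpu_cost_per_s=0.001,
                     extra_output_cost=0.05))
```

Even with fake prices, the ~33 GB figure for one 100k-token prompt shows why parking that in VRAM for an hour would tie up a big chunk of a GPU.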
Is it a scam? Idk