Ask HN: LLMs and Information Theory?
1 point by AnimalMuppet 55 days ago
I'm wondering if the problem with LLM-generated text is that there's too little information in it. Yes, there are plenty of words, but they don't seem to say all that much.

And if that is the problem, how would you tell? If I understand correctly, Shannon information is normally computed on a letter-by-letter basis. Can it be computed word by word? (Or token by token, which is closer to how LLMs actually operate?) If it can, then because LLMs produce the most likely next token (with some temperature-based randomness), I suspect their output has lower information content than human writing. And I wonder if that isn't part of why we hate it.
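For what it's worth, Shannon entropy is defined over any symbol alphabet, so a word-level estimate is easy to compute from empirical word frequencies. Here's a minimal sketch (unigram-only, so it ignores context and therefore overstates the per-word entropy of real text; the function name is my own):

```python
from collections import Counter
import math

def entropy_per_word(text):
    """Estimate Shannon entropy in bits per word from the empirical
    unigram word distribution of the text. Crude: treats words as
    independent draws, ignoring all context."""
    words = text.lower().split()
    n = len(words)
    counts = Counter(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Repetitive text carries fewer bits per word than varied text.
bland = "the cat sat on the mat the cat sat on the mat"
varied = "quantum foxes juggle prisms beneath wandering auroras tonight"
print(entropy_per_word(bland) < entropy_per_word(varied))  # True
```

A real comparison of human vs. LLM text would need a context-aware model (e.g. per-token log-probabilities from the LLM itself), but the same bits-per-symbol framing applies.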

Could this be fixed? Well, you could turn up the temperature. But can you do that enough to eliminate the blandness without also eliminating coherence?
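Concretely, temperature just divides the logits before the softmax, so raising it flattens the next-token distribution (more entropy, more surprise) and lowering it sharpens it. A quick sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, T):
    """Temperature-scaled softmax: divide logits by T, then normalize.
    T > 1 flattens the distribution; T < 1 sharpens it."""
    scaled = [x / T for x in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def entropy_bits(p):
    """Shannon entropy of a distribution, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical next-token scores
for T in (0.5, 1.0, 2.0):
    print(T, round(entropy_bits(softmax_with_temperature(logits, T)), 3))
```

The entropy rises monotonically with T, which is exactly the trade-off in the question: past some point the extra randomness starts landing on tokens that break coherence rather than add information.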



