In utf-8, bytes (uint8_t) may not represent a whole "code point". A code point b...

_a_a_a_ · on Dec 24, 2023

Yes, I understand a little about how Unicode figures in this kind of problem, but a code point is still a single logical item even when it is encoded as multiple bytes; it is a kind of 'string' in itself. I should have asked more carefully: what would be a better system in your view?

Thanks for the link, will check it out after Christmas.

lor_louis · on Dec 27, 2023

I personally believe that Swift's strings, where graphemes are the smallest indexable unit, are the gold standard for writing logic that might truncate multilingual text. They're still not perfect, though: they add overhead, and updates to Unicode might change their behaviour. Even so, they should handle most cases gracefully.
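
To make the truncation problem concrete, here is a rough sketch in Python rather than Swift, since Python's `str` indexes by code point and so shows the failure mode that grapheme indexing is meant to avoid. The sample string and variable names are my own, chosen only for illustration.

```python
# Illustration only: Python's str indexes by code point, not by grapheme.
# "café!" spelled with a plain 'e' followed by U+0301 COMBINING ACUTE ACCENT.
text = "cafe\u0301!"
utf8 = text.encode("utf-8")

print(len(text))  # 6 code points: c, a, f, e, U+0301, !
print(len(utf8))  # 7 bytes: U+0301 takes two bytes in UTF-8

# Truncating by bytes can cut a code point in half.
cut_bytes = utf8[:5]  # "cafe" plus only the first byte of U+0301
try:
    cut_bytes.decode("utf-8")
    byte_cut_ok = True
except UnicodeDecodeError:
    byte_cut_ok = False  # the result is no longer valid UTF-8

# Truncating by code points always yields a valid string,
# but it can still split what the reader sees as one character.
cut_code_points = text[:4]  # "cafe": the accent has been silently dropped
print(byte_cut_ok, cut_code_points)
```

With grapheme-level indexing, the 'e' and its accent would count as one unit, so a cut could only land before or after the whole "é".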