More

ben-schaaf · 2026-04-29T16:22:50 1777479770

> Software update deletes this memory.

While there was a bug where the session was lost when updating, this was fixed years ago.

sixtyj · 2026-04-29T16:48:47 1777481327

Great, good to know. Thanks. I wasn’t brave enough to test, so I hope you are a human not a dog that tries to prank me :))

ben-schaaf · 2026-04-29T16:20:23 1777479623

Sublime Text has always been paid, it was never free.

ben-schaaf · 2026-04-27T02:32:12 1777257132

There's really an endless list of these optimizations. A few I've used (though not necessarily in rust):

Atoms: Each string can be referenced with a single u32 or even u16, and they're inherently deduplicated.

Bump allocator: your strings are &str, allocation is super fast with limited fragmentation.

Single pointer strings (this has a name, I can't think of it right now): you store the length inside the allocation instead of in each reference, so your strings are a single pointer.

tialaramex · 2026-04-27T15:09:45 1777302585

ColdString is both your "Single pointer string" and a Small String Optimisation on top.

First, on the heap we have a self-indicating length prefix, basically we use the bottom 7 bits of each byte to indicate 7 bits of length and the top bit indicates there are more bits in the next byte. So "ben-schaaf" would be 0x0A then the ASCII for "ben-schaaf"

But, we avoid even having a heap allocation if we have 8 or fewer UTF-8 bytes to encode the text, that's our Small String Optimisation.

To pull this off we specify that our heap allocations will have 4 byte alignment even though they don't need it. This shouldn't be a problem, in fact many allocators never actually deliver smaller alignments anyway.

This means our pointer now has two spare bits, the least significant bits are now always zero for a valid heap pointer. We rotate these bits to the top of the first byte (this varies depending on whether the target is big-endian or little-endian) and we mask them so that for these valid pointers they are 0b10xxxxxx

So, now we can look at the "single pointer" and figure out

If it begins 0b10xxxxxx it really will be a valid pointer, rotate it, mask out that flag bit and dereference the pointer to find the length-prefixed text.

If it begins 0b11111AAA there's a short string here but it didn't need all 8 bytes, the next AAA bytes of the "pointer" are just UTF-8 and conveniently AAA is enough binary for 0 through 7 to be signalled, the exact length we have

If it has any other value the entire 8 bytes of "pointer" is a UTF-8 encoded string

VorpalWay · 2026-04-27T15:31:21 1777303881

Storing the length in the allocation has potential performance tradeoffs too! The most obvious is that taking a substring/string view will need a copy (or use a different type that store the length outside the allocation).

But it also means the CPU has to follow the pointer (and potentially get a cache miss or pipeline stall) to find the length. Having a fat pointer of ptr+length makes a lot of sense for string views, and for owned string buffers with capacity it can mean avoiding a cache miss when appending to the buffer.

It's complicated in other words.

Rendello · 2026-04-27T20:34:59 1777322099

> Bump allocator

You can build an ad-hoc bump allocator by using a String and indexing into it. You can't use &str references though, as a growing String may reallocate elsewhere and invalidate your references (Rust won't even let you try this), so you have to use your own indices. This is the same thing that bump allocator libraries usually do, too. It can be tricky but have great performance gains.

I recently 100x-d the speed of an XML/HTML builder I use internally by rewriting it to only have one thing on the heap, a single String. Every push happens right at the call site linearly, and by passing data through closures the formatting (indentation, etc.) is controllable. My first iteration was written in the least efficient way possible and had thousands of tiny allocations in nested heap objects, it was painfully slow.

MrBuddyCasino · 2026-04-27T07:51:54 1777276314

Atoms: is this similar to interned strings?

locknitpicker · 2026-04-27T08:02:15 1777276935

> Atoms: is this similar to interned strings?

Yes. It is exactly how they are described.

https://docs.rs/string_cache/latest/string_cache/struct.Atom...

> Represents a string that has been interned.

ben-schaaf · 2026-04-27T13:10:31 1777295431

The names of these things are hazy and inconsistent. In Java and C# an interned string is the same type as other strings. Others describe atoms as interned strings, some call them symbols. At my work we call the u16/u32 atoms and interned strings are the single pointer strings described above.

locknitpicker · 2026-04-27T07:58:58 1777276738

> There's really an endless list of these optimizations.

These aren't really optimizations. They are specialized implementations that introduce design and architectural tradeoffs.

For example, Rust's Atom represents a string that has been interned, and it's actually an implementation of a design pattern popular in the likes of Erlang/Elixir. This is essentially a specialized implementations of the old Flyweight design pattern, where managing N independent instances of an expensive read-only object is replaced with a singleton instance that's referenced through a key handle.

I would hardly call this an optimization. It actually represents a significant change to a system's architecture. You have to introduce a set of significant architectural constraints into your system to leverage a specific tradeoff. This isn't just a tweak that makes everything run magically leaner and faster.

dryarzeg · 2026-04-27T11:55:27 1777290927

> everything run magically leaner and faster

In my opinion, there's no magic in the software engineering. Everything (or almost everything) is a system that can be described, explained, modified and so on. Applications, libraries, operating systems, kernels, CPUs/RAM/GPU/NPU/xPU/whatever silicon there is, ALUs/etc, transistors, electricity, physics... That's nowhere near "magic". There's always some trade-offs, it's just that you may not be aware of them initially.

ben-schaaf · 2026-04-27T08:44:32 1777279472

You might want to refresh your understanding of the word optimisation. Changing a system to be more effective/efficient is optimisation, how big that change is makes no difference.

ben-schaaf · 2026-04-26T01:59:25 1777168765

To be fair, lightning only looks sensible because it never did anything other than USB2 and power delivery.

gsnedders · 2026-04-26T06:26:04 1777184764

A few devices do support USB 5Gbps over Lightning!

ben-schaaf · 2026-04-26T00:29:07 1777163347

UBO lite has a long list of all the types of filters that aren't possibly under MV3: https://github.com/uBlockOrigin/uBOL-home/wiki/Frequently-as...

Not sure about page load, but CPU time is about the same between the two: https://x.com/gorhill/status/1792648742752981086/photo/1

lxgr · 2026-04-27T09:44:49 1777283089

This list only applies to Chrome, so it's completely irrelevant when talking about Firefox.

ben-schaaf · 2026-04-22T07:38:05 1776843485

> Why didn't they stream Neflix on Ubuntu they ship with?

It's not exactly impressive to say you can stream 20 hours of 540p video. Netflix DRM is awful for Linux users.

Though I agree it would have been better to show a benchmark of youtube or similar for Linux.

ben-schaaf · 2026-04-22T00:42:05 1776818525

Socamm is all about data centre capacity. It starts at 128GB and you can only order it in bulk(?) from data centre providers.

On the other hand I can go to a local store and buy a stick of 32GB lpcamm2.

throwaway85825 · 2026-04-22T00:58:04 1776819484

There's no reason they couldn't use smaller capacity modules. SOCAMM has better area efficiency and z height, both highly relevant for thin and light laptops.

ben-schaaf · 2026-04-22T01:58:44 1776823124

There's a big reason they shouldn't do that though: Laptops have already standardized enough on LPCAMM2 for the modules to be widely available.

throwaway85825 · 2026-04-22T06:03:35 1776837815

The form factor is only a tiny part of the RAM module price. DRAM already isn't compatible gen to gen so you might as well go with the most optimized form factor

ben-schaaf · 2026-04-22T06:45:23 1776840323

I agree, competition and economies of scale are much larger factors. Custom RAM SKUs for a single manufacturer with no competition will obviously cost more than an industry standard part. And you'll be locked to buying RAM upgrades from framework themselves.

throwaway85825 · 2026-04-22T10:53:45 1776855225

SOCAMM is a jedec standard nvidia is already shipping in several products.

ben-schaaf · 2026-04-19T10:02:00 1776592920

Correct me if I'm wrong, but that reserved memory is for the framebuffer? The iBoot bootloader also reserves some memory for the framebuffer.

dGPUs bring their own VRAM because it's a different type of memory, allowing them to get higher performance than they could with DDR. The M4 Max requires 128GB of LPDDR5X to reach its ~500GB/s bandwidth. The RX Vega 64 had that same bandwidth in 2017 with just 8GB of HBM2.

fc417fc802 · 2026-04-19T10:51:42 1776595902

Nope, the reserved memory is what's available to use from the various APIs (VK, GL, etc). More recently there's OS support for flexible on demand allocation by the GPU driver.

Of course the APIs have allowed you to make direct use of pointers to CPU memory for something like a decade. However that requires maintaining two separate code paths because doing so while running on a dGPU is _extremely_ expensive.

kimixa · 2026-04-19T18:19:03 1776622743

As someone that's worked on GPU drivers for shared memory systems for over 15 years, supporting hardware that was put on the market over 20 years ago, and they've "always" (in my experience) been able to dynamically assign memory pages to the GPU.

The "reserved" memory is more about the guaranteed minimum to allow the thing to actually light up, and sometimes specific hardware blocks had more limited requirements (e.g. the display block might require contiguous physical addresses, or the MMU data/page tables themselves) so we would reserve a chunk to ensure they can actually be allocated with those requirements. But they tended to be a small proportion of the total "GPU Memory used".

Sure, sharing the virtual address space is less well supported, but the total amount of memory the GPU can use is flexible at runtime.

ben-schaaf · 2026-04-19T04:04:16 1776571456

> the thought and idea is theirs, it was communicated

Are they? I don't know how much they used AI, the entire article could be written from a one sentence prompt and so I'd argue that the thoughts and ideas are not their own.

This isn't like using a spell checker, it's like using a ghost writer.

notepad0x90 · 2026-04-22T15:46:29 1776872789

I have no reason to believe the idea wasn't theirs, neither do you. It's just another dialect, or jargon-set being used for communication. that's about as much as anyone could claim to prove about the post. but here we are debating about speculative things that mean nothing, other than some anti-ai crusading.

I think efforts are better spent boiling oceans, or getting angry at the sun.

ben-schaaf · 2026-04-15T13:19:37 1776259177

That sounds like it's only helpful for ddos mitigation, in which case the attacker could trivially synthesize a correct checksum.

phire · 2026-04-15T14:05:46 1776261946

You don't have to use a publicly documented checksum.

If you use a cryptographically secure hashing algorithm, mix in a secret salt and use a long enough checksum, attackers would find it nearly impossible to synthesise a correct checksum.

ben-schaaf · 2026-04-15T14:16:54 1776262614

I don't follow. The checksum is in "plain text" in every key. It's trivial to find the length of the checksum and the checksum is generated from the payload.

Others have pointed out that the checksum is for offline secret scanning, which makes a lot more sense to me than ddos mitigation.

phire · 2026-04-16T00:16:34 1776298594

I'm not sure it's a good idea.

But it's trivial to make a secret checksum. Just take the key, concatenate it with a secret 256-bit key that only the servers know and hash it with sha256. External users might know the length of the checksum and that it was generated with sha256. But if they don't know the 256-bit key, then it's impossible for them to generate it short of running a brute force attack against your servers.

But it does make the checksum pretty useless for other usecases, as nobody can verify the checksum without the secret.

ben-schaaf · 2026-04-16T02:51:26 1776307886

Ah that makes sense. I wouldn't call that a checksum though; that's a signature :)

phire · 2026-04-16T03:16:43 1776309403

I don't think it counts as a signature, because it can't be verified without revealing the same secret used to create it.

ben-schaaf · 2026-04-16T03:47:02 1776311222

You're right, the correct term seems to be MAC (Message Authentication Code).