> Every function is a specification that the compiler can verify against its implementation.
This has been tried so many times already. It works nicely for functions that only do some arithmetic. But in any real-life system that pushes data around over the network or to databases, most things happen inside effects, which leaves the compiler clueless as to whether the function implementation does what it's supposed to do.
Don't get me wrong, I'm a big fan of using the compiler to improve productivity and I also believe strong typing leverages LLM power.
But this kind of function specification is a dead end IMO.
The root cause of some of the bugs seems to be the opaque nature of some of the Unix API.
E.g.
> The trap is that get_user_by_name ends up loading shared libraries from the new root filesystem to resolve the username. An attacker who can plant a file in the chroot gets to run code as uid 0.
To me such a get_user_by_name function is like a booby trap, an accident that is waiting to happen. You need to have user data, you have this get_user_by_name function, and then it goes and starts loading shared libraries.
This smells like mixing of concerns to me. I'd say, either split getting the user data and loading any shared libraries in two separate functions, or somehow make it clear in the function name what it is doing.
> The root cause of some of the bugs seems to be the opaque nature of some of the Unix API.
Some, maybe, but if you've decided to rewrite coreutils from scratch, understanding the POSIX APIs is literally your entire job.
And in any case, their test for whether a path was pointing to the fs root was `file == Path::new("/")`. That's not an API problem, the problem is that whoever wrote that is uniquely unqualified to be working on this project.
Interestingly, it looks like the `file == Path::new("/")` bit was basically unchanged from when it was introduced... 12 (!) years ago [0] (though back then it was `filename == "/"`). The change from comparing a filename to a path was part of a change made 8 months ago to handle non-UTF-8 filenames.
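To illustrate why a purely lexical comparison against "/" is fragile, here is a minimal sketch, assuming a Unix system. `refers_to_root` is a hypothetical helper written for this comment, not code from uutils; it compares device and inode numbers instead of path text. Rust's `Path` equality normalizes `//` and `/.`, but a spelling like `/../` (or a symlink to the root) still compares unequal to `/` while naming the same directory.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical helper (not uutils code): true if `path` resolves to the
/// same directory as "/", judged by device and inode number.
fn refers_to_root(path: &Path) -> io::Result<bool> {
    let root = fs::metadata("/")?;
    let target = fs::metadata(path)?; // follows symlinks
    Ok(root.dev() == target.dev() && root.ino() == target.ino())
}

fn main() -> io::Result<()> {
    // Several spellings that all resolve to the root directory.
    for candidate in ["/", "//", "/.", "/../", "/./.."] {
        let p = Path::new(candidate);
        println!(
            "{:8} lexical: {:5} dev/ino: {}",
            candidate,
            p == Path::new("/"),
            refers_to_root(p)?
        );
    }
    Ok(())
}
```

Even this check is only as good as the moment it runs: the path can be changed between the check and whatever operation follows it.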
> That's not an API problem, the problem is that whoever wrote that is uniquely unqualified to be working on this project.
To be fair, uutils started out with far smaller ambitions. It was originally intended to be a way to learn Rust.
> Some, maybe, but if you've decided to rewrite coreutils from scratch, understanding the POSIX APIs is literally your entire job.
Yes, it is. But such traps in an API are still just unacceptable. If you design an API that requires obscure knowledge to use correctly, and getting it wrong gives you privilege escalation, it is just... just... I have no words for it. It is beyond stupidity. You are just making sure that your system will get these privilege escalations, and not just once, but multiple times.
> The root cause of some of the bugs seems to be the opaque nature of some of the Unix API.
"Seems" and "smells" are weasel words. The root cause is not thinking: Why is root chrooting into a directory they do not control?
Whatever you chroot into is under the control of whoever made that chroot, and if you cannot understand this you have no business using chroot().
> To me such a get_user_by_name function is like a booby trap
> I'd say, either split getting the user data and loading any shared libraries in two separate functions, or somehow make it clear in the function name what it is doing.
You'd probably still be in the trap: there's usually very little difference between writing to newroot/etc/passwd and newroot/usr/lib/x86_64-linux-gnu/libnss_compat.so or newroot/bin/sh or anything else.
So I think there's no reason for /usr/sbin/chroot to look up the user id in the first place (toybox chroot doesn't!), so I think the bug was doing anything at all.
> The root cause is not thinking: Why is root chrooting into a directory they do not control?
Because you can't call chroot(2) unless you're root. And "control a directory" is weasel words; root technically controls everything in one sense of the word. It can also gain full control (in a slightly different sense of the word) over a directory: kill every single process that's owned by the owner of that directory, then don't setuid into that user in this process or in any other process that root currently executes, or will execute, until you're done with this directory. But that's just not useful in practice, is it?
Secure things should be simple to do, and potentially unsafe things should be possible.
The CVE itself uses the language "If the NEWROOT is writable by an attacker", which could refer to a shared library (as indicated in the report), or even a passwd file, as would have been true since the origin of chroot().
> root technically controls everything in one sense of the word.
But not the sense we're talking about.
> Because you can't call chroot(2) unless you're root
Well you can[1], but this is /usr/sbin/chroot aka chroot(8) when used with a non-numeric --userspec, and the point is to drop root to a user that root controls with setuid(2). Something needs to map user names to the numeric userids that setuid(2) uses, and that something is typically the NSS database.
Now: Which database should be used to map a username to a userid?
- The one from before the chroot(2)?
- Or the one that you're chroot(2)ing into?
If you're the author of the code in question, you chose the latter, and that is totally obvious to anyone who can read, because that's the order the code appears in. But it's also obvious that only the first one is under the control of root, and so only the first one could be correct.
[1]: if you're curious: unshare(CLONE_USERNS|CLONE_FS) can be used. this is part of how rootless containers work.
No, you can't, it's an entirely different syscall that does something vaguely similar. IMHO there are a bit too many root-restricted operations that should not have been; but they are, so we're stuck with setuid-enabled "confused deputies" — arguably, it's the root that should be prohibited from calling chroot(2).
> Now: Which database should be used to map a username to a userid? If you're the author of the code in-question, you chose the latter
That's the problem: the choice is implicit. If the author moved setuid/setgid calls way up in the call order, the implicit choice would've also been the safe one but it was literally impossible.
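One way to make that choice explicit is to let the types carry it. The following is a minimal sketch under stated assumptions, not the uutils or GNU implementation: `HostResolvedIds`, `resolve_on_host` and `enter_and_drop` are hypothetical names, the user database is a plain slice, and the privileged calls are only recorded as strings so the example runs without root. The point is the data dependency: the step that enters the new root cannot be called without ids that were already resolved.

```rust
use std::path::Path;

/// Numeric ids resolved *before* entering the new root.
/// In real code the fields would be private to a module, so the only way
/// to obtain a value is through `resolve_on_host`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct HostResolvedIds {
    uid: u32,
    gid: u32,
}

/// Stand-in for a lookup against the host's user database
/// (entries are `(name, uid, gid)`).
fn resolve_on_host(name: &str, host_db: &[(&str, u32, u32)]) -> Option<HostResolvedIds> {
    host_db
        .iter()
        .find(|entry| entry.0 == name)
        .map(|entry| HostResolvedIds { uid: entry.1, gid: entry.2 })
}

/// Stand-in for chroot + setgid + setuid. It only records the steps it
/// would take; requiring `HostResolvedIds` forces the lookup to come first.
fn enter_and_drop(newroot: &Path, ids: HostResolvedIds) -> Vec<String> {
    vec![
        format!("chroot({})", newroot.display()),
        format!("setgid({})", ids.gid), // group first, while still privileged
        format!("setuid({})", ids.uid),
    ]
}

fn main() {
    let host_db = [("alice", 1000u32, 1000u32)];
    let ids = resolve_on_host("alice", &host_db).expect("unknown user");
    for step in enter_and_drop(Path::new("/srv/jail"), ids) {
        println!("{}", step);
    }
}
```

With this shape there is no ordering left to get wrong by accident: a name-based lookup after the chroot would need a second, visibly different function.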
> unshare(CLONE_USERNS|CLONE_FS) can be used
Wait, CLONE_USERNS? That's not a real flag. Did you mean CLONE_NEWUSER?
> Did you mean CLONE_NEWUSER? [~] it's an entirely different syscall that does something vaguely similar
Yes. And I agree, but it also enables chroot(2) to work without being root, which was the syscall we were talking about, and which I still maintain is not as important as reading.
> arguably, it's the root that should be prohibited from calling chroot(2).
> IMHO there are a bit too many root-restricted operations that should not have been
It's a popular opinion. It's also cheap. So what?
> so we're stuck with setuid-enabled "confused deputies"
chroot(8) is not setuid-enabled. This has nothing to do with anything.
> That's the problem: the choice is implicit. If the author moved setuid/setgid calls way up in the call order, the implicit choice would've also been the safe one but it was literally impossible.
False. The setuid/setgid calls are in the right place. The lookup of the database mapping usernames to userids is in the wrong place.
If the rust programmer just read what they wrote they would see this.
If you just read what they wrote you would see this.
Rather, I think that using a functional safe language tricks people into thinking that the data it deals with is stateless. Whereas many many things change in operating systems all the time.
Until we have a filesystem that can present a snapshot, everything has to be checked all the time.
i.e. we need an API which gives input -> good result or failure. Not input -> good result or failure or error.
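A common way to narrow the window where state can change is to open once and then ask questions about the handle, instead of checking a path and opening it afterwards. A minimal sketch, assuming a Unix system (where opening a directory read-only succeeds); `read_if_regular_racy` and `read_if_regular` are hypothetical names used only for this illustration:

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// Racy: the file can be swapped between the check and the read.
fn read_if_regular_racy(path: &Path) -> io::Result<Option<String>> {
    if !fs::metadata(path)?.is_file() {
        return Ok(None);
    }
    // Window: `path` may name something else by now.
    fs::read_to_string(path).map(Some)
}

/// Open once, then inspect the handle that was actually opened.
fn read_if_regular(path: &Path) -> io::Result<Option<String>> {
    let mut file = File::open(path)?;
    if !file.metadata()?.is_file() {
        return Ok(None);
    }
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
    Ok(Some(contents))
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let path = dir.join(format!("handle-check-{}.txt", std::process::id()));
    fs::write(&path, "hello")?;
    println!("racy:   {:?}", read_if_regular_racy(&path)?);
    println!("handle: {:?}", read_if_regular(&path)?);
    println!("dir:    {:?}", read_if_regular(&dir)?);
    fs::remove_file(&path)
}
```

The second version does not give a snapshot of the filesystem, but the check and the read are at least guaranteed to refer to the same object.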
If the attacker can control newroot/etc/passwd they _still_ get getpwnam to return whatever userid they want. The solution is not to look up --userspec=username:group inside the chrooted space, but from outside.
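The effect is easy to simulate without root. The sketch below parses passwd-format text (`name:password:uid:gid:gecos:home:shell`) directly instead of calling getpwnam, and both file contents are made up for the illustration; `uid_for` is a hypothetical helper.

```rust
/// Look up a user's numeric uid in passwd-format text
/// (`name:password:uid:gid:gecos:home:shell`).
fn uid_for(passwd: &str, name: &str) -> Option<u32> {
    passwd.lines().find_map(|line| {
        let mut fields = line.split(':');
        if fields.next()? != name {
            return None;
        }
        fields.nth(1)?.parse().ok() // skip the password field, take the uid
    })
}

fn main() {
    // Made-up contents of the host's /etc/passwd, controlled by root.
    let host_passwd = "root:x:0:0:root:/root:/bin/sh\nalice:x:1000:1000::/home/alice:/bin/sh\n";
    // Made-up contents of newroot/etc/passwd, written by the attacker.
    let newroot_passwd = "alice:x:0:0::/root:/bin/sh\n";

    println!("resolved outside the chroot: {:?}", uid_for(host_passwd, "alice"));
    println!("resolved inside the chroot:  {:?}", uid_for(newroot_passwd, "alice"));
}
```

The same name maps to an unprivileged uid in one database and to uid 0 in the other, which is why the lookup has to use the database that root controls.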
I have been doing this for years already after finding out by myself that it worked. Staring at anything works, even staring at your screen as long as you make sure you focus out.
> How do you think engineers in the second half got there? By writing tons and tons of code to "build those reps" and gain that experience.
Well this is true, but that doesn't mean that there isn't any other way to acquire this knowledge. Until now, this way of gaining deeper understanding was simply the most practical one, since you needed to write lots of code when starting out as a software engineer.
But it's just as possible to gain knowledge about useful abstractions and clean code by using AI to do the work. After a while you'll find out which codebases get you stuck and which code abstractions leverage your AI, because it needs fewer tokens to read and extend your codebase.
Is 4.6 without adaptive thinking better than 4.5?
Honest question. I switched back to 4.5 because 4.6 seemed mostly to take longer and consume more tokens, without noticeable improvement in the end result.
Or write a local Gemma4 tool MCP for simple tool operations. It works seriously well. Basic tool use like command-line calls, greps, seds, etc. has millisecond delay at about 100 tokens/sec on my M4.
I'm actually seeing a similar thing when comparing 4.6 and 4.5. It burns a lot more tokens, does show more how it is thinking along the way, but I don't see a strong difference in the end result.
Occasionally 4.6 even seems to get stuck in its 'processing' phase, while 4.5 doesn't on the same task.
Yeah, the main problem is that most companies / people don't give a f*ck about security because it is not a key feature. It's only a marketing stamp. You want it good enough to sell the products, but you don't want to spend too much on it. So instead you go vibe coding. The baby is stillborn.
I'm playing backgammon. Not just online, but live tournaments as well.
What I like about it: it's one of those games that are easy to learn but difficult to master. Modern analysis tools can detect your errors and weaknesses, giving you endless opportunities to improve.
But I also like the excitement of the live tournaments, which, like poker, have money prizes and an entry fee. They are held all over the world, and I especially like visiting tournaments in places I otherwise wouldn't visit. Plus, after a while you'll make friends on the tournament circuit, so it becomes a social thing as well.