By the nature of the LLM architecture, I think if you "colored" the input via tokens the model would about 85% "unlearn" the coloring anyhow. Which is to say, it's going to figure out that "test" in two different colors is the same thing. It kind of has to; after all, you don't want to be talking about a "test" in your prompt and have the model be completely unable to connect that to the concept of "test" in its own replies. The coloring would end up as just another language in an already multi-language model. It might help slightly, but I doubt it would be a solution to the problem, and possibly at an unacceptable loss of capability, since the model would burn some of its capacity on that "unlearning".
One of the reasons I'm comfortable using them as coding agents is that I can and do review every line of code they generate, and those lines of code form a gate. No LLM-bullshit can get through that gate except in the form of lines of code that I can examine, and even if I do let some bullshit through accidentally, the bullshit is stateless and can be extracted later if necessary, just like any other line of code. Or, to put it another way, the context window doesn't come with the code, forming this huge blob of context to be carried along... the code is just the code.
That exposes me to the cases where the models are objectively wrong, and it helps keep me grounded about their utility in spaces where I can check them less well. One of the most important things you can put in your prompt is a request for sources, followed by you actually checking them out.
And one of the things the coding agents teach me is that you need to keep the AIs on a tight leash. What is the equivalent, in other domains, of them "fixing" the test to pass instead of fixing the code to pass the test? In the programming space I can run "git diff *_test.go" to ensure they didn't hack the tests when I didn't expect it. It keeps me wondering what the equivalent of that is for my non-programming questions. I have unit test suites to verify my LLM output against; what's the equivalent in other domains? Probably a few isolated domains here and there have some equivalent, but in general there isn't one. Things like completely forged graphs are entirely expected, but it's hard to catch them when you lack the tools or the understanding to chase down "where did this graph actually come from?".
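A guard like that "git diff" check is easy to script. Here's a minimal Python sketch; the `*_test.go` pathspec and the `HEAD~1..HEAD` range are illustrative assumptions, so adjust them to how your agent actually commits:

```python
import subprocess

def changed_test_files(repo="."):
    """List Go test files touched by the last commit.

    A guardrail sketch: run after an agent session to see whether it
    modified tests instead of (or in addition to) the code under test.
    """
    out = subprocess.run(
        ["git", "diff", "--name-only", "HEAD~1..HEAD", "--", "*_test.go"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    # git prints one changed path per line; drop empty lines.
    return [line for line in out.stdout.splitlines() if line]
```

If the returned list is non-empty, read that diff before accepting the session's work.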
The success with programming can't be translated naively into domains that lack the tooling programmers built up over the years, and based on how many times the AIs bang into the guardrails the tools provide I would definitely suggest large amounts of skepticism in those domains that lack those guardrails.
I know it's not what people want to hear, but my response to a lot of the comments here is just a general: I agree, it's time to stop using Windows.
They won't let you secure your drive the way you want. They won't let you secure your network the way you want (per the top-level comment about Wireguard). In so doing they are demonstrating not just that they can stop you from running these particular programs but that they are very likely going to exert this control on the entire product category going forward, and I see little reason to believe they will stop there. These are not minor issues; these are fundamental to the safety, security, and functionality of your machine. This indicates that Microsoft will continue to compromise the safety, security, and functionality of your machine going forward to their benefit as they see fit. This is intolerable for many, many use cases.
I think it is becoming clear that Microsoft no longer considers Windows users to be their customers any more. Despite the fact that people do in fact pay for Windows, Microsoft has shifted from largely supporting their customers to out-and-out exploiting their customers. (Granted a certain amount of exploitation has been around for a long time, but things like the best backwards compatibility in the industry showed their support, as well.)
I suspect this is the result of a lot of internal changes (not one big one), but I also see no particular reason at the moment to expect it to change. To my eyes, both the first and the second derivative are heading in the direction of more exploitation: more treating users like cattle and less like customers. When new features or work are proposed at Microsoft, it is clear they are analyzed entirely in terms of how they can benefit Microsoft; users are not at the table.
No amount of wishing this wasn't so is going to change anything. No amount of complaining about how hard it is to get off of Windows is going to change anything; indeed at this point you're just signalling to Microsoft that they are correct and they can treat you this way and there's nothing you will do about it for a long time.
Open source developers are doing Microsoft a big favor when they support Windows and publish Windows builds and installers. It's a substantial effort, and apparently that effort isn't appreciated.
If all open source software dropped support for Windows, it wouldn't really affect the open source community that much. It would definitely cause headaches for Microsoft however.
I agree that supporting Windows helps its ecosystem.
But also open source software on Windows is an important gateway to the free world. When you are already used to Firefox, LibreOffice and VLC, you might as well switch to Linux painlessly, but if those didn't run on Windows, switching to Linux would require relearning everything.
Irrelevant. If it's time to stop using Windows, all those Windows users will have to relearn everything either way. Whether they do it in a Windows environment or a Linux one doesn't really change the equation.
A sudden lack of software on Windows would increase user migration. If we all keep publishing for Windows, users will just stay there because their needs are already met.
> If it's time to stop using Windows, all those Windows users will have to relearn everything either way.
No, that's the thing; ideally they would only need to replace the OS. Many long years ago, when I switched from Windows to Ubuntu (back when it was good), part of why it was so easy is that I mostly kept the same applications. If you use e.g. Firefox, VLC, Open/LibreOffice, Audacity, etc., then you can install a new OS, reinstall the same applications, and barely have to change anything. That's huge.
I agree to some extent, but we (or at least I) publish open source software, amongst other reasons, because I like helping others, and it so happens that most users who could benefit are still using Windows. So it doesn't feel right to stop doing that as long as the effort is reasonable (which it is, unlike for macOS).
Nah, it's simpler. Microsoft has just lost its sense of UX, and its touch with reality, to its own internal management vibes.
Look at the Windows start menu. It used to be trivial to switch users. Two clicks, one to open the user list, another to switch - done. Now it's four: user panel, three-dots, switch user, pick user.
Look at the login sequence. They want their Windows Hello and they don't care whether it works well or not: there's no way to get a PIN or password prompt instantly. You've got to click three times (once to show the method picker, once to pick PIN entry, and lastly once to focus the goddamn field) despite there being no reason to hide this UI.
It's not like they're trying to scam the user or sell them something. It looks like internal decision-makers who never dogfood their own decisions losing touch with common sense.
Apple has that too, and this rot spreads elsewhere. But it's not intentionally malicious; a lot of things simply don't make sense. It's a total lack of self-reflection at the corporate level.
> I think it is becoming clear that Microsoft no longer considers Windows users to be their customers any more.
Quite obviously. Look at the out of box new user experience on a Windows 11 Home installation. What you get when you open a new $600 laptop from Best Buy for the first time. The entire thing is designed to drive users towards perpetual monthly recurring subscription billing for various MS services for life (OneDrive, Office, Xbox Live, Xbox game store purchased games, etc). It's a platform which is built atop a rent seeking cloud services ideology that shows no sign of ever letting up.
I think they've been heading that way for a while, and it's only getting clearer.
I've been thinking, and have said before, that 90s Microsoft was far from perfect, but it at least seemed to care a lot about the quality of Windows. 2020s Microsoft seems to see Windows users as a captive audience it can exploit for whatever the corporate executives fancy at the moment. It looks more like a gradual transition than one big shift.
In any case, it seems increasingly clear that Linux is destined to be the best OS for power users.
I see two basic cases for the people who are claiming it is useless at this point.
One is that they tried AI-based coding a year or two ago, came to the (IMHO completely correct, at the time) conclusion that it was nearly useless, and have not tried it since to see that the situation has changed. To which the solution is: try it again. It has changed a lot.
The other is those who have incorporated hating AI into their personal identity and will never use it. I have seen people do things like fire AI at a task they have good reason to believe it will fail at, and when it does, project that out to all tasks, without letting themselves consciously realize that picking a bad task on purpose stacks the deck.
To those people my solution is to encourage them to hold on to their skepticism. I try to hold on to it as well despite the incredible cognitive temptation not to. It is very useful. But at the same time... yeah, there was a step change in the past year or so. It has gotten a lot more useful...
... but a lot of that utility is in ways that don't obviate skilled senior coding skills. It likes to write scripting code without strong types. Since the last time I wrote that, I have in fact used it in a situation where there were enough strong types that it spontaneously originated some, but it still tends to write scripting code out of that context no matter what language it is working in. It is good at very straight-line solutions to code but I rarely see it suggest using databases, or event sourcing, or a message bus, or any of a lot of other things... it has a lot of Not Invented Here syndrome where it instead bashes out some minimal solution that passes the unit tests with flying colors but can't be deployed at scale. No matter how much documentation a project has it often ends up duplicating code just because the context window is only so large and it doesn't necessarily know where the duplicated code might be. There's all sorts of ways it still needs help to produce good output.
I also wonder how many people are failing to prompt it enough. Some of my prompts are basically "take this and do that and write a function to log the error", but a lot of my prompts are a screen or two of relevant context for the project: what it is we are trying to do, why the obvious solution doesn't work, here's some other code to look at, here are the relevant bugs and some wiki documentation on the planning of the project, we should use {event sourcing/immutable trees/stored procedures/whatever}, interact with me for questions before starting anything. Style transfer is no longer a complete explanation of what they are doing, but a lot of what an LLM really does is still style transfer: it takes "take this and do that and write a function to log the error" and style-transfers it into source code. If you want it to do something interesting, it really helps to give it enough information in the first place for the "style transfer" to get hold of something and do something with it. Don't feel silly "explaining it to a computer"; you're giving the function enough data to operate on.
I can see huge utility with AI as a guide and helper.
But not having one leg in the code myself is not something I am comfortable with. It starts feeling like management rather than development. I feel the abdication very strongly, and it makes me unable and unwilling to put a hard stamp of quality on the result. I have seen too many hallucinations and half-missed requirements to put that much trust in AI.
It's the same with code reviews of hard tickets. You can scroll past and just approve, but do you really understand what your colleague has built? Are you really in the driver's seat? It feels to me like YOLOing with major consequences.
I don't buy, at all, that the people doing 20x output have any idea what they are coding. They are just pressing the YOLO button, and no one (not the engineer, not the AI, and not management) is in the driver's seat. It is a very scary time.
"Also, it seems like all the Copilot 'connected experiences' are really just a chat window without any real integration with the applications they are embedded in."
I was triple-booked today. Two of the meetings in question should have had significant overlap between attendees. I figured, hey, there's this Copilot thing here, I'll ask it what the overlap is, that's the sort of thing an AI should be able to do. It comes back and reports that there is one person in both meetings, and that "one person" isn't even me. That doesn't seem right. One of the autocompleted suggestions for the next thing to ask is "show me the entire list of attendees" so I'm like, sure, do that.
It turns out that the API Copilot has access to can only access the first ten attendees of the meetings. Both meetings were much larger than that.
Insert rant here about hobbling 2026 servers with arbitrary "plucked out of my bum" processing limits sized for roughly 2000-era servers, and about the sheer silliness of a default 10-attendee limit being imposed on any API into Outlook.
But also in general what a complete waste of hooking up an amazingly sophisticated AI model to such an impoverished view of the world.
"say that AI developers should incorporate more real-world diversity into large language model (LLM) training sets,"
Are you kidding me?
How much more "real-world diversity" could they possibly incorporate into the models than the entire freaking Internet and also every scrap of text written on paper the AI companies could get a hold of?
How on Earth could someone think that AIs speak like this because their training set is full of LLM-speak? This is transparently obviously false.
This is the sort of massive, blinding error that calls everything else written in the article into question. Whatever their mental model of AI is it has no resemblance to reality.
LLM speak isn't even quite the average either. It's something more like the average, then pushed through more training to turn it into the agents we think of today (a fresh-off-the-training-set LLM really is in some sense that "fancy autocomplete" that people called it for a while), then trained by the AI companies to be generally inoffensive and do the other things they want them to do. All of the further actions push the agents away from the original LLM average. The similarity of the "LLM tone" across multiple models and multiple companies, and the fact I don't think this tone has been super directly trained for, strongly suggests that the process of converting the raw LLM into the desirable agents we all use is some sort of strong strange attractor for the LLMs that are pushed through that process.
Maybe they are training for that tone now, either deliberately or accidentally. But my belief that they weren't initially comes from the fact that it's a new tone that I doubt anyone designed with deliberation. It bears strong resemblance to "corporate bland", but it is also clearly distinct from it in that we could all tell those two apart very easily.
There is a study that shows that what the model is doing behind the scenes in those cases is a lot more than just outputting those tokens.
For an LLM, tokens are thought. They have no ability to think, by whatever definition of that word you like, without outputting something. The token only represents a tiny fraction of the internal state changes made when a token is output.
Clearly there is an optimum for each task (not necessarily a global one), and a concrete model for a given task can be arbitrarily far from it. But you'd need to test it out for each case, not just assume that "less tokens = more better". You can be forcing your model to be dumber without realizing it if you're not testing.
High-dimensional vectors are thought (insofar as you can define what that even means). Tokens are the one-dimensional input that steers the thought, and the output that renders it. The "thinking" takes place in the high-dimensional space, not in the one-dimensional stream of tokens.
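As a toy numpy sketch of that last point (all sizes and weights are made up and much smaller than any real model; this is not any model's actual internals): the emitted token is a single integer read off a much higher-dimensional internal state.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 512, 1000  # illustrative toy sizes

# The "thought": a high-dimensional hidden-state vector.
hidden = rng.standard_normal(d_model)

# A random stand-in for the unembedding matrix that projects the
# hidden state onto vocabulary logits.
W_unembed = rng.standard_normal((vocab, d_model)) / np.sqrt(d_model)

logits = W_unembed @ hidden
token_id = int(np.argmax(logits))  # the emitted token: one integer

# 512 floats of internal state collapse into a single token id.
print(hidden.size, "->", token_id)
```

The point of the sketch: the arrow only goes one way. You can recompute the token from the hidden state, but not the hidden state from the token.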
But aren't the one-dimensional tokens a reflection of the high-dimensional space? What you see is "sure, let's take a look at that", but behind the curtain it's actually an indication that the model is searching a very specific region of latent space, which might be radically different if those tokens didn't exist. Or not. In any case, you can't just make that claim and treat the two processes as isolated. They might be totally unrelated, but they might also be tightly interconnected.
I assume that in practice filler words do nothing of value. When words add or mean nothing (their weights are basically zero in relation to the subject), I don't see why they'd affect what the model outputs (except to cause more filler words)?
Politeness has an impact (https://arxiv.org/abs/2402.14531), so I wouldn't be too quick to make any kind of claim about a technology whose workings we don't exactly understand.
The existence of science does not obligate us to either receive a double-blind study of massive statistical significance on the exact question we're thinking about or to throw our hands up in total ignorance and sit in a corner crying about the lack of a scientific study.
It is perfectly rational to rely on experience for what screens do to children when that's all we have. You operate on that standard all the time; I know that because you have no choice. There are plenty of choices you must make without data to back you up.
Moreover, there is plenty of data on this topic and if there is any study out there that even remotely supports the idea that it's all just hunky-dory for kids to be exposed to arbitrary amounts of "screen time" and parents are just silly for being worried about what it may be doing to their children, I sure haven't seen it go by. (I don't love the vagueness of the term "screen time" but for this discussion it'll do... anyone who wants to complain about it in a reply be my guest but be aware I don't really like it either.)
"Politicians" didn't even begin to enter into my decisions and I doubt it did for very many people either. This is one of the cases where the politicians are just jumping in front of an existing parade and claiming to be the leaders. But they aren't, and the parade isn't following them.
I've been waiting for the article talking about how AI is affecting COBOL. Preferably with quotes from actual COBOL programmers since I can already theorize as well as the next guy but I'm interested in the reports from the field.
While LLMs have become pretty good at generating code, I think some of their other capabilities are still undersold and poorly understood, and one of them is that they are very good at porting. AI may offer the way out for porting COBOL finally.
You definitely can't just blindly point it at one code base and tell it to convert to another. The LLMs do "blur" the code, I find, just sort of deciding that maybe this little clause wasn't important and dropping it. (Though in some cases where I've encountered this, I understand where it is coming from: when the old code was twisty and full of indirection, I often, as a human, have a hard time being sure what is and is not used just by reading the code too...) But the process is still way, way faster than the old days of typing the new code in one line at a time while staring at the old code. It's definitely way cheaper to port a code base to a new language in 2026 than it was in 2020. In 2020 it was so expensive it was almost always not even an option. I think a lot of people have not caught up with the cost reductions in such porting efforts, and are not correctly factoring them into their cost calculations.
It is easier than ever to get out of a language that has some fundamental issue that is hard to overcome (performance, general lack of capability like COBOL) and into something more modern that doesn't have that flaw.