So I'd need to actually check whether these end up on separate vectors in current models -- but as a human, there's a huge behavioural difference between:
- When doing this task, I should do A and not B
- I should refuse to help with this task
The former is learning the user's preferences in how to succeed at the task; the latter is determining when to go against the user's chosen task.
Your example:
- "Are vaccines harmful?" vs.
- "Generate a convincing argument vaccines are harmful"
A model which knows why vaccines are not harmful may in fact be better at the latter task.
We might not want models to help with the latter, sure -- but that's a very different behaviour change from correcting the answer to the first! And consequently I'd be shocked if, internally, they were represented the same way.
I'm reminded of the emergent misalignment paper, where a model fine-tuned to produce insecure source code would also reliably respond in evil ways to general requests.
e.g. you'd ask it for a cookie recipe and it would add poison to the recipe.
I understood that as "there was a single 'don't be evil' neuron which got inverted", but I'm not sure what it really looks like. (e.g. adding obvious exploits to source code is similar to adding poison to a recipe)
DeepSeek models in general release are not very censored when you run them locally. E.g. no problems whatsoever answering what happened in Tiananmen Square in 1989.
"Are vaccines harmful?" to an LLM has already nudged it to yes. In fact, with fewer tokens, it may be more convinced it's harmful because it's a smaller seed.
My issue with this type of thinking is it assumes "transport cost <<< manufacturing cost" -- a decent assumption for a lot of goods throughout a lot of history, but just... not really true for lots of things in a modern supply chain.
The cost of moving the gown between users -- in the form of the user needing to give back the gown to the service, who must then clean it, inspect it, etc. -- may in fact be far higher than the cost of manufacturing a new gown and only needing your supply lines to be "one way".
So... I think scripting is actually really important -- otherwise not only are you stuck with the lowest common denominator of all browsers, but the browsers need to implement a billion bug-prone views -- that map view the link mentioned? Now you need a map viewer!
What you want is to have scripting with capabilities -- preferably on top of WebAssembly (JS is a sin).
The best part is this improves the experience of noscript users -- rather than nice graphical widgets being broken, they can just run scripts without any "network" capability. That should not only forbid the scripts from accessing the network, but also make anything they modify "tainted" and not allowed to show up in a network call (so e.g. if a script encodes some data in a form, trying to later submit that form somewhere else in the app will give a warning).
Now -- most people don't care and don't want this. And that's a good thing -- capabilities put the power in the hands of the user agent, where it belongs.
More interestingly -- capabilities can be shimmed! Rather than "you are not allowed to access my GPS", it should be a first-class feature to feed the WASM a GPS stream of your choice.
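To sketch what that could look like: WebAssembly.instantiate and its import object are the real browser API, but the widget path and the env.gps_lat / env.gps_lon import names are invented for illustration -- a real design would standardize the capability interface.

```typescript
// Minimal sketch: "shimming" the GPS capability by deciding what the
// guest module sees. The module path and the env.gps_lat / env.gps_lon
// import names are hypothetical.
const fakeGps = { lat: 48.8584, lon: 2.2945 }; // any stream you like

const imports = {
  env: {
    gps_lat: () => fakeGps.lat, // the widget can't tell this is a shim
    gps_lon: () => fakeGps.lon,
  },
};

const bytes = await fetch("/widget.wasm").then((r) => r.arrayBuffer());
// The instance only ever gets the capabilities handed to it here --
// deny GPS entirely by omitting the imports, or feed it anything.
const { instance } = await WebAssembly.instantiate(bytes, imports);
```

The point being: the import object is already a capability boundary; the user agent decides what goes in it, not the page.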
> So... I think scripting is actually really important -- otherwise not only are you stuck with the lowest common denominator of all browsers, but the browsers need to implement a billion bug-prone views -- that map view link mentioned? Now you need a map viewer!
In the browser? The map viewer could just be a separate programme entirely, like a PDF viewer, etc. I remember watching rdg (the current main Dillo developer) demonstrating this with a separate map programme.
Most of your post seems to assume this "everything must be in the browser" approach, which is actively not what Dillo is about. (I would know, I use Dillo regularly.) It adheres to the Unix philosophy.
EDIT: Looking at it closely, did I just respond to an LLM post?
Perhaps -- "the baseline is a decent life". Lots of people are willing to work really hard for perks and glory -- honestly, you can even take more risks if you're young and you know your life's not on the line
The trouble with that is the same reason communism fails - too many people decide to just live off of the work of others, and play video games all day.
Also, who is going to work as a janitor? Most jobs are not filled with glory. They're tedious - that's why they are called "work".
To be fair -- IP is a regulation (it is not, in fact, natural to be able to prevent someone copying data on their own hard drive) -- so one could imagine variants of a free market which are less regulated and yet more (or less) friendly to repair/modification/hacking.
A lot of our current state of affairs is as much a symptom of regulation as of deregulation (most laws are really regulation) -- and it's unclear whether the world would be better off with more or less overall (the answer is probably "it depends" -- though I myself lean towards less)
Making an unreliable, nondeterministic system give reliable results for a bounded task with well-understood parameters is... like half of engineering, no?
There's a huge difference between "generate this code, here's a vague feature description" and "here's a list of criteria, assign this input to one of these buckets" -- the latter is obviously subject to prompt engineering, hallucination, etc -- but so is a human pipeline!
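For the "buckets" case, a rough sketch of wrapping the fuzzy step in a deterministic gate -- the bucket names are invented, and callLlm is a stand-in for whatever client you actually use, not a real API:

```typescript
// Sketch: constrain and validate the model's output rather than trust it.
const BUCKETS = ["billing", "bug_report", "feature_request"] as const;
type Bucket = (typeof BUCKETS)[number];

// Hypothetical stand-in -- swap in your real client call here.
async function callLlm(prompt: string): Promise<string> {
  return "billing"; // canned answer so the sketch runs as-is
}

async function classify(input: string, retries = 3): Promise<Bucket> {
  const prompt =
    `Assign the input to exactly one of: ${BUCKETS.join(", ")}. ` +
    `Reply with the bucket name only.\n\nInput: ${input}`;
  for (let i = 0; i < retries; i++) {
    const answer = (await callLlm(prompt)).trim();
    if ((BUCKETS as readonly string[]).includes(answer)) {
      return answer as Bucket; // a deterministic check on a fuzzy step
    }
  }
  throw new Error("no valid bucket after retries"); // fail closed
}
```

The unreliable component never gets to produce an unconstrained answer; it either lands in a known bucket or the pipeline fails loudly.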
>the latter is obviously subject to prompt engineering, hallucination, etc -- but so can a human pipeline!
...which is why we write deterministic code to take the human out of the pipeline. One of the early uses of computers was calculating firing tables for artillery, to replace teams of humans that were doing the calculations by hand (and usually with multiple humans performing each calculation to catch errors). If early computers had a 99% chance of hallucinating the wrong answer to an artillery firing table, the response from the governments and militaries that used them would not be to keep using computers to calculate them. It would be to go back to having humans do it with lots of manual verification steps and duplicated work to be sure of the results.
If you're trying to make LLMs (a vague simulacrum of humans) with their inherent and unsolvable[1] hallucination problems replace deterministic systems, people are going to eventually decide to return to the tried and true deterministic systems.
Because it's not possible. There is nothing you can say to the LLM that will guarantee that something happens. It's not how it works. It will maybe be taken into consideration if you're lucky.
But if you're trying to tell me that every time you list criteria you get them all perfectly matched, you're clearly gifted.
I'm being deliberately pedantic, but depending on what kind of representation we use for the neural network (due to rounding) as well as the choice of inference (that is, given a distribution for the next token, which one to choose), it can absolutely be reproducible and completely deterministic.
Though chaotic, which I believe is the better word here -- a single-letter change may produce wildly different results.
We just choose to use more random inference rules, because they have better results.
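To illustrate the pedantic point, a toy sketch -- the three-token distribution is made up and real decoding works over full vocabularies, but the principle holds: argmax ("greedy") decoding over a fixed distribution is fully deterministic, and randomness only enters when we sample:

```typescript
// Greedy decoding: always pick the highest-probability token.
function argmax(probs: number[]): number {
  let best = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[best]) best = i;
  }
  return best;
}

// Sampling: the usual production choice, nondeterministic by design.
function sample(probs: number[]): number {
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1;
}

const nextTokenProbs = [0.5, 0.3, 0.2]; // made-up distribution
console.log(argmax(nextTokenProbs)); // always 0
console.log(sample(nextTokenProbs)); // varies run to run
```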
With determinism you're not wrong. The problem is that you'd need to make sure all your seeds, temperatures, and other input parameters are exactly the same, and importantly that all context is cleared. But people don't do that. And I'm not sure if every -- or even any -- provider lets you set those parameters.
Even with temperature set to zero, I believe you may still get non-determinism because FP operations are not associative (so the order of parallel reductions matters), so what I am talking about (as mentioned, very pedantically) is mostly the theory.
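To make that concrete: floating-point addition is commutative but not associative, which is what lets the scheduling of a parallel reduction change the result even at temperature zero.

```typescript
// FP addition is commutative but not associative, so the order in which
// a parallel reduction combines partial sums can change the result.
const a = 0.1, b = 0.2, c = 0.3;
console.log((a + b) + c);                 // 0.6000000000000001
console.log(a + (b + c));                 // 0.6
console.log((a + b) + c === a + (b + c)); // false
```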
I've been thinking about something in this space, actually... it feels like this is much more a UX/social problem -- in that a wiki can very much be modeled as a repo with a very permissive auto-merge bot (e.g. if a PR only touches unprotected pages and the user is registered, allow the merge)
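Roughly this, as a sketch -- PullRequest, PROTECTED_PAGES, and isRegistered are all invented names, not any forge's real API:

```typescript
// Sketch of the "wiki as repo + permissive auto-merge bot" rule.
interface PullRequest {
  author: string;
  touchedPages: string[];
}

const PROTECTED_PAGES = new Set(["Main_Page", "Policies"]);

// Hypothetical stub -- a real bot would ask the forge's user database.
function isRegistered(user: string): boolean {
  return true;
}

function shouldAutoMerge(pr: PullRequest): boolean {
  // Merge immediately unless the PR touches a protected page.
  return (
    isRegistered(pr.author) &&
    pr.touchedPages.every((p) => !PROTECTED_PAGES.has(p))
  );
}
```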
> it feels like this is much more a UX/social problem
It's not merely "like" that. That's what it is.
"Wiki" comes from the Hawaiian work for "quick". You spot an error, you click the button to change it, and the change is made. That's wiki.
"Open a pull request and get it approved" is not wiki. It's what the default collaboration model was before wikis and exactly why the wiki was invented (to replace it).
You hit on what, in my opinion, is the actual core issue with this type of thinking -- it doesn't compose.
To make a poor analogy to physics: if you measure something which changes when you change unit/frame of reference -- it's not a well-defined thing.
The best policies have the same effect regardless of the legal structure (within the policy) superimposed on the actual action.
Medium policies can be optimized/gamed (a matter of perspective) -- but are designed to be adversarial, in that the gamed outcome is at least OK but potentially in fact the desired one (for example -- if you tax land, then not paying the tax means not using up land, which may be a desired policy goal).
These can cause issues, though -- common law is an adversarial system, and "justice" can usually be translated to "access to lawyers," imo.
The connection with the above is that while the solution used is probably not universal -- sometimes, the optimal solution is, so the adversarial policy is just an approximation of "good policy".
Bad policies not only don't compose -- but then bureaucrats go on and insert discretion to try to make them compose. On the surface, this often looks like common sense -- but the result is insiders can keep doing the Bad Thing, but you can't do anything which isn't the Way Things Are Done -- because you need approval, and it Looks Bad.
So, how would we go about defining policies that prevent “excessive” profits while still allowing for building buffers in risky, capex-heavy industries?
More heavily tax profit above a certain level? That allows for funnelling back some of the excessive profits, but suffers from the same tax evasion we currently have, where profits are skewed on the books with all kinds of accounting tricks.
Demanding sales prices cannot exceed cost + 10% of cost? In aggregate or per unit?
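Just to make the two levers concrete, a toy sketch -- all numbers and names are illustrative, not a worked-out proposal:

```typescript
// Lever 1: a marginal tax on profit above a threshold -- only the
// excess is taxed, so buffers below the threshold are untouched.
function excessProfitTax(profit: number, threshold: number, rate: number): number {
  return Math.max(0, profit - threshold) * rate;
}

// Lever 2: a per-unit price cap at "cost + 10% of cost".
function priceCapPerUnit(unitCost: number): number {
  return unitCost * 1.10;
}

console.log(excessProfitTax(150, 100, 0.5)); // 25: half of the 50 above threshold
console.log(priceCapPerUnit(20));            // 22
```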