
I keep seeing comments that refer to Iranians as "brown people" - usually to emphasize their perceived "otherness" by the ignorant, as in this case. But Iranians aren't brown, and apart from a small minority they aren't Arab either; relatively speaking their culture isn't even that "other" - it would probably feel more familiar to the average American than that of some European countries.

Do Americans really hear "Iran" and think of durka-durka from Team America?


Iranians tend to have a little more pigment in their skin and it's not a minority.

I get why you'd say this, but Iranians don't have particularly dark skin, and some are as white as my English/Swedish ancestors.

> Do Americans really hear "Iran" and think of durka-durka from Team America?

Some do. But usually "killing brown people" is shorthand for the fact that US policy has mostly focused on immiserating non-western-European nations for the benefit of the US.

It implies racism at the core of US policy because only Western European nations are considered civilized and deserving of fair international treatment.


Correct, “Iran” literally translates to “Ayran”.

But America is a big place. Americans living in cities probably know a first or second gen Persian, there’s lots of them everywhere. They even have a reality TV show.

Outside the urban archipelago, the average person couldn't tell you the difference between Iran, India, Turkey, and everything in between.


*Aryan - Ayran is Turkish buttermilk :)

I love this gorgeous and evocative little time waster and come back to it every now and then. Notes:

It starts out buttery smooth but over time its performance slows to a crawl. Changing window geometry seems to do some sort of garbage collection and it speeds back up. I just hit F11 twice real quick.

The optimal strategy is to try and make the trip parabolically with a single large burn at liftoff.

Gravity physics is of course symmetrical on ascent and descent, so the optimum time to start your deceleration burn is approximately when your downward velocity is equal to whatever your upward velocity was when you stopped burning.
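
A quick sanity check of that symmetry as a Python sketch - the gravity and cutoff speed are made-up numbers, not the game's:

    import numpy as np

    g = 1.62          # made-up gravity, m/s^2
    v_cutoff = 50.0   # upward speed at the moment the liftoff burn ends

    # Coast phase: ballistic flight from burn cutoff back down to the
    # cutoff altitude (h = 0 in these coordinates).
    t = np.linspace(0, 2 * v_cutoff / g, 1001)
    v = v_cutoff - g * t
    h = v_cutoff * t - 0.5 * g * t**2

    # Back at the cutoff altitude, downward speed equals the cutoff
    # speed - that's when the mirror-image deceleration burn starts.
    print(h[-1])        # ~0: same altitude as burn cutoff
    print(abs(v[-1]))   # ~v_cutoff: same speed, opposite direction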


The "car-like handling" is still physically accurate - thrusters automatically align your velocity vector to match your view direction. You can think of it as simply an interface - view direction is both a command and a display.

Sort of. They are deterministic in the same way that flipping a coin is deterministic - predictable in principle, in practice too chaotic. Yes, you get the same predicted token every time for a given context. But why that token and not a different one? Too many factors to reliably abstract.

>Yes, you get the same predicted token every time for a given context. But why that token and not a different one? Too many factors to reliably abstract.

Fixed input-to-output mapping is determinism. Prompt instability is not determinism by any definition of this word. Too many people confuse the two for some reason. Also, determinism is a pretty niche thing that is only necessary for reproducibility, and prompt instability/unpredictability is irrelevant for practical usage, for the same reason as in humans - if the model or human misunderstands the input, you keep correcting the result until it's right by your criteria. You never need to reroll the result, so you never see the stochastic side of the LLMs.


>Fixed input-to-output mapping is determinism. Prompt instability is not determinism by any definition of this word

It really depends on your perspective.

In the real world, everything runs on physics, so short of invoking quantum indeterminacy, everything is deterministic - especially software, including things like /dev/random and programs with nasty race conditions. That makes the term useless.

The way we use "determinism" in practice depends contextually on how abstracted our view of the system is, how precise our description of our "inputs" can be, and whether a chunked model can predict the output. Many systems, while technically a fixed input/output mapping, exhibit an extreme and chaotic sensitivity to initial conditions. If the relevant features of those initial conditions are also difficult to measure, or cannot be described at our preferred level of abstraction, then actually predicting ("determining") the output is rendered impractical and we call it "non-deterministic". Coin tosses, race conditions, /dev/random - all fit this description.

And arguably so do LLMs. At the "token" level of abstraction, LLMs are indeed deterministic - given context C, you will always get token T. But at the "semantic" level they are chaotic, unstable - a single token changed in the input, perhaps even as minor as an extra space after a period, can entirely change the course of the output. You understand this, of course. You call it "prompt instability" and compare it to human performance. But no one would call humans deterministic either!

That is what people mean when they say LLMs are not deterministic. They are not misusing the word. It just depends on your perspective.
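
To make the two levels concrete, here's a minimal sketch using GPT-2 via Hugging Face transformers (the model and prompt choices are arbitrary): greedy decoding is bit-for-bit repeatable, yet a one-token perturbation can send the continuation somewhere else entirely.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    ids = tok("The old lighthouse keeper", return_tensors="pt").input_ids
    with torch.no_grad():
        out1 = model.generate(ids, max_new_tokens=20, do_sample=False)
        out2 = model.generate(ids, max_new_tokens=20, do_sample=False)
    assert torch.equal(out1, out2)  # token level: fully deterministic

    # Semantic level: nudge the context by one near-invisible character
    ids2 = tok("The old lighthouse keeper ", return_tensors="pt").input_ids
    with torch.no_grad():
        out3 = model.generate(ids2, max_new_tokens=20, do_sample=False)
    print(tok.decode(out1[0]))
    print(tok.decode(out3[0]))  # may diverge wildly from out1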


But there is no fixed input-to-output mapping in current popular LLMs.

You mean "corporate inference infrastructure", not LLMs. The reason for different outputs at t=0 is mostly batching optimization. LLMs themselves are indifferent to that, you can run them in a deterministic manner any time if you don't care about optimal batching and lowest possible inference cost. And even then, e.g. Gemini Flash is deterministic in practice even with batching, although DeepMind doesn't strictly guarantee it.
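
The usual mechanism is mundane: batching changes the order of the floating-point reductions inside the matmuls, float addition isn't associative, and a logit that shifts in its last bit can flip an argmax near a tie. A toy demonstration of the root cause:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1_000_000).astype(np.float32)

    s1 = float(np.sum(x))                                # one reduction order
    s2 = float(x.reshape(1000, 1000).sum(axis=0).sum())  # a different order
    print(s1, s2, s1 == s2)  # typically unequal in the low-order bits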

This is all currently irrelevant; making it work well is a much bigger problem. As soon as there's paying demand for reproducibility, solutions will appear. This is a matter of business need, not a technical issue.


It always feels like I just have to figure out and type the correct magical incantation, and that will finally make LLMs behave deterministically. Like, I have to get the right combination of IMPORTANT, ALWAYS, DON'T DEVIATE, CAREFUL, THOROUGH and suddenly this thing will behave like an actual computer program and not a distracted intern.

Like the brain

Nobody said anything about Europeans having a "natural right". Bad enough to derail a conversation with irrelevant political nitpicking, unforgiveable to use a strawman to do so. Boo.

It's not irrelevant.

GP made a comparison between what we're going through and the Industrial Revolution. Ignoring the negatives of that revolution - like by acting as though the "new world" was uninhabited/unused and so Europeans had a right to its resources - seems like a bad idea.


> like by acting as though the "new world" was uninhabited/unused and so Europeans had a right to its resources - seems like a bad idea.

maybe it was a bad idea, but that's what happened.


It also doesn't justify doing the same damn thing again, which is exactly what all the people long on this technology fully expect to be allowed to do. Any further investment they have to make to ensure the outcome will just be chalked up to the cost of doing business. And the capital funding all this is in so few hands, and in particular in the hands of such characters that don't concern themselves with not repeating atrocities of the past in new and interesting ways, that it is virtually guaranteed we're on the road to societal-scale disruption. 'Tis the reason such inconvenient points are in need of being pounded home until they are impossible to ignore.

> not repeating atrocities of the past in new and interesting ways

sorry, are you suggesting that colonialism and LLMs are equivalent in terms of atrocity? I don't feel like they're really comparable.

> 'Tis the reason such inconvenient points are in need of being pounded home until they are impossible to ignore.

and what do you think is going to happen here? People are so basic that this will never happen. At best you'd have to create a grassroots political movement with political representation and clear legal aims and get that past the electorate. But look at how the casuals lap up generated content to see how ambitious a vision that is. LLMs will prevail, and even if public boycotts became extreme, the technology will just move further and further behind the curtain and the end outcome will still be the same.

I don't see how derailing conversations on hacker news by taking issue with a particular analogy to grind a colonial axe is really furthering that. At the end of the day, regardless of the perspective of our identity, we'll get fucked by network effects and rounded out of systems by those with more influence and power. Sometimes by those who even share our perspective. So to use perspective as a point of division just further fragments what needs to be a whole to enact change.


I'm afraid you are misremembering. The movie is explicitly eugenicist. The people of the future are explicitly biologically stupid. The opening transcript is unambiguous:

[Man Narrating] As the 21st century began… human evolution was at a turning point.

Natural selection, the process by which the strongest, the smartest… the fastest reproduced in greater numbers than the rest… a process which had once favored the noblest traits of man… now began to favor different traits.

[Reporter] The Joey Buttafuoco case-

Most science fiction of the day predicted a future that was more civilized… and more intelligent.

But as time went on, things seemed to be heading in the opposite direction.

A dumbing down.

How did this happen?

Evolution does not necessarily reward intelligence.

With no natural predators to thin the herd… it began to simply reward those who reproduced the most… and left the intelligent to become an endangered species.


What is "explicitly eugenicist" in observing that the unprecedented way mankind has dominated its environment has changed the selection pressures we are subject to?

My quest to survive to adulthood and pass on my genes looked nothing like the gauntlet a Homo erectus specimen would have run.


Hmm... this sounds a lot like the old RISC vs CISC argument all over again. RISC won because simplicity scales better and you can always define complex instructions in terms of simple ones. So while I would relish experiencing the timeline in which our computerized chums bootstrap into sentience through the judicious application of carefully selected and highly nuanced words, it's playing out the other way: LLMs doing a lot of 'thinking' using a small curated set of simple and orthogonal concepts.

RISC good. CISC bad. But CISC tribe sneaky — hide RISC inside. Look CISC outside, think RISC inside. Trick work long time.

Then ARM come. ARM very RISC. ARM go in phone. ARM go in tablet. ARM go everywhere. Apple make ARM chip, beat x86 with big club. Many impressed. Now ARM take server too. x86 tribe scared.

RISC-V new baby RISC. Free for all. Many tribe use. Watch this one.

RISC win brain fight. x86 survive by lying. ARM win world.


RISC tribe also sneaky. Hide CISC inside.

The LLM has no accessible state beyond its own output tokens; each pass generates a single token and does not otherwise communicate with subsequent passes. Therefore all information calculated in a pass must be encoded into the entropy of the output token. If the only output of a thinking pass is a dumb filler word with hardly any entropy, then all the thinking for that filler word is forgotten and cannot be reconstructed.

Yeah but not all tokens are created equal. Some tokens are hard to predict and thus encode useful information; some are highly predictable and therefore don't. Spending an entire forward pass through the token-generation machine just to generate a very low-entropy token like "is" is wasteful. The LLM doesn't get to "remember" that thinking, it just gets to see a trivial grammar-filling token that a very dumb LLM could just as easily have made. They aren't stenographically hiding useful computation state in words like "the" and "and".

>They aren't stenographically hiding useful computation state in words like "the" and "and".

When producing a token, the model doesn't just emit the final token - you also have the entire hidden states from the previous attention blocks. These hidden states are mixed into the attention blocks of future tokens (so even though LLMs are autoregressive, with each token attending to previous tokens, in terms of the computational graph this means the hidden states of previous tokens are passed forward and used to compute the hidden states of future tokens).

So no, it's not wasteful - those low-perplexity tokens are precisely the spots that can instead be used to plan ahead and do useful computation.

Also, I would not be sure that even the output tokens are purely "filler". If you look at raw CoT, it often has patterns like "but wait!" that are emitted by the model at crucial pivot points. Who's to say the "you're absolutely right" doesn't serve some similar purpose of forcing the model into one direction of adjusting its priors.
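
A stripped-down sketch of that computational-graph point - one head of causal self-attention in plain PyTorch. Position t's output mixes the value vectors of every earlier position, whatever surface token those positions happened to decode to:

    import torch
    import torch.nn.functional as F

    T, d = 6, 16                      # toy sequence length and width
    h = torch.randn(T, d)             # hidden states entering the block
    Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

    q, k, v = h @ Wq, h @ Wk, h @ Wv
    scores = (q @ k.T) / d ** 0.5
    causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))  # no peeking ahead
    out = F.softmax(scores, dim=-1) @ v

    # out[t] is a weighted mix of v[0..t]: earlier positions' hidden
    # states flow into later positions even if the tokens those
    # positions emitted were low-entropy filler.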


Huh okay, there was a major gap in my mental model. Thanks for helping to clear it up.

Well, to be fair, the fact that they "can" doesn't mean models necessarily do it. You'd need some interp research to see if they actually do meaningfully "do other computations" when processing low-perplexity tokens. But the fact that, per the computational graph, the architecture should be capable of it means that _not_ doing this is leaving loss on the table, so hopefully the optimizer would force it to learn to do so.

> They aren't stenographically hiding useful computation state in words like "the" and "and".

Do you know that is true? These aren’t just tokens, they’re tokens with specific position encodings preceded by specific context. The position as a whole is a lot richer than you make it out to be. I think this is probably an unanswered empirical question, unless you’ve read otherwise.


I am quite certain.

The output is "just tokens"; the "position encodings" and "context" are inputs to the LLM function, not outputs. The information that a token can carry is bounded by the entropy of that token. A highly predictable token (given the context) simply can't communicate anything.

Again: if a tiny language model or even a basic markov model would also predict the same token, it's a safe bet it doesn't encode any useful thinking when the big model spits it out.


I just don’t share your certainty. You may or may not be right, but if there isn’t a result showing this, then I’m not going to assume it.

> stenographically hiding

steganographically*

can you prove this?

train an LLM to leave out the filler words, and see it get the same performance at a lower cost? or do it at token selection time?


Low entropy is low entropy. You can prove it by viewing the logits of the output stream. The LLM itself will tell you how much information is encoded in each token.
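
A sketch of that measurement, again assuming GPT-2 via transformers (any causal LM works): score a string and read off, position by position, how many bits of headroom the model's own distribution leaves for the next token.

    import torch
    import torch.nn.functional as F
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    ids = tok("The cat sat on the mat", return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0]        # [seq, vocab]

    # Shannon entropy of the predictive distribution at each position,
    # converted from nats to bits.
    logp = F.log_softmax(logits, dim=-1)
    bits = -(logp.exp() * logp).sum(-1) / torch.log(torch.tensor(2.0))
    for t in range(ids.shape[1] - 1):
        print(repr(tok.decode([ids[0, t + 1].item()])), f"{bits[t]:.2f} bits")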

Or if you prefer, here's a Galilean thought experiment: gin up a script to get a large language model and a tiny language model to predict the next token in parallel; when they disagree, append the token generated by the large model. Clearly the large model will not care that the "easy" tokens were generated by a different model - how could it even know? Same token, same result. And you will find that the tokens that they agree on are, naturally, the filler words.
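
Something like this sketch, assuming gpt2 and gpt2-large as the small/large pair (they share a tokenizer; the prompt is arbitrary):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    small = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    big = AutoModelForCausalLM.from_pretrained("gpt2-large").eval()

    ids = tok("The weather tomorrow is expected to be",
              return_tensors="pt").input_ids
    agree = 0
    with torch.no_grad():
        for _ in range(30):
            t_big = big(ids).logits[0, -1].argmax()
            t_small = small(ids).logits[0, -1].argmax()
            agree += int(t_big == t_small)
            # Always keep the big model's token - it can't tell which
            # "easy" tokens the small model would have produced anyway.
            ids = torch.cat([ids, t_big.view(1, 1)], dim=1)

    print(tok.decode(ids[0]))
    print(f"models agreed on {agree}/30 tokens")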

To be clear, this observation merely debunks the idea that filler words encode useful information, that they give the LLM "room to think". It doesn't directly imply that an LLM that omits filler words can be just as smart, or that such a thing is trivial to make. It could be that highly predictable words are still important to thought in some way. It could be that they're only important because it's difficult to copy the substance of human thought without also capturing the style. But we can be very sure that what they aren't doing is "storing useful intermediate results".


You don't need to compile it yourself though? Unless you want CUDA support on Linux I guess, dunno why you'd need such a silly thing though:

https://github.com/ggml-org/llama.cpp/releases


> dunno why you'd need such a silly thing though

I'm not sure I follow, what alternative to CUDA on Linux offers similar performance?


Ah, 'twas a mere jest, a sarcastic jab that of all the manifold builds provided, the most useful is missing - doubtless for good and practical reasons.

Nevertheless, worth looking at the Vulkan builds. They work on all GPUs!

