
nicebyte · 2026-02-10T21:09:44 1770757784

some of this is what khronos standards are theoretically supposed to achieve.

surprise, it's very difficult to do across many hw vendors and classes of devices. it's not a coincidence that metal is much easier to program for.

maybe consider joining khronos since you apparently know exactly how to achieve this very simple goal...

flohofwoe · 2026-02-10T22:08:57 1770761337

> it's not a coincidence that metal is much easier to program for

Tbf, Metal also works on non-Apple GPUs, with only minimal additional hints needed to manage resources in non-unified memory.

nicebyte · 2026-02-06T19:26:49 1770406009

an assembler is far from trivial, at least for x86, where there are many possible encodings for a given instruction. emitting the optimal encoding that does the correct thing depends on surrounding context, and you'd have to do multiple passes over the input.

jmalicki · 2026-02-07T17:47:22 1770486442

What is a single example where the optimal encoding depends on context? (I am assuming you're just doing an assembler where registers have already been chosen, vs. a compiler that can choose sse vs. scalar and do register allocation etc.)

chris_swenson · 2026-02-07T22:52:45 1770504765

“mov rcx, 0”. At least one assembler (the Go assembler) would at one point blindly (and arguably, incorrectly) rewrite this to “xor rcx, rcx”, which is smaller but modifies flags, which “mov” does not. I believe Go fixed this later, possibly by looking at surrounding instructions to see if the flags were being used, for instance by an “adc” later, to know if the assembler needs to pick the larger “mov” encoding.

Whether that logic should belong in a compiler or an assembler is a separate issue, but it definitely was in the assembler there.
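To make the size/flags trade-off concrete, here is a minimal Python sketch. The byte sequences are the standard x86-64 encodings (a write to `ecx` zero-extends into `rcx`), but `pick_zeroing` and its `flags_live` parameter are hypothetical names for illustration; this is not the Go assembler's actual logic.

```python
# Candidate instructions that all leave rcx == 0 on x86-64.
# Each entry: (mnemonic, machine-code bytes, clobbers_flags)
CANDIDATES = [
    ("xor ecx, ecx", bytes.fromhex("31c9"), True),             # 2 bytes
    ("xor rcx, rcx", bytes.fromhex("4831c9"), True),           # 3 bytes
    ("mov ecx, 0",   bytes.fromhex("b900000000"), False),      # 5 bytes
    ("mov rcx, 0",   bytes.fromhex("48c7c100000000"), False),  # 7 bytes
]

def pick_zeroing(flags_live):
    """Hypothetical helper: shortest candidate that is safe in context.

    flags_live is True when a later instruction (e.g. adc) still reads
    the flags, so flag-clobbering candidates must be skipped.
    """
    safe = [c for c in CANDIDATES if not (flags_live and c[2])]
    return min(safe, key=lambda c: len(c[1]))

if __name__ == "__main__":
    for live in (False, True):
        name, code, _ = pick_zeroing(live)
        print(f"flags_live={live}: {name} ({len(code)} bytes)")
```

Deciding `flags_live` is the hard part: it requires looking at the instructions that follow, which is exactly the context dependence being discussed.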

jmalicki · 2026-02-08T18:50:06 1770576606

Ok fair, I saw that as out of scope for an assembler, since that is a different instruction, not just a different encoding.

nicebyte · 2026-02-10T21:53:05 1770760385

jumps are another one. jmp can have many encodings depending on how far away the target you're jumping to is. but oftentimes the offset is not yet known when you first encounter the jump insn and have to assemble it.
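One common way to handle this is to start every jump at its shortest form and widen until nothing changes. The sketch below assumes a toy program representation (tuples of `"label"`, `"jmp"` and `"bytes"`) and only two jump forms, `jmp rel8` (`EB cb`, 2 bytes) and `jmp rel32` (`E9 cd`, 5 bytes); `relax` is an illustrative name, not any particular assembler's API.

```python
SHORT, NEAR = 2, 5  # jmp rel8 is 2 bytes, jmp rel32 is 5 bytes

def relax(items):
    """Return the chosen size of every jmp, in program order.

    items is a list of ("label", name), ("jmp", target_label) or
    ("bytes", n), where n is the size of some non-jump code.
    """
    sizes = {i: SHORT for i, it in enumerate(items) if it[0] == "jmp"}
    while True:
        # Lay out the program using the current size guesses.
        addr, labels, starts = 0, {}, {}
        for i, (kind, arg) in enumerate(items):
            starts[i] = addr
            if kind == "label":
                labels[arg] = addr
            elif kind == "jmp":
                addr += sizes[i]
            else:  # "bytes"
                addr += arg
        # Widen any short jump whose displacement does not fit in int8.
        # The displacement is relative to the end of the jump itself.
        grew = False
        for i in list(sizes):
            if sizes[i] == SHORT:
                disp = labels[items[i][1]] - (starts[i] + SHORT)
                if not -128 <= disp <= 127:
                    sizes[i] = NEAR
                    grew = True
        if not grew:
            return [sizes[i] for i in sorted(sizes)]

if __name__ == "__main__":
    prog = [("jmp", "a"), ("jmp", "b"), ("bytes", 125),
            ("label", "a"), ("bytes", 3), ("label", "b")]
    print(relax(prog))
```

In the usage example the first jump fits in a short form on the first layout, but widening the second jump pushes label `a` out of range, so another pass is needed. Jumps only ever grow here, so the loop terminates.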

nicebyte · 2025-12-29T18:30:09 1767033009

shameless plug: if you want to understand the content of this post better, first read the first half of my article on jumps [1] (up to syscall). goes into detail about relocations and position-independent code.

[1] https://gpfault.net/posts/asm-tut-4.html

nicebyte · on March 3, 2025

including ai generated illustrations in your articles or presentations is very cringe

nicebyte · on Feb 21, 2025

yeah no. I've mainlined dwm + dmenu all the way back in 200x, I've written tons of makefiles and have the scars to prove it.

These days I'm off of this minimalism crap. it looks good on paper, but never survives collision with reality [1] (funny that this post is on hn front-page today as well!).

[1] http://johnsalvatier.org/blog/2017/reality-has-a-surprising-...

snailmailstare · on Feb 21, 2025

I like these tools because they are minimalist. I don't really care for the fact that they are C/make oriented and would rather help someone rewriting them in go or rust than show that I have a non minimal amount of scar tissue to work with a needlessly complicated past.

nicebyte · on Feb 21, 2025

my comment isn't about things being written using c/make/whatever, it's precisely about the faulty assumption that complexity is needless.

snailmailstare · on Feb 21, 2025

Oh then I totally disagree (or don't understand why you would need to see a psychoanalysis of a blacksmith to evaluate their offerings?). Many projects have places that need some complexity, configuration or advanced tools; that doesn't imply the hardware store should stop selling average hammers or make you wade through an aisle of crap from providers like peloton to see if they better meet your needs.

(I.e. show me where in the article he replaced a standard tool like the hammer or pot with a complex one customized to exactly what he wanted to solve, or explain why that advanced tool wouldn't suck given that there are a lot more details than one would expect.)

skydhash · on Feb 21, 2025

I just went back to fedora+gnome on my PCs from FreeBSD+(tiling wm). I think minimalism is good when your workflow is very focused and you already know the requirements for your stack. But if you have unexpected workflows coming in everyday, the maintenance quickly becomes a burden. Gnome may not be perfect, but it's quite nice as a baseline for a working environment.

yoyohello13 · on Feb 22, 2025

Same. I ran dwm for a long time. These days I just run Gnome. You can make it work very similar to a tiling window manager, and all that random crap the world throws at you (printers, projectors, random other monitors, Java programs) "Just Work".

nicebyte · on Feb 21, 2025

I bet 90% of the reason this is on the front page is the Berkeley mono font. the system itself sucks.

f1shy · on Feb 22, 2025

The first time it was posted I said: I hate the system, but I like the presentation.

The system is great if you like to remember the IPs of the sites you need instead of the urls…

nicebyte · on Feb 18, 2025

How did you draw that conclusion from reading the contents of the link? This is a benchmark.

> We evaluate model performance and find that frontier models are still unable to solve the majority of tasks.

nicebyte · on Feb 5, 2025

I already knew a lot of what was written here but for some reason reading this made me uninstall bumble.

nicebyte · on Jan 29, 2025

I was 11 or 12 when I first saw Clint Eastwood and the video + the song lived in my head rent free. Genius work, and aged well.

nicebyte · on Jan 28, 2025

> they are an extremely unusual person and have spent upwards of $10,000

eh? doesn't the distilled+quantized version of the model fit on a high-end consumer grade gpu?

bakugo · on Jan 28, 2025

The "distilled+quantized versions" are not the same model at all, they are existing models (Llama and Qwen) finetuned on outputs from the actual R1 model, and are not really comparable to the real thing.

raxxor · on Jan 29, 2025

That is semantics and they are strongly comparable with their input and output. Distillation is different to finetuning.

Sure, you could say that only running the 600+b model is running "the real thing"...

KolmogorovComp · on Jan 28, 2025

a distilled version running on another model architecture does not count as using "DeepSeek". It counts as running a Llama:7B model fine-tuned on DeepSeek.

HnUser12 · on Jan 28, 2025

That’s splitting hairs. Most people refer to running locally as in running the model on your own hardware rather than the providing company's.

bakugo · on Jan 28, 2025

Except you're not running the model locally, you're running an entirely different model that is deceptively named.

You can pretend it's R1, and if it works for your purpose that's fine, but it won't perform anywhere near the same as the real model, and any tests performed on it are not representative of the real model.

HnUser12 · on Jan 28, 2025

That’s a good point. Thanks!

lovich · on Jan 28, 2025

Pretty sure this is just layman vs academic expert usage of the word conflicting.

For everyone who doesn’t build LLMs themselves, “running a Llama:7B model fine-tuned on DeepSeek.” _is_ using DeepSeek, mostly on account of all the tools and files being named DeepSeek and the tutorials that are aimed at casual users all being titled with equivalents of “How to use DeepSeek locally”.

KolmogorovComp · on Jan 28, 2025

> “running a Llama:7B model fine-tuned on DeepSeek.” _is_ using DeepSeek, mostly on account of all the tools and files being named

Most people confuse mass and weight, that does not mean weight and mass are the same thing.

lovich · on Jan 28, 2025

Ok, but it seemed pretty obvious to me that the OP was using the common vernacular and not the hyper specific definition.