Hacker News | nicebyte's comments

some of this is what khronos standards are theoretically supposed to achieve.

surprise, it's very difficult to do across many hw vendors and classes of devices. it's not a coincidence that metal is much easier to program for.

maybe consider joining khronos since you apparently know exactly how to achieve this very simple goal...


> it's not a coincidence that metal is much easier to program for

Tbf, Metal also works on non-Apple GPUs, and it needs only minimal additional hints to manage resources in non-unified memory.


an assembler is far from trivial, at least for x86, where there are many possible encodings for a given instruction. emitting the optimal encoding that does the correct thing depends on surrounding context, and you'd have to do multiple passes over the input.


What is a single example where the optimal encoding depends on context? (I am assuming you're just doing an assembler where registers have already been chosen, vs. a compiler that can choose sse vs. scalar, do register allocation, etc.)


“mov rcx, 0”. At least one assembler (the Go assembler) would at one point blindly (and arguably, incorrectly) rewrite this to “xor rcx, rcx”, which is smaller but modifies flags, which “mov” does not. I believe Go fixed this later, possibly by looking at surrounding instructions to see if the flags were being used, for instance by an “adc” later, to know if the assembler needs to pick the larger “mov” encoding.

Whether that logic should belong in a compiler or an assembler is a separate issue, but it definitely was in the assembler there.
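
To put rough numbers on the size difference, here is a minimal Python sketch. The byte sequences are the standard x86-64 encodings; the `zero_rcx` helper and its `flags_live` parameter are hypothetical names made up for illustration, and this is not a description of how the Go assembler actually does it.

```python
# Standard x86-64 encodings that all leave rcx == 0.
ENCODINGS = {
    "mov rcx, 0 (imm32, sign-extended)": bytes.fromhex("48c7c100000000"),
    "mov ecx, 0 (zero-extends into rcx)": bytes.fromhex("b900000000"),
    "xor rcx, rcx": bytes.fromhex("4831c9"),
    "xor ecx, ecx (zero-extends into rcx)": bytes.fromhex("31c9"),
}


def zero_rcx(flags_live: bool) -> bytes:
    """Hypothetical helper: pick the shortest way to zero rcx.

    xor clobbers the arithmetic flags, so it is only safe when no later
    instruction (e.g. adc) reads flags that were set before this point.
    """
    candidates = [
        code
        for name, code in ENCODINGS.items()
        if not (flags_live and name.startswith("xor"))
    ]
    return min(candidates, key=len)


if __name__ == "__main__":
    for name, code in ENCODINGS.items():
        print(f"{len(code)} bytes  {code.hex(' ')}  {name}")
```

The point of the sketch is only that the "best" byte sequence depends on whether flags are live afterwards, which is information that lives outside the instruction being encoded.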


Ok fair, I saw that as out of scope for an assembler, since that is a different instruction and not just a different way to encode the same one.


jumps are another one. jmp can have many encodings depending on how far away the target you're jumping to is. but oftentimes the offset is not yet known when you first encounter the jump insn and have to assemble it.
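
A minimal sketch of that multi-pass process (often called branch relaxation), assuming only two encodings exist: `jmp rel8` (2 bytes) and `jmp rel32` (5 bytes). The item format and the `relax` function are hypothetical, invented for this example; a real assembler has more encodings and more instruction kinds to deal with.

```python
SHORT, NEAR = 2, 5  # sizes of jmp rel8 (EB cb) and jmp rel32 (E9 cd)


def relax(items):
    """Return the final size of each item.

    items is a list of ("label", name), ("jmp", target_label) or
    ("bytes", n) tuples, where ("bytes", n) stands for n bytes of
    other, fixed-size code.
    """
    # Start optimistic: assume every jump fits the short encoding.
    sizes = []
    for kind, arg in items:
        if kind == "jmp":
            sizes.append(SHORT)
        elif kind == "bytes":
            sizes.append(arg)
        else:  # labels take no space
            sizes.append(0)

    changed = True
    while changed:
        changed = False
        # Lay everything out with the current size guesses.
        offsets, labels, pos = [], {}, 0
        for (kind, arg), size in zip(items, sizes):
            offsets.append(pos)
            if kind == "label":
                labels[arg] = pos
            pos += size
        # Grow any short jump whose displacement does not fit in 8 bits.
        for i, (kind, arg) in enumerate(items):
            if kind == "jmp" and sizes[i] == SHORT:
                # Displacement is relative to the end of the jump.
                disp = labels[arg] - (offsets[i] + SHORT)
                if not -128 <= disp <= 127:
                    sizes[i] = NEAR
                    changed = True
    return sizes


# Growing the second jump pushes label "a" out of range of the first
# jump, so another pass is needed.
program = [
    ("jmp", "a"), ("jmp", "b"), ("bytes", 125),
    ("label", "a"), ("bytes", 3), ("label", "b"),
]
print(relax(program))
```

Jumps only ever grow in this scheme, so the loop is guaranteed to terminate; the cost is that it can settle on a layout that is slightly larger than the true optimum.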


shameless plug: if you want to understand the content of this post better, first read the first half of my article on jumps [1] (up to syscall). goes into detail about relocations and position-independent code.

[1] https://gpfault.net/posts/asm-tut-4.html


including ai generated illustrations in your articles or presentations is very cringe


yeah no. I mainlined dwm + dmenu all the way back in 200x, I've written tons of makefiles and have the scars to prove it.

These days I'm off of this minimalism crap. it looks good on paper, but never survives collision with reality [1] (funny that this post is on hn front-page today as well!).

[1] http://johnsalvatier.org/blog/2017/reality-has-a-surprising-...


I like these tools because they are minimalist. I don't really care that they are C/make oriented, and I would rather help someone rewrite them in go or rust than show that I have a non-minimal amount of scar tissue from working with a needlessly complicated past.


my comment isn't about things being written using c/make/whatever, it's precisely about the faulty assumption that complexity is needless.


Oh then I totally disagree (or don't understand why you would need to see a psychoanalysis of a blacksmith to evaluate their offerings?). Many projects have places that need some complexity, configuration or advanced tools. That doesn't imply the hardware store should stop selling average hammers, or make you wade through an aisle of crap from providers like peloton to see if they better meet your needs.

(I.e. show me where in the article he replaced a standard tool like the hammer or pot with a complex one customized to exactly what he wanted to solve, or explain why that advanced tool wouldn't suck given that there are a lot more details than one would expect.)


I just went back to fedora+gnome on my PCs from FreeBSD+(tiling wm). I think minimalism is good when your workflow is very focused and you already know the requirements for your stack. But if you have unexpected workflows coming in every day, the maintenance quickly becomes a burden. Gnome may not be perfect, but it's quite nice as a baseline for a working environment.


Same. I ran dwm for a long time. These days I just run Gnome. You can make it work very similarly to a tiling window manager, and all that random crap the world throws at you (printers, projectors, random other monitors, Java programs) "Just Works".


I bet 90% of the reason this is on the front page is the Berkeley Mono font. the system itself sucks.


The first time it was posted I said: I hate the system, but I like the presentation.

The system is great if you like to remember the IPs of the sites you need instead of the urls…


How did you draw that conclusion from reading the contents of the link? This is a benchmark.

> We evaluate model performance and find that frontier models are still unable to solve the majority of tasks.


I already knew a lot of what was written here but for some reason reading this made me uninstall bumble.


I was 11 or 12 when I first saw Clint Eastwood and the video + the song lived in my head rent free. Genius work, and aged well.


> they are an extremely unusual person and have spent upwards of $10,000

eh? doesn't the distilled+quantized version of the model fit on a high-end consumer grade gpu?


The "distilled+quantized versions" are not the same model at all, they are existing models (Llama and Qwen) finetuned on outputs from the actual R1 model, and are not really comparable to the real thing.


That is semantics and they are strongly comparable with their input and output. Distillation is different to finetuning.

Sure, you could say that only running the 600+b model is running "the real thing"...


a distilled version running on another model architecture does not count as using "DeepSeek". It counts as running a Llama:7B model fine-tuned on DeepSeek.
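
For a rough sense of scale, here is a back-of-the-envelope estimate of weight memory alone, using the parameter counts mentioned in this thread (7B, and 600B as a lower bound for the "600+b" model). The 4-bit quantization and the 24 GB consumer GPU are assumptions picked for illustration, and KV cache and activations are ignored entirely.

```python
def weight_gb(params: float, bits_per_weight: int) -> float:
    """Memory for the weights only, in GB (10**9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


GPU_GB = 24  # assumed VRAM of a high-end consumer card

for name, params in [("7B distill", 7e9), ("600B+ full model", 600e9)]:
    need = weight_gb(params, 4)  # assumed 4-bit quantization
    verdict = "fits" if need <= GPU_GB else "does not fit"
    print(f"{name}: ~{need:.1f} GB of weights, {verdict} in {GPU_GB} GB")
```

So what fits on a single consumer card is the small distilled model, not the full-size one, which is the distinction being argued about here.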


That’s splitting hairs. Most people use "running locally" to mean running a model on your own hardware rather than on the providing company's.


Except you're not running the model locally, you're running an entirely different model that is deceptively named.

You can pretend it's R1, and if it works for your purpose that's fine, but it won't perform anywhere near the same as the real model, and any tests performed on it are not representative of the real model.


That’s a good point. Thanks!


Pretty sure this is just layman vs academic expert usage of the word conflicting.

For everyone who doesn’t build LLMs themselves, “running a Llama:7B model fine-tuned on DeepSeek.” _is_ using DeepSeek, mostly on account of all the tools and files being named DeepSeek and the tutorials that are aimed at casual users all being titled with equivalents of “How to use DeepSeek locally”


> “running a Llama:7B model fine-tuned on DeepSeek.” _is_ using DeepSeek, mostly on account of all the tools and files being named

Most people confuse mass and weight, that does not mean weight and mass are the same thing.


Ok, but it seemed pretty obvious to me that the OP was using the common vernacular and not the hyper specific definition.

