varispeed's comments

Could 2x RTX5080 work just as well?

2x RTX 5080 would be awesome. You'd only be able to run a q6 quant, which is already pretty good, but more importantly you'd be able to use P2P and run Blackwell at full speed, which I can't.

I don't know, I've been using Mythos this week quite sceptically and I found it to be incredibly dumb. For instance, I gave it a dialogue between 3 people and it was constantly mixing up who said what to whom, which looked like early Gemini behaviour. But the latest Opus does that too. It would also make nonsensical inferences about the papers it was given and only correct itself when I pointed out what it had got wrong. If that is what the US government fears... maybe the fear is that someone follows the dumb things the model suggests.

It feels like it's mostly just tuned to up its level of capability on long-horizon tasks: stop context rot and keep persisting at all costs until a goal is done.

The base intelligence does not feel much greater to me.


This is a ridiculous thing to test on it. Other models are trained on that kind of thing, use those instead.

Fable was designed for _really_ hard software engineering problems. Possibly large, but especially hard. For those tasks, you feel the difference immediately.


No it wasn't, Fable is a general purpose model for use in regular chat, analysis, as well as coding.

And yes, the parent poster is accurate, Fable is just as prone to moronic mistakes as Opus was. Stop being so AI-pilled.

Codex is still a better model, and yes, for the hardest engineering problems. I use Claude for UI/GUIs and Codex for all my backend, because I have 20 years of experience in actual hard engineering, and I can see that Codex writes cleaner code and is far more steerable.

Bad engineers think Claude is better because it writes more lines of code and is more "proactive", but more lines of code don't make a better system.


> Fable is a general purpose model for use in regular chat, analysis, as well as coding

This is a forum filled with experts. Putting marketing aside, in a forum like this, it is most useful to assess models according to the toughest problems in the domain they were specifically refined on. For DeepSeek, that's math. For Claude, that's programming. Gemini and ChatGPT are generalist. Yes, you can use every model for anything you like. But Fable is a bit special, it's very expensive, and very clearly designed for particular types of tasks.

> Fable is just as prone to moronic mistakes as Opus was.

"Just as" is up for debate, but yes, all models are capable of moronic mistakes. That's not helpful information though.

> Codex is still a better model

You're comparing agentic workflows, which rely on a lot more than just the underlying model. It sounds like you're using it like a precision instrument, which is great! It's very different from my use cases though, and from the ones that Fable seems to excel at. I'm using it for scientific computing, and you really, really want it to one-shot a solution. It's either the right algorithm for the task, or the wrong one. So for the hardest problems, it needs to successfully implement a solution in effectively one shot. I use Codex too, but it's often too careless for the delicate tasks. If it gets it wrong, it is really hard to steer it back. You have to start from scratch.

> Bad engineers think Claude is better because it writes more lines of code and is more "proactive".

I think you missed the mark on this one. I'm not really an engineer, but I have as much experience in my job as you do in yours. A solution to my problems comprises few lines of code. Fable actually gets it right, first time, every time (so far), but this is with a very long prompt and a bunch of attachments. No other model has done this for me. Not shilling for Anthropic, just impressed. This isn't particularly subjective for me; it is quantitatively measurable.

Don't assume everyone using AI is going to have the same experience you have, or the same types of use cases. And please don't assume that because others have different experiences that it makes them "bad".

Also, Claude has always been mediocre at creative tasks. For your line of work, I would have already recommended Codex hands down.


> This is a forum filled with experts

Half of HN commentators probably work on basic CRUD. Armchair experts, maybe.


I tested it on that too. A problem I usually give a model as a test is to optimise an already well-optimised function that performs certain calculations. I give it a reference to the CPU instruction set, how instructions can be paired to take advantage of the superscalar execution pipeline, etc. In that test too it fell on its face, producing code that was demonstrably slower and had an extra bug.

Interesting, thanks for sharing. That is something I would have expected it to do well on, unless it tripped the internal rerouting. My experience on computational geometry problems has been universally positive (virtually flawless), and falling back to Opus has been a huge and frustrating step back. Opus has been frequently making errors and regressions, Fable never made a single one.

Did Trump write this personally?

> In fact, our safeguards are so strong that many users have complained that they are overly broad.


"i can has the bestest powers of the world, i is the strongest, i is the badest president ever" what a retard

They are unusable (unless you want to deliberately destroy your codebase). So if Cursor's models are Kimi based, then well. I'll skip them altogether.

Kimi works great in their CLI, but their CLI has a number of workarounds for quirks of their models, including detecting when the model gets into a loop, and reverting to a checkpoint but letting the model compose a "message" to its past self (search their CLI for "BackToTheFuture"...) It doesn't work so well in a harness that doesn't take those quirks into account.
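To illustrate the kind of harness workaround described above, here is a minimal sketch of loop detection with a checkpoint revert and a note carried back to the earlier state. This is not Kimi's CLI code: `Session`, `is_looping`, `step`, and the "same action three times in a row" rule are all hypothetical names and assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical agent session: an action history plus checkpoints."""
    history: list = field(default_factory=list)
    checkpoints: list = field(default_factory=list)
    notes: list = field(default_factory=list)

    def checkpoint(self) -> None:
        # Remember how long the history was at this point.
        self.checkpoints.append(len(self.history))

    def revert(self, note: str) -> None:
        # Drop everything after the latest checkpoint (or everything if
        # there is none) and keep a note for the "past self".
        cut = self.checkpoints[-1] if self.checkpoints else 0
        del self.history[cut:]
        self.notes.append(note)


def is_looping(history: list, window: int = 3) -> bool:
    """Assumed rule: the last `window` actions are all identical."""
    if len(history) < window:
        return False
    return len(set(history[-window:])) == 1


def step(session: Session, action: str, window: int = 3) -> bool:
    """Record an action; on a detected loop, revert and return True."""
    session.history.append(action)
    if is_looping(session.history, window):
        session.revert(f"Avoid repeating: {action}")
        return True
    return False


if __name__ == "__main__":
    s = Session()
    step(s, "read file")
    s.checkpoint()
    for _ in range(3):
        reverted = step(s, "run tests")
    print(reverted, s.history, s.notes)
```

A harness without something along these lines would just keep feeding the repeated action back to the model, which may be part of why results differ between tools.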

I'm using Composer extensively, and it works great for me. Your experiences are not universal.

They are far from unusable. They work great for 80-90% of typical full-stack dev work. A lot less useful for more niche stuff.

I wouldn't skip at least testing the original. Model distilling done by Cursor could be the culprit.

Composer 1.x was poor. The new one is a totally different beast and absolutely fine for day to day.

They're not unusable, they're just bad when compared with all the real frontier models.

I only use Composer 2.5 day to day and it works fine with human review.

I had this phone when it was released. I really loved it. But the thing I remember most was using it as a fidget toy. Just opening and closing it. So satisfying.

In other words security by obscurity.

Security by ineffective obscurity is worthless but it’s clearly a continuum and not a buzzword that wins the conversation.

For example, if I had a 128-bit port number that I randomly rotated my service on, you’d be hard pressed to find my service unless I told you the port. That is still obscurity, but clearly closer to a password. IPv4 addresses and 16-bit port numbers are not in that category, because the space is relatively small compared with the resources needed to map it out quickly (i.e. equivalent to a weak password, and also not suitable for public-facing services that need that connection). And obviously relying on this kind of stuff exclusively isn’t wise, but it is valuable as an additional barrier an attacker has to overcome, and it raises the cost of the attack.
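A rough back-of-the-envelope sketch of that comparison, assuming a uniform random search that finds the hidden value after probing half the space on average. The probe rate of one million per second is an arbitrary assumption for illustration, not a measured figure.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def expected_scan_seconds(bits: int, probes_per_second: float) -> float:
    """Expected time to hit one hidden value: half the space / probe rate."""
    return (2 ** bits) / 2 / probes_per_second


if __name__ == "__main__":
    rate = 1_000_000  # assumed probes per second
    for bits in (16, 32, 128):
        secs = expected_scan_seconds(bits, rate)
        print(f"{bits:>3}-bit space: {secs:.3g} s ({secs / SECONDS_PER_YEAR:.3g} years)")
```

Under these assumptions the 16-bit and 32-bit spaces fall in well under an hour, while the 128-bit space is out of reach on any practical timescale.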

I’ll put The Anarchist Cookbook out there [1] as an example, a book even the original author changed his mind on. Without easy recipes, doing all the things in that book requires you to work to gain that knowledge, and that process shapes you into someone who understands and appreciates the consequences of that knowledge and that it’s wise to be careful who you share it with. As it is, there are reasonable links between the book and all kinds of mass violence that was more easily perpetrated. Would those people still have been violent? Possibly. Would there have been as much damage? Possibly less.

[1] https://en.wikipedia.org/wiki/The_Anarchist_Cookbook


I am still on Sonoma. Why would I "upgrade" to macOS 27?

So if China attacks Taiwan and NATO intervenes, how will Canada ensure BYD does not remotely brick the charging infrastructure or make cars suddenly speed up and crash into oncoming traffic?

From Canada's perspective, China is willing to cooperate even through conflict, and the US is literally threatening to invade.

NATO cannot be called in to defend Taiwan. NATO article 6 makes this perfectly clear:

"For the purpose of Article 5, an armed attack on one or more of the Parties is deemed to include an armed attack: on the territory of any of the Parties in Europe or North America... [or] on the islands under the jurisdiction of any of the Parties in the North Atlantic area north of the Tropic of Cancer..."

The US may invoke ANZUS or treaties with Japan and SK.

If China attacks the US directly, such as attacks on US soil, that might change, but it is highly unlikely that NATO would ever get directly involved.


Same question, but for Tesla and the proposed US invasion of Canada.

(one of those things which the POTUS says that we're all told shouldn't be taken as serious or real, as if that wasn't a massive disqualification for him)


We already have the US threatening to invade for absolutely no reason while the American people stand by and keep arguing about the scandal of the day. I think this argument has sailed for Canadians. The US is now a very unreliable business partner, nothing else.

We can get fucked on both sides but will do business with those that don't want to destroy our economy.


How does it help?

By withholding it from bad actors.

It withholds it from good actors (they cannot use it to harden their code against bad actors) and assumes bad actors don't have access to such tools anyway.

Because they don't. That's the whole point.

It's been refusing work not related to cybersecurity and claiming it is related to cybersecurity and then blocking the session.
