Hacker News | tekacs's comments

It wasn't even the local-ness so much. Even if they stored it remotely, that would be okay, like ChatGPT or Claude. But unlike the others, for a long time the only way to let it store history on their servers was to also allow them to train on it. I haven't checked whether that's changed.

I think this is so relevant, and thank you for posting this.

Of course it's trivially NOT true that you can defend against all exploits by making your system sufficiently compact and clean, but you can certainly have a big impact on the exploitable surface area.

I think it's a bit bizarre that it's implicitly assumed that all codebases are broken enough that, if you attacked them hard enough, you'd eventually find endlessly more issues.

Another analogy here is to fuzzing. A fuzzer can walk through all sorts of states of a program, but when it hits a password, it can't really push past that because it needs to search a space that is impossibly huge.

It's all well and good to try to exploit a program, but (as an example) if that program _robustly and very simply_ (the hard part!) says... that it only accepts messages from the network that are signed before it does ANYTHING else, you're going to have a hard time getting it to accept unsigned messages.
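A minimal Rust sketch of this structure (the tag function is a toy, NOT real cryptography; a real system would use an HMAC or signature scheme): the point is purely architectural, in that nothing else in the handler runs until the tag check passes, so an attacker, or a fuzzer, must search an impossibly large space to get anything else to execute.

```rust
// Toy keyed tag (illustrative only -- not real crypto).
fn toy_tag(key: u64, payload: &[u8]) -> u64 {
    payload
        .iter()
        .fold(key, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64))
}

fn handle(key: u64, tag: u64, payload: &[u8]) -> Result<(), &'static str> {
    // Gate: verify BEFORE doing anything else with the payload.
    if toy_tag(key, payload) != tag {
        return Err("unsigned or tampered message rejected");
    }
    // ... only now parse / dispatch the payload ...
    Ok(())
}

fn main() {
    let key = 42;
    let msg = b"hello";
    let tag = toy_tag(key, msg);
    assert!(handle(key, tag, msg).is_ok());
    assert!(handle(key, tag ^ 1, msg).is_err()); // flipped tag is rejected
}
```

The "hard part" mentioned above is keeping `handle` robustly this simple: any parsing, logging, or allocation that happens before the check re-opens the attack surface.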

Admittedly, a lot of today's surfaces and software were built in a world where you could get away with a lot more laziness than this. But I could imagine, for example, a state of the world in which we're much more intentional about what we accept and even bring _into_ our threat environment, similar to the shift from network to endpoint security. There are, for sure, a million systems right now with a threat model wildly larger than it needs to be.


Problem is, given the way economic activity is organised in general, there is no transition path from complex, bloated systems to well-designed, completely human-auditable systems. For example, given the inherent (and proven) security risks of the WordPress ecosystem, nobody should run WP anymore.

> whether or not Gemini really does forget what it has seen as easily as claimed

Whoever is writing this seems to have absolutely no clue how AI works.

Given that Google is clear about the fact that they don't train on your emails, the worst that could be happening here is that... within the scope of your account they maintain an extra index or two, or... additional synthesized data, in addition to the many indexes that they already maintain over your email.


Imagine it composing a reply to recipient B that leaks details it "learned" while reading a mail from sender A, details which you did not want to share with B. I have no idea how they organize sessions, indexes and whatever else they use. But if no "side-channels" existed, I would be extremely surprised.

Of course, reading the generated text before clicking "Send" remains the sole responsibility of the user. We all know that drafts get read more or less carefully, especially when in a hurry.


> Whoever is writing this seems to have absolutely no clue how AI works.

The question isn’t really about how AI works. It’s about how Google (the company) works. Do their actions match their stated intentions? Which is really a question of trust. Are they incentivised to lie? Yes. Are they likely to survive a disclosure scandal? Facebook’s experience inclines me to believe yes.


Google has been "clear" on many things in the past that were outright lies.

https://meta.ai/share/pe4HxOfv2Bp

Finding it a little tricky to evaluate because the harness is unfortunately very, very bad (e.g. search is awful). Can't wait to try this in some real external services where we can see how it performs for real.

Definitely getting high-quality results overall. But it's hard to test agentic behavior, and even prose quality, when just working off the default chat interface.

One thing that stands out is that _for_ the quality, it feels very, very fast. Perhaps it's just very lightly loaded right now, but regardless, it feels lovely.

I'm quite impressed with the tone overall. It definitely feels much more like Opus than it does, like, GPT or Grok in the sense that the style is conversational, natural and enjoyable.


This seems pretty good.

"We want to see risks in the models, so no matter how good the performance and alignment, we’ll see risks, results and reality be damned."

I mean, to be fair, these are professional researchers.

I'm very inclined to trust them on the various ways that models can subtly go wrong in long-term scenarios.

For example, consider using models to write email: is it a misalignment problem if the model is just too good at writing marketing emails? Or too good at getting people to pay a spammy company?

Another hot use case: biohacking. If a model is used to do really hardcore synthetic chemistry, one might not realize it's potentially harmful until too late (i.e., the human splits up a problem so that no guardrails are triggered).


"for example, consider using models to write email -- is it a misalignment problem if the model is just too good at writing marketing emails?? or too good at getting people to pay a spammy company?"

But who gets to be the judge of that kind of "misalignment"? giant tech companies?


Might makes right; brains hold reins.

OpenAI have literally gone out of their way to explicitly support this sort of thing. As they did with OpenCode.

Honestly, this just looks like what Dylan of SemiAnalysis suggested on Dwarkesh – that they've massively under-provisioned capacity / under-spent on infrastructure.

That would honestly be a comforting answer if true, because I would gladly take 'we can't afford to do this right now' over 'we are self-preferencing, and the FTC should really take a look at us, even if we're technically not a monopoly right now, since we're the only strongly-instruction-following model in town and we clearly know it'.


OpenAI is burning cash to stay relevant, as I understand it, i.e. they will keep subsidizing.

You can use these tools with most providers today, just without a subscription plan. If you have enough spend, you can likely get bulk deals.


> we are self-preferencing, and the FTC should really take a look at us, even if we're technically not a monopoly right now

Tell me you have zero clue what a monopoly is or what the law is, without telling me.

Monopoly law relies on broad categories, not narrow ones. You can’t call Microsoft a monopoly because they are the only company that makes Windows. You can’t call Amazon a monopoly because they are the only company that makes AmazonBasics. You can’t call Anthropic a monopoly because their product is 20% better for your use case, otherwise by definition no company has any incentive to do a good job at anything.


Somehow this was coming up a few years ago, when people kept saying that Apple could face antitrust because they were the only company who made iOS and controlled the App Store. Given that Android exists, and has roughly equal market share, that didn't make much sense to me, but I kept seeing it discussed.

And Apple did lose that case, so those people were correct; sometimes, one can be a monopolist in the market they created.

> Tell me you have zero clue what a monopoly is or what the law is, without telling me.

Monopoly law is subject to reinterpretation over time, and anybody who has studied its history knows this. The only people who argue for "strict" interpretations of current monopoly law are those who currently benefit from the status quo.

> Monopoly law relies on broad categories, not narrow ones.

And this is currently a gigantic problem. Because of relying on broad categories to define "monopoly", every single supply chain has been allowed to collapse into a small handful of suppliers who have no downstream capacity thanks to Always Late Inventory(tm). This prevents businesses from mounting effective competition since their upstream suppliers have no ability to support such activities thanks to over-optimization.

To be effective on the modern incarnation of businesses, monopoly law needs to bust every single consolidated narrow vertical over and over and over until they have enough downstream capacity to support competition again.


Well, Apple did recently lose as they're the monopolist in their walled garden for app distribution.

Oh, give me a break. I know the law around this incredibly well. Reasonable people can disagree about whether the law is appropriate. The whole point of laws is that they should match intent – and as for '20%': "tell me you don't understand how a small quantitative gap can result in a step change in capability."

> Oh, give me a break. I know the law around this incredibly well.

Then don’t make BS up like implying Anthropic is a monopolist for the crime of competence.

> tell me you don't understand how a small quantitative gap can result in a step change in capability

The law does not give a darn about this. Being a good competitive option does not make you a league of your own. If I invent a new flavor of shake, the Emerald Slide, am I a monopolist in shakes because I’m the only one selling Emerald Slides? If you go and then start a local business reselling shakes and I’m your only supplier, am I a monopolist then? Absolutely not.


You do realize that I called out in my post they are absolutely not a monopoly by the law, right? I know all-too-well what the definition is.

We have a similar situation in mobile where Apple may not be considered a monopoly, but people have walked around for a decade with a supercomputer in their pocket that is wildly underused.

Things have gotten faster; things are different than they were decades ago when a lot of this was devised.

The reality of the matter is that some of us just want to see innovation actually happen apace, and not see 5, 10, or 30 years of slowdown while we litigate whether or not such a company is holding all the cards, while everyone is collectively waiting at the spigot for a company to get its shit together because we're not allowed to fix the situation.

For what it's worth, I'm hopeful that the other model providers will catch up and put us in a situation where this conversation is irrelevant.

What I'm afraid of is a situation where we see continued divergence, and we end up with another Apple situation.


> “we are self-preferencing, and the FTC should really take a look at us, even if we're technically not a monopoly right now”

That is not calling out that they are "absolutely not a monopoly by the law" in any way, shape, or form. You're framing it as though they aren't by a technicality, when they aren't anywhere near discussion by even the most extreme of legal theories. You won't find Lina Khan or Margrethe Vestager, both ousted for going too far, complaining about Anthropic.

> “We have a similar situation in mobile where Apple may not be considered a monopoly, but people have walked around for a decade with a supercomputer in their pocket that is wildly underused.”

In that we can't run a torrent client to download illegally redistributed media 99% of the time? Otherwise, in what way are they underused? Given the degree of public addiction, a more underutilized phone would be a social benefit.


Let me back up what you're saying. They absolutely are not a monopoly today by any definition, by any stretch, in any conceivable way.

I'm looking forward. Things are moving very quickly. As I said above, I'm afraid of us diverging into another Apple situation in the future. If I suggest that they should be looked at and thought about, it's not for today, it's for tomorrow. If divergence continues. Because as with everything in AI, it might hit us a lot faster than people expect. Hell, given their approach to morality, I suspect that Anthropic folks have already thought deeply about these sorts of concerns. That's why it's actually a lot more in character for them to be doing this not due to self-preferencing, but due to unaffordability, which - if you look at my first post - is what I said seems to be happening.

Suffice to say that I have a graveyard of things that I think phones could have been, where unfortunately we've ended up with these - as you say - addicting consumerist messes.

Gonna stop here so I don't flood the thread. We're getting very off topic.


You’re welcome to start OpenSpigot yourself, and see how investors feel about you giving away your technical / IP / market advantage on launch day.

In the app, it now reads:

> current: 2.1.88 · latest: 2.1.87

Which makes me think they pulled it - although it still shows up as 2.1.88 on npmjs for now (cached?).


Too little, too late. Someone has it building now.

https://github.com/oboard/claude-code-rev


This is interesting but I wonder if you would accept that this also has the downside of moving at the speed of humans.

In a situation where you're building, I find the orphan rule frustrating because you can be stuck in a situation where you are unable to help yourself without forking half of the crates in the ecosystem.

Looking for improvements upstream, even with the absolute best solutions for option 1, has the fundamental downside that you can't unstick yourself.
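To make the frustration concrete, here's a minimal Rust sketch. The orphan rule forbids implementing a foreign trait for a foreign type (e.g. `impl serde::Serialize for chrono::DateTime<Utc>` in your own crate), and the standard escape hatch is a local newtype; the trait and type below are stand-ins from `std` so the example is self-contained:

```rust
use std::fmt;
use std::time::Duration;

// You cannot write `impl fmt::Display for Duration` here: both the trait
// and the type are foreign, so the orphan rule rejects it. The usual
// workaround is to wrap the foreign type in a local newtype...
struct Wrapper(Duration);

// ...for which implementing the foreign trait IS allowed.
impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}s", self.0.as_secs())
    }
}

fn main() {
    println!("{}", Wrapper(Duration::from_secs(3))); // prints "3s"
}
```

The newtype works, but it's exactly the "can't unstick yourself" tax: every API that expects the original type now needs wrapping and unwrapping at the boundary.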


This is also where I find it surprising that this article doesn't mention Scala at all. There are MANY UX/DX challenges with the implicit and witness system in Scala, so I would never suggest it directly, but never have I felt more enabled to solve my own problems in a language (and yes, the most complex Haskell-in-Scala libraries can absolutely be an impediment to this).

With AI this pace difference is even more noticeable.

I do think that the way that Scala approaches this by using imports historically was quite interesting. Using a use statement to bring a trait definition into scope isn't discussed in any of these proposals I think?


The problem is existentials, or rather the existence of existentials without the ability to explicitly override them. Even in Haskell, overriding typeclass instances requires turning off orphan checks, which is a rather large hammer.

So once you've identified this, now you might consider the universe of possible solutions to the problem. One of those solutions might be removing existentials from your language; think about how Scala would work if implicits were removed (I haven't used Scala 3, maybe this happened?). Another solution might be to decouple the whole concept of "existential implementations of typed extension points" from libraries (or crates, or however you compile and distribute code), and require bringing instances into scope via imports or similar.

Two things are true for sure, though: libraries already depend on the current behavior, whether that makes sense or not; and forcing users to understand coherence (which instance is used by which code) is almost always a giant impediment to getting users to like your language. Hence, "orphan rules", and why everyone hates Scala 2 implicits.


Yep, familiar with all of this.

That said, I would love to see a solution in my favorite class of solution: where library authors can use and benefit from this, but the average user doesn't have to notice.

I tend to think that the non-existential Scala system was _so close_, and that if you _slightly_ tweaked the scoping rules around it, you could have something great.

For example, if - as a user - I could use `.serialize(...)` from some library and it used _their_ scoped traits by default, but if I _explicitly_ (named) imported some trait(s) on my side, I could substitute my own, that'd work great.

You'd likely want to pair it with some way of e.g. allowing a per-crate prelude of explicit imports that you can ::* import within the crate to override many things at once, but... I think that with the right tweaks, you could say 'this library uses serde by default, but I can provide my own Serializer trait instead... and perhaps, if I turn off the serde Cargo feature, even their default scoped trait disappears'.


That was my first thought! I never had this problem with Scala (2.x for me, but I guess there's similar syntax/concepts in 3).

The article author does talk about naming trait impls and how to use them at call sites, but never seems to consider the idea that you could import a trait impl and use it everywhere within that scope, without extra onerous syntax.

Does this still solve the "HashMap" problem though? I guess it depends on when the named impl "binds". E.g. the named Hash impl would have to bind to the HashMap itself at creation, not at calls to `insert()` or `get()`. Which... seems like a reasonable thing?
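For what it's worth, binding at creation is how Rust's standard library already handles the hasher specifically: `HashMap` carries a `BuildHasher` type parameter fixed at construction via `HashMap::with_hasher`, so later `insert`/`get` calls can never re-resolve to a different impl. A small sketch (the `SumHasher` is a deliberately terrible toy hasher, for illustration only):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hasher};

// Toy hasher: sums the bytes (awful distribution; illustration only).
#[derive(Default)]
struct SumHasher(u64);

impl Hasher for SumHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = self.0.wrapping_add(b as u64);
        }
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

#[derive(Default)]
struct SumBuildHasher;

impl BuildHasher for SumBuildHasher {
    type Hasher = SumHasher;
    fn build_hasher(&self) -> SumHasher {
        SumHasher::default()
    }
}

fn main() {
    // The hasher is chosen here, once; insert/get never pick a new one.
    let mut m: HashMap<String, i32, SumBuildHasher> =
        HashMap::with_hasher(SumBuildHasher);
    m.insert("a".into(), 1);
    assert_eq!(m.get("a"), Some(&1));
}
```

This sidesteps the coherence hazard for `Hash` without named impls at call sites, though it only works because the map's type records the choice; a general named-impl or scoped-import scheme would need the same "bind once, carry in the type" discipline.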


Unfortunately, we're all stuck moving at the speed of the model labs because of the subscription models that they've provided.

The rest of us were able to implement things like push a long time ago, but because Claude Code and Codex stubbed those things out, we couldn't really use them for 'most agent users'.

In fairness to OpenAI, they have been generous in allowing for example OpenCode to sign in with your ChatGPT subscription – so you _could_ build a more powerful agent (which OpenCode is... not) – but unfortunately GPTs' instruction following just isn't up to snuff yet. Hopefully they pre-train something amazing this year!


I mean you can just use /loop in both Claude Code and Codex for heartbeats.

