connorboyle · 2026-06-10T22:45:37 1781131537

A startup that uses agentic coding tools such as Claude Code or Codex is packaging up their entire codebase and sending it directly to their LM provider. Depending on their product, they might be sending it directly to a potential competitor.

Odd times we are living in!

ai-x · 2026-06-10T22:56:00 1781132160

people over-rate how much software/IP is useful in running a successful business. There is genuinely very little IP in this world that needs to be protected. Everyone else is running stupid CRUD apps.

They also over index fear of LargeCo stealing IP from SmallCo. In fact, LargeCo is typically more scared about even the possibility of any product team looking at competitor internals due to lawsuits.

59nadir · 2026-06-11T07:44:29 1781163869

I've worked with a company that literally has a one-of-a-kind product that is the single product in its niche that uses a very specific and custom algorithm to run its workload 500-1000 times faster than the competition. Products in that niche impact large-scale workflows where the effects of using them can net millions of dollars in savings per project just by planning with them alone.

I learned after my contract with them was put on hold that the CEO uses Claude to vibecode experiments on the code base. Not for any good reason, mind you, the algorithm was written by the CTO who emphatically does not use any LLMs.

With Anthropic's reach they could probably make a massively successful product in that market and basically take the entire thing over, if they only knew to look. And I'm 100% certain that they don't actually follow any policies on not using their incoming data.

mxkopy · 2026-06-11T11:51:31 1781178691

This is what bugs me about the whole AI fanaticism thing coming from the top down: what evidence is there that the AI labs aren’t going to try and eat everyone’s lunch after they’ve done whatever they need to do to develop the actual AI? We’ve already seen this with Gemini and OpenAI trying to eat video production and making workflows explicitly for that purpose. What makes people think that Claude isn’t going to do the exact same thing once they get bored of making models? It’ll all be under the guise of “making [lucrative niche] accessible to anyone”; meanwhile they just disappeared the moat that you willingly handed them.

deaton · 2026-06-11T13:28:30 1781184510

We've also seen ample evidence that AI labs are not overly concerned with the legality of how they obtain training data. It's not a stretch to say maybe they look at some other stuff they shouldn't too.

59nadir · 2026-06-11T14:44:50 1781189090

Yeah, I really don't know what people are thinking. We didn't use any LLMs in the development of the project, specifically to avoid leaking anything (though admittedly also because we just didn't think they were particularly useful at the time, even for smaller things). The same CEO is also deathly afraid of people reverse-engineering the application, so I have no idea how he reconciles these two things. I would've thought it's either fine to blast the codebase out there to essentially unknown parties and also fine to deliver a binary without shitting your pants, or it's not fine to do either.

Iolaum · 2026-06-11T13:54:45 1781186085

They (Anthropic) don't need to "look" at the data. They can just use it to train the next model, and then their customers' competitors can ask the new model how to improve their own product :p

freejazz · 2026-06-11T17:32:33 1781199153

Goodbye trade secret!

hnlmorg · 2026-06-10T23:01:36 1781132496

I’d be more scared of a data leak due to LargeCo being hacked than I would about LargeCo prying into the data.

What I don’t trust LargeCo with is personal information. I’ve heard too many horror stories about govs and LargeCos swapping customer nudes or stalking exes to be comfortable with anything personal on those systems. But that’s a whole different topic.

rkozik1989 · 2026-06-11T15:00:24 1781190024

Well, I mean, basically any data leak violates privacy laws and opens you up to lawsuits that are extremely expensive to litigate. Anyone dealing with healthcare/patient data, police customers, military customers, etc. should not be using LLMs in general, or at least not ones that aren't on-premise, because a data leak could bankrupt the business.

hnlmorg · 2026-06-11T15:23:09 1781191389

There is a massive difference between using LLMs as coding agents, and using them to analyze PII like healthcare data.

Eridrus · 2026-06-11T04:56:44 1781153804

In general, I agree with you.

However, in the case of model providers, I think it is a more real concern since it could make it into some training data, and then one of your actual competitors could ask the model to code something up and get your IP.

I sort of assume the frontier AI labs are good about not doing this when they promise not to, but if you don't have airtight restrictions on what your devs are doing, they might be sending it somewhere that hasn't agreed....

physicsguy · 2026-06-11T09:21:13 1781169673

I worked at a very technical engineering software company, and they were super paranoid about the special sauce IP of a product that did analysis of a certain type of data, without being able to see that all the pieces of that special sauce were actually just functions from SciPy strung together, which you could look up in a textbook. Don't get me wrong, you need the right background to understand it and that's not trivial, but if you got someone from the right area you could replicate it pretty easily.

switchbak · 2026-06-10T23:35:52 1781134552

LargeCo is probably struggling under the weight of technical debt and organizational challenges/politics.

I bet if you gave them the Codebase of the Gods, it’d be a heap of hacks inside a couple months.

Peacefulz · 2026-06-11T06:19:46 1781158786

I'm at a growing LargeCo now, and have been entrusted with some internal flows as an associate. I honestly don't know how Ops Managers get through the day. So many pipelines with basically non-existent audit trails. So much money leaking through the cracks in these places that it's criminal. I wouldn't trust these people to hold my beer, let alone sensitive data.

noncoml · 2026-06-11T00:06:19 1781136379

How can you make such bold and generic claims without some data backing it?

IshKebab · 2026-06-11T09:25:47 1781169947

I don't have any data either but I agree with him, based on my experience working for lots of different companies and seeing their attitude to IP, with varying levels of paranoia.

Companies can be really paranoid about IP theft. The worst company I've worked at was Dyson, who are super paranoid. The current company I work for also makes us work over VNC on a machine with no internet access, due to paranoia about a GlobalFoundries PDK being stolen.

In the vast majority of cases, stealing IP would not be useful at all. For example, I worked on a RISC-V CPU. If it was stolen, sure, you might be able to have a decent CPU, but it wasn't very well commented and you'd have none of the people who wrote the code available, so it would be almost as much work to learn the existing code as to do it again.

Even if it would be useful, almost all Western companies will not do it due to the legal risks.

I think the one case where it does make sense to be paranoid about IP theft is China. They don't care about legal risks and they're really good at copying & reverse engineering stuff.

ai-x · 2026-06-11T01:13:40 1781140420

actuaries look for data. visionaries take leaps in faith. There was no data proving LLMs would work at scale. Google waited for the data. OpenAI and then Anthropic took the leap of faith. The result is there for all to see. The core attribute of a successful AI researcher was whether they were AGI-pilled, not whether they were waiting for data on unknown unknowns.

nozzlegear · 2026-06-11T03:28:30 1781148510

> actuaries look for data. visionaries take leaps in faith

Oh, what a whimsical aphorism.

noncoml · 2026-06-11T03:51:22 1781149882

"trust me bro"

sly010 · 2026-06-10T23:20:31 1781133631

> people over-rate how much software/IP is useful in running a successful business

Indeed, by a couple trillions...

raron · 2026-06-11T03:35:49 1781148949

> They also over index fear of LargeCo stealing IP

That seems to be a bold statement considering the whole business of this LargeCo is based on stolen IP.

bob1029 · 2026-06-11T00:30:48 1781137848

Trust and liability are the actual currency in a software business.

Your email domain is significantly more important than whatever is in your corporate GitHub repositories.

tsunamifury · 2026-06-11T00:57:19 1781139439

You could not be more wrong in the aggregate.

Literally how LLMs will continue to learn to code and easily replace whatever you build with them.

Incredible that you could so blithely misunderstand this

drchaim · 2026-06-10T22:47:58 1781131678

and all their keys, because sooner or later, the harness is gonna read them

fastball · 2026-06-11T05:03:34 1781154214

Claude Code is actually very good at not reading your keys these days.

drchaim · 2026-06-11T10:00:06 1781172006

Not the case for me. I tried .envs, ansible-vault and sops, and it always ends up reading the unencrypted ones for some reason. Usually in debugging sessions, it finds a way to read them.

fastball · 2026-06-11T16:24:17 1781195057

Well it reads them, but (at least for me) it reads them in a way where it filters out the actual key values.
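To make "reads them but filters out the key values" concrete, here is a minimal sketch of that kind of filtering. It is not a description of how Claude Code or any other harness actually works; the function name `redact_env` and the rule for what counts as a secret-looking key are assumptions made up for illustration.

```python
import re

# Hypothetical rule: a key is treated as sensitive if its name contains one of these words.
SECRET_KEY_PATTERN = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD)", re.IGNORECASE)

def redact_env(text: str) -> str:
    """Return .env-style text with the values of secret-looking keys masked."""
    redacted = []
    for line in text.splitlines():
        key, sep, _value = line.partition("=")
        is_comment = line.lstrip().startswith("#")
        if sep and not is_comment and SECRET_KEY_PATTERN.search(key):
            # Keep the key name so the reader can still see which settings exist.
            redacted.append(f"{key}=<redacted>")
        else:
            redacted.append(line)
    return "\n".join(redacted)

sample = "API_KEY=sk-123\nDEBUG=true\n# comment\nDB_PASSWORD=hunter2"
print(redact_env(sample))
```

A name-based filter like this only helps when the tool goes through it; anything that reads the file directly still sees the raw values, which is consistent with what both commenters describe.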

ai-x · 2026-06-10T22:57:03 1781132223

One company's irrational fear is a competitive advantage for someone else.

sreekanth850 · 2026-06-11T03:03:23 1781147003

A startup using GitLab, GitHub, or Bitbucket also has the same risk, right?

c0balt · 2026-06-11T10:04:58 1781172298

For self-hosted GitLab or BitBucket, no. GitHub enterprise (self-hosted) also no (though that is rather rare).

sreekanth850 · 2026-06-11T11:31:27 1781177487

We are only talking about SaaS. Every SaaS has access to your data at the disk or storage level.

puttycat · 2026-06-11T12:23:23 1781180603

100%. Companies are paperclip optimizers, with money as the objective. For example, Uber used ride data to circumvent investigations by regulators. There is absolutely no reason to assume that AI companies would not use their data in any way possible to reach their objectives.

skybrian · 2026-06-10T23:37:26 1781134646

Yes, it certainly is an odd situation when some people believe you cannot use Mythos-class models because security while others believe you must do code reviews with Mythos-class models because security.

tobyhinloopen · 2026-06-11T05:01:00 1781154060

You mean these tools you can now rebuild at the cost of a night and one Claude code subscription?

You have to have an extraordinarily unique startup if your software can’t be recreated quickly.

Ifkaluva · 2026-06-10T23:11:36 1781133096

Not just “a startup”! Also, famously, Meta, with their famous AI usage dashboards

stainablesteel · 2026-06-11T02:08:22 1781143702

they would kill their own product if they did this

it would be like if tsmc started designing their own chips to compete with the people they sell their services to, they have more to gain by limiting their participation to a specific corner

connorboyle · 2026-06-10T02:16:19 1781057779

I gave it a question I've been trying to answer for a long time: "What star designation system does Joseph Needham use in Science & Civilization in China? What star is referred to by the designation '4339 Camelopardi' in that book"?

Fable blew me away with its detailed answer[0] showing a chain of references going from J. E. Bode's 1801 catalogue Allgemeine Beschreibung und Nachweisung der Gestirne to Gustave Schlegel's 1875 work Uranographie Chinoise. I was excited, until I checked scanned copies of the cited books and did not actually find any star with the designation "4339 Camelopardi".

Upon following up with Claude, I was forced to downgrade to Opus, which admitted that Fable's answer was likely a hallucination. Ah, well!

[0]: https://claude.ai/share/0252a3f6-3d29-4de8-a893-010181d8b4e7

Aperocky · 2026-06-10T02:24:28 1781058268

> I was forced to downgrade to Opus,

So you were forced to downgrade to opus because you dared to challenge the output of fable?

connorboyle · 2026-06-10T02:37:29 1781059049

I had thought it said something about token usage, but I just clicked on "Switched to Opus 4.8 - Why?" and it says:

> Fable 5 has safety measures that flag messages on most cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Send feedback or learn more.

Perhaps Mythos realizes the true danger in studying Chinese Archaeoastronomy that we mere mortals fail to recognize!

connorboyle · 2026-06-08T20:06:01 1780949161

> The credential-stealing function in the Miasma worm infecting the Microsoft packages was triggered as soon as a developer opened it in AI agents, including Claude Code, Gemini CLI, Cursor, and VS Code. Follow-on attacks are likely to occur in the highly feasible event that credentials were successfully harvested from machines that opened the packages in one of the affected AI agents.

It's really crazy that the most valuable companies in the world are suddenly allowing or even encouraging their employees to run programs whose entire functionality is undefined behavior right on their work computers, with access to important credentials and proprietary source code.

rbanffy · 2026-06-09T08:18:46 1780993126

I think we’ll witness the birth of the single-credential virtual desktop shortly. Remote VSCode in a very constrained environment - with access only for inbound connections from the desktop/thin client, source control, and trusted package repos.

And all serious credentials ephemeral and single-use.

This is why we can’t have nice things.
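As a rough illustration of "ephemeral and single-use" credentials, here is a minimal in-memory sketch. The class name `SingleUseTokens` and its methods are hypothetical; a real system would sit behind a secrets manager or identity provider rather than a Python dict.

```python
import secrets
import time

class SingleUseTokens:
    """In-memory store of tokens that expire after ttl_seconds and work only once."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl_seconds = ttl_seconds
        self._expiry = {}  # token -> expiry time on the monotonic clock

    def issue(self) -> str:
        token = secrets.token_urlsafe(32)
        self._expiry[token] = time.monotonic() + self.ttl_seconds
        return token

    def redeem(self, token: str) -> bool:
        # pop() removes the token, so a second redeem of the same token always fails.
        expiry = self._expiry.pop(token, None)
        return expiry is not None and time.monotonic() <= expiry

store = SingleUseTokens(ttl_seconds=5)
token = store.issue()
print(store.redeem(token))  # first use, within the TTL
print(store.redeem(token))  # replay of the same token
```

The point of the design is that a harvested token is worthless once it has been used or has timed out, which limits what a credential-stealing payload can do afterwards.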

connorboyle · 2026-06-07T03:36:33 1780803393

Another Argentina/cloning-connected story is that President Javier Milei cloned his dog Conan at least four times: https://english.elpais.com/international/2024-04-26/the-myst...

The stories make me wonder if Argentina is a cloning hotspot, though I may be reading too much into two stories.

allthetime · 2026-06-07T03:44:25 1780803865

That is where many of the nazi war doctors who escaped prosecution ended up…

zzzoom · 2026-06-07T04:41:52 1780807312

Not as many as the ones that the US snatched in operation paperclip

mmustapic · 2026-06-07T09:24:39 1780824279

Exactly, the dog was cloned in the US

wahern · 2026-06-07T04:15:48 1780805748

Seems Brazil lost its early lead after cloning Hitler: https://en.wikipedia.org/wiki/The_Boys_from_Brazil_(film)

connorboyle · 2026-06-03T16:31:42 1780504302

Has there been "no progress" on classical prime factorization? What about the AKS primality test, a polynomial-time algorithm to test the primality of a number, published in 2002? (This is not my field of expertise; I'm genuinely curious if there's a good reason to discount this as progress towards efficient prime factorization)

mswphd · 2026-06-03T17:47:49 1780508869

Primality testing was essentially solved in the 70s with Miller-Rabin. AKS made that (randomized) algorithm deterministic, albeit at much higher (polynomial) running-time.
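For readers who haven't seen it, a minimal sketch of the Miller-Rabin test follows. The function name, the small-prime pre-check, and the default of 20 rounds are illustrative choices, not something taken from the comment.

```python
import random

def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin: False means definitely composite, True means probably prime."""
    if n < 2:
        return False
    # Trial division by small primes; this also settles every n below 31 exactly.
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue  # this base does not expose n as composite
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

print(is_probable_prime(2**61 - 1))
print(is_probable_prime((2**31 - 1) * (2**61 - 1)))
```

Each round uses a random base, and a composite number survives a single round with probability at most 1/4, so the error shrinks quickly with more rounds. That randomness is what AKS removes, at the cost of a much slower running time.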

For your overall question, the current record-holders for integer factorization wrote a paper on this a few years ago that is probably a good reference

https://hal.science/hal-03691141/file/cryptography.pdf

The (rough) outline of the paper is that

1. theoretically there's been no progress on factoring in ~30 years

2. practically, there have been both improved hardware + efficient implementations driving the progress. They estimate that current nation-states can (classically) break RSA-1024. The cost would be approximately 500,000 core-years of computation. At current cloud prices this is doable on aws for < $1B.

3. attacks against factoring use a technique ("index calculus") that can also be used to attack finite-field discrete logarithm. There were significant advances on that problem in the 2010s (at least for certain parameters, namely the "small characteristic" setting). An easy way to communicate this is that the RSA factoring record is ~830 bits, while the binary-field discrete logarithm record is > 30,000 bits. These significant advances have not been able to be ported over to factoring, nor have they been ported over to medium/large-characteristic discrete logarithm. It is a (very upsettingly) large open question of whether similar-magnitude improvements are possible more generally for index calculus algorithms.
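The cost figure in item 2 can be sanity-checked with simple arithmetic. The 500,000 core-years comes from the comment; the per-core-hour prices below are hypothetical placeholders, not quoted cloud rates.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours

def cloud_cost_usd(core_years: float, usd_per_core_hour: float) -> float:
    """Convert core-years into a dollar cost at a flat price per core-hour."""
    return core_years * HOURS_PER_YEAR * usd_per_core_hour

core_years = 500_000  # figure quoted in the comment
for price in (0.01, 0.05, 0.10):  # hypothetical prices per core-hour
    cost_millions = cloud_cost_usd(core_years, price) / 1e6
    print(f"${price:.2f}/core-hour -> ${cost_millions:,.0f}M")
```

500,000 core-years is about 4.4 billion core-hours, so even at an assumed $0.10 per core-hour the total is roughly $438M, which is consistent with the "< $1B" estimate.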

cyberax · 2026-06-03T18:37:24 1780511844

> Has there been "no progress" on classical prime factorization?

Not recently. The primality tests don't really help all that much. We've had really fast polynomial-time tests since the '70s.

Think about this idea: the output of the counting function for the number of primes (the prime-counting function, π(x)) lies almost on a smooth curve, x/ln(x), and we can compute logarithms quickly to any precision. So we can easily find the general area of the curve that should contain the current prime. And then we can quickly test if a given number is in fact the prime number within it.

This is probabilistic because the prime distribution is not _strictly_ logarithmic. We can imagine that by computing a logarithm we might end up in the next "bucket" and check for the wrong prime.

The fascinating part is that zeroes of the Riemann zeta function encode these corrections on top of the logarithmic curve. If the Riemann hypothesis is correct, then these corrections are _bounded_ and we simply can not end up in a different "bucket" by accident.
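A quick numerical look at how closely the primes track a smooth curve: the sketch below compares the exact prime count, usually written π(x), with the estimate x/ln(x) from the prime number theorem. The sieve helper `prime_count` is written just for this illustration.

```python
import math

def prime_count(limit: int) -> int:
    """Count primes <= limit with a sieve of Eratosthenes."""
    if limit < 2:
        return 0
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            # Clear every multiple of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return sum(sieve)

for x in (10**3, 10**4, 10**5, 10**6):
    exact = prime_count(x)
    estimate = x / math.log(x)
    print(f"x={x:>8}  pi(x)={exact:>6}  x/ln(x)={estimate:>9.1f}  ratio={exact / estimate:.3f}")
```

The ratio drifts slowly toward 1 as x grows. The gap between the exact count and the smooth estimate is the "correction" term, and the Riemann hypothesis is a statement about how large that gap can get.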

connorboyle · 2026-05-27T23:25:10 1779924310

Wow, it seems that 100% of sev-3 ("critical") incidents in the last year (=365 days) have occurred between April 22, 2026 and now.

Is it possible that there has been a change in the way the data are collected/recorded that even partially accounts for this sudden onset?

gen220 · 2026-05-28T00:41:53 1779928913

One tangent, I believe sev-0 is actually "critical" (at least as how I'm used to reading it), and the higher you go the less critical something is.

IMO, as a GitHub-watcher, I think they changed their definition of what constitutes a sev-0 versus a sev-1 for the better. In particular, they had a few "sev-1"s around the turn of the year that would be classified as sev-0s if they happened today.

Pre-4/22 GitHub sev-1 was a normal SaaS company's sev-0, imo. So I think their new system is more reflective of reality. My guess is that a few of their big customers bullied them to have more accurate SEV categorization.

connorboyle · 2026-05-28T17:42:10 1779990130

Ah, thank you for the correction on sev-0.

To be clear, your observation that "they changed their definition of what constitutes a sev-0" is based just on your external observation of incidents and their designations, correct? I.e. they haven't officially released a statement saying they have changed their standards

lazide · 2026-05-28T00:42:41 1779928961

Waves around it had to break eventually eh?

connorboyle · 2026-05-22T17:57:31 1779472651

They are overstating how much the user experience is degraded in this particular case. But there is a much broader implication to the fact that Google is apparently not properly sanitizing user input to its search engine!
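The thread doesn't say exactly what kind of input handling went wrong, so the following is only a generic sketch of one common form of sanitization: escaping user-supplied text before echoing it into HTML. The function `render_search_heading` is hypothetical and has nothing to do with Google's actual code.

```python
import html

def render_search_heading(query: str) -> str:
    """Build an HTML heading that echoes the user's query safely."""
    # html.escape converts &, <, > and (by default) quote characters into entities,
    # so the query is displayed as text instead of being interpreted as markup.
    return f"<h1>Results for {html.escape(query)}</h1>"

print(render_search_heading('<script>alert("hi")</script>'))
```

The right escaping always depends on where the input ends up (HTML, SQL, a shell command, a URL), so this covers only the HTML case.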

connorboyle · 2026-05-22T17:54:04 1779472444

There's apparently still a lot of user input going unsanitized in 2026.

connorboyle · 2026-05-22T17:39:35 1779471575

Your cynicism appears justified here. Pangram rates the first few paragraphs as "100% AI-generated": https://www.pangram.com/history/d06c8513-9ee3-4a1d-b02f-c1ec...

connorboyle · 2026-05-21T21:19:25 1779398365

The first several paragraphs of this article got a score of "100% AI-generated" on Pangram:

https://www.pangram.com/history/d06c8513-9ee3-4a1d-b02f-c1ec...

squibonpig · 2026-05-21T21:22:27 1779398547

Yeah I noticed it myself, "You asked a simple question. They lobbed a document."

semitones · 2026-05-22T01:22:25 1779412945

Agreed, this article has a lot of tell-tale signs of AI writing... Which is so ironic