lmeyerov · 2026-06-05T20:57:12 1780693032

tesla not paying bills: https://www.cnn.com/2025/07/31/us/elon-musk-company-unpaid-l...

x not paying bills: https://www.cnbc.com/2023/02/24/musks-twitter-has-been-sued-...

spacex not paying bills: https://www.fastcompany.com/91124157/spacex-contractors-texa...

root-parent · 2026-06-05T20:58:59 1780693139

It has been leaked that they have a huge open bill with AWS....

lmeyerov · 2026-06-01T19:29:48 1780342188

4-8 quarters for most tech IPOs to settle. IPOs are manufactured for the good times around young co's, so not surprising, and economic stability isn't a question of days/weeks/months.

And yes often a falling knife

This is pretty predictably wall street & federal regulators scamming normal people, retirement funds, etc, taking their fees and exit window at everyone else's expense

JumpCrisscross · 2026-06-01T20:19:56 1780345196

> 4-8 quarters for most tech IPOs to settle

Where are you getting this timeline from?

lmeyerov · 2026-06-01T21:12:28 1780348348

Mostly by having a pulse for the last 10-20 years as someone in the bay area seeing it repeatedly play out as tech IPOs get dumped onto retail investors repeatedly, including the 'good' ones. Being lucky enough to participate in IPOs makes you check these wrt when to balance IPO pop exit (weeks/months) vs long-term tax benefits of holding (2yr+).

- The initial pop is known to be manufactured by banks, so mostly benefits insiders, so good time to diversify. I'm conservative so sold to cover effective basis or whatever risk strategy :)

- The lockup period (6mo) is a similarly known artificial event, and studies show that

- Tech companies take ~8 quarters of prep for the IPO as they do financial engineering to transition from VC growth-at-all-costs to public $, and I'd expect the same for whatever nonsense they pulled to juice numbers to shake out. And that's not including oddballs like the Musk alternate universe, just normal tech companies covering up EBITDA and low interest rate madness.

- Tech is especially volatile as an industry, so even more skepticism here. Eg, the latest IPO I was involved in was a successful professional social network play, and chatgpt killed it.

Most/all of these are googleable things

JumpCrisscross · 2026-06-01T21:58:15 1780351095

Almost every retail investor has a random vibe like this about a market-timing hypothesis. They’re pretty much all cocktail conversation at best.

Lock-up expiry is a real effect. Everything else you mention is Reddit stuff—trading the pop is practically a gamble.

lmeyerov · 2026-06-02T03:33:19 1780371199

? Very much agreed, the IPO pop is a manufactured pricing event focused on investor dynamics rather than direct fair market pricing, making it more of a gamble than normal. Including gambles in index funds defeats the point.

Maybe the confusing point was my involvement is (discounted) pre-IPO shares, which almost by definition, is not an activity accessible to retail investors.

lmeyerov · 2026-05-30T14:55:10 1780152910

Useless russian-troll-style argument:

- With no workers working, no worker fraud problem, sure. If you cut core scientific processes, politicize science, and destabilize paycheck predictability enough to chase everyone good out of science, then yes any small amount of waste is also caught in the cuts.

- This seems to increase what you call bad "fun": Increases abuse of tax funding being corruptly given to projects advocated by political appointees despite rejection by scientific peer review. Vicious feedback loop.

xtiansimon · 2026-05-30T15:22:43 1780154563

> "Useless russian-troll-style argument"

Surprise! I'm just a middle-age American reading HN with his coffee trying to wrap my head around the topic. I don't think this remark helps anyone understand your argument. Doth protest too much.

I'm wondering if you're focused on the "approved" science, and missing the idea this corruption is riding on the back of even a "small amount of waste", and an overall rejection of scientific activities in the face of the replication crisis. All part of the schism of your facts and our facts insanity.

lmeyerov · 2026-05-30T23:02:59 1780182179

R1 work generally doesn't have a replication crisis, and generally incrementalism is the bigger issue there, which is in turn tied to penny pinching

The bigger issue is failure to significantly increase r&d funding, vs last decade+ shrinkages and Trump-era eating of the young, and focuses like you now propose suggest a continuation of such economy-inhibiting thinking. Also, note how your post was goalpost moving. This in turn is classic trolling with asymmetric effort, so I don't see your response in good faith.

xtiansimon · 2026-06-01T11:04:42 1780311882

> “This in turn is classic trolling with asymmetric effort, so I don't see your response in good faith.”

I feel like I commented on someone’s marriage. Peace.

lmeyerov · 2026-05-26T15:35:39 1779809739

Fwiw, the cost per answer, which is what ultimately matters, is going down. In a competitive market with oss and multiple frontier labs, it is hard to maintain a premium long-term.

The big question is how subsidies vs technology improvement will play out. As we saw with Uber, selling at a loss can happen for a very long time, and technology improves relentlessly.

For reference, we publish https://botsbench.com/ that shows time and cost per answer are going down while quality is going up.

lmeyerov · 2026-05-22T16:59:56 1779469196

oss models don't directly matter when multiple at-scale frontier API providers have to compete on price: they are limited in defensible margin

They do matter in that oss researchers enable faster cross-pollination of good inferencing efficiency improvements to help the big boys adapt ideas from the community

Long-term local ai may matter more, but imo not there until models + hw get way better (1-2 years?). Reasoning grade quality at speed is still $$$: we need fast opus, not slow sonnet.

lmeyerov · 2026-05-17T15:32:00 1779031920

Not really. Claude Code harness with Sonnet 4.5 model showed you don't really need bigger GPU rollouts, and it's only a matter of time for OSS combos to hit that. Over time, this will only get better, and the set of enterprise tasks smaller deployments can handle will only go up.

lmeyerov · 2026-05-17T15:26:50 1779031610

For a while, it's felt similar to what we see in parallel computing:

- shift towards throughput-oriented vs latency-oriented. Can juggle more tasks, but increasingly hard to speed up individual ones.

- strong scaling is tough. Might even see slowdowns for individual tasks, so reliable benefits come from being able to juggle more and eat the per-task inefficiency

- amdahl's law: we can't speed up tasks beyond their longest sequential (human) unit, so our work becomes identifying those bits and working on them. Related: you can buy bandwidth, but you can't buy latency
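To put numbers on the Amdahl's law bullet, here is a minimal sketch. The function names and the 20% sequential share are illustrative assumptions, not figures from the comment. It computes the best-case speedup when a fixed fraction of a task stays sequential (for example, the human-review step) and the rest is spread across parallel workers or agents.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Best-case speedup when `serial_fraction` of the work cannot be parallelized.

    Amdahl's law: speedup = 1 / (s + (1 - s) / n)
    where s is the sequential share and n is the number of workers.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def max_speedup(serial_fraction: float) -> float:
    """Limit of the speedup as the number of workers grows without bound: 1 / s."""
    if serial_fraction == 0:
        return float("inf")
    return 1.0 / serial_fraction
```

With a hypothetical 20% sequential share, 5 workers give roughly a 2.78x speedup, and no number of workers gets past 5x. That is the sense in which the remaining work becomes finding and shrinking the sequential part.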

lmeyerov · 2026-05-16T17:01:51 1778950911

It's tough. We run botsbench.com, which tracks AI progress on a top CTF, and I gave a talk at CCC a few months ago on our own results doing AI speed runs, so I think about this a lot.

In the trainings we give (AI agents for security, and a graph masterclass), we ended up leaning into it. For example, we ship with a skills bundle. There are plus sides, like less code-forward participants can go further and are appreciating that, and less of a gap between high-level concepts and successful hands-on. But at the same time, manual work does build a lot of intuition & knowledge that gets missed in auto modes.

nine_k · 2026-05-16T17:09:01 1778951341

Will this bring back the age of LAN parties, where the LAN is disconnected from the internet, and mobile connectivity is blocked?

lmeyerov · 2026-05-16T21:15:50 1778966150

I think that ship has sailed as well --

botsbench.com shows Sonnet 4.5+ with Claude Code harness does pretty well, and Sonnet roughly tracks the edge of what self-hosted models do on the upper tier of affordable GPUs, like running 1-2 DGX Sparks and waiting 6mo for oss to catch up a bit

lmeyerov · 2026-05-11T14:40:21 1778510421

Yes, being comprehensive, so early or blatant cheapo findings do not distract from other ones. That's important for base results. Splitting in both file and task is (currently) important.

Additionally, we run in a loop until it stops finding things, and as part of that, do test amplification when it does find any. We regularly see 3-8 rounds yielding valid results.
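The "loop until it stops finding things" idea can be sketched as follows. This is a hedged illustration, not the commenter's actual tooling: `run_review` is a hypothetical callable standing in for one review pass (for example, an agent invocation), and the cap of 8 rounds is only an assumed safety limit. Test amplification is not modeled here.

```python
from typing import Callable, Iterable, List, Set


def review_until_converged(
    run_review: Callable[[int], Iterable[str]],
    max_rounds: int = 8,
) -> List[List[str]]:
    """Call `run_review(round_index)` until a round reports nothing new.

    Returns the new findings per round. Findings already seen in an earlier
    round (or repeated within the same round) are dropped, so the loop stops
    once the reviewer only repeats itself or reports nothing.
    """
    seen: Set[str] = set()
    rounds: List[List[str]] = []
    for round_index in range(max_rounds):
        # dict.fromkeys removes duplicates within a round while keeping order.
        reported = dict.fromkeys(run_review(round_index))
        new_findings = [finding for finding in reported if finding not in seen]
        if not new_findings:
            break  # converged: this round added nothing
        seen.update(new_findings)
        rounds.append(new_findings)
    return rounds
```

The hard cap matters in practice: a reviewer that rephrases the same issue each round would otherwise never converge under this exact-match comparison.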

IMO half the value is customization to your repo, so copying these and specializing to your repo is super quick and pays off almost immediately. How to find style guides, how to run tests, what dimensions of correctness to look for, etc.

We do a similar loop here: https://github.com/graphistry/pygraphistry/blob/master/agent...

This kind of thing makes me question how important Mythos is for security bug finding - doing a High effort loop with a frontier model in code reviews until convergence has already outperformed human review for us. (Doesn't replace, but does find things we miss, and catches many we do see earlier).

esperent · 2026-05-11T15:30:28 1778513428

How do you prevent it from increasing scope?

That's the main issue I've found from running loops like this. Each loop has ~7 agents, say, looking through different lenses (security, UX, performance, etc.). Each one notes a few issues, each issue gets fixed, you do 5 to 8 loops, as you say. Each individual item that gets fixed looks minor but when you add it all up at the end you've increased PR size and scope significantly.

adamthegoalie · 2026-05-11T15:58:18 1778515098

That is such a good point.

I recently opened a PR against this AI personal finance tool Ray https://github.com/cdinnison/ray-finance/pull/8 to add an Apple Card import feature, since Apple Card is not supported by Plaid.

I built the manual import feature, opened the PR, and then ran a code review.

What I hadn't thought about when I built the feature, was the myriad ways that the implications of importing data from Apple would have to be considered and integrated into the rest of the app, for the manual import to be a first-class feature, not "just a manual import" of data.

I ended up running adamsreview against it like 5-10 times, before considering it complete, as I learned that there was much more to the integration than I realized.

Now is that necessarily a problem? Maybe not. I should have realized from the start that the import feature was going to be much more than just a small feature. But at least, thanks to the review loop, I got it completely right before the PR was merged.

lmeyerov · 2026-05-12T00:43:41 1778546621

Yep, a few views here:

- one wave is code reduction via DRY removals and architectural fixes, and another is adversarial to get rid of false additions, so this helps AI bloat either way

- as the other comment says, underspecification is a problem, so this ends up finding when the implementation, tests, docs, quality guide, and spec are out of sync, with whichever to blame.

- Usable, well-designed, secure, and well-typed code ends up being bigger, so this helps cut to the chase. Ultimately, either you get there or you don't, and this helps cut review burden so you can do your part of it faster and at a higher level.

Funny enough, I'm now playing with gardening agents whose job it is to reduce code. But I wouldn't want to slow PRs on that so view as separate PRs.

azurewraith · 2026-05-12T13:26:46 1778592406

I've had similar experiences when I throw a bunch of agents at a problem... some things get flagged but a lot gets truncated in the summarization step. Per-phase constraints solve this naturally, and I think the problem is better suited to be solved serially. Have each specialized 'review' phase scoped to only read and annotate (even better with a code-owners style read scoping) with max iterations in deterministic code. The scope can't creep past the constraints you've set for it. Scope explosion comes from agents having unbounded tool access and no transition gates between phases... it will overreach if given the opportunity to
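A minimal sketch of the serial, per-phase constraint idea described above, under stated assumptions: `Phase`, `run_phases`, and the `annotate` callback are hypothetical names, and path-prefix matching stands in for a code-owners style read scope. Each phase can only annotate files inside its scope, and the iteration cap lives in plain deterministic code rather than in a prompt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Annotation = Tuple[str, str]  # (file path, note)


@dataclass(frozen=True)
class Phase:
    name: str
    allowed_prefixes: Tuple[str, ...]  # read scope, code-owners style
    max_iterations: int = 3            # hard cap enforced outside the agent


def run_phases(
    phases: Sequence[Phase],
    files: Sequence[str],
    annotate: Callable[[str, str, int], List[str]],
) -> Dict[str, List[Annotation]]:
    """Run review phases one after another; phases only read and annotate.

    `annotate(phase_name, path, iteration)` stands in for an agent call that
    returns notes for one file. Files outside a phase's scope are never passed
    to it, so the phase cannot widen its own scope.
    """
    results: Dict[str, List[Annotation]] = {}
    for phase in phases:
        in_scope = [path for path in files if path.startswith(phase.allowed_prefixes)]
        notes: List[Annotation] = []
        for iteration in range(phase.max_iterations):
            found = [
                (path, note)
                for path in in_scope
                for note in annotate(phase.name, path, iteration)
            ]
            if not found:
                break  # transition gate: nothing new, move to the next phase
            notes.extend(found)
        results[phase.name] = notes
    return results
```

Because the scope filter and the loop bound sit outside the agent, an overeager reviewer can produce at most `max_iterations` rounds of notes on files it was explicitly given.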

lmeyerov · 2026-05-07T04:39:11 1778128751

Maybe a failure to automate?

The volume of people successfully adopting agentic engineering practices suggests this stuff isn't rocket science, but it is a learned skill and takes setup.

A year later into heavy AI coding, my experience is what you're describing should aid in being able to run 5+ agents simultaneously on a project because you know what you're doing, you set it up right, and you know how to tell agents to leverage that properly.

stephenr · 2026-05-07T06:56:50 1778137010

> successfully adopting agentic engineering practices

What's your definition of "successfully"?

More LOC committed per day is probably the only one that's guaranteed when you let spicy autocomplete take the wheel.

I don't think it's at all possible to reason about the other more meaningful metrics in software development, because we simply don't have the context of what each human is working on, and as with the WYSIWYG fad of 3 decades ago, "success" is generally self-reported, by people who don't know what they don't know, and thus they don't know what spicy autocomplete is getting woefully wrong.

"But it {compiles,runs,etc}" isn't a meaningful metric when a large portion of the code in question is dynamic/loosely typed in a non-compiled language (JavaScript, Python, Ruby, PHP, etc).

bdangubic · 2026-05-07T13:23:32 1778160212

If you are on the right team with the right professionals you can measure. when we first started using LLMs we decided to run the same process as if they did not exist, same sprint planning meetings, same estimation. we did this for 6 months and saw roughly a 55% increase in output compared to pre-LLM usage. there are biases in what we tried to achieve, it is not easy to estimate something will take XX hours when you know some portion (for example writing documentation or portions of the test coverage) you won’t have to write but we did our best. after we convinced ourselves of productivity gains we stopped doing this. saying you can’t measure something is typical SWE BS like “we can’t estimate” and the other lies we were able to convince everyone of successfully

pepperoni_pizza · 2026-05-07T12:20:28 1778156428

Also, if your boss tells you "we're AI company now, you will use AI or be fired" then of course you will use AI and claim it is productive.

dodu_ · 2026-05-07T06:19:18 1778134758

Maybe you're the exception and are actually doing it right and actually getting good results, but every time I have heard this, it has been an ignorance-is-bliss scenario where the person saying it is generating massive amounts of code that they don't understand, not because they're incapable but because they don't care to, and immediately wiping their hands of it afterward.

To give an example of where I hear this, it is indistinguishable from the things I hear from my coworkers: "You just need the right setup!" (IMO the actual difference is I need to turn off the part of my brain that cares about what the code actually does or considers edge cases at all) What I actually see, in practice, are constant bugs where nobody ever actually addresses the root cause, and instead just paves over it with a new Claude mass-edit that inevitably introduces another bug where we'll have to repeat the same process when we run into another production issue.

We end up making no actual progress, but boy do we close tickets, push PRs, and move fast and oh man do we break things. We're just doing it all in-place. But at least we're sucking ourselves off for how fast we're moving and how cutting edge we are, I guess.

I dunno, maybe I'm doing it wrong, maybe my team is all doing it wrong. But like I said the things they say are indistinguishable from the common HN comment that insists how this stuff is jet fuel for them, and I see the actual results, not just the volume of output, and there's no way we're occupying the same reality.

lmeyerov · 2026-05-07T15:41:52 1778168512

Yes and no

I've seen productivity surveys of senior programmers that share the reverse, and that matches our experience. A common finding is that gardening projects are a lot cheaper now when they're just a few extra terminal tabs running in parallel - security, refactoring, more testing, etc. Non-feature backlog items that senior developers value around tech debt are less of a discussion now. They're often essential now: to make AI coding work well, there is an effective automation poverty line around verification, testing, and specification that needs to be reached.

The understanding code thing is tough. Eg, when a non-senior fullstack developer manually edits frontend css code and didn't start from pixel-perfect designs across all form-factors, do they really understand what they did? I wrote the first formal mechanized specification of the CSS standard, and would claim 95%+ of web developers do not understand core CSS layout rules to begin with: it was a struggle to semantically formalize even a tiny core of the box model as soon as you have floats. If the AI generates live storybooks and in-tool screenshots of all these things as part of the review process, and doing code review "looks good", what's the difference?

I don't truly think this way - my point is to challenge basic claims of manual coding to be good to begin with and whether AI coding is being held to an artificial standard. What I see in commercial and defense software is a joke compared to what we do in the verification world. AI coding automating review iteration fixes in areas like security engineering and test coverage+amplification has been a blessing for quality improvement.

More fundamentally, we require developers by default to be responsible for knowing what the code does and having tested it. Every case of relaxing that rule has to be explicit, eg, clear that something is a prototype, or an area is vibed with what alternate review/test flow, and we are learning as a team what that means in different situations. In practice, our senior ai coders are doing more quality engineering work than the manual coders, both per-pr and in broader gardening contributions.

dodu_ · 2026-05-07T23:54:10 1778198050

> do they really understand what they did? ...

I know you said you don't truly think that way, but to counter anyway since some people seem to legitimately hold this viewpoint:

I take issue with the implication that not necessarily having a full understanding of what the code/library/driver/compiler/abstraction is doing is somehow justification/permission to embrace and celebrate having basically no understanding of what any of the code is doing. The in-between space there is the vast majority of the surface area where nuance can and should exist.

>my point is to challenge basic claims of manual coding to be good to begin with and whether AI coding is being held to an artificial standard

That's fair, and I can only speak for myself here; I don't have any inherent philosophical issue with manual vs AI, but my personal experience is that AI coding is just straight-up a frustrating nightmare to deal with, IMO orders of magnitude worse than manual. It's faster, sure, but I end my rage-filled LLM debugging session walking away knowing I learned pretty much nothing and that there's no compounding knowledge or outcome that will keep me from experiencing the same thing tomorrow, and I hate that. I am Sisyphus rolling prompts into a terminal.

But I'm not gonna sit here and act like manual coding makes you morally virtuous or pure or whatever. IMO it's a great forcing function to better (even if not completely) understand what is going on in your system(s) and I think most everyone would agree with that. What's up for debate is probably whether that's worth the time tradeoff now that we have a magic time compressor machine available to us.

Maybe I only find that knowledge tradeoff valuable because I'm a lowly IC and not some super turbo chad 10x principal who built a distributed database in brainfuck 10 years ago for fun and has nothing left to learn, or a technical founder of 5 concurrent startups who is optimizing for business value. It's possible that a heavy bias for learning/skill acquisition blinds me here.

>we require developers by default to be responsible for knowing what the code does and having tested it. Every case of relaxing that rule has to be explicit

This sounds pretty reasonable tbh.

human305893 · 2026-05-07T07:52:36 1778140356

1. If what you're replying to was a thing, wouldn't there be an open source project where I could see this in action? Or some sort of example I could watch on YouTube somewhere?

2. The people that talk like this in my company spin up new projects all the time and then just get to hand them off for other teams to clean up the mess and decode what the heck is going on.

lmeyerov · 2026-05-07T15:03:27 1778166207

1. Probably most of https://github.com/simonw , but take care to separate adopted / semi-professional from exploratory personal work

2. That sounds like your company has a weak engineering culture and is early on its upskilling journey. We explicitly separate projects into prototypes vs production, where vibes are fine for the former, eg, demos by designers / data scientists / sales engineers but traditional code review standards for whatever is going into production. That mirrors my qualifier in #1.

I find that success here is a combination of engineering seniority, prompting experience, and domain experience. Anything lacking breaks the automation loop, like not knowing how and what to automate. Ex: All of our team finds value in ai coding, but junior engineers struggle on these dimensions, so are not running the 3+ agents that senior ones are.

necovek · 2026-05-07T04:54:16 1778129656

You seem to have missed OP's point: some things are only encoded in our brains when you are sufficiently experienced.

Translating that into code can happen directly by you, or into prompt iterations that need to result in the same/similar coded representation.

In other words, when it matters how something works and it is full of intricate details, you do not need to specify it, you just do it. A probably imperfect example: you know how to avoid the N+1 query performance issue, so you do not need a ticket or spec to be explicit, and you can just do it at no extra effort. Models are probably OK at this one as it is such a pervasive gotcha, but there are so many more.
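For readers unfamiliar with the N+1 query issue mentioned above, here is a small self-contained sketch using Python's standard `sqlite3` module. The schema, the sample rows, and the `CountingDb` helper are made up for illustration. The first function issues one query for the parent rows plus one per parent, while the second gets the same result from a single join.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database (illustrative schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
        INSERT INTO books VALUES (1, 1, 'Notes'), (2, 1, 'Engines'), (3, 2, 'Compilers');
        """
    )
    return conn


class CountingDb:
    """Thin wrapper that counts how many queries are sent."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.queries = 0

    def query(self, sql: str, params: tuple = ()) -> list:
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def titles_n_plus_one(db: CountingDb) -> dict:
    """1 query for the authors, then 1 more per author: N+1 round trips."""
    result = {}
    for author_id, name in db.query("SELECT id, name FROM authors ORDER BY id"):
        rows = db.query(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
        )
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query(db: CountingDb) -> dict:
    """Same result from one LEFT JOIN, regardless of how many authors exist."""
    rows = db.query(
        """
        SELECT a.name, b.title
        FROM authors a LEFT JOIN books b ON b.author_id = a.id
        ORDER BY a.id, b.id
        """
    )
    result = {}
    for name, title in rows:
        titles = result.setdefault(name, [])
        if title is not None:  # authors without books still get an empty list
            titles.append(title)
    return result
```

With the three sample authors, the first version sends 4 queries and the second sends 1. The gap grows linearly with the number of parent rows, which is why experienced developers avoid the pattern without being told to.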

lmeyerov · 2026-05-07T05:52:01 1778133121

That's the failure to automate. The AI isn't telepathic, so agentic engineers not automating this stuff is skipping out on the engineering part.

You set up the environment and then you do the work. Unless you are switching employers every week, you invest in writing that stuff down so the generation is right-ish and generate validation tooling so it auto-detects the mistakes and self-repairs.

mittensc · 2026-05-07T09:50:26 1778147426

sometimes you write the feature and write it well so it's reusable.

imagine you have to implement a specific algorithm for a quantum computer.

There's no value setting up AI to do the writing for you. That might be orders of magnitude harder than writing the algorithm directly.

For highly specialized one-off features, it doesn't always pay off.

On the other hand, if all you do are some generic items that AI can do well... then I'm not sure you're going to have a job long term, your prompts and automation will be useful for the new junior hires that will be specialized in using these and cost effective.

lmeyerov · 2026-05-07T19:53:18 1778183598

That feels true in theory, but in practice, we see the reverse for advanced projects where AI is helping us a lot. A decent chunk of our core IP falls into the bucket you're describing:

We have been building a GPU-accelerated graph investigation platform that has grown over 10+ years with fancy stuff all over the place - think accelerated query languages, layout kernels, distribution, etc. R&D-grade high performance engineering projects and kernels end up needing a lot of iterations to make a prototype and initial release. Likewise, they're more devilish to maintain when they need a small tweak later because of the sophistication and bus factor. Both phases benefit.

AI coding helps automate investigation, testing, measurement, patching, etc. The immediate effect is we can squeeze in many more experimental iterations with more fidelity and reach. Having an AI help automatically explore the design space and the details helps a LOT. And later, maintaining a wide surface area of code here that is delicate to touch and infrequently edited is traditionally stressful for teammates, and AI editing + AI-generated automation is helping destress that a LOT. We very much invest in upgrading our team, processes, and tooling here.

mittensc · 2026-05-08T06:08:11 1778220491

All right, thank you! I need to re-evaluate then.

timschmidt · 2026-05-07T05:18:06 1778131086

I think there's a level above that where the words to describe such structure are familiar and readily available and hey guess what? The model understands those too. Just about every pattern has a name. Or a shape. Or an analog or metaphor in other languages or codebases. All work as descriptors.

necovek · 2026-05-07T06:42:19 1778136139

This presumes that most of this stays encoded as words in our brains: the effort to translate some of these into words might be similar to translating it into code (still words, just very precise).

It's like talking legalese vs plain English; or formal logic vs English. Some people have the formal stuff come more naturally, and then spitting code out is not a burden.

timschmidt · 2026-05-07T22:21:57 1778192517

No, it really doesn't presume anything about brains or information encoding. Just points out that there is a level of mastery in which all the techniques and all the forms have names or adequate descriptions. Teachers often attempt to achieve this, to facilitate education.

necovek · 2026-05-07T22:35:02 1778193302

It's no accident there is an adage from Aristotle in the vein of: "Those who can, do. Those who understand, teach."

So yes, there is a level of mastery that is beyond being able to do a good job of designing and evolving complex systems which enables people to teach others the same skill set.

However, this is a smaller number of practitioners, and most have learned through practice and looking over how more experienced engineers apply their knowledge.

Where I disagree is that this means everybody is equally capable of teaching with words, or that there are no experts who are bad at teaching (humans or directing AI) — this clearly indicates it is not encoded as words for said experts.

timschmidt · 2026-05-08T00:53:13 1778201593

It's been pretty clear in my experience that experts tend to be capable of working with the same ideas in many different forms. That's what I would call mastery. It implies "complete" knowledge, which probably means several interrelated encodings with loci in different parts of the brain. Those interrelated encodings will be highly associated, and discerning in an expert. Which implies a high degree of usefulness and specificity in communication. This matches my experience.