Hacker News | smoe's comments

I think we too often treat other people’s jobs like spherical cows out of ignorance. Not just AI researchers.

Long before LLMs, programmers regularly and massively underestimated how hard it is to automate other people’s work. Knowledge workers often think carpenters just bang nails into wood, while blue-collar workers think of knowledge work as sitting in front of a screen copying values from Excel on the left into a form on the right while sipping a latte.

Only like 2.5 years ago, I thought programming would be one of the last knowledge worker jobs to be significantly affected by LLMs, not one of the first. I think AI models will continue to be very impactful. But for quite a while, they may mostly turn knowledge work into a last mile problem rather than eliminating it.


Earlier this week I started testing Chinese models on my codebase. I haven’t really looked at interactive coding yet, but more at issue triage, bug auto-fixing, log analytics, etc.

I used DeepSeek, Kimi, GLM, Qwen, and MiMO, with GPT-5.5 high as a reference, all running in the Pi harness without anything installed.

So far, Kimi and MiMO look the most promising to me. I haven’t tested them rigorously enough to make a strong statement, but my first impression is that, in practice, all those models may be less behind on typical daily tasks than people think.

They are a bit “work hard, not smart”: they get to same-ish results more slowly and use more tokens, but at a fraction of the price.
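A back-of-the-envelope sketch of why "more tokens at a fraction of the price" can still come out ahead. All token counts and prices below are hypothetical placeholders, not real pricing for any of the models mentioned, and `task_cost` is just an illustrative helper assuming a single flat per-token price.

```python
def task_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of one task, assuming one flat per-token price (a simplification)."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical numbers for illustration only, not actual model pricing:
# the cheaper model uses 3x the tokens at 1/10th of the unit price.
frontier = task_cost(200_000, 10.0)
cheaper = task_cost(600_000, 1.0)
print(f"frontier: ${frontier:.2f} per task, cheaper: ${cheaper:.2f} per task")
```

Under this simplified model the cheaper model wins whenever its token overhead (3x here) is smaller than the price ratio (10x here). It ignores wall-clock time, which the slower model also costs you.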


I just did a little comparison using benchmarks for GPT 5.1 through 5.4 to map out the equivalent capability-level of some of the Chinese models.

Based on these benchmarks, here's a rough mapping:

- Qwen 3.7 ~= GPT 5.3

- Kimi K2.6 ~= GPT 5.15

- DS V4 ~= GPT 5.1

So yes, we have GPT 5 at home now. No need to pay the Legacy Labs anymore.

Here's the benchmark I used since I can't post images here: https://x.com/trydotworks/status/2058004995195490706?s=20


I switched to predominantly using MiMO this week, mostly out of curiosity to see how dependent I was on frontier models. Honestly, I can't really tell the difference. I would say I work on pretty average codebases with well-known frameworks doing pretty typical things, and my initial impression is that MiMO, Kimi, and DeepSeek can probably handle what I need more or less as well as GPT-5.5 or Claude.


I personally really like DS4 Flash: it's the largest model I can run locally at decent speeds, and I feel like it's good enough to maintain a codebase with less effort.


What hardware and quant do you run it with?


Maybe I need to give it a second chance. Surprisingly, Kimi 2.6 consistently failed even to generate a valid JSON plan, where Gemma 4 was doing really well, but slowly.


Are you going through OpenRouter or direct? I’ve had nothing short of excellent results from Kimi.


At least according to this, GPT-5.5 Cyber is on par with Mythic: they are the only two models that were able to finish the 32-step corporate network attack simulation.

https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5...


My worry is that, at least among the artists I know, many kept themselves afloat early in their careers by doing commercial freelance jobs like illustrations for local events or companies. Those kinds of jobs might largely vanish.

On the other hand, with the internet inevitably becoming swamped by AI generated content, I can definitely see a de-digitalization of art moving into offline spaces. At least for independent work, you don’t necessarily need mass appeal or exposure, but rather access to individuals and small groups with an actual willingness to pay for art.


I suppose those same artists, at least the smarter ones, might start using AI for basic commercial freelance jobs and just act as the human reviewer, perhaps making some final adjustments to the finished artwork.

So instead of being paid a small amount of money for something that they spend hours on, they create 10 artworks in that same number of hours and earn 10x what they did previously.


> I suppose those same artists, at least the smarter ones, might start using AI for basic commercial freelance jobs and just act as the human reviewer, perhaps making some final adjustments to the finished artwork.

And how do they create demand for this? AI "enthusiasts" are enthusiastic about it exactly because they feel they don't need to outsource things to meatbags.


Innocent. As if demand won't lower prices?


I think you mean supply. Demand usually causes prices to go up.


> Those kinds of jobs might largely vanish.

They have already largely vanished.


That's not art though, and while it might have paid a small amount of money, it can also be incredibly degrading and soul crushing. That's the kind of work that AI tools are doing now. Those jobs should vanish. People shouldn't need to degrade themselves for money, we can have a system where people are generally taken care of, and the people who build extra cool shit can live even better.


> That's not art though

Why not? Would you also argue that most of the works by painters like Rembrandt, such as The Night Watch, aren't art just because they were contracted to make them? Does book cover art stop being art the second a book's title gets placed on it?

And sure, plenty of corporate work is boring and soulless. But the worst of that switched to spending 10 minutes with clipart and PowerPoint decades ago: if you were still hiring an artist, you cared at least a little about what the result looked like, which means there was at least some space for artistic vision.

> People shouldn't need to degrade themselves for money, we can have a system where people are generally taken care of

We should, but we don't. What's your proposal for letting artists grow and mature while paying their bills in the meantime? AI is currently killing their "degrading" jobs, do you think forcing them to take a shift at McDonald's is going to help their artistic career advance?


Rembrandt would argue that he was a craftsman, although some of the liberties he took in stuffing his paintings with hidden innuendo and symbolic jokes at the expense of some of his clients definitely make many of his paintings works of art.

Alas, only when taking a shift at McDonald's becomes equally obsolete, and it has been made apparent that any task or job humans do could also be done by technology, will it help the artist in your example with their artistic career.

It is remarkable, when you think about it, that artists seem to be the first people that are made to feel obsolete. There are plenty of jobs that could have been fully automated, steampunk-style, from the moment the industrial revolution took hold.

Maybe it becomes slightly less remarkable if you take into consideration that collecting and investing in art has always been an integral part of the lives of people of considerable wealth. Even if they did not care much for it, didn't understand any of it, or were only motivated by the money, being very wealthy, regardless of your field, forced you into developing at least some connection with art. The billionaires in tech all seem to be an exception to this rule, and their lack of any connection with art may have made them feel that art is the easiest of all to replace using generative software. And for them, this was probably true: lacking that connection, they never developed any taste or eye for quality in art, so they're easily pleased with something a computer makes for them.

If artists are the only ones being actively excluded, though, people in other jobs will never fully appreciate that, given the effort, their jobs are just as easily automated. Once people in every possible job have been made to feel just as obsolete, the world may be ready to order itself based on individual preference and mutual appreciation of whatever it is you choose to do 'for a living'.


Having used Python on and off for 20 years, my experience with LLMs writing Python has been mixed. I don’t think that’s necessarily because of a low-quality dataset, but rather because Python’s applications are so broad and the language has gone through several paradigm shifts over time: sync vs. async, typed vs. untyped, scientific Python looking very different from web application code, some people really wishing it were an FP language, and others doing the clean-architecture OOP onion soup. It has gotten so fragmented.

Recently, I had a more pleasant experience using LLMs with Go. It reminds me a bit of Python 2.x, when the community seemed, in my view, more focused on embracing a stupid simple language, with everyone trying to write roughly similar "Pythonic" code.


> Having used Python on and off for 20 years, my experience with LLMs writing Python has been mixed. I don’t think that’s necessarily because of a low-quality dataset, but rather because Python’s applications are so broad and the language has gone through several paradigm shifts over time

If there’s one language that is the prime example of this, it’s C++, and according to this benchmark it ranks incredibly high.

I’m also thoroughly confused as to why Kimi 2.6 scores 83% while Opus 4.7 scores 67% for C++, and GPT-5.5 isn’t even in the top 10.

Gemma 4 31B scores 100% success rate for Python (!!) while Opus 4.6 only 65%.

This benchmark really seems to be all over the place and doesn’t make sense.


The more filters you apply (single model and single language, especially if you also filter by pipeline like agentic vs one-shot), the fewer samples, so there is variance. Known limitation that is inevitable with any finite budget. This is why we are selective about adding more languages because it will dilute the amount of samples we can run per language per model. But the aggregated statistics hold up well and are very consistent in our testing.
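To illustrate why a single-model, single-language slice gets noisy: a minimal sketch using the Wilson score interval for a pass rate. This is a generic statistics example, not necessarily the benchmark's own methodology, and the sample sizes are hypothetical, since the thread doesn't say how many runs there are per cell.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial pass rate."""
    if n == 0:
        return (0.0, 1.0)  # no samples: the rate could be anything
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to guard against tiny floating-point overshoot at p = 0 or 1.
    return (max(0.0, center - half), min(1.0, center + half))


# Same observed pass rate (~67%) at hypothetical sample sizes.
for n in (12, 60, 600):
    lo, hi = wilson_interval(round(0.67 * n), n)
    print(f"n={n:4d}: {lo:.2f} to {hi:.2f} (width {hi - lo:.2f})")
```

The interval narrows roughly with 1/sqrt(n), so two models a dozen points apart on a small filtered slice can easily have overlapping intervals.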


I just applied a single filter, programming language.


I honestly think that, given the sorry state of the pre-GenAI internet, with all the SEO optimization nonsense, clickbait, and supplement peddling everywhere, LLMs are for now actually better than Google for “doing your own research” on many things.

At least at the entry level. Once you want to go in depth, the outcome, in my experience, is the same as with LLM use on any topic: it depends heavily on the domain knowledge of the prompter and their ability to steer it.


In my opinion, it has always been the “easy” part of development to make a thing work once. The hard thing is to make a thousand things work together over time with constantly changing requirements, budgets, teams, and org structures.

For the former, greenfield projects, LLMs are easily a 10x productivity improvement. For the latter, it gets a lot more nuanced. Still amazingly useful in my opinion, just not the hands off experience that building from scratch can be now.


I’m the same. Often the first step is a time-boxed exploration, just trying to make the key pieces work in any way to encounter major blockers as early as possible. No planning, no design, not following any best practices, often all in a single file. Then from there, either refactor/rewrite or just use it as input for planning.

Of course, it requires some discipline to not just yolo the prototype into production when that’s not appropriate.


It is kind of funny that throughout my career, there has always been pretty much a consensus that lines of code are a bad metric, but now with all the AI hype, suddenly everybody is again like “Look at all the lines of code it writes!!”

I use LLMs all day every day, but measuring someone or something by the number of lines of code produced is still incredibly stupid, in my opinion.


Microsoft never got that memo. They still measure LoC because it’s all MBAs.


Fuck, is there a way to have that degree and not be clueless and toxic to your colleagues and users?


It all comes from "if you can't measure it you can't improve it". The job of management is to improve things, and that means they need to measure it and in turn look for measures. When working on an assembly line there are lots of things to measure and improve, and improving many of those things has shown great value.

They want to expand that value into engineering and so are looking for something they can measure. I haven't seen anyone answer what can be measured to make a useful improvement though. I have a good "feeling" that some people I work with are better than others, but most are not so bad that we should fire them - but I don't know how to put that into something objective.


Yes, the problem of accurately measuring software "productivity" has stymied the entire industry for decades, but people keep trying. It's conceivable that you might be able to get some sort of more-usable metric out of some systematized AI analysis of code changes, which would be pretty ironic.


There’s this really awful MBA tool called a “9-box”…


All evidence continues to point towards NO.


They seem better at working in finance and managing money.

Most models of productivity look like factories with inputs, outputs, and processes. This is just not how engineering or craftsmanship happen.


It's because the purpose of engineering is to engineer a solution. Their purpose is to create profit, engineering gets in the way.


How do you create profit?


No man, it's in the title, master bullshit artist


If so, it hasn't always been that way. Steve Ballmer on IBM and KLOCs: https://www.youtube.com/watch?v=kHI7RTKhlz0

(I think it is from "Triumph of the Nerds" (1996), but I can't find the time code)


Ballmer hasn’t been around for a long long time. Not since the Red Ring of Death days. Ever since Satya took the reins, MBAs have filled upper and middle management to try to take over open source so that Sales guys had something to combat RedHat. Great for open source. Bad for Microsoft. However, Satya comes from the Cloud division so he knows how to Cloud and do it well. Azure is a hit with the enterprise. Then along comes AI…

Microsoft lost its way with Windows Phone, Zune, the Xbox 360 RRoD, and Kinect. They haven’t had relevance in the home outside of Windows (desktop) for years, with the sole exception of Xbox.

They have pockets of excellence. Where great engineers are doing great work. But outside those little pockets, no one knows.


I believe the "look at all the lines of code" argument for LLMs is not a way to showcase intelligence, but more so a way to showcase time saved. Assuming the output is a correct solution, it's a way to say "look at all the code I would have had to write; it saved so much time".


The line of code that saves the most time is the one you don't write.


It's all contextual. Sometimes, particularly when it comes to modern frontends, you have inescapable boilerplate and lines of code to write. That's where it saves time. Another example is scaffolding out unit tests for a series of services. There are many such cases where it just objectively saves time.


Reason went out of fashion like 50 years ago, and it was never really in vogue.


> measuring someone or something by the number of lines of code produced is still incredibly stupid, in my opinion.

Totally agree. I see LOC as a liability metric. It amazes me that so many other people see it as an asset metric.


I wonder if we can use the compression ratio that an LLM-driven compressor could generate to figure out how much entropy is actually in the system and how much is just boilerplate.

Of course then someone is just going to pregenerate a random number lookup table and get a few gigs of 'value' from pure garbage...
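A crude way to try the idea without an LLM is to use a general-purpose compressor like zlib as a stand-in. This is a sketch under that assumption: zlib only catches literal repetition, not the semantic redundancy an LLM-driven compressor would, and the inputs are made up. It also illustrates the lookup-table caveat: random bytes score as the most "information-dense" input.

```python
import os
import zlib


def compression_ratio(text: str, level: int = 9) -> float:
    """Compressed size divided by raw size; lower means more redundancy."""
    raw = text.encode("utf-8")
    if not raw:
        return 1.0  # avoid division by zero on empty input
    return len(zlib.compress(raw, level)) / len(raw)


# Made-up inputs: repetitive getter boilerplate vs. random bytes rendered as hex.
boilerplate = "\n".join(
    f"def get_field_{i}(self):\n    return self._field_{i}\n" for i in range(200)
)
noise = os.urandom(4000).hex()

print(f"boilerplate: {compression_ratio(boilerplate):.3f}")
print(f"random hex:  {compression_ratio(noise):.3f}")
```

The boilerplate compresses to a small fraction of its size while the random hex barely shrinks, which is exactly why "incompressible" and "valuable" are not the same thing.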


It's still a bad metric, and OP is also just being loose by repeating some marketing/LinkedIn post by a person who uses bad metrics about an overhyped subject.


Yeah. I honestly feel 1m LOC is enough to recreate a fully featured complete modern computing environment if one goes about it sensibly.


I think the charitable way to read the quote is that 1M LOC are to be converted, not written.


Ironically, AI may help get past that. In order to measure "value chunks" or some other metric where LoC is flexibly multiplied by some factor of feature accomplishment, quality, and/or architectural importance, an opinion of the section in question is needed, and an overseer AI could maybe do that.


My favorite movie quote as it pertains to software engineering has for a long time been Jurassic Park's: “Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.”

That’s how I feel about a lot of AI-powered development. Just because you can have 10 parallel agents cranking out features 24/7 and have AI write 100% of the code, that doesn’t mean you’re actually building a product that users want and/or that is a viable business.

I’m currently in this situation, working on a greenfield project as founder/solo dev. Yes, AI has been tremendously useful in speeding things up, especially in patching over smaller knowledge gaps of mine.

But in the end, as in all the projects before in my career, building the MVP has rarely been the hard part of starting a company.

