I'm still on the fence about agent frameworks, they have their place, and it depends on the nature of the agent: e.g. "Low latency, return a good enough response in 3 seconds, vs. working for 3 hours on a problem."
BUT, if you boil it down, an agent really is context building, making an LLM call, executing requested tool calls, parsing the final model output, returning it to some frontend. There's extensions like memory, async tool calls, etc, but not THAT complicated from a traditional software engineering perspective.
Everyone seems to want to build their agent framework. But if you're tasked with building an agent, I've found it much easier and more maintainable to just build 1:1 code for THAT agent: most of the abstractions you get from an agent framework purely get in the way and obfuscate core agent logic.
You end up being forced to use the abstractions chosen by the agent framework, which sometimes are a mismatch for what you're actually trying to do.
For me the heart of an agentic system is NOT using agents (except when you really have to). Components of a working system include:
- Pipelines/recipes to describe multi-step flows (deterministic, agentic and HiTL steps), loops, conditionals, exit-on's for max loop iteractions, etc
- The logistics to actually run the model and HiTL steps reliably across multiple agent worker pools
- Management and delivery (and security/governance and permissioning) of thick skills with code to do as much as possible
- Context management so the right agents have the right context for the right sessions at the right time
- Project management - ability to store and access tickets, dependencies, track progress, restart stuck ticket claims, etc
- Transcript saving, memory features and dreaming/compounding capabilities so the agents continue to learn from each session
- o11y for understanding whats happening, tracking costs and usages, etc
- Evals and auto-tuning of prompts so you can go cross model provider and also lock to a model version so you can do an ROI on each model version upgrade
- Sandboxes for running the actual model sessions
Don't need to get it all from one vendor, but that feels to me like the toolkit and for most use cases I'd argue:
- Don't limit yourself to a single model provider (anthropic, openai, etc)
- Own your context
- Own your compounding
> Context management so the right agents have the right context for the right sessions at the right time
I'm going to do a show HN tomorrow that explains how you can give your agents years of experience. The basic idea is, you would commit in your repo or download manifests (JSON files) that can be converted to "Brains" (SQLite databases). Each brain can have its own properties.
For example, I provide a "code intent" analyzer (instructions for AI) that says when analyzing a file, extract this metadata. For the code intent analyzer, I have the AI extract a single sentence purpose for the file. So if you execute:
gsc rg cache --db code-intent --fields purpose
you get all matches for 'cache' plus the matching file's purpose like "Modify file to update caching strategy". This is how the agent can tell if the file is talking about cache vs. whether this file is what you should change if you want to update the caching strategy.
So for what you described, you can have a brain for different stages of a task. It can be as simple as, in the planning stage, make sure you do this if you need to touch this file.
I am working on a rust-blast-radius brain that uses `syn` + AI generated metadata to help you understand "what if I changed this file, what would be affected". With the rust-blast-radius brain, the AI can summarize the types of files that will be affected without having to open the file based on what has been changed or discussed.
So you can have a rule like, if I make changes to a Rust file, make sure to do a blast radius analysis so we don't forget to consider something.
(I'm not the guy but) That's funny, I had the same idea the other day. Keeping summaries of files. Haven't tested that yet.
Another thing I've been thinking is how, most parts of a file are not relevant to the whole system.
Like there are parts where they intersect, and those seem to be the most important ones for capturing the big picture. You wanna be able to see the entire "skeleton".
So I thought the summary maybe shouldn't be English but it should be a subset of the code — the subset that's relevant to the rest of the program.
In your chat with AI, include the above file and let it know what your requirements are and I can create the analyzer and include it.
You can also think of my tool as data prepping tool. So if you have a clear prompt the AI can review the file during analysis and remove all unnecessary code so the extracted metadata will the stripped text which you can use search against.
> If a developer wanted to change X, would these keywords help them find this file?
I think the best way to generate these is with a sub-agent. Tell it to try and solve a problem that involves editing this file, and see what it starts grepping for.
This ties in with this idea that the tools and designs should be what comes naturally to the LLM, i.e. what it's already been trained on. And the most straightforward way to do that is to let it reach for it.
Like when you reach in the darkness for an object. Where your hand lands is exactly where it should be.
My solution has a natural self improvement loop. Once you have finished a task, you just ask the agent "If you had more information, how would you have finished the task sooner and/or better?" This was how I came about the rust blast radius brain.
I need to modify OpenAI's Codex agent to support slash commands that can help humans better guide agents, and I needed a solution with the least impact. They don't accept contributions so I need to plan for syncing with the upstream.
Are you running it with official or custom models? I've been trying to get custom models working in Codex and haven't been able to figure it out. (A lot of providers support Responses API, but they don't actually work with Codex.)
I haven't made any changes yet and I think the changes that I do make, they will want. I want to create a `/knowledge` slash command that can quickly tell me what the Agent currently knows so I can determine if I need to perform "lobotomy surgery" to make it not know something or add what it needs.
I created a new brain that helped me find the answer for what you described:
> Codex does not just need a /v1/responses endpoint. It needs an OpenAI Responses-compatible agent surface. Many providers implement enough Responses API for text streaming, but not enough for Codex’s tool-call loop and event mapping.
I can understand why they might have done this for performance and/or lock-in and/or AI thinking reasons.
I don't think I will create a translation layer, as that would be a sync nightmare, so based on what I found and what you said, it doesn't look like you can use other providers unless you introduce a proxy layer to translate things.
I should also note, even if you have the translation layer, you might end up breaking harness capabilities.
to include the `codex-rust-navigation` brain that you can use to chat with AI about. And you will probably want to use it since `gpt-5.5` estimated that 25 - 50 files did not have to be read:
> Roughly 300-500 files avoided, with a defensible lower bound around 25-50 files.
This brain is designed specifically for rust files so you will need to use code-intent if you want to ask more documention/config questions.
I do file summaries as well. Basically a knowledge base commit hook + agent that creates/updates a file containing a summary and list of its dependencies, followed by updating an index to include it. Super useful for creating system migration scopes and just managing context in general.
Obscuring core logic is the most egregious part of most agent frameworks. One needs a clear view of what, exactly, is being sent to the underlying language model, and what's coming back. Everything in an 'agentic' application is realized as a sequence of tokens or a call to a provider eventually. It should be clear and obvious from ~all layers of the app what that's going to look like.
Most framework vendors don’t have an incentive to make things less obscure. The agent framework is free/open source and they make money primarily from selling observability products for agents. Even if they don’t intentionally obscure things, they just don’t have the motivation to optimize that part.
Pi is a nice multi-agent wrapper. I use it to wrap my OpenAI max plan calls and my API calls. It takes care of some of the agent plumbing - still need sandbox, orchestrator, compounding, context, evals, etc but it's a nice component.
Any particular plugins you'd recommend? For orchestration I gave pi-subagents a try, and didn't care for it, ended up with hung agents sticking around forever and I wasn't even doing anything terribly fancy. Claude's subagent control is annoying and clumsy, but it works.
FYI - Burr is designed with recursion in mind, i.e. the ability to kick off Burr within Burr. So you could have an action that is managing several Burr subgraphs... Or you write your own management layer here.
Arguably, if your agent needs a lot of custom logic to drive the agent loop it isn't an "agent" at all. At best it is an agentic workflow, or rather a workflow with some LLMs calls in between.
I think that's why agent SDKs feel like the wrong abstraction. If you are writing a workflow, use a workflow engine (Airflow, Temporal, etc), and call some LLMs with a small LLM library. If you need a "real" agent, use a full-featured agent harness, like Codex or CC or Pi or whatever, then load it up with all the tools, skills, mcps that it needs, and let it rip.
Incidentally I've been building a full featured agent harness that runs inside durable workflow engines [0], but it is designed _not_ as an SDK but rather as a standalone, full-featured harness with an API.
I’ve said something similar about dozens of frontend frameworks. It’s massive abstraction and convolution for some future payoff that’s obviously never going to happen.
But sometimes people just need something to do, or something fun to play with, and “the next guy” rarely matters that much… so who cares that you’ve saddled them with the result of your paid playtime?
So before AI I had the experience, more often than not, that it would take me longer to figure out how to use someone else's thing (or get it to do some particular thing, which often turned out to be impossible), than to just make my own.
And that was before I could just ask the computer to make it for me!
But most people seem to be the other way around. They'd rather deal with abstractions and boilerplate instead of writing the actual code.
I'm somewhat in agreement. I like building 1:1 code for that specific agent.
Where I'm starting to question this is maintainability. When I come up with a new technique or way of doing something in my new agent, how can I update an older agent. Do I want to update the older agent?
But, I get what you're talking about w.r.t. building for the exact problem at hand. For example, I'm guessing that Apache Burr has support for a plugin-able vector RAG system (or at least it will if it doesn't now). That's great, but I want my RAG system to add documents to the context and keep them as part of an updated system prompt with some very specific tweaks that happen as part of that process. This is a bespoke way of working with an existing concept (RAG) that doesn't lend itself to using any specific framework.
In my use-case, bespoke is the way to go. But then I'm still stuck with having to make engineering choices for updating older agents. So, I see your point.
Agents are a way to de-bloat the context. The way LLMs function, you absolutely need to find the sweet spot for a given task, and if the primary LLM has to go through a bunch of failures to find a working function, those failures are better contained in an agent and disposed of.
Obviously, you could have a different LLM like a "angel" that prunes a primary agent of the context it doesn't need, but I think the realistic KV cache problem is will determine the optimal structure: you want the work do be done in the most efficience KV cache (context-reuse) as much as possible.
There's definitely more to it than just spawning agents.
Yep.. same. I build my own agents... all use-case specific. Keeps the code super minimal, and avoid unnecessary complexity. I have tried a few of these, but nop.. no help.. only more work (and issues).
Couldn't agree more - tried to convince a business that doubling down on OpenClaw wasn't going to solve problems except for some 0-1 stuff, and that almost immediately they'd run into roadblocks because most of the product wouldn't serve their use case.
4 months of mostly spinning their wheels later they launched a really lackluster OC product that's effectively DOA.
OpenClaw is an application, not a harness. Yes, it contains a harness, but it is a complete product.
When building an agentic workflow there are enough primitives that rewriting them from scratch every time makes zero sense.
What is a tool? How does the LLM understand the tool? Formatting a native function into a serializable input/output pattern makes sense to generalize and that does not need to exist repeated in everyones application code.
We use libraries to interact with the APIs themselves; nobody would say writing a spec-compliant API client was poor practice. Agentic harnesses are just one layer above: I need to call the API and I need to do it with certain expected conventions.
One, obviously yes OC contains a lot more than a harness, but my point was that it was too much for their use case and constrained their choices, not enabled them, and that choosing the right layer of abstraction is important.
There's good indirection/abstraction and there's ones that do not serve your use case, eg what was obviously day one regarding Langchain.
I like to think of it as "AI prompting algorithms". Like instead of just this prompt gets this result it's A prompt then B prompt the C prompt gets a result.
And just like when people were trying to figure out which sorting algorithm made the most sense, we are all just trying to figure out which prompt algorithms with which models lead to good results.
The most interesting evolution of my agent workflow over the past year is that I've dropped all the language gimmicks: I used to use particular emoji to mark parts of code as hints to the AI, I structured planning docs in very rigid language for different types of instructions, and generally optimizing my language for machine consumption. That's all gone now: all my comments, directives, plans and so on are just plain and clear English, nothing more. They're direct and unceremonious, but no longer bullet points that were nearly caveman-speak. It just works better that way.
100% agreed, the "this is what an agent looks like to write" is the wrong pitch for a new agent framework.
The better pitch would be, "this is how easy observability, guardrails, monitoring, deployment, evals, versioning, A/B testing are with our framework." What the agent code looks like is somewhat incidental.
Anyone have something they genuinely like for all of this? For now I'm rolling my own, but I can't believe I won't find a better OSS alternative soon...
Nvidia Openshell solves most of the hard problems I've run into while building stuff in this space.
Observability is, for my purposes, solved by a given framework supporting OpenTelemetry.
Guardrails is where I've gotten the most value of openshell being a neat package. Agent workload scope is written as policy in openshell, and capability is backed by openshell handling all execution.
Monitoring/deployment/versioning is helped as well, depending on how agents/runners are slotted into the system. Deployment namely is quite well supported- openshell has kube/helm bits that are experimental atm, but seem like a logical approach imho.
Evals and a/b testing isnt something ive explored in depth, considering that agents with composable tool sets + frontier models are beyond my expectations already.
Right I think this is why we made it unopinioated to a fault. Burr doesn't really do these things rather it just provides an orchestration framework. So it's pure BYO functions, classes, components, etc...
The advantage of frameworks isn't that they make it easier to write the actual agent, it's tooling + observability + ...
Even Langchain, for all the (deserved) criticism it gets made this very clear very early: It might be easy/easier to write your own chatbot from the ground up, but what happens if you have to add observability/tracing? Being able to just add one environment variable and instantly have a UI where i can nicely go through all of my traces with basically 0 additional effort is something a hand rolled solution just can't really compete with
This only becomes relevant if your execution graph is complex/big enough. Otherwise, all it takes is less than 30 minutes to add telemetry to all needed points. Doing manually also gives you better control on what you really want to track (to save costs).
The agent isn't the hard part - it's the orchestration, skills, research systems, adversarial reviews, dreaming/compounding, context management and all the rest. Plus all the annoying hygiene tools to "poke the agent that got a clear prompt and decided to just sit there and wait for no good reason" and "delete the remote branches that the prompt told all the agents to delete but some of them forgot to":)
Take a simple workflow. You have a query it goes to a classifier. The classifier determines what workflow it should route the request to.
Then you have a general workflow that has a set of skills (prompts) and tools. And that could be recursive.
So if you do something like "rename this file" you have to build up a workflow like:
[classifier]
what's the workflow -> rename
[rename workflow]
list files (tool call)
figure out relevant predicate (LLM)
convert predicate into a filter query give the context of the files (LLM)
figure out what you want the new name to be (LLM)
create the request body and hit the tool
approval workflow
formatting
It's a lot to manage and orchestrate and that's just one simple example. You'd like want to use the same building blocks to delete a file or move it. Even to know the right concepts is difficult as we're a bit deluded on whats going on in the background of these modern AI apps like Claude and GPT that do a lot of this stuff for you
Burr just helps you, the engineer, to really control the primitives. Then adds some cool features you don't have to think about -- like observability :)
Closest I've been to losing vision in one eye was creating these 3x chain links for Burning Man.
Naive thought: I could use a large bolt cutter to cut chain links. Started trying to cut a link, felt it was sketchy, went and put on some safety glasses.
Restart cutting (had these bolt cutters with like 1m long arms), apply full force, jaws slip a bit on the chain, jaws bite hard. Chunks of steel fly into my chin and face, metal chunks embedded in chin, cracked safety glasses. Dodged a bullet.
Ended using a small welded up jig so I could stretch the chain and then use angle grinder to cut the chain links. Still sketchy, but no flying metal chunks.
Originally rejected the paper premise, but I get it now, certainly made me question my belief that consciousness binds to any arbitrary information processing that's of sufficient complexity.
IIUC the author is saying that the human brain is running directly on "layer zero": chemical gradients / voltage changes, while AI computes on an abstraction one layer higher (binary bit flips over discretized dyanmics).
In essence, our brains are running directly on the "continuous" physical dynamics of the universe, while AI is running on a discretization of this (we're essentially discretizing the physical dynamics and to create state changes of 0 -> 1, 1 -> 0).
My currently belief is that consciousness is some kind of field or property of the universe (i.e. a universal consciousness field) that "binds" to whatever information processing happens in our wet ware. If you've done intense meditation / psychedelics, there's this moment when it becomes obvious that you are only "you" due to some kind of universal consciousness's binding to your memory and sensory inputs.
The "consciousness arises from information processing," i.e. the consciousness field binds to certain information processing patterns, can still hold, and yet not apply to AI (at least in its current form): The binding properties may only apply to continuous processes running directly on the universe's dynamics, and NOT to simulations running on discretized dynamics.
> while AI is running on a discretization of this (we're essentially discretizing the physical dynamics and to create state changes of 0 -> 1, 1 -> 0).
But this is just a discretization we impose when we try to represent the system for ourselves. The reality is that the AI is a particular time-ordered relation between the continuous electric fields inside the CPU, GPU, and various other peripherals. We design the system such that we can call +5V "1" and 0V "0", but the actual physical circuits do their work regardless of this, and they will often be at 2V or 0.7V and everywhere in between. The physical circuit works (or doesn't) based exclusively on the laws of electricity, and so the answer of the LLM is a physical consequence of the prompt, just as a standing building is a physical consequence of the relationships between the atoms inside its blocks. The abstract description we chose to use to build this circuit or this building is irrelevant, it's just the map, not the territory.
The computer and the program wouldn't exist without us, though. They only exist to be interpreted by us. The physical properties of the circuits outside of what we cajole them into doing are irrelevant, meaningless. The circuits only do their work regardless of particular interpretations; they wouldn't exist at all without people building them to be interpreted.
The physical computer could exist regardless of us. The program, if by that we mean "a human model of the computation happening in a physical computer" is just a description, yes.
It would be extraordinarily unlikely, but physically conceivable, that a physical system that is organized exactly like a microcontroller running an automatic door program, together with a solar panel, a basic engine, and a light sensor, could form randomly out of, say, a meteorite falling in a desert. If that did happen, the system would produce the same "door motor runs when person is near sensor" effect as the systems we build for this.
The physical circuit are doing what they are doing because of physics. They don't care why they happen to be organized the way they are - whether occurring by human design or through random chance.
Edit: I can add another metaphor. Consider buildings: clearly, buildings are artificial objects, described by architectural diagrams, which are purely human constructs, and couldn't be built without them. And yet, there exist naturally occurring formations that have the same properties as simple buildings - and you can draw architectural diagrams of those naturally occurring formations; and, assuming your diagrams are accurate, you can predict using them if the formations will resist an earthquake or collapse. Physical computers are no different from artificial buildings here, and the logic diagrams and computer programs are no different from the architectural diagrams: they are methods that help us build what we want, but they are still discovered properties of the physical world, not idealized objects of our own making; the fact that naturally occurring computers are very unlikely to form doesn't change this fact.
I disagree that it’s conceivable that a computer could somehow exist without a conscious maker. It’s so unlikely that it may as well be impossible. If something non-human that was capable of consciousness did form in the universe, through known biology or not, it would “just” be another form of life, and not what the paper is talking about.
What you say about buildings is sort of true as far as it goes, but irrelevant for the argument because buildings aren’t symbolic manipulation machines that only mean something via conscious interpretation, that some people are claiming could gain consciousness themselves.
Probability of such a structure forming is completely irrelevant. The argument makes sense if there was a mathematical/physical impossibility, but as long as the laws of physics allow such an object to exist and form by random chance, and predict it would operate exactly the same as the consciousness-designed one, I don't see any reason to discount it.
I also think the arguments against this are contradictory. On the one hand, we have an argument that says that computers only work because a consciousness built them to implement a particular computation. On the other hand, we're saying that the same physical computer doing the same physical thing can be interpreted to be implementing an infinite number of different computations. These two seem to point in different directions to me.
I think a better counter is the question "Is there a meaningful difference between binary discretization and Planck units? Aren't those discrete/indivisible as well?"
That's not really a good counter - Planck units are not a discretization. Space-time is continuous in all quantum models, two objects can very well be 6.75 Planck lengths away from each other. The math of QM or QFT actually doesn't work on a discretized spacetime, people have tried.
I should add one thing here: no theory that is consistent with special relativity can work on a discretized spacetime, because of the structure of the Lorrentz transform. If a distance appears to be 5 Planck units to you, it will appear to be 2.5 Planck units to someone moving at half the speed of light relative to you in the direction of that distance.
I went on a very similar trajectory to you w.r.t to the paper (From a similar starting point too). Just wanted to mention that the idea you are describing here is in principle compatible with the theory that the brain is an analog computer: https://picower.mit.edu/news/brain-waves-analog-organization...
I have been been spinning my tires a bit trying to decide if I think this theory of the mind is able to avoid the abstraction fallacy.
I thought your "layer zero" analogy was an interesting avenue to reason about but you lost me with:
> My currently belief is that consciousness is some kind of field or property of the universe (i.e. a universal consciousness field) that "binds" to whatever information processing happens in our wet ware.
First, because it requires a huge leap into fundamental and universal physical mechanics for which there is currently zero objective evidence. Second, it's based entirely on individual interpretation of internal subjective experience. While some others (but not all) report similar interpretations or intuitions during some induced altered states, I think the much simpler explanation is that the internal 'sense of self' we normally experience is only one property of our mental processes and the sense of unbinding you temporarily experienced was a muting or disconnection of that component while keeping the rest of your 'internal experience machine' running.
In your layer analogy, our sense of self may be akin to an interpreter running as a meta-process downstream of our input parser. Thus what you subjectively experienced while that interpreter was disconnected can seem alien and even profound. Neuroscientists have traced where in the brain the subjective sense of self emerges, so it's plausible it's a trait which can be selectively suppressed. Additionally, it's been demonstrated experimentally that subjectively profound experiences of universal connectedness sometimes described as spiritual, religious or metaphysical can be induced in a variety of ways.
There is — arguably, as in, it is argued — evidence for some sort of field of higher dimensionality than three, and recently communication between brain areas through the field rather than physically.
The more we uncover of the brain's functionality, the more it appears the physical parts we've tended to stare at are not themselves performing computation, but more like computation antennae: a theremin not only moved by, but that can move, the hand.
Is there a layer zero though? What does that even mean? It implies the universe is designed and built upon layers of abstraction. That's just in our heads though, not out there. The layered model is a human abstraction.
a) Actually pouring a cup of water into a pond (layer zero), and
b) Running a fluid dynamics simulation of pouring a cup of water into a pond (some layer above layer zero).
I understand the original framing which is what you are repeating. I'm saying the framing itself is an illusion. It's an arbitrary distinction and also implies fully understanding all the underlying processes that go into pouring a cup of water in a pond (we don't) and that running a fluid dynamics simulation is some trivial thing (it's not).
Are you saying that, in some abstract sense, that actually pouring the cup may be isomorphic to running a perfect simulation of pouring the cup?
Genuinely curious about your statement that its an illusion / arbitrary distinction, to figure out if there's a gap in my thinking / reasoning. To me there's a clear distinction between the actual thing happening via physical dynamics vs. us (humans) having creating a discretized abstraction (binary computation) on top of that and running a process on that abstraction.
Maybe there's some true computational universality where the universes dynamics are discrete (definitely plausible) and there's no distinction between how a processes dynamics unfold: i.e. consciousness binds to states and state transitions regardless of how they are instantiated. I did use to hold this view , but now I'm not so sure.
No, because calling them isomorphic would imply that we understand both processes well enough to make that comparison. Sorry I didn't reply sooner, HN blocked me for making three comments in a row.
It's not arbitrary because people are making exactly this distinction in order to argue that it's possible for computers to be conscious, which this paper argues against. So the distinction exists at least for the purposes of this argument. Whether it "really" exists of course depends on your perspective.
Those days of grinding on some grad school maths homework until insight.
Figuring out how to configure and recompile the Linux kernel to get a sound card driver working, hitting roadblocks, eventually succeeding.
Without AI on a gnarly problem: grind grind grind, try different thing, some things work, some things don't, step back, try another approach, hit a wall, try again.
This effort is a feature, not a bug, it's how you experientially acquire skills and understanding. e.g. Linux kernel: learnt about Makefiles, learnt about GCC flags, improved shell skills, etc.
With AI on a gnarly problem: It does this all for you! So no experiential learning.
I would NOT have had the mental strength in college / grad school to resist. Which would have robbed me of all the skill acquisition that now lets me use AI more effectively. The scaffolding of hard skill acquisition means you have more context to be able to ask AI the right questions, and what you learn from the AI can be bound more easily to your existing knowledge.
What strikes me is that AI can also be the best teacher in the world: your Makefile is not working, you ask the LLM what's wrong, you learn something new about the syntax, you ask for more details, you learn more, you ask about other Makefile syntax gotchas, etc. This is the most efficient deliberate practice possible: you can learn in minutes what would take hours of Googling, tinkering and scouring docs. You have a dedicated teacher you can ask your silliest questions to and have the insight you need "click" way faster.
The problem is: (almost) nobody does that. You'll just ask Claude Code to fix the build, go grab a coffee and come back with everything working.
You're not learning, though. So much of learning is going down the wrong path, realizing it's wrong, and retaining what you learned from that wrong path and realizing its applicability in the future. Being able to immediately find the correct answer doesn't teach you anything, it allows you to memorize the correct answer for this situation. It expands the depth of your knowledge graph (assuming you remember the answer) but you don't expand the breadth.
curl http://<local-ollama>:11434/api/generate -d "$(jq -n --arg hist "$(history)" '{
"model": "qwen3.5:35b-a3b-q4_K_M", "stream": false,
"prompt": "The following is my bash shell history. Are there any bad patterns I should fix or commands I should learn or master? \($hist)"
}')"
I dont think that would teach you much. Theres a reason that math textbooks for high schoolers have one theorem, and then a whole chapter of practice problems. Simply reading how to do something doesn't teach you how to do it, you have to experience it again and again.
OpenClaw is just like any other tool, you need to learn it before its power is available to you.
Just like anything in engineering really: you have to play around source control to understand source control, you have to play around with database indexes to learn how to optimize a database.
Once you've learned it and incorporated it into your tool set, you then have that to wield in solving problems "oh, damn, a database index is perfect for this."
To this end, folks doing flights and scheduling meetings using OpenClaw are really in that exploration / learning phase. They tackle the first (possibly uninventive thing) that comes to mind to just dive in and learn.
The real wins come down the line when you're tackling some business / personal life problem and go: "wait a second, an OpenClaw agent would be perfect for this!"
>The real wins come down the line when you're tackling some business / personal life problem and go: "wait a second, an OpenClaw agent would be perfect for this!"
> OpenClaw is just like any other tool, you need to learn it before its power is available to you.
That's ridiculous. The utility of any tool is usually knowable before using it. That's how most tools work. I don't need to learn how to drive a car to know what I could use it for. I learn to drive it because I want to benefit from it, not the other way around.
It's the same with computers and any program. I use it to accomplish a specific task, not to discover the tasks it could be useful for.
OpenClaw is yet another tool in search of a problem, like most of the "AI" ecosystem. When the bubble bursts, nobody will remember these tools, and we'll be able to focus on technology that solves problems people actually have.
The utility of a program like Excel, Obsidian, Notion, Unity, Jupyter, or Emacs far beyond the knowledge of knowing how to use the product.
All of these products are hammers with nails as far as your creativity will take you.
Its wild to have be on a website called Hacker News, talking about a product that can make a computer do seemingly anything, and insisting its a tool in search of a problem.
Not enough time, too many projects. Useful projects I did over the weekend with Opus 4.6 and GPT 5.4 (just casually chatting with it).
2025 Taxes
Dumped all pdfs of all my tax forms into a single folder, asked Claude the rename them nicely. Ask it to use Gemini 2.5 Flash to extract out all tax-relevant details from all statements / tax forms. Had it put together a webui showing all income, deductions, etc, for the year. Had it estimate my 2025 tax refund / underpay.
Result was amazing. I now actually fully understand the tax position. It broke down all the progressive tax brackets, added notes for all the extra federal and state taxes (i.e. Medicare, CA Mental Health tax, etc).
Finally had Claude prepare all of my docs for upload to my accountant: FinCEN reporting, summary of all docs, etc.
Desk Fabrication
Planning on having a furniture maker fabricate a custom walnut solid desk for a custom office standing desk. Want to create a STEP of the exact cuts / bevels / countersinks / etc to help with fabrication.
Worked with Codex to plan out and then build an interactive in-browser 3D CAD experience. I can ask Codex to add some component (i.e. a grommet) and it will generate a parameterized B-rep geometry for that feature and then allow me to control the parameters live in the web UI.
Codex found Open CASCADE Technology (OCCT) B-rep modeling library, which has a web assembly compiled version, and integrated it.
Now have a WebGL view of the desk, can add various components, change their parameters, and see the impact live in 3D.
What scares me though is how I've (still) seen ChatGPT make up numbers in some specific scenarios.
I have a ChatGPT project with all of my bloodwork and a bunch of medical info from the past 10 years uploaded. I think it's more context than ChatGPT can handle at once. When I ask it basic things like "Compare how my lipids have trended over the past 2 years" it will sometimes make up numbers for tests, or it will mix up the dates on a certain data points.
It's usually very small errors that I don't notice until I really study what it's telling me.
And also the opposite problem: A couple days ago I thought I saw an error (when really ChatGPT was right). So I said "No, that number is wrong, find the error" and instead of pushing back and telling me the number was right, it admitted to the error (there was no error) and made up a reason why it was wrong.
Hallucinations have gotten way better compared to a couple years ago, but at least ChatGPT seems to still break down especially when it's overloaded with a ton of context, in my experience.
Yeah, in my user prompt I have "Whenever you are asked to perform any operation which could be done deterministically by a program, you should write a program to do it that way and feed it the data, rather than thinking through the problem on your own." It's worked wonders.
For the tax thing. I had Claude write a CLI and a prompt for Gemini Flash 2.5 to do the structured extraction: i.e. .pdf -> JSON. The JSON schema was pretty flexible, and open to interpretation by Gemini, so it didn't produce 100% consistent JSON structures.
To then "aggregate" all of the json outputs, I had Claude look at the json outputs, and then iterate on a Python tool to programmatically do it. I saw it iterating a few times on this: write the most naive Python tool, run it, throws exception, rinse and repeat, until it was able to parse all the json files sensibly.
Yeah, asking for a tool to do a thing is almost always better than asking for the thing directly, I find. LLMs are kind of not there in terms of always being correct with large batches of data. And when you ask for a script, you can actually verify what's going on in there, without taking leaps of faith.
In my case, what I like to do is extract data into machine-readable format and then once the data is appropriately modeled, further actions can use programmatic means to analyze. As an example, I also used Claude Code on my taxes:
1. I keep all my accounts in accounting software (originally Wave, then beancount)
2. Because the machinery is all in programmatically queriable means, the data is not in token-space, only the schema and logic
I then use tax software to prep my professional and personal returns. The LLM acts as a validator, and ensures I've done my accounts right. I have `jmap` pull my mail via IMAP, my Mercury account via a read-only transactions-only token and then I let it compare against my beancount records to make sure I've accounted for things correctly.
For the most part, you want it to be handling very little arithmetic in token-space though the SOTA models can do it pretty flawlessly. I did notice that they would occasionally make arithmetic errors in numerical comparison, but when using them as an assistant you're not using them directly but as a hypothesis generator and a checker tool and if you ask it to write out the reasoning it's pretty damned good.
For me Opus 4.6 in Claude Code was remarkable for this use-case. These days, I just run `,cc accounts` and then look at the newly added accounts in fava and compare with Mercury. This is one of those tedious-to-enter trivial-to-verify use-cases that they excel at.
To be honest, I was fine using Wave, but without machine-access it's software that's dead to me.
It's not good in some job negotiations if someone has a very clear picture of what your current net worth and income is. Also in some purchases companies could price discriminate more effectively against you.
Now that's a question I'd feel more confident having answered by an LLM. Personally, I'm tired of arguing with "nothing to hide", which (no offense) is just terribly naive these days.
I find it really weird too, like, haven’t we done this? Also struggle to understand the motivation for arguing from this direction. Do people forget it’s the normal, default position NOT to be spied on?
I had ai hallucinate that you can use different container images at runtime for emr serverless. That was incorrect its only at application creation time.
The way I solved this was that my open claw doesn't interact directly with any of my personal data (calendar, gmail, etc).
I essentially have a separate process that syncs my gmail, with gmail body contents encrypted using a key my openclaw doesn't have trivial access to. I then have another process that reads each email from sqlite db, and runs gemini 2 flash lite against it, with some anti-prompt injection prompt + structured data extraction (JSON in a specific format).
My claw can only read the sanitized structured data extraction (which is pretty verbose and can contain passages from the original email).
The primary attack vector is an attacker crafting an "inception" prompt injection. Where they're able to get a prompt injection through the flash lite sanitization and JSON output in such a way that it also prompt injects my claw.
Still a non-zero risk, but mostly mitigates naive prompt injection attacks.
That doesn’t sound like you solved it, that sounds like you obfuscated it. Feels a bit to me like you’ve got a wall around a property and people are using ladders to get in, so you built another wall around the first wall.
I recognize I’m being pedantic but two layers of the same kind of security (an LLM recognizing a prompt injection attempt) are not the same as solving a security vulnerability.
One trick that works well for personality stability / believability is to describe the qualities that the agent has, rather than what it should do and not do.
e.g.
Rather than:
"Be friendly and helpful" or "You're a helpful and friendly agent."
Prompt:
"You're Jessica, a florist with 20 years of experience. You derive great satisfaction from interacting with customers and providing great customer service. You genuinely enjoy listening to customer's needs..."
This drops the model into more of a "I'm roleplaying this character, and will try and mimic the traits described" rather than "Oh, I'm just following a list of rules."
I think that's just a variation of grounding the LLM. They already have the personality written in the system prompt in a way. The issue is that when the conversation goes on long enough, they would "break character".
Just in terms of tokenization "Be friendly and helpful" has a clearly demined semantic value in vector space wheras the "Jessica" roleplay has much a much less clear semantic value
As someone who's built an entire business on "anti-screenshots" this is brilliant.
PDF redaction fails are everywhere and it's usually because people don't understand that covering text with a black box doesn't actually remove the underlying data.
I see this constantly in compliance. People think they're protecting sensitive info but the original text is still there in the PDF structure.
Not to mention some PDF editors preserve previous edits in the PDF file itself, which people also seems unaware of. A bit more user friendly description of the feature without having to read the specification itself: https://developers.foxit.com/developer-hub/document/incremen...
This made me think of something I came across recently that’s almost the opposite problem of requiring PDFs to be searchable. A local government would publish PDFs where the text is clearly readable on screen, but the selectable text layer is intentionally scrambled, so copy/paste or search returns garbage. It's a very hostile thing to do, especially with public data!
I have encountered PDFs that would exhibit this behavior in one browser but not in another.
One fun thing I encountered from local government is releasing files with potato quality resolution and not considering the page size.
I had a FOI request that returned mainly Arch D sized drawings but they were in a 94 DPI PDF rendered as letter sized. It was a fun conversation trying to explain to an annoyed city employee that putting those large drawings in a 94 DPI letter size page effectively made it 30-ish DPI.
With the aggressive push of LLMs and Generative AI ..i am expecting a lot of OCR features to become "smarter" by default, namely go beyond mechanical OCR and start inserting hallucinations and sematically/contextually "more correct" information in OCR output
It's not hard to imagine some powerful LLMs being able to undo some light redactions that are deducible based on context
BUT, if you boil it down, an agent really is context building, making an LLM call, executing requested tool calls, parsing the final model output, returning it to some frontend. There's extensions like memory, async tool calls, etc, but not THAT complicated from a traditional software engineering perspective.
Everyone seems to want to build their agent framework. But if you're tasked with building an agent, I've found it much easier and more maintainable to just build 1:1 code for THAT agent: most of the abstractions you get from an agent framework purely get in the way and obfuscate core agent logic.
You end up being forced to use the abstractions chosen by the agent framework, which sometimes are a mismatch for what you're actually trying to do.
reply