We're a 30-person ed-tech company. I built a Slack bot that connects our data warehouse, 250k Google Drive files, support tickets, and codebase so anyone on the team can ask it a question and get a sourced answer back. The bot took two and a half weeks to build; the data infrastructure under it took two years. Wrote up the architecture, where trust breaks down, and what I'd build first if starting over.
> The bot took two and a half weeks to build; the data infrastructure under it took two years.
This is the key lesson everyone should step back and pay attention to: the data is still king. If you have a clean relational database that contains all of your enterprise's information, pointing a modern LLM (i.e., late 2025+) at it without any further guidance often yields very good outcomes. Outcomes that would have genuinely shocked me as recently as six months ago.
I am finding that 100 tables exposed as 1 tool performs significantly better than 100 tables exposed as 10-100 tools. Any time you find yourself tempted to patch things with more system prompt tokens or additional tools, push yourself to solve the problem another way. More targeted and detailed error feedback from existing tools often goes a lot further than additional lines of aggressively worded prose.
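The "one tool, rich errors" idea can be sketched in a few lines. A minimal illustration using Python's built-in sqlite3 — the commenter doesn't name a stack, so the `execute_sql` function and its error format here are hypothetical, not their actual implementation:

```python
import sqlite3

def execute_sql(conn: sqlite3.Connection, query: str) -> str:
    """The single tool exposed to the agent: run SQL, return rows,
    or return a detailed, actionable error instead of a bare failure."""
    try:
        cur = conn.execute(query)
        rows = cur.fetchmany(50)  # cap output to protect the context window
        cols = [d[0] for d in cur.description] if cur.description else []
        return repr({"columns": cols, "rows": rows})
    except sqlite3.Error as e:
        # Targeted feedback: echo the error *and* point the model at the
        # system tables so it can self-correct on its next call.
        tables = [r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
        return (f"SQL error: {e}. Available tables: {tables}. "
                "Use PRAGMA table_info(<table>) to inspect columns.")
```

The point is that the error branch does the teaching, so the system prompt doesn't have to.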
I think one big fat SQL database is probably close to the best possible way to organize everything for an agent to consume. I am not going to die on any specific vendor's hill, but SQL in general is a remarkably competent solution to the problem of incrementally revealing domain knowledge to the agent. You can even incrementalize the schema description process itself by way of the system tables. Intentionally not providing a schema description tool/document/prompt seems to perform better with the latest models than the other way around.
I've run a SaaS business on the side of my day job for 15 years. There are a million questions I've never had the time to dig into, even though the data was there: retention cohorts, free-to-paid conversions, subscription upgrades/downgrades, and so much more. Just this week, I decided to let an agent have access through psql and go nuts, writing all analysis to markdown files. Reading through it, there are a few things it misunderstood, and as a result some of the analysis was flawed, but all in all I'm honestly mind-blown. It would have taken me months just to write the queries, let alone come up with frameworks for how to think about these metrics.
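For a sense of what one of those analyses looks like, here is a toy retention-cohort query. This is a generic sketch with a made-up two-table schema (`users`, `events`), not the commenter's actual data; run against SQLite for self-containedness, though the SQL would be the same over psql:

```python
import sqlite3

# Hypothetical minimal schema: users(id, signup_month), events(user_id, month)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, signup_month TEXT);
CREATE TABLE events (user_id INTEGER, month TEXT);
INSERT INTO users VALUES (1,'2024-01'),(2,'2024-01'),(3,'2024-02');
INSERT INTO events VALUES (1,'2024-01'),(1,'2024-02'),(2,'2024-01'),(3,'2024-02');
""")

# Monthly retention cohort: of the users who signed up in month X,
# how many were still active in month Y?
rows = conn.execute("""
SELECT u.signup_month            AS cohort,
       e.month                   AS active_month,
       COUNT(DISTINCT e.user_id) AS active_users
FROM users u JOIN events e ON e.user_id = u.id
GROUP BY cohort, active_month
ORDER BY cohort, active_month
""").fetchall()

for r in rows:
    print(r)
# ('2024-01', '2024-01', 2)  -- both Jan signups active in Jan
# ('2024-01', '2024-02', 1)  -- one of them retained into Feb
# ('2024-02', '2024-02', 1)
```

Queries like this are individually simple; the months of work the commenter mentions come from writing dozens of them and deciding which cuts of the data are meaningful, which is exactly the part the agent drafts for free.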
Agreed. When I watch the LLM start to explore the DB, it really does impress me.
Can you expand on this:
> You can even incrementalize the schema description process itself by way of the system tables. Intentionally not providing a schema description tool/document/prompt seems to perform better with the latest models than the other way around.
If you tell GPT-5.x that there is a database it can query by calling ExecuteSql(query), but you don't bother explaining anything about the schema, it will figure things out ad hoc. This has advantages for token budget, because it will tend to look up metadata only for the tables that seem relevant to the user's query.
If you have a gigantic data warehouse with 1000+ tables, there's no way you could fit all of that info into a system prompt without completely jacking something up in the black box. So why bother trying?
Consider that the user's specific request serves as an additional constraint that can be used to your advantage to dramatically reduce the search space. Building a single prompt / schema description that will magically work for all potential user requests is a cursed mission by comparison.
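That ad-hoc exploration loop is easy to demonstrate. A sketch of the two metadata calls an agent typically makes, using Python's stdlib sqlite3 — the table names are invented, and on Postgres the equivalent would be `information_schema` / `pg_catalog` queries rather than `sqlite_master` and `PRAGMA`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders    (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
CREATE TABLE users     (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE audit_log (id INTEGER PRIMARY KEY, payload TEXT);
""")

# Step 1: the agent lists table names only -- a few tokens,
# not a full schema dump in the system prompt.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['audit_log', 'orders', 'users']

# Step 2: the user asked about revenue, so the agent inspects only `orders`
# and never spends tokens describing `audit_log`.
cols = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(orders)")]
print(cols)  # [('id', 'INTEGER'), ('user_id', 'INTEGER'), ('total', 'REAL')]
```

With 1000+ tables, step 1 stays cheap and the user's question prunes step 2 down to a handful of lookups, which is the search-space reduction described above.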
I think context windows are too small for an agent to actually do this properly yet. I have much smaller databases and with 1b context frontier models they still need reminders or nudging or come up with completely wrong stuff in response to basic queries.
Having the c-levels relying on this for off-the-cuff info seems ... dangerous?
I just did the exact same thing for my company. I didn't do the SQLite approach for GDrive though, just a direct search.
The one part that is still difficult is the data modeling and the table-level descriptions. Maybe you update a table, remove a column, etc. The third-party systems all have their schemas defined, but the data warehouse is a bit looser, so solving that would really help. Did you just use the dbt schema to describe tables and columns and then sync that to your bot? How did you keep it updated? And at the end of the day, was it worth building versus buying?

Also, how did you track costs? I let users choose their model, but have learned it can get expensive fast. From what I can see, there are a lot of providers trying to solve this one thing. That said, the data warehouse remains the loosely defined area, and I can see dbt or one of those players trying to build something here.
Hi, thanks for sharing. One thing I'd like to know is how often you validate the answers. If a human gave an answer like the ones the AI is giving, you'd probably expect a margin of error of around 1%. With the AI, is it 1% or less, and who's validating it? Are you trusting it more or less than a human?
Thanks for sharing! I've been hesitating a lot between the swiping animation and the arrows. It has been hard to gauge UX on this one, but it seems the arrows are the more straightforward indication.
I didn't realize that was a swiping animation. It might be cool to pair it with a brief animation that rotates the cube in the swipe direction, just enough to show that there are sides with extra information, then settles back into place. You could also show a slight rotation toward whichever direction the cursor is closest to as it moves away from the middle, or even a quick spin right as the page loads.
I've recently open-sourced "Helix", a project born out of my frustration with LinkTree's limitations. Helix is a visually appealing and customizable way to showcase your journey, projects, and objectives all in one place.
Built with a modern stack, including Nuxt3, Vue3, Rollup, Vite, SwiperJS, LottieJS, and Vuetify3, Helix offers a smooth user experience and solid performance. Deployment is a breeze thanks to AWS. I also got to play with MidJourney for assets and GPT-4 for content.
I'm looking for your valuable feedback and suggestions to help me improve this project. Check out Helix on GitHub https://github.com/merylldindin/helix and take it for a spin :)
Polygon is a new kind of psychology practice that provides remote diagnostics for dyslexia, dysgraphia, dyscalculia, ADHD, and other learning differences. The company uses technology to provide a better experience for both clients and its team of in-house licensed psychologists. Polygon's psychologists evaluate student history and behaviors and administer a variety of widely accepted gold-standard assessments; they are carefully selected through a 5-stage interview process, given extensive training, and diligently adhere to the company's clinical principles. The tests administered satisfy IDEA requirements and can inform eligibility for services like IEPs, 504 plans, and accommodations on tests like the SAT, ACT, and GRE.
Today, more than 70M people lack access to quality testing due to high prices, archaic manual processes, and a broken supply of psychologists, causing year-long waitlists. Polygon's mission is to enable every person with a learning difference to reach their full potential. Get reliable information, the latest research, and expert advice on dyslexia, ADHD, and other learning differences on our blog!