Building AI Tools Inside Real Operating Companies: A Saxon Approach
We prototype AI tools inside real operating companies and document the entire process publicly. Here's what we've learned about building software that actually solves problems — not just demos well.
James Davidson
Founder & Head of AI Engineering
Why Most AI Tools Fail
The AI industry has a demo problem. Tools that look impressive in controlled environments often crumble when they meet the chaos of real business operations. We've seen this pattern repeatedly: a startup builds something that works perfectly on curated datasets, raises funding on impressive benchmarks, then struggles to deliver value when deployed in production.
The root cause is usually the same: isolation from real problems. Engineers build what they think businesses need, not what businesses actually need. The feedback loop is too long, the iteration cycles too slow.
Real problems create better software than hypothetical ones. Every constraint you discover in production is a constraint you would have designed around incorrectly in a lab.
This isn't a criticism of the engineers building these tools. It's a criticism of the development model. When you build in isolation — even well-funded, well-intentioned isolation — you optimise for the wrong things.
The Saxon Methodology
Saxon takes a different approach. Instead of building AI tools in a lab and then trying to sell them to businesses, we embed inside operating companies and build tools that solve problems those businesses already have.
The process follows five steps, and we've found that the order matters:
Step 1: Identify a Real Operational Constraint
We find real friction inside operating businesses — the stuff that slows teams down every single day. Not theoretical inefficiencies from a consulting report. Actual bottlenecks that people complain about in standups.
For Clara, our first public build, the constraint was clear: Cloud Employee's sales team was spending 60% of their time answering the same qualifying questions from inbound leads. The questions were predictable. The answers were in the documentation. But every lead needed a human conversation before they'd book a discovery call.
Step 2: Rapid Prototype Using AI
Fast, scrappy, functional. A working AI tool in days, not months. No spec decks. No roadmaps. Just build.
We use AI-assisted development tools aggressively during this phase. The goal is to get something in front of real users as quickly as possible, not to write perfect code. Perfect code comes later. Understanding the problem comes first.
The first version of Clara was 400 lines of TypeScript. It could answer questions about Cloud Employee's services, qualify leads based on a simple scoring rubric, and book Calendly slots. It wasn't sophisticated. It was useful.
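A rubric like that can be a handful of lines. The sketch below is illustrative only — the field names, weights, and threshold are assumptions, not Clara's actual scoring logic:

```typescript
// Hypothetical lead-scoring rubric (illustrative, not Clara's real model).
interface Lead {
  teamSize: number;      // size of the prospect's engineering team
  hasBudget: boolean;    // budget confirmed during the conversation
  timelineWeeks: number; // how soon they want to start
}

function scoreLead(lead: Lead): number {
  let score = 0;
  if (lead.teamSize >= 5) score += 40;      // large enough to staff
  if (lead.hasBudget) score += 40;          // budget confirmed
  if (lead.timelineWeeks <= 8) score += 20; // near-term timeline
  return score;
}

// Leads above the threshold get offered a Calendly slot;
// everyone else keeps talking to the bot or a human.
const QUALIFY_THRESHOLD = 60;

function shouldOfferBooking(lead: Lead): boolean {
  return scoreLead(lead) >= QUALIFY_THRESHOLD;
}
```

The point isn't the specific weights — it's that a transparent, hand-tuned rubric is enough to validate the concept before any model-based scoring is worth building.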
Step 3: Test Inside a Live Business
Real users, real data, real feedback. Clara was deployed on Cloud Employee's website within two weeks of starting the build. Not as a replacement for the sales team — as a supplement. If Clara couldn't handle a conversation, it escalated to a human.
This is where the learning happens. In the first week, we discovered that 40% of inbound leads were asking questions Clara couldn't answer — not because the questions were hard, but because they were phrased in ways we hadn't anticipated. The FAQ database had answers to "What is your retention rate?" but not to "Do people actually stay?" or "What's the churn like?"
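One cheap fix for that phrasing gap is to tag each FAQ entry with synonyms and match on token overlap rather than exact question text. A minimal sketch, assuming a hypothetical `matchFaq` helper and placeholder entries (a production build would more likely use embedding similarity):

```typescript
// Minimal sketch of synonym-tagged FAQ retrieval. Entry contents and
// function names are hypothetical, not Clara's actual implementation.
interface FaqEntry {
  question: string;
  answer: string;
  keywords: string[]; // synonyms that catch paraphrased questions
}

const faqEntries: FaqEntry[] = [
  {
    question: "What is your retention rate?",
    answer: "(answer text from the docs)",
    // "Do people actually stay?" and "What's the churn like?"
    // both hit these tags even though neither says "retention rate".
    keywords: ["retention", "stay", "churn", "turnover", "leave"],
  },
];

function matchFaq(userQuestion: string): FaqEntry | null {
  const tokens = userQuestion.toLowerCase().split(/\W+/);
  let best: FaqEntry | null = null;
  let bestHits = 0;
  for (const entry of faqEntries) {
    const hits = entry.keywords.filter((k) => tokens.includes(k)).length;
    if (hits > bestHits) {
      bestHits = hits;
      best = entry;
    }
  }
  return best; // null means no match — escalate to a human
}
```

Returning `null` on a miss is what drives the escalation path: the bot hands off rather than guessing.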
Step 4: Bring in Engineers to Scale It
Once validated, proper engineers take the prototype to production. This is where the code gets rewritten properly — correct data models, security hardening, scalability, observability, error recovery. The prototype proved the concept. The production build makes it reliable.
Step 5: Document Everything Publicly
We share what we learn — the wins, the failures, the code. No gatekeeping. Full transparency. This blog post is part of that process.
What We've Learned So Far
Three months into the Saxon experiment, here are the patterns that keep showing up:
Constraints are features. Every limitation we discover in a real business environment is a design input we wouldn't have had in a lab. Clara's conversation model is better because real users asked questions we'd never have thought to test.
Speed beats perfection. The prototype that ships in two weeks and gets real feedback is more valuable than the perfect architecture that ships in three months. You can always refactor. You can't always recover from building the wrong thing.
Documentation compounds. Every build we document creates an asset that informs the next build. Clara's learnings directly shaped how we approached the next project. This compounding effect is the core bet of Saxon.
AI tools need domain context. Generic AI tools solve generic problems. The tools that deliver real business value are the ones built with deep understanding of the specific domain. You can't get that understanding from a pitch deck.
What's Next
Saxon is shipping one new build per quarter. Each one tackles a different operational problem inside a real business. Each one gets documented publicly — including the parts that don't work.
Clara is our first public build. She's open-source, MIT-licensed, and currently handling 70% of Cloud Employee's inbound lead qualification without human intervention. The build log, architecture decisions, and performance data are all on the AI Builds page.
If you have a real operational problem that you think AI could solve, we want to hear about it. Not to sell you anything — to build something that actually works.
Frequently Asked Questions
How long does a typical Saxon build engagement last?
Most projects run 4-8 weeks from initial problem identification to working prototype. Scaling to production typically adds another 4-8 weeks depending on complexity and integration requirements.
Do you work with early-stage startups?
We primarily work with established businesses that have real operational data and existing processes. Startups without product-market fit usually don't have the constraints that make our approach valuable.
What happens to the code you write?
You own everything. We deliver full source code, documentation, and training. We also publish sanitised case studies and learnings publicly, but never share proprietary business logic or data.
How do you handle sensitive data?
We work under strict NDAs and follow enterprise security practices. For highly sensitive projects, we can work entirely within your infrastructure with no data leaving your systems.