We’ve done 60-plus full AI deployments in about a year and a half. The tools change constantly. The models improve every quarter. The scaffolding that was best practice four months ago gets replaced by something better. What doesn’t change is the process.
Our process has two halves: discovery and consulting on one side, development and deployment on the other. We break those into four phases: Discovery, Blueprint, Build, and Deploy. Each one has a specific output and a decision gate before moving to the next. You're never locked in. You always know what you're getting before you commit further.
Here's how each phase works and why it matters.
Phase 1: Discovery
Before we build anything, we need to learn how the business actually runs. Not how leadership thinks it runs — how it actually runs. Those are never the same thing.
During discovery, we’re doing three things simultaneously. We’re shadowing the team, watching how work moves. We’re getting hooked into their tech stack so we can see data flow from the backend. And we’re doing our own external research on the industry — typical tools, common bottlenecks, what the tech stack landscape looks like.
The shadowing is where the real value hides. We sit with each role. We watch them work. We time every task. For each workflow, we capture six things: who does it, how long it takes, how often it happens, what tools are used, what the error or rework rate is, and what happens when something goes wrong.
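For concreteness, here's a minimal sketch of how those six data points might be recorded per workflow. The structure and the example values are illustrative assumptions, not our actual internal schema.

```python
from dataclasses import dataclass

# Illustrative sketch of the six data points captured per workflow during
# shadowing. Field names and example values are hypothetical.
@dataclass
class WorkflowObservation:
    owner_role: str               # who does it
    minutes_per_instance: float   # how long it takes
    instances_per_week: float     # how often it happens
    tools_used: list[str]         # what tools are involved
    rework_rate: float            # error/rework rate, 0.0 to 1.0
    failure_consequence: str      # what happens when it goes wrong

invoice_triage = WorkflowObservation(
    owner_role="AP clerk",
    minutes_per_instance=12,
    instances_per_week=150,
    tools_used=["email", "ERP"],
    rework_rate=0.08,
    failure_consequence="late payment penalty, vendor escalation",
)
```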
That last question is critical. It reveals the stakes. Something might look like obvious low-hanging fruit for automation — until you learn that when it goes wrong, the company gets sued, or there’s a compliance issue, or a client relationship is at risk. That changes everything about how you design the system. An LLM might handle the task 99% of the time, but you need to know what the 1% failure mode looks like before you put it in production.
We're also tracking how humans spend their time. Output efficiency is one of the main things AI can improve, but only if you understand where time is actually going versus where leadership assumes it's going. Humans are bound by time. If we can make their time more efficient, that's where the ROI lives.
The output of discovery is a complete map of the operation. Every workflow documented. Every handoff diagrammed. Every data source mapped. The gaps between how leadership thinks work happens and how it actually happens — which are always significant — documented and quantified.
Phase 2: The Blueprint
The Blueprint is where we synthesize everything we learned and build the strategy. It’s a product in its own right — we charge for it, and if a client decides to stop here, they walk away with something any competent team could execute.
What’s inside: operational process maps showing current state for every task, role, tool, time allocation, and cost. A tech stack and data audit. A time and cost allocation analysis that crosses the workflow data with payroll to show what each task actually costs the business. And the AI opportunity matrix.
The opportunity matrix scores every workflow on three dimensions. Automation feasibility — can AI actually do this? Business impact — how much does it move the needle? And measurement reliability — can we prove the before and after?
That third dimension is what most strategy documents miss. A workflow might score 10 out of 10 on feasibility and impact, but if you can’t reliably measure the improvement, you can’t prove ROI. For our partnership model, where compensation is tied to outcomes, this distinction matters enormously. High impact plus high feasibility plus low measurement reliability means we build it under the base fee, not the performance component.
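Here's a rough sketch of how that scoring plays out in practice. The threshold and weighting are assumptions for illustration, not our actual model.

```python
# Illustrative scoring for the opportunity matrix. Each dimension is scored
# 1-10 from discovery data; the threshold and weighting here are assumptions.
def score_workflow(feasibility: int, impact: int, measurability: int) -> dict:
    priority = feasibility * impact  # how much we want to build it
    # Low measurement reliability routes the work to the base fee rather than
    # the performance component, because the improvement can't be proven cleanly.
    pricing = "performance component" if measurability >= 7 else "base fee"
    return {"priority": priority, "pricing": pricing}

print(score_workflow(feasibility=10, impact=10, measurability=3))
# {'priority': 100, 'pricing': 'base fee'}
```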
The ROI model uses real data from discovery: time saved × loaded labor cost × frequency × automation percentage. Three scenarios — conservative, expected, optimistic. We manage expectations by being conservative, and even the conservative numbers are usually impressive enough that the thing sells itself. The math either works or it doesn’t. If it doesn’t, why would we build it?
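As a worked example of that math, with every input invented purely for illustration:

```python
# Worked example of the ROI model; all inputs are made-up illustration values.
def annual_value(hours_saved_per_instance, loaded_hourly_cost,
                 instances_per_year, automation_pct):
    return (hours_saved_per_instance * loaded_hourly_cost
            * instances_per_year * automation_pct)

scenarios = {
    "conservative": annual_value(0.5, 55, 2400, 0.60),  # $39,600
    "expected":     annual_value(0.5, 55, 2400, 0.80),  # $52,800
    "optimistic":   annual_value(0.5, 55, 2400, 0.95),  # $62,700
}
```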
One of the hardest parts of the Blueprint is scoping the ambition correctly. You could rip everything off the existing stack, self-host everything, and build a full operating system from scratch. Or you could just get everyone a Claude account and run everything through a GitHub repo. Both could work. Both would provide ROI. One costs dramatically more and carries more risk. We have to find the right range for each specific business.
Phase 3: Build (Progressive Deployment)
This is where most AI projects go wrong, and it’s where our approach diverges from the traditional playbook.
The old way: scope the entire system, spend six to nine months building a massive solution, test in a controlled environment, deploy on a Monday, and pray. The vendor washes their hands of it and walks away. By the time real users touch it, they've forgotten what it was supposed to solve. The person who championed the project has moved to a different company. Edge cases that weren't anticipated start breaking things. The system sits there and dies.
We do progressive deployment. Module by module. Each module is independently valuable, scoped against specific success criteria, and delivers measurable ROI on its own.
We start by pulling the highest-leverage target from the Blueprint and building that workflow end to end. Test it on real data. Deploy it to production. Measure the impact against the success criteria and the baseline metrics we established. Then move on to the next module.
The compounding effect is real and visible. Each module delivers value. Each one generates data that makes the next one easier. Each deployment builds trust with the client — they see results, they get excited, they open up access to other parts of the business. We’ve had clients where a 45-day sprint on one bottleneck built enough trust and delivered enough ROI that the person we worked with went to their colleagues and said: “You’ve been talking about this problem for years. These guys can help.” And now that colleague is fully bought in with a ton of ideas.
There’s a practical reason for this approach too. The AI landscape shifts constantly. What was best practice four months ago might be obsolete. Building a monolith over nine months means you’re deploying yesterday’s architecture. Building module by module means each piece uses the best available tools at the time it’s built.
The philosophy underneath this: every pilot project failure stat you’ve read — 70%, 80%, 85% — comes from one of three patterns. They bolted AI onto an existing workflow without rethinking it. They bought something off the shelf that wasn’t custom to their operation. Or they tried to go too big too fast. Progressive deployment avoids all three.
Phase 4: Deploy and Measure
Deployment isn’t a technical event. It’s a change management event.
There’s a concept in biotech called patient compliance — what percentage of patients actually take their medication on the prescribed schedule. If a pill needs to be taken three times a day, compliance drops. If it requires an injection, it drops further. Doctors factor this into prescribing decisions.
The same principle applies to AI systems. We can build something that works perfectly, but if people aren't using it, if they're not providing feedback, if they're not bought in, the ROI evaporates. Usage compliance is everything.
The good news: the systems we build don’t usually require people to learn something new. They live inside existing tools. The team’s experience is that work that used to take hours now takes minutes. There’s no new login, no new interface. The AI is invisible infrastructure.
But we do need to explain how the system works, where humans step in, and how it changes their day-to-day. Most of the time, people are excited. The system handles the grunt work they didn’t want to do. They get to spend more time on the things they were actually hired for.
Measurement has two layers. The technical layer — uptime, runtime, token efficiency, error rates. And the business layer — are the KPIs moving? Is capacity increasing? Is cost decreasing? How are people actually spending their time now? What’s their sentiment?
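In practice that looks like two small scorecards per module. The metric names and values below are illustrative, not a real monitoring schema.

```python
# Hypothetical per-module measurement snapshot, split into the two layers.
technical_metrics = {
    "uptime_pct": 99.7,        # is the system up?
    "avg_runtime_s": 4.2,      # how fast does a task run?
    "tokens_per_task": 1850,   # token efficiency
    "error_rate_pct": 0.6,     # how often does it fail?
}
business_metrics = {
    "throughput_vs_baseline": "+18%",       # are the KPIs moving?
    "cost_per_task_usd": 0.42,              # is cost decreasing?
    "hours_redirected_per_week": 31,        # how is time being spent now?
    "team_sentiment": "positive (survey)",  # what's their sentiment?
}
```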
For partnership engagements, we run quarterly business reviews. These are business-focused conversations with leadership: here’s what we deployed, here’s how it performed against projections, here are the issues that came up, here’s what’s next. If the metrics moved, great. If they didn’t, we dig into why — and because our compensation is tied to outcomes, we’re just as motivated to figure it out as they are.
The Thread That Runs Through Everything
Every step of this process — every call, every Slack message, every deployment, every document — is about building trust. That might sound soft for a technical process, but it’s the hardest-earned insight from 60-plus deployments.
The details between these steps are what separate the 15% of AI projects that succeed from the 85% that don’t. The process itself is intuitive. Anyone who’s done implementation work recognizes it. Discovery. Strategy. Build. Deploy. The difference is the depth of execution at each stage and the ongoing relationship that ties them together.
The companies that treat AI implementation as a project get project-level results. The companies that treat it as an ongoing partnership — with aligned incentives, continuous iteration, and embedded trust — get transformation-level results.
That’s the process. It’s still evolving. Every engagement teaches us something new. But the foundation — discover the real operation, build the strategy with real ROI math, deploy progressively, measure relentlessly, and build trust at every step — that doesn’t change.