When we pitch $15/hr to potential clients, the reaction is usually one of two things. Either cautious interest — “that seems too good to be true, but let’s hear more” — or polite dismissal, which usually means they are thinking “this is going to be cheap offshore work with AI branding.”

Both reactions are fair. So I want to share what three months of actual delivery data looks like, because I think the honest answer is more interesting than either the skeptical or the credulous version.

What we measured and why

Across five client engagements over Q4 2024, we tracked: hours billed, story points delivered, defects found post-delivery by clients, and — for reference — what the same work would have cost from mid-market agencies in the US and UK at their standard blended rates. Not the top-tier agencies. Mid-market. The ones most of our clients were using or considering before us.

The combined dataset was 1,847 hours across the five engagements. That is not an enormous sample, but it is enough to give us some confidence in the numbers.

The output numbers

312 story points delivered across the three months. At typical senior developer velocities — 8–12 points per week — that volume would require roughly 26–39 weeks of one developer’s time. We delivered it across three months with two-person pods per engagement, using AI agents for the categories of work where they are reliable.
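For anyone who wants to check the arithmetic, here is a minimal sketch of the single-developer estimate. It uses only the figures quoted above; the function name is ours for illustration, and the 8 to 12 points per week range is the assumed "typical" velocity, not something we measured.

```python
# Figures from this post. The velocity range is an assumption about a
# typical senior developer, not a measured value.
STORY_POINTS = 312
VELOCITY_LOW, VELOCITY_HIGH = 8, 12  # story points per week


def weeks_for_one_developer(points, velocity):
    """Weeks one developer would need at a constant weekly velocity."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return points / velocity


fastest = weeks_for_one_developer(STORY_POINTS, VELOCITY_HIGH)
slowest = weeks_for_one_developer(STORY_POINTS, VELOCITY_LOW)
print(f"{fastest:.0f} to {slowest:.0f} developer-weeks")  # 26 to 39 developer-weeks
```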

Approximately 40% of delivered hours involved meaningful AI agent assistance. That 40% would have taken roughly 2–3x as long without agent assistance. The human hours freed up by that compression went into review, architecture work, and client communication, the parts where human judgment actually matters.
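To make the compression concrete, here is a rough back-of-envelope sketch of what the same work might have taken with no agent assistance. It treats "approximately 40%" as exactly 40% and reads "2–3x" as a simple multiplier on the assisted hours; both are simplifying assumptions, so the result is an order-of-magnitude estimate rather than a measurement.

```python
# Figures from this post; the 40% share and 2-3x multiplier are approximate.
TOTAL_HOURS = 1847
ASSISTED_SHARE = 0.40               # share of hours with agent assistance
SLOWDOWN_LOW, SLOWDOWN_HIGH = 2, 3  # assumed multiplier without agents

assisted_hours = TOTAL_HOURS * ASSISTED_SHARE
unassisted_hours = TOTAL_HOURS - assisted_hours


def hours_without_agents(slowdown):
    """Estimated total hours if the assisted work took `slowdown` times as long."""
    return unassisted_hours + assisted_hours * slowdown


low = hours_without_agents(SLOWDOWN_LOW)
high = hours_without_agents(SLOWDOWN_HIGH)
print(f"Estimated hours without agents: {low:.0f} to {high:.0f}")
```

Under those assumptions the 1,847 hours correspond to roughly 2,586 to 3,325 hours of unassisted work.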

The quality numbers

Post-delivery defects, meaning bugs found by clients after sprint acceptance, averaged 1.4 per 10 story points across the dataset. I went looking for industry benchmarks to compare this to, and the honest answer is that benchmarks in this area are inconsistent and often not comparable. The figures I have seen for outsourced development generally range from 3 to 8 defects per 10 story points, but the methodology varies enough that I would not put too much weight on that comparison.
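For clarity on how a figure like this is calculated, here is a small sketch. The function name is ours for illustration, and the raw defect count is not published in this post; the count below is back-calculated from the 1.4 rate and the 312 story points, so treat it as implied rather than reported.

```python
def defects_per_10_points(defects, story_points):
    """Post-delivery defects normalised per 10 delivered story points."""
    if story_points <= 0:
        raise ValueError("story_points must be positive")
    return defects / story_points * 10


STORY_POINTS = 312   # from this post
REPORTED_RATE = 1.4  # defects per 10 story points, from this post

# Back-calculated, not a reported figure: the defect count the rate implies.
implied_defects = REPORTED_RATE * STORY_POINTS / 10
print(f"Implied post-delivery defects: about {implied_defects:.0f}")

# Sanity check: a whole-number count of 44 rounds back to roughly the same rate.
print(f"{defects_per_10_points(44, STORY_POINTS):.2f} per 10 points")
```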

What I can say is that the defect rate was lower than we expected and lower than what our clients reported from their previous agencies. The most likely explanation: AI-generated test suites have higher coverage than developer-written tests under time pressure. When you catch more bugs in testing, fewer reach the client.

The cost comparison

1,847 hours at $15/hr: $27,705. The same hours at a US agency rate of $125/hr (conservative mid-market): $230,875. The same hours at $40/hr blended offshore: $73,880.
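The totals above are simple multiplication, sketched here so the comparison is easy to rerun with your own rates. The rate labels are ours, and the sketch assumes each provider would bill the same 1,847 hours for the same work, which is a simplification.

```python
# Hours and hourly rates from this post. Assumes every provider would
# bill the same number of hours for the same work.
HOURS = 1847
RATES = {
    "WizQuest": 15,
    "US mid-market agency": 125,
    "Blended offshore": 40,
}

costs = {name: HOURS * rate for name, rate in RATES.items()}

for name, cost in costs.items():
    print(f"{name}: ${cost:,}")
# WizQuest: $27,705
# US mid-market agency: $230,875
# Blended offshore: $73,880
```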

I want to be careful here. These are not identical products. A US agency brings different things: stronger architecture judgment, easier communication thanks to time-zone overlap, sometimes deeper domain expertise. The comparison is not “WizQuest does exactly what they do for 88% less.” It is “for the specific output we delivered (working features, integrations, documentation, and tests), the cost per unit of output was dramatically lower.”
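One way to read "cost per unit of output" is cost per delivered story point. The sketch below is a rough illustration under a strong assumption: that the other providers would have needed the same 1,847 hours to deliver the same 312 points. We did not measure that, so the non-WizQuest figures are hypothetical.

```python
# Figures from this post. The comparison rows assume identical hours and
# identical output, which is a hypothetical simplification.
HOURS = 1847
STORY_POINTS = 312
OUR_RATE = 15
COMPARISON_RATES = {"US mid-market agency": 125, "Blended offshore": 40}


def cost_per_point(rate):
    """Dollars per delivered story point at a given hourly rate."""
    return HOURS * rate / STORY_POINTS


def saving_vs(rate):
    """Fractional saving of our rate relative to another hourly rate."""
    return 1 - OUR_RATE / rate


print(f"WizQuest: ${cost_per_point(OUR_RATE):.2f} per story point")
for name, rate in COMPARISON_RATES.items():
    print(f"{name}: ${cost_per_point(rate):.2f} per story point, "
          f"{saving_vs(rate):.1%} saving at our rate")
```

At the US rate the saving works out to the 88% quoted above; against the $40/hr offshore rate it is 62.5%.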

Where the model does not work

These numbers reflect the categories of work where AI-native delivery performs well. In novel algorithmic problems, complex distributed systems design, and security-critical implementations that require deep expertise, the advantage shrinks. The AI agents we use are not equally good at everything. We know which categories they handle reliably, and we do not use them for the rest.

The $15/hr rate is real. So is the delivery. The honest asterisk is that it works best for well-defined, product-focused development work — which is the majority of what early-stage and growth-stage companies need, but not all of it.