Every wave of AI tools gives us a new capability, and we immediately race to build more of it. A decade later, nothing ships.
When scouts discovered Happy Gilmore could drive a golf ball 400 yards, nobody asked whether he could putt. They saw the capability and assumed the rest would follow. We have been making the exact same mistake with AI tools for ten years.
The DataRobot Lesson
In 2012, the promise was clean: give companies the right platform and every analyst becomes a data scientist. By 2016, DataRobot made good on that promise. You could point it at a dataset and get a working model in hours.
But companies kept showing up with that same promise and the same gap: "We want to predict sales. We want to know if a construction project will succeed. We have the platform. Here is our data."
Six to eight months later, after data audits, feature engineering, and weeks working through what "successful project" even meant across departments, we would get to a first model. Not because the technology was slow. Because the actual work was never about the model.
The March of Nines
A decade later, most enterprises still do not run machine learning in production. Not because DataRobot failed. Because we optimized for the wrong thing. Andrej Karpathy gave the pattern a name on the Dwarkesh Podcast: the march of nines. Waymo had a perfect self-driving demo in 2014. Every additional nine of reliability after that required as much work as the previous one. The demo looked nearly complete. The production system was four nines away. That is not just a self-driving problem. It is the problem with every wave of AI tools.
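To make the march of nines concrete, here is a small arithmetic sketch. It is an illustration, not Karpathy's own model: the one million trips, the assumption that a polished demo sits at roughly one nine, and the function names are all made up for the example. The only idea taken from the text is that each extra nine costs about as much work as the one before it.

```python
# Illustrative arithmetic only. TRIPS and the "demo is about one nine"
# starting point are assumptions for this sketch, not figures from the talk.
TRIPS = 1_000_000  # hypothetical number of trips, chosen for round numbers


def expected_failures(nines: int, trips: int = TRIPS) -> int:
    """Expected failed trips at `nines` nines of reliability (3 -> 99.9%)."""
    # The failure rate is 1 / 10**nines; integer division keeps the result exact.
    return trips // 10**nines


def march_of_nines(max_nines: int = 5) -> list[tuple[int, str, int]]:
    """Rows of (nines, reliability label, expected failures per TRIPS)."""
    rows = []
    for n in range(1, max_nines + 1):
        # Show just enough decimals to display every nine: 90%, 99%, 99.9%, ...
        label = f"{100 - 100 / 10**n:.{max(n - 2, 0)}f}%"
        rows.append((n, label, expected_failures(n)))
    return rows


def share_of_work_done(demo_nines: int, target_nines: int) -> float:
    """Fraction of total effort behind you if every nine costs the same."""
    return demo_nines / target_nines


if __name__ == "__main__":
    for nines, label, failures in march_of_nines():
        print(f"{nines} nine(s): {label:>8} reliable, {failures:>7,} failures per {TRIPS:,} trips")
    print(f"Demo at 1 nine, target of 5: {share_of_work_done(1, 5):.0%} of the work is done")
```

Under those assumptions, the demo that looks nearly complete is one fifth of the job. Each nine cuts failures tenfold, and each one costs the same again.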

The Hidden Layer
The hidden layer is always the same. Three things that block value, every cycle.
Data quality and documentation. DataRobot could not teach a construction company what "successful project" meant across departments. An agent cannot operate on data nobody has explained to it. The problem was never model accuracy. It was context.
Trust at scale. A model one person validates is not a model fifty people can rely on. Greg Brockman said it plainly on the Big Technology Podcast last month: governance, observability, and oversight have to be built in parallel with capability, not after the demo works. Trust is not shipped as a feature. It is earned.
Problem definition. The first two months of every ML engagement were spent figuring out what exactly was being predicted and why. The tool was ready on day one. The problem was not. The same is true with agents.
The Actual Work
Fixing that hidden layer is the actual work. Not more agents. Not more products.
Use Claude Code or Cursor to put a functioning version of the product in front of users and start collecting feedback. The slide deck and the spec document used to come before the build; now they are the bottleneck. A working Streamlit app or React interface gives your first ten users something real to click through, break, and argue with, and the feedback you get is grounded in actual experience, not a hypothetical one. One hour of someone using a live product teaches you more than a month of planning what to build.
Use agents to do the documentation work that never gets done. Point Claude at your database schemas and have it generate a data dictionary. Feed it your existing SQL, dashboards, and reports and have it reverse-engineer business definitions. Have it interview a subject matter expert and output structured context your agents can actually reason from. Documentation that used to require a dedicated sprint now takes an afternoon. Without it, your agents guess. With it, they compound.
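The schema step can start even before an agent is involved. The sketch below is one possible starting point, assuming a SQLite database: it reads the schema and drafts a data dictionary skeleton that an agent or a subject matter expert can then fill in. The function name, the example table, and the "TODO business definition" placeholder are hypothetical, and no model call is made here.

```python
import sqlite3


def draft_data_dictionary(conn: sqlite3.Connection) -> str:
    """Draft a plain-text data dictionary skeleton from a SQLite schema."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        lines.append(f"Table: {table}")
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, col_type, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            flags = []
            if pk:
                flags.append("primary key")
            if notnull:
                flags.append("not null")
            details = ", ".join([col_type or "untyped", *flags])
            # The definition is left blank on purpose: the schema cannot say
            # what "successful project" means. A person or an agent fills it in.
            lines.append(f"  {name} ({details}): TODO business definition")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example table, created in memory for the demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE projects ("
        "id INTEGER PRIMARY KEY, status TEXT NOT NULL, closed_on TEXT)"
    )
    print(draft_data_dictionary(conn))
```

The skeleton is the cheap part. The value comes from replacing each placeholder with a definition the departments actually agree on.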
Use Claude Code to embed governance where the work happens. Instead of a compliance document users have to find, build the policy into the agent's operating layer, surfacing the right rule at the right moment, in context, without breaking the workflow. The agent should know what it can and cannot do before the user has to ask.
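One way to picture policy living in the operating layer is a check that runs before every tool call. The sketch below is a minimal illustration, not a real agent framework: the Rule class, the POLICY entries, and the action names are all hypothetical, and unknown actions are denied by default as a design assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    allowed: bool
    reason: str  # the rule text surfaced to the user, in context


# Hypothetical policy table; a real one would come from your compliance owners.
POLICY: dict[str, Rule] = {
    "read_project_report": Rule(True, "Project reports are readable by all analysts."),
    "export_customer_data": Rule(False, "Customer exports need sign-off from the data owner."),
}


def check_action(action: str, policy: dict[str, Rule] = POLICY) -> tuple[bool, str]:
    """Return (allowed, reason). Actions the policy does not name are denied."""
    rule = policy.get(action)
    if rule is None:
        return False, f"'{action}' is not covered by policy; ask a human reviewer."
    return rule.allowed, rule.reason


def guarded_call(action: str, tool: Callable[..., Any], *args: Any) -> Any:
    """Run `tool` only if the policy allows `action`; otherwise explain why not."""
    allowed, reason = check_action(action)
    if not allowed:
        return f"Blocked: {reason}"
    return tool(*args)


if __name__ == "__main__":
    print(guarded_call("read_project_report", str.upper, "q3 summary"))
    print(guarded_call("export_customer_data", str.upper, "all customers"))
```

The point of the design is that the agent learns the rule at the moment it tries the action, and the user sees the reason without opening a compliance document.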
Learn to Putt
Happy Gilmore did not need to drive the ball 500 yards. He needed to learn to putt. If we spend the next decade racing to ship agents the same way we raced to build models, we will be sitting here in 2035 with nothing to show for it.
The work has never been the tool. It is always the layer underneath it.