Friction was the buffer
For centuries, nails were forged by hand, slowly. That slowness did something we don't usually account for: it kept the blacksmith in contact with the material. He felt the iron, adjusted the angle, caught problems through touch. The slowness wasn't just a constraint; it was a feedback loop.
Then came cut nails sheared from iron plate. Then wire nails, drawn from coiled steel wire with continuous feed and extraordinary throughput. Nobody optimized wire nail production by hiring more blacksmiths to inspect the output. They didn't need to. They needed something else: the input had to be right, because the machine couldn't feel its way through ambiguity. The machine didn't ask questions. It just ran.
I’m digging into Brian Potter's The Origins of Efficiency, which breaks any production process into five factors: transformation method, production rate, inputs and outputs, buffer size, and output variability. Potter is writing about physical processes like nail factories and steel mills, but the frame fits software uncomfortably well because AI is disrupting all five simultaneously.
Code is increasingly generated, not written. Production rate has skyrocketed. Inputs are shifting from keystrokes to tokens and context windows. The buffers—code review, CI, testing—are strained. Output variability has widened in both directions.
When it took three days to build a feature, ambiguity in the spec got resolved during development. We bumped into edge cases, asked clarifying questions, and refined as we went. Three days of friction was also three days of thinking. Slow production was itself a buffer that masked upstream variation.
That buffer has shrunk. AI builds whatever we point it at. Whatever ambiguity lives in the prompt, the spec, the upstream decision—it gets faithfully encoded into working code that passes tests.
So then is code review mostly compensation for a failure that happened earlier? Every buffer we add downstream is an admission that something upstream is uncontrolled. Teams pour effort into reducing variability at the code level—linters, type systems, review checklists. But what about the quality of product decisions? Customer selection? Software architecture? API design?
Three reasonable people can look at the same problem and come out with three different scoping choices, three different API shapes, three different bets on what the customer needs. That variation compounds through everything downstream and we have no linter for it. There’s feedback, of course. It comes through revenue metrics, customer engagement, and incidents measured over weeks or months, well after the decision has become load-bearing.
Look at the AI software engineering benchmarks: they all start with the problem already defined and measure whether the model solves it. Our entire eval infrastructure starts one step too late. The tools now exist to do what those three days of friction used to do: unearth the contradictions, the shaky assumptions, the unstated tradeoffs. Instead, we point them at the last step, the code.
Our bottleneck was never whether we could write the code fast and bug-free. It's whether the bet on what the customer actually needs, and the architecture that follows from it, have been sharpened before shipping. We have linters for code style and none for decision quality. It's time to point the tools upstream.
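What might that look like in practice? Here's a minimal sketch of a spec linter: an LLM asked to interrogate a spec for ambiguity before any code gets generated. Everything in it is illustrative rather than a real tool; it assumes the OpenAI Python client, and the model name, prompt, and file layout are placeholders.

```python
# spec_lint.py: a hypothetical linter for specs rather than code.
# Sketch only. Assumes the OpenAI Python client (pip install openai)
# and an OPENAI_API_KEY in the environment; the model and prompt are
# placeholders, not a recommendation.
import sys

from openai import OpenAI

REVIEW_PROMPT = """You are reviewing a product spec before any code is written.
Do not propose an implementation. Instead, list:
1. Ambiguities: statements a reasonable engineer could read two ways.
2. Unstated assumptions about the customer, the data, or the system.
3. Contradictions: requirements that cannot all hold at once.
4. Implicit tradeoffs: scoping, API-shape, or failure-mode decisions
   the spec makes without weighing alternatives.
Quote the offending line for each item and ask one clarifying question."""


def lint_spec(spec_text: str) -> str:
    """Return the model's critique of a spec, not an implementation."""
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any capable model works
        messages=[
            {"role": "system", "content": REVIEW_PROMPT},
            {"role": "user", "content": spec_text},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    # Usage: python spec_lint.py feature-spec.md
    with open(sys.argv[1]) as f:
        print(lint_spec(f.read()))
```

Wired into CI the way a code linter is, something like this would surface ambiguity at the decision stage, where the three days of friction used to catch it, instead of at review.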