Signal Intent

The Deskilling Paradox

The industry is trying to sell us a half-true AI productivity story. These tools are undoubtedly making us faster at completing tasks, but that comes at a cost far beyond the time required for humans to verify machine outputs.

Across five hundred companies and eight months, Professor Suproteem Sarkar at the University of Chicago Booth School of Business, in partnership with Cursor, tracked how developers actually used AI. As models improved, developers used them more. Perhaps even more revealing: they began offloading higher-complexity work to AI, allowing them to take on more ambitious projects previously beyond their immediate reach.

None of this is particularly shocking in isolation. But cognitive research from the same period tells a different story about the humans doing the producing, and the industry hasn’t yet reckoned with the tension between the two.

The Cognitive Problem

Sarah Baldeo, a researcher in AI and neuroscience at Middlesex University, publishing in Technology, Mind, and Behavior, found a correlation: the more people relied on large language models, the less they trusted their own reasoning. Participants who depended heavily on the models were more likely to report that the tools were thinking for them, not with them. As Baldeo puts it: “It really doesn’t have to do with the tool itself.”

So the tool isn’t the variable. The interaction pattern is. Users who questioned, rejected, and edited AI output maintained their reasoning confidence, whereas those who offloaded their thinking entirely lost it. The problem isn’t AI–it’s the way people are choosing to use it, and the way most products are designed to encourage that use.

Grace Liu and colleagues from Carnegie Mellon, MIT, Oxford, and UCLA took this a step further. They ran three randomized controlled trials across 1,222 participants with a simple premise: give people AI assistance during a learning phase, then remove it without warning and measure what happens.

After removal, AI-assisted participants solved 57% of the problems they were given, compared to 73% for the control group that never had access. That’s a 16 percentage point gap that emerged after roughly ten to fifteen minutes of use. But here’s the part that stuck out to me: AI-assisted participants didn’t just perform worse when the tool disappeared–they stopped trying altogether. Skip rates nearly doubled. Participants who used AI for direct answers saw their scores drop up to ten points below their own pretest baseline. The control group improved by 1%.
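
Two different comparisons are being made there, and it’s easy to blur them. A minimal sketch, using only the figures quoted above (the variable names and layout are mine), keeps the between-group gap separate from each group’s change against its own pretest baseline:

```python
# Figures as quoted from the Liu et al. results; everything else is illustrative.
post_test = {"ai_assisted": 57.0, "control": 73.0}  # % of problems solved after AI was removed

# Between-group comparison: how far behind the AI-assisted group landed.
gap = post_test["control"] - post_test["ai_assisted"]
print(f"Gap at post-test: {gap:.0f} percentage points")  # 16

# Within-group comparison: change relative to each group's own pretest.
# -10 is the worst case reported for participants who used AI for direct answers.
change_from_pretest = {"ai_direct_answers": -10.0, "control": +1.0}
for group, delta in change_from_pretest.items():
    print(f"{group}: {delta:+.0f} points vs. own pretest baseline")
```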

One line from the paper captures this dynamic precisely: “AI systems are fundamentally short-term collaborators: extraordinarily helpful in the moment, but indifferent to what that help does to the person receiving it over time.”

So, here we see productivity going in one direction while comprehension goes in the other. And the gap between them is growing. It’s worth noting that these findings come from controlled settings, not enterprise-level production environments. However, if ten to fifteen minutes of AI assistance in a lab can open a 16 percentage point performance gap, the implications for developers using these tools eight hours a day deserve more scrutiny than they’re getting. In most contexts, that divergence is a management problem worth monitoring. To a cybersecurity practitioner, it looks like something closer to a structural crisis.

The Hinge

Security researcher Mohan Pedhapati, CTO of Hacktron, demonstrated the economics of AI-assisted offense. Pedhapati leveraged Claude Opus 4.6 to generate a full working exploit for CVE-2026-5873, an out-of-bounds vulnerability in Chrome 138’s V8 JavaScript engine. It only cost him $2,283 in API fees and 2.3 billion tokens over twenty hours.
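
Those headline figures imply some useful per-unit rates. A quick back-of-envelope sketch; the division is mine, the inputs are the reported ones:

```python
# Reported figures: $2,283 in API fees, 2.3 billion tokens, twenty hours.
api_cost_usd = 2_283
tokens = 2.3e9
hours = 20

print(f"~${api_cost_usd / (tokens / 1e6):.2f} per million tokens, blended")  # ~$0.99
print(f"~{tokens / hours / 1e6:.0f}M tokens per hour")                       # ~115M
print(f"~${api_cost_usd / hours:.0f} per hour of model time")                # ~$114
```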

Pedhapati didn’t write any exploit code. Instead, he spent those hours steering: recognizing when the model got stuck and redirecting it, cutting off unproductive lines of attack, and judging which paths seemed promising enough to pursue and which to abandon. Opus did the heavy lifting; Pedhapati supplied the strategic reasoning.

That independent judgment about what to pursue and what to discard, the ability to leverage intuition and form hypotheses from it, is precisely the cognitive faculty the Baldeo and Liu studies show eroding under heavy AI use. The one input AI can’t yet generate on its own is degrading in the population most responsible for defense.

Pedhapati’s $2,283 investment would have netted roughly $15,000 in combined bug bounties–a 6.5x return before accounting for his time. But models improve and API costs decline, and Pedhapati’s time spent was largely a learning curve that his next attempt won’t require. So, theoretically, the next exploit will be cheaper. In Pedhapati’s words: “Eventually, any script kiddie with enough patience and an API key will be able to pop shells on unpatched software.” Script kiddies already exist. AI lowers their barrier to entry (which feels ironic to say).
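
The return math, and how it moves if per-token prices keep falling, fits in a few lines. The price cuts below are hypothetical scenarios, not forecasts:

```python
bounty_usd = 15_000      # "roughly $15,000 in combined bug bounties"
api_cost_usd = 2_283

# ~6.6x with these rounded inputs; quoted above as roughly 6.5x.
print(f"Baseline return: ~{bounty_usd / api_cost_usd:.1f}x")

for price_cut in (0.5, 0.9):  # hypothetical reductions in API pricing
    cheaper = api_cost_usd * (1 - price_cut)
    print(f"After a {price_cut:.0%} price cut: ~{bounty_usd / cheaper:.0f}x on ~${cheaper:,.0f} of spend")
```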

The economics favor the attacker. And the one thing separating an experienced exploit developer from a kid with a credit card is the same capability the cognitive research shows eroding.

The Arithmetic

Oxford philosopher Toby Ord, known for his work on existential risk, recently broke down AI agent costs by the hour and, surprise, the productivity narrative doesn’t survive the math.

On the surface, the numbers tell a compelling story. Some AI agents operate at roughly $0.40 per hour, whereas a human software engineer costs around $120 per hour. The efficiency argument writes itself. But Ord’s deeper finding is that costs compound as task duration and complexity increase. GPT-5, for example, costs $13 per hour on forty-five-minute tasks but $120 per hour on two-hour tasks, in line with human labor. O3 reaches $350 per hour, nearly three times the cost of a human engineer. Ord concludes that what looks like progress is “increasingly lavish expenditure on compute,” not sustainable capability gains.
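
Put side by side, the quoted hourly rates make the crossover plain. A small sketch; only the dollar figures come from Ord’s breakdown, the labels and the comparison are mine:

```python
human_rate = 120.0  # USD/hour, the engineer figure used above

agent_rates = {                       # USD/hour, as quoted
    "cheap agent":            0.40,
    "GPT-5, 45-minute tasks": 13.0,
    "GPT-5, 2-hour tasks":    120.0,
    "o3":                     350.0,
}

for label, rate in agent_rates.items():
    print(f"{label}: ${rate:,.2f}/hour, {rate / human_rate:.1%} of the human rate")
```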

You can’t secure an organization in forty-five-minute bursts. Defense lives in the long-duration, high-complexity tail of Ord’s analysis, where AI costs approach or exceed human costs. Offense, on the other hand, is composable from the cheap end of the curve: one brief, targeted interaction is all an attacker needs.

The PRT-Scan campaign shows what cheap offense looks like at scale. A single threat actor, using six GitHub accounts, opened more than five hundred malicious pull requests over the course of a few weeks. Although the success rate was below ten percent, the campaign still compromised two npm packages (@codfish/eslint-config and @codfish/actions) and enumerated API keys and tokens for platforms like AWS, Cloudflare, and Netlify.

But the breach isn’t the point; the evolution of the attack is. In phase one, researchers noticed raw bash scripts targeting small repositories. By phase three, AI-generated wrappers were dynamically identifying each target’s language, framework, and CI configuration. Each iteration of the attack was more idiomatically convincing than the last, and the entire campaign cost virtually nothing.

This is the arithmetic that breaks the productivity story. Comprehensive defense at scale costs disproportionately more than iterative offense at volume, and the gap is widening as AI makes low-skill, high-volume attack campaigns trivial to execute.
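
To make that arithmetic concrete, here is a deliberately rough model. Only the five-hundred-PR volume comes from the PRT-Scan campaign above; the attacker’s per-PR cost, the review time, and the reviewer rate are hypothetical placeholders chosen to show the shape, not to measure anything:

```python
attack_prs = 500                 # from the PRT-Scan campaign above
attacker_cost_per_pr = 0.05      # hypothetical: near-zero generation and automation cost
review_minutes_per_pr = 20       # hypothetical: one careful human review per PR
reviewer_rate_per_hour = 120     # the engineer hourly figure used earlier

attacker_spend = attack_prs * attacker_cost_per_pr
defender_spend = attack_prs * (review_minutes_per_pr / 60) * reviewer_rate_per_hour

print(f"Attacker spend: ~${attacker_spend:,.0f}")
print(f"Defender spend: ~${defender_spend:,.0f} just to triage the volume")
print(f"Asymmetry:      ~{defender_spend / attacker_spend:,.0f}:1 in the attacker's favor")
```

Change the placeholders and the ratio moves, but the direction doesn’t: every machine-generated pull request is nearly free to send and expensive to read.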

The Loop

In practice, most AI-driven productivity gains expand the attack surface. Each new line of code, each additional dependency, every expanded feature demands understanding, testing, and defense. And the rational response to an attack surface you can no longer comprehend is to delegate, which is exactly what some organizations are doing. But that delegation erodes the human capacity to understand what’s being defended. As understanding erodes, the systems we build become less carefully examined, not because the code is worse in an obvious way, but because our capacity to foresee or foreclose vulnerability quietly disappears. Each cycle widens the gap between what we produce and what we comprehend.

This might read as a slippery slope. It isn’t. Each step in the cycle has independent evidence behind it. The question is whether anyone is modeling the system-level dynamic those steps form together, and I haven’t found evidence that they are.

That’s a hard thing to sit with. The productivity gains are real. Developers really are doing more ambitious work. The capability expansion is real, and the economic value is measurable. The answer is neither to reject AI nor to celebrate productivity without measuring what it erodes.

I don’t have a confident answer to where this all leads. Anyone who does is either selling something, working from too little of the data, or seeing something I’m not. The evidence says we’re optimizing for the wrong thing: we shouldn’t be measuring output, we should be measuring understanding. The question I keep arriving at isn’t whether this trajectory leads somewhere dangerous. It’s what happens when we lose the first-principles knowledge required to maintain and defend the systems we’re building alongside AI, and whether anyone will notice before it’s gone.