The Perils of AI Overreliance: Why Smart People Get Fooled

One-line summary

A BCG study finds that AI assistance boosts performance on tasks within the model's capabilities but degrades it on tasks beyond them, with those who trust AI most suffering the biggest drops.

Research from Boston Consulting Group shows that GPT-4 dramatically improved consultants' output on routine tasks, but on a complex strategic problem, consultants using it were 19 percentage points less likely to produce a correct solution. The danger was greatest for those who trusted AI most, as their own critical judgment became disengaged, mirroring the automation complacency documented in aviation pilots. Experts now emphasize 'calibrating autonomy': knowing whether a task falls inside or outside AI's genuine capability frontier before relying on its output.

In 2023, 758 consultants at Boston Consulting Group were given access to GPT-4. For some tasks, their performance soared. For others, they didn’t just stumble; they did worse than colleagues who had no AI at all. The difference wasn’t the consultants’ skill or their prompting technique. It was whether the task fell inside or outside what the researchers, led by Fabrizio Dell’Acqua, called the “jagged technological frontier.” Inside the frontier, GPT-4 was genuinely capable: creative product innovation, brainstorming, basic market analysis. There, the AI-assisted group produced output rated roughly 40% higher in quality.

But when the task moved outside that frontier, into strategic problems that required nuanced organizational reasoning or sensitivity to unstated constraints, the pattern reversed. The consultants using AI were 19 percentage points less likely to produce a correct solution than the control group that worked without it. They generated plausible-sounding, confident answers that were quietly wrong, and they lacked the critical distance to notice.

The most unsettling detail, though, is who suffered the biggest drop. It wasn’t the skeptics or the novices. It was the people who trusted the tool most on the hardest problems. When a task felt difficult, they offloaded more of their thinking to GPT-4, and the tool’s fluency masked its failures. Overreliance turned out to be most dangerous precisely when the stakes were highest, because the user’s own judgment, the very faculty needed to catch errors, had been disengaged.

This dynamic isn’t new. Researchers studying automation complacency in aviation documented a similar erosion decades ago: when pilots trusted autopilot too deeply, their manual flying skills and their ability to detect anomalies decayed. What’s different now is the speed and seamlessness of the offloading. Generative AI doesn’t just take over a routine; it drafts your thinking for you, often before you’ve formed your own view.

Cognitive psychologists have long known about the “generation effect”: information you produce yourself is remembered and understood more deeply than information you merely read. When you skip the struggle of a first draft, you lose more than time. You lose the chance to build the mental models that let you recognize when a machine’s answer is hollow.

The BCG experiment points toward a skill that isn’t measured by any benchmark: calibrating autonomy. That means knowing, in real time, whether a problem is likely inside or outside the AI’s genuine competence zone. The consultants who did well weren’t the ones with the fanciest prompts. They were the ones who recognized a task’s difficulty and adjusted their reliance accordingly, using AI as a collaborator to accelerate work they understood, and treating its output as a hypothesis to be stress-tested when they were on uncertain ground.

Before you prompt, ask yourself: could I solve this problem without AI? If the honest answer is no, you’re likely outside the frontier, and the machine’s answer should be the start of your thinking, not the end of it.