Collaboration as Capability
Most leaders tracking AI adoption are measuring the wrong thing.
They're watching who's using AI, how often, and with which tools. Useful data. But adoption is not capability, and the gap between the two is where organizational AI strategy can fall apart.
The Stanford AI Index 2026, published this week, puts a name to a better approach: centaur evaluations - assessments that measure not the AI system in isolation, but the human-AI team. The insight is straightforward and consequential. If most real-world AI use involves people supervising, steering and integrating model outputs, then that is what you should be measuring. The unit of capability isn't the tool. It's the collaboration.
Which raises an uncomfortable question for most leaders: do you know what a good human-AI team looks like in your organization?
A field study from BCG, Harvard and MIT watched 244 consultants work through the same problem with the same AI tool and found three distinct patterns. Not three types of people - three modes of collaboration that the same person might move between depending on the task and the moment.
The researchers frame it around two questions. Who decides what needs to be done? And who decides how it gets done?
When the human holds both, they're using AI selectively - as a research tool, a way to stress-test their own thinking. Their domain expertise deepens through the process.
When the human holds the first but cedes the second, they're in continuous dialogue with AI across the full workflow - probing, pushing back, iterating. What they develop is fluency with AI itself, a capability in its own right.
When the human cedes both, the output arrives fast and looks polished. But nothing compounds. No domain expertise, no AI fluency. Just throughput.
Whatever you call these modes, what matters is what sits underneath them: the same tool, the same task, radically different consequences for what people are actually building in themselves.
This is what makes it a leadership problem rather than an individual one.
Most people don't consciously choose which mode they're operating in. They default. And the default, especially under time pressure, is to cede both questions to the AI. It's faster. The output looks fine. Nothing flags it as a problem until you start noticing that capability isn't building the way you'd expect.
Leaders can't fix what they can't see. And right now, most don't have a way of seeing it - because they're looking at adoption metrics that tell them nothing about the quality of the collaboration happening underneath.
The centaur evaluation framing from Stanford is useful here. Stop asking whether your people are using AI. Start asking whether your human-AI teams are producing better outcomes than either could alone - and whether your people are growing through the process, not just getting faster.
That's a different conversation to have with your teams. It starts with being able to describe what good looks like - and knowing what to look for.