ULTRATHINK
Strategy February 23, 2026

Anthropic’s AI Fluency Index: What It Means for Enterprise AI

Anthropic’s research proves individual AI fluency matters. But for enterprises, it’s organizational fluency that closes the Execution Gap.

Nick Amabile
Founder & CEO

There’s a debate happening in public right now about whether AI makes us smarter or lazier. One side says that outsourcing your writing to AI is outsourcing your thinking. The other side—my side—says that using AI as a true thought partner, starting with questions rather than prompts, critically evaluating its responses, redirecting and pushing back, produces co-created work that’s authentically yours and better than either could produce alone.

Anthropic just settled this debate with data.

Their AI Fluency Index, published February 2026, analyzed 9,830 Claude.ai conversations in a single week. What they found confirms what we’ve been building toward at Ultrathink: AI success isn’t about the model. It’s about how your people work with it.

But here’s what Anthropic didn’t say—and what matters far more for enterprise leaders: individual AI fluency is necessary but not sufficient. You can have a building full of fluent individuals and still fail at AI. Because the Execution Gap™ isn’t an individual skills gap. It’s an organizational capability gap.

What is AI fluency?

AI fluency is the ability to collaborate effectively with AI tools—not just use them, but use them well. Anthropic’s 4D AI Fluency Framework identifies 24 specific behaviors that characterize effective human-AI collaboration, grouped across four competency domains: description, delegation, discernment, and diligence.

The distinction matters. AI literacy is knowing what AI can do. AI fluency is knowing how to work with it. Iterating on outputs. Questioning reasoning. Identifying what’s missing. Setting the terms of collaboration. Literacy is passive knowledge. Fluency is active practice.

And the gap between the two is where most enterprise AI programs die.

The Iteration Imperative: your AI doesn’t learn because your people don’t push back

The strongest signal in Anthropic’s data: iteration is the single biggest driver of AI fluency. 85.7% of conversations showed iteration—users building on prior exchanges rather than accepting the first response. Those users exhibited 2.67 additional fluency behaviors on average. They were 5.6x more likely to question AI reasoning and 4x more likely to identify missing context.

The enterprise translation is uncomfortable. Most organizations build AI systems designed for one-shot delegation: submit a request, get an answer, move on. There’s no feedback loop. No mechanism for users to push back, refine, or redirect. The system doesn’t learn from its mistakes because nobody tells it when it’s wrong.

We call this the Learning Gap—the first of five hidden barriers behind the Execution Gap. AI systems that can’t learn from their own output aren’t production-grade. They’re static tools that degrade as the world around them changes.

The infrastructure for iteration exists. Feedback capture native to the workflow. Evaluation datasets built from real user corrections. Observability that tracks not just what the model said, but what happened after—did the user accept it, edit it, or reject it? We built this into our own agent operations. When we put OpenClaw into production, the observability layer was what turned it from a demo into a system we could trust with real workflows.

If individual users who iterate are 5.6x better at spotting errors, imagine what happens when your production system has no feedback loop at all. That’s not a skills gap. That’s a design failure.

But the infrastructure is only half the story. The other half is culture. If your people treat AI outputs as final answers instead of starting points, no amount of tooling will fix the learning loop. Anthropic’s data proves this at scale: the organizations that will win are the ones where iteration is the default behavior, not the exception.

The Artifact Paradox: polished outputs are your biggest blind spot

This finding should keep enterprise leaders up at night. When AI generates polished artifacts—code, documents, reports, dashboards—users become better directors but worse evaluators. Goal clarification increases by 14.7 percentage points. Format specification rises by 14.5pp. But missing context identification drops by 5.2pp. Fact-checking decreases by 3.7pp. Challenges to AI reasoning decline by 3.1pp.

The more professional the output looks, the less people scrutinize it.

We’ve been writing about this problem under a different name: the verification tax. When your AI system produces something that looks finished, your team inherits an invisible cost—the human oversight required to catch what the model got wrong. And Anthropic just proved that polished outputs systematically reduce that oversight.

This is how enterprises end up in Pilot Purgatory. The demo looked great. The executive committee was impressed. The output was polished and professional. Nobody asked whether it was correct, complete, or safe to operationalize. Three months later, the pilot is still a pilot. As we wrote in Stop Asking for a Chatbot: vibes don’t run in production.

What production-grade looks like

Production-grade AI systems solve the Artifact Paradox with transparency, not polish. Evidence panels that show why the system made a recommendation. Confidence scores that flag uncertainty. Audit trails that let you trace any output back to its inputs. Run history that makes every decision reviewable. If you’re stuck at Stage 2 on the AI Maturity Curve—rich in demos, poor in production value—the Artifact Paradox is likely one of the reasons. The difference between a demo and a production system isn’t features—it’s governance.
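One way to picture that transparency: never return bare text. Wrap every model output in an envelope that carries its confidence and supporting evidence, so a polished answer cannot arrive without provenance. A hypothetical Python sketch with invented field names, not a real product API:

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str   # e.g. a document ID or tool call backing the claim
    excerpt: str  # the snippet the recommendation rests on


@dataclass
class AuditedOutput:
    """A model output that can be traced back to its inputs."""
    run_id: str
    text: str
    confidence: float               # evaluator-assigned, 0.0 to 1.0
    evidence: list[Evidence] = field(default_factory=list)

    def needs_review(self, threshold: float = 0.8) -> bool:
        # Low confidence or no supporting evidence: route to a human.
        return self.confidence < threshold or not self.evidence
```

The design choice worth noting: the review gate fires on missing evidence as well as low confidence, which is exactly the failure mode the Artifact Paradox predicts people will stop checking by hand.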

Human-AI collaboration beats delegation—and it’s not close

Anthropic found that users who treat AI as a thought partner—augmentative use—exhibit more than double the fluency behaviors of those who delegate tasks entirely. Collaboration doesn’t just feel better. It produces measurably better outcomes.

I’ve lived this firsthand. I use AI not as a crutch, not to abdicate the thinking, but as a true thought partner. I start with questions, not prompts. I critically evaluate responses, ask follow-up questions, redirect when the reasoning goes sideways. The final product is co-created, but the thinking is mine. Anthropic just put numbers on that experience: augmentative users are fundamentally more effective than delegative ones.

Now scale this to the enterprise. Most organizations frame AI adoption as automation—“here, AI, do this task for me.” That’s the delegative pattern. It’s the pattern that produces the fewest fluency behaviors and the worst outcomes.

The Outcome Partnership model is the organizational version of augmentative use. We don’t do the work for you. We work with you—the Synapse Cycle™ starts with your business problem, not our technology. We build systems where humans and AI collaborate on workflows, with clear ownership, feedback loops, and governance. The success fee is tied to your KPIs, not our hours. As we argued in The Billable Hour is Dead: when your partner makes more money the longer a project takes, your incentives aren’t aligned.

When OpenAI launched Frontier with Forward Deployed Engineers at $10M minimums, they chose the delegative model: embed engineers, build for you, charge for time. Anthropic’s own research suggests that model produces worse outcomes than collaboration. The organizations that win at AI won’t be the ones that outsourced it. They’ll be the ones that learned to do it.

The 30% problem: governance is just collaboration terms at scale

One of the quieter findings: only 30% of users explicitly set collaboration terms with AI. Things like “push back if my assumptions are wrong” or “explain your reasoning before answering.” The 70% who don’t set these expectations get measurably worse results.

At the individual level, this is a missed optimization. At the enterprise level, it’s the governance gap.

Governance isn’t bureaucracy. It’s the organizational equivalent of telling AI how you want to work together. What are the boundaries? What decisions require human approval? What’s the escalation path when something goes wrong? What data can the system access, and what’s off-limits?

Most enterprises skip this step. They deploy AI tools with default settings, generic prompts, and no explicit operating parameters. That’s the organizational equivalent of the 70% who never set collaboration terms—and then wonder why the results are mediocre.

The Synapse Cycle™ exists because structured discovery—understanding the business problem, mapping the workflow, defining success criteria—is how you set collaboration terms at the organizational level. It’s not optional prep work. It’s the foundation that determines whether your AI program produces value or just produces output. And the infrastructure-level governance that controls what tools agents can access, what data they can see, and how they route requests—that’s collaboration terms encoded into architecture.
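What “collaboration terms encoded into architecture” can look like at its most basic: the terms are written down once and attached to every session, instead of left to each user’s habits. A hypothetical sketch; the term list and function names are invented for illustration:

```python
# Collaboration terms defined once, organization-wide, and prepended
# to every AI session rather than left to individual users. Illustrative.
COLLABORATION_TERMS = """\
- Push back if my assumptions are wrong, and say why.
- Explain your reasoning before giving an answer.
- List any context you are missing before committing to a recommendation.
- Flag low-confidence claims explicitly.
"""


def build_system_prompt(role: str, terms: str = COLLABORATION_TERMS) -> str:
    """Combine a role description with explicit collaboration terms."""
    return f"You are assisting a {role}.\nWork under these terms:\n{terms}"
```

This is the organizational answer to the 30% finding: the behaviors that fluent individuals do by habit become defaults that every session inherits.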

From individual skill to organizational capability

Anthropic measured individuals. Enterprises need systems.

Individual AI fluency is necessary. But unless you build organizational infrastructure around it, fluent individuals will produce isolated wins that never compound. The four dimensions of organizational AI fluency map directly to the machinery we’ve been building:

Structured feedback loops

Individual fluency means iterating on a conversation. Organizational fluency means building evaluation pipelines, feedback capture, and continuous improvement into every AI-powered workflow. The learning loop runs at the system level, not just the interaction level. This is what Ultrathink Axon™’s evaluation layer provides—not just observability after the fact, but a structured path from observation to improvement.

Governance that enables, not blocks

Individual fluency means setting collaboration terms. Organizational fluency means policy controls, approval workflows, audit trails, and risk-tiered permissions—baked into the platform so teams can move fast within clear boundaries.
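At its core, risk-tiered permissioning reduces to a small policy gate. A hypothetical sketch; the tiers, action examples, and auto-approve threshold are invented for illustration, not taken from any specific platform:

```python
from enum import IntEnum


class RiskTier(IntEnum):
    """Higher tiers require more oversight."""
    READ_ONLY = 1        # e.g. summarize a document
    INTERNAL_WRITE = 2   # e.g. draft an email, update a ticket
    EXTERNAL_ACTION = 3  # e.g. send to a customer, move money


# The highest tier an agent may execute without a human approval step.
AUTO_APPROVE_UP_TO = RiskTier.INTERNAL_WRITE


def requires_human_approval(action_tier: RiskTier) -> bool:
    """Policy gate: anything above the auto-approve tier goes to a human."""
    return action_tier > AUTO_APPROVE_UP_TO
```

The value of encoding the gate is that “what requires human approval” becomes a single auditable line of policy rather than a judgment each team makes ad hoc.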

Measurement tied to the P&L

Individual fluency means knowing when an output is good. Organizational fluency means tying every AI initiative to a workflow KPI the CFO recognizes—handle time, throughput, resolution rate. Not “engagement” or “adoption.” The Outcome Partnership ties our success to these metrics. When you win, we win.

An operating model that compounds learning

Individual fluency means getting better at using AI over time. Organizational fluency means each use case makes the next one faster and cheaper—the platform matures, the evaluation datasets grow, the governance processes refine, and the organizational trust deepens. This is the architecture behind the Modern AI Application Stack—thirteen layers that turn individual interactions into institutional capability.

The fluency gap is the execution gap

Anthropic’s data shows that 85.7% of conversations already involve iteration, and that the 70% of users who never set collaboration terms need only structure—explicit terms—to get dramatically better results. The technology isn’t the bottleneck. The organizational infrastructure is.

That’s the Execution Gap. Same problem, different lens.

The 30% of users who set collaboration terms with AI outperform the 70% who don’t. Now imagine what happens when your entire organization operates that way—with structured feedback loops, production governance, P&L-aligned measurement, and an operating model that turns every interaction into institutional learning.

That’s the difference between a company that uses AI and a company that’s fluent in it. And right now, that difference is a moat.

Most organizations are stuck at individual fluency—talented people using AI well in isolation, without the organizational infrastructure to compound it. The AI Readiness Assessment identifies where the gap is. Start the conversation to close it.

This is part of our ongoing series on closing the AI Execution Gap for enterprise leaders. For the structural problem, see The AI Execution Gap. For the five failure modes, see Why AI Projects Fail. For why collaboration beats delegation in partner selection, see Build vs. Buy vs. Partner.

Ready to Close the Execution Gap?

Take the next step from insight to action.

No sales pitches. No buzzwords. Just a straightforward discussion about your challenges.