
Anthropic studied 1 million AI conversations. One finding changes how I think about my team.

The experienced users have a 10% higher success rate. And the gap is widening, not closing.

Anthropic’s latest Economic Index report came out a few weeks ago and I almost skipped past it. Glad I did not.

There is one part in it that was directly relevant to me and my team. They tracked users across different tenure levels. People who have been using Claude for 6+ months vs newer users. Same tasks. Same models. Same controls for country, language, everything.

The experienced users have a 10% higher success rate. And when they controlled for task type, the gap was still there: 3-4 percentage points even when comparing people doing the exact same work.

The gap is widening. Not closing.

Why this hit different

This hit different for me because I have been watching the same pattern play out on my own team. We rolled out AI tools, tracked adoption, hit 80%+ usage. Felt like mission accomplished.

But then you actually watch people work. And the variance is massive. Some engineers accept the first output. Some push back, iterate, throw away bad output and start fresh. Same tools, wildly different results.

Anthropic’s data puts a name on what I was seeing. It is not about adoption. It is about learning curves. The experienced users are not writing better prompts. They are exercising better judgment. They know when AI is confidently wrong. They know when to trust it and when to override it.

Level 2 metrics

Most teams have done a great job measuring level 1. Adoption. Usage rates. Time saved. That is the foundation and it matters.

But I think there is a level 2 that almost nobody is tracking yet:

  • How often does AI-generated code get reverted after merge?
  • How many production bugs trace back to unreviewed AI output?
  • What is the accept vs reject ratio on AI suggestions across seniority levels?

That last one is the most interesting to me. If your senior engineers are rejecting AI output at a significantly higher rate than your juniors, that is not a problem. That is the signal. It means they have the judgment to catch what AI gets wrong.
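If you want to start tracking that ratio, here is a minimal sketch of the computation, assuming you can export each AI suggestion as an event with an accepted flag and the engineer's seniority band. The event schema and field names are hypothetical, not from any particular tool's telemetry; the point is how little structure you need to get a per-seniority accept rate.

```python
from collections import defaultdict

# Hypothetical suggestion log: one record per AI suggestion, with the
# engineer's seniority band and whether the suggestion was accepted.
events = [
    {"seniority": "junior", "accepted": True},
    {"seniority": "junior", "accepted": True},
    {"seniority": "senior", "accepted": False},
    {"seniority": "senior", "accepted": True},
    # ... in practice, exported from your assistant's usage logs
]

def accept_ratio_by_seniority(events):
    """Return {seniority: (accepted, total, accept_rate)}."""
    counts = defaultdict(lambda: [0, 0])  # seniority -> [accepted, total]
    for e in events:
        counts[e["seniority"]][1] += 1
        if e["accepted"]:
            counts[e["seniority"]][0] += 1
    return {
        level: (acc, total, acc / total)
        for level, (acc, total) in counts.items()
    }

for level, (acc, total, rate) in accept_ratio_by_seniority(events).items():
    print(f"{level}: {acc}/{total} accepted ({rate:.0%})")
```

The hard part is not the arithmetic, it is getting the events logged at all. But once you have them, a gap between the junior and senior accept rates is exactly the judgment signal described above.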

Adoption was the right first step. Judgment is the natural next one. And the teams that figure out how to measure it are going to pull ahead fast.