One of the most common conversations I have with CX leaders at customer events goes something like this: they've deployed an AI agent, their containment rate is climbing, leadership is happy.
Then I ask what their resolution rate looks like.
Containment rate tells you how many conversations ended without a human, but it says nothing about whether the customer got what they came for. Most AI customer service programs are being evaluated—and optimized—on exactly the wrong outcome.
The result, unfortunately, is that organizations believe their AI customer service programs are working better than they are.
eSky Group, a leading online travel agency managing three global brands across 50+ markets, ran into this exact problem and chose to solve it differently. Within four months of refocusing their measurement framework around automated resolution, their resolution rate increased 17 points, contributing to a 200% ROI.
Here’s the gap they uncovered and the measurement shift that made those results possible.
The customer service metric everyone trusts, but shouldn’t
Containment rate entered the industry lexicon as a reasonable proxy for AI efficiency. If an AI agent handled a conversation from start to finish, that counted as a win.
No escalation, no human agent time consumed.
The problem is that "no escalation" and "resolved" are not the same thing. An AI customer service agent that loops, deflects, or closes a conversation without solving the customer's problem can still register as contained.
From the dashboard, everything looks fine. Businesses are optimizing for what's easy to measure, but customers are judging brands on whether their problems get solved.
The gap isn't hypothetical. It’s why only 24% of consumers report achieving resolution without a human. The other 76%? They escalate, struggle, or give up, even as AI investment continues to climb.
At the same time, 55% of businesses are measuring AI and human agent interactions together, making it structurally impossible to know how their AI is actually performing.
Agentic CX in 2026: What consumers expect and most enterprises miss
There’s a common assumption that consumers are skeptical of AI in customer service. The data says otherwise. Our 2026 report surveyed 2,000 consumers to understand how people actually experience AI in customer service today.
Read report

Teams are left optimizing for a number that doesn’t reflect real customer outcomes, without the visibility to know the difference. That’s the problem. The next question is: what should they be measuring instead?
How should enterprises evaluate AI’s impact on customer experience?
To understand why eSky’s approach worked, it’s worth starting with how high-performing teams actually measure AI customer service.
Successful measurement of AI customer service starts with a reframe. Automated resolution rate—measured as the percentage of customer inquiries fully resolved by AI without human intervention—is the metric that closes the loop.
Unlike containment, it captures what actually happens inside those conversations. High containment with low resolution means your AI is deflecting, not helping. High resolution means customers are getting what they came for.
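To make the gap concrete, here is a minimal Python sketch. The schema is invented for illustration, just two flags per conversation (escalated and resolved), not any real platform's data model. The same ten toy conversations score 80% on containment but only 50% on automated resolution:

```python
from dataclasses import dataclass

@dataclass
class Conversation:
    escalated: bool  # a human agent joined the conversation
    resolved: bool   # the customer's problem was actually solved

def containment_rate(convos: list[Conversation]) -> float:
    """Share of conversations that never reached a human."""
    return sum(not c.escalated for c in convos) / len(convos)

def automated_resolution_rate(convos: list[Conversation]) -> float:
    """Share of conversations fully resolved by the AI, no human involved."""
    return sum(not c.escalated and c.resolved for c in convos) / len(convos)

# Toy log: 8 of 10 conversations stay with the AI, but only 5 get solved.
log = (
    [Conversation(escalated=False, resolved=True)] * 5
    + [Conversation(escalated=False, resolved=False)] * 3
    + [Conversation(escalated=True, resolved=True)] * 2
)

print(f"Containment rate:          {containment_rate(log):.0%}")  # 80%
print(f"Automated resolution rate: {automated_resolution_rate(log):.0%}")  # 50%
```

A dashboard showing only the first number would call this program a success; the second number shows three customers walked away unhelped.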
But measuring it well requires more than a single number. The strongest AI customer service programs maintain three separate measurement streams:
- AI-only interactions: Conversations handled entirely by an AI agent, from first message to resolution, with no human involvement.
- Human-only interactions: Conversations managed entirely by a human agent, giving you a clean baseline to measure AI performance against.
- Hybrid handoff conversations: Conversations that start with AI then escalate to a human agent, where the quality of the handoff itself is part of the experience.
When CSAT, first contact resolution, and automated resolution rate are tracked separately across each stream, teams can credibly answer questions that blended metrics never surface:
- Is AI performing better or worse than human agents on comparable use cases?
- Where are customers more satisfied after an AI-only interaction versus a human one?
- Which categories of requests are genuinely beyond the AI's current capability?
Without that separation, teams are flying blind.
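Concretely, the separation might look something like this sketch, with invented records and field names standing in for whatever a real platform exposes:

```python
from statistics import mean

# Hypothetical records; field names are illustrative, not a real schema.
conversations = [
    {"ai_involved": True,  "human_involved": False, "resolved": True,  "csat": 4.6},
    {"ai_involved": True,  "human_involved": True,  "resolved": True,  "csat": 4.1},
    {"ai_involved": False, "human_involved": True,  "resolved": True,  "csat": 4.4},
    {"ai_involved": True,  "human_involved": False, "resolved": False, "csat": 3.2},
]

def stream(c: dict) -> str:
    """Bucket a conversation into one of the three measurement streams."""
    if c["ai_involved"] and not c["human_involved"]:
        return "ai_only"
    if c["human_involved"] and not c["ai_involved"]:
        return "human_only"
    return "hybrid"

# Score each stream on its own; never blend them into one number.
for s in ("ai_only", "human_only", "hybrid"):
    group = [c for c in conversations if stream(c) == s]
    if group:
        resolution = sum(c["resolved"] for c in group) / len(group)
        csat = mean(c["csat"] for c in group)
        print(f"{s:10s} n={len(group)} resolution={resolution:.0%} csat={csat:.2f}")
```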
How eSky redefined AI customer service performance
eSky didn't just swap one metric for another. They rebuilt their entire performance framework around what the customer experiences, not what's convenient to count.

Beyond automated resolution, Lukáš and his team at eSky track:
- Post-support customer feedback, segmented by AI-only, human-only, and AI-human hybrid interactions
- Which use cases the AI agent could and couldn't resolve
- CSAT across each of those categories, tracked separately
When a team can see that AI-only CSAT is tracking significantly below human-only CSAT for a specific category, they know exactly where to focus. When they see the inverse—AI outperforming humans on high-volume, transactional queries—they have a business case for expanding coverage in that area.
Neither insight is visible in a blended number. "This level of granularity is exactly what's needed to know which investments are worth the time," Lukáš says.
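As a tiny illustration of that comparison, here is a sketch with made-up categories and scores (refunds, rebooking, and every CSAT value are invented) that flags where AI lags humans and where it leads:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (use-case category, measurement stream, CSAT score).
records = [
    ("refunds",   "ai_only",    3.1), ("refunds",   "human_only", 4.5),
    ("rebooking", "ai_only",    4.6), ("rebooking", "human_only", 4.2),
]

scores = defaultdict(list)
for category, stream, csat in records:
    scores[(category, stream)].append(csat)

for category in sorted({cat for cat, _ in scores}):
    ai = mean(scores[(category, "ai_only")])
    human = mean(scores[(category, "human_only")])
    verdict = "focus improvement here" if ai < human else "case for expanding AI coverage"
    print(f"{category:10s} ai_csat={ai:.1f} human_csat={human:.1f} -> {verdict}")
```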
This approach also transforms executive conversations. Instead of presenting top-line performance numbers, Lukáš walks into board reviews with a breakdown that shows precisely where AI is delivering, where there's more to gain, and what it would take to get there.
That's a more productive meeting and a stronger foundation for growing the program's mandate.
CX leaders guide: Understanding AI agent impact on company objectives
Hard work shouldn’t go unnoticed. This guide helps you clearly demonstrate ACX’s impact—so you can drive bigger investments, gain strategic influence, and position customer service as a key player in company growth.
Get the guide

The AI-to-human handoff is part of the experience, too
Containment rate has another blind spot: it treats every escalation as a failure. That framing misses something important.
For eSky, improving customer experience begins the moment a conversation starts: "By the time they get connected, the experience is already shaped," Lukáš says.
When an AI customer service agent clarifies intent, gathers the right details, and structures a request before escalating, the human agent starts from a stronger position. They're not rebuilding context from scratch or asking questions the customer already answered.
Average handle time drops. The interaction quality improves. And the outcome is better whether the AI resolves the issue end-to-end or hands it off partway through.
A measurement framework that tracks automated resolution separately from containment captures this nuance. One that treats all non-contained conversations as losses doesn't.
That visibility is what allows teams to improve both resolution and handoff quality—intentionally, not incidentally.

From measurement to measurable gains
When eSky committed to adopting Ada's Playbooks—a structured approach to guiding AI agent behavior across specific customer scenarios—the results appeared within a week.
Because they could clearly see where resolution was breaking down, the team knew exactly where to apply Playbooks and where it would have the most impact.
Automated resolution jumped 10 percentage points. CSAT for AI agent interactions increased 19 points in the same window. Those gains compounded into a 17-point overall improvement in automated resolution rate over four months, alongside a 200% ROI.
These aren't incremental numbers. They're the kind of results that shift budget conversations, validate the program at the executive level, and create the mandate to expand—to new channels, new brands, and eventually, new capabilities like voice.

The question every AI customer service program will face
Every team deploying AI for customer service will eventually face the same decision: are we measuring whether conversations ended, or whether customers were helped?
That decision determines how your program improves, where you invest, and whether your AI actually delivers value to customers.
Containment rate is easy to move. Automated resolution at scale—with strong, segmented CSAT across AI-only, human-only, and hybrid interactions—is what agentic customer experience actually looks like in practice, and what separates programs that earn a growing mandate from those that plateau.
The measurement shift is worth making. The eSky story is proof that when you get it right, the results follow, and they compound.
How eSky scaled AI customer service across brands, channels, and markets
When travelers are stranded at the airport, they don't want scripted responses. They want answers. See how eSky built AI that actually resolves issues, delivering a 17-point jump in automated resolution and 200% ROI.
See case study