Getting to a ‘thank you’ with an AI agent
Creating a natural and helpful conversation with an AI agent requires more than just numbers.
The world is full of metrics
Every launch of a next-generation foundation model is accompanied by a plethora of metrics: how well the new LLM does, compared to its peers, on a series of tasks that, together, embody a notion of ‘performance.’ Aside from direct comparisons (on leaderboards like LMSys), this is the best way that researchers currently have to calibrate how one LLM should rank compared to another.
The customer support automation arena is not dissimilar. Each automation service is often described in terms of volume, deflection, resolution, and the average customer satisfaction scores it achieves. Together, these capture support automation ‘performance’ from a business-oriented perspective, and outside of a direct comparison at least enable a high-level distinction of sorts across multiple vendors.
Each support conversation is a unique moment
High-level metrics will always hide the low-level, case-by-case experiences. One facet that remains elusive in how we measure things—either at the LLM or support automation level—is a concept of “human level” interaction. That is, given two completions, bots, or agents that spit out some text, both of which are technically correct or even semantically similar, how can we tell that one is a more natural, fluent, and seamless response than the other? Perhaps both of them would have led to a deflected customer, but (in light of how important customer support is to a company’s brand) how can we make it a better experience?
At Gradient Labs, we’ve been thinking deeply about these questions, less so from the perspective of creating new numbers but more from the point of view of shaping a better experience for our design partner’s customers. Primarily, we do so by reading a huge number of customer support chats—both between customers and human agents, and when customers talk to our AI agent. Our ongoing discussions about these currently bring us to three insights, which we describe below.
Today’s chatbots often make customers act robotically
A wide range of customer support automation could be described as “best effort.” These systems will answer every single customer query by throwing some kind of information at the customer and forcing them to wade through it. They might have workflows that customers must follow to the letter in order to get anywhere, or screw up many times before an escape hatch is offered. Some even encode what seems like a level of keyword matching (“you must type ‘talk to a human’ to be transferred”).
Interestingly, customer replies in these settings strike us as very robotic. “Yes.” “No.” “Talk to a human.” It’s as if the customers are bending their communication style to speak in the limited language understood by the bot. This can work for simple cases, but as soon as any nuance appears—or even if the customer wants a clarification—these bots tend to fall over. Ultimately, having customers talk in a specific, unnatural way seems to be a symptom of people trying to find their way around poor automation.
Having seen these in action, we expect that these systems will, by definition, have a high response rate, and will likely have a high deflection rate too—many customers don’t figure out how to speak robotically enough to resolve their own problem, and so they give up.
Tools to speed up support staff make human agents sound robotic
A method that we use frequently at Gradient Labs is to compare AI and human replies, side by side. Our initial assumption was that the human response should be considered a “gold standard” for what the AI could achieve. Ironically, however, we found that (in larger organisations) replies from human agents do not always result in a fluid, natural conversation either.
One reason for this is that support staff performance is typically evaluated in ways that implicitly encourage transactional behaviour. If Mike and Alice must take on X support chats per hour, the easiest way to achieve their targets is to reach for the best-matching canned response in each turn. This can work in simpler cases—where pre-written responses are good enough—but as soon as there is any nuance or ambiguity in customers’ intents, things go astray.
At its extreme, we know of cases where human agents have been accused by customers of being bots, when in practice nothing more than canned responses were being sent.
Getting to a “thank you”
There are a lot of qualities that we attend to when making our AI agent more natural-sounding. This goes beyond the standard expectations of AI agents, like making sure that informational replies come from documented sources. It includes more qualitative aspects: replies shouldn’t be unnecessarily verbose, shouldn’t ask too many questions at once, shouldn’t ask about things that have already been said (but sometimes should ask for confirmation about them), should only be apologetic when it’s relevant, and many more. And one way that we see our AI agent get it right is when customers go so far as saying “thank you” for the help they have received.
Importantly, this is not about obfuscating that the customer is talking to an AI agent or masquerading it as a human—this is about making the AI agent so easy to talk with that it feels natural to thank it. It’s somewhat tricky to describe without speaking about the details, but we’ll close with two anecdotes of this in action.
Our AI agent had a chat with a customer about a fairly complex topic—understanding which statement month a particular transaction should appear on. While diagnosing the issue, the customer mentioned that it was their first month with the service, in the context of not having a billing statement history. Our AI agent’s next reply acknowledged that by welcoming them to the company as part of its answer to the customer’s actual question, and then worked with them to a resolution. And it ended in a thank you:
In a separate instance, a customer chatted to our AI agent about a delayed payment. The AI agent clarified details about the payment and then not only used relevant sources to create a reply, but intermingled the context given previously in the conversation with the right knowledge to give the customer a tailored response about when that payment should be visible. It did not reach for a canned-style response (“all payments take 1-2 days”) or just say “Thursday” without explanation; it said something akin to “our payments typically take 1-2 days to clear, which means Thursday in your case.” It invited the customer to reach back out if that didn’t come to fruition. And, again, it ended with a thank you:
Meeting metrics with amazing experiences
The discussion above does not mean that we’re doing away with metrics—after all, we’re a technical team and need to summarise our overall progress in a way we can share 📈. But every support conversation is a unique moment for a customer; their reach-out is much more than a number to them. Creating a fluid, helpful, and easy conversation with an AI agent needs much more than numbers, and making it feel natural to thank the AI is one way that we’re seeing it come to fruition.