Going beyond RAG for customer support conversations
Chasing our ambition for superhuman quality support
In a recent post, we discussed how Retrieval Augmented Generation (RAG) is just one piece of the puzzle in building customer support AI agents. Today, we'll dive a bit deeper into these issues. At Gradient Labs, we want to empower our customers to deliver superhuman AI support experiences to their end users.
The key insight? Real-world support conversations are a lot messier than what RAG can usefully handle.
Human support agents seek to understand intent
RAG agents rely on semantic similarity search to ground their answers in truth. While this helps to avoid hallucinations, they tend to produce only "best effort" replies whenever enough semantically similar content is found. In effect, much of RAG is tuned for a single question and reply rather than a long-form conversation.
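To make that concrete, here is a minimal sketch of what such a single-turn RAG loop typically looks like. `embed` and `generate` are hypothetical stand-ins for an embedding model and an LLM call, not our actual implementation.

```python
# A minimal sketch of the single-turn RAG loop described above.
# `embed` and `generate` are hypothetical stand-ins for an embedding
# model and an LLM call; any real agent would plug in its own providers.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rag_reply(question: str, documents: list[str], embed, generate, k: int = 3) -> str:
    # Rank documents purely by semantic similarity to the question.
    q_vec = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(embed(d), q_vec), reverse=True)
    context = "\n\n".join(ranked[:k])
    # One question in, one grounded answer out -- no notion of the wider
    # conversation, the customer's real intent, or what is still unknown.
    return generate(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```

Notice that nothing in this loop ever decides that the question is under-specified; if enough similar content exists, it answers.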
Human agents, however, are expected to understand the true, specific intent behind a customer's query. They have a non-fuzzy world model of the company's products, processes, and related user activities, which allows them to match a customer's question to that world model in a more structured and precise way. For instance, if a customer says, "Why can't I pay?", it's easy for a human to reason about what's missing in that query and to instinctively ask for clarification about the missing, implied information in order to help effectively.
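As a rough illustration of intent-first handling, the sketch below checks whether a query maps to a known intent and whether the details that intent needs are present before answering. The intents and required fields are invented for the example.

```python
# A hedged illustration of intent-first handling: before answering, check
# whether the query maps to a known intent and whether the details that
# intent needs are present. The intents and fields here are illustrative.
REQUIRED_DETAILS = {
    "payment_failed": ["payment_method", "error_shown", "merchant_or_recipient"],
    "card_not_working": ["card_type", "where_it_failed"],
}

def next_step(intent: str, known_details: dict) -> dict:
    missing = [d for d in REQUIRED_DETAILS.get(intent, []) if d not in known_details]
    if missing:
        # Mirror what a human agent does instinctively: ask for the
        # missing, implied information instead of guessing an answer.
        return {"action": "clarify", "ask_about": missing}
    return {"action": "answer", "using": known_details}

# "Why can't I pay?" resolves to payment_failed with no details yet,
# so the agent asks a clarifying question rather than answering.
print(next_step("payment_failed", {}))
```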
Taking customer queries at face value is inaccurate
Customers can make statements that are technically inaccurate, like saying, “My card is broken,” when they mean a transaction was declined. Or they might say, “Somebody took my money,” when they simply forgot about a prior transaction. In extreme cases, customers might try to be deceitful, particularly in fraud scenarios.
RAG approaches take these statements at face value, often resulting in irrelevant or harmful responses.
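One way to go beyond face value, sketched below under purely illustrative assumptions, is to treat the customer's words as a symptom, enumerate plausible underlying issues, and check each against account data before retrieving anything. The symptom map and the `account` fields are made up for the example, not a real schema.

```python
# Treat the customer's statement as a symptom rather than a fact:
# enumerate plausible underlying issues and verify them against account
# data before retrieving knowledge. Symptom map and fields are illustrative.
SYMPTOM_TO_HYPOTHESES = {
    "my card is broken": ["declined_transaction", "expired_card", "frozen_card"],
    "somebody took my money": ["forgotten_transaction", "duplicate_charge", "fraud"],
}

def plausible_issues(statement: str, account: dict) -> list[str]:
    hypotheses = SYMPTOM_TO_HYPOTHESES.get(statement.lower().strip(". "), [])
    confirmed = []
    for h in hypotheses:
        if h == "declined_transaction" and account.get("recent_declines"):
            confirmed.append(h)
        elif h == "forgotten_transaction" and account.get("recent_transactions"):
            confirmed.append(h)
        elif h == "fraud" and account.get("unrecognised_devices"):
            confirmed.append(h)
    # If nothing is confirmed by the data, fall back to asking about all of them.
    return confirmed or hypotheses
```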
Implicit knowledge from human agents’ experience is invaluable
Even the best-maintained knowledge bases have gaps and can become outdated—permanently, when a product changes, or temporarily, when a marketing campaign is being run for a day.
Human agents rely heavily on their shared experience from handling numerous cases, from their training, and from company-wide announcements that may not be documented. Experienced human agents can quickly deduce common patterns, like symptom X usually leading to outcome Y, even if X can theoretically arise from other causes. Standard RAG agents lack this practical shortcutting ability and tend to perform poorly in troubleshooting scenarios.
While the underlying documents could be updated with a lot of effort to include such information, there are also other approaches, as outlined in a research paper from Google on medical diagnosis. We’ve found that AI agents can extract a lot of value from reading historical conversations and building their own facts. One needs to be particularly careful with this approach, however, in order not to extract wrong or irrelevant information.
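As a rough sketch of that "build your own facts" idea, and assuming an LLM call (here the placeholder `extract_candidate_facts`) does the extraction, one can keep only facts that recur across many resolved conversations, which reduces the risk of learning wrong or one-off information.

```python
# Extract candidate facts from resolved historical conversations, then keep
# only the ones that recur across many independent cases. The extraction
# function is a stand-in for an LLM call; thresholds are illustrative.
from collections import Counter

def build_fact_store(resolved_conversations: list[dict],
                     extract_candidate_facts,
                     min_support: int = 5) -> list[str]:
    counts = Counter()
    for convo in resolved_conversations:
        # Only learn from conversations a human agent resolved successfully.
        if convo.get("outcome") != "resolved":
            continue
        for fact in extract_candidate_facts(convo["transcript"]):
            counts[fact] += 1
    # Rare extractions are more likely to be wrong, outdated, or specific
    # to a single customer, so require a minimum level of support.
    return [fact for fact, n in counts.items() if n >= min_support]
```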
Complex scenarios
Customers often describe complex situations with partially irrelevant details where human agents first identify “the crux of the issue” before responding. For example: “I was travelling abroad last week and paid my hotel with my card. They charged £1000 and said £300 would be refunded. I had an additional restaurant bill of £100, but they refunded only £150.”
We have seen first-hand that standard RAG approaches do not produce helpful replies in such situations, and can cause confusion and frustration. While the retrieved documents might provide the relevant context on what to expect with hotel card reservations, RAG agents will not apply appropriate reasoning techniques by default. To solve such cases, they need to recognise them explicitly and switch to a different “reasoning mode” internally. To a human agent, on the other hand, it is immediately obvious that the customer was expecting a refund of £200 (the £300 the hotel promised, less the £100 restaurant bill) but has received only £150.
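For completeness, here is the arithmetic a human applies instantly in this example, written out step by step; the figures simply mirror the scenario above.

```python
# The hotel example worked out explicitly: separate what was promised from
# what is genuinely owed, compute the net expected refund, and compare it
# with what actually arrived.
promised_refund = 300    # hotel said £300 of the £1000 charge would come back
restaurant_bill = 100    # extra charge the customer genuinely owes
received_refund = 150    # what actually landed on the card

expected_refund = promised_refund - restaurant_bill   # £200
shortfall = expected_refund - received_refund         # £50 still missing

print(f"Expected £{expected_refund}, received £{received_refund}, short by £{shortfall}")
```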
Putting it all together
Real-world customer support involves multi-turn conversations, unlike the single-turn question-answer pairs typical of RAG. These conversations progress through multiple phases. They usually begin by understanding the “real” customer intent, which might involve asking clarifying questions to grasp the situation correctly before providing an answer. Even after providing an answer, the conversation often continues, with customers seeking further clarification or adding more details. The multi-turn agent must navigate these phases fluently to deliver a magical support experience, akin to the best human agents.
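One simplified way to picture those phases is as a small state machine; the states and transition rules below are illustrative rather than a description of how our agent actually works.

```python
# An illustrative state machine for the conversation phases described above.
# Real conversations loop back and forth between these states.
from enum import Enum, auto

class Phase(Enum):
    UNDERSTAND_INTENT = auto()
    CLARIFY = auto()
    ANSWER = auto()
    FOLLOW_UP = auto()
    RESOLVED = auto()

def next_phase(phase: Phase, intent_clear: bool, customer_replied: bool, satisfied: bool) -> Phase:
    if phase == Phase.UNDERSTAND_INTENT:
        return Phase.ANSWER if intent_clear else Phase.CLARIFY
    if phase == Phase.CLARIFY:
        return Phase.UNDERSTAND_INTENT if customer_replied else Phase.CLARIFY
    if phase == Phase.ANSWER:
        return Phase.RESOLVED if satisfied else Phase.FOLLOW_UP
    if phase == Phase.FOLLOW_UP:
        # Follow-ups often reveal new details, so loop back to understanding.
        return Phase.UNDERSTAND_INTENT
    return Phase.RESOLVED
```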
Until we address these challenges, AI-driven conversations will feel deceptively good but will be riddled with issues due to a lack of real context understanding.
We envision a future with superhuman quality support experiences. If you’re excited about that vision and the challenges ahead that we’ve outlined, we’d love to hear from you!