For call center applications, dialogue state tracking (DST) has historically served as a way to determine what a caller wants at a given point in a conversation. But in the real world, the work of a call center agent is much more complex than simply recognizing intents. Agents often have to look up knowledge base articles, review customer histories, and inspect account details all at the same time. Yet none of these aspects is accounted for in popular DST benchmarks. A more realistic environment might use a “dual constraint,” in which an agent needs to accommodate customer requests while considering company policies when taking actions.
In an effort to address this, AI research-driven customer experience company Asapp is releasing Action-Based Conversations Dataset (ABCD), a dataset designed to help develop task-oriented dialogue systems for customer service applications. ABCD contains more than 10,000 human-to-human labeled dialogues with 55 intents requiring sequences of actions constrained by company policies to accomplish tasks.
According to Asapp, ABCD differs from other datasets in that it asks call center agents to adhere to a set of policies. With the dataset, the company proposes two new tasks:
- Action State Tracking (AST), which keeps track of dialogue state when an action has taken place during that turn.
- Cascading Dialogue Success (CDS), a measure of an AI system’s ability to understand actions in the context of the conversation as a whole, including other utterances.
AST ostensibly improves upon DST metrics by detecting intents from customer utterances while taking agent guidelines into account. For example, if a customer who is entitled to a discount requests 30% off but the guidelines stipulate 15%, then 30% would be an apparently reasonable, but ultimately flawed, choice. To measure a system’s ability to handle these situations, AST adopts overall accuracy as an evaluation metric.
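The overall-accuracy idea can be sketched in a few lines of Python. This is a minimal illustration, not Asapp’s evaluation code: it assumes each action turn is represented as an action name plus a tuple of slot values, and that a turn counts as correct only when both match the gold label exactly. The `Action` class, the action names, and the values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One agent action (hypothetical encoding): a name plus its slot values."""
    name: str
    values: tuple[str, ...] = ()


def ast_overall_accuracy(predicted: list[Action], gold: list[Action]) -> float:
    """Fraction of action turns where the name AND every slot value match gold.

    Assumption: exact match on the whole action is required for credit.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned turn by turn")
    if not gold:
        return 0.0
    # Dataclass equality compares both fields, so p == g means a full match.
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Hypothetical dialogue: policy caps the discount at 15%.
gold = [
    Action("verify-identity", ("jane doe",)),
    Action("offer-discount", ("15",)),
]
predicted = [
    Action("verify-identity", ("jane doe",)),
    Action("offer-discount", ("30",)),  # echoes the customer's 30% request
]
print(ast_overall_accuracy(predicted, gold))  # 0.5
```

Under this scoring, echoing the customer’s 30% request earns no credit for that turn, even though the action name itself is right.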
Meanwhile, CDS aims to gauge a system’s skill at understanding actions in context. Whereas AST assumes an action occurs in the current turn, CDS first predicts the type of turn (an utterance, an action, or an ending) and then its details. When the turn is an utterance, the detail is to respond with the best sentence chosen from a list of possible sentences. When the turn is an action, the detail is to choose the appropriate values. And when the turn is an ending, the system should know to end the conversation, according to Asapp.
A CDS score is calculated on every turn, and the system is evaluated based on the percent of remaining steps correctly predicted, averaged across all available turns.
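The cascading evaluation can be sketched as follows. This is a hedged illustration built on assumptions that Asapp’s description does not spell out: each turn is predicted with the true conversation history as context, a turn is correct only if its type matches and (for utterances and actions) its detail matches, and credit for the remaining steps stops at the first mistake. The `Turn` encoding and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """A turn (hypothetical encoding): its type plus the detail for that type."""
    kind: str             # "utterance", "action", or "end"
    detail: object = None  # candidate-sentence index, slot values, or None


def turn_correct(pred: Turn, gold: Turn) -> bool:
    # Step 1: the predicted turn type must match.
    if pred.kind != gold.kind:
        return False
    # Step 2: endings carry no detail; otherwise the detail must also match.
    return gold.kind == "end" or pred.detail == gold.detail


def cds_score(preds: list[Turn], golds: list[Turn]) -> float:
    """Average, over every starting turn, of the share of remaining steps credited."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("need non-empty, turn-aligned predictions and golds")
    correct = [turn_correct(p, g) for p, g in zip(preds, golds)]
    n = len(golds)
    total = 0.0
    for start in range(n):
        streak = 0
        for ok in correct[start:]:
            if not ok:
                break  # assumption: the cascade stops crediting at the first error
            streak += 1
        total += streak / (n - start)  # share of remaining steps credited
    return total / n


golds = [Turn("utterance", 2), Turn("action", ("15",)),
         Turn("utterance", 0), Turn("end")]
preds = [Turn("utterance", 2), Turn("action", ("30",)),
         Turn("utterance", 0), Turn("end")]
print(cds_score(preds, golds))  # 0.5625
```

Here the single wrong discount value costs credit not only on its own turn but also for every starting point before it, which is what makes the score "cascading."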
Improving customer experiences
The ubiquity of smartphones and messaging apps — and the constraints of the pandemic — have contributed to increased adoption of conversational technologies. Fifty-six percent of companies told Accenture in a survey that conversational bots and other experiences are driving disruption in their industry. And a Twilio study showed that 9 out of 10 consumers would like the option to use messaging to contact a business.
Even before the pandemic, autonomous agents were on the way to becoming the rule rather than the exception, partly because consumers prefer it that way. According to research published last year by Vonage subsidiary NewVoiceMedia, 25% of people prefer to have their queries handled by a chatbot or other self-service alternative. And Salesforce says roughly 69% of consumers choose chatbots for quick communication with brands.
Unlike other large open-domain dialogue datasets, which are typically built for more general chatbot entertainment purposes, ABCD focuses on increasing the count and diversity of actions and text within the domain of customer service. Contributors who played call center agents were incentivized with cash bonuses, an approach intended to mimic real service environments and realistic agent behavior, according to Asapp.
Rather than simply expanding the range of knowledge base lookup actions, as other datasets do, ABCD presents a corpus for building more in-depth, task-oriented dialogue systems, Asapp says. The company expects that the dataset and new tasks will create opportunities for researchers to explore better and more reliable models for task-oriented dialogue systems.
“For customer service and call center applications, it is time for both the research community and industry to do better. Models relying on DST as a measure of success have little indication of performance in real-world scenarios, and discerning customer experience leaders should look to other indicators grounded in the conditions that actual call center agents face,” the company wrote in a press release. “We can’t wait to see what the community creates from this dataset. Our contribution to the field with this dataset is another major step to improving machine learning models in customer service.”