Generative Large Language Models (LLMs) have changed many business operations, from writing blog articles to providing tailored customer support. Fine-tuned on focused datasets, these models open up new ways of addressing real-world problems.
One use case worth exploring is improving intent recognition in modern chatbots, an essential language-understanding component. Doing this well requires a careful understanding of the individual application scenario and the specific target audience.
Understanding the “Intent Recognition” Process
Building a reliable intent recognition capability is an iterative process: the system is evaluated in practice, and the training data is continually refined to keep up with changes in how users phrase their requests.
The success of intent recognition depends heavily on high-quality user data that reflects real intentions, so that the LLM learns to map language patterns to specific intent labels.
In particular, the training dataset must reflect the language and tone of the target audience and must also contain many different ways of expressing each user intent.
Innovative Approach at Bitext
At Bitext, we have applied our proprietary text generation technology to produce high-quality data for fine-tuning LLMs for intent detection. We used the Bitext Customer Service Open Dataset, freely available on our GitHub account, to fine-tune OpenAI GPT-3 for this task.
Here’s an illustrative example:
Note: This example is fictional, included for demonstration purposes, and will need to be replaced with a real example.
Suppose a chatbot defines a ‘food delivery’ intent. When users send phrases like “I want to order food”, “I am hungry, let’s order something” or “Let’s get pizza delivered”, the fine-tuned model correctly assigns each of them to the ‘food delivery’ intent.
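As a rough sketch of how such utterance and intent pairs could be turned into fine-tuning data, the Python below writes them as prompt/completion records in JSON Lines, the general shape used for completion-style fine-tuning. The separator string, the `food_delivery` label spelling, and the function names are assumptions made for illustration; the post does not state the formatting Bitext used.

```python
import json

# Hypothetical separator marking the end of the prompt; not taken from the post.
PROMPT_SEPARATOR = "\n\n###\n\n"


def to_finetune_record(utterance: str, intent: str) -> dict:
    """Build one prompt/completion pair for a classification-style fine-tune."""
    return {
        "prompt": utterance.strip() + PROMPT_SEPARATOR,
        # Leading space before the label: a common convention for
        # completion-style models, assumed here.
        "completion": " " + intent.strip(),
    }


def to_jsonl(pairs) -> str:
    """Serialize (utterance, intent) pairs as one JSON object per line."""
    return "\n".join(json.dumps(to_finetune_record(u, i)) for u, i in pairs)


# The fictional utterances from the example above, with an assumed label name.
examples = [
    ("I want to order food", "food_delivery"),
    ("I am hungry, let's order something", "food_delivery"),
    ("Let's get pizza delivered", "food_delivery"),
]

print(to_jsonl(examples))
```

Each output line is an independent JSON object, so records for all intents can simply be concatenated into one training file.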
Our experiments show that well-curated language data can deliver a starting accuracy above 90%.
Data: The Cornerstone of Fine-tuning
The Bitext Customer Service Open Dataset contains 27 user intents, ranging from simple requests like “place an order” to more involved interactions such as “get a refund” and “reach out to customer service”. The dataset includes both clearly distinct intents and closely related ones, which makes the classification task harder and, in turn, leads to a more robust system.
To examine how data size affects model performance, we generated subsets of the data with varying numbers of utterances per intent, from 1 to 5000.
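The subset generation described above can be sketched as per-intent sampling. This is a minimal illustration, not the procedure Bitext used: the function name, the fixed seed, the toy rows, and the intermediate sizes are all assumptions (the post only gives the range of 1 to 5000 utterances per intent).

```python
import random
from collections import defaultdict


def sample_per_intent(rows, n_per_intent, seed=0):
    """Return at most n_per_intent utterances for each intent.

    rows is an iterable of (utterance, intent) pairs. Sampling is without
    replacement and seeded so that runs are repeatable.
    """
    by_intent = defaultdict(list)
    for utterance, intent in rows:
        by_intent[intent].append(utterance)

    rng = random.Random(seed)
    subset = []
    for intent in sorted(by_intent):
        utterances = by_intent[intent]
        k = min(n_per_intent, len(utterances))  # small intents keep all rows
        subset.extend((u, intent) for u in rng.sample(utterances, k))
    return subset


# Toy data standing in for the real dataset: 2 intents, 10 utterances each.
rows = [
    (f"utterance {i} for {intent}", intent)
    for intent in ("place_order", "get_refund")
    for i in range(10)
]

# Hypothetical subset sizes, chosen only to fit the toy data.
sizes = [1, 5, 10]
subsets = {n: sample_per_intent(rows, n) for n in sizes}
for n, subset in subsets.items():
    print(n, len(subset))
```

Sampling per intent rather than from the dataset as a whole keeps every subset balanced across intents, so accuracy differences between subsets reflect data size rather than class imbalance.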
Experiments: Putting Our Approach to the Test
For our experiments, we used OpenAI’s ada model, chosen for its efficiency relative to other models. We fine-tuned several instances of ada, each on a different data subset and with specific epoch and batch-size settings.
We measured accuracy with a validation set of about 1000 examples and a test set of about 1100 examples.
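For reference, accuracy on a held-out set is simply the share of utterances whose predicted intent matches the expected one. The sketch below assumes exact string matching after trimming whitespace; the function name and the sample labels are illustrative and not taken from the experiments.

```python
def accuracy(predicted, expected):
    """Percentage of predictions that exactly match the expected intent."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p.strip() == e.strip() for p, e in zip(predicted, expected))
    return 100.0 * correct / len(expected)


# Illustrative labels only: three of the four predictions are correct.
expected = ["place_order", "get_refund", "get_refund", "contact_customer_service"]
predicted = ["place_order", "get_refund", "place_order", "contact_customer_service"]
print(accuracy(predicted, expected))
```

Trimming whitespace matters for completion-style models, whose raw output may carry a leading space before the label.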
A Unique Approach to Test Set Creation
We built the test set in two ways, manually and through GPT-3.5 generation, and reviewed it against strict criteria to eliminate irrelevant or hallucinated content.
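The post does not spell out the review criteria. Purely as an assumption, an automated pre-filter that runs before human review might drop empty candidates, candidates labeled with an unknown intent, and candidates that duplicate training utterances. All names below are hypothetical.

```python
def prefilter(candidates, known_intents, train_utterances):
    """Drop empty, unknown-intent, and duplicate test candidates.

    This is an assumed pre-filter, not the documented review procedure.
    Matching is case-insensitive on whitespace-trimmed text.
    """
    seen = {u.strip().lower() for u in train_utterances}
    kept = []
    for utterance, intent in candidates:
        key = utterance.strip().lower()
        if not key or intent not in known_intents or key in seen:
            continue
        seen.add(key)  # also removes duplicates among the candidates
        kept.append((utterance.strip(), intent))
    return kept


known_intents = {"place_order", "get_refund"}
train_utterances = ["I want to place an order"]
candidates = [
    ("i want to place an order", "place_order"),   # duplicate of training data
    ("How do I get my money back?", "get_refund"),
    ("How do I get my money back?", "get_refund"),  # repeated candidate
    ("Tell me a joke", "small_talk"),               # intent not in the label set
    ("   ", "get_refund"),                          # empty after trimming
]
print(prefilter(candidates, known_intents, train_utterances))
```

A filter like this only catches mechanical problems; judging whether a generated utterance really expresses its labeled intent still needs a human reviewer.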
Insights from the Results — A Snapshot
To show how training data size relates to model performance, we ran experiments on data subsets ranging from 1 to 5000 utterances per intent. The following is a brief overview of the accuracy obtained with each subset.
Data and Corresponding Accuracy (%)
Wrapping Up: Transforming the Chatbot Experience
Our primary focus is improving chatbots’ intent recognition using generative LLMs. By fine-tuning OpenAI GPT-3 on our Customer Service Open Dataset, we reached 90% accuracy from the outset. With 250 to 1000 utterances per intent, accuracy rose to over 92%.
We anticipate that future chatbots, equipped with this intent recognition capability and improved through user feedback and repeated rounds of evaluation and training, will exceed 95% accuracy.
In short, our findings show the potential of generative LLMs to improve intent recognition in chatbots, leading to more efficient and accurate user interactions. Continued refinement driven by real-world usage and user feedback promises further performance gains. Please share your thoughts or experiences in the comments; we’d be delighted to engage.