Generative large language models (LLMs) are now used across many business tasks, from drafting blog articles to classifying the sentiment of statements. Fine-tuned on tailored datasets, they can also address niche, real-world use cases.

One use case worth exploring is intent recognition in chatbots, a core language-understanding component. Getting it right requires a clear understanding of the specific application scenario and the target audience.

Understanding the Intent Recognition Process 

Building a reliable intent recognition capability is an iterative process: evaluate the system in practice, then keep refining the training data to reflect changes in how users phrase their requests.

Intent recognition depends heavily on high-quality data that reflects what users actually want, so that the LLM learns to map phrasings to specific intent labels.

The training dataset should therefore match the language and tone of the target audience and cover the many ways users may express each intent.

Innovative Approach at Bitext

At Bitext, we used our proprietary text generation technology to produce high-quality data for fine-tuning LLMs for intent detection. We used the Bitext Customer Service Open Dataset, freely available on our GitHub account, to fine-tune OpenAI GPT-3 for this task.

Here’s a hypothetical example, included for illustration only.

Suppose we define a ‘food delivery’ intent. When users write phrases like “I want to order food”, “I am hungry, let’s order something” or “Let’s get pizza delivered”, the fine-tuned model should map all of them to the ‘food delivery’ intent.
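As a rough sketch of what this looks like as training data, the snippet below turns such utterances into prompt/completion records in JSONL, the shape used by OpenAI's legacy fine-tuning for base models such as ada. The `food_delivery` label, the separator string, and the helper names are illustrative assumptions, not the actual Bitext setup.

```python
import json

# Hypothetical utterances and label, mirroring the fictional example above.
EXAMPLES = [
    ("I want to order food", "food_delivery"),
    ("I am hungry, let's order something", "food_delivery"),
    ("Let's get pizza delivered", "food_delivery"),
]

# Assumption: a fixed separator marks where the prompt ends. This follows a
# common convention for legacy prompt/completion fine-tuning.
SEPARATOR = "\n\n###\n\n"


def to_record(utterance, intent):
    """Build one training record: utterance as prompt, intent as completion."""
    # The leading space in the completion is another common convention.
    return {"prompt": utterance + SEPARATOR, "completion": " " + intent}


def to_jsonl(pairs):
    """Serialize (utterance, intent) pairs as one JSON object per line."""
    return "\n".join(json.dumps(to_record(u, i)) for u, i in pairs)


if __name__ == "__main__":
    print(to_jsonl(EXAMPLES))
```

Using a short, consistent label per intent keeps the completion easy to compare against the expected label at evaluation time.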

In our experiments, well-curated language data delivered a starting accuracy above 90%.

Data: The Cornerstone of Fine-tuning

The Bitext Customer Service Open Dataset covers 27 user intents, ranging from simple requests like ‘place an order’ to more involved interactions such as ‘get a refund’ and ‘reach out to customer service.’ It includes both clearly distinct intents and closely related ones, which makes the classification task harder and the resulting system more robust.

To examine the effect of data size on model performance, we generated subsets of the data with varying numbers of utterances per intent, from 1 to 5000.
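A minimal sketch of how such per-intent subsets could be drawn is shown below. The function name, the toy rows, and the subset sizes other than 1 and 5000 are assumptions for illustration; the sampling procedure actually used is not described here.

```python
import random
from collections import defaultdict


def sample_per_intent(rows, n, seed=0):
    """Return at most n randomly chosen utterances for each intent.

    rows is a list of (utterance, intent) pairs. A fixed seed keeps the
    subsets reproducible across runs.
    """
    by_intent = defaultdict(list)
    for utterance, intent in rows:
        by_intent[intent].append(utterance)

    rng = random.Random(seed)
    subset = []
    for intent in sorted(by_intent):
        utterances = by_intent[intent]
        k = min(n, len(utterances))  # an intent may have fewer than n rows
        subset.extend((u, intent) for u in rng.sample(utterances, k))
    return subset


if __name__ == "__main__":
    # Toy data with two hypothetical intent labels and 10 utterances each.
    rows = [
        (f"utterance {i} for {intent}", intent)
        for intent in ("place_order", "get_refund")
        for i in range(10)
    ]
    for size in (1, 5, 5000):  # hypothetical subset sizes
        print(size, len(sample_per_intent(rows, size)))
```

Sampling the same number of utterances for every intent keeps the classes balanced, so accuracy differences between subsets reflect data volume rather than class skew.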

Experiments: Putting Our Approach to the Test

For our experiments, we used OpenAI’s ada model, known for its efficiency and for strong task performance compared with other models. We fine-tuned several instances of the ada model, each on a different data subset and with its own number of epochs and batch size.

We evaluated the models with a validation set of about 1000 examples and a test set of about 1100 examples.
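Accuracy on such a test set is simply the share of examples where the predicted intent matches the expected one. The sketch below shows that calculation; the function name and the sample labels are hypothetical, and the whitespace stripping is an assumption about how model output might be normalized.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that exactly match the expected intent label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty test set")
    # Strip whitespace so a completion like " get_refund" still matches.
    correct = sum(p.strip() == e.strip() for p, e in zip(predicted, expected))
    return correct / len(expected)


if __name__ == "__main__":
    # Hypothetical labels for four test utterances.
    expected = ["place_order", "get_refund", "get_refund", "contact_support"]
    predicted = [" place_order", " get_refund", " place_order", " contact_support"]
    print(f"accuracy: {accuracy(predicted, expected):.2%}")
```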

A Unique Approach to Test Set Creation 

We built the test set in two ways: manually and by generating examples with GPT-3.5. All examples went through a rigorous review to remove irrelevant or hallucinated content.

Insights from the Results — A Snapshot

To show how training data size relates to model performance, we ran experiments on data subsets ranging from 1 to 5000 utterances per intent. The accuracy obtained with each subset is summarized below.

[Figure: Data and Corresponding Accuracy (%)]

Wrapping Up: Transforming the Chatbot Experience 

Our main focus is improving chatbots’ intent recognition with generative LLMs. By fine-tuning OpenAI GPT-3 on our Customer Service Open Dataset, we reached 90% accuracy right away. With 250 to 1000 utterances per intent, accuracy was close to 92%.

We expect that future chatbots with this intent recognition capability, improved through user feedback and repeated cycles of evaluation and retraining, will exceed 95% accuracy.

In short, our findings show the potential of generative LLMs to improve intent recognition in chatbots, making user interactions more efficient and accurate. Continued refinement driven by real-world usage and user feedback should bring further gains. Please share your thoughts or experiences in the comments; we’d be delighted to engage.
