ChatGPT has major flaws that prevent it from becoming a useful tool in fields like Customer Experience. That’s the argument Blake Morgan, a CX expert, published in Forbes recently:

Cons Of ChatGPT For Customer Experience

One relevant fragment:

“One of the lauded benefits of ChatGPT is that it offers a new response every time. But that creates an inconsistent customer service experience. Companies can never count on the bot to provide a specific answer, making monitoring customer needs and requests challenging. One of chatbots’ most common use cases is repetitive questions, such as order status or account information. These questions require a set answer, which ChatGPT can’t provide consistently.”

We are writing this post because we have found a solution to the problems Blake raises. Blake points out three major flaws in ChatGPT, #3 being the real challenge in our opinion:

  1. The Information Isn’t Always Accurate
  2. It Doesn’t Have Any Special Training
  3. It Provides Different Answers Every Time

We decided to run an experiment with the goal of making GPT provide consistent answers. As a first step we generated synthetic text, supervised by human experts, for three different purposes:

  • Questions: a set of representative questions on the CX domain, like “I need to return my order”
  • Answers: a set of accurate and structured answers for those questions
  • Variants: around 1,000 variants for each question and answer, to make the training coverage as comprehensive as possible

We then integrated the question-answer pairs into a training dataset, to which we added linguistic tagging. This tagging documents the linguistic reason behind every variant; for example, “colloquial” is the tag for sentences like “do u wanna send my order asap”.
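As a rough illustration, one record in such a dataset could be represented as in the Python sketch below. The field names (question, answer, tags), the JSON Lines layout and the placeholder answers are assumptions made for this example, not the actual Bitext schema.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class TrainingExample:
    # Hypothetical field names, chosen only for illustration.
    question: str                                   # one variant of a representative question
    answer: str                                     # the structured answer it should map to
    tags: List[str] = field(default_factory=list)   # linguistic tags, e.g. "colloquial"


def to_jsonl(examples):
    """Serialize the examples as one JSON object per line."""
    return "\n".join(json.dumps(asdict(e), ensure_ascii=False) for e in examples)


examples = [
    # The answers are placeholders; real answers would be the expert-supervised texts.
    TrainingExample("I need to return my order", "<structured answer for returns>"),
    TrainingExample("do u wanna send my order asap", "<structured answer for shipping>", ["colloquial"]),
]

print(to_jsonl(examples))
```

Keeping the tags next to each variant makes it possible to filter or balance the dataset by linguistic phenomenon before fine-tuning.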

With this dataset we then fine-tuned an instance of GPT 3.5 and, finally, we evaluated the answers from the two instances:

  • Playground Instance of GPT 3.5
  • Bitext-Trained Instance of GPT 3.5

For the question: “I need help to file a consumer complaint”

The Playground Instance of GPT 3.5 provides two totally different answers, as Blake anticipated, with a high degree of variability in the content (although both make sense):

Answer 1: “Please note that I am an AI language model and cannot provide legal advice. The information provided above is for general guidance purposes only. It is always recommended to consult with a legal professional or consumer protection agency for specific advice related to your situation.”

Answer 2: “I can definitely help you with that! To file a consumer reclamation, you can follow these steps: 1. Gather all relevant information…”

The instance of GPT 3.5 trained to be a customer support agent with the Bitext dataset, however, provides two responses (or even three or more) that are accurate and match the training, with the same content and yet some differences in the wording:

Answer 1: “I’m sorry to hear that you’re in a situation where you need to file a consumer reclamation. Your feedback is incredibly valuable to us, and I want to assure you that we take your concerns seriously. To assist you with this process, I recommend reaching out to our dedicated customer support team. They are well-equipped to guide you through the necessary steps and ensure that your reclamation is properly addressed. Please provide me with your contact information, and I will personally ensure that you receive the assistance you need.”

Answer 2: “I’m sorry to hear that you’re facing an issue and need assistance in filing a consumer reclamation. Your feedback is incredibly valuable to us, and we want to ensure that your concerns are addressed promptly and effectively. To begin the process, could you please provide me with the specific details of your reclamation? This will help us better understand the situation and take the necessary steps to resolve it. Rest assured, we are committed to resolving your issue and improving our services based on your feedback. Thank you for bringing this to our attention, and we appreciate your patience as we work towards a resolution.”
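One simple way to put a number on how much two answers to the same question differ is lexical overlap. The sketch below uses Jaccard similarity over lowercased word sets; this metric is our choice for illustration, not the method used in the experiment, and it only captures shared vocabulary, not whether the content is actually equivalent.

```python
import re


def tokens(text):
    """Return the set of lowercased word tokens in a text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two texts: shared words / all words (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


# Opening sentences of the two fine-tuned answers quoted above.
answer_1 = "I'm sorry to hear that you're in a situation where you need to file a consumer reclamation."
answer_2 = "I'm sorry to hear that you're facing an issue and need assistance in filing a consumer reclamation."

print(round(jaccard(answer_1, answer_2), 2))
```

Running the same comparison on pairs of answers from each instance gives a crude consistency score; an embedding-based similarity would be a more robust alternative for judging content rather than wording.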

The AWS blog reports similar experiments and results, fine-tuning GPT-J 6B on SEC filing data:

Domain-adaptation Fine-tuning of Foundation Models in Amazon SageMaker JumpStart on Financial Data 

As a next step, we plan to test whether this strategy of fine-tuning on synthetic text also works for non-trained questions, i.e., questions that are not present in the training dataset. This could be a relevant finding for question answering systems beyond customer service, such as Siri, Alexa or Google's virtual assistants.
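A test like this needs an evaluation set whose questions never appear in training, including all of their variants. The sketch below shows one possible way to build it: hold out whole question groups rather than individual variants. The (question_id, variant_text) layout, the function name and the 20% default are assumptions for illustration.

```python
import random


def split_by_question(examples, held_out_fraction=0.2, seed=0):
    """Split (question_id, variant_text) pairs so that held-out questions
    contribute no variants at all to the training set."""
    ids = sorted({qid for qid, _ in examples})
    rng = random.Random(seed)   # fixed seed keeps the split reproducible
    rng.shuffle(ids)
    n_held = max(1, int(len(ids) * held_out_fraction))
    held = set(ids[:n_held])
    train = [e for e in examples if e[0] not in held]
    test = [e for e in examples if e[0] in held]
    return train, test


# Hypothetical toy data: 5 questions with 2 variants each.
data = [(f"q{i}", f"variant {v} of question {i}") for i in range(5) for v in range(2)]
train, test = split_by_question(data)
print(len(train), len(test))
```

Splitting at the variant level instead would leak near-duplicates of each test question into training and overstate how well the model handles unseen questions.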
