Getting GPT to Answer Consistently and with Style

GPT, like any other generative model, tends to give disparate answers to the same question: sometimes they differ only slightly, sometimes substantially, and sometimes they even contradict each other.

Behavior of GPT-3.5 Standard

These are a few of the different answers provided by GPT-3.5 for the same question ‘How much does it cost to open an international banking account?’:
Example 1 from GPT-3.5 Standard. Response 1 with Style ‘Objective, detailed, and list-oriented’ and Summary ‘Varied fees such as opening, maintenance, and transaction costs.’
Example 2 from GPT-3.5 Standard. Response 2 with Style ‘Explanatory, concise, and advisory’ and Summary ‘Costs vary by bank, account type, and required services.’
Example 3 from GPT-3.5 Standard. Response 3 with Style ‘Explanatory, concise, and advisory’ and Summary ‘Bank choice, account type, and services affect costs.’

Behavior of Fine-tuned GPT-3.5 by Bitext

Having control over what GPT answers is technically called fine-tuning; fine-tuning is about “taming the beast” and making the model answer what you need in your particular setup. There are different approaches to fine-tuning, typically focused on the software side: parameters such as learning rate, weight decay, batch size, number of epochs, momentum, and dropout rate play crucial roles.
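To make the software-side knobs concrete, this is how they typically appear together in a training configuration. The values below are purely illustrative placeholders, not the settings used in this experiment:

```python
# Illustrative training hyperparameters for fine-tuning a language model.
# All values are hypothetical examples, not the ones used in this article.
training_config = {
    "learning_rate": 2e-5,     # step size for gradient updates
    "weight_decay": 0.01,      # L2-style regularization strength
    "batch_size": 16,          # examples per gradient step
    "num_epochs": 3,           # full passes over the training data
    "momentum": 0.9,           # only relevant for SGD-style optimizers
    "dropout_rate": 0.1,       # fraction of activations randomly zeroed
}

for name, value in training_config.items():
    print(f"{name}: {value}")
```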

We’ve approached the problem from the data side: can we make GPT answer certain questions properly? By properly we mean in the right tone and corporate style, with the right content, and without glitches, contradictions, or hallucinations.

For this experiment, we generated a dataset verticalized for the banking sector (retail banking, loans, wealth management…) and customized for Customer Support. It is available for free on Hugging Face as the Customer Support Dataset.

We call this dataset hybrid because it combines the advantages of synthetic text (low cost, high scalability and speed, privacy, and corporate identity) without its main disadvantages (hallucinations and bias). It contains 30M tokens infused with anonymized PII (Personally Identifiable Information) placeholders, establishing an empathetic connection with users while being tailored to the sector’s target demographic. Below is a sample from the dataset:

“how much does it cost to open an account for international banking.”
“Thank you for considering opening an account for international banking! I can provide you with the information you need.
The cost of opening an account for international banking may vary depending on the financial institution and the specific account type you choose. To get the most accurate and up-to-date information, I recommend [….]
You can reach our customer service team at {{Customer Support Phone Number}} or through the Live Chat on our website at {{Company Website URL}}. They are available {{Customer Support Working Hours}} to […]
We look forward to the possibility of serving you and providing a seamless international banking experience!”
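To fine-tune GPT-3.5 through the OpenAI API, each question/answer pair like the one above has to be serialized as one JSON line in the chat format (a list of system/user/assistant messages), with the dataset’s PII placeholders filled in with company-specific values. The sketch below is a minimal illustration; the system prompt and placeholder values are assumptions, not the actual ones used for this experiment:

```python
import json

# Hypothetical company values used to fill the dataset's PII placeholders.
placeholders = {
    "{{Customer Support Phone Number}}": "1-800-555-0199",
    "{{Company Website URL}}": "www.example-bank.com",
    "{{Customer Support Working Hours}}": "24/7",
}

def fill_placeholders(text: str) -> str:
    """Replace {{...}} placeholder tokens with company-specific values."""
    for token, value in placeholders.items():
        text = text.replace(token, value)
    return text

def to_chat_jsonl(question: str, answer: str) -> str:
    """Serialize one Q/A pair as an OpenAI chat-format fine-tuning record."""
    record = {
        "messages": [
            {"role": "system",
             "content": "You are a customer support agent for a retail bank."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": fill_placeholders(answer)},
        ]
    }
    return json.dumps(record)

line = to_chat_jsonl(
    "how much does it cost to open an account for international banking.",
    "You can reach us at {{Customer Support Phone Number}}.",
)
print(line)
```

One such line per training example, concatenated into a single `.jsonl` file, is the format the hosted fine-tuning endpoint expects.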

With this data, we fine-tuned the standard 4K-token version of GPT-3.5 to embody the knowledge and behavior of a customer support representative specialized in banking services and products. This enhancement enables GPT-3.5 to respond empathetically and accurately to a range of user inquiries, whether complex or straightforward, within the banking sector. The fine-tuning process aims to understand the users’ intent to ensure a high level of satisfaction.
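With the JSONL training file in place, launching the job is a couple of API calls against the OpenAI Python client’s fine-tuning endpoints. The sketch below is illustrative: the file name, file id, and hyperparameter values are assumptions, not the settings used in this experiment.

```python
# Sketch of launching a GPT-3.5 fine-tuning job via the OpenAI API.
# Model name is real; file names, ids, and epoch count are illustrative.
job_request = {
    "model": "gpt-3.5-turbo",            # the 4K-context standard model
    "training_file": "file-abc123",      # id returned by the file upload step
    "hyperparameters": {"n_epochs": 3},  # passes over the training data
}

# The actual calls would look like this (requires an OPENAI_API_KEY):
#
#   from openai import OpenAI
#   client = OpenAI()
#   uploaded = client.files.create(
#       file=open("banking_support.jsonl", "rb"), purpose="fine-tune")
#   job = client.fine_tuning.jobs.create(
#       model=job_request["model"],
#       training_file=uploaded.id,
#       hyperparameters=job_request["hyperparameters"],
#   )

print(job_request["model"])
```

Note that the hosted API exposes only a handful of the hyperparameters mentioned earlier (such as the number of epochs); most of the software-side knobs stay out of reach, which is why the data-centric approach matters.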

These are the answers provided by the data-fine-tuned version of GPT-3.5 for the same question ‘How much does it cost to open an international banking account?’:

Example 4 from Fine-Tuned GPT-3.5. Response 4 with Style ‘Empathetic, informative, and action-oriented’ and Summary ‘Initial deposit, monthly fees, and tailored guidance provided.’
Example 5 from Fine-Tuned GPT-3.5. Response 5 with Style ‘Appreciative, advisory, and customer-focused’ and Summary ‘Varied account costs, dedicated support, and emphasis on tailored assistance.’
Example 6 from Fine-Tuned GPT-3.5. Response 6 with Style ‘Grateful, comprehensive, and advisory’ and Summary ‘Detailed fee structure, added services, and importance of understanding terms.’
As you can see in the screenshots, the behavior of GPT can be modified (fine-tuned) for specific purposes, in our case Customer Support Banking.

When the questions used for testing are part of the training data, it’s less surprising that the answers are positively influenced by the training.

It’s a bit more surprising when the test questions were not used in the training, like the one we used: “how much does it cost to open an account for international banking”.

As we can see, they all:

  • Provide correct content (according to the training data).
  • Follow the same structure (the company policy for Customer Support).
  • Are consistent with one another, eliminating the “disparateness” factor we saw in the standard GPT answers.


Data-centric fine-tuning gets us the best of both worlds:
  • The creativity of the generative capabilities of GPT.
  • The accuracy and consistency of fine-tuning.

Sharing is caring!