Getting GPT to Answer Consistently and with Style

GPT, like other generative models, tends to provide disparate answers for the same question. Sometimes answers vary only slightly, but sometimes they are very inconsistent or even contradictory.

Behavior of Standard GPT-3.5

Below are a few different answers provided by GPT 3.5 for the question “How much does it cost to open an international banking account?”:

Example 1 from GPT-3.5 Standard. Response 1 with Style ‘Objective, detailed, and list-oriented’ and Summary ‘Varied fees such as opening, maintenance, and transaction costs.’
Example 2 from GPT-3.5 Standard. Response 2 with Style ‘Explanatory, concise, and advisory’ and Summary ‘Costs vary by bank, account type, and required services.’
Example 3 from GPT-3.5 Standard. Response 3 with Style ‘Explanatory, concise, and advisory’ and Summary ‘Bank choice, account type, and services affect costs.’

Behavior of Bitext Fine-tuned GPT-3.5

Having control of what GPT answers is technically called fine-tuning; fine-tuning is about “taming the beast” and making it answer what you need in your particular set up. There are different approaches to fine-tuning, typically based on the software side: parameters such as learning rate, weight decay, batch size, number of epochs, momentum, and dropout rate play crucial roles.

We’ve approached the problem using data: can we make GPT answer some questions properly? By properly, we mean in the right tone and corporate style, with the right content, and without glitches/contradictions/hallucinations.

For this experiment, we’ve generated a dataset verticalized for the banking sector (retail banking, loans, wealth management, etc.) and customized for Customer Support. The dataset we used is available at Hugging Face for free here.

We call this dataset hybrid because it has all the advantages of synthetic text (low cost, high scalability and speed, privacy, and corporate identity) while avoiding the disadvantages (hallucination and bias). It contains 30M tokens which have been infused with anonymized PII (Personally Identifiable Information), establishing an empathic connection with users while being tailored to the sector’s target demographic. Below is a sample from the dataset:

“how much does it cost to open an account for international banking.”
“Thank you for considering opening an account for international banking! I can provide you with the information you need.
The cost of opening an account for international banking may vary depending on the financial institution and the specific account type you choose. To get the most accurate and up-to-date information. I recommend [….]
You can reach our customer service team at {{Customer Support Phone Number}} or through the Live Chat on our website at {{Company Website URL}}. They are available {{Customer Support Working Hours}} to […]
We look forward to the possibility of serving you and providing a seamless international banking experience!”

With this data, we fine-tuned GPT-3.5 in its standard version with 4K tokens to embody the knowledge and behavior of a customer support representative specialized in banking services and products. This enhancement enables GPT-3.5 to respond empathically and accurately to a range of user inquiries, whether complex or straightforward, within the banking sector. The fine-tuning process aims at understanding the users’ intent to ensure a high level of satisfaction.

These are the answers provided by the data-finetuned version of GPT 3.5 for the same question ‘How much does it cost to open an international banking account?’:

Example 4 from Fine-Tuned GPT-3.5. Response 4 with Style ‘Empathetic, informative, and action-oriented’ and Summary ‘Initial deposit, monthly fees, and tailored guidance provided.’
Example 5 from Fine-Tuned GPT-3.5. Response 5 with Style ‘Appreciative, advisory, and customer-focused’ and Summary ‘Varied account costs, dedicated support, and emphasis on tailored assistance.’
Example 6 from Fine-Tuned GPT-3.5. Response 6 with Style ‘Grateful, comprehensive, and advisory’ and Summary ‘Detailed fee structure, added services, and importance of understanding terms.’.

As you can see in the screenshots, the behavior of GPT can be modified (fine-tuned) for specific purposes. In our case, it’s for Customer Support Banking.

When the questions used for testing are part of the training data, it’s less surprising that the answers are positively influenced by the training.

It’s a bit more surprising when using questions that where not used in the training, like the one we used “how much does it cost to open an account for international banking”.

As we can see, they all:

  • Provide correct content (according to training).
  • Follow the same structure (company policy for Customer Support).
  • They are internally consistent, eliminating the “disparateness” factor we saw in the standard GPT answers.


With data-centric fine-tuning, we get the best of two worlds:

  • All the creativity of the generative capabilities of GPT.
  • Accuracy and consistency previously possible only with automated responses.

Sharing is caring!