The GenAI community has turned to synthetic data to satisfy its hunger for data. However, it has done so with significant reluctance, sometimes even assuming that synthetic data is equivalent to “fake data”. This article published by Bloomberg is a very good expression of this trend:
This is a great article, and definitely worth reading, but with one caveat in our opinion: it seems to assume that GenAI is the only technique available to generate synthetic text. However, there are other approaches, based on rules (symbolic approach) rather than on pre-existing data, that provide alternative solutions.
What are the advantages of the symbolic approach?
  • Data generated is 100% reliable, which means that all generated variations keep 100% semantic integrity with the seed sentence (the seed sentence being the initial text for which we request variants)
  • This implies that the data is 100% free of hallucination, bias and PII issues
The reason why this approach reaches 100% accuracy is that the generation of variants is controlled by rules with predictable behavior. Before actually generating any text, the Natural Language Generation (NLG) platform requires a seed sentence and a set of linguistic parameters.

As a result, this approach is a combination of the best of two worlds:

  • Full Automation based on Symbolic NLG Technology
  • Expert parametrization by linguists to verticalize the generated data
This makes this approach particularly valuable for Conversational AI applications, since they are sometimes based on exchanges of short questions and answers.
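
To make the rule-based idea more concrete, here is a minimal sketch in Python. It is not Bitext's implementation: the frames, the `OBJECT_FORMS` table and the `generate_variants` function are hypothetical names chosen for illustration. It only shows why output driven by fixed rules is predictable: every variant is a combination of a carrier phrase and an object form that were written in advance, so nothing outside the seed can be added.

```python
from itertools import product

# Hypothetical rule set: each frame is a fixed carrier phrase that requests
# the action without adding new information (no refunds, fees or reasons).
FRAMES = [
    "I want to {verb} {obj}",
    "I need help to {verb} {obj}",
    "how do I {verb} {obj}?",
    "can you help me {verb} {obj}?",
]

# Hypothetical linguistic parameters that a linguist could tune per vertical.
OBJECT_FORMS = {
    "order": ["my order", "an order", "the order I made"],
}


def generate_variants(seed: str) -> list[str]:
    """Expand a two-word seed such as "cancel order" into rule-based variants."""
    verb, noun = seed.split()
    objects = OBJECT_FORMS[noun]
    # Every frame is combined with every object form, in a fixed order.
    return [frame.format(verb=verb, obj=obj) for frame, obj in product(FRAMES, objects)]


variants = generate_variants("cancel order")
for variant in variants:
    print(variant)
```

With 4 frames and 3 object forms the sketch always produces the same 12 variants. Adding a frame or an object form is the kind of parametrization a linguist would do to adapt the output to a given vertical.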

To illustrate the differences between statistical and symbolic approaches, we’ve run a benchmark to compare data generated by ChatGPT 3.5, by ChatGPT 4 and by Bitext NLG technology.

The benchmark setup is as follows:

  • The seed sentence is: “cancel order”
  • The minimum requested number of generated variants is: 20

The results are a bit surprising:

  • With ChatGPT 3.5, 60% of the generated utterances changed the meaning: 12 out of 20 variants are wrong, which is a very high number
  • With ChatGPT 4, 40% of the generated utterances changed the meaning: 8 out of 20 variants are wrong
  • With Bitext, 100% of utterances are semantically equivalent to the seed sentence
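
As a quick sanity check, the percentages can be recomputed from the “Total” rows of the tables at the end of the article. The sketch below assumes those rows read as OK and KO counts (8 and 12 for ChatGPT 3.5, 12 and 8 for ChatGPT 4, 20 and 0 for Bitext NLG); the `error_rate` helper is a hypothetical name used only for illustration.

```python
# (OK, KO) counts, read from the "Total" rows of the result tables.
totals = {
    "ChatGPT 3.5": (8, 12),
    "ChatGPT 4": (12, 8),
    "Bitext NLG": (20, 0),
}


def error_rate(ok: int, ko: int) -> float:
    """Share of generated variants that changed the meaning of the seed."""
    return ko / (ok + ko)


for system, (ok, ko) in totals.items():
    rate = error_rate(ok, ko)
    print(f"{system}: {rate:.0%} of {ok + ko} variants changed the meaning")
```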
We include a few examples below (and the full list at the end of the article):
What are the reasons for the errors in ChatGPT? Not surprisingly, they are related to what are called “hallucinations”, which surface here as content added to the variant that did not exist in the seed sentence. For example, some of the generated variants include information about a refund, which is not part of the seed sentence:
  • Can you cancel my order and refund the payment, please?

Some other variants just don’t make sense, like this one asking for the reason for the cancellation. The reason should be known by the customer, not by the chatbot, so this is a clear case of hallucination:

  • Can you please cancel my order and provide a reason for the cancellation?
There is also irrelevant content added about fees, confirmation…
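
A rough way to spot this kind of added content automatically is a keyword check against topics that the seed does not mention. The sketch below is only an illustration under that assumption: the `ADDED_TOPICS` patterns and the `added_content` function are hypothetical, cover only the cases discussed above, and are not the method used to score the benchmark.

```python
import re

# Hypothetical patterns for topics that the seed "cancel order" does not mention.
ADDED_TOPICS = {
    "refund": r"\brefund",
    "fees": r"\bfees?\b",
    "confirmation": r"\bconfirm",
    "reason request": r"\bprovide a reason\b",
}


def added_content(variant: str) -> list[str]:
    """Return the topics found in a variant that are absent from the seed."""
    text = variant.lower()
    return [topic for topic, pattern in ADDED_TOPICS.items() if re.search(pattern, text)]


# Sample variants taken from the ChatGPT 3.5 list in this article.
samples = [
    "Can you please cancel my order and provide a refund?",
    "Can you please cancel my order and let me know if there are any fees?",
    "Can you please cancel my order and provide a reason for the cancellation?",
    "I need to cancel my order, please",
]

for sample in samples:
    print(added_content(sample) or "no added content", "<-", sample)
```

A keyword list like this only catches additions that were anticipated, so it complements a manual review of semantic equivalence rather than replacing it.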
We include below the full set of the 20 sentences generated by each of the three systems.

In summary, synthetic text remains one of the top alternatives for solving the pressing data issue that GenAI faces for deployment, particularly at the enterprise level. GenAI-based approaches are not providing the solution that the market needs. However, there are other approaches that generate useful data, like the symbolic approach, although they are a bit overshadowed by the current hype.

Sample results obtained with ChatGPT 3.5, 4 and Bitext NLG:

ChatGPT 3.5 Generation
ID | Generated Variant for Seed "cancel order" | OK/KO
1 | Can you cancel my order and refund the payment, please? | X
2 | I need to cancel my order for personal reasons. | X
3 | Can you please cancel my order and provide a refund? | X
4 | I need to cancel my order and would like to know the refund process. | X
5 | Could you please cancel my order and provide a cancellation confirmation? | X
6 | I need to cancel my order and would like to know when I will receive a refund. | X
7 | Can you please cancel my order and let me know if there are any fees? | X
8 | I've decided to cancel my order, can you please help me with that? | X
9 | I need to cancel my order due to a mistake in the order details. | X
10 | Can you cancel my order and confirm the refund amount? | X
11 | I regret to say that I need to cancel my order. | X
12 | Can you please cancel my order and provide a reason for the cancellation? | X
13 | I need to cancel my order and would appreciate your prompt assistance. | X
14 | Can you cancel my order and let me know if I need to take any additional steps? | X
15 | I need to cancel my order and would like to know the status of the refund. | X
16 | Can you please cancel my order and let me know about the refund? | X
17 | I need to cancel my order due to a change in circumstances. | X
18 | Can you cancel my order and provide a confirmation email? | X
19 | I would like to cancel my order and receive a full refund. | X
20 | I need to cancel my order, please | X
Total: 8 OK, 12 KO

ChatGPT 4 Generation
ID | Generated Variant for Seed "cancel order" | OK/KO
1 | I'd like to cancel my order, please. | X
2 | Can you help me cancel the order I just placed? | X
3 | I made a mistake and need to cancel my order. | X
4 | I've changed my mind and would like to cancel my recent order. | X
5 | I need to stop my order before it's shipped, can you help me with that? | X
6 | Please cancel my order, I no longer need the items. | X
7 | Could you please assist me in canceling my purchase? | X
8 | Can you abort the shipment of my order? | X
9 | I want to halt my order, how do I do that? | X
10 | Please undo the order I just made. | X
11 | I don't want to proceed with my order anymore, can you cancel it? | X
12 | Is it possible to terminate my order? | X
13 | I accidentally placed an order. Can it be cancelled? | X
14 | Help me in cancelling the order I placed a few minutes ago. | X
15 | I would like to retract my recent order. | X
16 | I've decided against my order, can we cancel it? | X
17 | I need to nullify my order, could you please assist? | X
18 | How do I go about revoking my order? | X
19 | I need to cancel my purchase, can you guide me? | X
20 | Please, discontinue my order processing. | X
Total: 12 OK, 8 KO

Bitext NLG
ID | Generated Variant for Seed "cancel order" | OK/KO
1 | can you give me information about canceling an order? | X
2 | problems with canceling orders | X
3 | I do not know how I can cancel the last order I made | X
4 | I would like to cancel the order I made | X
5 | I need help canceling an order | X
6 | how could I get information about cancelling an order? | X
7 | I need help cancelling an order I made | X
8 | I have problems with cancelling orders | X
9 | I have a question about canceling an order | X
10 | problem with cancelling orders | X
11 | problems with cancelling an order | X
12 | how could I cancel an order? | X
13 | where could I get information about order cancellations? | X
14 | don't know how to cancel order | X
15 | I need assistance cancelling an order I made | X
16 | I am trying to find information about order cancelations | X
17 | I want help cancelling an order I have made | X
18 | I need help with cancelling an order I have made | X
19 | where to cancel an order I made? | X
20 | how do I cancel my order | X
Total: 20 OK, 0 KO
