The impact of lemmatization for morphologically-rich languages
Abstract: Are there ways to improve the performance of language models beyond increases in size, whether in the number of model parameters or in the size of training corpora? Our benchmarks show that another...
The case for evaluation of NLU platforms
Synthetic images and video have proven to be a big success for cost-cutting. Synthetic text is following suit: tabular data (that is, data organized in a table with rows and columns) is already becoming mainstream, and the...
What Is Synthetic Training Data?
Synthetic training data is the data that is used to train an NLU engine. An NLU engine allows chatbots to understand the intent of user queries. The training data is enriched by data labeling or data annotation, with information about...
Arabic is a complex language for NLP tasks, even for simple ones like lemmatization. There are several reasons for this. Arabic creates words based on roots: for example, the word كتاب (kitab, “book”) is derived from ك ت ب (k t b). Many related words are derived from...
Everything looks promising in the world of bots: big players are pushing platforms to build them (Google, Amazon, Facebook, Microsoft, IBM, Apple), large retail companies are adopting them (Starbucks, Domino’s, British Airways), the press is excited about movies becoming...