Minimize text ambiguity with our enterprise-grade multilingual lemmatization software.
We have the most complete multilingual morphological dictionaries on the market.
The amount of data available through online services (WhatsApp, Airbnb or Netflix) grows every day. If you want your company to make the most of it, the information retrieval systems you use need to connect words that share a meaning but are written differently (“bicycle”, “bicycles”). This is why a lemmatizer is a must-have tool.
Bitext Lemmatization Service relates words that have the same meaning without being misled by superficially similar spellings. For example, in English, it relates “bicycle” and “bicycles” but not “new” and “news”. How?
Lemmatization is the process of determining the lemma of each word in a text according to its intended meaning. The lemma form of a word is used to increase search relevancy and to reduce indexing needs in databases. The main difference from stemming is that lemmatization takes the context into account to resolve ambiguity.
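To make the contrast with stemming concrete, here is a minimal sketch in Python. It is not Bitext's API: the tiny dictionary `LEMMA_DICT` and the functions `naive_stem` and `lemmatize` are hypothetical names invented for illustration, and context handling is left out entirely. It only shows why a dictionary lookup keeps “new” and “news” apart while a blind suffix rule does not.

```python
# Toy data for illustration only; not Bitext's dictionary or API.
LEMMA_DICT = {
    "bicycle": "bicycle",
    "bicycles": "bicycle",
    "new": "new",
    "news": "news",
}


def naive_stem(word: str) -> str:
    """Strip a trailing 's' with no knowledge of the vocabulary."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def lemmatize(word: str) -> str:
    """Look the word up in the dictionary; fall back to the word itself."""
    word = word.lower()
    return LEMMA_DICT.get(word, word)


for w in ["bicycles", "news"]:
    print(w, "stem:", naive_stem(w), "lemma:", lemmatize(w))
```

The suffix rule merges “news” into “new”, whereas the dictionary lookup groups “bicycles” with “bicycle” and leaves “news” alone.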
Lemmatization software can be used for compact indexing and comprehensive retrieval. Our software can index and search massive volumes of multi-language data accurately and efficiently while maintaining the highest level of data availability and security.
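As a rough illustration of compact indexing, the sketch below builds a small inverted index twice: once keyed by surface form and once keyed by lemma. The documents, the `LEMMAS` table and the `build_index` function are assumptions made up for this example and say nothing about how the actual product stores its index.

```python
from collections import defaultdict

# Hypothetical toy lemma table; a real system would use a full dictionary.
LEMMAS = {
    "bicycles": "bicycle",
    "bicycle": "bicycle",
    "rides": "ride",
    "rode": "ride",
    "ride": "ride",
}


def build_index(docs, normalize):
    """Map each normalized token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[normalize(token)].add(doc_id)
    return index


docs = {1: "she rode two bicycles", 2: "he rides a bicycle"}

surface_index = build_index(docs, lambda t: t)
lemma_index = build_index(docs, lambda t: LEMMAS.get(t, t))

print(len(surface_index), "keys by surface form")
print(len(lemma_index), "keys by lemma")
print(sorted(lemma_index["bicycle"]))
```

Keying by lemma gives fewer index entries, and a lookup for “bicycle” reaches both documents regardless of which inflected form each one uses.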
Bitext lemmatization software helps to disambiguate and group words by considering the context. Let’s take the word "book" as an example: depending on the surrounding text it can mean two different things.
“I enjoy booking my trips online, it helps me to save money”: In this case, “booking” means making a reservation, the lemma being the verb “book”.
“I bought three new books last week on my trip to Dublin”: In this case, “books” refers to novels, the lemma being the noun “book”.
If the algorithm can treat these two uses of “book” as different words, the error margin will be lower, so the accuracy of the results increases even when using a smaller training corpus.
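One common way to keep the two senses apart is to pair each lemma with its part of speech. The sketch below assumes the part-of-speech tag has already been assigned by some earlier step (here it is passed in by hand); the `Lemma` type, the `TABLE` entries and `lemmatize_tagged` are hypothetical and only illustrate the idea, not how Bitext resolves context.

```python
from typing import NamedTuple


class Lemma(NamedTuple):
    form: str
    pos: str


# Hypothetical toy table keyed by (surface form, part-of-speech tag).
TABLE = {
    ("booking", "VERB"): Lemma("book", "VERB"),
    ("booking", "NOUN"): Lemma("booking", "NOUN"),
    ("books", "NOUN"): Lemma("book", "NOUN"),
    ("books", "VERB"): Lemma("book", "VERB"),
}


def lemmatize_tagged(token: str, pos: str) -> Lemma:
    """Return the lemma for a token given its tag; fall back to the token."""
    key = (token.lower(), pos)
    return TABLE.get(key, Lemma(token.lower(), pos))


verb_sense = lemmatize_tagged("booking", "VERB")  # "I enjoy booking my trips"
noun_sense = lemmatize_tagged("books", "NOUN")    # "three new books"

print(verb_sense, noun_sense, verb_sense == noun_sense)
```

Both lemmas share the form “book”, but because the tag is part of the value they remain distinct items for any downstream model or index.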
On any eCommerce site, if a user searches for “One flies over the cuckoo’s nest”, a search engine that relies on a stemmer will probably not find the intended book. With Bitext Lemmatization Service, the engine analyzes “flies”, identifies its lemma “fly”, determines its relationship with “flew” and “flown”, and adds these forms to the query, finding the book “One Flew Over the Cuckoo’s Nest”.
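The query expansion described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the service's implementation: `FORMS` is a hand-written table with a single verb, and `tokenize`, `expand` and `matches` are hypothetical helper names.

```python
import re

# Hypothetical toy table: lemma -> inflected forms.
FORMS = {"fly": ["fly", "flies", "flew", "flown", "flying"]}
# Reverse lookup: inflected form -> lemma.
LEMMA_OF = {form: lemma for lemma, forms in FORMS.items() for form in forms}


def tokenize(text: str) -> list:
    """Lowercase, drop apostrophes, and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", re.sub(r"[’']", "", text.lower()))


def expand(query: str) -> list:
    """For each query token, return the set of forms allowed to match it."""
    return [set(FORMS.get(LEMMA_OF.get(tok, tok), [tok])) for tok in tokenize(query)]


def matches(query: str, title: str) -> bool:
    """True if every title token is among the expanded forms at its position."""
    title_tokens = tokenize(title)
    expanded = expand(query)
    return len(expanded) == len(title_tokens) and all(
        tok in alts for tok, alts in zip(title_tokens, expanded)
    )


query = "One flies over the cuckoo’s nest"
title = "One Flew Over the Cuckoo’s Nest"

print(tokenize(query) == tokenize(title))  # exact token match
print(matches(query, title))               # match after expansion
```

The exact comparison fails because “flies” and “flew” differ, while the expanded query accepts any form of “fly” in that position and so finds the title.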
The software is currently available for over 50 languages: Afrikaans, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bulgarian, Catalan, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Kannada, Kazakh, Korean, Kyrgyz, Macedonian, Malay, Malayalam, Mongolian, Nepali, Norwegian Bokmal, Norwegian Nynorsk, Persian, Portuguese, Punjabi, Russian, Serbian Latinica, Slovak, Spanish, Swahili, Swedish, Tagalog, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, and Zulu.
Bitext Lemmatization software is part of MarkLogic’s “Ask Anything” Universal Index. We help MarkLogic to provide advanced language support.
If you have any questions or would like to discuss a project you have in mind, do not hesitate to contact us! We will be happy to help you create a bot your customers will engage with.
José Echegaray 8, building 3, office 4
Parque Empresarial Las Rozas
28232 Las Rozas
541 Jefferson Ave., Ste. 100