Common problem:

How to deal with all the available information?

The amount of data searchable through services such as WhatsApp, Airbnb or Netflix grows every day. If you want your company to make the most of it, the information retrieval systems you use need to connect words that share a meaning but are written differently (“bicycle”, “bicycles”). This is why a lemmatizer is a must-have tool.

Solution:

Lemmatization to provide more accurate results

Bitext Lemmatization Service relates words that share a meaning without being misled by superficially similar spellings. For example, in English, it relates “bicycle” and “bicycles” but not “new” and “news”. How? Lemmatization is the process of determining the lemma of each word in a text according to its intended meaning. The lemma is then used to increase search relevancy and to reduce indexing needs in databases. The main difference from stemming is that lemmatization takes the context into account in order to disambiguate words.
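To make the contrast with stemming concrete, here is a minimal Python sketch. It is not the Bitext service or its API: `LEMMA_LEXICON`, `naive_stem` and `lookup_lemma` are hypothetical names, the lexicon is a four-entry toy, and the "stemmer" is a deliberately crude suffix stripper.

```python
# Toy lexicon mapping surface forms to lemmas (hypothetical, illustration only).
LEMMA_LEXICON = {
    "bicycle": "bicycle",
    "bicycles": "bicycle",
    "new": "new",
    "news": "news",
}


def naive_stem(word: str) -> str:
    """Strip a trailing 's', as a crude suffix-stripping stemmer might."""
    word = word.lower()
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def lookup_lemma(word: str) -> str:
    """Look the word up in the lexicon; unknown words are returned unchanged."""
    word = word.lower()
    return LEMMA_LEXICON.get(word, word)


for form in ["bicycles", "news"]:
    print(form, "| stem:", naive_stem(form), "| lemma:", lookup_lemma(form))
```

Both approaches reduce “bicycles” to “bicycle”, but only the suffix stripper collapses “news” into “new”. A real lemmatizer would also use context, which this lookup table does not attempt.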


Applications

Textual Databases

Lemmatization software can be used for compact indexing and comprehensive retrieval. Our software can index and search massive volumes of multi-language data accurately and efficiently while maintaining the highest level of data availability and security.
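As a rough illustration of why lemma-based indexing is more compact, the sketch below builds two tiny inverted indexes over the same documents, one keyed by surface form and one keyed by lemma. The documents, the `TOY_LEMMAS` table and the `build_index` helper are assumptions made for this example and say nothing about how the Bitext software is implemented.

```python
from collections import defaultdict
from typing import Callable, Dict, Set

# Hypothetical form-to-lemma table, for illustration only.
TOY_LEMMAS = {"bicycle": "bicycle", "bicycles": "bicycle"}


def build_index(docs: Dict[int, str], normalize: Callable[[str], str]) -> Dict[str, Set[int]]:
    """Build an inverted index: normalized token -> ids of documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[normalize(token)].add(doc_id)
    return dict(index)


docs = {1: "red bicycle for sale", 2: "two bicycles for rent"}

surface_index = build_index(docs, lambda t: t)
lemma_index = build_index(docs, lambda t: TOY_LEMMAS.get(t, t))

print(len(surface_index), len(lemma_index))  # 7 keys versus 6 keys
print(sorted(lemma_index["bicycle"]))        # both documents: [1, 2]
```

The lemma index has one key fewer, and a search for “bicycle” retrieves both documents instead of only the first.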

Machine Learning algorithms

Bitext lemmatization software helps to disambiguate and group words by considering the context. Let’s take the word “book” as an example: depending on the surrounding text it can mean two different things.
  • “I enjoy booking my trips online, it helps me to save money”: In this case, “booking” refers to making a reservation, and the lemma is the verb “book”.
  • “I bought three new books last week on my trip to Dublin”: In this case, “books” refers to novels, and the lemma is the noun “book”.
If the algorithm can treat these two uses of “book” as different words, the error margin will be lower, so the accuracy of the results increases even with a smaller training corpus.
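One way to picture this is to turn each token into a "lemma/part-of-speech" feature before training. The sketch below assumes the part-of-speech tags have already been produced by some upstream tagger. The tag names, the `POS_LEXICON` table and the `to_features` function are hypothetical and are not part of any Bitext API.

```python
from typing import List, Tuple

# Hypothetical lexicon keyed by (surface form, part of speech).
POS_LEXICON = {
    ("booking", "VERB"): "book",
    ("books", "VERB"): "book",
    ("books", "NOUN"): "book",
    ("book", "VERB"): "book",
    ("book", "NOUN"): "book",
}


def to_features(tagged: List[Tuple[str, str]]) -> List[str]:
    """Map (form, pos) pairs to 'lemma/POS' features; unknown forms keep their lowercase form."""
    features = []
    for form, pos in tagged:
        lemma = POS_LEXICON.get((form.lower(), pos), form.lower())
        features.append(f"{lemma}/{pos}")
    return features


# Tags are assumed to come from an upstream part-of-speech tagger.
sentence_a = [("I", "PRON"), ("enjoy", "VERB"), ("booking", "VERB"), ("my", "PRON"), ("trips", "NOUN")]
sentence_b = [("I", "PRON"), ("bought", "VERB"), ("three", "NUM"), ("new", "ADJ"), ("books", "NOUN")]

print(to_features(sentence_a))
print(to_features(sentence_b))
```

The first sentence yields the feature `book/VERB` and the second `book/NOUN`, so a downstream model sees two distinct items rather than one ambiguous “book”.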

eCommerce Search

On any eCommerce site, if a user searches for “One flies over the cuckoo’s nest”, a search engine that relies on a stemmer will probably not find the intended book. Bitext Lemmatization Service, by contrast, will recognize “flies” as a form of “fly”, determine its relationship with “flew” and “flown”, and add these words to the query, finding the book “One Flew Over the Cuckoo’s Nest”.
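The query expansion described here can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `PARADIGMS` is a hypothetical one-entry table of inflected forms, matching is done on whitespace-separated tokens, and no context is used to decide whether “flies” is a verb or a noun.

```python
from typing import List, Set

# Hypothetical paradigm table: lemma -> inflected forms (illustration only).
PARADIGMS = {"fly": ["fly", "flies", "flew", "flown", "flying"]}
FORM_TO_LEMMA = {form: lemma for lemma, forms in PARADIGMS.items() for form in forms}


def expand_query(query: str) -> List[Set[str]]:
    """For each query token, return the set of forms that should count as a match."""
    expanded = []
    for token in query.lower().split():
        lemma = FORM_TO_LEMMA.get(token)
        expanded.append(set(PARADIGMS[lemma]) if lemma else {token})
    return expanded


def matches(query: str, title: str) -> bool:
    """True if every query token, or one of its related forms, occurs in the title."""
    title_tokens = set(title.lower().split())
    return all(forms & title_tokens for forms in expand_query(query))


def exact_match(query: str, title: str) -> bool:
    """Baseline without expansion: every query token must occur verbatim."""
    return set(query.lower().split()) <= set(title.lower().split())


query = "One flies over the cuckoo's nest"
title = "One Flew Over the Cuckoo's Nest"
print(exact_match(query, title), matches(query, title))  # False True
```

Without expansion the token “flies” has no counterpart in the title, so the search fails. With the related forms added to the query, “flew” satisfies it.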

Available Languages

The software is currently available for over 50 languages: Afrikaans, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bulgarian, Catalan, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Kannada, Kazakh, Korean, Kyrgyz, Macedonian, Malay, Malayalam, Mongolian, Nepali, Norwegian Bokmål, Norwegian Nynorsk, Persian, Portuguese, Punjabi, Russian, Serbian (Latin script), Slovak, Spanish, Swahili, Swedish, Tagalog, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, and Zulu.



Clients

Bitext Lemmatization software is part of MarkLogic’s “Ask Anything” Universal Index. We help MarkLogic to provide advanced language support.

Contact us

If you have any questions or would like to discuss a project you have in mind, do not hesitate to contact us! We will be happy to help you create a bot your customers will engage with.




MADRID, SPAIN

José Echegaray 8, building 3, office 4
Parque Empresarial Las Rozas
28232 Las Rozas



SAN FRANCISCO, USA

1700 Montgomery Street, Suite 101
CA 94111