We understand lemmatization as the process of determining the lemma of each word in a text according to its intended meaning.
The main difference from stemming is that lemmatization takes context into account in order to resolve ambiguity.
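To make that difference concrete, here is a toy Python sketch, not Bitext's implementation. It compares a naive suffix-stripping stemmer with a lemmatizer that looks up a lemma from the word plus its part of speech. The part of speech stands in for the context a real system would analyse. The suffix list, the lexicon and the function names (`naive_stem`, `lemmatize`) are hypothetical and cover only the words in the example.

```python
# Toy comparison: a naive suffix-stripping stemmer vs. a context-aware lemmatizer.
# SUFFIXES and LEXICON are hypothetical and only cover the example words.

SUFFIXES = ("ing", "ed", "s")  # deliberately minimal suffix list


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, ignoring context entirely."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# (surface form, part of speech) -> lemma. The part of speech stands in
# for the context a real lemmatizer would analyse.
LEXICON = {
    ("saw", "NOUN"): "saw",     # "a saw" (the tool)
    ("saw", "VERB"): "see",     # past tense of "see"
    ("sawing", "VERB"): "saw",  # present participle of "to saw"
}


def lemmatize(word: str, pos: str) -> str:
    """Look up the lemma of a word in a given grammatical context."""
    return LEXICON.get((word, pos), word)


for word, pos in [("saw", "NOUN"), ("saw", "VERB"), ("sawing", "VERB")]:
    print(f"{word}/{pos}: stem={naive_stem(word)!r} lemma={lemmatize(word, pos)!r}")
```

The stemmer returns "saw" for all three inputs, so it merges "saw" (past tense of "see") with "sawing" (cutting wood). The lookup keyed on context keeps them apart.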



The software is currently available for 35 languages.

Applications

In any text, a word may appear in different inflected forms. For example, the verb “play” can appear as “playing”, “played” or “plays”. All of these forms should be grouped together because, although they are different words, they share the same basic meaning. Our lemmatization software allows you to group the different forms of a word in a text under the same lemma. This has been applied to improve database indexing, text categorization tools, and machine learning pipelines.
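As a rough illustration of this grouping step, the Python sketch below maps each token to a lemma using a small hand-written inflection table. `FORM_TO_LEMMA` and `group_by_lemma` are hypothetical names and are not part of Bitext's product. The sketch ignores context, so it shows only the grouping, not disambiguation.

```python
from collections import defaultdict

# Hypothetical inflection table mapping surface forms to their lemma.
# A production lemmatizer would rely on a full morphological lexicon instead.
FORM_TO_LEMMA = {
    "play": "play",
    "plays": "play",
    "played": "play",
    "playing": "play",
}


def group_by_lemma(tokens: list[str]) -> dict[str, list[str]]:
    """Group the tokens of a text under their lemma, preserving order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for token in tokens:
        # Unknown words fall back to their lowercased form.
        lemma = FORM_TO_LEMMA.get(token.lower(), token.lower())
        groups[lemma].append(token)
    return dict(groups)


text = "She played chess yesterday and plays again today, playing well"
tokens = [t.strip(",.") for t in text.split()]
print(group_by_lemma(tokens)["play"])  # ['played', 'plays', 'playing']
```

An index built on these groups would return the same document for a query on any of the three forms.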

Benefits for ML and Deep Learning algorithms

Our lemmatization software helps to disambiguate and group words by considering the context.

Let’s take the word “book” as an example: depending on the surrounding text, it can mean two different things.

“I enjoy booking my trips online, it helps me to save money.” In this case, “booking” means making a reservation, and the lemma is the verb “book”.

“I bought three new books last week on my trip to Dublin.” In this case, “books” refers to printed works such as novels, and the lemma is the noun “book”.

If the algorithm can treat these two uses of “book” as different words, the error margin will be lower. This increases the accuracy of the results, even when training on a smaller corpus.
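One way to picture this for a machine learning pipeline is the Python sketch below, which turns the two example sentences (shortened) into "lemma/POS" features. The part-of-speech tags were assigned by hand as a stand-in for a real tagger. The `LEMMAS` lexicon and the `features` function are hypothetical illustrations, not Bitext's API.

```python
from collections import Counter

# The two example sentences, shortened and tagged by hand.
# The POS tags are an assumption standing in for a real tagger.
TAGGED = [
    [("I", "PRON"), ("enjoy", "VERB"), ("booking", "VERB"),
     ("my", "DET"), ("trips", "NOUN"), ("online", "ADV")],
    [("I", "PRON"), ("bought", "VERB"), ("three", "NUM"),
     ("new", "ADJ"), ("books", "NOUN")],
]

# Hypothetical lemma lexicon keyed on (form, POS).
LEMMAS = {
    ("booking", "VERB"): "book",
    ("books", "NOUN"): "book",
    ("trips", "NOUN"): "trip",
    ("bought", "VERB"): "buy",
}


def features(sentence):
    """Yield 'lemma/POS' features so the two senses of 'book' stay apart."""
    for form, pos in sentence:
        lemma = LEMMAS.get((form, pos), form.lower())
        yield f"{lemma}/{pos}"


counts = Counter(f for sentence in TAGGED for f in features(sentence))
print(counts["book/VERB"], counts["book/NOUN"])  # 1 1
```

A classifier trained on these counts sees "book/VERB" and "book/NOUN" as two separate features. It still pools every inflected form of each sense under one key.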

Clients

Bitext Lemmatization software is part of MarkLogic’s “Ask Anything” Universal Index, where it helps MarkLogic provide advanced language support.

If you are interested in this service or need more information, please contact us!



Madrid, SPAIN

José Echegaray 8, Building 3, Office 4
Parque Empresarial Las Rozas
28232 Las Rozas


SAN FRANCISCO, USA

1700 Montgomery Street, Suite 101
CA 94111