Bitext NLP Data Overview
Bitext develops comprehensive NLP datasets and multilingual tools, including lexical, syntactic, and semantic annotation tools, in up to 77 languages. These resources are trusted by market leaders.

Lexical Level and Lemmatization
At the lexical level, the main component is the lemmatizer, integrated with tools for decompounding or word segmentation (required by some languages to perform proper lemmatization).
The lemmatizer can also be packaged as part of a full language analysis pipeline, from sentence segmentation to full parsing, including tools such as spell checking.
The two components of the lemmatizer, data and software, can be distributed together or separately. All these tools are available in 77 languages and 25 language variants.
Bitext Lemmatizer
A separate page describes how the Bitext Lemmatizer works.
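As a rough illustration of why decompounding matters for lemmatization, the sketch below splits a compound into known parts before looking up the lemma of its last part. It is a toy under stated assumptions, not the Bitext Lemmatizer or its API: the names LEMMAS, VOCAB, decompound and lemmatize, and the tiny English word lists, are all invented for this example.

```python
# Toy data invented for illustration; a real lexicon is far larger.
LEMMAS = {"houses": "house", "boats": "boat", "mice": "mouse", "ran": "run"}
VOCAB = {"house", "houses", "boat", "boats"}


def decompound(word, vocab=VOCAB):
    """Split a word into two known parts, or return it whole if no split fits."""
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in vocab and tail in vocab:
            return [head, tail]
    return [word]


def lemmatize(word, lemmas=LEMMAS):
    """Look the word up directly; otherwise decompound it and lemmatize the last part."""
    word = word.lower()
    if word in lemmas:
        return lemmas[word]
    parts = decompound(word)
    # Assumption: the final part carries the inflection, as in many compounding languages.
    parts[-1] = lemmas.get(parts[-1], parts[-1])
    return "".join(parts)


print(lemmatize("Mice"))        # direct lookup
print(lemmatize("houseboats"))  # lookup fails, so the compound is split first
```

Without the decompounding step, "houseboats" would not be found in the lemma table and would be returned unchanged; with it, only the parts need to be listed in the lexicon.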
Syntactic Level and Parsing
At the syntactic level, the parser is the main component. The parser analyzes the structure of the sentences in a text and is used for tasks like POS tagging and phrase extraction. It also serves as the base component for semantic-level tasks such as Named Entity Recognition (NER), topic-level sentiment analysis, and generation of synthetic text. We have developed parsers for 21 languages and continue to add new ones.
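To make the link between POS tagging and phrase extraction concrete, here is a minimal sketch that pulls noun phrases out of tokens that have already been tagged. It is not Bitext's parser: the function name extract_noun_phrases is hypothetical, the input is hand-tagged with Universal Dependencies-style labels (DET, ADJ, NOUN, VERB), and the rule used (a run of adjectives and nouns that ends on a noun) is a deliberately simple stand-in for real syntactic analysis.

```python
def extract_noun_phrases(tagged):
    """Return runs of ADJ/NOUN tokens that end on a noun, joined as strings."""
    phrases, run = [], []
    # A sentinel token with a non-matching tag flushes the final run.
    for word, tag in list(tagged) + [("", "END")]:
        if tag in ("ADJ", "NOUN"):
            run.append((word, tag))
            continue
        # Drop trailing adjectives so every phrase ends on a noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            phrases.append(" ".join(w for w, _ in run))
        run = []
    return phrases


# Hand-tagged example sentence (tags are assumed, not produced by a tagger).
sentence = [
    ("The", "DET"), ("new", "ADJ"), ("phone", "NOUN"), ("has", "VERB"),
    ("a", "DET"), ("great", "ADJ"), ("battery", "NOUN"), ("life", "NOUN"),
]
print(extract_noun_phrases(sentence))
```

Phrases extracted this way are the kind of unit that downstream tasks such as topic-level sentiment analysis can attach a judgment to, for example "battery life" paired with "great".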
For a full list of services, at the lexical, syntactic and semantic levels, check our linguistic services.

MADRID, SPAIN
