Linguistic Services
Bitext provides core tools that automatically pre-annotate custom corpora and datasets. These tools annotate text at both the word level (lemmatization/stemming, inflection…) and the sentence level (Topic-Based Sentiment Analysis, Categorization, Parsing…). We provide:

Lexical services (no grammar)
Sentence segmentation
Tokenization
Word segmentation (no-space tokenization)
Decompounding
Lemmatization (ambiguous)
POS Tagging (ambiguous)
Returns the possible parts of speech (and, optionally, other attributes) of a word, without using context.
Applicable to all languages.
Example: run → verb (infinitive), verb (1st person singular, present tense), noun (singular)
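To make "ambiguous" concrete: a context-free lookup returns every reading a word form can have. The sketch below assumes a tiny hand-built lexicon; the `LEXICON` data and the `ambiguous_pos` function are hypothetical illustrations, not Bitext's resources or API.

```python
# HYPOTHETICAL toy lexicon: each word form maps to all of its possible readings.
LEXICON: dict[str, list[dict[str, str]]] = {
    "run": [
        {"pos": "verb", "form": "infinitive"},
        {"pos": "verb", "person": "1", "number": "singular", "tense": "present"},
        {"pos": "noun", "number": "singular"},
    ],
    "runs": [
        {"pos": "verb", "person": "3", "number": "singular", "tense": "present"},
        {"pos": "noun", "number": "plural"},
    ],
}


def ambiguous_pos(word: str, with_attributes: bool = False) -> list:
    """Return every possible reading of `word`; no sentence context is used."""
    readings = LEXICON.get(word.lower(), [])
    if with_attributes:
        # Full readings, including person, number, tense, etc.
        return [dict(r) for r in readings]
    # Parts of speech only: keep first-seen order and drop duplicates.
    return list(dict.fromkeys(r["pos"] for r in readings))


print(ambiguous_pos("run"))                             # ['verb', 'noun']
print(len(ambiguous_pos("run", with_attributes=True)))  # 3
```

Because no context is consulted, the caller gets all candidates and must resolve them later, which is what the disambiguated POS service does.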
Inflection
Language identification
Spell checking
Spell suggestions
Syntactic services (grammar)
Entity extraction
Detects proper names (people, places…) and other special text (phone numbers, URLs…).
Applicable to all languages.
Example: John lives in New York → “John” – person name, “New York” – place
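A minimal sketch of the two ingredients the description mentions, assuming a gazetteer for names and regular expressions for special text such as phone numbers and URLs. The gazetteer entries, patterns, and `extract_entities` are hypothetical; a production extractor would also use context, since names are open-ended and ambiguous.

```python
import re

# HYPOTHETICAL gazetteer: known names and their entity types.
GAZETTEER = {"John": "person name", "New York": "place"}

# HYPOTHETICAL, deliberately loose patterns for "special text".
PATTERNS = {
    "URL": re.compile(r"https?://\S+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{6,}\d"),
}


def extract_entities(text: str) -> list[tuple[str, str]]:
    spans: list[tuple[int, int, str, str]] = []  # (start, end, surface, label)

    def add(match: re.Match, label: str) -> None:
        start, end = match.span()
        # Keep a match only if it does not overlap one already accepted.
        if all(end <= s2 or start >= e2 for s2, e2, _, _ in spans):
            spans.append((start, end, match.group(), label))

    # Longest gazetteer entries first, so multiword names claim their span.
    for name in sorted(GAZETTEER, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            add(m, GAZETTEER[name])
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            add(m, label)
    # Report entities in text order.
    return [(surface, label) for _, _, surface, label in sorted(spans)]


print(extract_entities("John lives in New York"))
# [('John', 'person name'), ('New York', 'place')]
```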
Offensive language detection
Detects offensive or vulgar expressions in text.
Applicable to all languages.
Example: tell John to f*ck off → “f*ck off” – offensive
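The example flags a multiword expression ("f*ck off") rather than a single word. One simple way to get that behavior, sketched here under the assumption of a lexicon of token sequences and longest-match lookup, is shown below; `OFFENSIVE` and `detect_offensive` are hypothetical, and the whitespace tokenizer is a stand-in for a real one.

```python
# HYPOTHETICAL lexicon, stored as token tuples so multiword expressions
# match as a unit. Entries use the censored spelling from the example.
OFFENSIVE = {("f*ck", "off"), ("f*ck",)}


def detect_offensive(text: str) -> list[str]:
    # Crude tokenization: lowercase, split on whitespace, trim edge punctuation.
    tokens = [t.strip(".,!?;:") for t in text.lower().split()]
    max_len = max(len(entry) for entry in OFFENSIVE)
    hits, i = [], 0
    while i < len(tokens):
        # Try the longest expression starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in OFFENSIVE:
                hits.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            i += 1  # no expression starts here
    return hits


print(detect_offensive("tell John to f*ck off"))  # ['f*ck off']
```

Preferring the longest match means "f*ck off" is reported as one expression instead of as the shorter entry "f*ck".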
Anonymization
Removes sensitive or personally identifiable information (PII) from text.
Applicable to all languages.
Example: My name is John and my account number is 1234567 → My name is XXXX and my account number is XXXX.
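The example replaces each sensitive item with a fixed mask. A minimal sketch of that masking step, assuming person names come from a known list and account numbers are runs of four or more digits, is shown below; `KNOWN_NAMES`, `MASK`, and `anonymize` are hypothetical. In practice the name list would be replaced by entity extraction.

```python
import re

# HYPOTHETICAL inputs: a fixed name list and the mask string from the example.
KNOWN_NAMES = {"John", "Mary"}
MASK = "XXXX"


def anonymize(text: str) -> str:
    # Mask runs of 4+ digits, which often are account or ID numbers.
    text = re.sub(r"\b\d{4,}\b", MASK, text)
    # Mask known person names, matching whole words only.
    names = "|".join(re.escape(n) for n in sorted(KNOWN_NAMES))
    return re.sub(r"\b(?:" + names + r")\b", MASK, text)


print(anonymize("My name is John and my account number is 1234567."))
# My name is XXXX and my account number is XXXX.
```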
POS-Tagging (disambiguated)
Returns the part of speech of each word in a sentence, resolved in context.
Applicable to all languages.
Example: John runs back home → “John” – proper noun, “runs” – verb, “back” – preposition, “home” – noun
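Disambiguation picks one tag per word from the ambiguous candidates by looking at context. A minimal rule-based sketch follows, assuming the previous word's tag is the only context used; the candidate lexicon, the `PREFER_AFTER` rules, and `disambiguate` are hypothetical, and real taggers typically use statistical or neural models with much wider context.

```python
# HYPOTHETICAL ambiguous lexicon, as a context-free lookup would return it.
CANDIDATES = {
    "they": ["PRON"],
    "the": ["DET"],
    "a": ["DET"],
    "run": ["NOUN", "VERB"],
    "runs": ["NOUN", "VERB"],
}

# HYPOTHETICAL context rules: previous tag -> candidate to prefer.
PREFER_AFTER = {
    "DET": "NOUN",   # "the run"  -> noun
    "PRON": "VERB",  # "they run" -> verb
}


def disambiguate(sentence: str) -> list[tuple[str, str]]:
    tagged = []
    prev = None
    for word in sentence.split():
        cands = CANDIDATES.get(word.lower(), ["X"])  # "X" marks an unknown word
        preferred = PREFER_AFTER.get(prev)
        # Use the rule if it applies; otherwise fall back to the first candidate.
        tag = preferred if preferred in cands else cands[0]
        tagged.append((word, tag))
        prev = tag
    return tagged


print(disambiguate("they run"))  # [('they', 'PRON'), ('run', 'VERB')]
print(disambiguate("the run"))   # [('the', 'DET'), ('run', 'NOUN')]
```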
Phrase Extraction
Returns the constituents (noun phrases, verb phrases…) of a sentence.
Applicable to all languages.
Example: John’s sister was performing in the theatre → “John’s sister” – NP, “was performing” – VP, “in the theatre” – PP
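Phrase extraction can be sketched as shallow chunking over POS-tagged input: maximal runs of nominal tags become NPs, runs of auxiliaries and verbs become VPs, and a preposition plus the following nominal run becomes a PP. The tag set, the tag groupings, and `chunk` below are assumptions for illustration only.

```python
# HYPOTHETICAL tag groupings; the input is assumed to be POS-tagged already.
NOMINAL = {"DET", "ADJ", "PROPN", "NOUN", "PRON"}
VERBAL = {"AUX", "VERB"}


def _run_end(tagged, start, tags):
    """Index just past the maximal run of tokens whose tag is in `tags`."""
    end = start
    while end < len(tagged) and tagged[end][1] in tags:
        end += 1
    return end


def chunk(tagged: list[tuple[str, str]]) -> list[tuple[str, str]]:
    phrases, i = [], 0
    while i < len(tagged):
        tag = tagged[i][1]
        if tag in NOMINAL:
            j, label = _run_end(tagged, i, NOMINAL), "NP"
        elif tag in VERBAL:
            j, label = _run_end(tagged, i, VERBAL), "VP"
        elif tag == "ADP":
            # Preposition followed by its nominal run forms a PP.
            j, label = _run_end(tagged, i + 1, NOMINAL), "PP"
        else:
            j, label = i + 1, None  # token outside any phrase
        if label:
            phrases.append((" ".join(w for w, _ in tagged[i:j]), label))
        i = j
    return phrases


sentence = [("John's", "PROPN"), ("sister", "NOUN"), ("was", "AUX"),
            ("performing", "VERB"), ("in", "ADP"), ("the", "DET"),
            ("theatre", "NOUN")]
print(chunk(sentence))
# [("John's sister", 'NP'), ('was performing', 'VP'), ('in the theatre', 'PP')]
```

This greedy approach would merge two adjacent noun phrases (for example "gave the dog a bone"), which is one reason full parsing is offered as a separate service.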
Topic-Based Sentiment Analysis
Returns the sentiment and corresponding topic of opinions in text.
Applicable to all languages.
Example: I hate my old phone → opinion: “hate” (negative), topic: “my old phone”
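The output pairs each opinion word with its polarity and the topic it is about. The sketch below reproduces that output shape under two loud assumptions: polarity comes from a small hypothetical lexicon, and the topic is simply everything after the opinion word. A real system would take the topic from the parse (for example, the object of "hate"). `OPINION_WORDS` and `topic_sentiment` are hypothetical.

```python
# HYPOTHETICAL opinion lexicon: word -> polarity.
OPINION_WORDS = {"hate": "negative", "love": "positive"}


def topic_sentiment(sentence: str) -> list[dict[str, str]]:
    tokens = sentence.rstrip(".!?").split()
    results = []
    for i, tok in enumerate(tokens):
        polarity = OPINION_WORDS.get(tok.lower())
        if polarity:
            # Crude heuristic: the topic is the rest of the sentence.
            topic = " ".join(tokens[i + 1:])
            results.append({"opinion": tok, "polarity": polarity, "topic": topic})
    return results


print(topic_sentiment("I hate my old phone"))
# [{'opinion': 'hate', 'polarity': 'negative', 'topic': 'my old phone'}]
```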
Categorization
Returns the categories applicable to a text, based on predefined rules.
Applicable to all languages.
Example: John is feeling great. → HAPPINESS [RULE: feel + great → HAPPINESS]
Example: John was weeping like a willow. → SADNESS [RULE: weep + like + willow → SADNESS]
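The rules in the examples are written over lemmas (feel, weep), not surface forms (feeling, weeping). A minimal sketch follows, assuming text is lemmatized first and a rule fires when its lemmas appear in order with gaps allowed; the lemma table, `RULES`, and the function names are hypothetical, and the in-order-with-gaps matching is our assumption about how such rules could be applied.

```python
# HYPOTHETICAL lemma table; a real system would use a full lemmatizer.
LEMMAS = {"is": "be", "was": "be", "feeling": "feel", "weeping": "weep"}

# Rules in the same shape as the examples: ordered lemmas -> category.
RULES = [
    (("feel", "great"), "HAPPINESS"),
    (("weep", "like", "willow"), "SADNESS"),
]


def lemmatize(text: str) -> list[str]:
    words = text.lower().rstrip(".!?").split()
    return [LEMMAS.get(w, w) for w in words]


def matches(rule: tuple[str, ...], lemmas: list[str]) -> bool:
    """True if the rule's lemmas occur in order, possibly with gaps."""
    it = iter(lemmas)
    # `x in it` consumes the iterator up to the match, enforcing order.
    return all(lemma in it for lemma in rule)


def categorize(text: str) -> list[str]:
    lemmas = lemmatize(text)
    return [category for rule, category in RULES if matches(rule, lemmas)]


print(categorize("John is feeling great."))          # ['HAPPINESS']
print(categorize("John was weeping like a willow."))  # ['SADNESS']
```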
Parsing
Produces a tree of the hierarchical constituents of a sentence (words, phrases, clauses…).
Applicable to all languages.
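To show what "a tree of hierarchical constituents" means in practice, here is a minimal CKY parser over a toy grammar in Chomsky normal form, printing the result in bracketed notation. This is a swapped-in textbook technique, not Bitext's parser; the grammar, lexicon, and function names are hypothetical, and only one tree per label and span is kept.

```python
from itertools import product

# HYPOTHETICAL toy grammar in Chomsky normal form.
LEXICON = {"I": {"NP"}, "hate": {"V"}, "my": {"DET"}, "old": {"ADJ"}, "phone": {"N"}}
BINARY = {
    ("NP", "VP"): "S",
    ("V", "NP"): "VP",
    ("DET", "NOM"): "NP",
    ("ADJ", "N"): "NOM",
}


def cky_parse(words):
    n = len(words)
    # chart[i][j] maps a label to one tree spanning words[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for label in LEXICON.get(w, ()):
            chart[i][i + 1][label] = (label, w)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point
                for left, right in product(chart[i][k], chart[k][j]):
                    parent = BINARY.get((left, right))
                    if parent and parent not in chart[i][j]:
                        chart[i][j][parent] = (parent, chart[i][k][left], chart[k][j][right])
    return chart[0][n].get("S")  # None if the sentence is not covered


def bracket(tree):
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return f"({label} {children[0]})"  # leaf: label and word
    return f"({label} " + " ".join(bracket(c) for c in children) + ")"


print(bracket(cky_parse("I hate my old phone".split())))
# (S (NP I) (VP (V hate) (NP (DET my) (NOM (ADJ old) (N phone)))))
```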
Languages
- Afrikaans
- Albanian
- Amharic
- Arabic
- Armenian
- Assamese
- Azeri
- Basque
- Belarusian
- Bengali
- Bulgarian
- Burmese
- Catalan
- Chinese
- Croatian
- Czech
- Danish
- Dutch
- English
- Esperanto
- Estonian
- Finnish
- French
- Galician
- Georgian
- German
- Greek
- Gujarati
- Hebrew
- Hindi
- Hungarian
- Icelandic
- Indonesian
- Irish Gaelic
- Italian
- Japanese
- Kannada
- Kazakh
- Khmer
- Korean
- Kyrgyz
- Lao
- Latvian
- Lithuanian
- Macedonian
- Malay
- Malayalam
- Marathi
- Mongolian
- Nepali
- Norwegian Bokmal
- Norwegian Nynorsk
- Oriya
- Persian
- Polish
- Portuguese
- Punjabi
- Romanian
- Russian
- Serbian
- Sindhi
- Sinhala
- Slovak
- Slovenian
- Spanish
- Swahili
- Swedish
- Tagalog
- Tamil
- Telugu
- Thai
- Turkish
- Ukrainian
- Urdu
- Uzbek
- Vietnamese
- Zulu
Variants
- French
- Dutch
- Portuguese
- Spanish
- English
- Italian
- German
- Turkish
- Polish
Under Preparation:
- Danish
- Swedish
- Korean
- Chinese
- Japanese
Contact us for more information about our evaluation and training data.

SAN FRANCISCO, USA
541 Jefferson Ave., Ste. 100
Redwood City
CA 94063

MADRID, SPAIN
José Echegaray 8, Building 3
Parque Empresarial Las Rozas
28232 Las Rozas