Core NLP services for lexical analysis

Language Identification

Flexibility

You can request additional languages.

API and On-Premises

Available via our API or on-premises.

High Accuracy

Especially when dealing with short texts.

The Bitext Language Identification service detects the language of the input text and returns a list of its sentences, each paired with its detected language.
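To illustrate what a per-sentence result makes possible, here is a minimal Python sketch that consumes such a list. The response shape (a list of dicts with `sentence` and `language` keys holding ISO 639-1 codes) is a hypothetical assumption for illustration, not the documented Bitext format; the two helpers group sentences by language and pick the language that covers the most characters of the input.

```python
from collections import defaultdict

# Hypothetical result shape: the "sentence"/"language" keys and the ISO 639-1
# codes are assumptions for illustration, not Bitext's documented format.
result = [
    {"sentence": "Hello, how are you?", "language": "en"},
    {"sentence": "Muy bien, gracias.", "language": "es"},
    {"sentence": "See you tomorrow.", "language": "en"},
]


def group_by_language(items):
    """Collect the sentences detected for each language code, in input order."""
    groups = defaultdict(list)
    for item in items:
        groups[item["language"]].append(item["sentence"])
    return dict(groups)


def dominant_language(items):
    """Return the language covering the most characters, or None if empty."""
    totals = defaultdict(int)
    for item in items:
        totals[item["language"]] += len(item["sentence"])
    if not totals:
        return None
    return max(totals, key=totals.get)
```

With the sample data, `group_by_language(result)` maps `"en"` to two sentences and `"es"` to one, and `dominant_language(result)` returns `"en"`. Weighting by characters rather than by sentence count is just one simple way to summarize mixed-language input.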

Languages

Unlike offerings from other vendors, our Language Identification Technology is designed primarily for high accuracy when dealing with short texts (particularly for bots and other conversational interfaces).

Most language identification tools rely on character distribution models trained on longer texts (such as web pages or Wikipedia articles), and this approach does not perform well on short, single sentences.
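The character-distribution approach described above can be sketched as a toy character-trigram classifier in Python. This is a generic illustration of that technique, not any vendor's implementation: the two training strings are invented and far smaller than real corpora, and unseen trigrams receive a fixed floor probability chosen arbitrarily here.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_profile(corpus, n=3):
    """Relative frequency of each character n-gram in a training corpus."""
    counts = Counter(char_ngrams(corpus, n))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def score(text, profile, n=3, floor=1e-6):
    """Log-likelihood of the text's n-grams; unseen n-grams get `floor`."""
    return sum(math.log(profile.get(gram, floor)) for gram in char_ngrams(text, n))


def identify(text, profiles, n=3):
    """Return the language whose profile gives the text the highest score."""
    return max(profiles, key=lambda lang: score(text, profiles[lang], n))


# Invented toy corpora; real models are trained on far longer texts.
profiles = {
    "en": build_profile("the cat sat on the mat and the dog ate the bone "
                        "while the weather was nice"),
    "es": build_profile("el gato se sentó en la alfombra y el perro comió "
                        "el hueso mientras el tiempo era bueno"),
}
```

A full phrase such as `"the dog ate the bone"` produces enough trigrams for the English profile to win clearly. A reply like `"ok"`, however, yields only two padded trigrams, neither of which occurs in either profile, so both languages receive exactly the same score and the model has no evidence on which to decide.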

Our approach takes advantage of our wide range of linguistic resources, including our computational lexicons and morphological models.
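A lexicon-based alternative can also be sketched in Python. This is not Bitext's method: the three tiny word lists and the suffix table are hypothetical stand-ins for real computational lexicons and morphological models, and the scoring rule (the share of tokens a lexicon recognizes) is only one simple choice.

```python
import re

# Hypothetical toy lexicons; real computational lexicons are vastly larger.
LEXICONS = {
    "en": {"where", "is", "my", "order", "the", "cancel", "please"},
    "es": {"dónde", "está", "mi", "pedido", "el", "cancelar", "por", "favor"},
    "pt": {"onde", "está", "meu", "pedido", "o", "cancelar", "por", "favor"},
}

# Crude stand-in for a morphological model (hypothetical): inflectional
# suffixes to strip before retrying the lexicon lookup.
SUFFIXES = {
    "en": ("s", "ed", "ing"),
    "es": ("s", "es"),
    "pt": ("s",),
}


def tokenize(text):
    """Lowercase word tokens; \\w matches accented letters in Python 3."""
    return re.findall(r"\w+", text.lower())


def known(token, lang):
    """True if the token, or the token minus a listed suffix, is in the lexicon."""
    lexicon = LEXICONS[lang]
    if token in lexicon:
        return True
    return any(token.endswith(suf) and token[:-len(suf)] in lexicon
               for suf in SUFFIXES[lang])


def identify(text):
    """Return the language whose lexicon recognizes the largest share of tokens."""
    tokens = tokenize(text)
    if not tokens:
        return None
    coverage = {lang: sum(known(t, lang) for t in tokens) / len(tokens)
                for lang in LEXICONS}
    return max(coverage, key=coverage.get)
```

In this sketch `identify("¿Dónde está mi pedido?")` returns `"es"` and `identify("Onde está meu pedido?")` returns `"pt"`: four whole-word lookups are enough to separate two closely related languages that share `está` and `pedido`. The suffix check lets `"orders"` match the English entry `"order"`.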

We currently identify 58 languages (and we regularly add support for additional languages as we develop new resources):

  • Afrikaans
  • Arabic
  • Azerbaijani
  • Belarusian
  • Bulgarian
  • Catalan
  • Chinese
  • Czech
  • Danish
  • Dutch
  • German
  • Greek
  • English
  • Esperanto
  • Spanish
  • Estonian
  • Basque
  • Persian (Farsi)
  • Finnish
  • French
  • Irish Gaelic
  • Galician
  • Gujarati
  • Hebrew
  • Hindi
  • Hungarian
  • Armenian
  • Indonesian
  • Icelandic
  • Italian
  • Japanese
  • Georgian
  • Kazakh
  • Kannada
  • Korean
  • Kyrgyz
  • Macedonian
  • Malayalam
  • Mongolian
  • Malay
  • Nepali
  • Norwegian Bokmål
  • Norwegian Nynorsk
  • Punjabi
  • Portuguese
  • Russian
  • Slovak
  • Serbian
  • Swedish
  • Swahili
  • Telugu
  • Tagalog
  • Turkish
  • Ukrainian
  • Urdu
  • Uzbek
  • Vietnamese
  • Zulu

The Language Identification service is available through our API and can also be deployed on-premises as a Python 3 module (together with a native C++ library and a set of language-specific data files).