Applications

Bitext has developed in-house a set of linguistic technologies for working in a multilingual context. No third-party or open-source software has been used:

Segmentation:

Sentence identification: knowing where each sentence ends. For example, our technology can tell when “.” is a full stop rather than part of an abbreviation, as in “J. Smith”.
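As an illustration of the idea only (not Bitext's implementation), a minimal sketch of a rule-based splitter could look like this. The abbreviation list and the function name `split_sentences` are assumptions made up for the example; a production system would need far richer rules.

```python
import re

# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = {"mr", "mrs", "dr", "inc", "etc"}

def split_sentences(text):
    """Split at '.', '!' or '?' unless the period follows an initial or abbreviation."""
    sentences = []
    start = 0
    # Candidate boundaries: punctuation followed by whitespace or end of text.
    for match in re.finditer(r"[.!?](?=\s|$)", text):
        if match.group() == ".":
            words = text[start:match.start()].split()
            prev = words[-1] if words else ""
            is_initial = len(prev) == 1 and prev.isupper()  # e.g. "J" in "J. Smith"
            if is_initial or prev.lower() in ABBREVIATIONS:
                continue  # the period belongs to an abbreviation, not a boundary
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

print(split_sentences("J. Smith arrived. He met Dr. Jones."))
# ['J. Smith arrived.', 'He met Dr. Jones.']
```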

Tokenization:

Identification of the individual words (tokens) in the text.
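A minimal sketch of word identification, assuming a simple regular-expression approach (an illustrative choice, not a description of Bitext's tokenizer). The pattern and the name `tokenize` are hypothetical; it keeps hyphenated words and straight-apostrophe contractions together and treats each punctuation mark as its own token.

```python
import re

# Words (optionally joined by hyphens or apostrophes), or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

def tokenize(text):
    """Return the list of word and punctuation tokens found in text."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("Don't panic, it's state-of-the-art."))
# ["Don't", 'panic', ',', "it's", 'state-of-the-art', '.']
```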

Chunking:

Identification of meaningful concepts that can be formed by multiword expressions.
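One simple way to picture multiword chunking is a greedy longest-match lookup against a list of known expressions. The sketch below assumes that approach purely for illustration; the `MULTIWORD` list and the function `chunk` are made up for the example.

```python
# Hypothetical multiword expressions, stored as lowercase token tuples.
MULTIWORD = {("credit", "card"), ("new", "york"), ("new", "york", "city")}
MAX_LEN = max(len(expr) for expr in MULTIWORD)

def chunk(tokens):
    """Group tokens into chunks, preferring the longest known multiword expression."""
    chunks = []
    i = 0
    while i < len(tokens):
        # Try the longest window first, down to two tokens.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in MULTIWORD:
                chunks.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multiword expression starts here: keep the single token.
            chunks.append(tokens[i])
            i += 1
    return chunks

print(chunk(["I", "lost", "my", "credit", "card", "in", "New", "York", "City"]))
# ['I', 'lost', 'my', 'credit card', 'in', 'New York City']
```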

Parsing:

Generation of the parse tree of a sentence, in order to understand its syntactic structure.

Incremental Parsing:

“On-the-fly” parsing that generates the possible parses in real time as the sentence is being written.

Reference Resolution:

Used to identify the source word that a pronoun refers to.

Semantic Role Labeling:

Identification of the role each participant plays in an action. For example, it is used to recognize that in both “Acme Inc. is being acquired by John Smith Industries” and “John Smith Industries are acquiring Acme Inc.” the company being acquired is the same.
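To make the active/passive example concrete, here is a deliberately tiny sketch that handles only the verb "acquire" with two hand-written patterns. Real semantic role labeling relies on a full syntactic analysis; the patterns and the function `acquisition_roles` are assumptions for illustration only.

```python
import re

# Toy patterns for one verb only: passive and active progressive forms.
PASSIVE = re.compile(r"(?P<patient>.+) (?:is|are) being acquired by (?P<agent>.+)")
ACTIVE = re.compile(r"(?P<agent>.+) (?:is|are) acquiring (?P<patient>.+)")

def acquisition_roles(sentence):
    """Return who acquires (agent) and who is acquired (patient), or None."""
    for pattern in (PASSIVE, ACTIVE):
        match = pattern.fullmatch(sentence)
        if match:
            return {"agent": match.group("agent"), "patient": match.group("patient")}
    return None

passive = acquisition_roles("Acme Inc. is being acquired by John Smith Industries")
active = acquisition_roles("John Smith Industries are acquiring Acme Inc.")
print(passive == active)  # True: both sentences yield the same roles
```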

Lemmatization:

Identification of the lemma (the base or dictionary form) of a word. This is a manageable task in languages with regular morphology, but the complexity grows in languages with richer or more irregular morphology, like Spanish or Hungarian.

Disambiguation:

Choosing the right lemma for an inflected form. For example, deciding whether “plays” refers to the verb or to the noun.
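The sketch below illustrates the problem with a toy lexicon and a single context rule: a form preceded by a determiner is read as a noun, otherwise as a verb. The lexicon, the cue list and the function `lemmatize` are hypothetical, and real disambiguation needs much more context than one preceding word.

```python
# Toy lexicon: each ambiguous form maps a part of speech to its lemma.
LEXICON = {
    "plays": {"NOUN": "play", "VERB": "play"},
    "saw": {"NOUN": "saw", "VERB": "see"},
}
# Hypothetical cue words that suggest the next word is a noun.
NOUN_CUES = {"the", "a", "an", "his", "her", "their", "these", "those"}

def lemmatize(tokens, index):
    """Return (lemma, part of speech) for tokens[index], using the previous word as context."""
    form = tokens[index].lower()
    readings = LEXICON.get(form)
    if readings is None:
        return form, None  # unknown form: no analysis available
    previous = tokens[index - 1].lower() if index > 0 else ""
    pos = "NOUN" if previous in NOUN_CUES else "VERB"
    return readings[pos], pos

print(lemmatize(["She", "plays", "chess"], 1))        # ('play', 'VERB')
print(lemmatize(["The", "plays", "were", "long"], 1))  # ('play', 'NOUN')
print(lemmatize(["He", "saw", "a", "saw"], 3))         # ('saw', 'NOUN')
```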

MADRID, SPAIN

José Echegaray 8, building 3, office 4
Parque Empresarial Las Rozas
28232 Las Rozas

SAN FRANCISCO, USA

1700 Montgomery Street, Suite 101
CA 94111