Raw text is hard to work with when you need to find specific information in it. Entities are a good example: they are useful for many purposes (text anonymization, knowledge graph generation, etc.), yet they pose ambiguity problems that are hard to solve with high accuracy.
Bitext Entity Extraction locates and classifies over 16 types of entities, such as persons, places, and organizations, using a combination of NLP technologies:
Deep Linguistic Analysis based on grammars
Alphanumeric pattern detection using regular expressions
Monolingual and multilingual dictionaries
Use entity analysis to detect personal data securely and help ensure compliance with the European GDPR data privacy legislation.
Having access to large volumes of text data can be a real opportunity. Entity extraction tools let you take advantage of it, since data is of no use unless it is analyzed and understood.
Entities are more than just isolated strings. They have properties, they are connected to other entities, they perform actions, and so on. Pure entity detection is not enough, because it does not extract everything there is to learn, such as actions and properties. Bitext Entity Extraction is well suited to creating Advanced Knowledge Graphs.
The entity extraction service detects and extracts:
Proper names such as Lionel Messi, Tom Brady, Puerto Rico, and United Nations. These can be classified into different categories: people, places, organizations.
Numeric entities such as bank accounts, money amounts, and phone numbers.
Alphanumeric entities such as car plates, web addresses, dates, and identity cards.
E-mail addresses, URLs, social media users and hashtags.
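As an illustration of the pattern-based part of this list, here is a minimal Python sketch that finds e-mail addresses, URLs, social media users, and hashtags with regular expressions. The patterns and the find_pattern_entities name are simplified assumptions made for this example. They are not Bitext's actual rules or API.

```python
import re

# Simplified, hypothetical patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "url": re.compile(r"https?://[^\s,;]+"),
    "hashtag": re.compile(r"(?<!\w)#\w+"),
    # The lookbehind keeps the "@" inside an e-mail address from
    # being reported as a social media user.
    "mention": re.compile(r"(?<![\w.@])@\w+"),
}


def find_pattern_entities(text):
    """Return (type, matched_text) pairs ordered by position in the text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group()))
    return [(label, value) for _, label, value in sorted(hits)]


if __name__ == "__main__":
    sample = "Write to info@example.com or follow @example_user #nlp"
    for label, value in find_pattern_entities(sample):
        print(label, value)
```

Patterns like these cover only entities with a fixed surface shape. Proper names and ambiguous cases need the grammar and dictionary components described above.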
The service detects entities even when they are written in different forms (for example: 20:00, 20 hours, 20h, 8 pm).
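To make the time example concrete, the following Python sketch maps those four surface forms to a single 24-hour value. The regular expression and the normalize_time helper are assumptions made for illustration, and they cover only these simple forms. The service's real rules are not described here.

```python
import re

# Hypothetical pattern: an hour, optional minutes, and an optional suffix.
TIME_RE = re.compile(
    r"^\s*(\d{1,2})(?::(\d{2}))?\s*(am|pm|h|hours?)?\s*$",
    re.IGNORECASE,
)


def normalize_time(text):
    """Return the time as 'HH:MM' on a 24-hour clock, or None if not recognized."""
    match = TIME_RE.match(text)
    if not match:
        return None
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)
    suffix = (match.group(3) or "").lower()
    # A bare number such as "20" is too ambiguous to call a time.
    if match.group(2) is None and not suffix:
        return None
    if suffix in ("am", "pm"):
        if not 1 <= hour <= 12:
            return None
        # 12 am is midnight (00), 12 pm is noon (12).
        hour = hour % 12 + (12 if suffix == "pm" else 0)
    if hour > 23 or minute > 59:
        return None
    return f"{hour:02d}:{minute:02d}"


if __name__ == "__main__":
    for form in ["20:00", "20 hours", "20h", "8 pm"]:
        print(form, "->", normalize_time(form))
```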
In addition, it applies a normalization process to the entities, presenting them in a standard form so that all instances of the same entity are handled consistently (NYSE, New York Stock Exchange, and NY Stock Exchange are instances of the same entity). On demand, the service can also detect entities that are not capitalized, as in “I am in new york”.
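One simple way to picture this normalization is a lookup from known variants to a canonical form, matched without regard to case or spacing. The sketch below assumes a small hand-written alias table. ALIASES and normalize_entity are hypothetical names and do not reflect how the Bitext dictionaries are actually built.

```python
# Hypothetical alias table: canonical form -> known surface variants.
ALIASES = {
    "New York Stock Exchange": [
        "NYSE",
        "NY Stock Exchange",
        "New York Stock Exchange",
    ],
    "New York": ["New York"],
}


def _key(text):
    """Case-fold and collapse whitespace so variants compare equal."""
    return " ".join(text.casefold().split())


# Reverse index: normalized variant -> canonical form.
LOOKUP = {
    _key(variant): canonical
    for canonical, variants in ALIASES.items()
    for variant in variants
}


def normalize_entity(mention):
    """Return the canonical form of a mention, or the mention itself if unknown."""
    return LOOKUP.get(_key(mention), mention)


if __name__ == "__main__":
    for mention in ["NYSE", "NY Stock Exchange", "new york"]:
        print(mention, "->", normalize_entity(mention))
```

Because the keys are case-folded, the same table also resolves lower-case mentions such as "new york".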
Bitext’s linguistic engine assigns types to entities depending on syntactic rules: for example, in the sentence “I live at Barack Obama” the name of the president is interpreted as the name of an avenue, whereas in the sentence “As Barack Obama said” the proper noun is identified as the name of the US president. This feature is provided on demand.
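The idea of typing an entity by its context can be sketched with a toy rule set. The Python example below looks only at the words immediately before and after the mention. This is a deliberate simplification that stands in for a real syntactic analysis, and the cue lists and the type_in_context function are assumptions made for this example.

```python
import re

# Hypothetical cue words. A real system would rely on a syntactic parse.
PLACE_CUES = re.compile(
    r"\b(?:live|lives|lived|located)\s+(?:at|on|in)\s*$", re.IGNORECASE
)
PERSON_CUES = re.compile(
    r"^\s*(?:said|says|stated|wrote|announced)\b", re.IGNORECASE
)


def type_in_context(sentence, mention):
    """Guess an entity type for `mention` from the words around it.

    Returns "place", "person", or "unknown". Returns None if the mention
    does not occur in the sentence.
    """
    start = sentence.find(mention)
    if start < 0:
        return None
    before = sentence[:start]
    after = sentence[start + len(mention):]
    if PLACE_CUES.search(before):
        return "place"
    if PERSON_CUES.match(after):
        return "person"
    return "unknown"


if __name__ == "__main__":
    print(type_in_context("I live at Barack Obama", "Barack Obama"))
    print(type_in_context("As Barack Obama said", "Barack Obama"))
```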