Hindi (HI) Language Data
Inflectional Morphology Data
The Lexical Resource for Hindi contains all the standard inflectional forms for nouns, verbs, adjectives, postpositions, conjunctions, etc.
Derivational Morphology Data
Contains all the standard derivational forms including verbs derived from nouns, nouns derived from verbs and adjectives derived from nouns, and common compound words.
Extended Morphology Data
Extends the inflectional and derivational form lists to cover additional morphological phenomena, such as common combinations of postposition suffixes.
Contains the relative frequency of each word in the above lists, as observed in a Hindi corpus.
Each word has been assigned a frequency group, where the frequency group corresponds to a normalized logarithmic scale from 0 to 255. The most frequent word in the corpus has been assigned frequency group 255, and words not appearing in the corpus have been assigned frequency group 0.
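As an illustration of how such a scale behaves, the sketch below maps a raw corpus count to a frequency group. The exact normalization used for this resource is not stated here, so the formula (a `log1p` ratio scaled to 255) and the names `frequency_group`, `count` and `max_count` are assumptions made for the example only. It reproduces the two anchor points described above: unseen words map to 0 and the most frequent word maps to 255.

```python
import math


def frequency_group(count: int, max_count: int) -> int:
    """Map a raw corpus count to a group in 0..255 on a logarithmic scale.

    Assumed formula (not taken from the resource's documentation):
        group = round(255 * log(1 + count) / log(1 + max_count))
    """
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    if count > max_count:
        raise ValueError("count cannot exceed max_count")
    if count <= 0:
        # Words that do not appear in the corpus get group 0.
        return 0
    scaled = round(255 * math.log1p(count) / math.log1p(max_count))
    # Keep attested words in 1..255 so that 0 stays reserved for unseen words.
    return min(255, max(1, scaled))


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for c in (0, 1, 100, 10_000, 1_000_000):
        print(c, frequency_group(c, max_count=1_000_000))
```

Because the scale is logarithmic, a difference of a few groups near the top of the range corresponds to a much larger difference in raw counts than the same difference near the bottom.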
Complementary Semantic Annotations
Named Entities Morphology Data
Contains data on named entities, including person names, places, companies and organizations.
Offensive Language Flag
Flags, for each word, whether it might be considered offensive in certain contexts.
Volume of Language Data
Total number of forms
- Verbs: 140,000 forms (27%)
- Nouns: 350,000 forms (70%)
- Adjectives: 12,000 forms (2%)
- Other: 1,000 forms (1%)
Total number of lemmas
Each form will be annotated with the lemma (root form), POS, and morphological attributes (tense, mood, gender, number, person, case, possessive-gender, possessive-number, possessive-case).
- Lemma: The canonical form of the inflected word.
- POS: Part of speech, such as noun, verb, adjective, etc.
- Tense: Specifies when the action takes place, such as past, present, future, etc.
- Mood: Modality of the verb form: indicative, subjunctive, imperative, etc.
- Person: Whether the verb or pronoun refers to the first, second or third person.
- Number: State of being singular, dual or plural.
- Gender: Gender of the noun, verb or adjective form: masculine, feminine, neuter, etc.
- Case: The function that the noun or adjective plays within a sentence.
- Clitics: Clitic pronouns are identified and tagged.
- Frequency: Relative frequency of the form based on a large general-purpose corpus.
- Named entity: Pre-defined entities are tagged as person names, places, organizations, etc.
- Offensive: Indicates whether the form might be considered offensive in certain contexts.
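The attributes described above can be pictured as one record per inflected form. The following is a minimal sketch of such a record. The delivery format of the resource is not described here, so the class name `AnnotatedForm`, the field names and the tag values are hypothetical, and the frequency group in the sample entry is a placeholder rather than a value taken from the data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnnotatedForm:
    """One inflected form with its annotations (hypothetical layout)."""

    form: str                       # surface form as it appears in text
    lemma: str                      # canonical (root) form
    pos: str                        # part of speech
    tense: Optional[str] = None
    mood: Optional[str] = None
    person: Optional[str] = None
    number: Optional[str] = None
    gender: Optional[str] = None
    case: Optional[str] = None
    frequency_group: int = 0        # 0..255, where 0 means unseen in the corpus
    named_entity: Optional[str] = None
    offensive: bool = False

    def __post_init__(self) -> None:
        if not 0 <= self.frequency_group <= 255:
            raise ValueError("frequency_group must be between 0 and 255")


# Sample entry: the oblique plural of the noun for "boy".
# The frequency group is a made-up placeholder.
entry = AnnotatedForm(
    form="लड़कों",
    lemma="लड़का",
    pos="noun",
    number="plural",
    gender="masculine",
    case="oblique",
    frequency_group=180,
)

entries = [entry]
# Typical use: drop forms flagged as potentially offensive.
safe_entries = [e for e in entries if not e.offensive]
```

Attributes that do not apply to a given part of speech (for example, tense on a noun) are left as `None` in this sketch.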