Most teams working with Elasticsearch, OpenSearch or RAG pipelines focus on ranking, embeddings or model quality when trying to improve relevance. But in many cases, the issue starts much earlier: in how text is normalized before indexing. In a previous post, we...
Some RAG issues have a simpler fix than people think: better text normalization. One common culprit is stemming. Stemming is a blunt, error-prone approach: it strips word endings mechanically, without properly accounting for morphology, part of speech, or context....
Almost all of us use a search engine in our daily work. It has become a key tool to get things done. However, as the amount of data grows exponentially, providing high-quality results that truly match user queries becomes more complex. One of the issues that...
Arabic is a complex language for NLP tasks, even simple ones like lemmatization. There are several reasons for this. Arabic builds words from roots: for example, the word كتاب (kitab, “book”) is derived from ك ت ب (k t b). Many related words are derived from...
Everything looks promising in the world of bots: big players are pushing platforms to build them (Google, Amazon, Facebook, Microsoft, IBM, Apple), large companies are adopting them (Starbucks, Domino’s, British Airways), the press is excited about movies becoming...
People who use financial databases know how hard it is to keep information structured and legible. Don’t worry! Knowledge graphs are here to help. Data volumes keep growing unchecked, and those datasets are hard to process and draw...