Every day, millions of news articles are published about technology, business and geopolitics.
But there is a signal hidden inside them that most analytics systems completely miss.
It isn’t in what the articles say.
It’s in which entities appear together.
Once you start measuring that signal, you can see how global narratives form.
This signal is known as co-mentions, and it is widely used in knowledge graph construction and large-scale text analysis.
Counting mentions tells you which entities are important.
But co-mentions tell you something far more valuable: how those entities are connected.
That distinction is crucial.
For example: AI might appear in thousands of articles.
But if AI increasingly appears alongside Nvidia, something deeper is happening. It reveals a narrative forming:
AI infrastructure → Nvidia
Similarly, when AI increasingly appears together with the US or China, the story changes. AI is no longer just a technology topic. It has become a geopolitical one.
Co-mentions allow us to detect these narrative shifts early – before they become obvious.
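One simple way to surface such shifts is to compare a pair's co-mention weight between two periods and flag pairs whose weight grew fastest. The sketch below uses hypothetical yearly counts purely for illustration; the pair names and thresholds are assumptions, not results from the corpus:

```python
# Hypothetical yearly co-mention counts, for illustration only
counts_2023 = {("AI", "Nvidia"): 1200, ("AI", "China"): 300}
counts_2024 = {("AI", "Nvidia"): 5400, ("AI", "China"): 2100}

def rising_pairs(old, new, min_growth=2.0):
    """Flag entity pairs whose co-mention weight grew by at least min_growth x."""
    flagged = []
    for pair, weight in new.items():
        baseline = old.get(pair, 1)  # treat unseen pairs as baseline 1
        ratio = weight / baseline
        if ratio >= min_growth:
            flagged.append((pair, ratio))
    # Strongest relative growth first
    return sorted(flagged, key=lambda item: -item[1])

shifts = rising_pairs(counts_2023, counts_2024)
# ("AI", "China") grew 7x versus 4.5x for ("AI", "Nvidia"),
# so the geopolitical pairing surfaces first
```

Relative growth rather than absolute volume is what makes a quiet-but-accelerating narrative visible before it dominates the headlines.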
We tested this idea using the Leipzig English News corpora from the Wortschatz Project at Leipzig University. We analyzed datasets from 2023, 2024 and 2025.
To focus on economic and technology narratives, documents were filtered using the IPTC Media Topics taxonomy, keeping only economy- and technology-related categories. Across these datasets, the pipeline processed, filtered and extracted the following approximate volumes:
| Dataset Scope | Approximate Volume |
|---|---|
| Raw news articles processed | 2 million |
| Articles after topical filtering | 400K |
| Entity mentions extracted | Millions |
| Co-mention relationships generated | Tens of millions |
The pipeline combines entity extraction with graph analysis:
Relationships are generated by linking entities that appear in the same document, producing weighted co-mention edges.
For example, if a document mentions US, China, Nvidia and AI, the system generates a co-mention relationship for every pair: US–China, US–Nvidia, US–AI, China–Nvidia, China–AI and Nvidia–AI.
| Pipeline Step | What It Does |
|---|---|
| Entity recognition | Extracts companies, countries, technologies and other entities from text |
| Normalization | Maps variants such as “US” and “America” to a canonical entity |
| Relationship extraction | Links entities appearing in the same document |
| Aggregation | Builds weighted co-mention patterns across the corpus |
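The steps in the table above can be sketched as a minimal Python pipeline. The normalization map and sample documents are illustrative assumptions, not the production lexicon:

```python
from collections import Counter
from itertools import combinations

# Illustrative normalization map (an assumption, not the actual lexicon)
CANONICAL = {"US": "United States", "America": "United States"}

def normalize(entity):
    """Map surface variants to a canonical entity name."""
    return CANONICAL.get(entity, entity)

def co_mention_edges(doc_entities):
    """One undirected edge per unordered pair of entities in a document."""
    unique = sorted({normalize(e) for e in doc_entities})
    return list(combinations(unique, 2))

def aggregate(docs):
    """Count each pair across the corpus to build weighted co-mention edges."""
    weights = Counter()
    for entities in docs:
        weights.update(co_mention_edges(entities))
    return weights

docs = [["US", "China", "Nvidia", "AI"], ["America", "Nvidia", "AI"]]
weights = aggregate(docs)
# ("AI", "Nvidia") appears in both documents, so its weight is 2;
# "US" and "America" collapse into a single "United States" node
```

Sorting each pair before counting means (AI, Nvidia) and (Nvidia, AI) accumulate into the same undirected edge.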
When these relationships are aggregated across hundreds of thousands of articles, they form a knowledge graph that reveals patterns in global narratives.
Even a tiny fragment already tells a story:
AI → Nvidia → US → China
Technology → infrastructure → geopolitics.
| Input | Transformation | Output |
|---|---|---|
| Unstructured news text | Entity extraction + co-mention analysis | Knowledge graph of entities and relationships |
Most of the world’s knowledge still lives in unstructured text. But once entities and relationships are extracted at scale, that text can be transformed into structured knowledge graphs ready for analysis.
These graphs integrate naturally with platforms such as Neo4j, Stardog, Ontotext and MarkLogic, where the extracted entities and relationships can be explored and analyzed.
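As a sketch of how weighted edges could be handed to such a platform, the snippet below renders an edge as a Cypher MERGE statement of the kind Neo4j ingests; the `Entity` label, `CO_MENTIONED` relationship type and `weight` property are assumptions for illustration, not a fixed schema:

```python
def edge_to_cypher(source, target, weight):
    """Render one weighted co-mention edge as a Cypher MERGE statement.

    Co-mentions are symmetric; the arrow direction here is arbitrary
    and can be ignored at query time.
    """
    return (
        f'MERGE (a:Entity {{name: "{source}"}}) '
        f'MERGE (b:Entity {{name: "{target}"}}) '
        f'MERGE (a)-[r:CO_MENTIONED]->(b) '
        f"SET r.weight = {weight}"
    )

stmt = edge_to_cypher("AI", "Nvidia", 5400)
```

Using MERGE rather than CREATE keeps the load idempotent: re-running the export updates edge weights instead of duplicating nodes and relationships.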
In short: text → entities → relationships → knowledge graph
And once the graph exists, hidden signals start to appear.
| Stage | Result |
|---|---|
| Text | Raw unstructured articles |
| Entities | Normalized companies, countries, technologies and other concepts |
| Relationships | Weighted co-mentions between entities |
| Knowledge graph | Structured narrative map ready for analysis |
Co-mentions are one of the simplest signals you can extract from text.
But at scale, they reveal how the world connects ideas, companies and countries.
What other signals do you think could be extracted from large-scale news analysis?