The Hidden Signal in Millions of News Articles That Reveals How Global Narratives Form
Every day, millions of news articles are published about technology, business and geopolitics.
But there is a signal hidden inside them that most analytics systems completely miss.
It isn’t in what the articles say.
It’s in which entities appear together.
Once you start measuring that signal, you can see how global narratives form.
This signal is known as co-mentions, and it is widely used in knowledge graph construction and large-scale text analysis.
Why Co-mentions Matter
Counting mentions tells you which entities are important.
But co-mentions tell you something far more valuable: how those entities are connected.
That distinction is crucial.
For example: AI might appear in thousands of articles.
But if AI increasingly appears alongside Nvidia, something deeper is happening. It reveals a narrative forming:
AI infrastructure → Nvidia
Similarly, when AI increasingly appears together with the US or China, the story changes. AI is no longer just a technology topic. It has become a geopolitical one.
Co-mentions allow us to detect these narrative shifts early – before they become obvious.
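The distinction between mentions and co-mentions is easy to make concrete in a few lines of Python. This is a toy sketch with made-up documents, not the actual pipeline: each document is reduced to the set of entities it mentions.

```python
from collections import Counter
from itertools import combinations

# Toy corpus: each document is the set of entities it mentions.
# These entity sets are illustrative, not drawn from the real dataset.
documents = [
    {"AI", "Nvidia", "US"},
    {"AI", "Nvidia"},
    {"AI", "China", "US"},
    {"AI", "Nvidia", "China"},
]

# Mention counts: how often each entity appears at all.
mentions = Counter(e for doc in documents for e in doc)

# Co-mention counts: how often two entities appear in the same document.
# Sorting the pair makes the edge direction-independent.
co_mentions = Counter(
    pair for doc in documents for pair in combinations(sorted(doc), 2)
)

print(mentions["AI"])                 # AI is mentioned in every document...
print(co_mentions[("AI", "Nvidia")])  # ...and increasingly alongside Nvidia
```

The mention counter says only that AI is important; the co-mention counter says what AI is connected to, which is where the narrative signal lives.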
The Experiment
We tested this idea using the Leipzig English News corpora from the Wortschatz Project at Leipzig University. We analyzed datasets from 2023, 2024 and 2025.
Across these datasets, the pipeline processed roughly:
- 2 million raw news articles
- 400K articles after topical filtering
From these documents the pipeline extracted:
- millions of entity mentions
- tens of millions of co-mention relationships
To focus on economic and technology narratives, documents were filtered using the IPTC Media Topics taxonomy, keeping only:
- Economy, Business and Finance
- Science and Technology
| Dataset Scope | Approximate Volume |
|---|---|
| Raw news articles processed | 2 million |
| Articles after topical filtering | 400K |
| Entity mentions extracted | Millions |
| Co-mention relationships generated | Tens of millions |
How the Analysis Works
The pipeline combines entity extraction with graph analysis:
- Entity recognition using the Bitext NLP SDK (companies, countries, technologies)
- Entity normalization (e.g. “US”, “United States”, “America” → United States)
- Extraction of relationships between entities appearing in the same document
- Aggregation of co-mentions across the corpus
Relationships are generated by linking entities that appear in the same document, producing weighted co-mention edges.
For example, if a document mentions US, China, Nvidia and AI, the system generates relationships such as:
- US – China
- US – AI
- China – AI
- Nvidia – AI
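The relationship-generation step above amounts to taking every unordered pair of entities in a document. A minimal sketch, assuming each document has already been reduced to a set of normalized entities:

```python
from itertools import combinations

# Entities found in one document (after normalization).
entities = {"US", "China", "Nvidia", "AI"}

# Link every pair of co-occurring entities; sorting makes each edge
# direction-independent ("US" – "China" is the same edge as "China" – "US").
edges = list(combinations(sorted(entities), 2))

for a, b in edges:
    print(f"{a} – {b}")
```

Note that pairwise linking produces all six pairs for these four entities (including US – Nvidia and China – Nvidia), of which the four above are examples.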
| Pipeline Step | What It Does |
|---|---|
| Entity recognition | Extracts companies, countries, technologies and other entities from text |
| Normalization | Maps variants such as “US” and “America” to a canonical entity |
| Relationship extraction | Links entities appearing in the same document |
| Aggregation | Builds weighted co-mention patterns across the corpus |
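Normalization and aggregation can be sketched together in a few lines. The mapping table below is illustrative only; the real pipeline relies on the Bitext NLP SDK for entity recognition and normalization.

```python
from collections import Counter
from itertools import combinations

# Toy normalization table (illustrative; the actual pipeline uses the
# Bitext NLP SDK to map surface variants to canonical entities).
CANONICAL = {
    "US": "United States",
    "America": "United States",
    "United States": "United States",
}

def normalize(entity: str) -> str:
    """Map a surface form to its canonical entity, if known."""
    return CANONICAL.get(entity, entity)

def co_mention_edges(documents):
    """Aggregate weighted co-mention edges across a corpus."""
    weights = Counter()
    for doc_entities in documents:
        canonical = {normalize(e) for e in doc_entities}
        weights.update(combinations(sorted(canonical), 2))
    return weights

docs = [
    {"US", "China", "AI"},
    {"America", "Nvidia", "AI"},
    {"United States", "AI"},
]
edges = co_mention_edges(docs)
print(edges[("AI", "United States")])  # 3: all three variants collapse to one entity
```

Without normalization, "US", "America" and "United States" would produce three separate, weaker edges; with it, their co-mention weights accumulate on a single canonical node.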
From Text to Knowledge Graph
When these relationships are aggregated across hundreds of thousands of articles, they form a knowledge graph that reveals patterns in global narratives.
Even a tiny fragment already tells a story:
AI → Nvidia → US → China
Technology → infrastructure → geopolitics.
| Input | Transformation | Output |
|---|---|---|
| Unstructured news text | Entity extraction + co-mention analysis | Knowledge graph of entities and relationships |
Why This Matters
Most of the world’s knowledge still lives in unstructured text. But once entities and relationships are extracted at scale, that text can be transformed into structured knowledge graphs ready for analysis.
These graphs integrate naturally with platforms such as Neo4j, Stardog, Ontotext and MarkLogic, where the extracted entities and relationships can be explored and analyzed.
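One simple way to hand such a graph to these platforms is a plain edge-list file. This is a sketch only; column names and loading details depend on the target platform (Neo4j, for instance, can bulk-load CSV edge lists with its LOAD CSV clause, mapping each row to a relationship).

```python
import csv

# Weighted co-mention edges as produced upstream (illustrative values).
edges = {
    ("AI", "Nvidia"): 4,
    ("AI", "United States"): 3,
    ("China", "United States"): 2,
}

# Write a simple edge list that graph databases can ingest.
with open("co_mentions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["source", "target", "weight"])
    for (a, b), w in sorted(edges.items()):
        writer.writerow([a, b, w])
```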
In short: text → entities → relationships → knowledge graph
And once the graph exists, hidden signals start to appear.
| Stage | Result |
|---|---|
| Text | Raw unstructured articles |
| Entities | Normalized companies, countries, technologies and other concepts |
| Relationships | Weighted co-mentions between entities |
| Knowledge graph | Structured narrative map ready for analysis |
In Summary
Co-mentions are one of the simplest signals you can extract from text.
But at scale, they reveal how the world connects ideas, companies and countries.
What other signals do you think could be extracted from large-scale news analysis?