The Hidden Signal in Millions of News Articles That Reveals How Global Narratives Form

Every day, millions of news articles are published about technology, business and geopolitics.

But there is a signal hidden inside them that most analytics systems completely miss.

It isn’t in what the articles say.

It’s in which entities appear together.

Once you start measuring that signal, you can see how global narratives form.

That signal is called co-mentions, and measuring it is a standard technique in knowledge graph construction and large-scale text analysis.


Why Co-mentions Matter

Counting mentions tells you which entities are important.

But co-mentions tell you something far more valuable: how those entities are connected.

That distinction is crucial.

For example: AI might appear in thousands of articles.

But if AI increasingly appears alongside Nvidia, something deeper is happening. It reveals a narrative forming:

AI infrastructure → Nvidia

Similarly, when AI increasingly appears together with the US or China, the story changes. AI is no longer just a technology topic. It has become a geopolitical one.

Co-mentions allow us to detect these narrative shifts early – before they become obvious.


The Experiment

We tested this idea using the Leipzig English News corpora from the Wortschatz Project at Leipzig University. We analyzed datasets from 2023, 2024 and 2025.

Across these datasets, the pipeline processed roughly:

  • 2 million raw news articles
  • 400K articles after topical filtering

From these documents the pipeline extracted:

  • millions of entity mentions
  • tens of millions of co-mention relationships

To focus on economic and technology narratives, documents were filtered using the IPTC Media Topics taxonomy, keeping only:

  • Economy, Business and Finance
  • Science and Technology

Dataset Scope                          Approximate Volume
Raw news articles processed            2 million
Articles after topical filtering       400K
Entity mentions extracted              Millions
Co-mention relationships generated     Tens of millions

How the Analysis Works

The pipeline combines entity extraction with graph analysis:

  1. Entity recognition using the Bitext NLP SDK (companies, countries, technologies)
  2. Entity normalization (e.g. “US”, “United States”, “America” → United States)
  3. Extraction of relationships between entities appearing in the same document
  4. Aggregation of co-mentions across the corpus
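Step 2 can be illustrated with a small alias table. This is a minimal, hypothetical sketch; the actual pipeline relies on the Bitext NLP SDK rather than a static dictionary:

```python
# Minimal sketch of entity normalization using a hand-built alias table.
# The alias entries are illustrative, not the pipeline's real mapping.
ALIASES = {
    "US": "United States",
    "U.S.": "United States",
    "America": "United States",
    "PRC": "China",
}

def normalize(entity: str) -> str:
    """Map a surface form to its canonical name; pass unknown entities through."""
    return ALIASES.get(entity, entity)

print(normalize("US"))      # -> United States
print(normalize("Nvidia"))  # -> Nvidia
```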

Relationships are generated by linking entities that appear in the same document, producing weighted co-mention edges.

For example, if a document mentions US, China, Nvidia and AI, the system generates relationships such as:

  • US – China
  • US – AI
  • China – AI
  • Nvidia – AI

Pipeline Step             What It Does
Entity recognition        Extracts companies, countries, technologies and other entities from text
Normalization             Maps variants such as “US” and “America” to a canonical entity
Relationship extraction   Links entities appearing in the same document
Aggregation               Builds weighted co-mention patterns across the corpus
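The pairing step above can be sketched in a few lines of Python. This is an illustrative sketch, not the production code:

```python
from itertools import combinations

def comention_pairs(entities):
    """Return undirected co-mention pairs for one document.
    Deduplicating and sorting first ensures each pair appears exactly
    once, in a canonical order."""
    return list(combinations(sorted(set(entities)), 2))

# The four entities from the example document above.
doc_entities = ["US", "China", "Nvidia", "AI"]
for a, b in comention_pairs(doc_entities):
    print(f"{a} - {b}")
# Prints six pairs: AI-China, AI-Nvidia, AI-US, China-Nvidia, China-US, Nvidia-US
```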

From Text to Knowledge Graph

When these relationships are aggregated across hundreds of thousands of articles, they form a knowledge graph that reveals patterns in global narratives.

Even a tiny fragment already tells a story:

AI → Nvidia → US → China

Technology → infrastructure → geopolitics.

Input                    Transformation                              Output
Unstructured news text   Entity extraction + co-mention analysis     Knowledge graph of entities and relationships
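The aggregation that turns per-document pairs into weighted edges can be sketched as follows, using a toy corpus and assuming entity lists have already been extracted and normalized:

```python
from collections import Counter
from itertools import combinations

def aggregate_comentions(docs):
    """Count each co-mention pair across the corpus.
    A pair's count becomes the weight of its edge in the graph."""
    weights = Counter()
    for entities in docs:
        weights.update(combinations(sorted(set(entities)), 2))
    return weights

# Toy corpus: each inner list is the entities found in one document.
corpus = [
    ["AI", "Nvidia", "US"],
    ["AI", "Nvidia"],
    ["AI", "China", "US"],
]
weights = aggregate_comentions(corpus)
print(weights[("AI", "Nvidia")])  # -> 2
print(weights[("China", "US")])   # -> 1
```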

Why This Matters

Most of the world’s knowledge still lives in unstructured text. But once entities and relationships are extracted at scale, that text can be transformed into structured knowledge graphs ready for analysis.

These graphs integrate naturally with platforms such as Neo4j, Stardog, Ontotext and MarkLogic, where the extracted entities and relationships can be explored and analyzed.
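As one illustration, weighted edges can be rendered as Cypher MERGE statements for loading into Neo4j. This is a hypothetical sketch; the node label, relationship type and property names are assumptions, not the pipeline's actual export format:

```python
def edge_to_cypher(a, b, weight):
    """Render one weighted co-mention edge as a Cypher MERGE statement.
    The :Entity label, CO_MENTIONED type and weight property are illustrative."""
    return (
        f"MERGE (x:Entity {{name: '{a}'}}) "
        f"MERGE (y:Entity {{name: '{b}'}}) "
        f"MERGE (x)-[r:CO_MENTIONED]->(y) "
        f"SET r.weight = {weight}"
    )

print(edge_to_cypher("AI", "Nvidia", 2))
```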

In short: text → entities → relationships → knowledge graph

And once the graph exists, hidden signals start to appear.

Stage             Result
Text              Raw unstructured articles
Entities          Normalized companies, countries, technologies and other concepts
Relationships     Weighted co-mentions between entities
Knowledge graph   Structured narrative map ready for analysis

In Summary

Co-mentions are one of the simplest signals you can extract from text.

But at scale, they reveal how the world connects ideas, companies and countries.

What other signals do you think could be extracted from large-scale news analysis?
