Function Repository Resource:

KeywordsGraph

A weighted graph visualizing the flow and clustering of ideas in the text

Contributed by: Vitaliy Kaurov

ResourceFunction["KeywordsGraph"][text,number]

finds the given number of most frequently used words in text (keywords) and builds a graph with these keywords as vertices, where two vertices are connected by an edge if one keyword directly follows the other in the text.

ResourceFunction["KeywordsGraph"][text, number, blist]

builds a graph with blacklisted strings blist removed from the text.

ResourceFunction["KeywordsGraph"][text, number, blist, rlist]

builds a graph with string-replacement rules rlist applied to the text.

Details and Options

ResourceFunction["KeywordsGraph"] returns a Graph expression.
ResourceFunction["KeywordsGraph"] takes the same options as Graph, with the following additions and changes:
DirectedEdges    False        whether to use directed edges
"LowerCase"      True         whether to ignore case
"StopWords"      True         whether to remove stop words
VertexLabels     Automatic    labels and placements for vertices
With "StopWords" -> True, DeleteStopwords is automatically applied, and hence no stop words can appear as keywords. Use "StopWords" -> False to keep the stop words.
With "LowerCase" -> True, ToLowerCase is automatically applied to remove unwanted capitalization (for example, at the beginning of sentences) that might lead to incorrect graphs. Use "LowerCase" -> False to keep the capital letters in the text, for example to distinguish some abbreviations.
VertexWeight is set for every vertex to the number of times the corresponding keyword is encountered in the text.
EdgeWeight is set for every edge to the number of times the edge connection is made. Among other applications, this also helps to build a more meaningful CommunityGraphPlot, as some of its methods take EdgeWeight into account.
ResourceFunction["KeywordsGraph"] returns an undirected Graph by default. Use the option setting DirectedEdges -> True to get a directed graph that shows the sequential order in the text of connected keywords.
The default option setting VertexLabels -> Automatic shows the keywords on the graph as vertex labels. Use the option setting VertexLabels -> None to remove them.
Large texts require more time to compute.
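
The construction described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository implementation: the helper name keywordsGraphSketch is hypothetical, stop words are assumed to be deleted before adjacency is determined, and a keyword that directly follows itself is assumed not to produce an edge.

```wolfram
(* Hypothetical sketch; the actual ResourceFunction may tokenize and count differently *)
keywordsGraphSketch[text_String, n_Integer?Positive] := Module[
  {words, vertexCounts, keywords, pairs, edgeCounts},
  (* lowercase the text, split it into words, and drop stop words *)
  words = DeleteStopwords[TextWords[ToLowerCase[text]]];
  (* the n most frequent words together with their counts *)
  vertexCounts = Take[Sort[Counts[words], Greater], UpTo[n]];
  keywords = Keys[vertexCounts];
  (* consecutive word pairs in which both words are distinct keywords *)
  pairs = Select[Partition[words, 2, 1],
    SubsetQ[keywords, #] && First[#] =!= Last[#] &];
  (* undirected case: {a, b} and {b, a} count as the same edge *)
  edgeCounts = Counts[Sort /@ pairs];
  Graph[keywords, UndirectedEdge @@@ Keys[edgeCounts],
    VertexWeight -> Values[vertexCounts],
    EdgeWeight -> Values[edgeCounts],
    VertexLabels -> Automatic]
]

(* usage on a tiny made-up text *)
keywordsGraphSketch["apple banana apple cherry banana apple", 2]
```

A directed variant would skip the Sort /@ step and use DirectedEdge instead, so that the edge direction records which keyword comes first.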

Examples

Basic Examples (3) 

Consider an English tongue twister:

In[1]:=
text = "Betty Botter bought some butter
But she said the butter\[CloseCurlyQuote]s bitter
If I put it in my batter, it will make my batter bitter
But a bit of better butter will make my batter better
So \[OpenCurlyQuote]twas better Betty Botter bought a bit of better butter";

Find the nine most frequently used words (not counting stop words) and see which words are directly next to each other in the text:

In[2]:=
ResourceFunction["KeywordsGraph"][text, 9]
Out[2]=

You can also find the order in which words follow each other:

In[3]:=
ResourceFunction["KeywordsGraph"][text, 9, DirectedEdges -> True]
Out[3]=

Scope (2) 

Get the text of the book Alice In Wonderland and build a keywords graph for the top eleven keywords:

In[4]:=
text = ExampleData[{"Text", "AliceInWonderland"}];
ResourceFunction["KeywordsGraph"][text, 11]
Out[5]=

Exclude the unwanted words by forming a blacklist. You can also apply any option of Graph. For instance, you can restyle your graph and resize vertices in accordance with their properties:

In[6]:=
blist = {"came", "said", "like", "just", "went"};
g = ResourceFunction["KeywordsGraph"][text, 11, blist, VertexSize -> "VertexWeight", GraphStyle -> "Prototype"]
Out[7]=

Because KeywordsGraph yields a Graph expression, you can apply to it any function that accepts a Graph. For instance, you can reveal clustering by displaying the community structure (because the edges are weighted, the weights might influence how the clustering is computed):

In[8]:=
CommunityGraphPlot[g]
Out[8]=

VertexWeight and EdgeWeight are set to the number of times the keywords and their adjacent pairs occur in the text:

In[9]:=
PropertyValue[g, {VertexWeight, EdgeWeight}]
Out[9]=

The order of the VertexWeight values corresponds to the order of VertexList:

In[10]:=
ListPlot[
 AssociationThread[VertexList[g] -> PropertyValue[g, VertexWeight]],
 PlotTheme -> "Detailed", ScalingFunctions -> "Log"]
Out[10]=
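
Edge weights can be read off in the same way, since the EdgeWeight values follow the order of EdgeList. The sketch below uses a small hand-built graph (g0, with made-up vertex names and weights) as a stand-in for a KeywordsGraph result, so that it runs without the text data:

```wolfram
(* Hand-built weighted graph standing in for a KeywordsGraph result; names and weights are illustrative *)
g0 = Graph[{"alice", "queen", "king"},
   {UndirectedEdge["alice", "queen"], UndirectedEdge["queen", "king"]},
   EdgeWeight -> {3, 1}];

(* pair each edge with its weight and list the strongest connections first *)
Sort[AssociationThread[EdgeList[g0] -> PropertyValue[g0, EdgeWeight]], Greater]
```

Replacing g0 with a graph returned by KeywordsGraph should list the keyword pairs that occur next to each other most often.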

Occasionally one needs to replace some words with others. Use a list of replacement rules to achieve that. For example, consider the inaugural address by President Joe Biden:

In[11]:=
inaugural = SortBy[ResourceData["Presidential Inaugural Addresses"], "Date"];
biden = inaugural[-1]["Text"];

There are many words in the inaugural address that could be considered redundant in meaning, such as "america", "american", and "americans":

In[12]:=
g1 = ResourceFunction["KeywordsGraph"][biden, 9, GraphLayout -> "RadialEmbedding"]
Out[12]=

Consolidate these redundant words into a single term of your choice, for instance "america":

In[13]:=
blist = {"story", "know", "today"};
rlist = {"american" -> "america", "americans" -> "america"};
g2 = ResourceFunction["KeywordsGraph"][biden, 9, blist, rlist, GraphLayout -> "RadialEmbedding"]
Out[15]=

Note that the separate counts for all three words (18, 9, and 9, respectively) were summed to 36 to represent the consolidated word:

In[16]:=
KeySort[AssociationThread[
    VertexList[#] -> PropertyValue[#, VertexWeight]]] & /@ {g1, g2}
Out[16]=

Options (6) 

LowerCase (3) 

Consider a text where capitalization matters. For instance, here "us" and "US" are different terms:

In[17]:=
text = "A few of us have recently immigrated to the US and found spouses here. 
To us the US seems a great place to raise a family.";

By default ToLowerCase is applied and "us" is not distinguished from "US":

In[18]:=
blist = {"of", "a", "the", "to", "and"};
ResourceFunction["KeywordsGraph"][text, 6, blist, "StopWords" -> False, DirectedEdges -> True]
Out[19]=

Use the option setting "LowerCase" -> False to distinguish capitalized cases:

In[20]:=
ResourceFunction["KeywordsGraph"][text, 6, blist, "LowerCase" -> False, "StopWords" -> False, DirectedEdges -> True]
Out[20]=

StopWords (3) 

Sometimes you might need to keep some stop words. For example, consider "us" and "US" here:

In[21]:=
text = "A few of us have recently immigrated to the US and found spouses here. 
To us the US seems a great place to raise a family.";

By default "us" and "US" will be removed by DeleteStopwords:

In[22]:=
ResourceFunction["KeywordsGraph"][text, 6]
Out[22]=

Use the option setting "StopWords" -> False to retain stop words and make your own blacklist of words to remove:

In[23]:=
blist = {"of", "a", "the", "to", "and"};
ResourceFunction["KeywordsGraph"][text, 6, blist, "StopWords" -> False]
Out[24]=

Applications (3) 

Get the dataset for presidential inaugural addresses from the Wolfram Data Repository and order it by time:

In[25]:=
inaugural = SortBy[ResourceData["Presidential Inaugural Addresses"], "Date"];

Extract the text of the inaugural addresses of the last two presidents as of 2019, Barack Obama and Donald Trump:

In[26]:=
obama = inaugural[-3]["Text"];
trump = inaugural[-2]["Text"];

Build a KeywordsGraph for Barack Obama and for Donald Trump using 30 keywords. You can get a sense of the key ideas without actually reading the texts:

In[27]:=
CommunityGraphPlot[
 ResourceFunction["KeywordsGraph"][obama, 30, {}, {}, {VertexLabelStyle -> Directive[
GrayLevel[0.8], 14], EdgeStyle -> Opacity[0.5], VertexSize -> "VertexWeight", VertexStyle -> Opacity[0.7], GraphStyle -> "Prototype", Background -> GrayLevel[0]}],
 PlotLabel -> Style["Barack Obama", 30, Lighter@Blue, FontFamily -> "Phosphate"],
 CommunityBoundaryStyle -> Directive[Yellow, Dashed, Opacity[.5]]]
Out[27]=
In[28]:=
CommunityGraphPlot[
 ResourceFunction["KeywordsGraph"][trump, 30, {}, {}, {VertexLabelStyle -> Directive[
GrayLevel[0.8], 14], EdgeStyle -> Opacity[0.5], VertexSize -> "VertexWeight", VertexStyle -> Opacity[0.7], GraphStyle -> "Prototype", Background -> GrayLevel[0]}],
 PlotLabel -> Style["Donald Trump", 30, Lighter@Red, FontFamily -> "Phosphate"],
 CommunityBoundaryStyle -> Directive[Yellow, Dashed, Opacity[.5]]]
Out[28]=

Possible Issues (1) 

The second argument (number of keywords in graph) should not exceed the total number of keywords in the text:

In[29]:=
text = "How much wood would a woodchuck chuck if a woodchuck could chuck wood?
He would chuck, he would, as much as he could, and chuck as much wood,
as a woodchuck would if a woodchuck could chuck wood.";
In[30]:=
ResourceFunction["KeywordsGraph"][text, 6, VertexLabels -> Automatic]
Out[30]=
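
One way to guard against this is to estimate the number of available keywords first and cap the requested number accordingly. The sketch below is an approximation only: the helper availableKeywords is hypothetical, and it assumes that the function tokenizes the text much like TextWords followed by DeleteStopwords, which may not match the actual implementation exactly.

```wolfram
(* Hypothetical guard; assumes tokenization similar to TextWords plus DeleteStopwords *)
availableKeywords[text_String] :=
  Length[DeleteDuplicates[DeleteStopwords[TextWords[ToLowerCase[text]]]]]

sample = "How much wood would a woodchuck chuck if a woodchuck could chuck wood?";

(* never request more keywords than the text appears to contain *)
n = Min[6, availableKeywords[sample]];

(* then call, for example: ResourceFunction["KeywordsGraph"][sample, n] *)
```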

Neat Examples (2) 

Get the dataset for presidential inaugural addresses from the Wolfram Data Repository and order it by time:

In[31]:=
inaugural = SortBy[ResourceData["Presidential Inaugural Addresses"], "Date"];
In[32]:=
allTEXT = Normal[inaugural[All, "Text"]];
allNAME = Normal[inaugural[All, DateString[#Date, "Year"] <> " " <> CommonName[#Name] &]];

Build the KeywordsGraph for each address using 30 keywords and arrange them in a grid:

In[33]:=
allGRAPH = MapThread[
   CommunityGraphPlot[
     ResourceFunction["KeywordsGraph"][#1, 30, {}, {}, {VertexLabels -> None, EdgeStyle -> Opacity[0.5], VertexSize -> "VertexWeight", VertexStyle -> Opacity[0.7], GraphStyle -> "Prototype", Background -> GrayLevel[0]}],
     PlotLabel -> Style[#2, White, 13],
     CommunityBoundaryStyle -> Directive[Yellow, Dashed, Opacity[0.5]]] &,
   {allTEXT, allNAME}];
In[34]:=
Grid[Partition[allGRAPH, 4], Spacings -> {0, 0}, Background -> Black]
Out[34]=

Version History

  • 2.0.0 – 06 September 2022
  • 1.0.0 – 03 October 2019
