Function Repository Resource:

KeywordsGraph

A weighted graph visualizing the flow and clustering of ideas in a text

Contributed by: Vitaliy Kaurov

ResourceFunction["KeywordsGraph"][text,number]

finds the number most frequently used words in text (the keywords) and builds a graph with these keywords as vertices, in which two vertices are connected by an edge if one of the keywords directly follows the other in the text.

ResourceFunction["KeywordsGraph"][text, number, blist]

builds the graph after removing the blacklisted strings blist from the text.

ResourceFunction["KeywordsGraph"][text, number, blist, rlist]

builds the graph after applying the string-replacement rules rlist to the text.
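The effect of the blist and rlist arguments can be illustrated with a minimal Python sketch. This is not the implementation of ResourceFunction["KeywordsGraph"]; the helper names remove_blacklisted and apply_replacements are hypothetical, and the sketch assumes plain substring deletion and replacement (whether the function itself matches whole words or substrings is not stated here).

```python
def remove_blacklisted(text, blist):
    """Delete every occurrence of each blacklisted string (assumed: substring match)."""
    for s in blist:
        text = text.replace(s, "")
    return text

def apply_replacements(text, rlist):
    """Apply (old, new) replacement pairs to the text, one after another."""
    for old, new in rlist:
        text = text.replace(old, new)
    return text

sample = "The U.S. economy and the US economy [applause]"
cleaned = remove_blacklisted(sample, ["[applause]"])
merged = apply_replacements(cleaned, [("U.S.", "US")])
print(merged.split())
# ['The', 'US', 'economy', 'and', 'the', 'US', 'economy']
```

After the replacement, both spellings count as the same word, so they would contribute to a single keyword instead of two.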

Details and Options

ResourceFunction["KeywordsGraph"] returns a Graph expression.

ResourceFunction["KeywordsGraph"] takes the same options as Graph, with the following additions and changes:

DirectedEdges    False        whether to use directed edges
"LowerCase"      True         whether to ignore case
"StopWords"      True         whether to delete stop words
VertexLabels     Automatic    labels and placements for vertices

With "StopWords"→True, DeleteStopwords is automatically applied, and hence no stop words can appear as keywords. Use "StopWords"→False to keep the stop words.

With "LowerCase"→True, ToLowerCase is automatically applied to remove unwanted capitalization (for example, at the beginning of sentences) that might lead to incorrect graphs. Use "LowerCase"→False to keep the capital letters in text; for example, to distinguish some abbreviations.
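The two preprocessing options can be sketched in Python as follows. This is an illustration under stated assumptions, not the function's code: the tokenize helper, its parameter names and the tiny STOP_WORDS set are hypothetical (DeleteStopwords relies on its own built-in list), and words are taken to be plain runs of ASCII letters.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOP_WORDS = {"a", "the", "of", "but", "it", "is"}

def tokenize(text, lower_case=True, stop_words=True):
    """Split text into words; optionally lowercase it and drop stop words.

    stop_words=True mirrors "StopWords"->True: stop words are deleted.
    """
    if lower_case:
        text = text.lower()
    words = re.findall(r"[A-Za-z]+", text)
    if stop_words:
        words = [w for w in words if w.lower() not in STOP_WORDS]
    return words

sample = "Butter is bitter. But the butter is better."
print(tokenize(sample))
# ['butter', 'bitter', 'butter', 'better']
print(Counter(tokenize(sample, lower_case=False, stop_words=False)))
```

With both options on, "Butter" and "butter" are counted as one word; with both off, they are counted separately and "is" becomes the most frequent word.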

VertexWeight is set for every vertex to the number of times the corresponding keyword is encountered in text.

EdgeWeight is set for every edge to the number of times an edge connection is made. Among other applications, this also helps build a more meaningful CommunityGraphPlot, as some of its methods take EdgeWeight into account.
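The two weights described above can be computed along the following lines. This Python sketch is only an approximation of the described behavior: the keyword_weights helper is hypothetical, adjacency is assumed to be judged on the already preprocessed word sequence, and a keyword that directly follows itself is assumed not to create an edge.

```python
from collections import Counter

def keyword_weights(words, number):
    """Return (vertex_weights, edge_weights) for the `number` most common words."""
    counts = Counter(words)
    # Vertex weight: how often each keyword occurs in the word sequence.
    vertex_weights = dict(counts.most_common(number))
    keywords = set(vertex_weights)
    # Edge weight: how often two keywords are direct neighbors (order ignored).
    edge_weights = Counter()
    for first, second in zip(words, words[1:]):
        if first in keywords and second in keywords and first != second:
            edge_weights[tuple(sorted((first, second)))] += 1
    return vertex_weights, dict(edge_weights)

words = ("butter batter butter batter bitter "
         "butter bitter salt batter butter").split()
vertex_weights, edge_weights = keyword_weights(words, 3)
print(vertex_weights)
# {'butter': 4, 'batter': 3, 'bitter': 2}
print(edge_weights)
# {('batter', 'butter'): 4, ('batter', 'bitter'): 1, ('bitter', 'butter'): 2}
```

The word "salt" is not among the top three, so the pairs that involve it contribute no edge.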

ResourceFunction["KeywordsGraph"] returns an undirected Graph by default. Use the option setting DirectedEdges→True to get a directed graph that shows the sequential order in text of connected keywords.
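The difference between the two settings can be shown with a small Python sketch. The keyword_edges helper is hypothetical and only mimics the described behavior: an undirected edge ignores which keyword came first, while a directed edge points from a keyword to the one that directly follows it.

```python
def keyword_edges(words, keywords, directed=False):
    """Collect the distinct edges between keywords that are direct neighbors."""
    edges = set()
    for first, second in zip(words, words[1:]):
        if first in keywords and second in keywords and first != second:
            # Directed: keep the reading order. Undirected: normalize the pair.
            edge = (first, second) if directed else tuple(sorted((first, second)))
            edges.add(edge)
    return sorted(edges)

words = "bitter butter bitter batter".split()
keywords = {"bitter", "butter", "batter"}

print(keyword_edges(words, keywords))
# [('batter', 'bitter'), ('bitter', 'butter')]
print(keyword_edges(words, keywords, directed=True))
# [('bitter', 'batter'), ('bitter', 'butter'), ('butter', 'bitter')]
```

In the directed case, "bitter" followed by "butter" and "butter" followed by "bitter" are two separate edges; in the undirected case they collapse into one.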

The default option setting VertexLabels→Automatic shows the keywords on the graph as vertex labels. Use the option setting VertexLabels→None to remove them.

Large texts require more time to compute.

Examples

Basic Examples (3)

Consider an English tongue twister:

In[1]:=

text = "Betty Botter bought some butter But she said the butter\[CloseCurlyQuote]s bitter If I put it in my batter, it will make my batter bitter But a bit of better butter will make my batter better So \[OpenCurlyQuote]twas better Betty Botter bought a bit of better butter";

Find the nine most frequently used words (not counting stop words) and see which words are directly next to each other in the text:

In[2]:=

Out[2]=

You can also find the order in which words follow each other:

In[3]:=

Out[3]=

Scope (2)

Get the text of the book Alice in Wonderland and build a keywords graph for the top eleven keywords:

In[4]:=

Out[5]=

Exclude the unwanted words by forming a blacklist. You can also apply any option of Graph. For instance, you can restyle your graph and resize vertices in accordance with their properties:

In[6]:=

Out[7]=

Because KeywordsGraph yields a Graph expression, you can apply to it any function that works on a Graph. For instance, you can reveal clustering by displaying the community structure (note that, because the edges are weighted, the weights may influence how the clustering is computed):

In[8]:=

Out[8]=

VertexWeight and EdgeWeight are set to the number of times the keywords and their next-neighbor pairs, respectively, occur in the text:

In[9]:=

Out[9]=

The VertexWeight values are listed in the same order as the vertices in VertexList:

In[10]:=

Out[10]=

Occasionally you need to replace some words with others. Use a list of replacement rules to do that. For example, consider the inaugural address by President Joe Biden:

In[11]:=