Function Repository Resource:

KeywordsGraph (1.0.0; current version: 2.0.0)

Build a weighted graph that connects frequently used keywords of a text when they are sequential neighbors, visualizing the flow and clustering of ideas in the text

Contributed by: Vitaliy Kaurov

ResourceFunction["KeywordsGraph"][text, number]

finds the number most frequently used words in text (the keywords) and builds a graph with those keywords as vertices, in which two vertices are connected by an edge if one keyword directly follows the other in text.

ResourceFunction["KeywordsGraph"][text, number, blist]

builds the graph after removing the blacklisted strings blist from the text.
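
The behavior described above can be illustrated with a short language-neutral sketch in Python. This is not the resource function's implementation: the name keywords_graph_sketch, the regex tokenization, the whole-word matching of blacklisted strings, and the choice to judge adjacency after the blacklist is applied are all assumptions made for illustration. Stop-word removal is left out here.

```python
import re
from collections import Counter

def keywords_graph_sketch(text, number, blist=()):
    """Return (keywords, edges) for the `number` most frequent words.

    Illustrative only: tokenization and blacklist matching are assumptions,
    not the resource function's actual behavior.
    """
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Drop blacklisted strings before counting (assumed to match whole words).
    blocked = {b.lower() for b in blist}
    words = [w for w in words if w not in blocked]
    # Keywords are the `number` most frequent remaining words.
    keywords = {w for w, _ in Counter(words).most_common(number)}
    # Connect two distinct keywords whenever they are direct neighbors.
    edges = set()
    for a, b in zip(words, words[1:]):
        if a in keywords and b in keywords and a != b:
            edges.add(tuple(sorted((a, b))))  # undirected: store in sorted order
    return keywords, edges

sample = "butter bitter batter butter better batter butter bitter um butter"
keywords, edges = keywords_graph_sketch(sample, 3, blist=["um"])
```

In this sample, "better" occurs only once and so is not among the three keywords; pairs that involve it produce no edge.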

Details and Options

The function returns a Graph expression.

All options of Graph can be applied. The only additional options beyond those of Graph are "StopWords" and "LowerCase".

The default setting "StopWords"→True automatically applies DeleteStopwords, so no stop words can appear as keywords. Use "StopWords"→False to keep the stop words.

The default setting "LowerCase"→True automatically applies ToLowerCase to remove unwanted capitalization (for example, at the beginning of sentences) that might lead to incorrect graphs. Use "LowerCase"→False to keep the capital letters in the text, for example, to distinguish some abbreviations.
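
The effect of these two options can be sketched in Python as follows. The function preprocess, its parameters stop_words and lower_case, and the tiny STOP_WORDS set are hypothetical stand-ins; DeleteStopwords relies on its own built-in list, and the order in which the resource function applies the two steps is not stated here.

```python
import re

# Tiny hypothetical stand-in; the built-in stop-word list is far larger.
STOP_WORDS = {"a", "but", "i", "if", "in", "is", "it", "my", "of", "she", "the", "will"}

def preprocess(text, stop_words=True, lower_case=True):
    """Tokenize text, mimicking the two options (assumed order of steps)."""
    if lower_case:
        text = text.lower()
    words = re.findall(r"[A-Za-z']+", text)
    if stop_words:
        # Compare case-insensitively so "But" is dropped even when case is kept.
        words = [w for w in words if w.lower() not in STOP_WORDS]
    return words

sample = "But she said the butter is bitter. Butter will make my batter better"
default_words = preprocess(sample)                 # both options on
kept_case = preprocess(sample, lower_case=False)   # capitalization preserved
```

With capitalization preserved, "butter" and "Butter" are counted as two different words, which is the kind of split the default setting avoids.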

VertexWeight is set for every vertex to the number of times the corresponding keyword is encountered in text.

EdgeWeight is set for every edge to the number of times an edge connection is made. Among other applications, this also helps to build a more meaningful CommunityGraphPlot, as some of its methods take EdgeWeight into account.
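
As a rough illustration of how such weights arise, the following Python sketch counts keyword occurrences and adjacent keyword pairs in a token list. The function name weights is hypothetical, and skipping a keyword that is followed by itself is an assumption rather than documented behavior.

```python
from collections import Counter

def weights(words, keywords):
    """Count keyword occurrences and adjacent keyword pairs in a token list."""
    # Vertex weight: how often each keyword occurs.
    vertex_weight = Counter(w for w in words if w in keywords)
    # Edge weight: how often two distinct keywords are direct neighbors.
    edge_weight = Counter(
        tuple(sorted((a, b)))
        for a, b in zip(words, words[1:])
        if a in keywords and b in keywords and a != b
    )
    return vertex_weight, edge_weight

words = "better butter makes batter better and better butter".split()
vertex_weight, edge_weight = weights(words, {"better", "butter", "batter"})
```

Here "better" and "butter" are neighbors twice, so their edge carries weight 2, while the edge between "batter" and "better" carries weight 1.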

By default, an undirected Graph is returned. Use DirectedEdges→True to get a directed graph that shows the sequential order in the text of the connected keywords.
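
The difference between the two settings can be sketched in Python: an undirected edge ignores which keyword comes first, while a directed edge records it. The function keyword_edges and its directed flag are hypothetical names used only for this illustration.

```python
def keyword_edges(words, keywords, directed=False):
    """Collect edges between distinct keywords that are direct neighbors."""
    edges = set()
    for first, second in zip(words, words[1:]):
        if first in keywords and second in keywords and first != second:
            pair = (first, second)
            # Undirected: (a, b) and (b, a) collapse into one sorted pair.
            edges.add(pair if directed else tuple(sorted(pair)))
    return edges

words = "bitter butter better butter bitter".split()
keywords = {"bitter", "butter", "better"}
undirected = keyword_edges(words, keywords)
directed = keyword_edges(words, keywords, directed=True)
```

For this token list the undirected version yields two edges, while the directed version yields four, one for each order of succession.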

By default, VertexLabels→Automatic is used to show the keywords on the graph. Use VertexLabels→None to remove them.

Large texts take longer to process.

Examples

Basic Examples (3)

Consider an English tongue twister:

In[1]:=

text = "Betty Botter bought some butter But she said the butter\[CloseCurlyQuote]s bitter If I put it in my batter, it will make my batter bitter But a bit of better butter will make my batter better So \[OpenCurlyQuote]twas better Betty Botter bought a bit of better butter";

Find the nine most frequently used words (not counting stop words) and see which words are directly next to each other in the text:

In[2]:=

Out[2]=

You can also find the order in which words follow each other:

In[3]:=

Out[3]=

Scope (5)

Get the text of the book Alice in Wonderland and build a keywords graph for the top eleven keywords:

In[4]:=

Out[4]=

Exclude the unwanted words by forming a blacklist. You can also apply any option of Graph. For instance, you can restyle your graph and resize vertices in accordance with their properties:

In[5]:=

Out[5]=

Because KeywordsGraph yields a Graph expression, you can apply to it any function that works on a Graph. For instance, you can find clusters by displaying the community structure (note that, because the edges are weighted, the weights may influence how the clustering is computed):

In[6]:=