Wolfram Function Repository
Instant-use add-on functions for the Wolfram Language
Function Repository Resource: KeywordsGraph
Build a weighted graph that connects frequently used keywords of a text when they are sequential neighbors, visualizing the flow and clustering of ideas in the text
ResourceFunction["KeywordsGraph"][text, number] finds the given number of most frequently used words in text (the keywords) and builds a graph with these keywords as vertices, in which two vertices are connected by an edge if one keyword directly follows the other in text.
ResourceFunction["KeywordsGraph"][text, number, blist] builds the graph after removing the blacklisted strings blist from the text.
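The procedure described above can be sketched outside the Wolfram Language. The following Python is a minimal illustration, not the resource function's implementation: the name `keywords_graph`, the tiny stop-word set, the regex tokenizer, and the choice to look for neighbors after stop words have been dropped are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (an assumption, not the list DeleteStopwords uses).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on"}

def keywords_graph(text, number, blacklist=()):
    """Hypothetical helper: return (keywords, undirected edge counts)."""
    # Delete blacklisted strings from the raw text (case-sensitive substring removal).
    for s in blacklist:
        text = text.replace(s, " ")
    # Lowercase, split into words, and drop stop words.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    counts = Counter(words)
    keywords = [w for w, _ in counts.most_common(number)]
    kw = set(keywords)
    # Count each unordered pair of distinct keywords that are direct neighbors.
    edges = Counter()
    for a, b in zip(words, words[1:]):
        if a != b and a in kw and b in kw:
            edges[frozenset((a, b))] += 1
    return keywords, edges

keywords, edges = keywords_graph("the cat sat on the mat and the cat ran to the mat", 2)
```

Here `keywords` holds the vertices and `edges` maps each connected pair to the number of times the pair occurs side by side. Because the blacklist is applied as plain substring deletion in this sketch, a short blacklisted string can also cut into longer words.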
Consider an English tongue twister:
Find the nine most frequently used words (not counting stop words) and see which words are directly next to each other in the text:
You can also find the order in which words follow each other:
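To make "the order in which words follow each other" concrete, here is a small Python sketch that counts ordered neighbor pairs, so that (a, b) and (b, a) are tallied separately. The helper name `directed_pairs` and the sample word list are hypothetical, and how the resource function itself exposes direction is not shown in this text.

```python
from collections import Counter

def directed_pairs(words, keywords):
    """Hypothetical helper: count ordered pairs (first, second) of adjacent keywords."""
    kw = set(keywords)
    return Counter(
        (a, b)
        for a, b in zip(words, words[1:])
        if a != b and a in kw and b in kw
    )

# Illustrative word sequence, not the text used on this page.
words = ["peter", "piper", "picked", "peter", "piper"]
pairs = directed_pairs(words, {"peter", "piper"})
```

Each key of `pairs` can be read as a directed edge from the first word to the word that follows it.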
Get the text of the book Alice in Wonderland and build a keywords graph for the top eleven keywords:
Exclude unwanted words by supplying a blacklist. You can also pass any option of Graph. For instance, you can restyle the graph and resize vertices according to their properties:
Because KeywordsGraph yields a Graph expression, you can apply to it any function that accepts a Graph. For instance, you can find clusters by displaying the community structure (note that because the edges are weighted, the weights may influence how the clustering is computed):
VertexWeight and EdgeWeight are set to the number of times each keyword and each next-neighbor pair of keywords occurs in the text:
The VertexWeight values are listed in the same order as the vertices in VertexList:
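Two parallel lists in matching order can be paired position by position. The Python sketch below illustrates the idea with word counts as vertex weights. The sample words and the variable names are assumptions for the example and are not taken from the resource function.

```python
from collections import Counter

# Illustrative word sequence (hypothetical).
words = "wood chuck wood chuck wood would".split()
counts = Counter(words)

# Vertices are the two most frequent words, most frequent first.
vertex_list = [w for w, _ in counts.most_common(2)]
# Weights are stored in the same order as the vertices.
vertex_weights = [counts[w] for w in vertex_list]

# Matching order makes it safe to pair the lists by position.
weight_by_keyword = dict(zip(vertex_list, vertex_weights))
```

In the Wolfram Language the same pairing would typically be made by threading the vertex list over the weight list.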
Consider a text where capitalization matters. For instance, here "us" and "US" are different terms:
By default, ToLowerCase is applied, so "us" is not distinguished from "US":
Use the option "LowerCase"→False to keep capitalized forms distinct:
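The effect of lowercasing on keyword counts can be seen in a few lines of Python. This is only a sketch of the idea: the function `tokenize`, its `lower_case` parameter, and the sample sentence are hypothetical stand-ins, not the resource function's interface.

```python
import re
from collections import Counter

def tokenize(text, lower_case=True):
    """Hypothetical helper: split text into words, optionally lowercasing first."""
    if lower_case:
        text = text.lower()
    return re.findall(r"[A-Za-z']+", text)

sample = "The US trusts us"
merged = Counter(tokenize(sample))                      # "US" and "us" collapse
distinct = Counter(tokenize(sample, lower_case=False))  # "US" and "us" stay apart
```

With lowercasing, both forms are counted as one word. Without it, they become separate candidate keywords.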
Sometimes you might need to keep some stop words. For example, consider "us" and "US" here:
By default, both "us" and "US" are removed by DeleteStopwords:
Use the option "StopWords"→False to retain stop words, and supply your own blacklist of words to remove:
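The trade-off between automatic stop-word removal and a hand-written blacklist can be sketched in Python as follows. The helper `select_words`, its parameters, and the small stop-word set are assumptions for illustration. In this sketch the blacklist is matched against whole words, which is a simplification of removing strings from the text.

```python
# Tiny illustrative stop-word set (an assumption, not the list DeleteStopwords uses).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "us", "we"}

def select_words(words, drop_stop_words=True, blacklist=()):
    """Hypothetical helper: filter words by stop-word set and/or explicit blacklist."""
    blocked = set(blacklist)
    kept = []
    for w in words:
        if w in blocked:
            continue  # explicitly blacklisted (exact, case-sensitive match)
        if drop_stop_words and w.lower() in STOP_WORDS:
            continue  # automatic stop-word removal (case-insensitive)
        kept.append(w)
    return kept

words = ["The", "US", "and", "us", "trade"]
default_result = select_words(words)
custom_result = select_words(words, drop_stop_words=False, blacklist=["The", "and"])
```

Turning off the automatic filter keeps "US" and "us" available as keywords, while the blacklist still removes the words that are not wanted.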
Get the dataset for presidential inaugural addresses from the Wolfram Data Repository and order it by time:
Extract the text of the inaugural addresses of the two most recent presidents as of 2019, Barack Obama and Donald Trump:
Define graph styles:
Build a KeywordsGraph for each of the two addresses, Barack Obama's and Donald Trump's, using 30 keywords. The graphs give a sense of the key ideas without reading the texts:
The second argument (the number of keywords in the graph) should not exceed the total number of keywords in the text:
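One way to guard against this in your own code is to compare the requested number with the count of distinct words before selecting keywords. The Python sketch below shows such a check. The helper `top_keywords` is hypothetical, and raising an error is a design choice made for the example. How the resource function itself reacts is not stated here.

```python
from collections import Counter

def top_keywords(words, number):
    """Hypothetical helper: return the `number` most frequent words, or fail loudly."""
    counts = Counter(words)
    if number > len(counts):
        raise ValueError(
            f"requested {number} keywords, but only {len(counts)} distinct words are available"
        )
    return [w for w, _ in counts.most_common(number)]

words = ["wood", "chuck", "wood"]
two = top_keywords(words, 2)
```

An alternative design would silently clamp the request to the number of available words. Raising an error instead makes the mismatch visible to the caller.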
Get the dataset for presidential inaugural addresses from the Wolfram Data Repository and order it by time:
Define graph styles:
Build a KeywordsGraph for each address using 30 keywords and arrange the graphs in a grid:
This work is licensed under a Creative Commons Attribution 4.0 International License