Wolfram Function Repository
Instant-use add-on functions for the Wolfram Language
Function Repository Resource: ConcordanceWords
Find words associated with a search term in a list, text file, PDF or URL
| ResourceFunction["ConcordanceWords"][source,searchterm,n] finds surrounding words within n words of searchterm in source. | 
| "file" | a file or URL corresponding to a PDF file | 
| {"string1","string2",…} | a list of strings of text content | 
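The idea of collecting words within n words of a search term can be illustrated with a minimal sketch. The helper nearbyWords below is hypothetical and is not the code behind ResourceFunction["ConcordanceWords"]; it assumes case-insensitive matching and treats any run of word characters as a word:

```wolfram
(* nearbyWords is a hypothetical helper for illustration only; it is not
   the code behind ResourceFunction["ConcordanceWords"]. *)
nearbyWords[text_String, term_String, n_Integer?Positive] :=
 Module[{words, hits, neighbors},
  (* Assumption: lowercase everything and treat runs of word characters as words *)
  words = StringCases[ToLowerCase[text], WordCharacter ..];
  (* Positions at which the search term occurs *)
  hits = Flatten[Position[words, ToLowerCase[term]]];
  (* For each hit, take the words up to n positions away, excluding the hit itself *)
  neighbors = Flatten[Table[
     words[[DeleteCases[Range[Max[1, i - n], Min[Length[words], i + n]], i]]],
     {i, hits}]];
  (* Tally how often each neighboring word appears *)
  Counts[neighbors]]

sample = "The president shall nominate. The vice president shall preside.";
nearbyWords[sample, "President", 1]
(* <|"the" -> 1, "shall" -> 2, "vice" -> 1|> *)
```

Raising the distance to 2 widens the window, so words such as "nominate" and "preside" are also counted in this sample.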
Find words occurring next to or near "president" in the US Constitution:
| In[1]:= | ![ResourceFunction["ConcordanceWords"][
 List[ExampleData[{"Text", "USConstitution"}]], "president"]](https://www.wolframcloud.com/obj/resourcesystem/images/2cc/2ccb2635-34e9-45e1-9596-64103ed3f361/3db4414dc6c071b1.png) | 
| Out[1]= |  | 
Find words occurring next to or near "Earth" on Wikipedia's page on the Moon:
| In[2]:= | ![ResourceFunction["ConcordanceWords"][
 List[WikipediaData["Moon"]], "Earth"]](https://www.wolframcloud.com/obj/resourcesystem/images/2cc/2ccb2635-34e9-45e1-9596-64103ed3f361/770664c029636986.png) | 
| Out[2]= |  | 
Find words occurring next to or near "analytics" in a PDF published online:
| In[3]:= | ![ResourceFunction[
 "ConcordanceWords"]["http://exampledata.wolfram.com/article.pdf", "analytics"]](https://www.wolframcloud.com/obj/resourcesystem/images/2cc/2ccb2635-34e9-45e1-9596-64103ed3f361/41c764666401dbc0.png) | 
| Out[3]= |  | 
Specify a distance of 5 to find words within five words of "Sheet" on Wikipedia's page on "Paper":
| In[4]:= | ![ResourceFunction["ConcordanceWords"][
 List[WikipediaData["Paper"]], "Sheet", 5]](https://www.wolframcloud.com/obj/resourcesystem/images/2cc/2ccb2635-34e9-45e1-9596-64103ed3f361/1de4ebf215333822.png) | 
| Out[4]= |  | 
Find words occurring next to or near "circle" on a webpage using its URL:
| In[5]:= |  | 
| In[6]:= | ![ResourceFunction["ConcordanceWords"][arXivAPI, "circle"]](https://www.wolframcloud.com/obj/resourcesystem/images/2cc/2ccb2635-34e9-45e1-9596-64103ed3f361/44bf5a29c3caa7a5.png) | 
| Out[6]= |  | 
Specify a distance of 5:
| In[7]:= | ![ResourceFunction["ConcordanceWords"][arXivAPI, "Circle", 5]](https://www.wolframcloud.com/obj/resourcesystem/images/2cc/2ccb2635-34e9-45e1-9596-64103ed3f361/7701611c088dd5d4.png) | 
| Out[7]= |  | 
Scraping a webpage works only if the page matches the XML element condition the function looks for, which may not hold for an arXiv abstract page:
| In[8]:= | ![ResourceFunction[
 "ConcordanceWords"]["https://arxiv.org/abs/1906.00068v1", "circles"]](https://www.wolframcloud.com/obj/resourcesystem/images/2cc/2ccb2635-34e9-45e1-9596-64103ed3f361/2edc72db6a5c0085.png) | 
| Out[8]= |  | 
Instead, collect the PDF links from the page and pass each one to the function:
| In[9]:= | ![positions = Position[StringCases[
   Import["https://arxiv.org/abs/1906.00068v1", "Hyperlinks"], RegularExpression["(/pdf/)|(.pdf)"]], Except@{}, 1, Heads -> False]](https://www.wolframcloud.com/obj/resourcesystem/images/2cc/2ccb2635-34e9-45e1-9596-64103ed3f361/64d75dd8b230f2f0.png) | 
| Out[9]= |  | 
| In[10]:= | ![links = DeleteDuplicates[
  Flatten[Import["https://arxiv.org/abs/1906.00068v1", "Hyperlinks"][[#]] & /@ positions]]](https://www.wolframcloud.com/obj/resourcesystem/images/2cc/2ccb2635-34e9-45e1-9596-64103ed3f361/59d6b82e7f06ee12.png) | 
| Out[10]= |  | 
| In[11]:= | ![ResourceFunction["ConcordanceWords"][#, "Circles"] & /@ links](https://www.wolframcloud.com/obj/resourcesystem/images/2cc/2ccb2635-34e9-45e1-9596-64103ed3f361/17e4e78fe77edaa5.png) | 
| Out[11]= |  | 
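In the link-filtering input above, the pattern "(.pdf)" contains an unescaped dot, which matches any character followed by "pdf". A stricter variant can be sketched with plain string tests. The hyperlinks list below is an invented stand-in for the result of Import[url, "Hyperlinks"], and pdfLinkQ is a hypothetical helper name; unlike the original pattern, this sketch only accepts ".pdf" at the end of a link:

```wolfram
(* Invented stand-in for Import["https://arxiv.org/abs/1906.00068v1", "Hyperlinks"] *)
hyperlinks = {
   "https://arxiv.org/abs/1906.00068v1",
   "https://arxiv.org/pdf/1906.00068v1",
   "https://example.org/notes.pdf",
   "https://example.org/pdfviewer-help",
   "https://arxiv.org/pdf/1906.00068v1"};

(* pdfLinkQ is a hypothetical helper: a link qualifies if it contains "/pdf/"
   or ends in ".pdf" (case-insensitive) *)
pdfLinkQ[link_String] :=
  StringContainsQ[link, "/pdf/"] || StringEndsQ[link, ".pdf", IgnoreCase -> True];

(* Filter in one pass and drop repeated links *)
links = DeleteDuplicates[Select[hyperlinks, pdfLinkQ]]

(* Each remaining link can then be processed as before:
   ResourceFunction["ConcordanceWords"][#, "Circles"] & /@ links *)
```

Importing the hyperlinks once and filtering with Select avoids the second Import call and the Position bookkeeping, and the "pdfviewer-help" link is not picked up by accident.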
Find words near "Force" in arXiv articles retrieved with ServiceConnect["ArXiv"]:
| In[12]:= | ![arXiv = ServiceConnect["ArXiv"];
articles = arXiv["Search", {"Query" -> "Physics", "MaxItems" -> 5}];
urls = Normal@articles[All, {"URL"}];
urlist = Flatten[Values[urls]];
pdfurls = StringReplace[urlist, "http://arxiv.org/abs/" -> "http://arxiv.org/pdf/"];
datapdf = Quiet[Import[#, "Plaintext"] & /@ pdfurls];](https://www.wolframcloud.com/obj/resourcesystem/images/2cc/2ccb2635-34e9-45e1-9596-64103ed3f361/63878e18eceda8da.png) | 
| In[13]:= | ![ResourceFunction["ConcordanceWords"][datapdf, "Force"]](https://www.wolframcloud.com/obj/resourcesystem/images/2cc/2ccb2635-34e9-45e1-9596-64103ed3f361/01afb45587656bc8.png) | 
| Out[13]= |  | 
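Because Quiet suppresses import messages, a PDF that fails to import leaves $Failed in datapdf without any warning. The sketch below shows the URL rewrite on a sample address and one possible way to keep only the texts that imported successfully. The datapdf contents are invented for illustration, and texts is a name introduced here:

```wolfram
(* Sample abstract-page URL; in the example above the URLs come from the ArXiv service *)
urlist = {"http://arxiv.org/abs/1906.00068v1"};

(* Rewrite abstract-page URLs into PDF URLs; StringReplace threads over the list *)
pdfurls = StringReplace[urlist, "http://arxiv.org/abs/" -> "http://arxiv.org/pdf/"]

(* Invented stand-in for Quiet[Import[#, "Plaintext"] & /@ pdfurls];
   a failed import yields $Failed instead of a string *)
datapdf = {"The force on the particle is constant.", $Failed, "A second article text."};

(* Keep only the entries that are strings before searching them *)
texts = Select[datapdf, StringQ]
```

The filtered list texts can then be passed to the function in place of datapdf.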
This work is licensed under a Creative Commons Attribution 4.0 International License