Wolfram Language Paclet Repository
Community-contributed installable additions to the Wolfram Language
ArXivExplore supports in-depth data analysis of all research articles on ArXiv
Contributed by: Daniele Gregori
ArXivExplore supports in-depth data analysis of all 2.6M articles on ArXiv (physics, mathematics, computer science, etc.). Its functionality includes, for example, title and abstract word statistics; dissection of TeX sources, formulae and citations; neural networks (NNs) for classification or recommendation; and LLM-generated concept definitions and author reports.
To install this paclet in your Wolfram Language environment, evaluate this code:
PacletInstall["DanieleGregori/ArXivExplore"]
To load the code after installation, evaluate this code:
Needs["DanieleGregori`ArXivExplore`"]
The first article ever on ArXiv:
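As an illustration using only built-in functions, the sketch below picks the earliest entry from a small, hypothetical list of article metadata. The `"id"` and `"date"` keys and the toy values are assumptions made for the example. They are not the paclet's actual data schema.

```wolfram
(* Hypothetical toy metadata: ids and {year, month, day} submission dates;
   the real ArXivExplore data layout is not assumed here *)
articles = {
   <|"id" -> "toy-0003", "date" -> {1993, 5, 2}|>,
   <|"id" -> "toy-0001", "date" -> {1991, 8, 14}|>,
   <|"id" -> "toy-0002", "date" -> {1992, 1, 20}|>
   };

(* Sort by submission date (as absolute time) and keep the earliest record *)
firstArticle = First[SortBy[articles, AbsoluteTime[#date] &]];
firstArticle["id"]
```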
A DateListPlot showing trends in the most popular title words in the theoretical high-energy physics category (hep-th):
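Here is a minimal sketch of how such a trend could be computed with built-in functions. It works on hypothetical `{year, title}` pairs rather than the paclet's real data. For each word, it counts the titles per year that contain that word, then passes one time series per word to `DateListPlot`.

```wolfram
(* Hypothetical toy records {year, title}; not the paclet's data format *)
records = {
   {2019, "Black holes and holography"},
   {2019, "Holography of entanglement"},
   {2020, "Entanglement entropy in holography"},
   {2021, "Black hole entanglement islands"},
   {2021, "Entanglement wedges"}
   };

(* Per year, the number of titles containing the given lowercase word;
   years with no matching title are simply absent *)
yearlyCounts[word_String] := KeySort[Counts[
    First /@ Select[records, MemberQ[ToLowerCase[TextWords[Last[#]]], word] &]]];

series = AssociationMap[yearlyCounts, {"holography", "entanglement"}];

(* One time series per word, with each year turned into a DateObject *)
DateListPlot[
 Values[KeyValueMap[{DateObject[{#1}], #2} &, #] & /@ series],
 PlotLegends -> Keys[series]]
```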
The 100 most common 2-neighbour title words (pairs of adjacent words) across all of ArXiv, over its entire history:
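The idea behind 2-neighbour words can be sketched with built-in functions. Split each title into lowercase words, take adjacent pairs with `Partition[words, 2, 1]`, and count the pairs. The titles are hypothetical. `TakeLargest` with 3 stands in for the 100 used on the full data.

```wolfram
(* Hypothetical titles *)
titles = {
   "Neural networks for partial differential equations",
   "Graph neural networks",
   "Convolutional neural networks",
   "Partial differential equations on graphs"};

(* Adjacent word pairs within each title, lowercased, as one flat list *)
bigrams = Flatten[Partition[ToLowerCase[TextWords[#]], 2, 1] & /@ titles, 1];

(* Count each pair and keep the most common ones *)
pairCounts = Counts[bigrams];
top = TakeLargest[pairCounts, 3]
```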
Let us also show an author's citation graph, with tooltips indicating the article IDs:
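Below is a sketch of a citation graph with tooltip labels. It assumes a hypothetical association from each article ID to the IDs it cites. With `VertexLabels -> Placed["Name", Tooltip]`, each vertex name appears only on hover. This is not necessarily how the paclet builds its graph.

```wolfram
(* Hypothetical citation data: each article id maps to the ids it cites *)
cites = <|"A1" -> {"A2", "A3"}, "A2" -> {"A3"}, "A3" -> {}, "A4" -> {"A1", "A3"}|>;

(* One directed edge citing -> cited for every reference *)
edges = Flatten[KeyValueMap[Thread[DirectedEdge[#1, #2]] &, cites]];

(* Article ids are shown as tooltips when hovering over vertices *)
citationGraph = Graph[Keys[cites], edges,
  VertexLabels -> Placed["Name", Tooltip],
  GraphLayout -> "LayeredDigraphEmbedding"]
```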
The dimensions of the whole ArXiv dataset (as of the end of July 2024):
Let us create a super-database of all articles with a computer science ("cs") category, either primary or cross-list:
and then visualize the most and the least frequent title words:
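This hedged sketch covers both steps with built-in functions. It assumes hypothetical records whose `"categories"` field is a space-separated string of categories, which is an assumption about the layout and may differ from the paclet's. It keeps every record with at least one `cs.*` category, then counts title words after dropping stopwords.

```wolfram
(* Hypothetical toy records; "categories" is assumed to be a space-separated
   string holding the primary category and any cross-lists *)
papers = {
   <|"title" -> "Sparse solvers for graph problems", "categories" -> "cs.NA math.NA"|>,
   <|"title" -> "Graph learning at scale", "categories" -> "stat.ML cs.LG"|>,
   <|"title" -> "String vacua and fluxes", "categories" -> "hep-th"|>,
   <|"title" -> "Learning sparse graph models", "categories" -> "cs.LG"|>
   };

(* Keep papers with at least one cs.* category, primary or cross-listed *)
csPapers = Select[papers,
   AnyTrue[StringSplit[#categories], StringStartsQ["cs."]] &];

(* Lowercase title words of the selected papers, stopwords removed, counted *)
wordCounts = Counts[DeleteStopwords[
    ToLowerCase[Flatten[TextWords /@ Lookup[csPapers, "title"]]]]];

mostFrequent = TakeLargest[wordCounts, 2]
leastFrequent = TakeSmallest[wordCounts, 2]
WordCloud[wordCounts]
```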
Let us compute the 4 most frequent categories:
together with their meanings:
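Counting categories and attaching their meanings can be sketched as follows. The category strings are hypothetical. The lookup table uses names from arXiv's public category taxonomy, but it is typed in by hand for this example and does not come from the paclet.

```wolfram
(* Hypothetical category strings of individual articles (space-separated) *)
categoryStrings = {"cs.LG stat.ML", "cs.CV", "cs.LG", "cs.CL cs.LG",
   "cs.CV cs.LG", "cs.AI", "cs.CL", "cs.AI"};

(* Count every listed category, primary and cross-list alike *)
catCounts = Counts[Flatten[StringSplit /@ categoryStrings]];
top4 = Keys[TakeLargest[catCounts, 4]];

(* Hand-typed lookup table using arXiv category names *)
meanings = <|"cs.LG" -> "Machine Learning",
   "cs.CV" -> "Computer Vision and Pattern Recognition",
   "cs.CL" -> "Computation and Language",
   "cs.AI" -> "Artificial Intelligence"|>;

AssociationMap[meanings, top4]
```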
Using only titles and abstracts, we can train a neural network (NN) to classify articles into categories:
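A comparable text classifier can be sketched with the built-in `Classify`. This is a stand-in under stated assumptions. The tiny training set is hypothetical, and `Classify` chooses its own method automatically, so this is not the paclet's neural network.

```wolfram
(* Hypothetical training pairs: title/abstract text -> primary category *)
training = {
   "String compactifications and flux vacua in type IIB" -> "hep-th",
   "Holographic duals of conformal field theories" -> "hep-th",
   "Black hole entropy from D-brane state counting" -> "hep-th",
   "Gradient descent for training deep neural networks" -> "cs.LG",
   "Transformers for sequence classification tasks" -> "cs.LG",
   "Regularization against overfitting in neural networks" -> "cs.LG"
   };

(* Built-in Classify with automatic method selection; a stand-in,
   not the paclet's own network *)
classifier = Classify[training];

(* Predicted category for an unseen, equally hypothetical text *)
classifier["Conformal bootstrap constraints on field theories"]
```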
Even with a basic 15-minute training run on a laptop CPU, we obtain 95% accuracy:
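Accuracy here means the fraction of held-out articles whose predicted category matches the true one. The sketch below shows the arithmetic on hypothetical labels. For an actual `ClassifierFunction`, `ClassifierMeasurements[classifier, testSet, "Accuracy"]` computes the same quantity.

```wolfram
(* Hypothetical true and predicted categories for five held-out articles *)
actual    = {"hep-th", "cs.LG", "cs.LG", "math.NA", "hep-th"};
predicted = {"hep-th", "hep-th", "cs.LG", "math.NA", "hep-th"};

(* Accuracy: fraction of positions where prediction and truth agree;
   4 of the 5 agree here, so the result is 4/5 *)
accuracy = Mean[Boole[MapThread[SameQ, {predicted, actual}]]];
N[accuracy]

(* Confusion counts: how often each {true, predicted} pair occurs *)
Counts[Transpose[{actual, predicted}]]
```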
We can even classify authors within the same category, using ArXivClassifyAuthorNet.
Extracting the TeX introduction of an article:
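One simple way to pull an introduction out of TeX source is a string pattern that captures everything between `\section{Introduction}` and the next `\section{`. The sketch assumes that layout. It misses, for instance, `\section*{Introduction}` or an introduction that is the final section. The TeX string is a hypothetical example, not paclet output.

```wolfram
(* Hypothetical TeX source of an article *)
tex = "\\documentclass{article}\n\\begin{document}\n\\section{Introduction}\nWe revisit a classic problem.\n\\section{Setup}\nDetails follow.\n\\end{document}";

(* Shortest text between \section{Introduction} and the next \section{ *)
intro = StringCases[tex,
   "\\section{Introduction}" ~~ Shortest[body___] ~~ "\\section{" :>
    StringTrim[body]]
```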
and also its TeX formulae:
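Formulae can also be collected with string patterns. This sketch handles only `equation` environments and single-dollar inline math in a hypothetical TeX string. Other forms (`align`, `$$...$$`, `\[...\]`) would need further patterns, and the paclet's own extraction may differ.

```wolfram
(* Hypothetical TeX fragment with one inline and one displayed formula *)
tex = "Mass-energy $E = m c^2$ and\n\\begin{equation}\n a^2 + b^2 = c^2\n\\end{equation}";

(* Contents of equation environments, trimmed of surrounding whitespace *)
display = StringCases[tex,
   "\\begin{equation}" ~~ Shortest[f___] ~~ "\\end{equation}" :> StringTrim[f]];

(* Single-dollar inline math; $$...$$ display math is not handled here *)
inline = StringCases[tex, "$" ~~ Shortest[f__] ~~ "$" :> f];
```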
Explain a technical concept using an article introduction:
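The LLM step amounts to sending a prompt that combines the concept with the introduction text. The sketch below only builds such a prompt with `StringTemplate`. The wording, the example text, and the use of `LLMSynthesize` (available in recent Wolfram Language versions, and requiring a configured LLM service) are assumptions rather than the paclet's implementation.

```wolfram
(* Hypothetical concept and introduction text *)
concept = "tensor network";
intro = "We study tensor networks as efficient representations of quantum many-body states.";

(* Assemble the prompt; the wording is an assumption *)
prompt = StringTemplate[
    "Using only the article introduction below, explain the concept `concept` in a few sentences.\n\nIntroduction:\n`intro`"][
   <|"concept" -> concept, "intro" -> intro|>];

(* Sending it requires a configured LLM service: *)
(* LLMSynthesize[prompt] *)
```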
Let us visualize all authors with more than 7 papers in the primary category "cs.NA" (numerical analysis):
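Selecting prolific authors comes down to counting papers per author and keeping those above a threshold. The author lists are hypothetical. The threshold is lowered to 1 so that the toy data yields a result; on the real data it would be 7.

```wolfram
(* Hypothetical author lists, one per cs.NA paper *)
authorLists = {{"A. Smith", "B. Lee"}, {"A. Smith"}, {"C. Diaz", "A. Smith"}, {"B. Lee"}};

(* Papers per author *)
paperCounts = Counts[Flatten[authorLists]];

(* Keep authors above the threshold; 7 in the text above, 1 for this toy data *)
minPapers = 1;
prolific = Select[paperCounts, # > minPapers &];

BarChart[prolific, ChartLabels -> Automatic]
```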
Let us pick a random author among them and use LLM functionality to explain their overall work:
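The same idea can be sketched with built-in functions: draw a random author and assemble a prompt from their paper titles. The author table and prompt wording are hypothetical. The actual LLM call is left commented out because it requires a configured LLM service.

```wolfram
(* Hypothetical table of authors and their paper titles *)
papersByAuthor = <|
   "A. Smith" -> {"Sparse solvers for graph Laplacians", "Preconditioned Krylov methods"},
   "B. Lee" -> {"Error bounds for spectral collocation"}|>;

(* Pick a random author, reproducibly *)
SeedRandom[1];
author = RandomChoice[Keys[papersByAuthor]];

(* Prompt listing the author's titles; the wording is an assumption *)
prompt = "Summarize the overall research work of " <> author <>
   " based on these paper titles:\n" <> StringRiffle[papersByAuthor[author], "\n"];

(* With a configured LLM service: LLMSynthesize[prompt] *)
```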
Wolfram Language Version 14