Wolfram Language Paclet Repository
Community-contributed installable additions to the Wolfram Language
ArXivExplore supports deep data analysis of all research articles on ArXiv
Contributed by: Daniele Gregori
ArXivExplore supports deep data analysis of all 2.6M articles on ArXiv (physics, math, computer science, etc.), providing functionality for, e.g., title and abstract word statistics; dissection of TeX sources, formulae and citations; neural networks for classification or recommendation; and LLM-automated concept definitions and author reports.
To install this paclet in your Wolfram Language environment,
evaluate this code:
PacletInstall["DanieleGregori/ArXivExplore"]
To load the code after installation, evaluate this code:
Needs["DanieleGregori`ArXivExplore`"]
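As a minimal sketch, the two steps above can be run together, followed by a look at which symbols the paclet exposes. The install and load calls are the ones given above; listing the context with the built-in Names is only a suggested way to discover the available functions, not something the paclet documentation prescribes.

```wl
(* Install the paclet (needed only once per environment) *)
PacletInstall["DanieleGregori/ArXivExplore"];

(* Load the paclet's context into the current session *)
Needs["DanieleGregori`ArXivExplore`"];

(* List the symbols defined in the paclet's context *)
Names["DanieleGregori`ArXivExplore`*"]
```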
The first article ever on ArXiv:
In[1]:= |
Out[1]= |
In[2]:= |
Out[2]= |
In[3]:= |
Out[3]= |
A DateListPlot showing the trends in the most popular title words in the theoretical physics category (hep-th):
In[4]:= |
Out[4]= |
The 100 most common 2-neighbour title words (pairs of adjacent words) across the whole ArXiv, over all time:
In[5]:= |
Out[5]= |
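To illustrate the idea behind this statistic, here is a small sketch that tallies adjacent word pairs in titles using only built-in functions. It assumes that "2-neighbour title words" means pairs of consecutive words; the three titles are hypothetical sample data, and this is not the paclet's own implementation.

```wl
(* Hypothetical sample titles, not taken from the ArXiv dataset *)
titles = {
   "Quantum field theory on curved space",
   "Topological quantum field theory",
   "Notes on quantum field theory"
   };

(* Lowercase each title and split it into words on whitespace *)
words = StringSplit[ToLowerCase[#]] & /@ titles;

(* Form overlapping pairs of adjacent words within each title *)
pairs = Flatten[Partition[#, 2, 1] & /@ words, 1];

(* Tally the pairs and keep the two most frequent ones *)
TakeLargest[Counts[pairs], 2]
```

Pairs are built per title, so no pair spans the boundary between two different titles.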
Let us also show an author's citation graph, with tooltips indicating the article IDs:
In[6]:= |
Out[6]= |
The dimensions of the whole ArXiv dataset (as of the end of July 2024):
In[7]:= |
Out[7]= |
Let us create a super-database of all articles with a computer science ("cs") primary or cross-list category:
In[8]:= |
and then let us visualize the most frequent and least frequent title words:
In[9]:= |
Out[9]= |
Let us compute the 4 most frequent categories:
In[10]:= |
Out[10]= |
with their meaning:
In[11]:= |
Out[11]= |
Using only titles and abstracts, we can train a NN to classify different categories:
In[12]:= |
In[13]:= |
Out[13]= |
In[14]:= |
Out[14]= |
Even with a basic 15-minute training run on a laptop CPU, we obtain 95% accuracy:
In[15]:= |
Out[15]= |
In[16]:= |
Out[16]= |
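For readers who want to see the general shape of such a text-to-category workflow, the following sketch uses the built-in Classify and ClassifierMeasurements on a handful of labelled titles. It does not use the paclet's networks: the titles are hypothetical, the variable names are illustrative, and a toy set this small says nothing about the accuracy reported above.

```wl
(* Hypothetical toy data: title text -> category label, not real ArXiv entries *)
train = {
   "Black hole entropy in string theory" -> "hep-th",
   "Supersymmetric gauge theories and dualities" -> "hep-th",
   "Conformal field theory and holography" -> "hep-th",
   "Convergence of finite element methods" -> "cs.NA",
   "Error bounds for iterative linear solvers" -> "cs.NA",
   "Stable numerical schemes for wave equations" -> "cs.NA"
   };

(* Held-out examples used only for evaluation *)
test = {
   "String theory compactifications" -> "hep-th",
   "Finite element error analysis" -> "cs.NA"
   };

(* Train a classifier; Classify picks the method automatically *)
classifier = Classify[train];

(* Fraction of held-out examples classified correctly *)
ClassifierMeasurements[classifier, test, "Accuracy"]
```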
We could even classify authors within the same category with ArXivClassifyAuthorNet.
Extracting the TeX introduction of an article:
In[17]:= |
Out[17]= |
and also its TeX formulae:
In[18]:= |
Explain a technical concept using an article introduction:
In[19]:= |
Out[19]= |
Let us visualize all authors with more than 7 papers in primary category "cs.NA":
In[20]:= |
Out[20]= |
Let us pick a random author among them and use the LLM functionality to explain their overall work:
In[21]:= |
Out[21]= |
Wolfram Language Version 14