Wolfram Language Paclet Repository
Community-contributed installable additions to the Wolfram Language
Variables importance determination using classifiers
Contributed by: Anton Antonov
This paclet has functions for finding the importance of variables in datasets using classifiers.
To install this paclet in your Wolfram Language environment, evaluate this code:
PacletInstall["AntonAntonov/VariableImportanceByClassifiers"]
To load the code after installation, evaluate this code:
Needs["AntonAntonov`VariableImportanceByClassifiers`"]
1. Load some data:
2. Get the variable names and the unique class labels:
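A minimal sketch of steps 1 and 2 using only built-in functions. It assumes the data is the Titanic set from ExampleData (the remarks about survival and passenger class in step 7 suggest this, but the dataset is not named here); the symbols `data`, `rows`, `varNames`, and `classLabels` are illustrative names.

```wolfram
(* Assumption: the Titanic example dataset from the "MachineLearning" collection. *)
data = ExampleData[{"MachineLearning", "Titanic"}, "Data"];

(* Each element is a rule {features...} -> label; flatten it into one row. *)
rows = Flatten[List @@ #] & /@ data;

(* Variable names: the feature descriptions followed by the label description. *)
varNames = Flatten[
   List @@ ExampleData[{"MachineLearning", "Titanic"}, "VariableDescriptions"]];

(* Unique class labels, taken from the last column of the rows. *)
classLabels = Union[rows[[All, -1]]];

{varNames, classLabels}
```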
3. Here is a data summary:
4. Make the classifier:
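One possible way to build such a classifier with the built-in `Classify` is sketched below. The Titanic data, its predefined training/test split, and the choice of `"RandomForest"` are assumptions made for illustration.

```wolfram
(* Assumption: Titanic example data with its predefined training/test split. *)
trainingData = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];
testData = ExampleData[{"MachineLearning", "Titanic"}, "TestData"];

(* Train a classifier; the method is an arbitrary illustrative choice. *)
clFunc = Classify[trainingData, Method -> "RandomForest"];

(* Baseline accuracy on the unshuffled test data. *)
baseline = ClassifierMeasurements[clFunc, testData, "Accuracy"]
```

The baseline accuracy is the reference value against which the shuffled accuracies are compared.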
5. Obtain accuracies after shuffling:
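The idea behind this step can be sketched with built-in functions only: shuffle one variable (column) of the test data at a time and measure how much the accuracy drops. This is a hedged illustration of the technique, not the paclet's own implementation; `shuffleColumn` and `accuracy` are hypothetical helpers, and the Titanic data with its column order (class, age, sex) is an assumption.

```wolfram
(* Assumption: Titanic example data; features are ordered {class, age, sex}. *)
trainingData = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];
testData = ExampleData[{"MachineLearning", "Titanic"}, "TestData"];
clFunc = Classify[trainingData];

(* Split the test rules into feature rows and labels. *)
testX = Keys[testData];
testY = Values[testData];

(* Hypothetical helper: permute column i across rows, keeping other columns intact. *)
shuffleColumn[x_List, i_Integer] :=
  Module[{res = x}, res[[All, i]] = RandomSample[x[[All, i]]]; res];

(* Hypothetical helper: accuracy of clFunc on feature rows x with the true labels. *)
accuracy[x_List] :=
  ClassifierMeasurements[clFunc, Thread[x -> testY], "Accuracy"];

SeedRandom[1234];
baseline = accuracy[testX];
shuffled = AssociationThread[
   {"passenger class", "passenger age", "passenger sex"} ->
    Table[accuracy[shuffleColumn[testX, i]], {i, Length[First[testX]]}]];

{baseline, shuffled}
```

A variable whose shuffling lowers the accuracy the most is the one the classifier relies on the most.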
6. Tabulate the results:
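A small sketch of how such a table could be assembled from shuffled accuracies. The numbers are toy values chosen for easy arithmetic, not results from any dataset, and the definition of relative damage as the fractional drop from the baseline accuracy is an assumption.

```wolfram
(* Toy values for illustration only; they are not results from any dataset. *)
baseline = 4/5;
accs = <|"x1" -> 3/5, "x2" -> 4/5|>;

(* Assumed definition: relative damage = fractional drop from the baseline. *)
relativeDamage = Map[(baseline - #)/baseline &, accs];
(* relativeDamage is <|"x1" -> 1/4, "x2" -> 0|> *)

(* One row per shuffled variable, with a header row. *)
Grid[
 Prepend[
  KeyValueMap[{#1, #2, relativeDamage[#1]} &, accs],
  {"shuffled variable", "accuracy", "relative damage"}],
 Frame -> All]
```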
7. The variable importance found above can be further confirmed with mosaic plots. They show that female passengers are much more likely to survive, especially those in first and second class:
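As a numeric counterpart to a mosaic plot, the label fractions per combination of sex and class can be computed with the built-in `GroupBy`. This sketch assumes the Titanic example data, with passenger class in feature position 1 and sex in position 3.

```wolfram
(* Assumption: Titanic example data; feature 1 is class, feature 3 is sex. *)
data = ExampleData[{"MachineLearning", "Titanic"}, "Data"];

(* Group records by {sex, class} and compute the fraction of each class label. *)
rates = GroupBy[
   data,
   (#[[1, {3, 1}]] &) -> Last,
   Function[labels, N[Counts[labels]/Length[labels]]]];

KeySort[rates]
```

Each key is a `{sex, class}` pair and each value is an association of label fractions that sum to 1.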
5a. To use F-scores instead of overall accuracy, specify the desired class labels with the option "ClassLabels":
5b. Here is another example that uses the class label with the smallest F-score (probably the most important one, since it is the most misclassified):
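The F-score variant can be sketched with built-in functions as follows. The sketch does not use the paclet's "ClassLabels" option; it calls `ClassifierMeasurements` with the `"FScore"` property directly, picks the label with the smallest F-score, and tracks that score after shuffling each variable. The Titanic data and the helper names `shuffleColumn` and `fscoreFor` are assumptions.

```wolfram
(* Assumption: Titanic example data with its predefined training/test split. *)
trainingData = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];
testData = ExampleData[{"MachineLearning", "Titanic"}, "TestData"];
clFunc = Classify[trainingData];

(* Per-class F-scores on the unshuffled test data (an association by label). *)
fscores = ClassifierMeasurements[clFunc, testData, "FScore"];

(* The label with the smallest F-score; Sort orders an association by value. *)
weakestLabel = First[Keys[Sort[fscores]]];

testX = Keys[testData];
testY = Values[testData];

(* Hypothetical helper: permute column i across rows. *)
shuffleColumn[x_List, i_Integer] :=
  Module[{res = x}, res[[All, i]] = RandomSample[x[[All, i]]]; res];

(* Hypothetical helper: F-score of the weakest label on feature rows x. *)
fscoreFor[x_List] :=
  ClassifierMeasurements[clFunc, Thread[x -> testY], "FScore"][weakestLabel];

SeedRandom[1234];
{weakestLabel, fscores[weakestLabel],
 Table[fscoreFor[shuffleColumn[testX, i]], {i, Length[First[testX]]}]}
```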
5c. It is a good idea to verify that different classifiers give the same results. The code below computes the shuffled accuracies and returns the relative damage scores for a set of Classify methods:
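A comparison across methods might look like the sketch below, again using only built-in functions. The three method names are valid `Classify` methods chosen arbitrarily; the Titanic data, the helpers `shuffleColumn` and `damageFor`, and the definition of relative damage as the fractional accuracy drop are assumptions.

```wolfram
(* Assumption: Titanic example data with its predefined training/test split. *)
trainingData = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];
testData = ExampleData[{"MachineLearning", "Titanic"}, "TestData"];
testX = Keys[testData];
testY = Values[testData];

(* Hypothetical helper: permute column i across rows. *)
shuffleColumn[x_List, i_Integer] :=
  Module[{res = x}, res[[All, i]] = RandomSample[x[[All, i]]]; res];

(* Hypothetical helper: relative accuracy drop per shuffled column for one method. *)
damageFor[method_String] :=
  Module[{cf, base, shuffled},
   cf = Classify[trainingData, Method -> method];
   base = ClassifierMeasurements[cf, testData, "Accuracy"];
   shuffled = Table[
     ClassifierMeasurements[cf,
      Thread[shuffleColumn[testX, i] -> testY], "Accuracy"],
     {i, Length[First[testX]]}];
   (base - shuffled)/base];

SeedRandom[1234];
methods = {"LogisticRegression", "NearestNeighbors", "RandomForest"};
damages = AssociationMap[damageFor, methods]
```

If the methods agree on which column has the largest relative damage, the importance ranking is less likely to be an artifact of a single classifier.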
Make a classifier ensemble:
Obtain accuracies after shuffling: