
ClassifierEnsembles

Guides

  • Ensembles of classifiers

Tech Notes

  • ROC for classifier ensembles, bootstrapping, damaging, and interpolation

Symbols

  • ClassifyByThreshold
  • EnsembleClassifierConfusionMatrix
  • EnsembleClassifierMeasurements
  • EnsembleClassifier
  • EnsembleClassifierProbabilities
  • EnsembleClassifierROCData
  • EnsembleClassifierROCPlots
  • EnsembleClassifierVotes
  • EnsembleClassifyByThreshold
  • EnsembleClassify
  • ResamplingEnsembleClassifier
ROC for classifier ensembles, bootstrapping, damaging, and interpolation
Introduction
Used paclets
Data used
Classifier ensembles
Classifier ensembles by bootstrapping
Damaging data
Classifier interpolation
References
Introduction
The main goals of this document are:
i) to demonstrate how to create versions and combinations of classifiers utilizing different perspectives,
ii) to apply the Receiver Operating Characteristic (ROC) technique to the evaluation of the created classifiers (see [2,3]), and
iii) to illustrate the use of the Mathematica packages [5,6].
The concrete steps taken are the following:
1. Obtain data: Mathematica built-in or external. Do some rudimentary analysis.
2. Create an ensemble of classifiers and compare its performance to the individual classifiers in the ensemble.
3. Produce classifier versions from changed data in order to explore the effect of record outliers.
4. Make a bootstrapping classifier ensemble and evaluate and compare its performance.
5. Systematically diminish the training data and evaluate the results with ROC.
6. Show how to do classifier interpolation utilizing ROC.
In the steps above we skip the necessary preliminary data analysis. For the datasets we use in this document that analysis has been done elsewhere. (See [,,,].) Nevertheless, since ROC is mostly used for binary classifiers, we want to analyze the class label distributions in the datasets in order to designate which class labels are “positive” and which are “negative.”

ROC plots evaluation (in brief)

Assume we are given a binary classifier with the class labels P and N (for “positive” and “negative” respectively).
Consider the following measures, the True Positive Rate (TPR):

TPR := (correctly classified positives) / (total positives),

and the False Positive Rate (FPR):

FPR := (incorrectly classified negatives) / (total negatives).
Assume that we can change the classifier results with a parameter θ and produce a plot like this one:

[ROC curve plot over a range of parameter values]

For each parameter value θ_i the point {FPR(θ_i), TPR(θ_i)} is plotted; points corresponding to consecutive θ_i's are connected with a line. We call the obtained curve the ROC curve for the classifier in consideration. The ROC curve resides in the ROC space, as defined by the functions FPR and TPR corresponding respectively to the x-axis and the y-axis.
The ideal classifier would have its ROC curve composed of a line connecting {0,0} to {0,1} and a line connecting {0,1} to {1,1}.
Given a classifier, the ROC point closest to {0,1} would generally be considered the best point.

The wider perspective

This document started as a part of a conference presentation illustrating the cultural differences between statistics and machine learning (for the Wolfram Technology Conference 2016). Its exposition became both deeper and wider than expected. Here are the alternative, original goals of the document:
i) to demonstrate how, using ROC, a researcher can explore classifier performance without intimate knowledge of the classifiers' mechanisms, and
ii) to provide concrete examples of the typical investigation approaches employed by machine learning researchers.
To make those points clearer and more memorable we are going to assume that the exposition is a result of the research actions of a certain protagonist with a suitably selected character.
A by-product of the exposition is that it illustrates the following lessons from machine learning practices. (See [1].)
1. For a given classification task there are often multiple competing models.
2. The outcomes of good machine learning algorithms might be fairly complex, i.e. there are no simple interpretations when really good results are obtained.
3. Having high-dimensional data can be very useful.
In [1] these three points are discussed under the names “Rashomon”, “Occam”, and “Bellman”. To quote:
"Rashomon: the multiplicity of good models;
Occam: the conflict between simplicity and accuracy;
Bellman: dimensionality -- curse or blessing."

The protagonist

Our protagonist is a “Simple Nuclear Physicist” (SNP) -- someone who is accustomed to obtaining a lot of data that has to be analyzed and mined, sometimes very deeply, rigorously, and from a lot of angles, for different hypotheses. SNP is fairly adept in programming and critical thinking, but he does not have, or care about, deep knowledge of statistical methods or machine learning algorithms. SNP is willing and able to use software libraries that provide algorithms for statistical methods and machine learning.
SNP is capable of coming up with ROC if he is not aware of it already. ROC is very similar to the so-called phase space diagrams physicists use.
Used paclets
These commands load the used Wolfram Language (WL) paclets [4, 5, 6]:
In[94]:=
Needs["AntonAntonov`DataReshapers`"];
Needs["AntonAntonov`ROCFunctions`"];
Needs["AntonAntonov`ClassifierEnsembles`"];
Data used

The Titanic dataset

The commands of this section ingest the Titanic data suitable for Machine Learning (ML) classification workflows.
Get training Titanic data
In[97]:=
data = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];
columnNames = (Flatten@*List) @@ ExampleData[{"MachineLearning", "Titanic"}, "VariableDescriptions"];
data = ((Flatten@*List) @@@ data)[[All, {1, 2, 3, -1}]];
trainingData = DeleteCases[data, {___, _Missing, ___}];
Dimensions[trainingData]
Out[101]=
{732,4}
Show summary
In[102]:=
RecordsSummary[trainingData,columnNames]
Out[102]=

1 passenger class: 3rd 354, 1st 197, 2nd 181
2 passenger age: Min 0.3333, 1st Qu 21., Median 27., Mean 29.4164, 3rd Qu 38., Max 80.
3 passenger sex: male 460, female 272
4 passenger survival: died 427, survived 305

Get testing Titanic data
In[103]:=
data = ExampleData[{"MachineLearning", "Titanic"}, "TestData"];
data = ((Flatten@*List) @@@ data)[[All, {1, 2, 3, -1}]];
testData = DeleteCases[data, {___, _Missing, ___}];
Dimensions[testData]
Out[106]=
{314,4}
Show summary
In[107]:=
RecordsSummary[testData,columnNames]
Out[107]=

1 passenger class: 3rd 147, 1st 87, 2nd 80
2 passenger age: Min 0.1667, 1st Qu 21., Median 30., Mean 30.9644, 3rd Qu 40., Max 76.
3 passenger sex: male 198, female 116
4 passenger survival: died 192, survived 122

Convert categorical labels to numerical
In[108]:=
nTrainingData = trainingData /. {"survived" -> 1, "died" -> 0, "1st" -> 0, "2nd" -> 1, "3rd" -> 2, "male" -> 0, "female" -> 1};
Classifier ensembles
This command makes a classifier ensemble of the two built-in classifier methods "NearestNeighbors" and "NeuralNetwork":
In[109]:=
aCLs = EnsembleClassifier[{"NearestNeighbors", "NeuralNetwork"}, trainingData[[All, 1 ;; -2]] -> trainingData[[All, -1]]]
Out[109]=
<|"NearestNeighbors" -> ClassifierFunction[Input type: {Nominal, Numerical, Nominal}; Classes: died, survived],
 "NeuralNetwork" -> ClassifierFunction[Input type: {Nominal, Numerical, Nominal}; Classes: died, survived]|>
A classifier ensemble of the package [6] is simply an association mapping classifier IDs to classifier functions.
The first argument given to EnsembleClassifier can also be Automatic, in which case a default collection of built-in classifier methods is used, as shown in the sketch below.
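Here is a minimal sketch of both call forms; which methods Automatic actually selects depends on the paclet version, so we inspect the keys of the resulting association:

(* Ensemble over an explicit list of Classify methods *)
aCLs = EnsembleClassifier[{"NearestNeighbors", "NeuralNetwork"},
   trainingData[[All, 1 ;; -2]] -> trainingData[[All, -1]]];

(* Ensemble over the default collection of methods selected by Automatic *)
aCLsAuto = EnsembleClassifier[Automatic,
   trainingData[[All, 1 ;; -2]] -> trainingData[[All, -1]]];

(* An ensemble is an association, so the classifier IDs are its keys *)
Keys[aCLsAuto]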

Classification with ensemble votes

Classification with the classifier ensemble can be done using the function EnsembleClassify. If the third argument of EnsembleClassify is "Votes", the result is the class label that appears most often in the ensemble results.
The following commands illustrate such a classification and the voting behind it.
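A minimal sketch, assuming EnsembleClassify takes the ensemble, a record, and the aggregation type, and that EnsembleClassifierVotes tallies the labels chosen by the individual ensemble members:

record = testData[[1, 1 ;; -2]];

(* Majority-vote classification of a single record *)
EnsembleClassify[aCLs, record, "Votes"]

(* The votes behind that decision: label -> number of ensemble members choosing it *)
EnsembleClassifierVotes[aCLs, record]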

Classification with ensemble averaged probabilities

If the third argument of EnsembleClassify is “ProbabilitiesMean” the result is the class label that has the highest mean probability in the ensemble results.
The following commands illustrate such a classification and the probability averaging behind it.
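A corresponding sketch, assuming EnsembleClassifierProbabilities returns the class probabilities averaged over the ensemble members:

(* Classification by the highest mean probability *)
EnsembleClassify[aCLs, record, "ProbabilitiesMean"]

(* The averaged class probabilities behind that decision *)
EnsembleClassifierProbabilities[aCLs, record]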

ROC for ensemble votes

The following code computes the ROC curve for a range of votes.
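A sketch of that computation, assuming EnsembleClassifyByThreshold takes a label -> threshold rule and the aggregation type, and assuming the signatures ToROCAssociation[{positiveLabel, negativeLabel}, actualLabels, predictedLabels] and ROCPlot[rocs] from the ROCFunctions paclet [5]:

(* Sweep the number of votes required to classify a record as "survived" *)
aROCsVotes = Table[
   ToROCAssociation[{"survived", "died"}, testData[[All, -1]],
    EnsembleClassifyByThreshold[aCLs, #, "survived" -> th, "Votes"] & /@
     testData[[All, 1 ;; -2]]],
   {th, 0, Length[aCLs]}];
ROCPlot[aROCsVotes]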

ROC for ensemble probabilities mean

If we want to compute the ROC over a range of probability thresholds, we use EnsembleClassifyByThreshold with the fourth argument being "ProbabilitiesMean".
The implementation of EnsembleClassifyByThreshold with “ProbabilitiesMean” relies on the ClassifierFunction signature:
ClassifierFunction[__][record_, "Probabilities"]
Here is the corresponding ROC plot:
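A sketch along the same lines, sweeping probability thresholds instead of votes (same assumed signatures as above):

(* ROC points over a range of probability thresholds for "survived" *)
thRange = Range[0.1, 0.9, 0.1];
aROCsProbs = Table[
   ToROCAssociation[{"survived", "died"}, testData[[All, -1]],
    EnsembleClassifyByThreshold[aCLs, #, "survived" -> th, "ProbabilitiesMean"] & /@
     testData[[All, 1 ;; -2]]],
   {th, thRange}];
ROCPlot[aROCsProbs]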

Comparison of the ensemble classifier with the standard classifiers

This plot compares the ROC curve of the ensemble classifier with the ROC curves of the classifiers that comprise the ensemble.
Plot the ROC curves for each classifier
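A sketch of the comparison, assuming ClassifyByThreshold from [6] works like EnsembleClassifyByThreshold but on a single ClassifierFunction, and assuming ROCFunctions["FPR"] and ROCFunctions["TPR"] from [5] turn a ROC association into rates:

(* Threshold sweep for each individual classifier in the ensemble *)
aROCsSingle = Association@KeyValueMap[
    Function[{name, cf},
     name -> Table[
       ToROCAssociation[{"survived", "died"}, testData[[All, -1]],
        ClassifyByThreshold[cf, #, "survived" -> th] & /@ testData[[All, 1 ;; -2]]],
       {th, thRange}]],
    aCLs];

(* Overlay the individual ROC curves and the ensemble ROC curve *)
rocPoint = Through[{ROCFunctions["FPR"], ROCFunctions["TPR"]}[#]] &;
ListLinePlot[
 Append[Map[Map[rocPoint], aROCsSingle], "Ensemble" -> Map[rocPoint, aROCsProbs]],
 PlotLegends -> Automatic, AxesLabel -> {"FPR", "TPR"}]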
Classifier ensembles by bootstrapping
First, we are going to make a bootstrapping classifier ensemble using one of the Classify methods. Then we are going to make a more complicated bootstrapping classifier with six methods of Classify.

Bootstrapping ensemble with a single classification method

First we select a classification method and make a classifier with it.
Let us compare the ROC curves of the single classifier with those of the bootstrapping-derived ensemble.
We can see that we get much better results with the bootstrapped ensemble.
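Since an ensemble of [6] is just an association of classifier functions, a bootstrapping ensemble can be sketched directly with Classify and RandomSample; the paclet function ResamplingEnsembleClassifier presumably packages this pattern. The method name, sample fraction, and ensemble size below are illustrative choices:

(* Twelve "NearestNeighbors" classifiers, each trained on a 90% random sample *)
SeedRandom[1295];
aBootCLs = Association@Table[
    With[{smpl = RandomSample[trainingData, Floor[0.9 Length[trainingData]]]},
     "NearestNeighbors." <> ToString[i] ->
      Classify[smpl[[All, 1 ;; -2]] -> smpl[[All, -1]], Method -> "NearestNeighbors"]],
    {i, 12}];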

Bootstrapping ensemble with multiple classifier methods

This code creates a classifier ensemble using the classifier methods corresponding to Automatic given as the first argument to EnsembleClassifier.
This code computes the ROC statistics with the obtained bootstrapping classifier ensemble:
Let us plot the ROC curve of the bootstrapping classifier ensemble (in blue) and the single classifier ROC curves (in gray):
Again we can see that the bootstrapping ensemble produced better ROC points than the single classifiers.
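A sketch of the multi-method resampling ensemble and its ROC statistics, assuming the keys of an EnsembleClassifier association are the method names as strings (sample fraction and ensemble size are again illustrative):

(* Join six resampled Automatic ensembles, suffixing IDs with the resampling index *)
SeedRandom[1295];
aMultiBootCLs = Association@Flatten@Table[
     With[{smpl = RandomSample[trainingData, Floor[0.9 Length[trainingData]]]},
      KeyValueMap[#1 <> "." <> ToString[i] -> #2 &,
       EnsembleClassifier[Automatic, smpl[[All, 1 ;; -2]] -> smpl[[All, -1]]]]],
     {i, 6}];

(* ROC points of the bootstrapping ensemble over probability thresholds *)
aROCsBoot = Table[
   ToROCAssociation[{"survived", "died"}, testData[[All, -1]],
    EnsembleClassifyByThreshold[aMultiBootCLs, #, "survived" -> th, "ProbabilitiesMean"] & /@
     testData[[All, 1 ;; -2]]],
   {th, thRange}];
ROCPlot[aROCsBoot]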
Damaging data
This section tries to explain why bootstrapping with resampling to smaller sizes produces good results.
In short, the training data has outliers; if we remove a small fraction of the training data we might get better results.
The procedure described in this section can be used in conjunction with the procedures for variable importance investigation described in [8].

Ordering function

Let us replace the categorical values with numerical ones in the training data. There are several ways to do it; here is a fairly straightforward one:
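One straightforward replacement is the one already used above for nTrainingData:

nTrainingData = trainingData /. {"survived" -> 1, "died" -> 0,
    "1st" -> 0, "2nd" -> 1, "3rd" -> 2, "male" -> 0, "female" -> 1};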

Decreasing proportions of females

First, let us find all indices corresponding to records about females.
The following code standardizes the training data corresponding to females, finds the mean record, computes distances from the mean record, and finally orders the female records indices according to their distances from the mean record.
The following plot shows the distances calculated above.
The following code removes from the training data the records corresponding to females according to the order computed above. The female records farthest from the mean female record are removed first.
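A sketch of those steps, assuming the numerical encoding above (in nTrainingData column 3 is sex and code 1 means "female") and Euclidean distances; the sex column is constant within the female records, so only class and age are standardized:

(* Indices of the female records *)
femaleInds = Flatten@Position[nTrainingData[[All, 3]], 1];

(* Standardize class and age, find the mean record, order by distance from it *)
femaleMat = Standardize[N@nTrainingData[[femaleInds, {1, 2}]]];
meanRec = Mean[femaleMat];
dists = EuclideanDistance[#, meanRec] & /@ femaleMat;
orderedFemaleInds = femaleInds[[Reverse@Ordering[dists]]]; (* farthest first *)

(* Training data with a fraction f of the farthest female records removed *)
removeFraction[f_?NumericQ] :=
  Delete[trainingData, List /@ Take[orderedFemaleInds, Floor[f Length[orderedFemaleInds]]]]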
The following graphics grid shows how the classification results are affected by removing fractions of the female records from the training data. The results for no or small fractions of removed records are shown in bluer colors.
We can see that removing the female record outliers has a dramatic effect on the results of the classifiers "NearestNeighbors" and "NeuralNetwork", but not so much on "LogisticRegression" and "NaiveBayes".

Decreasing proportions of males

The code in this sub-section repeats the experiment described in the previous one for males (instead of females).
Plot ROC points (not curves)
Classifier interpolation
Suppose the fraction of “positive” records in the data is 0.09 and we want a classifier that labels 20% of all n records as positive. Counting the predicted positives contributed by the negative and the positive classes gives the constraint

FPR ((1 - 0.09) n) + TPR (0.09 n) == 0.2 n

which can be simplified to

FPR (1 - 0.09) + TPR 0.09 == 0.2
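Two points q1 and q2 on this constraint line (used in the geometric computations below) can be obtained by intersecting it with the coordinate axes, e.g.:

(* The constraint line in the ROC space *)
constraint = fpr (1 - 0.09) + tpr 0.09 == 0.2;
q1 = {fpr, tpr} /. First@Solve[{constraint, fpr == 0}, {fpr, tpr}] (* {0, 2.22222} *)
q2 = {fpr, tpr} /. First@Solve[{constraint, tpr == 0}, {fpr, tpr}] (* {0.21978, 0} *)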

The two classifiers

Consider the following two classifiers.

Geometric computations in the ROC space

Here are the ROC space points corresponding to the two classifiers, cf1 and cf2:
Here is the breakdown of frequencies of the class labels:
Here, using the points q1 and q2 of the constraint line, we find the intersection point with the line connecting the ROC points of the classifiers:
Let us plot all geometric objects:
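A sketch of these computations; the ROC points p1 and p2 of cf1 and cf2 are hypothetical stand-ins for the values computed from the classifiers:

(* Hypothetical ROC points {FPR, TPR} of cf1 and cf2 *)
p1 = {0.09, 0.81}; p2 = {0.16, 0.92};

(* Intersection of the constraint line q1-q2 with the line through p1 and p2 *)
q = First@RegionIntersection[InfiniteLine[{q1, q2}], InfiniteLine[{p1, p2}]]

(* The constraint line, the classifier line, and the points of interest *)
Graphics[{
  Blue, Line[{q1, q2}], Gray, Line[{p1, p2}],
  PointSize[Large], Red, Point[{p1, p2, q}]},
 Axes -> True, AxesLabel -> {"FPR", "TPR"}, PlotRange -> {{0, 0.3}, {0, 1}}]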

Classifier interpolation

Next we find the ratio of the distance from the intersection point q to the cf1 ROC point and the distance between the ROC points of cf1 and cf2.
The classifier interpolation is made by a weighted random selection based on that ratio (using RandomChoice):
We can run the process multiple times in order to convince ourselves that the interpolated classifier ROC point is very close to the constraint line most of the time.
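A sketch of the interpolation, with cf1 and cf2 standing for the two ClassifierFunction objects:

(* How far along the segment from p1 (cf1) to p2 (cf2) the intersection q lies *)
ratio = EuclideanDistance[q, p1]/EuclideanDistance[p2, p1];

(* Interpolated classifier: pick cf2 with probability ratio, cf1 otherwise *)
interpolatedClassify[record_] :=
  RandomChoice[{1 - ratio, ratio} -> {cf1, cf2}][record]

Evaluating interpolatedClassify over the test data many times should produce ROC points scattered close to the constraint line.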
References
[1] Leo Breiman, Statistical Modeling: The Two Cultures, (2001), Statistical Science, Vol. 16, No. 3, 199–231.
