Function Repository Resource:

ConfusionMatrixTrajectoryFunction

Construct a function that, when given a threshold probability value, produces a confusion matrix

Contributed by: Seth J. Chandler

ResourceFunction["ConfusionMatrixTrajectoryFunction"][cmo,class]

takes a ClassifierMeasurementsObject cmo and a designation class of the positive class, and produces a function that, when applied to a threshold probability value, generates a confusion matrix.

ResourceFunction["ConfusionMatrixTrajectoryFunction"][assoc,class]

takes an Association assoc derived from the application of a classifier to data, along with the positive class class.

Details and Options

The output of ResourceFunction["ConfusionMatrixTrajectoryFunction"] is a Function whose body has the head Through; the functions that Through applies are right compositions of a NearestFunction with two other functions.
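As background on that structure (illustrative only, not the actual body the resource function builds), Through applies each function in a list to the same argument:

```wolfram
(* Through applies each function in a list to a shared argument *)
Through[{Min, Max}[{1, 2, 3, 4}]]
(* {1, 4} *)

(* a right composition of a NearestFunction with another function,
   applied via Through; similar in shape to the returned function's body *)
Through[{Nearest[{1, 2, 3}] /* First, Sqrt}[2.2]]
(* {2, 1.48324} *)
```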
The main purpose of this function is speed: it determines the trajectory of confusion matrices and associated statistics (such as false positive rate and true positive rate) without having to call ClassifierMeasurements repeatedly with varying UtilityFunction options.
Consistent with the way that ClassifierMeasurements works in creating receiver operating curves when there are more than two output classes, the confusion matrices produced by the function are one versus all. Even if originally the classifier was capable of outputting a large number of classes, all classes except the designated positive class are collapsed into a single class. Without this collapse, the receiver operating curve would become a receiver operating manifold in potentially many dimensions and would be difficult to visualize.
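The collapse can be sketched as follows (the helper name collapseOneVsAll is hypothetical, not part of the resource function):

```wolfram
(* collapse every class other than the designated positive class into one *)
collapseOneVsAll[labels_List, positive_] :=
  If[# === positive, positive, "rest"] & /@ labels

collapseOneVsAll[{"setosa", "versicolor", "virginica", "setosa"}, "setosa"]
(* {"setosa", "rest", "rest", "setosa"} *)
```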
There is a correspondence between the threshold value and the disutilities placed on false positive and false negative responses from the classifier. A threshold of t corresponds to a relationship between the false positive disutility fpc and the false negative disutility fnc such that t = fpc/(fpc + fnc).
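Assuming the standard decision-theoretic relation t = fpc/(fpc + fnc), the correspondence can be sketched with a small illustrative helper:

```wolfram
(* threshold implied by a given pair of disutilities *)
thresholdFromCosts[fpc_, fnc_] := fpc/(fpc + fnc)

thresholdFromCosts[1, 3]
(* 1/4: when a false negative is three times as costly as a false positive,
   the positive class should be called at probabilities as low as 0.25 *)
```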
In ResourceFunction["ConfusionMatrixTrajectoryFunction"][assoc,class], the Association assoc should have the keys "LogProbabilities", "TestSet" and "ExtendedClasses". See the example in the Scope section for a typical use case.

Examples

Basic Examples (2) 

Get some simple training and test data:

In[1]:=
trainingData = {-0.7 -> False, 0.4 -> True, 0.6 -> True, 0.8 -> True, 0.6 -> True, -0.6 -> False, 1. -> True, 0.8 -> True, 0.9 -> True, -0.5 -> False, 0.1 -> False, 1. -> True, -0.2 -> False, -0.5 -> True, 0.7 -> True, -0.5 -> True,
    0.3 -> True, -1. -> False, -0.9 -> True, -0.5 -> False};
In[2]:=
testData = {-0.3 -> False, -0.2 -> True, -0.2 -> False, 0.8 -> True, -0.2 -> False, -0.5 -> False, -0.8 -> False, 0.4 -> False, 1. -> True, -0.5 -> False, 0.9 -> False, 0.1 -> True,
    0. -> False, -0.4 -> False, 0.6 -> False, -0.2 -> False, 0.5 -> True, 1. -> True, -0.7 -> False, 0.6 -> False};

Run a classifier and create a ClassifierMeasurementsObject:

In[3]:=
cmo = ClassifierMeasurements[Classify[trainingData], testData]
Out[3]=

Create the ConfusionMatrixTrajectoryFunction:

In[4]:=
cmtf = ResourceFunction["ConfusionMatrixTrajectoryFunction"][cmo, True]
Out[4]=

Get the training and test sets of the Fisher iris data:

In[5]:=
IrisTraining = RandomSample@
   ExampleData[{"MachineLearning", "FisherIris"}, "TrainingData"];
In[6]:=
IrisTest = RandomSample@
   ExampleData[{"MachineLearning", "FisherIris"}, "TestData"];

Create a classifier on the training set:

In[7]:=
c = Classify[IrisTraining, PerformanceGoal -> "TrainingSpeed"]
Out[7]=

Construct a ClassifierMeasurementsObject from the classifier and the test set:

In[8]:=
irisCMO = ClassifierMeasurements[c, IrisTest]
Out[8]=

Construct a function that will, for any given probability threshold, construct the confusion matrix when "setosa" is designated as the positive class:

In[9]:=
cmtf = ResourceFunction["ConfusionMatrixTrajectoryFunction"][irisCMO, "setosa"]
Out[9]=

Apply that function to a threshold of 0.62 to get a confusion matrix:

In[10]:=
cmtf[0.62]
Out[10]=

Create a plot showing a trajectory of the false positive rate and true positive rate of the classifier for different thresholds:

In[11]:=
falsePositiveRate[cm : {{tn_, fp_}, {fn_, tp_}}] := fp/(tn + fp); 
truePositiveRate[cm : {{tn_, fp_}, {fn_, tp_}}] := tp/(fn + tp)
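As a quick sanity check of these definitions, apply them to a hand-made confusion matrix with counts {{tn, fp}, {fn, tp}}:

```wolfram
falsePositiveRate[{{8, 2}, {1, 9}}]
(* 1/5 *)

truePositiveRate[{{8, 2}, {1, 9}}]
(* 9/10 *)
```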
In[12]:=
ParametricPlot3D[
 With[{cm = cmtf[t]}, {falsePositiveRate[cm], truePositiveRate[cm], t}], {t, -0.0001, 1.0001}, AxesLabel -> {"FPR", "TPR", "threshold"}, PlotRange -> {{0, 1}, {0, 1}, {0, 1}}]
Out[12]=

Create a confusion matrix trajectory function as before, but let "virginica" be designated as the positive class:

In[13]:=
ResourceFunction[
 "ConfusionMatrixTrajectoryFunction"][irisCMO, "virginica"]
Out[13]=

Get the confusion matrix if the threshold is 0.62:

In[14]:=
%[0.62]
Out[14]=

Scope (2) 

One does not need a ClassifierMeasurementsObject to create the requisite function; it also works with an Association that has the required key-value pairs, as shown in the following example:

In[15]:=
cmfSimple = ResourceFunction[
  "ConfusionMatrixTrajectoryFunction"][<|"LogProbabilities" -> {{-Log[
        7], -Log[7/4], -Log[7/2]}, {-Log[12/7], -Log[4], -Log[
        6]}, {-Log[7/3], -Log[7/2], -Log[7/2]}, {-Log[9/5], -Log[
        3], -Log[9]}, {-Log[19/6], -Log[19/5], -Log[19/8]}, {-Log[17/
        3], -Log[17/4], -Log[17/10]}}, "TestSet" -> <|"Output" -> {"virginica", "versicolor", "setosa", "virginica", "setosa", "versicolor"}|>, "ExtendedClasses" -> {"setosa", "versicolor", "virginica"}|>, "versicolor"]
Out[15]=

One can use the function on a neural network that has been converted to a classifier. Create and train a network on the iris data:

In[16]:=
trainingData = ExampleData[{"MachineLearning", "FisherIris"}, "TrainingData"];
testData = ExampleData[{"MachineLearning", "FisherIris"}, "TestData"];
labels = Union[Values[trainingData]];
net = NetChain[{3, SoftmaxLayer[]}, "Input" -> 4, "Output" -> NetDecoder[{"Class", labels}]];
results = NetTrain[net, trainingData, All, MaxTrainingRounds -> 100];
trained = results["TrainedNet"];
clNet = Classify[trained]
Out[22]=

Create a ClassifierMeasurementsObject using testData:

In[23]:=
cmoNet = ClassifierMeasurements[clNet, testData]
Out[23]=

Create the confusion matrix trajectory function with "versicolor" as the positive class:

In[24]:=
cmtfNet = ResourceFunction["ConfusionMatrixTrajectoryFunction"][cmoNet, "versicolor"]
Out[24]=

Show the three-dimensional ROC (receiver operating characteristic) curve for the network:

In[25]:=
ParametricPlot3D[
 With[{cm = cmtfNet[t]}, {falsePositiveRate[cm], truePositiveRate[cm],
    t}], {t, 0, 1}, AxesLabel -> {"FPR", "TPR", "threshold"}, PlotRange -> {{0, 1}, {0, 1}, {0, 1}}]
Out[25]=

Applications (2) 

Create functions that compute the false positive rate and true positive rate from a confusion matrix:

In[26]:=
falsePositiveRate[cm : {{tn_, fp_}, {fn_, tp_}}] := fp/(tn + fp); 
truePositiveRate[cm : {{tn_, fp_}, {fn_, tp_}}] := tp/(fn + tp)

Construct a three-dimensional receiver operating curve that shows the false positive rate and true positive rate for "good" wine as a function of the threshold probability required before the classifier calls a wine "good":

In[27]:=
binWine = Query[All, ReplacePart[#, 2 -> Which[#[[2]] >= 7, "Good", #[[2]] < 5, "Bad", True, "Mediocre"]] &];
In[28]:=
wineTrainingData = binWine[ExampleData[{"MachineLearning", "WineQuality"}, "TrainingData"]]
Out[28]=
In[29]:=
wineTestData = binWine[ExampleData[{"MachineLearning", "WineQuality"}, "TestData"]]
Out[29]=
In[30]:=
cWine = Classify[wineTrainingData]
Out[30]=

Construct a ClassifierMeasurementsObject from the classifier and the test set:

In[31]:=
wineCMO = ClassifierMeasurements[cWine, wineTestData]
Out[31]=

Create the confusion matrix function:

In[32]:=
cmfWine = ResourceFunction["ConfusionMatrixTrajectoryFunction"][wineCMO, "Good"]
Out[32]=

Create a three-dimensional plot:

In[33]:=
ParametricPlot3D[
 Append[Through[{falsePositiveRate, truePositiveRate}[cmfWine[t]]], t], {t, 0, 1}, AxesLabel -> {"FPR", "TPR", "threshold"}, BoxRatios -> {1, 1, 1}, PlotRange -> {{0, 1}, {0, 1}, {0, 1}}]
Out[33]=

View the graphic from the top as a traditional receiver operating curve:

In[34]:=
Show[%, ViewPoint -> {0, 0, Infinity}]
Out[34]=

The function works well in conjunction with the CrossValidateModel resource function and permits one to smooth the confusion matrix trajectory. Start this process by creating a query that will convert continuous wine quality values into discrete bins of Good, Bad and Mediocre:

In[35]:=
binWine = Query[All, ReplacePart[#, 2 -> Which[#[[2]] >= 7, "Good", #[[2]] < 5, "Bad", True, "Mediocre"]] &];

Create wine data suitable for Classify:

In[36]:=
(wineData = binWine[ExampleData[{"MachineLearning", "WineQuality"}, "Data"]]) //
  ResourceFunction["Terse"][6]
Out[36]=

Run the CrossValidateModel resource function to create five pairs of ClassifierFunction and ClassifierMeasurementsObject for each of the five folds:

In[37]:=
crossValidatedWine = ResourceFunction["CrossValidateModel"][
  RandomSample@wineData, Classify[#, TimeGoal -> 5] &]
Out[37]=

Create confusion matrix trajectory functions for each of the five folds:

In[38]:=
(wineCMTrajectories = Query[All, "ValidationResult" /* (ResourceFunction[
         "ConfusionMatrixTrajectoryFunction"][#, "Good"] &)][
    crossValidatedWine]) // Shallow[#, 5] &
Out[38]=

Plot the accuracy of the classifier over the threshold levels for each of the folds:

In[39]:=
accuracy[cm_?MatrixQ] := Total[Diagonal[cm]]/Total[cm, 2]
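As a quick check of this definition, a 2×2 matrix with 17 correct classifications out of 20:

```wolfram
accuracy[{{8, 2}, {1, 9}}]
(* 17/20 *)
```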
In[40]:=
wineAccuracyPlots = MapIndexed[
  Plot[accuracy[#[t]], {t, 0, 1}, PlotRange -> {0.5, 0.88}, PlotStyle -> {Opacity[0.5], ColorData[1][#2[[1]]]}] &, wineCMTrajectories]
Out[40]=

Create a composite view of the five accuracy plots:

In[41]:=
wineAccuracyCompositePlot = Show @@ wineAccuracyPlots
Out[41]=

Now create a function that computes confusion matrices for all five folds:

In[42]:=
(blendedWineTrajectory = Query[throughFunctions |-> (x |-> Through[throughFunctions[x]])][
    wineCMTrajectories]) // Shallow[#, 5] &
Out[42]=

Plot the blended accuracy curve:

In[43]:=
blendedWineAccuracyPlot = Plot[With[{cm = Mean[blendedWineTrajectory[t]]}, accuracy[cm]], {t, 0, 1}, PlotRange -> {0.5, 0.88}, PlotStyle -> Black, AspectRatio -> 1, Frame -> True, FrameLabel -> {"threshold", "accuracy"}]
Out[43]=

Show the blended accuracy plot and the components creating it:

In[44]:=
Show[blendedWineAccuracyPlot, wineAccuracyCompositePlot, PlotRange -> {0.5, 0.88}, Axes -> False]
Out[44]=

Neat Examples (4) 

Compare how a classifier performs on wines with low pH versus wines with high pH:

In[45]:=
confusionMatrixFunctionsBypH = KeyTake[{"lowpH", "highpH"}][
  ResourceFunction["MapReduceOperator"][
    If[#[[1, 9]] < 3.18, "lowpH", "highpH"] &,
    (ClassifierMeasurements[cWine, #] &) /* (ResourceFunction[
        "ConfusionMatrixTrajectoryFunction"][#, "Good"] &)][wineTestData]]
Out[45]=

Show the three-dimensional ROC curves and draw points where the threshold is 0.5:

In[46]:=
With[{low = confusionMatrixFunctionsBypH["lowpH"], high = confusionMatrixFunctionsBypH["highpH"]},
 Show[ParametricPlot3D[{Append[
     Through[{falsePositiveRate, truePositiveRate}[low[t]]], t], Append[Through[{falsePositiveRate, truePositiveRate}[high[t]]], t]}, {t, 0, 1}, AxesLabel -> {"FPR", "TPR", "threshold"}, PlotLegends -> {"low pH", "high pH"}], Graphics3D[{PointSize[0.03], RGBColor[0.368417, 0.506779, 0.709798],
     Point[Append[
      Through[{falsePositiveRate, truePositiveRate}[low[0.5]]], 0.5]],
     RGBColor[0.880722, 0.611041, 0.142051], Point[Append[
      Through[{falsePositiveRate, truePositiveRate}[high[0.5]]], 0.5]]}]]
 ]
Out[46]=

Show the graphic from the top and notice that even though the ROC curves look roughly the same in two dimensions, the false positive and true positive rates for the two different pH groupings of wine differ significantly at a threshold of 0.5:

In[47]:=
Show[%, ViewPoint -> {0, 0, Infinity}]
Out[47]=

Create the same plot but make the hue of each plot line depend on the accuracy of the classifier at that threshold:

In[48]:=
With[{low = confusionMatrixFunctionsBypH["lowpH"], high = confusionMatrixFunctionsBypH["highpH"]}, Module[{lowplot = ParametricPlot3D[{Append[
       Through[{falsePositiveRate, truePositiveRate}[low[t]]], t]}, {t, 0.1, 0.9}, AxesLabel -> {"FPR", "TPR", "threshold"}, ColorFunctionScaling -> True, ColorFunction -> (With[{i = With[{m = low[#3]}, Total[m[[2]]]/Total[m, 2]]}, Lighter@ColorData[
            "TemperatureMap"][((-1 + i) (-1 + #1) + (-2 + i + 2 #1) #2)/Max[0.01, #1 - #2]]] &)],
   highplot = ParametricPlot3D[{Append[
       Through[{falsePositiveRate, truePositiveRate}[high[t]]], t]}, {t, 0.1, 0.9}, AxesLabel -> {"FPR", "TPR", "threshold"}, ColorFunctionScaling -> True, ColorFunction -> (With[{i = With[{m = high[#3]}, Total[m[[2]]]/Total[m, 2]]}, Darker@ColorData[
            "TemperatureMap"][((-1 + i) (-1 + #1) + (-2 + i + 2 #1) #2)/Max[0.01, #1 - #2]]] &)]},
  Labeled[Show[lowplot, highplot], Style["lighter coloration for the low pH wines;\ndarker coloration for the high pH wines", "Text"]]
  ]
 ]
Out[48]=

Publisher

Seth J. Chandler

Version History

  • 1.0.0 – 27 October 2020

Author Notes

The motivation for this function is the efficient pursuit of the issue shown in the Neat Examples section: understanding and examining differences in the performance of a classifier on various subgroups of a dataset. It is thus useful in the area of "fair machine learning" and the study of algorithmic bias.
