Function Repository Resource:

MultinormalKLDivergence

Compute the Kullback–Leibler divergence between two multinormal distributions

Contributed by: Seth J. Chandler

ResourceFunction["MultinormalKLDivergence"][dist1,dist2]

computes the Kullback–Leibler divergence between the multinormal distributions dist1 and dist2.

Details

MultinormalKLDivergence also works with a univariate NormalDistribution or a bivariate BinormalDistribution, treating each as a special case of the more general MultinormalDistribution.
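
For reference, the Kullback–Leibler divergence between two k-dimensional multinormal distributions has a well-known closed form. A minimal sketch of that formula in Wolfram Language, assuming MultinormalKLDivergence[dist1, dist2] follows the usual KL(dist1 || dist2) argument convention (klClosedForm is a hypothetical helper, not part of this resource function):

klClosedForm[{mu0_, cov0_}, {mu1_, cov1_}] :=
 With[{k = Length[mu0], inv1 = Inverse[cov1]},
  (* trace term + Mahalanobis term - dimension + log-determinant ratio, all halved *)
  (Tr[inv1 . cov0] + (mu1 - mu0) . inv1 . (mu1 - mu0) - k +
     Log[Det[cov1]/Det[cov0]])/2]

Under that convention, klClosedForm[{{0, 0}, {{1, 1/2}, {1/2, 1/3}}}, {{1, 1}, {{1, 1}, {1, 2}}}] should reproduce the first basic example below.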

Examples

Basic Examples (2) 

Compute the Kullback–Leibler divergence for two multinormal distributions:

In[1]:=
ResourceFunction["MultinormalKLDivergence"][
 MultinormalDistribution[{0, 0}, {{1, 1/2}, {1/2, 1/3}}],
 MultinormalDistribution[{1, 1}, {{1, 1}, {1, 2}}]]
Out[1]=

Plot how the Kullback–Leibler divergence varies as the mean of the second multinormal distribution moves along a line through the origin:

In[2]:=
Plot[ResourceFunction["MultinormalKLDivergence"][
  MultinormalDistribution[{0, 0}, {{1, 0}, {0, 1}}],
  MultinormalDistribution[{z, z}, {{1, 0}, {0, 1}}]],
 {z, -3, 3}, AxesLabel -> {"z", "KL Divergence"}]
Out[2]=
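
Since both covariance matrices are the identity, the divergence reduces to half the squared distance between the means, so the curve above is exactly z^2; a quick symbolic check:

Simplify[ResourceFunction["MultinormalKLDivergence"][
  MultinormalDistribution[{0, 0}, {{1, 0}, {0, 1}}],
  MultinormalDistribution[{z, z}, {{1, 0}, {0, 1}}]]]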

Scope (3) 

Compute the Kullback–Leibler divergence of two normal distributions:

In[3]:=
ResourceFunction["MultinormalKLDivergence"][NormalDistribution[], NormalDistribution[3, 4]]
Out[3]=
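
For univariate normals this matches the familiar scalar formula Log[sigma1/sigma0] + (sigma0^2 + (mu0 - mu1)^2)/(2 sigma1^2) - 1/2; a quick check, assuming the same KL(dist1 || dist2) convention as above (mu0, sigma0 parameterize the first distribution):

With[{mu0 = 0, sigma0 = 1, mu1 = 3, sigma1 = 4},
 Log[sigma1/sigma0] + (sigma0^2 + (mu0 - mu1)^2)/(2 sigma1^2) - 1/2]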

Compute the Kullback–Leibler divergence of two binormal distributions:

In[4]:=
ResourceFunction["MultinormalKLDivergence"][BinormalDistribution[3/5],
  BinormalDistribution[{1, 2}, {3, 4}, -1/7]]
Out[4]=

Compute the Kullback–Leibler divergence symbolically:

In[5]:=
ResourceFunction["MultinormalKLDivergence"][
   MultinormalDistribution[{m11, m12}, {{c111, c112}, {c121, c122}}],
   MultinormalDistribution[{m21, m22}, {{c211, c212}, {c221, c222}}]] // Simplify
Out[5]=

Applications (3) 

Compare the Kullback–Leibler divergence of two isotropic multinormal distributions as the scales of their covariance matrices change:

In[6]:=
Plot3D[ResourceFunction["MultinormalKLDivergence"][
   MultinormalDistribution[{0, 0}, {{\[Sigma]1, 0}, {0, \[Sigma]1}}],
   MultinormalDistribution[{1, 1}, {{\[Sigma]2, 0}, {0, \[Sigma]2}}]],
  {\[Sigma]1, 0.1, 5}, {\[Sigma]2, 0.1, 5},
  AxesLabel -> {"\[Sigma]1", "\[Sigma]2", "KL Divergence"}, PlotRange -> All,
  ScalingFunctions -> "Log", ColorFunction -> "AlpineColors"]
Out[6]=
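
Under the same KL(dist1 || dist2) convention, this surface has the closed form \[Sigma]1/\[Sigma]2 + 1/\[Sigma]2 - 1 + Log[\[Sigma]2/\[Sigma]1], which can be confirmed symbolically:

Simplify[ResourceFunction["MultinormalKLDivergence"][
   MultinormalDistribution[{0, 0}, {{\[Sigma]1, 0}, {0, \[Sigma]1}}],
   MultinormalDistribution[{1, 1}, {{\[Sigma]2, 0}, {0, \[Sigma]2}}]],
  \[Sigma]1 > 0 && \[Sigma]2 > 0]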

Use a Manipulate to compare two multinormal distributions and show their Kullback–Leibler divergence:

In[7]:=
klplot[mean_, c11_, c12_, c22_] :=
 Module[{cv2 = {{c11, c12}, {c12, c22}},
   dist1 = MultinormalDistribution[{0, 0}, {{1, 0}, {0, 1}}], dist2, label},
  (* label the plot with the divergence when the covariance matrix is valid *)
  If[PositiveDefiniteMatrixQ[cv2],
   dist2 = MultinormalDistribution[mean, cv2];
   label = ResourceFunction["MultinormalKLDivergence"][dist1, dist2],
   label = "Covariance matrix of second distribution is not positive definite"];
  Plot3D[{PDF[dist1, {x, y}], PDF[dist2, {x, y}]}, {x, -8, 8}, {y, -8, 8},
   ImageSize -> 300, ImagePadding -> 40, PlotRange -> {0.0001, All},
   MaxRecursion -> 5, MeshFunctions -> {#3 &}, PlotLabel -> label]]
In[8]:=
Manipulate[
 klplot[mean, c11, c12, c22],
 {{mean, {1, 1}}, {-4, -4}, {4, 4}},
 {{c11, 1}, 0.1, 3, 0.1, Appearance -> "Labeled"},
 {{c12, 0}, -3, 3, 0.1, Appearance -> "Labeled"},
 {{c22, 1}, 0.1, 3, 0.1, Appearance -> "Labeled"},
 ControlPlacement -> Left]
Out[8]=

Compute the symbolic Kullback–Leibler divergence between a multinormal distribution and a copy of itself whose covariance matrix has been rotated:

In[9]:=
ResourceFunction["MultinormalKLDivergence"][
   MultinormalDistribution[{0, 0}, {{1, 0}, {0, 1}}],
   MultinormalDistribution[{0, 0}, RotationMatrix[\[Theta]]]] // Simplify
Out[9]=

The same computation, but with a three-dimensional multinormal distribution:

In[10]:=
ResourceFunction["MultinormalKLDivergence"][
   MultinormalDistribution[{0, 0, 0}, {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}],
   MultinormalDistribution[{0, 0, 0},
    RotationMatrix[\[Theta], {x, y, z}] //. Conjugate[x_] :> x // Simplify]] //
  FullSimplify[#, {x > 0, y > 0, z > 0}] &
Out[10]=

Properties and Relations (2) 

The more general resource function KullbackLeiblerDivergence also handles multinormal distributions, but MultinormalKLDivergence is somewhat faster even for low-dimensional distributions:

In[11]:=
ResourceFunction["KullbackLeiblerDivergence"][
  MultinormalDistribution[{0, 0}, {{1, 0}, {0, 1}}],
  MultinormalDistribution[{1, 1}, {{1, 0}, {0, 1}}]] // RepeatedTiming
Out[11]=
In[12]:=
ResourceFunction["MultinormalKLDivergence"][
  MultinormalDistribution[{0, 0}, {{1, 0}, {0, 1}}],
  MultinormalDistribution[{1, 1}, {{1, 0}, {0, 1}}]] // RepeatedTiming
Out[12]=

MultinormalKLDivergence is much faster for higher-dimensional distributions:

In[13]:=
n = 20;
(* m.Transpose[m] is symmetric and almost surely positive definite, so it is a valid covariance matrix *)
cov1 = With[{m = RandomVariate[NormalDistribution[], {n, n}]}, m . Transpose[m]];
cov2 = With[{m = RandomVariate[NormalDistribution[], {n, n}]}, m . Transpose[m]];
dist1 = MultinormalDistribution[RandomVariate[NormalDistribution[], n], cov1];
dist2 = MultinormalDistribution[RandomVariate[NormalDistribution[], n], cov2];
In[14]:=
ResourceFunction["KullbackLeiblerDivergence"][dist1, dist2] // RepeatedTiming
Out[14]=
In[15]:=
ResourceFunction["MultinormalKLDivergence"][dist1, dist2] // RepeatedTiming
Out[15]=
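
Assuming both resource functions use the same directional convention, their results should agree to machine precision; a quick check, reusing dist1 and dist2 from above:

Abs[ResourceFunction["MultinormalKLDivergence"][dist1, dist2] -
  ResourceFunction["KullbackLeiblerDivergence"][dist1, dist2]]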

Possible Issues (2) 

If the two multinormal distributions have different dimensions, the function returns a Failure object:

In[16]:=
ResourceFunction["MultinormalKLDivergence"][
 MultinormalDistribution[{0, 0}, DiagonalMatrix[{1, 1}]], MultinormalDistribution[{0, 0, 0}, DiagonalMatrix[{1, 1, 1}]]]
Out[16]=

If either distribution is not a multivariate normal distribution, the function returns a Failure object:

In[17]:=
ResourceFunction["MultinormalKLDivergence"][PoissonDistribution[1/2], MultinormalDistribution[{1, 2}, {{3, 1}, {1, 4}}]]
Out[17]=
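
Arguments can be screened before calling the function; a minimal sketch (validPairQ is a hypothetical helper, not part of this resource function):

validPairQ[d1_, d2_] :=
 MatchQ[{d1, d2},
   {(_MultinormalDistribution | _BinormalDistribution | _NormalDistribution) ..}] &&
  SameQ @@ Map[Length[Flatten[{Mean[#]}]] &, {d1, d2}]

(* returns False for the mismatched pair above *)
validPairQ[PoissonDistribution[1/2], MultinormalDistribution[{1, 2}, {{3, 1}, {1, 4}}]]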

Neat Examples (5) 

Use the Kullback–Leibler divergence as a regularization term in a variational autoencoder. Create an expression for the Kullback–Leibler divergence between a standard multinormal distribution (zero mean, identity covariance) and a multinormal distribution whose means and log variances are supplied as slot variables:

In[18]:=
Quiet@ResourceFunction["MultinormalKLDivergence"][
   MultinormalDistribution[{0, 0}, DiagonalMatrix[{1, 1}]],
   MultinormalDistribution[{#1[[1]], #1[[2]]},
    DiagonalMatrix[Exp@{#2[[1]], #2[[2]]}]]] // InputForm
Out[18]=

Use the resulting expression to create a function layer for regularization; it is the standard variational autoencoder term (Exp[l1] + Exp[l2] + m1^2 + m2^2 - l1 - l2 - 2)/2, where mi are the means and li the log variances:

In[19]:=
regularizer = FunctionLayer[
  Apply[1/2 (-2 + Exp[#2[[1]]] + Exp[#2[[2]]] - #2[[1]] - #2[[2]] + #1[[1]]^2 + #1[[2]]^2) &]]
Out[19]=

Build a variational autoencoder architecture that uses the Kullback–Leibler divergence to prevent overfitting and to encourage the model to learn a compact and meaningful representation of the data:

In[20]:=
Information[
 net = NetGraph[
   Association[
    "linear1" -> LinearLayer[10],
    "nonlinear1" -> Tanh,
    "mean" -> LinearLayer[2, "Input" -> 10, "Biases" -> None],
    "logvar" -> LinearLayer[2, "Input" -> 10, "Biases" -> None],
    "exp" -> ElementwiseLayer[Exp],
    "z" -> RandomArrayLayer[NormalDistribution[0, 1] &, "Output" -> 2],
    "randomValue" -> FunctionLayer[Apply[#1 + #2*#3 &]],
    "reconstructor" -> LinearLayer[4],
    "reconstructionLoss" -> MeanSquaredLossLayer[],
    "regularizationLoss" -> regularizer,
    "weightedTotalLoss" -> FunctionLayer[Apply[#1 + 0.025*#2 &]]],
   {NetPort["x"] -> "linear1" -> "nonlinear1" -> {"mean", "logvar"},
    "logvar" -> "exp",
    {"mean", "exp", "z"} -> "randomValue" -> "reconstructor",
    {"reconstructor", NetPort["x"]} -> "reconstructionLoss",
    {"mean", "logvar"} -> "regularizationLoss",
    {"reconstructionLoss", "regularizationLoss"} -> "weightedTotalLoss" -> NetPort["loss"],
    "reconstructor" -> NetPort["reconstruction"]},
   "x" -> 4, "reconstruction" -> 4],
 "SummaryGraphic"]
Out[20]=

Create structured data for the variational autoencoder to capture:

In[21]:=
self[edgeList_List] := First[Last /@ edgeList]; (* recover the vertex being evaluated from its incoming edges *)
g1 = Graph[{1, 2, 3, 4}, {1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4}, VertexLabels -> Placed[Automatic, Center], VertexSize -> Large];
SeedRandom[20230211];
initialization1 = 1 -> RandomVariate[DiscreteUniformDistribution[{1, 10}], 100];
functions1 = {x, y, z} |-> Switch[self[z],
    2, MapThread[v |-> RandomVariate[DiscreteUniformDistribution[{v, v + 3}]], x],
    3, MapThread[v |-> RandomVariate[NormalDistribution[v, 1]], x],
    4, MapThread[{v1, v2} |-> RandomVariate[NormalDistribution[2 v1 - v2, 1]], x]];
In[22]:=
(x = Thread[Values[ResourceFunction["DirectedAcyclicEvaluate"][g1, initialization1,
       functions1][["VertexWeights"]]]]) // Short
Out[22]=
In[23]:=
trained = NetTrain[net, Association["x" -> x], "LossFunction" -> "loss"]
Out[23]=

Use a RadialAxisPlot to show the smooth and compact reconstruction of the original vector from a latent space:

In[24]:=
f = NetExtract[trained, "reconstructor"]
Out[24]=
In[25]:=
Manipulate[
 RadialAxisPlot[f[x], PlotRange -> {-5, 20}], {{x, {0, 0}}, {-3, -3}, {3, 3}, {0.1, 0.1}},
  ControlPlacement -> Left]
Out[25]=

Publisher

Seth J. Chandler

Version History

  • 1.0.0 – 22 February 2023
