Function Repository Resource:

MultinormalKLDivergence

Compute the Kullback–Leibler divergence between two multinormal distributions

Contributed by: Seth J. Chandler

ResourceFunction["MultinormalKLDivergence"][dist1,dist2]

computes the Kullback–Leibler divergence between the multinormal distributions dist1 and dist2.

Details

MultinormalKLDivergence also works with a univariate NormalDistribution or a bivariate BinormalDistribution, treating each as a special case of the more general MultinormalDistribution.
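
For reference, the Kullback–Leibler divergence between two k-dimensional multinormal distributions has a well-known closed form. A minimal sketch of that formula in Wolfram Language, assuming MultinormalKLDivergence[dist1, dist2] follows the usual KL(dist1 || dist2) argument convention (klClosedForm is a hypothetical helper, not part of this resource function):

klClosedForm[{mu0_, cov0_}, {mu1_, cov1_}] :=
 With[{k = Length[mu0], inv1 = Inverse[cov1]},
  (* trace term + Mahalanobis term - dimension + log-determinant ratio, all halved *)
  (Tr[inv1 . cov0] + (mu1 - mu0) . inv1 . (mu1 - mu0) - k +
     Log[Det[cov1]/Det[cov0]])/2]

Under that convention, klClosedForm[{{0, 0}, {{1, 1/2}, {1/2, 1/3}}}, {{1, 1}, {{1, 1}, {1, 2}}}] should reproduce the first basic example below.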

Examples

Basic Examples (2) 

Compute the Kullback–Leibler divergence for two multinormal distributions:

In[1]:=
ResourceFunction["MultinormalKLDivergence"][
 MultinormalDistribution[{0, 0}, {{1, 1/2}, {1/2, 1/3}}],
 MultinormalDistribution[{1, 1}, {{1, 1}, {1, 2}}]]
Out[1]=

Plot how the Kullback–Leibler divergence varies as the mean of the second multinormal distribution moves along a line through the origin:

In[2]:=
Plot[ResourceFunction["MultinormalKLDivergence"][
  MultinormalDistribution[{0, 0}, {{1, 0}, {0, 1}}],
  MultinormalDistribution[{z, z}, {{1, 0}, {0, 1}}]],
 {z, -3, 3}, AxesLabel -> {"z", "KL Divergence"}]
Out[2]=
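
Since both covariance matrices are the identity, the divergence reduces to half the squared distance between the means, so the curve above is exactly z^2; a quick symbolic check:

Simplify[ResourceFunction["MultinormalKLDivergence"][
  MultinormalDistribution[{0, 0}, {{1, 0}, {0, 1}}],
  MultinormalDistribution[{z, z}, {{1, 0}, {0, 1}}]]]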

Scope (3) 

Compute the Kullback–Leibler divergence of two normal distributions:

In[3]:=
ResourceFunction["MultinormalKLDivergence"][NormalDistribution[], NormalDistribution[3, 4]]
Out[3]=
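
For univariate normals this matches the familiar scalar formula Log[sigma1/sigma0] + (sigma0^2 + (mu0 - mu1)^2)/(2 sigma1^2) - 1/2; a quick check, assuming the same KL(dist1 || dist2) convention as above (mu0, sigma0 parameterize the first distribution):

With[{mu0 = 0, sigma0 = 1, mu1 = 3, sigma1 = 4},
 Log[sigma1/sigma0] + (sigma0^2 + (mu0 - mu1)^2)/(2 sigma1^2) - 1/2]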

Compute the Kullback–Leibler divergence of two binormal distributions:

In[4]:=
ResourceFunction["MultinormalKLDivergence"][BinormalDistribution[3/5],
  BinormalDistribution[{1, 2}, {3, 4}, -1/7]]
Out[4]=

Compute the Kullback–Leibler divergence symbolically:

In[5]:=
ResourceFunction["MultinormalKLDivergence"][
   MultinormalDistribution[{m11, m12}, {{c111, c112}, {c121, c122}}],
   MultinormalDistribution[{m21, m22}, {{c211, c212}, {c221, c222}}]] // Simplify
Out[5]=

Applications (3) 

Compare the Kullback–Leibler divergence of two isotropic multinormal distributions as the scales of their covariance matrices change:

In[6]:=
Plot3D[ResourceFunction["MultinormalKLDivergence"][
   MultinormalDistribution[{0, 0}, {{\[Sigma]1, 0}, {0, \[Sigma]1}}],
   MultinormalDistribution[{1, 1}, {{\[Sigma]2, 0}, {0, \[Sigma]2}}]],
  {\[Sigma]1, 0.1, 5}, {\[Sigma]2, 0.1, 5},
  AxesLabel -> {"\[Sigma]1", "\[Sigma]2", "KL Divergence"}, PlotRange -> All,
  ScalingFunctions -> "Log", ColorFunction -> "AlpineColors"]
Out[6]=
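
Under the same KL(dist1 || dist2) convention, this surface has the closed form \[Sigma]1/\[Sigma]2 + 1/\[Sigma]2 - 1 + Log[\[Sigma]2/\[Sigma]1], which can be confirmed symbolically:

Simplify[ResourceFunction["MultinormalKLDivergence"][
   MultinormalDistribution[{0, 0}, {{\[Sigma]1, 0}, {0, \[Sigma]1}}],
   MultinormalDistribution[{1, 1}, {{\[Sigma]2, 0}, {0, \[Sigma]2}}]],
  \[Sigma]1 > 0 && \[Sigma]2 > 0]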

Use a Manipulate to compare two multinormal distributions and show their Kullback–Leibler divergence:

In[7]:=
klplot[mean_, c11_, c12_, c22_] :=
 Module[{cv2 = {{c11, c12}, {c12, c22}},
   dist1 = MultinormalDistribution[{0, 0}, {{1, 0}, {0, 1}}], dist2, label},
  (* label the plot with the divergence when the covariance matrix is valid *)
  If[PositiveDefiniteMatrixQ[cv2],
   dist2 = MultinormalDistribution[mean, cv2];
   label = ResourceFunction["MultinormalKLDivergence"][dist1, dist2],
   label = "Covariance matrix of second distribution is not positive definite"];
  Plot3D[{PDF[dist1, {x, y}], PDF[dist2, {x, y}]}, {x, -8, 8}, {y, -8, 8},
   ImageSize -> 300, ImagePadding -> 40, PlotRange -> {0.0001, All},
   MaxRecursion -> 5, MeshFunctions -> {#3 &}, PlotLabel -> label]]
In[8]:=
Manipulate[
 klplot[mean, c11, c12, c22],
 {{mean, {1, 1}}, {-4, -4}, {4, 4}},
 {{c11, 1}, 0.1, 3, 0.1, Appearance -> "Labeled"},
 {{c12, 0}, -3, 3, 0.1, Appearance -> "Labeled"},
 {{c22, 1}, 0.1, 3, 0.1, Appearance -> "Labeled"},
 ControlPlacement -> Left]
Out[8]=

Compute the symbolic Kullback–Leibler divergence between a multinormal distribution and a copy of itself whose covariance matrix has been rotated:

In[9]:=
ResourceFunction["MultinormalKLDivergence"][
   MultinormalDistribution[{0, 0}, {{1, 0}, {0, 1}}],
   MultinormalDistribution[{0, 0}, RotationMatrix[\[Theta]]]] // Simplify
Out[9]=

The same computation, but with a three-dimensional multinormal distribution:

In[10]:=
ResourceFunction["MultinormalKLDivergence"][
   MultinormalDistribution[{0, 0, 0}, {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}],
   MultinormalDistribution[{0, 0, 0},
    RotationMatrix[\[Theta], {x, y, z}] //. Conjugate[x_] :> x // Simplify]] //
  FullSimplify[#, {x > 0, y > 0, z > 0}] &
Out[10]=

Properties and Relations (2) 

The more general resource function KullbackLeiblerDivergence also handles multinormal distributions, but MultinormalKLDivergence is somewhat faster even for low-dimensional distributions:

In[11]:=
ResourceFunction["KullbackLeiblerDivergence"][
  MultinormalDistribution[{0, 0}, {{1, 0}, {0, 1}}],
  MultinormalDistribution[{1, 1}, {{1, 0}, {0, 1}}]] // RepeatedTiming
Out[11]=
In[12]:=
ResourceFunction["MultinormalKLDivergence"][
  MultinormalDistribution[{0, 0}, {{1, 0}, {0, 1}}],
  MultinormalDistribution[{1, 1}, {{1, 0}, {0, 1}}]] // RepeatedTiming
Out[12]=

MultinormalKLDivergence is much faster for higher-dimensional distributions:

In[13]:=
n = 20;
(* m.Transpose[m] is symmetric and almost surely positive definite, so it is a valid covariance matrix *)
cov1 = With[{m = RandomVariate[NormalDistribution[], {n, n}]}, m . Transpose[m]];
cov2 = With[{m = RandomVariate[NormalDistribution[], {n, n}]}, m . Transpose[m]];
dist1 = MultinormalDistribution[RandomVariate[NormalDistribution[], n], cov1];
dist2 = MultinormalDistribution[RandomVariate[NormalDistribution[], n], cov2];
In[14]:=
ResourceFunction["KullbackLeiblerDivergence"][dist1, dist2] // RepeatedTiming
Out[14]=
In[15]:=
ResourceFunction["MultinormalKLDivergence"][dist1, dist2] // RepeatedTiming
Out[15]=
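
Assuming both resource functions use the same directional convention, their results should agree to machine precision; a quick check, reusing dist1 and dist2 from above:

Abs[ResourceFunction["MultinormalKLDivergence"][dist1, dist2] -
  ResourceFunction["KullbackLeiblerDivergence"][dist1, dist2]]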

Possible Issues (2) 

If the two multinormal distributions have different dimensions, the function returns a Failure object:

In[16]:=
ResourceFunction["MultinormalKLDivergence"][
 MultinormalDistribution[{0, 0}, DiagonalMatrix[{1, 1}]], MultinormalDistribution[{0, 0, 0}, DiagonalMatrix[{1, 1, 1}]]]
Out[16]=

If either distribution is not a multivariate normal distribution, the function returns a Failure object:

In[17]:=
ResourceFunction["MultinormalKLDivergence"][PoissonDistribution[1/2], MultinormalDistribution[{1, 2}, {{3, 1}, {1, 4}}]]
Out[17]=
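
Arguments can be screened before calling the function; a minimal sketch (validPairQ is a hypothetical helper, not part of this resource function):

validPairQ[d1_, d2_] :=
 MatchQ[{d1, d2},
   {(_MultinormalDistribution | _BinormalDistribution | _NormalDistribution) ..}] &&
  SameQ @@ Map[Length[Flatten[{Mean[#]}]] &, {d1, d2}]

(* returns False for the mismatched pair above *)
validPairQ[PoissonDistribution[1/2], MultinormalDistribution[{1, 2}, {{3, 1}, {1, 4}}]]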

Neat Examples (5) 

Use the Kullback–Leibler divergence as a regularization term in a variational autoencoder. Create an expression for the Kullback–Leibler divergence between a standard multinormal distribution (zero mean, identity covariance) and a multinormal distribution whose means and log variances are supplied as slot variables:

In[18]:=
Quiet@ResourceFunction["MultinormalKLDivergence"][
   MultinormalDistribution[{0, 0}, DiagonalMatrix[{1, 1}]],
   MultinormalDistribution[{#1[[1]], #1[[2]]},
    DiagonalMatrix[Exp@{#2[[1]], #2[[2]]}]]] // InputForm
Out[18]=

Use the resulting expression to create a function layer for regularization; it is the standard variational autoencoder term (Exp[l1] + Exp[l2] + m1^2 + m2^2 - l1 - l2 - 2)/2, where mi are the means and li the log variances:

In[19]:=
regularizer = FunctionLayer[
  Apply[1/2 (-2 + Exp[#2[[1]]] + Exp[#2[[2]]] - #2[[1]] - #2[[2]] + #1[[1]]^2 + #1[[2]]^2) &]]
Out[19]=

Build a variational autoencoder architecture that uses the Kullback–Leibler divergence to prevent overfitting and to encourage the model to learn a compact and meaningful representation of the data:

In[20]:=
Information[
 net = NetGraph[
   Association[
    "linear1" -> LinearLayer[10],
    "nonlinear1" -> Tanh,
    "mean" -> LinearLayer[2, "Input" -> 10, "Biases" -> None],
    "logvar" -> LinearLayer[2, "Input" -> 10, "Biases" -> None],
    "exp" -> ElementwiseLayer[Exp],
    "z" -> RandomArrayLayer[NormalDistribution[0, 1] &, "Output" -> 2],
    "randomValue" -> FunctionLayer[Apply[#1 + #2*#3 &]],
    "reconstructor" -> LinearLayer[4],
    "reconstructionLoss" -> MeanSquaredLossLayer[],
    "regularizationLoss" -> regularizer,
    "weightedTotalLoss" -> FunctionLayer[Apply[#1 + 0.025*#2 &]]],
   {NetPort["x"] -> "linear1" -> "nonlinear1" -> {"mean", "logvar"},
    "logvar" -> "exp",
    {"mean", "exp", "z"} -> "randomValue" -> "reconstructor",
    {"reconstructor", NetPort["x"]} -> "reconstructionLoss",
    {"mean", "logvar"} -> "regularizationLoss",
    {"reconstructionLoss", "regularizationLoss"} -> "weightedTotalLoss" -> NetPort["loss"],
    "reconstructor" -> NetPort["reconstruction"]},
   "x" -> 4, "reconstruction" -> 4],
 "SummaryGraphic"]
Out[20]=

Create structured data for the variational autoencoder to capture:

In[21]:=
self[edgeList_List] := First[Last /@ edgeList]; (* recover the vertex being evaluated from its incoming edges *)
g1 = Graph[{1, 2, 3, 4}, {1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4}, VertexLabels -> Placed[Automatic, Center], VertexSize -> Large];
SeedRandom[20230211];
initialization1 = 1 -> RandomVariate[DiscreteUniformDistribution[{1, 10}], 100];
functions1 = {x, y, z} |-> Switch[self[z],
    2, MapThread[v |-> RandomVariate[DiscreteUniformDistribution[{v, v + 3}]], x],
    3, MapThread[v |-> RandomVariate[NormalDistribution[v, 1]], x],
    4, MapThread[{v1, v2} |-> RandomVariate[NormalDistribution[2 v1 - v2, 1]], x]];
In[22]:=
(x = Thread[Values[ResourceFunction["DirectedAcyclicEvaluate"][g1, initialization1,
       functions1][["VertexWeights"]]]]) // Short
Out[22]=
In[23]:=
trained = NetTrain[net, Association["x" -> x], "LossFunction" -> "loss"]
Out[23]=

Use a RadialAxisPlot to show the smooth and compact reconstruction of the original vector from a latent space:

In[24]:=
f = NetExtract[trained, "reconstructor"]
Out[24]=
In[25]:=
Manipulate[
 RadialAxisPlot[f[x], PlotRange -> {-5, 20}], {{x, {0, 0}}, {-3, -3}, {3, 3}, {0.1, 0.1}},
  ControlPlacement -> Left]
Out[25]=

Publisher

Seth J. Chandler

Version History

  • 1.0.0 – 22 February 2023
