Wolfram Research

Function Repository Resource:

KullbackLeiblerDivergence

Calculate the Kullback–Leibler divergence between two distributions

Contributed by: Sjoerd Smit

ResourceFunction["KullbackLeiblerDivergence"][P,Q]

computes D_KL(P∥Q), the Kullback–Leibler divergence from distribution Q to distribution P.

Details and Options

The Kullback–Leibler divergence (D_KL) is an asymmetric measure of dissimilarity between two probability distributions P and Q. When it can be computed, it is always a number ≥ 0, with equality if and only if the two distributions agree almost everywhere. In a Bayesian setting, it represents the information gained when updating a prior distribution Q to a posterior distribution P. Conversely, it can be seen as the information lost when Q is used to approximate P. Because D_KL is asymmetric, D_KL(P∥Q) is generally different from D_KL(Q∥P).
P and Q must both satisfy DistributionParameterQ.
The support of P must be contained in the support of Q; in other words, a random variable distributed according to P may never take values that Q cannot take. Intuitively, this is because any value excluded by the prior is automatically also excluded by the posterior. If the supports do not satisfy this constraint, ResourceFunction["KullbackLeiblerDivergence"] issues a message and returns Undefined.
ResourceFunction["KullbackLeiblerDivergence"] works for multivariate distributions.
ResourceFunction["KullbackLeiblerDivergence"] has a Method option that allows the user to specify if the computation should be done with Expectation (default) or NExpectation. Sub-options for these functions can also be provided. For example, to pass suboptions into Expectation you can use Method {Expectation,opt1val1,}.

Examples

Basic Examples

Calculate the KullbackLeiblerDivergence between two NormalDistributions:

In[1]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[0, 1], NormalDistribution[1, 2]]
Out[1]=

KullbackLeiblerDivergence is not symmetric:

In[2]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[1, 2], NormalDistribution[0, 1]]
Out[2]=
In[3]:=
% == %%
Out[3]=

KullbackLeiblerDivergence works with symbolic distributions:

In[4]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[\[Mu]1, \[Sigma]1], NormalDistribution[\[Mu]2, \[Sigma]2]]
Out[4]=
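
For reference, this reproduces the well-known closed form for two normal distributions: Log[σ2/σ1] + (σ1^2 + (μ1 - μ2)^2)/(2 σ2^2) - 1/2.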

The distributions do not have to be from the same family:

In[5]:=
ResourceFunction["KullbackLeiblerDivergence"][
 ExponentialDistribution[\[Lambda]], NormalDistribution[\[Mu], \[Sigma]]]
Out[5]=

It also works with discrete distributions:

In[6]:=
ResourceFunction["KullbackLeiblerDivergence"][
 BernoulliDistribution[p1], BernoulliDistribution[p2]]
Out[6]=
In[7]:=
ResourceFunction["KullbackLeiblerDivergence"][
 BernoulliDistribution[p1], PoissonDistribution[\[Lambda]]]
Out[7]=
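
For reference, the Bernoulli–Bernoulli divergence above has the closed form p1 Log[p1/p2] + (1 - p1) Log[(1 - p1)/(1 - p2)].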

Use a custom defined ProbabilityDistribution:

In[8]:=
ResourceFunction["KullbackLeiblerDivergence"][
 ProbabilityDistribution[Sqrt[2]/(Pi*(1 + \[FormalX]^4)), {\[FormalX], -Infinity, Infinity}],
 NormalDistribution[]
 ]
Out[8]=

Scope

KullbackLeiblerDivergence works for multivariate distributions:

In[9]:=
ResourceFunction["KullbackLeiblerDivergence"][
 BinormalDistribution[\[Rho]1], BinormalDistribution[\[Rho]2]]
Out[9]=

KullbackLeiblerDivergence also works with EmpiricalDistribution:

In[10]:=
ResourceFunction["KullbackLeiblerDivergence"][
 EmpiricalDistribution[{"a", "b", "b"}],
 EmpiricalDistribution[{"a", "b", "c"}]
 ]
Out[10]=
In[11]:=
ResourceFunction["KullbackLeiblerDivergence"][
 BernoulliDistribution[p],
 EmpiricalDistribution[{0, 0, 1}]
 ]
Out[11]=
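
For discrete distributions such as these, the divergence reduces to a finite sum over the support of the first argument. A manual check of the previous result, written out directly as a sketch:

Sum[PDF[BernoulliDistribution[p], k]*
  Log[PDF[BernoulliDistribution[p], k]/PDF[EmpiricalDistribution[{0, 0, 1}], k]],
 {k, 0, 1}]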

Options

Method

Symbolic evaluation of the divergence is infeasible for some distributions:

In[12]:=
TimeConstrained[
 ResourceFunction["KullbackLeiblerDivergence"][
  NormalDistribution[0., 1.], StableDistribution[1, 1.3, 0.5, 0., 2.]], 10]
Out[12]=

Use NExpectation instead:

In[13]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[0., 1.], StableDistribution[1, 1.3, 0.5, 0., 2.], Method -> NExpectation]
Out[13]=

Supply extra options to NExpectation:

In[14]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[0, 1], SetPrecision[StableDistribution[1, 1.3, 0.5, 0., 2.], 20],
 Method -> {NExpectation, AccuracyGoal -> 5, PrecisionGoal -> 5, WorkingPrecision -> 20}
 ]
Out[14]=

NExpectation is designed to work with distributions that have numerical parameters; with symbolic parameters it will generally fail to evaluate:

In[15]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[\[Mu], \[Sigma]], StableDistribution[1, 1.3, 0.5, 0., 2.], Method -> NExpectation]
Out[15]=

Assumptions

Without Assumptions, a message is issued and conditions are generated in the result:

In[16]:=
ResourceFunction["KullbackLeiblerDivergence"][
 UniformDistribution[{0, n}], UniformDistribution[{0, m}]]
Out[16]=

With assumptions specified, a result valid under those conditions is returned:

In[17]:=
ResourceFunction["KullbackLeiblerDivergence"][
 UniformDistribution[{0, n}], UniformDistribution[{0, m}], Assumptions -> m > n]
Out[17]=
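
This agrees with the direct computation: for m ≥ n the integrand is (1/n) Log[m/n] on the interval [0, n], so the divergence is Log[m/n].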

Applications

If X and Y are two random variables with a joint distribution 𝒟, then the mutual information between them is defined as the Kullback–Leibler divergence from the product distribution of the marginals to 𝒟.

As an example, calculate the mutual information of the components of a BinormalDistribution:

In[18]:=
binormalDist = BinormalDistribution[{\[Mu]1, \[Mu]2}, {\[Sigma]1, \[Sigma]2}, \[Rho]];
ResourceFunction["KullbackLeiblerDivergence"][
 binormalDist,
 ProductDistribution @@ Map[MarginalDistribution[binormalDist, #] &, {1, 2}]
 ]
Out[19]=
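
For reference, the mutual information of a binormal distribution depends only on the correlation; the result above should simplify to -Log[1 - ρ^2]/2.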

The Kullback–Leibler divergence can be used to fit distributions to data, and it provides a measure of the quality of the fit in a manner closely related to maximum likelihood estimation. First generate some samples from a discrete distribution:

In[20]:=
data = RandomVariate[PoissonDistribution[2], 100];
Histogram[data, {1}, "PDF", PlotRange -> All]
Out[21]=

Calculate the divergence from the EmpiricalDistribution to a symbolic target distribution:

In[22]:=
dataDist = EmpiricalDistribution[data];
ResourceFunction["KullbackLeiblerDivergence"][dataDist, PoissonDistribution[\[Mu]]]
Out[23]=

Minimize with respect to μ:

In[24]:=
NMinimize[{%, DistributionParameterAssumptions[PoissonDistribution[\[Mu]]]}, \[Mu]]
Out[24]=
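
Since minimizing the divergence from an empirical distribution is equivalent to maximum likelihood estimation, the minimizing μ should coincide with the Poisson maximum likelihood estimate, the sample mean:

N[Mean[data]] (* should match the minimizing \[Mu] found above *)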

Try a different distribution to compare:

In[25]:=
ResourceFunction["KullbackLeiblerDivergence"][dataDist, GeometricDistribution[p]]
Out[25]=

The minimized divergence is larger, indicating this distribution is a worse approximation to the data:

In[26]:=
NMinimize[{%, DistributionParameterAssumptions[GeometricDistribution[p]]}, p]
Out[26]=

Note that the reverse divergences are not defined, because PoissonDistribution and GeometricDistribution have infinite supports that are not contained in the finite support of the empirical distribution:

In[27]:=
ResourceFunction["KullbackLeiblerDivergence"][
 PoissonDistribution[\[Mu]], dataDist]
Out[27]=

Also note that this does not work for continuous data because the support of EmpiricalDistribution is discrete:

In[28]:=
ResourceFunction["KullbackLeiblerDivergence"][
 EmpiricalDistribution[RandomVariate[NormalDistribution[], 100]],
 NormalDistribution[]
 ]
Out[28]=

Use a KernelMixtureDistribution instead for continuous data:

In[29]:=
ResourceFunction["KullbackLeiblerDivergence"][
 KernelMixtureDistribution[RandomVariate[NormalDistribution[], 100]],
 NormalDistribution[],
 Method -> NExpectation
 ]
Out[29]=

Properties and Relations

The divergence from a distribution to itself is zero:

In[30]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[\[Mu], \[Sigma]], NormalDistribution[\[Mu], \[Sigma]]]
Out[30]=
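
The divergence can also be written as cross-entropy minus entropy. A quick check with arbitrary example distributions (pDist and qDist are illustrative names, not part of this resource):

pDist = BernoulliDistribution[3/10]; qDist = BernoulliDistribution[3/5];
kl = ResourceFunction["KullbackLeiblerDivergence"][pDist, qDist];
crossEntropy = Expectation[-Log[PDF[qDist, \[FormalX]]], \[FormalX] \[Distributed] pDist];
entropy = Expectation[-Log[PDF[pDist, \[FormalX]]], \[FormalX] \[Distributed] pDist];
FullSimplify[kl - (crossEntropy - entropy)] (* should be 0 *)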

Possible Issues

The dimensions of the distributions have to match:

In[31]:=
ResourceFunction["KullbackLeiblerDivergence"][
 MultinormalDistribution[IdentityMatrix[2]], MultinormalDistribution[IdentityMatrix[3]]]
Out[31]=

Matrix distributions are currently not supported:

In[32]:=
ResourceFunction["KullbackLeiblerDivergence"][
 WishartMatrixDistribution[3, IdentityMatrix[2]], WishartMatrixDistribution[4, IdentityMatrix[2]]]
Out[32]=

The divergence is Undefined if the first distribution has a wider support than the second:

In[33]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[\[Mu], \[Sigma]], ExponentialDistribution[\[Lambda]]]
Out[33]=
In[34]:=
ResourceFunction["KullbackLeiblerDivergence"][
 PoissonDistribution[\[Lambda]], BernoulliDistribution[p1]]
Out[34]=

KullbackLeiblerDivergence is undefined between discrete and continuous distributions:

In[35]:=
ResourceFunction["KullbackLeiblerDivergence"][
 BernoulliDistribution[1/2], NormalDistribution[]]
Out[35]=

For some symbolic distributions the expectation cannot be evaluated:

In[36]:=
ResourceFunction["KullbackLeiblerDivergence"][
 BernoulliDistribution[p2], BinomialDistribution[n, p1]]
Out[36]=
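
A possible workaround (a sketch, assuming numerical parameter values are acceptable) is to substitute numbers and evaluate numerically:

ResourceFunction["KullbackLeiblerDivergence"][
 BernoulliDistribution[0.3], BinomialDistribution[10, 0.4],
 Method -> NExpectation]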
