Wolfram Research

Function Repository Resource:

KullbackLeiblerDivergence

Calculate the Kullback–Leibler divergence between two distributions

Contributed by: Sjoerd Smit

ResourceFunction["KullbackLeiblerDivergence"][P,Q]

computes D_KL(P∥Q), the Kullback–Leibler divergence from distribution Q to distribution P.

Details and Options

The Kullback–Leibler divergence (D_KL) is an asymmetric measure of dissimilarity between two probability distributions P and Q. When it can be computed, it is always a number ≥ 0, with equality if and only if the two distributions agree almost everywhere. In a Bayesian setting, it represents the information gained when updating a prior distribution Q to a posterior distribution P. Conversely, it can be seen as the information lost when Q is used to approximate P. Because D_KL is asymmetric, D_KL(P∥Q) is generally different from D_KL(Q∥P).
P and Q must both satisfy DistributionParameterQ.
The support of P must be contained in the support of Q; in other words, a random variable distributed according to P may never take values that Q cannot take. Intuitively, this is because any value excluded by the prior is automatically also excluded by the posterior. If the supports do not satisfy this constraint, ResourceFunction["KullbackLeiblerDivergence"] issues a message and returns Undefined.
ResourceFunction["KullbackLeiblerDivergence"] works for multivariate distributions.
ResourceFunction["KullbackLeiblerDivergence"] has a Method option that allows the user to specify if the computation should be done with Expectation (default) or NExpectation. Sub-options for these functions can also be provided. For example, to pass suboptions into Expectation you can use Method {Expectation,opt1val1,}.

Examples

Basic Examples

Calculate the KullbackLeiblerDivergence between two NormalDistributions:

In[1]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[0, 1], NormalDistribution[1, 2]]
Out[1]=

KullbackLeiblerDivergence is not symmetric:

In[2]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[1, 2], NormalDistribution[0, 1]]
Out[2]=
In[3]:=
% == %%
Out[3]=

KullbackLeiblerDivergence works with symbolic distributions:

In[4]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[\[Mu]1, \[Sigma]1], NormalDistribution[\[Mu]2, \[Sigma]2]]
Out[4]=
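
For reference, this reproduces the well-known closed form for two normal distributions: Log[σ2/σ1] + (σ1^2 + (μ1 - μ2)^2)/(2 σ2^2) - 1/2.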

The distributions do not have to be from the same family:

In[5]:=
ResourceFunction["KullbackLeiblerDivergence"][
 ExponentialDistribution[\[Lambda]], NormalDistribution[\[Mu], \[Sigma]]]
Out[5]=

It also works with discrete distributions:

In[6]:=
ResourceFunction["KullbackLeiblerDivergence"][
 BernoulliDistribution[p1], BernoulliDistribution[p2]]
Out[6]=
In[7]:=
ResourceFunction["KullbackLeiblerDivergence"][
 BernoulliDistribution[p1], PoissonDistribution[\[Lambda]]]
Out[7]=
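
For reference, the Bernoulli–Bernoulli divergence above has the closed form p1 Log[p1/p2] + (1 - p1) Log[(1 - p1)/(1 - p2)].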

Use a custom defined ProbabilityDistribution:

In[8]:=
ResourceFunction["KullbackLeiblerDivergence"][
 ProbabilityDistribution[Sqrt[2]/(Pi*(1 + \[FormalX]^4)), {\[FormalX], -Infinity, Infinity}],
 NormalDistribution[]
 ]
Out[8]=

Scope

KullbackLeiblerDivergence works for multivariate distributions:

In[9]:=
ResourceFunction["KullbackLeiblerDivergence"][
 BinormalDistribution[\[Rho]1], BinormalDistribution[\[Rho]2]]
Out[9]=

KullbackLeiblerDivergence also works with EmpiricalDistribution:

In[10]:=
ResourceFunction["KullbackLeiblerDivergence"][
 EmpiricalDistribution[{"a", "b", "b"}],
 EmpiricalDistribution[{"a", "b", "c"}]
 ]
Out[10]=
In[11]:=
ResourceFunction["KullbackLeiblerDivergence"][
 BernoulliDistribution[p],
 EmpiricalDistribution[{0, 0, 1}]
 ]
Out[11]=
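
For discrete distributions such as these, the divergence reduces to a finite sum over the support of the first argument. A manual check of the previous result, written out directly as a sketch:

Sum[PDF[BernoulliDistribution[p], k]*
  Log[PDF[BernoulliDistribution[p], k]/PDF[EmpiricalDistribution[{0, 0, 1}], k]],
 {k, 0, 1}]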

Options

Method

Symbolic evaluation of the divergence is infeasible for some distributions:

In[12]:=
TimeConstrained[
 ResourceFunction["KullbackLeiblerDivergence"][
  NormalDistribution[0., 1.], StableDistribution[1, 1.3, 0.5, 0., 2.]], 10]
Out[12]=

Use NExpectation instead:

In[13]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[0., 1.], StableDistribution[1, 1.3, 0.5, 0., 2.], Method -> NExpectation]
Out[13]=

Supply extra options to NExpectation:

In[14]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[0, 1], SetPrecision[StableDistribution[1, 1.3, 0.5, 0., 2.], 20],
 Method -> {NExpectation, AccuracyGoal -> 5, PrecisionGoal -> 5, WorkingPrecision -> 20}
 ]
Out[14]=

NExpectation is designed to work with distributions that have numerical parameters; with symbolic parameters it will generally fail to evaluate:

In[15]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[\[Mu], \[Sigma]], StableDistribution[1, 1.3, 0.5, 0., 2.], Method -> NExpectation]
Out[15]=

Assumptions

Without Assumptions, a message is issued and conditions are generated in the result:

In[16]:=
ResourceFunction["KullbackLeiblerDivergence"][
 UniformDistribution[{0, n}], UniformDistribution[{0, m}]]
Out[16]=

With assumptions specified, a result valid under those conditions is returned:

In[17]:=
ResourceFunction["KullbackLeiblerDivergence"][
 UniformDistribution[{0, n}], UniformDistribution[{0, m}], Assumptions -> m > n]
Out[17]=
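
This agrees with the direct computation: for m ≥ n the integrand is (1/n) Log[m/n] on the interval [0, n], so the divergence is Log[m/n].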

Applications

If X and Y are two random variables with a joint distribution 𝒟, then the mutual information between them is defined as the Kullback–Leibler divergence from the product distribution of the marginals to 𝒟.

As an example, calculate the mutual information of the components of a BinormalDistribution:

In[18]:=
binormalDist = BinormalDistribution[{\[Mu]1, \[Mu]2}, {\[Sigma]1, \[Sigma]2}, \[Rho]];
ResourceFunction["KullbackLeiblerDivergence"][
 binormalDist,
 ProductDistribution @@ Map[MarginalDistribution[binormalDist, #] &, {1, 2}]
 ]
Out[19]=
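
For reference, the mutual information of a binormal distribution depends only on the correlation; the result above should simplify to -Log[1 - ρ^2]/2.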

The Kullback–Leibler divergence can be used to fit distributions to data, and it provides a measure of the quality of the fit in a manner closely related to maximum likelihood estimation. First generate some samples from a discrete distribution:

In[20]:=
data = RandomVariate[PoissonDistribution[2], 100];
Histogram[data, {1}, "PDF", PlotRange -> All]
Out[21]=

Calculate the divergence from the EmpiricalDistribution to a symbolic target distribution:

In[22]:=
dataDist = EmpiricalDistribution[data];
ResourceFunction["KullbackLeiblerDivergence"][dataDist, PoissonDistribution[\[Mu]]]
Out[23]=

Minimize with respect to μ:

In[24]:=
NMinimize[{%, DistributionParameterAssumptions[PoissonDistribution[\[Mu]]]}, \[Mu]]
Out[24]=
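
Since minimizing the divergence from an empirical distribution is equivalent to maximum likelihood estimation, the minimizing μ should coincide with the Poisson maximum likelihood estimate, the sample mean:

N[Mean[data]] (* should match the minimizing \[Mu] found above *)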

Try a different distribution to compare:

In[25]:=
ResourceFunction["KullbackLeiblerDivergence"][dataDist, GeometricDistribution[p]]
Out[25]=

The minimized divergence is larger, indicating this distribution is a worse approximation to the data:

In[26]:=
NMinimize[{%, DistributionParameterAssumptions[GeometricDistribution[p]]}, p]
Out[26]=

Note that the reverse divergences are not defined, because PoissonDistribution and GeometricDistribution have infinite supports that are not contained in the finite support of the empirical distribution:

In[27]:=
ResourceFunction["KullbackLeiblerDivergence"][
 PoissonDistribution[\[Mu]], dataDist]
Out[27]=

Also note that this does not work for continuous data because the support of EmpiricalDistribution is discrete:

In[28]:=
ResourceFunction["KullbackLeiblerDivergence"][
 EmpiricalDistribution[RandomVariate[NormalDistribution[], 100]],
 NormalDistribution[]
 ]
Out[28]=

Use a KernelMixtureDistribution instead for continuous data:

In[29]:=
ResourceFunction["KullbackLeiblerDivergence"][
 KernelMixtureDistribution[RandomVariate[NormalDistribution[], 100]],
 NormalDistribution[],
 Method -> NExpectation
 ]
Out[29]=

Properties and Relations

The divergence from a distribution to itself is zero:

In[30]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[\[Mu], \[Sigma]], NormalDistribution[\[Mu], \[Sigma]]]
Out[30]=
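
The divergence can also be written as cross-entropy minus entropy. A quick check with arbitrary example distributions (pDist and qDist are illustrative names, not part of this resource):

pDist = BernoulliDistribution[3/10]; qDist = BernoulliDistribution[3/5];
kl = ResourceFunction["KullbackLeiblerDivergence"][pDist, qDist];
crossEntropy = Expectation[-Log[PDF[qDist, \[FormalX]]], \[FormalX] \[Distributed] pDist];
entropy = Expectation[-Log[PDF[pDist, \[FormalX]]], \[FormalX] \[Distributed] pDist];
FullSimplify[kl - (crossEntropy - entropy)] (* should be 0 *)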

Possible Issues

The dimensions of the distributions have to match:

In[31]:=
ResourceFunction["KullbackLeiblerDivergence"][
 MultinormalDistribution[IdentityMatrix[2]], MultinormalDistribution[IdentityMatrix[3]]]
Out[31]=

Matrix distributions are currently not supported:

In[32]:=
ResourceFunction["KullbackLeiblerDivergence"][
 WishartMatrixDistribution[3, IdentityMatrix[2]], WishartMatrixDistribution[4, IdentityMatrix[2]]]
Out[32]=

The divergence is Undefined if the first distribution has a wider support than the second:

In[33]:=
ResourceFunction["KullbackLeiblerDivergence"][
 NormalDistribution[\[Mu], \[Sigma]], ExponentialDistribution[\[Lambda]]]
Out[33]=
In[34]:=
ResourceFunction["KullbackLeiblerDivergence"][
 PoissonDistribution[\[Lambda]], BernoulliDistribution[p1]]
Out[34]=

KullbackLeiblerDivergence is undefined between discrete and continuous distributions:

In[35]:=
ResourceFunction["KullbackLeiblerDivergence"][
 BernoulliDistribution[1/2], NormalDistribution[]]
Out[35]=

For some symbolic distributions the expectation cannot be evaluated:

In[36]:=
ResourceFunction["KullbackLeiblerDivergence"][
 BernoulliDistribution[p2], BinomialDistribution[n, p1]]
Out[36]=
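
A possible workaround (a sketch, assuming numerical parameter values are acceptable) is to substitute numbers and evaluate numerically:

ResourceFunction["KullbackLeiblerDivergence"][
 BernoulliDistribution[0.3], BinomialDistribution[10, 0.4],
 Method -> NExpectation]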
