# Wolfram Function Repository

Instant-use add-on functions for the Wolfram Language

Function Repository Resource:

Calculate the Kullback–Leibler divergence between two distributions

Contributed by:
Sjoerd Smit

ResourceFunction["KullbackLeiblerDivergence"][*P*, *Q*] computes the Kullback–Leibler divergence from distribution *Q* to *P*.

The Kullback–Leibler divergence *D*_{KL} is an asymmetric measure of dissimilarity between two probability distributions *P* and *Q*. If it can be computed, it will always be a number ≥0 (with equality if and only if the two distributions are the same almost everywhere).
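For reference, the standard textbook definition can be written out as below. The density notation *p* and *q* (probability mass functions in the discrete case) and the choice of the natural logarithm are assumptions introduced here for illustration.

$$
D_{\mathrm{KL}}(P \parallel Q) = \int p(x)\,\ln\frac{p(x)}{q(x)}\,\mathrm{d}x
\qquad \text{(continuous case)}
$$

$$
D_{\mathrm{KL}}(P \parallel Q) = \sum_{x} p(x)\,\ln\frac{p(x)}{q(x)}
\qquad \text{(discrete case)}
$$

In both cases the divergence is the expectation of ln(*p*/*q*) taken with respect to *P*.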

In a Bayesian setting, the Kullback–Leibler divergence represents the information gained when updating a prior distribution *Q* to posterior distribution *P*. Conversely, it can be seen as the information lost when *Q* is used to approximate *P*. Because *D*_{KL} is asymmetric, *D*_{KL}(*P* ∥ *Q*) is generally different from *D*_{KL}(*Q* ∥ *P*).

The support of *P* must be contained in the support of *Q*. In other words, a random variable distributed according to *P* may never take values that one distributed according to *Q* cannot take. Intuitively, this is because any value excluded by the prior will automatically also be excluded by the posterior. If the supports do not satisfy this constraint, ResourceFunction["KullbackLeiblerDivergence"] will raise a message and return Undefined.

ResourceFunction["KullbackLeiblerDivergence"] works for multivariate distributions.

ResourceFunction["KullbackLeiblerDivergence"] has a Method option that specifies whether the computation is done with Expectation (default) or NExpectation. Sub-options for these functions can also be provided. For example, to pass sub-options to Expectation, use Method→{Expectation,*opt*_{1}→*val*_{1},…}.
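A minimal sketch of how the Method option might be used. The two exponential distributions, the shorthand `kl`, and the particular sub-option values are illustrative assumptions; `Assumptions` and `PrecisionGoal` are standard options of Expectation and NExpectation respectively.

```wolfram
(* Shorthand for the resource function (the name kl is an arbitrary choice) *)
kl = ResourceFunction["KullbackLeiblerDivergence"];

(* Illustrative distributions with the same support *)
p = ExponentialDistribution[1];
q = ExponentialDistribution[2];

(* Default: symbolic evaluation through Expectation *)
kl[p, q]

(* Numerical evaluation through NExpectation *)
kl[p, q, Method -> NExpectation]

(* Sub-options are listed after the function name *)
kl[p, q, Method -> {NExpectation, PrecisionGoal -> 6}]
kl[p, q, Method -> {Expectation, Assumptions -> True}]
```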

Calculate the KullbackLeiblerDivergence between two normal distributions:
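A minimal sketch of such a call. The parameter values and the shorthand `kl` are illustrative assumptions. The last line cross-checks the result against the defining expectation using built-in functions only.

```wolfram
kl = ResourceFunction["KullbackLeiblerDivergence"];

(* NormalDistribution takes the mean and the standard deviation *)
p = NormalDistribution[0, 1];
q = NormalDistribution[1, 2];

(* Divergence from q to p *)
kl[p, q]

(* Cross-check: expectation of Log[p/q] with respect to p *)
Expectation[Log[PDF[p, x]/PDF[q, x]], x \[Distributed] p]
```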

KullbackLeiblerDivergence is not symmetric:
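For two univariate normal distributions the divergence has a well-known closed form, which makes the asymmetry explicit. The parameter values in the worked example are illustrative and are not taken from the original example.

$$
D_{\mathrm{KL}}\bigl(\mathcal{N}(\mu_1,\sigma_1^2) \parallel \mathcal{N}(\mu_2,\sigma_2^2)\bigr)
= \ln\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}
$$

With *P* = 𝒩(0, 1) and *Q* = 𝒩(1, 2²):

$$
D_{\mathrm{KL}}(P \parallel Q) = \ln 2 + \frac{1 + 1}{8} - \frac{1}{2} = \ln 2 - \frac{1}{4} \approx 0.443,
\qquad
D_{\mathrm{KL}}(Q \parallel P) = \ln\frac{1}{2} + \frac{4 + 1}{2} - \frac{1}{2} = 2 - \ln 2 \approx 1.307
$$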

KullbackLeiblerDivergence works with symbolic distributions:

The distributions do not have to be from the same family:

KullbackLeiblerDivergence also works with discrete distributions:

Use a custom-defined ProbabilityDistribution:

KullbackLeiblerDivergence works for multivariate distributions:

KullbackLeiblerDivergence also works with EmpiricalDistribution:

Symbolic evaluation of the divergence is unfeasible for some distributions:

Use NExpectation instead:

Supply extra options to NExpectation:

NExpectation is designed to work with distributions with numeric parameters:

Without Assumptions, a message will be raised and conditions are generated in the result:

With assumptions specified, a result valid under those conditions is returned:

If *X* and *Y* are two random variables with a joint distribution 𝒟, then the mutual information between them is defined as the Kullback–Leibler divergence from the product distribution of the marginals to 𝒟.

As an example, calculate the mutual information of the components of a BinormalDistribution:
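A sketch of how this could be set up. The correlation value 1/2 and the names `joint` and `indep` are illustrative assumptions; `MarginalDistribution` and `ProductDistribution` are built-in functions.

```wolfram
kl = ResourceFunction["KullbackLeiblerDivergence"];

(* Joint distribution: standard binormal with correlation 1/2 *)
joint = BinormalDistribution[1/2];

(* Product of the two marginals, i.e. what the joint would be under independence *)
indep = ProductDistribution[
   MarginalDistribution[joint, 1],
   MarginalDistribution[joint, 2]];

(* Mutual information: divergence from the product of marginals to the joint *)
kl[joint, indep]
```

For a bivariate normal distribution with correlation ρ, the mutual information is known to equal −½ ln(1 − ρ²), which is ½ ln(4/3) ≈ 0.144 for ρ = 1/2.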

The Kullback–Leibler divergence can be used to fit distributions to data and also provides a measure of the quality of the fit in a way very similar to maximum likelihood estimation. First generate some samples from a discrete distribution:

Calculate the divergence of the EmpiricalDistribution relative to a symbolic target distribution:

Minimize with respect to *μ*:
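The fitting workflow might look like the sketch below. The seed, the sample size, the true mean 3, the choice of a Poisson model, and the use of `FindMinimum` are all illustrative assumptions. The sketch also assumes that the symbolic result can be handed to `FindMinimum` directly.

```wolfram
kl = ResourceFunction["KullbackLeiblerDivergence"];

SeedRandom[1]; (* fixed seed so the sketch is reproducible *)
data = RandomVariate[PoissonDistribution[3], 200]; (* illustrative sample *)
emp = EmpiricalDistribution[data];

(* Divergence of the empirical distribution relative to a Poisson model with unknown mean *)
div = kl[emp, PoissonDistribution[\[Mu]]];

(* Minimize over positive values of the mean, starting the search at 1 *)
FindMinimum[{div, \[Mu] > 0}, {\[Mu], 1}]

(* For comparison: the sample mean *)
N[Mean[data]]
```

Minimizing this divergence over μ is equivalent to maximizing the average log-likelihood of the data, because the divergence equals the negative entropy of the empirical distribution minus the average log-likelihood. The minimizer should therefore agree with the sample mean, which is the maximum likelihood estimate of a Poisson mean.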

Try a different distribution to compare:

The minimized divergence is larger, indicating this distribution is a worse approximation to the data:

Note that the reverse divergences are not defined because the supports of PoissonDistribution and GeometricDistribution are infinite, whereas the support of the EmpiricalDistribution is finite:

Also note that this does not work for continuous data because the support of EmpiricalDistribution is discrete:

Use a KernelMixtureDistribution instead for continuous data:
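A sketch of the idea, assuming a small normally distributed sample and a candidate model with numeric parameters. The seed, sample size, and model are illustrative choices. Numerical evaluation is requested because a symbolic expectation over a kernel mixture is unlikely to be practical.

```wolfram
kl = ResourceFunction["KullbackLeiblerDivergence"];

SeedRandom[1]; (* fixed seed so the sketch is reproducible *)
data = RandomVariate[NormalDistribution[0, 1], 50]; (* illustrative continuous sample *)

(* Smooth density estimate; unlike EmpiricalDistribution, its support is continuous *)
kmd = KernelMixtureDistribution[data];

(* Numerical divergence of the density estimate relative to a candidate model *)
kl[kmd, NormalDistribution[0, 1], Method -> NExpectation]
```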

The divergence from a distribution to itself is zero:

The dimensions of the distributions have to match:

Matrix distributions are currently not supported:

The divergence is Undefined if the first distribution has a wider support than the second:
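A small sketch with two uniform distributions illustrates the support condition. The distributions and the names `narrow` and `wide` are illustrative assumptions.

```wolfram
kl = ResourceFunction["KullbackLeiblerDivergence"];

narrow = UniformDistribution[{0, 1}];
wide = UniformDistribution[{0, 2}];

(* Support of the first argument lies inside that of the second: well defined *)
kl[narrow, wide]

(* First argument has the wider support: a message and Undefined are expected *)
kl[wide, narrow]
```

Analytically, the first divergence is the integral of ln(1/(1/2)) over the unit interval, which is ln 2. The second has no finite value because the wide distribution puts probability on the interval from 1 to 2, where the narrow density is zero.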

KullbackLeiblerDivergence is undefined between discrete and continuous distributions:

For some symbolic distributions the expectation cannot be evaluated:


- 3.0.0 – 10 March 2020
- 2.0.0 – 08 November 2019
- 1.0.0 – 27 September 2019

This work is licensed under a Creative Commons Attribution 4.0 International License