# Wolfram Function Repository

Instant-use add-on functions for the Wolfram Language

Function Repository Resource: MonteCarloSample

Sample from a probability density function using the Markov chain Monte Carlo (MCMC) method

Contributed by:
Brian Lu

ResourceFunction["MonteCarloSample"][ generates | |

ResourceFunction["MonteCarloSample"][ generates samples starting from an ensemble consisting of | |

ResourceFunction["MonteCarloSample"][ uses probability distribution | |

ResourceFunction["MonteCarloSample"][ generates samples but discards the first | |

ResourceFunction["MonteCarloSample"][ generates samples and takes every |

The number of arguments that the probability function *f* takes should match the dimension of the initial points.

The probability function *f* does not need to be normalized.

When the initial points are specified as {*dist*,*m*}, the domain of the prior distribution *dist* should have the same dimension as the number of arguments of *f* and should have a probability distribution function (PDF) in symbolic closed form.

The *burn* specifies the number of samples discarded at the beginning of the chain, where thermalization has not yet been reached, often referred to as "burn-in". The *burn* can be specified as Automatic, in which case the length of the thermalization phase is determined automatically from the list of probabilities.

The *i* is a positive integer thinning interval: only every *i*-th sample is kept. The *i* can be specified as Automatic, in which case it is set to the inverse of the overall acceptance rate, with an upper limit of 10.
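
As a rough sketch of the call structure (the argument order follows the usage notes above; the density, starting point, and counts are purely illustrative):

```wl
(* 500 samples of an unnormalized 1D density, starting at 0.,
   discarding the first 100 samples and keeping every 2nd remaining sample *)
ResourceFunction["MonteCarloSample"][Exp[-#^2/2] &, 0., 500, 100, 2]
```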

The following options are supported:

| option | default | description |
| --- | --- | --- |
| Method | Automatic | the MCMC algorithm to use |
| "LogProbability" | False | whether the probability function *f* is specified on a log scale |
| "TargetAcceptanceRate" | 0.25 | the target rate at which proposed points are accepted, for certain methods |
| "ListableFunction" | False | whether the probability function *f* is listable |
| Parallelization | False | whether parallelization is used in sampling |

Possible settings for Method include:

"RandomWalk" | uses a fixed probability distribution for drawing proposals |

"AdaptiveMetropolis" | use the method of Haario et al. |

"EMCEE" | use the method of Foreman-Mackey et al. |

"DifferentialEvolution" | use the method of ter Braak |

For the method "RandomWalk", the following additional suboption is supported:

"JumpDistribution" | the symmetrical probability distribution for drawing proposals |

For the method "AdaptiveMetropolis", the following additional suboptions are supported:

"InitialScaleMultiplier" | 1 | the initial scale of the step relative to the default one |

"RandomWalkScale" | 0.1 | the absolute scale of steps when random walk is adopted |

For the method "EMCEE", the following additional suboptions are supported:

"StretchScale" | 2 | the maximum scale of the stretch moves |

For the method "DifferentialEvolution", the following additional suboptions are supported:

"InitialScaleMultiplier" | 2 | the initial scale of the step relative to the default one |

"RandomnessScale" | 0.001 | the relative scale of randomness of the steps |

Generate 10 samples from a Student *t* distribution with 3 degrees of freedom:

In[1]:= |

Out[1]= |
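
A sketch of such a call (the scalar starting point 0. is an assumption):

```wl
ResourceFunction["MonteCarloSample"][PDF[StudentTDistribution[3], #] &, 0., 10]
```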

Generate 1000 samples, discard the first 200 samples, and take every fifth sample:

In[2]:= |

Out[2]= |

In[3]:= |

Out[3]= |
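
A sketch with explicit burn-in and thinning arguments (starting point assumed):

```wl
samples = ResourceFunction["MonteCarloSample"][
   PDF[StudentTDistribution[3], #] &, 0., 1000, 200, 5];
Histogram[samples]
```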

Do automatic burn-in removal and thinning instead:

In[4]:= |

Out[4]= |

In[5]:= |

Out[5]= |
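
A sketch of the same call with automatic burn-in removal and thinning:

```wl
samples = ResourceFunction["MonteCarloSample"][
   PDF[StudentTDistribution[3], #] &, 0., 1000, Automatic, Automatic];
Histogram[samples]
```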

Sample from a 2D normal distribution starting from an ensemble consisting of 10 random points:

In[6]:= |

Out[6]= |

In[7]:= |

Out[7]= |
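
A sketch using an unnormalized 2D Gaussian density and 10 random starting points:

```wl
init = RandomReal[{-1, 1}, {10, 2}];   (* ensemble of 10 random 2D points *)
samples = ResourceFunction["MonteCarloSample"][
   Exp[-(#1^2 + #2^2)/2] &, init, 1000];
ListPlot[samples]
```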

Specify a prior in addition to the likelihood function:

In[8]:= |

Out[8]= |

Out[9]= |

In[10]:= |

Out[10]= |

In[11]:= |

Out[11]= |
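
A sketch of the {*dist*, *m*} form, assuming *dist* acts as the prior and *m* initial points are drawn from it:

```wl
(* Gaussian likelihood centered at 1, with a broad normal prior
   from which 10 initial points are drawn *)
ResourceFunction["MonteCarloSample"][
 Exp[-(# - 1)^2] &, {NormalDistribution[0, 2], 10}, 1000]
```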

The random walk method takes jumps following a normal distribution by default. A custom distribution can also be specified:

In[12]:= |

Out[12]= |
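
A sketch of supplying a custom symmetric jump distribution; the Method suboption syntax shown here is an assumption:

```wl
ResourceFunction["MonteCarloSample"][PDF[StudentTDistribution[3], #] &, 0., 1000,
 Method -> {"RandomWalk", "JumpDistribution" -> CauchyDistribution[0, 1]}]
```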

The adaptive Metropolis method takes jumps following an adaptive multinormal distribution that is derived from existing samples:

In[13]:= |

Out[13]= |

In[14]:= |

Out[14]= |
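
A sketch of selecting the adaptive Metropolis method:

```wl
ResourceFunction["MonteCarloSample"][
 Exp[-(#1^2 + #2^2)/2] &, RandomReal[{-1, 1}, {5, 2}], 2000,
 Method -> "AdaptiveMetropolis"]
```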

The EMCEE and differential evolution methods propose jumps given by the displacement between two ensemble points multiplied by a scale factor. Increasing the number of initial points may improve the performance of these two methods:

In[15]:= |

Out[15]= |

In[16]:= |

Out[16]= |
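
A sketch of the ensemble-based methods with a larger set of starting points:

```wl
init = RandomReal[{-1, 1}, {50, 2}];   (* a larger ensemble tends to help these methods *)
ResourceFunction["MonteCarloSample"][Exp[-(#1^2 + #2^2)/2] &, init, 2000,
 Method -> "EMCEE"]
ResourceFunction["MonteCarloSample"][Exp[-(#1^2 + #2^2)/2] &, init, 2000,
 Method -> "DifferentialEvolution"]
```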

When the probability function evaluates to extremely small numbers, machine-precision numbers may not be able to represent them accurately and the samples do not converge:

In[17]:= |

Out[17]= |

In[18]:= |

Out[18]= |

"LogProbability"→True indicates that the probability function is on a log scale, and the samples converge to *x*≈50:

In[19]:= |

Out[19]= |

In[20]:= |

Out[20]= |
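
A sketch of both situations; the sharply peaked density used here is an assumption. Evaluating it directly underflows machine precision away from the peak at *x*=50, whereas passing its logarithm with "LogProbability"→True avoids the problem:

```wl
(* density underflows machine precision far from the peak at x = 50 *)
ResourceFunction["MonteCarloSample"][Exp[-10.^6 (# - 50)^2] &, 0., 1000]

(* same density specified on a log scale *)
ResourceFunction["MonteCarloSample"][-10.^6 (# - 50)^2 &, 0., 1000,
 "LogProbability" -> True]
```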

Setting the target acceptance rate is only effective for the adaptive Metropolis and differential evolution methods. A higher target acceptance rate decreases the step size, and vice versa:

In[21]:= |

Out[21]= |

In[22]:= |

Out[22]= |

In[23]:= |

Out[23]= |
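
A sketch of raising the target acceptance rate for the adaptive Metropolis method:

```wl
ResourceFunction["MonteCarloSample"][Exp[-#^2/2] &, 0., 1000,
 Method -> "AdaptiveMetropolis", "TargetAcceptanceRate" -> 0.5]
```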

If the probability function is listable, specifying "ListableFunction"→True and employing a large number of initial points may significantly improve the performance:

In[24]:= |

Out[24]= |

In[25]:= |

Out[25]= |
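
A sketch using a listable density and a large ensemble; that the option evaluates the density over the whole ensemble at once is an assumption about its behavior:

```wl
init = RandomReal[{-1, 1}, {100, 2}];
(* Exp and the arithmetic operations thread over lists, so the density is listable *)
ResourceFunction["MonteCarloSample"][
 Exp[-(#1^2 + #2^2)/2] &, init, 5000, "ListableFunction" -> True]
```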

Enabling parallelization is only beneficial when the probability function can be run in parallel and each evaluation takes a long time:

In[26]:= |

Out[27]= |

In[28]:= |

Out[28]= |
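
A sketch where each density evaluation is made artificially slow, so that parallel evaluation can pay off:

```wl
slowDensity = (Pause[0.01]; Exp[-(#1^2 + #2^2)/2]) &;   (* Pause simulates an expensive model *)
ResourceFunction["MonteCarloSample"][
 slowDensity, RandomReal[{-1, 1}, {20, 2}], 200, Parallelization -> True]
```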

Get the posterior distribution of two parameters in a Student *t* distribution:

In[29]:= |

Out[29]= |

Choose a uniform prior over *μ* ∈ [-2, 2]:

In[30]:= |

Out[31]= |

Run MCMC:

In[32]:= |

Out[32]= |

In[33]:= |

Out[33]= |

The location of the peak in the posterior distribution matches the result given by EstimatedDistribution:

In[34]:= |

Out[34]= |
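
A compact sketch of this workflow; the synthetic data, parameter ranges, and log-likelihood definition are assumptions:

```wl
data = RandomVariate[StudentTDistribution[0.3, 1.2, 3], 500];   (* hypothetical data *)

(* log-likelihood in (mu, sigma); nonpositive sigma is heavily penalized *)
loglike = If[#2 > 0, Total[Log@PDF[StudentTDistribution[#1, #2, 3], data]], -10.^10] &;

init = Transpose[{RandomReal[{-2, 2}, 20], RandomReal[{1/2, 2}, 20]}];
post = ResourceFunction["MonteCarloSample"][loglike, init, 5000, Automatic, Automatic,
   "LogProbability" -> True, Method -> "EMCEE"];

(* compare the posterior peak with the maximum-likelihood fit *)
EstimatedDistribution[data, StudentTDistribution[m, s, 3]]
```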

Get the posterior distribution of amplitude, frequency, phase, and mean of a sinusoidal function:

In[35]:= |

Out[37]= |

In[38]:= |

Out[38]= |

The likelihood is derived based on the assumption that each point has a normally distributed error. Define the likelihood, and select the initial points for MCMC around the peak of the likelihood function:

In[39]:= |

Out[40]= |

Run MCMC:

In[41]:= |

Out[41]= |

The marginal distributions of all four parameters, compared with their input values:

In[42]:= |

Out[42]= |
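
A compact sketch of the whole workflow, with assumed true parameters, noise level, and sampling grid:

```wl
(* synthetic data: a Sin[w t + p] + c plus Gaussian noise (all values assumed) *)
t = Range[0., 10., 0.1];
y = 1.5 Sin[2.0 t + 0.7] + 0.3 +
   RandomVariate[NormalDistribution[0, 0.2], Length[t]];

(* Gaussian-error log-likelihood in (a, w, p, c) *)
loglike = -Total[(y - (#1 Sin[#2 t + #3] + #4))^2]/(2*0.2^2) &;

(* ensemble of 40 points near the likelihood peak *)
init = Table[{1.5, 2.0, 0.7, 0.3} + RandomReal[{-0.05, 0.05}, 4], 40];

post = ResourceFunction["MonteCarloSample"][loglike, init, 10000, Automatic, Automatic,
   "LogProbability" -> True, Method -> "EMCEE"];

Histogram /@ Transpose[post]   (* marginal distributions of the four parameters *)
```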

The normalization of the probability function does not affect the result:

In[43]:= |

Out[44]= |

If all of the initial points lie on a hyperplane, all samples will remain on the same hyperplane when the EMCEE or differential evolution method is used:

In[45]:= |

Out[45]= |

In[46]:= |

Out[46]= |

In[47]:= |

Out[47]= |
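
A sketch of this degeneracy, with an initial ensemble constrained to the line *y*=*x*:

```wl
init = Table[{x, x}, {x, RandomReal[{-1, 1}, 10]}];   (* all starting points on y = x *)
samples = ResourceFunction["MonteCarloSample"][
   Exp[-(#1^2 + #2^2)/2] &, init, 2000, Method -> "EMCEE"];
ListPlot[samples]   (* every sample stays on the same line *)
```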

Automatic burn-in removal is done by comparing the log-posteriors with the median log-posterior. In some cases, it may keep samples where thermalization has not been reached:

In[48]:= |

Out[48]= |

In[49]:= |

Out[49]= |

In[50]:= |

Out[50]= |

To mitigate this problem, spread the initial points over a wider range:

In[51]:= |

Out[51]= |

In[52]:= |

Out[52]= |
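
A sketch of the mitigation: starting the ensemble over a wider range broadens the early spread of log-posteriors, which the automatic burn-in detection can then recognize:

```wl
init = RandomReal[{-10, 10}, {20, 2}];   (* deliberately wide starting region *)
ResourceFunction["MonteCarloSample"][
 Exp[-(#1^2 + #2^2)/2] &, init, 2000, Automatic, Automatic]
```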

- 1.0.0 – 05 July 2022

This work is licensed under a Creative Commons Attribution 4.0 International License