# Wolfram Function Repository

Instant-use add-on functions for the Wolfram Language

Function Repository Resource:

Perform Bayesian linear regression with conjugate priors

Contributed by:
Sjoerd Smit

ResourceFunction["BayesianLinearRegression"][{{*x*_{1}, *y*_{1}}, {*x*_{2}, *y*_{2}}, …}, {*f*_{1}, *f*_{2}, …}, {*var*_{1}, *var*_{2}, …}] performs a Bayesian linear model fit on the data using the basis functions *f*_{i} and the independent variables *var*_{i}.

ResourceFunction["BayesianLinearRegression"][{*x*_{1}, *x*_{2}, …} → {*y*_{1}, *y*_{2}, …}, {*f*_{1}, *f*_{2}, …}, {*var*_{1}, *var*_{2}, …}] generates the same result.

ResourceFunction["BayesianLinearRegression"][{*x*_{1} → *y*_{1}, *x*_{2} → *y*_{2}, …}, {*f*_{1}, *f*_{2}, …}, {*var*_{1}, *var*_{2}, …}] generates the same result.

The *y*_{i} in the input data can be either scalar or vector observations.

The number of *var*_{i} has to match the dimensionality of the inputs *x*_{i}.

ResourceFunction["BayesianLinearRegression"] inherits the IncludeConstantBasis option from DesignMatrix. The default option value is "IncludeConstantBasis"→True.

By default, a weakly informative prior is assumed that is still normalizable so that the log-evidence is well-defined. Different priors can be specified with the "PriorParameters" option in the same format as the output of ResourceFunction["BayesianLinearRegression"] (see options section below).

The output of ResourceFunction["BayesianLinearRegression"] is an Association containing the following items:

| Key | Value |
| --- | --- |
| "PriorParameters" | values of the hyperparameters of the prior distribution of the fit coefficients |
| "PosteriorParameters" | values of the hyperparameters of the posterior distribution of the fit coefficients |
| "LogEvidence" | the natural logarithm of the marginal likelihood, which is the Bayesian goodness-of-fit measure of the model |
| "Prior" | prior distributions |
| "Posterior" | posterior distributions |
| "Basis" | the basis functions supplied by the user |
| "IndependentVariables" | the independent variables supplied by the user |
| "Functions" | functions that yield distributions when applied to arbitrary hyperparameters; only included when the option "IncludeFunctions"→True is set |

The keys "Prior", "Posterior" and "Functions" contain the following items:

| Key | Value |
| --- | --- |
| "PredictiveDistribution" | predictive distribution of the data |
| "UnderlyingValueDistribution" | distribution of the fit line underlying the data |
| "RegressionCoefficientDistribution" | marginal distribution of the fit coefficients |
| "ErrorDistribution" | marginal distribution of the error variance (for univariate regression) or covariance matrix (for multivariate regression) |

The implementation of the formulas is based on the Wikipedia article on multivariate Bayesian linear regression (see link below). Univariate regression (i.e., when the *y*_{i} are scalars or 1D vectors) is treated as a special case of multivariate regression using the lower-dimensional equivalents of the multivariate and matrix distributions.

Generate a Bayesian linear regression on random data:

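A minimal session might look like the following sketch; the synthetic data are an illustration, not the original example, and the keys printed by the last line follow the output description above:

```wolfram
(* Sketch: fit noisy linear data; the constant basis is included by default *)
data = Table[{x, 2 x + 1 + RandomVariate[NormalDistribution[0, 0.5]]}, {x, -3, 3, 0.2}];
fit = ResourceFunction["BayesianLinearRegression"][data, x, x];
Keys[fit]
```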

Generate test data:

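For instance (an assumed stand-in for the original test data), sample points around a line with Gaussian noise:

```wolfram
(* Sketch: synthetic test data around the line y = 2x - 1 *)
SeedRandom[1];
data = Table[{x, 2 x - 1 + RandomVariate[NormalDistribution[0, 0.7]]}, {x, -3, 3, 0.25}];
ListPlot[data]
```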

Fit the data with a first-order polynomial:

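A first-order fit might be set up like this (assuming a `data` variable as generated above):

```wolfram
(* Sketch: first-order polynomial fit y = a x + b *)
model = ResourceFunction["BayesianLinearRegression"][data, x, x];
model["LogEvidence"]
```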

Show the predictive distribution of the model and the distribution of the fit line (dashed):

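One way to sketch this plot, using the "Posterior" distributions documented above (the quantile levels are an arbitrary choice):

```wolfram
(* Sketch: 5%, 50% and 95% quantiles of the posterior predictive distribution,
   with the fit-line (underlying value) distribution dashed *)
predictive = model["Posterior", "PredictiveDistribution"];
underlying = model["Posterior", "UnderlyingValueDistribution"];
Show[
  Plot[Evaluate@InverseCDF[predictive, {0.05, 0.5, 0.95}], {x, -3, 3}],
  Plot[Evaluate@InverseCDF[underlying, {0.05, 0.95}], {x, -3, 3}, PlotStyle -> Dashed],
  ListPlot[data, PlotStyle -> Black]
]
```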

Plot the joint distribution of the coefficients *a* and *b* in the regression equation *y*=*ax*+*b*:

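A sketch using "RegressionCoefficientDistribution"; the ordering of the coefficient vector (intercept first vs. slope first) and the plot ranges are assumptions:

```wolfram
(* Sketch: joint posterior density of the two fit coefficients *)
coefDist = model["Posterior", "RegressionCoefficientDistribution"];
ContourPlot[PDF[coefDist, {b, a}], {b, -2, 0}, {a, 1, 3}]
```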

Plot the PDF of the posterior variance of the residuals:

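For univariate regression, "ErrorDistribution" is a distribution over the noise variance, so its PDF can be plotted directly (the plot range is an assumption):

```wolfram
(* Sketch: posterior distribution of the residual variance *)
errorDist = model["Posterior", "ErrorDistribution"];
Plot[PDF[errorDist, var], {var, 0, 2}, AxesLabel -> {"variance", "PDF"}]
```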

Fit the data with a polynomial of arbitrary degree and compare the prediction bands and log-evidence of fits up to degree 4:

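A polynomial basis of degree *d* can be written as `x^Range[0, d]`, which gives `{1, x, …, x^d}`; comparing log-evidences might then look like:

```wolfram
(* Sketch: fit polynomials of degree 0 through 4 and tabulate the log-evidence *)
fits = Table[ResourceFunction["BayesianLinearRegression"][data, x^Range[0, d], x], {d, 0, 4}];
Grid[Table[{d, fits[[d + 1]]["LogEvidence"]}, {d, 0, 4}]]
```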

BayesianLinearRegression can do multivariate regression. For example, generate some 3D data:


Do a regression where the *x*-coordinate is treated as the independent variable and *y* and *z* are both dependent on *x*:

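A sketch of both steps, using the rule-based input form from the usage section (the data themselves are illustrative assumptions):

```wolfram
(* Sketch: 3D data where y and z both depend on x; the dependent observations
   passed to the function are then 2D vectors *)
data3D = Table[
  {x, 2 x + RandomVariate[NormalDistribution[0, 0.5]],
   1 - x + RandomVariate[NormalDistribution[0, 0.5]]},
  {x, -3, 3, 0.2}];
modelYZ = ResourceFunction["BayesianLinearRegression"][
  data3D[[All, 1]] -> data3D[[All, {2, 3}]], x, x];
```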

The predictive joint distribution for *y* and *z* is a MultivariateTDistribution:

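Inspecting the predictive distribution at a fixed value of the independent variable (assuming `modelYZ` from the previous step) should reveal the multivariate-t form:

```wolfram
(* Sketch: joint predictive over {y, z} at x = 0 *)
modelYZ["Posterior", "PredictiveDistribution"] /. x -> 0.
```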

Visualize the joint distribution as an animation:


Just like LinearModelFit, BayesianLinearRegression constructs a model with a bias term by default:


Force the intercept to be zero:

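Using the option mentioned above (and assuming a `data` variable as in the earlier examples):

```wolfram
(* Sketch: disable the constant basis function to force a zero intercept *)
noIntercept = ResourceFunction["BayesianLinearRegression"][data, x, x,
  "IncludeConstantBasis" -> False];
```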

The prior can be specified in the same format as the parameter outputs of the Bayesian linear regression. This can be used to update a model with new observations. To illustrate this, generate some test data and divide the dataset into two parts:


Fit the first half of the data:


Use the posterior parameters of *model1* as the prior to update the model with the second half of the dataset:


The result is exactly the same as if you had fit all of the data in a single step:

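The whole updating workflow might be sketched as follows; the test data are an illustrative assumption:

```wolfram
(* Sketch: sequential updating; the posterior hyperparameters of the first fit
   serve as the prior for the second, which should match a single fit of all data *)
SeedRandom[2];
allData = Table[{x, x + RandomVariate[NormalDistribution[]]}, {x, -3, 3, 0.1}];
{data1, data2} = TakeDrop[allData, 30];
model1 = ResourceFunction["BayesianLinearRegression"][data1, x, x];
model2 = ResourceFunction["BayesianLinearRegression"][data2, x, x,
  "PriorParameters" -> model1["PosteriorParameters"]];
modelAll = ResourceFunction["BayesianLinearRegression"][allData, x, x];
model2["PosteriorParameters"] == modelAll["PosteriorParameters"]
```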

When "IncludeFunctions"→True, the returned association contains the key "Functions", which holds functions that generate distributions from arbitrary hyperparameter values:


The posterior distribution is obtained from the predictive distribution function:


Apply it to the posterior parameters:

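A sketch of these three steps; the exact calling convention of the extracted function is an assumption (it is shown here applied directly to a hyperparameter association):

```wolfram
(* Sketch: recover the posterior predictive from the "Functions" entry *)
model = ResourceFunction["BayesianLinearRegression"][data, x, x,
  "IncludeFunctions" -> True];
predictiveFunction = model["Functions", "PredictiveDistribution"];
predictiveFunction[model["PosteriorParameters"]]
```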

Here is a dataset where it is unclear if the fit should be first or second order:


Fit the data with polynomials up to fourth degree, rank the log-evidences and inspect them:

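This step might look like the following sketch, assuming a `data` variable holding the ambiguous dataset:

```wolfram
(* Sketch: fit polynomial bases of degree 0 through 4, keyed by degree,
   and rank the models by log-evidence *)
models = AssociationMap[
  ResourceFunction["BayesianLinearRegression"][data, x^Range[0, #], x] &,
  Range[0, 4]];
ReverseSort[models[[All, "LogEvidence"]]]
```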

Calculate the weights for each fit:

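Assuming equal prior model probabilities, the weights are a softmax of the log-evidences; subtracting the maximum first keeps the exponentials numerically stable:

```wolfram
(* Sketch: normalized model weights from the log-evidences (assumes `models` above) *)
logEvidence = models[[All, "LogEvidence"]];
weights = Exp[logEvidence - Max[logEvidence]];
weights = weights/Total[weights]
```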

Because the weights for the first and second order fits are so close, the correct Bayesian treatment is to include both in the final regression. Define a mixture of posterior predictive distributions:

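One way to build the mixture, assuming the `models` and `weights` variables from the preceding steps:

```wolfram
(* Sketch: evidence-weighted mixture of the posterior predictive distributions *)
mixture = MixtureDistribution[
  Values[weights],
  Values@models[[All, "Posterior", "PredictiveDistribution"]]
];
```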

Compare the mixture fit to the individual polynomial regressions:


Symbolically calculate the predictive posterior after a single observation for any prior:

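Because the function accepts symbolic data, a single symbolic observation yields a fully symbolic update (a symbolic prior could likewise be supplied via "PriorParameters" in the format returned by the function):

```wolfram
(* Sketch: one symbolic observation {x1, y1} with the default prior *)
symbolicModel = ResourceFunction["BayesianLinearRegression"][{{x1, y1}}, x, x];
symbolicModel["Posterior", "PredictiveDistribution"]
```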

Visualize the prediction bands:


- 2.0.0 – 13 September 2019
- 1.0.0 – 16 August 2019

This work is licensed under a Creative Commons Attribution 4.0 International License