Wolfram Function Repository
Function Repository Resource: QuantileRegression
Compute quantile regression fits over a time series, a list of numbers or a list of numeric pairs
ResourceFunction["QuantileRegression"][data,knots,probs] does quantile regression over the time series or data array data using the knots specification knots for the probabilities probs.
ResourceFunction["QuantileRegression"][data,knots,probs,opts] does quantile regression with the options opts.
option | default value | description
InterpolationOrder | 3 | interpolation order
Method | LinearProgramming | method for the quantile regression computations
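Quantile regression for a probability p finds the curve f that minimizes the sum of check (pinball) losses rho_p(r) = r (p - [r < 0]) of the residuals r = y - f(x), where [r < 0] is 1 for negative residuals and 0 otherwise. The Python sketch below is not the implementation of QuantileRegression; it applies the loss to the simplest possible model, a constant c, to show that the minimizer is an empirical p-quantile of the data. The function names and the data are made up for illustration.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss rho_tau(r) = r * (tau - 1[r < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(ys, c, tau):
    """Sum of check losses when every point is predicted by the constant c."""
    return sum(pinball_loss(y - c, tau) for y in ys)


def best_constant(ys, tau):
    """Minimize total_loss over constants c, for 0 < tau < 1.

    The objective is convex and piecewise linear with breakpoints at the
    data values, so some data value is always a minimizer; checking the
    data values is enough.
    """
    return min(sorted(ys), key=lambda c: total_loss(ys, c, tau))


ys = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0]
print(best_constant(ys, 0.5))   # 3.0, a median of ys
print(best_constant(ys, 0.75))  # 5.0, a 0.75-quantile of ys
```

Replacing the constant c by a linear combination of B-spline basis functions leads to the kind of fit that the knots specification and the InterpolationOrder option describe.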
Make a random signal:
Compute QuantileRegression with five knots for the probabilities 0.25 and 0.75:
Here are the formulas of the obtained regression quantiles:
Here is a plot of the original data and the obtained regression quantiles:
Find the fraction of the data points that are under the second regression quantile:
The obtained fraction is close to the second probability, 0.75, given to QuantileRegression.
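The fraction computed in this example is the share of points lying under a regression quantile curve. A minimal Python version follows; the points and the line q75 are hypothetical stand-ins for the signal and its 0.75 regression quantile, and "under" is taken to mean strictly below.

```python
def fraction_below(points, curve):
    """Fraction of (x, y) points that lie strictly below curve(x)."""
    below = sum(1 for x, y in points if y < curve(x))
    return below / len(points)


# Hypothetical data and a hypothetical fitted 0.75 regression quantile.
points = [(0, -1.0), (1, 0.5), (2, 2.7), (3, 2.8),
          (4, 5.0), (5, 4.2), (6, 6.1), (7, 6.6)]


def q75(x):
    return x + 0.5


print(fraction_below(points, q75))  # 0.75: six of the eight points are below
```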
Here is a quantile regression computation over a numerical vector:
Here is a quantile regression computation over a time series object:
The second argument—the knots specification—can be an integer specifying the number of knots or a list of numbers specifying the knots of the B-spline basis:
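The page does not state how an integer knots specification is turned into knot positions. One plausible reading, assumed here and not taken from the resource function, is n equally spaced knots over the range of the predictor values. The helper knots_from_spec below is hypothetical.

```python
def knots_from_spec(xs, spec):
    """Hypothetical helper: turn a knots specification into a knot list.

    An integer n is read as n equally spaced knots spanning
    [min(xs), max(xs)] (an assumed placement rule); a list of numbers is
    used as given, after sorting.
    """
    if isinstance(spec, int):
        if spec < 2:
            raise ValueError("need at least two knots")
        lo, hi = min(xs), max(xs)
        step = (hi - lo) / (spec - 1)
        return [lo + i * step for i in range(spec)]
    return sorted(spec)


xs = [0.0, 3.0, 7.5, 10.0]
print(knots_from_spec(xs, 5))          # [0.0, 2.5, 5.0, 7.5, 10.0]
print(knots_from_spec(xs, [8, 0, 4]))  # [0, 4, 8]
```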
The option InterpolationOrder specifies the polynomial order of the B-spline basis. Its values are expected to be non-negative integers:
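The B-spline basis defined by the knots and InterpolationOrder can be evaluated with the Cox-de Boor recursion. The Python sketch below assumes that InterpolationOrder is the polynomial degree of the pieces (3 gives cubic pieces) and that the end knots are repeated (a clamped knot vector); both are assumptions, not details taken from the resource function. For x inside the knot range the basis functions are nonnegative and sum to 1.

```python
def bspline_basis(i, k, t, x):
    """Value at x of the i-th B-spline of degree k on knot vector t (Cox-de Boor)."""
    if k == 0:
        # Half-open intervals [t[i], t[i+1]); the last non-empty interval
        # is also closed on the right so that x == t[-1] is covered.
        if t[i] <= x < t[i + 1]:
            return 1.0
        if x == t[-1] and t[i] < t[i + 1] == t[-1]:
            return 1.0
        return 0.0
    # Terms with a zero denominator (repeated knots) are taken as 0.
    left = 0.0
    if t[i + k] != t[i]:
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    right = 0.0
    if t[i + k + 1] != t[i + 1]:
        right = ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                 * bspline_basis(i + 1, k - 1, t, x))
    return left + right


def clamped_knot_vector(knots, degree):
    """Repeat each end knot `degree` extra times."""
    return [knots[0]] * degree + list(knots) + [knots[-1]] * degree


def basis_row(knots, degree, x):
    """All basis function values at x; one row of a regression design matrix."""
    t = clamped_knot_vector(knots, degree)
    n_basis = len(t) - degree - 1
    return [bspline_basis(i, degree, t, x) for i in range(n_basis)]


row = basis_row([0.0, 2.5, 5.0, 7.5, 10.0], 3, 4.0)
print(len(row), round(sum(row), 12))
```

With 5 knots and degree 3 there are 5 + 3 - 1 = 7 basis functions, and at most 4 of them are nonzero at any x.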
QuantileRegression uses LinearProgramming. Additional parameters can be passed to LinearProgramming with the Method option:
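A standard way to pose quantile regression as a linear program is shown below for basis functions B_1, ..., B_k (for QuantileRegression, the B-spline basis), data points (x_i, y_i), i = 1, ..., n, and probability p. The residual of each point is split into its positive part u_i and negative part v_i. This is the textbook formulation; the page does not show the exact program that QuantileRegression passes to LinearProgramming.

```latex
\begin{aligned}
\min_{\beta,\,u,\,v}\quad & \sum_{i=1}^{n} \bigl( p\,u_i + (1-p)\,v_i \bigr) \\
\text{subject to}\quad & \sum_{j=1}^{k} \beta_j B_j(x_i) + u_i - v_i = y_i, \qquad i = 1, \dots, n, \\
& u_i \ge 0, \quad v_i \ge 0, \quad \beta_j \in \mathbb{R}.
\end{aligned}
```

The program has n equality constraints and k + 2n variables (2k + 2n if each free coefficient beta_j is written as a difference of two nonnegative variables, as solvers that assume nonnegative variables require). The number of variables grows with the number of data points, which is one reason the computations can be slow for large data.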
Here is heteroscedastic data (the variance is not constant with respect to the predictor variable):
Find quantile regression fits:
Plot the data and the regression quantiles:
Note that the regression quantiles clearly outline the heteroscedastic nature of the data.
One type of contextual anomaly is a subset of points whose variance is very different from that of other subsets. Using quantile regression, we can (1) estimate the regressor-dependent variance at each point from the 0.25 and 0.75 regression quantiles and (2) find the points with outlier variances.
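The two steps can be sketched as follows, with the distance q_0.75(x) - q_0.25(x) between the regression quantiles used as the local spread (a proxy for the variance). The page does not show how its outlier thresholds are chosen; Tukey's fences on the spreads (quartiles widened by 1.5 interquartile ranges) are an assumption made here, and the regression quantile values are hypothetical numbers.

```python
import statistics


def spread_outliers(q25_values, q75_values, factor=1.5):
    """Return ((lower, upper), positions) for spreads outside Tukey's fences.

    The spread at each point is q75 - q25. The fences are the first and
    third quartiles of the spreads, widened by factor times their
    interquartile range (an assumed threshold rule).
    """
    spreads = [hi - lo for lo, hi in zip(q25_values, q75_values)]
    q1, _, q3 = statistics.quantiles(spreads, n=4)
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    positions = [i for i, s in enumerate(spreads) if s < lower or s > upper]
    return (lower, upper), positions


# Hypothetical values of the 0.25 and 0.75 regression quantiles at 11 points.
q25 = [1.0] * 11
q75 = [3.0, 3.25, 3.5, 1.2, 3.0, 3.25, 9.0, 3.5, 3.0, 3.25, 3.5]
bounds, positions = spread_outliers(q25, q75)
print(bounds, positions)  # (1.25, 3.25) [3, 6]
```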
Here we compute and plot the variance estimates for a signal:
Find the lower and upper thresholds for the variance outliers:
Find the outlier positions:
Plot the data and the outliers found:
Here is a financial time series:
Do a quantile regression fit and plot it:
Here are the errors of the obtained fit:
Find the positions of the anomalies in the list of fit errors:
Plot the data, fit and anomalies:
Get temperature data:
Convert the time series into a list of numeric pairs:
Compute quantile regression fits:
Plot the data and the regression quantiles:
Find an estimate of the conditional cumulative distribution function (CDF) at the date 2017-10-01:
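A simple estimate of the conditional CDF at a point x0 interpolates linearly between the pairs (q_p(x0), p) formed by the regression quantile values at x0 and their probabilities. The Python sketch below assumes this piecewise-linear construction (the page does not show its own) and, as a simplification, clamps the estimate to the smallest and largest probabilities outside the range of quantile values. The probabilities and quantile values are hypothetical.

```python
from bisect import bisect_right


def conditional_cdf(probs, qvalues):
    """Piecewise-linear CDF estimate from regression quantile values at one x0.

    probs:   increasing probabilities p_1 < ... < p_m
    qvalues: the values q_{p_1}(x0), ..., q_{p_m}(x0), assumed non-decreasing
    Returns F with F(q_{p_i}(x0)) = p_i, linear in between, and clamped to
    p_1 and p_m outside [q_{p_1}(x0), q_{p_m}(x0)].
    """
    def cdf(y):
        if y <= qvalues[0]:
            return probs[0]
        if y >= qvalues[-1]:
            return probs[-1]
        k = bisect_right(qvalues, y)  # qvalues[k - 1] <= y < qvalues[k]
        y0, y1 = qvalues[k - 1], qvalues[k]
        p0, p1 = probs[k - 1], probs[k]
        return p0 + (p1 - p0) * (y - y0) / (y1 - y0)
    return cdf


probs = [0.1, 0.25, 0.5, 0.75, 0.9]
qvalues = [10.0, 12.0, 15.0, 18.0, 21.0]  # hypothetical values at x0
F = conditional_cdf(probs, qvalues)
print(F(15.0), F(16.5))  # 0.5 0.625
```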
Find outliers in the temperature data, defined as the points below the 0.02 regression quantile or above the 0.98 regression quantile:
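This outlier rule only needs the two regression quantile curves evaluated at each predictor value. In the Python sketch below, the points and the lines q02 and q98 are hypothetical stand-ins for the temperature data and the 0.02 and 0.98 regression quantiles.

```python
def quantile_band_outliers(points, lower_curve, upper_curve):
    """Positions of (x, y) points below lower_curve(x) and above upper_curve(x)."""
    below = [i for i, (x, y) in enumerate(points) if y < lower_curve(x)]
    above = [i for i, (x, y) in enumerate(points) if y > upper_curve(x)]
    return below, above


points = [(0, 1.0), (1, -3.0), (2, 2.5), (3, 9.0), (4, 4.2)]


def q02(x):  # hypothetical 0.02 regression quantile
    return x - 2.0


def q98(x):  # hypothetical 0.98 regression quantile
    return x + 2.0


print(quantile_band_outliers(points, q02, q98))  # ([1], [3])
```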
QuantileRegression can be compared with FindFormula, Fit, LinearModelFit and NonlinearModelFit:
Quantile regression is much more robust to outliers than linear (least-squares) regression. To demonstrate this, add a few large outliers to the data:
Here quantile regression and linear regression are applied, as in the previous example:
Here is a plot of the obtained curves. Note that the linear regression curve differs from the one in the previous example and fits the data worse:
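The simplest case of this robustness is a model with only a constant term: least squares is then minimized by the mean of the data, while quantile regression with p = 0.5 is minimized by a median. The Python sketch below, with made-up numbers, adds two large outliers and compares how far each estimate moves.

```python
import statistics

clean = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0]
contaminated = clean + [100.0, 120.0]  # two large outliers

# Constant-model least squares -> mean; constant-model p = 0.5 fit -> median.
print(statistics.mean(clean), statistics.mean(contaminated))
print(statistics.median(clean), statistics.median(contaminated))
```

The mean moves from 3.0 to about 26.8, while the median only moves from 3.0 to 3.5.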
Because of the linear programming formulation, the computations can be slow for some data and knots specifications.
For most data, the quantile regression fitting for probabilities 0 and 1 produces regression quantiles that are "too far away from the data."
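One way to see why probabilities 0 and 1 are problematic: for p = 0 the check loss rho_0(r) is zero for every residual r >= 0, so any fit lying below all the data points has zero total loss. The optimum is therefore not unique, and the linear programming solution can lie far below the data; p = 1 is the mirror image. The Python sketch below checks this for a constant fit with made-up data.

```python
def check_loss_sum(ys, c, tau):
    """Sum of check losses rho_tau(y - c) for a constant fit c."""
    total = 0.0
    for y in ys:
        r = y - c
        total += r * (tau - (1.0 if r < 0 else 0.0))
    return total


ys = [2.0, 3.5, 1.0, 4.0]
# tau = 0: only points below the fit are penalized, so every c <= min(ys)
# has zero loss, no matter how far below the data it is.
print(check_loss_sum(ys, 1.0, 0.0), check_loss_sum(ys, -1000.0, 0.0))
# tau = 1: the mirror image holds for every c >= max(ys).
print(check_loss_sum(ys, 4.0, 1.0), check_loss_sum(ys, 1000.0, 1.0))
```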
Find regression quantiles for probabilities 0 and 0.5 and plot them:
Find regression quantiles for probabilities 0.5 and 1 and plot them:
One way to fix this is to use probabilities slightly above 0 and slightly below 1:
Consider the following nonlinear data:
Make a quantile regression fit with 20 knots:
Make a quantile regression fit with 40 knots:
Plot the regression quantiles and the data:
The regression quantile computed with 40 knots is "overfitted" between 0 and 8, because the B-spline basis knots are placed too densely in that interval.
When regression quantiles are overfitted, the estimate of the conditional cumulative distribution function (CDF) can be problematic: the estimated CDF can fail to be a monotonically increasing function.
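A monotonicity violation of this kind shows up as crossing regression quantiles: at some x, the quantile for a larger probability falls below the quantile for a smaller one, and a piecewise-linear CDF built from the pairs (q_p(x), p) then decreases somewhere. The Python sketch below flags such positions; the table of quantile values is hypothetical, with columns for the probabilities 0.25, 0.5 and 0.75.

```python
def crossing_positions(qvalues_by_x):
    """Positions j whose regression quantile values are not non-decreasing in p.

    Row j holds q_{p_1}(x_j), ..., q_{p_m}(x_j) for increasing p_1 < ... < p_m.
    """
    return [
        j for j, row in enumerate(qvalues_by_x)
        if any(a > b for a, b in zip(row, row[1:]))
    ]


# Hypothetical values; columns are the 0.25, 0.5 and 0.75 regression quantiles.
rows = [
    [1.0, 2.0, 3.0],
    [1.5, 1.4, 2.8],  # the 0.25 and 0.5 quantiles cross here
    [0.9, 2.2, 3.1],
]
print(crossing_positions(rows))  # [1]
```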
Compute regression quantiles using "too many" knots:
Plot the regression quantiles:
Here is the estimated conditional CDF:
For certain data, it is beneficial to rescale the predictor values, predicted values, or both before doing the quantile regression computations:
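The page does not say which rescaling is used. A minimal sketch, assuming min-max rescaling to [0, 1], keeps the inverse map so that results can be mapped back to the original units. The example values are hypothetical large absolute times, of the kind produced by date predictors.

```python
def make_rescaler(values):
    """Min-max rescaling to [0, 1] together with its inverse (assumed scheme)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("cannot rescale constant values")

    def forward(v):
        return (v - lo) / span

    def inverse(u):
        return lo + u * span

    return forward, inverse


times = [3.0e9, 3.1e9, 3.2e9, 3.4e9]  # hypothetical absolute times in seconds
to_unit, from_unit = make_rescaler(times)
scaled = [to_unit(t) for t in times]
print(scaled)           # [0.0, 0.25, 0.5, 1.0]
print(from_unit(0.5))   # 3200000000.0
```

Mapping rescaled regression quantiles of the predicted values back with the inverse map is valid because quantiles are equivariant under increasing affine transformations: the p-quantile of a + bY equals a + b times the p-quantile of Y when b > 0.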
Compute and plot regression quantiles over symmetric data:
This work is licensed under a Creative Commons Attribution 4.0 International License