Wolfram Function Repository
Instant-use add-on functions for the Wolfram Language
Function Repository Resource: QuantileRegression
Compute quantile regression fits over a time series, a list of numbers or a list of numeric pairs
ResourceFunction["QuantileRegression"][data,knots,probs] does quantile regression over the time series or data array data, using the knots specification knots, for the probabilities probs.
ResourceFunction["QuantileRegression"][data,knots,probs,opts] does quantile regression with the options opts.
Option | Default | Description
InterpolationOrder | 3 | interpolation order of the B-spline basis
Method | LinearProgramming | method for the quantile regression computations
Make a random signal:
Compute QuantileRegression with five knots for the probabilities 0.25 and 0.75:
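A sketch of the two steps above. It assumes, as the rest of this page suggests, that the result is a list of functions, one per probability. The names data and qFuncs are illustrative, and running it requires access to the Wolfram Function Repository:

```wolfram
SeedRandom[0];
(* Illustrative random signal: a noisy Gaussian bump on a regular grid over [-3, 3] *)
data = Table[{x, Exp[-x^2] + RandomVariate[NormalDistribution[0, 0.15]]}, {x, -3, 3, 0.03}];
(* Five knots, probabilities 0.25 and 0.75; assumed to return one function per probability *)
qFuncs = ResourceFunction["QuantileRegression"][data, 5, {0.25, 0.75}];
(* Evaluate both regression quantiles at x = 0 *)
Through[qFuncs[0.]]
```

Under the same assumption, Through[qFuncs[x]] with a symbolic x should give the formulas of the regression quantiles.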
Here are the formulas of the obtained regression quantiles:
Here is a plot of the original data and the obtained regression quantiles:
Find the fraction of the data points that are under the second regression quantile:
The obtained fraction is close to the second probability, 0.75, given to QuantileRegression.
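The fraction property noted above can be checked on synthetic data whose true conditional quantile is known. In this sketch the exact 0.75 quantile curve stands in for a fitted regression quantile; fractionBelow and q75 are hypothetical helper names:

```wolfram
SeedRandom[1];
(* Synthetic data: y = x + standard normal noise, so the true 0.75 conditional quantile is x + z_0.75 *)
data = Table[{x, x + RandomVariate[NormalDistribution[0, 1]]}, {x, 0, 10, 0.01}];
q75[x_] := x + InverseCDF[NormalDistribution[0, 1], 0.75];
(* Hypothetical helper: fraction of points lying on or below the curve f *)
fractionBelow[pts_, f_] := N[Count[pts, {x_, y_} /; y <= f[x]]/Length[pts]];
fractionBelow[data, q75]
```

With a fitted result, fractionBelow[data, qFuncs[[2]]] would perform the same check, where qFuncs is a hypothetical list of regression-quantile functions.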
Here is a quantile regression computation over a numerical vector:
Here is a quantile regression computation over a time series object:
The second argument—the knots specification—can be an integer specifying the number of knots or a list of numbers specifying the knots of the B-spline basis:
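An integer knots specification has to be turned into knot positions somehow. The page does not say how; one natural choice, assumed here, is equally spaced knots over the range of the predictor values. The helper equallySpacedKnots is hypothetical:

```wolfram
(* Hypothetical helper: n equally spaced knots spanning the range of the predictor values.
   Whether an integer knots specification is expanded this way is an assumption. *)
equallySpacedKnots[data_?MatrixQ, n_Integer /; n >= 2] :=
 Module[{xMin, xMax},
  {xMin, xMax} = MinMax[data[[All, 1]]];
  xMin + (xMax - xMin) Range[0, n - 1]/(n - 1)
 ];
equallySpacedKnots[{{0, 1}, {10, 2}, {5, 3}}, 3]
```

Its result is a list of numbers, so it has the form of the second kind of knots specification, for example ResourceFunction["QuantileRegression"][data, equallySpacedKnots[data, 7], {0.5}].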
The option InterpolationOrder specifies the polynomial order of the B-spline basis. Its values are expected to be non-negative integers:
QuantileRegression uses LinearProgramming. Additional parameters can be passed to LinearProgramming with the Method option:
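To make the linear programming formulation concrete, here is a minimal sketch that uses the built-in LinearProgramming with the linear basis {1, x} instead of a B-spline basis. It is not the repository implementation, and quantileLP is a hypothetical name. Each residual is split into nonnegative parts u and v, weighted by tau and 1 - tau in the objective:

```wolfram
(* Minimal sketch: quantile regression with the linear basis {1, x} (not the B-spline basis)
   posed as a linear program. Variables: beta (free), u >= 0, v >= 0 with y == X.beta + u - v;
   the objective is tau Total[u] + (1 - tau) Total[v]. *)
quantileLP[data_?MatrixQ, tau_?NumericQ] :=
 Module[{xs, ys, n, X, p, c, m, b, l, sol},
  {xs, ys} = Transpose[data];
  n = Length[ys];
  X = Transpose[{ConstantArray[1, n], xs}]; (* design matrix with columns 1 and x *)
  p = Length[First[X]];
  (* costs: 0 for the coefficients, tau per unit of positive residual, 1 - tau per unit of negative residual *)
  c = Join[ConstantArray[0, p], ConstantArray[tau, n], ConstantArray[1 - tau, n]];
  (* equality constraints X.beta + u - v == y, encoded as {y_i, 0} pairs *)
  m = Join[X, IdentityMatrix[n], -IdentityMatrix[n], 2];
  b = Transpose[{ys, ConstantArray[0, n]}];
  (* lower bounds: -Infinity for the free coefficients, 0 for u and v *)
  l = Join[ConstantArray[-Infinity, p], ConstantArray[0, 2 n]];
  sol = LinearProgramming[c, m, b, l];
  sol[[1 ;; p]] (* {intercept, slope} *)
 ]
```

For example, quantileLP[data, 0.75] returns the intercept and slope of the 0.75 regression line. Using B-spline basis functions evaluated at the predictor values as the columns of X would keep the same structure: the program has one pair of residual variables per data point, so its size grows with the data.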
Here is heteroscedastic data (the variance is not constant with respect to the predictor variable):
Find quantile regression fits:
Plot the data and the regression quantiles:
Note that the regression quantiles clearly outline the heteroscedastic nature of the data.
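A crude, regression-free way to see the same kind of heteroscedasticity is to compare the interquartile range of the responses in bins of the predictor. This sketch uses its own synthetic data, not the data above:

```wolfram
SeedRandom[2];
(* Synthetic heteroscedastic data: the noise standard deviation is 0.5 x *)
data = Table[{x, 1 + 0.5 x RandomVariate[NormalDistribution[0, 1]]}, {x, 0.01, 10, 0.01}];
(* Interquartile range of the responses in each unit-width bin of the predictor *)
binIQR = Table[
   InterquartileRange[Select[data, lo <= #[[1]] < lo + 1 &][[All, 2]]],
   {lo, 0, 9}];
(* The values grow with the bin position, as the 0.25 and 0.75 regression quantiles would show *)
binIQR
```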
One type of contextual anomaly is a subset of points whose variance is very different from that of other subsets. Using quantile regression, we can (1) estimate the regressor-dependent variance at each point using the 0.25 and 0.75 regression quantiles and (2) find the points whose variance estimates are outliers.
Here we compute and plot the variance estimates for a signal:
Find the lower and upper thresholds for the variance outliers:
Find the outlier positions:
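The page does not show which thresholds are used, so this sketch swaps in Tukey's fences (1.5 interquartile ranges beyond the quartiles) applied to per-point spread estimates. The helper names are hypothetical, and the spreads are assumed to be the differences between the 0.75 and 0.25 regression quantiles at each predictor value:

```wolfram
(* Hypothetical sketch: spreads could be computed as (q75[#] - q25[#]) & /@ data[[All, 1]]
   for fitted quantile functions q25 and q75; thresholds come from Tukey's fences *)
spreadThresholds[spreads_List] := Module[{q1, q3, iqr},
  {q1, q3} = Quantile[spreads, {1/4, 3/4}];
  iqr = q3 - q1;
  {q1 - 1.5 iqr, q3 + 1.5 iqr}
 ];
(* Positions whose spread lies below the lower or above the upper threshold *)
spreadOutlierPositions[spreads_List] := Module[{lo, hi},
  {lo, hi} = spreadThresholds[spreads];
  Pick[Range[Length[spreads]], (# < lo || # > hi) & /@ spreads]
 ];
spreadOutlierPositions[{1, 1.1, 0.9, 1.05, 0.95, 5}]
```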
Plot the data and the outliers found:
Here is a financial time series:
Do a quantile regression fit and plot it:
Here are the errors of the fit:
Find the positions of the anomalies in the list of fit errors:
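A sketch of the anomaly step, assuming the fit errors are the observed values minus the fitted values. The outlier identifier used on this page is not shown; the Hampel identifier (median plus or minus k scaled median absolute deviations) is swapped in here, and hampelOutlierPositions is a hypothetical name:

```wolfram
(* Fit errors could be computed as errs = data[[All, 2]] - (qf /@ data[[All, 1]])
   for a hypothetical fitted function qf *)
(* Hampel identifier: flag errors farther than k scaled MADs from the median *)
hampelOutlierPositions[errs_List, k_: 3] := Module[{med, mad},
  med = Median[errs];
  mad = 1.4826 Median[Abs[errs - med]];
  Pick[Range[Length[errs]], (Abs[# - med] > k mad) & /@ errs]
 ];
hampelOutlierPositions[{0.1, -0.2, 0.05, 0., 0.15, -0.1, 5.}]
```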
Plot the data, fit and anomalies:
Get temperature data:
Convert the time series into a list of numeric pairs:
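One way to do the conversion, assuming date-indexed data: the "Path" property of a TimeSeries gives {time, value} pairs with numeric (absolute) times. The small series here is made up for illustration:

```wolfram
(* A small, made-up daily time series *)
ts = TimeSeries[{
    {DateObject[{2017, 10, 1}], 12.5},
    {DateObject[{2017, 10, 2}], 13.1},
    {DateObject[{2017, 10, 3}], 11.8}}];
(* {time, value} pairs with numeric times, usable as a list of numeric pairs *)
pairs = ts["Path"]
```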
Compute quantile regression fits:
Plot the data and the regression quantiles:
Find an estimate of the conditional cumulative distribution function (CDF) at the date 2017-10-01:
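A sketch of the idea behind a conditional CDF estimate: evaluate the regression quantiles for increasing probabilities at the predictor value of interest and interpolate linearly through the points {quantile value, probability}. The helper conditionalCDF and the numbers are hypothetical:

```wolfram
(* Hypothetical helper: conditional CDF at one predictor value from regression-quantile
   values qVals (one per probability in probs), by linear interpolation through {q, p}.
   Sorting by q means that crossing quantiles show up as a CDF that decreases somewhere. *)
conditionalCDF[qVals_List, probs_List] :=
  Interpolation[SortBy[Transpose[{qVals, probs}], First], InterpolationOrder -> 1];
(* Made-up regression-quantile values at some predictor value *)
cdf = conditionalCDF[{10, 12, 15}, {0.1, 0.5, 0.9}];
cdf[11]
```

With fitted functions, qVals could be Through[qFuncs[x0]] for a hypothetical list qFuncs and a numeric predictor value x0 (for a date, its numeric time).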
Find outliers in the temperature data, defined as the points below the 0.02 regression quantile or above the 0.98 regression quantile:
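The outlier definition above translates directly into a selection. Here quantileOutliers is a hypothetical helper taking the lower and upper regression-quantile functions; the constant functions in the example are stand-ins for fitted quantiles:

```wolfram
(* Points strictly below the lower regression quantile or strictly above the upper one *)
quantileOutliers[data_, qLow_, qHigh_] :=
  Select[data, (#[[2]] < qLow[#[[1]]] || #[[2]] > qHigh[#[[1]]]) &];
(* Stand-in quantile functions: constant 0 (lower) and constant 1 (upper) *)
quantileOutliers[{{1, 0.5}, {2, -1}, {3, 2}, {4, 0.9}}, (0 &), (1 &)]
```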
QuantileRegression can be compared with FindFormula, Fit, LinearModelFit and NonlinearModelFit:
Quantile regression is much more robust to outliers than least-squares linear regression. To demonstrate this, add a few large outliers to the data:
Here quantile regression and linear regression are applied, as in the previous example:
Here is a plot of the obtained curves. Note that the linear regression curve differs from the one in the previous example and fits the data worse:
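The robustness claim is easiest to see in the intercept-only special case, where least squares estimates the mean and quantile regression with probability 0.5 estimates a median. This sketch uses made-up numbers:

```wolfram
(* Intercept-only case: least squares gives the mean, median (probability 0.5) regression a median *)
clean = {1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.05};
withOutliers = Join[clean, {50., 60.}];
{Mean[clean], Mean[withOutliers]}     (* the least-squares estimate moves a lot *)
{Median[clean], Median[withOutliers]} (* the median estimate barely moves *)
```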
Because of the linear programming formulation, the computations can be slow for some data and knots specifications.
For most data, quantile regression with the probabilities 0 and 1 produces regression quantiles that are "too far away" from the data.
Find regression quantiles for probabilities 0 and 0.5 and plot them:
Find regression quantiles for probabilities 0.5 and 1 and plot them:
One way to fix this is to use probabilities slightly above 0 and slightly below 1 instead:
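One way to see why the probabilities 0 and 1 misbehave is through the check (pinball) loss that quantile regression minimizes. For probability 0, any fit lying below all the data has zero loss, so nothing ties the fit to the data (probability 1 is the mirror image); a probability slightly above 0 penalizes fits far below the data. A sketch with a constant fit and hypothetical helper names:

```wolfram
(* Check (pinball) loss of a residual r = y - q at probability tau *)
checkLoss[tau_][r_] := If[r >= 0, tau r, (tau - 1) r];
(* Total check loss of a constant fit q for the responses ys *)
totalLoss[tau_, ys_List, q_] := Total[checkLoss[tau] /@ (ys - q)];
ys = {1, 2, 3};
(* Probability 0: every constant below the data has zero loss *)
{totalLoss[0, ys, 0.5], totalLoss[0, ys, -100]}
(* Probability 0.02: a fit far below the data is now penalized *)
{totalLoss[0.02, ys, 1], totalLoss[0.02, ys, -100]}
```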
Consider the following nonlinear data:
Make a quantile regression fit with 20 knots:
Make a quantile regression fit with 40 knots:
Plot the regression quantiles and the data:
You can see that the regression quantile computed with 40 knots is "overfitted" between 0 and 8, because the B-spline basis knots are placed too densely in that interval.
When regression quantiles are overfitted, the estimate of the conditional cumulative distribution function (CDF) can be problematic: the estimated CDF may not be a monotonically increasing function.
Compute regression quantiles using "too many" knots:
Plot the regression quantiles:
Here is the estimated conditional CDF:
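A simple diagnostic for a non-monotone conditional CDF is to check whether the regression-quantile values at a predictor value, ordered by increasing probability, ever decrease (quantile crossing). Here crossingQ is a hypothetical helper:

```wolfram
(* Regression-quantile values at one predictor value, ordered by increasing probability,
   must be nondecreasing for the estimated conditional CDF there to be monotone *)
crossingQ[qVals_List] := AnyTrue[Differences[qVals], Negative];
{crossingQ[{1.0, 1.4, 2.2}], crossingQ[{1.0, 2.5, 2.2}]}
```

Applied over a grid of predictor values, for example Select[xGrid, crossingQ[Through[qFuncs[#]]] &] with hypothetical xGrid and qFuncs, this locates where the estimated CDF fails to be monotone.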
For certain data, it is beneficial to rescale the predictor values, predicted values, or both before doing the quantile regression computations:
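A sketch of rescaling to the unit interval and mapping a fitted function back to the original units. The page does not say which rescaling it uses; toUnit and fromUnit are hypothetical helpers:

```wolfram
(* Hypothetical helper: map values affinely onto [0, 1]; also return the original range *)
toUnit[v_List] := With[{lo = Min[v], hi = Max[v]}, {(v - lo)/(hi - lo), {lo, hi}}];
(* Hypothetical helper: wrap a function fitted on unit-scaled data so that it takes and
   returns values in the original units *)
fromUnit[f_, {xLo_, xHi_}, {yLo_, yHi_}] :=
  Function[x, yLo + (yHi - yLo) f[(x - xLo)/(xHi - xLo)]];
toUnit[{2, 4, 6}]
```

For example, one could fit on the scaled predictor and response values (the first parts of the toUnit results) and wrap each resulting function f as fromUnit[f, xRange, yRange], where the ranges are the second parts of the toUnit results.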
Compute and plot regression quantiles over symmetric data:
This work is licensed under a Creative Commons Attribution 4.0 International License