Basic Examples (5)
Make a random signal:
Compute QuantileRegression with five knots for the probabilities 0.25 and 0.75:
Here are the formulas of the obtained regression quantiles:
Here is a plot of the original data and the obtained regression quantiles:
Find the fraction of the data points that are under the second regression quantile:
The obtained fraction is close to the second probability, 0.75, given to QuantileRegression.
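The check above can be sketched in a language-neutral way. The following Python snippet is only an illustration, not the code used on this page: the data are synthetic, and `q75` is a hypothetical stand-in for a fitted 0.75 regression quantile (here, the known 0.75 quantile curve of the synthetic data).

```python
# Hypothetical stand-in for a fitted regression quantile: the 0.75
# quantile curve of the synthetic data below (an assumption for illustration).
def q75(x):
    return x + 1.0

# Synthetic data: y = x + offset, with four equally frequent offsets.
offsets = [-1.5, -0.5, 0.5, 1.5]
data = [(i / 100, i / 100 + offsets[i % 4]) for i in range(400)]

def fraction_below(points, curve):
    """Fraction of (x, y) points with y strictly below curve(x)."""
    below = sum(1 for x, y in points if y < curve(x))
    return below / len(points)

frac = fraction_below(data, q75)
print(frac)
```

For a well-fitted regression quantile, this fraction should be close to the probability the quantile was computed for.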
Scope (3)
Here is a quantile regression computation over a numerical vector:
Here is a quantile regression computation over a time series object:
The second argument—the knots specification—can be an integer specifying the number of knots or a list of numbers specifying the knots of the B-spline basis:
Applications (18)
Fit for heteroscedastic data (3)
Here is heteroscedastic data (the variance is not constant with respect to the predictor variable):
Find quantile regression fits:
Plot the data and the regression quantiles:
Note that the regression quantiles clearly outline the heteroscedastic nature of the data.
Find variance anomalies (4)
One contextual type of anomaly is a subset of points whose variance differs markedly from that of other subsets. Using quantile regression we can (1) estimate the regressor-dependent variance at each point from the 0.25 and 0.75 regression quantiles, and (2) find the points that have outlier variances.
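A minimal Python sketch of step (2), under stated assumptions: the spread values are invented for illustration and stand for the difference between the 0.75 and 0.25 regression quantiles at each point, and the outlier rule (Tukey fences at 1.5 times the interquartile range) is an assumption, since the thresholding rule is not specified here.

```python
import statistics

# Hypothetical spread estimates q75(x_i) - q25(x_i), one per data point.
# These numbers are invented for illustration.
spreads = [1.0, 1.1, 0.9, 1.2, 1.0, 4.0, 1.1, 0.95, 1.05, 0.1, 1.0, 1.15]

# Quartiles of the spread estimates.
q1, _, q3 = statistics.quantiles(spreads, n=4, method="inclusive")
iqr = q3 - q1

# Assumed outlier rule: Tukey fences.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Zero-based positions of points whose spread falls outside the fences.
outlier_positions = [i for i, s in enumerate(spreads) if s < lower or s > upper]
print(lower, upper, outlier_positions)
```

Both unusually large and unusually small spreads are flagged, which matches the idea of a lower and an upper threshold.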
Here we compute and plot the variance estimates for a signal:
Find the lower and upper thresholds for the variance outliers:
Find the outlier positions:
Plot the data and the outliers found:
Fit and anomalies for financial time series (5)
Here is a financial time series:
Do a quantile regression fit and plot it:
Here are the errors of the fit found:
Find the positions of the anomalies in the list of fit errors:
Plot the data, fit and anomalies:
Analyze temperature time series (6)
Get temperature data:
Convert the time series into a list of numeric pairs:
Compute quantile regression fits:
Plot the data and the regression quantiles:
Find an estimate of the conditional cumulative distribution function (CDF) at the date 2017-10-01:
Find outliers in the temperature data. Outliers are defined as points below the 0.02 regression quantile or above the 0.98 regression quantile:
Properties and Relations (2)
QuantileRegression can be compared with FindFormula, Fit, LinearModelFit and NonlinearModelFit:
Quantile regression is much more robust to outliers than linear regression. To demonstrate that, add a few large outliers to the data:
Here quantile regression and linear regression are applied, as in the previous example:
Here is a plot of the obtained curves. Note that the linear regression curve has changed and fits the data worse than in the previous example:
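The robustness effect can be illustrated in a deliberately simplified setting. Fitting a constant by least squares gives the mean, while fitting a constant for probability 0.5 gives the median. The Python sketch below uses invented numbers and is not the comparison performed on this page.

```python
import statistics

# Invented sample values clustered around 10.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
# The same sample with two large outliers added.
contaminated = clean + [100.0, 120.0]

# Least-squares constant fit (mean) versus 0.5-quantile constant fit (median).
mean_shift = abs(statistics.mean(contaminated) - statistics.mean(clean))
median_shift = abs(statistics.median(contaminated) - statistics.median(clean))
print(mean_shift, median_shift)
```

The mean moves by a large amount while the median barely moves, which is the same mechanism that keeps a regression quantile stable when a few outliers are added.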
Possible Issues (11)
Slow computations
Because of the linear programming formulation, the computations can be slow for some data and knots specifications.
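For orientation, here is the standard linear programming formulation of quantile regression from the literature. Whether the function uses exactly this form is an assumption. The symbols are introduced for illustration: data points $(x_i, y_i)$ for $i = 1, \dots, n$, basis functions $B_j$ for $j = 1, \dots, k$, and probability $\tau$.

```latex
\begin{aligned}
\min_{\beta,\, u,\, v} \quad & \sum_{i=1}^{n} \bigl( \tau\, u_i + (1 - \tau)\, v_i \bigr) \\
\text{subject to} \quad & y_i - \sum_{j=1}^{k} \beta_j B_j(x_i) = u_i - v_i, \quad i = 1, \dots, n, \\
& u_i \ge 0, \quad v_i \ge 0, \quad i = 1, \dots, n.
\end{aligned}
```

In this form the problem has $2n + k$ variables and $n$ equality constraints, so its size grows with both the number of data points and the number of basis functions.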
Fitting for probabilities 0 and 1 (3)
For most data, the quantile regression fitting for probabilities 0 and 1 produces regression quantiles that are "too far away from the data."
Find regression quantiles for probabilities 0 and 0.5 and plot them:
Find regression quantiles for probabilities 0.5 and 1 and plot them:
One way to fix this is to use probabilities slightly above 0 and slightly below 1 instead:
Overfitting (4)
Consider the following nonlinear data:
Make a quantile regression fit with 20 knots:
Make a quantile regression fit with 40 knots:
Plot the regression quantiles and the data:
You can see that the regression quantile computed with 40 knots is "overfitted" between 0 and 8—the B-spline basis knots are too densely placed between 0 and 8.
Intersecting regression quantiles (3)
When regression quantiles are overfitted, the estimate of the conditional cumulative distribution function (CDF) can be problematic: the regression quantiles can intersect, and then the estimated CDF is not a monotonically increasing function.
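A minimal Python sketch of why intersecting regression quantiles break the CDF estimate. It assumes the conditional CDF at a fixed predictor value is estimated by pairing each regression quantile value with its probability; the helper names and the numbers are hypothetical.

```python
def estimated_cdf(probs, quantile_values):
    """Pair each regression quantile value at a fixed x with its probability,
    sorted by value, giving points (y, p) of an estimated conditional CDF."""
    return sorted(zip(quantile_values, probs))

def is_monotone(cdf_points):
    """True if the probabilities never decrease as y increases."""
    ps = [p for _, p in cdf_points]
    return all(a <= b for a, b in zip(ps, ps[1:]))

probs = [0.1, 0.25, 0.5, 0.75, 0.9]

# Hypothetical quantile values at one x: well-behaved case.
good = [1.0, 1.4, 2.0, 2.7, 3.1]
# Hypothetical overfitted case: the 0.5 curve lies below the 0.25 curve here.
crossed = [1.0, 1.6, 1.5, 2.7, 3.1]

print(is_monotone(estimated_cdf(probs, good)))
print(is_monotone(estimated_cdf(probs, crossed)))
```

In the crossed case the probability drops from 0.5 to 0.25 as the value increases, so the points cannot come from a valid CDF.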
Compute regression quantiles using “too many” knots:
Plot the regression quantiles:
Here is the estimated conditional CDF:
Rescaling (1)
For certain data, it is beneficial to rescale the predictor values, predicted values, or both before doing the quantile regression computations:
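A minimal Python sketch of one rescaling approach, assuming min-max rescaling of the predictor to the interval [0, 1]. The helper `make_rescaler` and the placeholder `fitted_on_scaled` are hypothetical; the latter stands in for a regression quantile computed on the rescaled data.

```python
def make_rescaler(values):
    """Return (forward, inverse) maps between the data range and [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale constant values")
    span = hi - lo

    def forward(v):
        return (v - lo) / span

    def inverse(u):
        return lo + u * span

    return forward, inverse

# Invented predictor values on a large scale.
xs = [1.0e6, 1.5e6, 2.0e6, 3.0e6]
to_unit, from_unit = make_rescaler(xs)
scaled = [to_unit(x) for x in xs]
print(scaled)

# Hypothetical placeholder for a function fitted on the rescaled predictor.
def fitted_on_scaled(u):
    return 2.0 * u + 1.0

# Compose with the rescaler so the result accepts original predictor values.
def fitted_on_original(x):
    return fitted_on_scaled(to_unit(x))

print(fitted_on_original(2.0e6))
```

Keeping the forward map makes it possible to evaluate the fitted function on the original scale, and the inverse map converts rescaled values back when needed.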