Function Repository Resource:

TukeyMedianPolish

Source Notebook

Find row and column effects in a data matrix by repeatedly subtracting the median

Contributed by: Sjoerd Smit

ResourceFunction["TukeyMedianPolish"][mat]

repeatedly removes row and column medians from matrix mat until it no longer changes.

ResourceFunction["TukeyMedianPolish"][mat,"prop"]

specifies what property should be returned.

ResourceFunction["TukeyMedianPolish"][]

represents an operator form of ResourceFunction["TukeyMedianPolish"] that can be applied to an expression.

Details and Options

Matrix mat has to be numerical.
Median polishing is a robust exploratory data analysis method to find row and column effects in a matrix of data similar to two-way ANOVA. The main idea is that the elements of the matrix are of the form mi,j=c+xi+yj+ϵi,j. In this model, the xi are called the row effects, yj the column effects, c the overall effect and the ϵi,j are the residuals. In order to estimate these effects, it is assumed that all of these quantities (except the constant c) average out to 0 over i and/or j. However, since the amount of data is limited, the Mean is not a good location estimator to use and instead the Median is used as a more robust alternative. Other location estimators can be specified if necessary.
To estimate c, xi and yj, the algorithm first subtracts the medians from the columns of mat and adds them to the yj and also subtracts the median from the xi vector and adds it to c. Then the algorithm repeats the process for the rows (adding the medians to the xi) and the yj (adding its median to c again). This process is iterated until convergence is reached.
The argument "prop" can either be "Matrix" (default) or "PropertyAssociation".
ResourceFunction["TukeyMedianPolish"] takes the following options:
"LocationEstimator"Medianlocation estimator to use
MaxIterations100maximum number of iterations to use
ToleranceScaled[10-4]maximum (scaled) difference between two successive steps before the algorithm terminates
"ConvergenceTest"Automaticfunction that tests if convergence has been reached
CompiledFalsewhether to use function compilation to speed up the computation
If a custom "LocationEstimator" is specified, it should behave in the same way as Median and Mean when applied to a matrix, meaning that locEst[matrix] should produce the same result as locEst/@Transpose[matrix].
If an option value other than Automatic is specified for "ConvergenceTest", the Tolerance option will be ignored. The algorithm will evaluate sametest[medians,current] after each iteration, with medians being the median values computed in this step and current being the current output result if the algorithm stops. When the function returns True, the algorithm stops.
For "ConvergenceTest" Automatic, the Tolerance option determines the stopping criterion. If Tolerance Scaled[tol] is specified, the default test is Function[Max@Abs[#1] < tol*Max@Abs[#2]]. If the tolerance is instead specified as Tolerance tol, the default test is Function[Max@Abs[#1]<tol].
If the Compiled option is used, extra compilation options can be propagated into Compile by specifying Compiled{True,opts…}.

Examples

Basic Examples (7) 

Define a matrix:

In[1]:=
matrix = N@RandomInteger[10, {4, 6}];
MatrixForm[matrix]
Out[2]=

Calculate the median polish:

In[3]:=
polish = ResourceFunction["TukeyMedianPolish"][matrix];
Grid[polish, Dividers -> {-2 -> True, -2 -> True}]
Out[4]=

The medians of the columns of the polished residuals are (approximately) zero:

In[5]:=
Median[polish[[;; -2, ;; -2]]]
Out[5]=

As are the medians of the rows:

In[6]:=
Median /@ polish[[;; -2, ;; -2]]
Out[6]=

The median of the column effects is (approximately) zero:

In[7]:=
Median[polish[[-1, ;; -2]]]
Out[7]=

As is the median of the rows effects:

In[8]:=
Median[polish[[;; -2, -1]]]
Out[8]=

The overall effect is not zero, since the original matrix consists of numbers ≥ 0:

In[9]:=
polish[[-1, -1]]
Out[9]=

Scope (1) 

The second argument can be used to request the result to be returned as an Association instead of a matrix:

In[10]:=
ResourceFunction["TukeyMedianPolish"][
 RandomReal[10, {4, 6}], "PropertyAssociation"]
Out[10]=

Options (9) 

LocationEstimator (1) 

Compare the results when different location estimators are used:

In[11]:=
With[{mat = RandomVariate[CauchyDistribution[], {5, 6}]},
 AssociationMap[
  Grid[ResourceFunction["TukeyMedianPolish"][mat, "LocationEstimator" -> #], Dividers -> {-2 -> True, -2 -> True}] &,
  {Median, Mean, TrimmedMean[#, 0.2] &, WinsorizedMean[#, 0.2] &}
  ]
 ]
Out[11]=

Tolerance (2) 

The medians of the row and column residuals depends on the specified tolerance:

In[12]:=
matrix = RandomReal[100, {4, 6}];
residualsColumn = Column[{
     Median@#Residuals, Median /@ #Residuals,
     Median @#RowEffects, Median@#ColumnEffects
     }] &;
residualsColumn[
 ResourceFunction["TukeyMedianPolish"][matrix, "PropertyAssociation"]]
Out[13]=

Specify a smaller tolerance to bring the medians closer to zero:

In[14]:=
residualsColumn@
 ResourceFunction["TukeyMedianPolish"][matrix, "PropertyAssociation", Tolerance -> Scaled[10^-8]]
Out[14]=

Specify an absolute tolerance:

In[15]:=
residualsColumn@
 ResourceFunction["TukeyMedianPolish"][matrix, "PropertyAssociation", Tolerance -> 10^-8]
Out[15]=

Use the operator form to map over lists of matrices:

In[16]:=
ResourceFunction["TukeyMedianPolish"][Tolerance -> 10^-8] /@ RandomReal[1, {3, 2, 4}]
Out[16]=

MaxIterations (2) 

Tolerances may not be met if the matrix is large or the tolerance is small:

In[17]:=
matrix = RandomReal[1, {40, 60}];
residualsColumn = Column[{
     Max@Abs[Median@#Residuals], Max@Abs[Median /@ #Residuals],
     Median @#RowEffects, Median@#ColumnEffects
     }] &;
residualsColumn[
 ResourceFunction["TukeyMedianPolish"][matrix, "PropertyAssociation", Tolerance -> 10^-10]]
Out[18]=

Increasing the MaxIterations can solve this:

In[19]:=
residualsColumn[
 ResourceFunction["TukeyMedianPolish"][matrix, "PropertyAssociation", Tolerance -> 10^-10, MaxIterations -> 1000]]
Out[19]=

Compiled (4) 

The computation speed can be increased significantly with compilation:

In[20]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/bb818904-03e5-49b2-898d-cdb361ec42e6"]
Out[21]=

Compiled computations will be done at machine precision, while the default computation will retain precision:

In[22]:=
Column[{
  ResourceFunction["TukeyMedianPolish"][{{1, 2}, {3, 4}}, Compiled -> False],
  ResourceFunction["TukeyMedianPolish"][{{1, 2}, {3, 4}}, Compiled -> True]
  }]
Out[22]=

The operator form returns a CompiledFunction if the CompiledTrue option is used:

In[23]:=
ResourceFunction["TukeyMedianPolish"][Compiled -> True]
Out[23]=

You can set the Listable attribute to the compiled function so that it automatically threads over multiple inputs:

In[24]:=
MatrixForm /@ ResourceFunction["TukeyMedianPolish"][
   Compiled -> {True, RuntimeAttributes -> {Listable}}][{{{1, 2}, {3, 4}}, {{1, 2, 3}, {4, 5, 6}}}]
Out[24]=

Possible Issues (1) 

Using exact numbers can produce very large expressions:

In[25]:=
ResourceFunction["TukeyMedianPolish"][{{1, Sqrt[2]}, {3, 4}}, MaxIterations -> 2]
Out[25]=

Publisher

Sjoerd Smit

Version History

  • 1.0.0 – 10 March 2020

Related Resources

License Information