# Wolfram Function Repository

Instant-use add-on functions for the Wolfram Language

Function Repository Resource:

Find row and column effects in a data matrix by repeatedly subtracting the median

Contributed by:
Sjoerd Smit

ResourceFunction["TukeyMedianPolish"][ repeatedly removes row and column medians from matrix | |

ResourceFunction["TukeyMedianPolish"][ specifies what property should be returned. | |

ResourceFunction["TukeyMedianPolish"][] represents an operator form of ResourceFunction["TukeyMedianPolish"] that can be applied to an expression. |

Matrix *mat* has to be numerical.

Median polishing is a robust exploratory data analysis method to find row and column effects in a matrix of data similar to two-way ANOVA. The main idea is that the elements of the matrix are of the form *m*_{i,j}=*c*+*x*_{i}+*y*_{j}+*ϵ*_{i,j}. In this model, the *x*_{i} are called the *row effects*, *y*_{j} the *column effects*, *c* the *overall effect* and the *ϵ*_{i,j} are the residuals. In order to estimate these effects, it is assumed that all of these quantities (except the constant *c*) average out to 0 over *i* and/or *j*. However, since the amount of data is limited, the Mean is not a good location estimator to use and instead the Median is used as a more robust alternative. Other location estimators can be specified if necessary.

To estimate *c*, *x*_{i} and *y*_{j}, the algorithm first subtracts the medians from the columns of *mat* and adds them to the *y*_{j} and also subtracts the median from the *x*_{i} vector and adds it to *c*. Then the algorithm repeats the process for the rows (adding the medians to the *x*_{i}) and the *y*_{j} (adding its median to *c* again). This process is iterated until convergence is reached.

The argument "*prop*" can either be "Matrix" (default) or "PropertyAssociation".

ResourceFunction["TukeyMedianPolish"] takes the following options:

"LocationEstimator" | Median | location estimator to use |

MaxIterations | 100 | maximum number of iterations to use |

Tolerance | Scaled[10^{-4}] | maximum (scaled) difference between two successive steps before the algorithm terminates |

"ConvergenceTest" | Automatic | function that tests if convergence has been reached |

Compiled | False | whether to use function compilation to speed up the computation |

If a custom "LocationEstimator" is specified, it should behave in the same way as Median and Mean when applied to a matrix, meaning that *locEst*[*matrix*] should produce the same result as *locEst*/@Transpose[*matrix*].

If an option value other than Automatic is specified for "ConvergenceTest", the Tolerance option will be ignored. The algorithm will evaluate *sametest*[*medians*,*current*] after each iteration, with *medians* being the median values computed in this step and *current* being the current output result if the algorithm stops. When the function returns True, the algorithm stops.

Define a matrix:

In[1]:= |

Out[2]= |

Calculate the median polish:

In[3]:= |

Out[4]= |

The medians of the columns of the polished residuals are (approximately) zero:

In[5]:= |

Out[5]= |

As are the medians of the rows:

In[6]:= |

Out[6]= |

The median of the column effects is (approximately) zero:

In[7]:= |

Out[7]= |

As is the median of the rows effects:

In[8]:= |

Out[8]= |

The overall effect is not zero, since the original matrix consists of numbers ≥ 0:

In[9]:= |

Out[9]= |

The second argument can be used to request the result to be returned as an Association instead of a matrix:

In[10]:= |

Out[10]= |

Compare the results when different location estimators are used:

In[11]:= |

Out[11]= |

The medians of the row and column residuals depends on the specified tolerance:

In[12]:= |

Out[13]= |

Specify a smaller tolerance to bring the medians closer to zero:

In[14]:= |

Out[14]= |

Specify an absolute tolerance:

In[15]:= |

Out[15]= |

Use the operator form to map over lists of matrices:

In[16]:= |

Out[16]= |

Tolerances may not be met if the matrix is large or the tolerance is small:

In[17]:= |

Out[18]= |

Increasing the MaxIterations can solve this:

In[19]:= |

Out[19]= |

The computation speed can be increased significantly with compilation:

In[20]:= |

Out[21]= |

Compiled computations will be done at machine precision, while the default computation will retain precision:

In[22]:= |

Out[22]= |

The operator form returns a CompiledFunction if the Compiled→True option is used:

In[23]:= |

Out[23]= |

You can set the Listable attribute to the compiled function so that it automatically threads over multiple inputs:

In[24]:= |

Out[24]= |

Using exact numbers can produce very large expressions:

In[25]:= |

Out[25]= |

- 1.0.0 – 10 March 2020

This work is licensed under a Creative Commons Attribution 4.0 International License