Function Repository Resource:

TukeyMedianPolish

Source Notebook

Find row and column effects in a data matrix by repeatedly subtracting the median

Contributed by: Sjoerd Smit

ResourceFunction["TukeyMedianPolish"][mat]

repeatedly removes row and column medians from matrix mat until it no longer changes.

ResourceFunction["TukeyMedianPolish"][mat,"prop"]

specifies what property should be returned.

ResourceFunction["TukeyMedianPolish"][]

represents an operator form of ResourceFunction["TukeyMedianPolish"] that can be applied to an expression.

Details and Options

Matrix mat has to be numerical.

Median polishing is a robust exploratory data analysis method to find row and column effects in a matrix of data similar to two-way ANOVA. The main idea is that the elements of the matrix are of the form m_i,j=c+x_i+y_j+ϵ_i,j. In this model, the x_i are called the row effects, y_j the column effects, c the overall effect and the ϵ_i,j are the residuals. In order to estimate these effects, it is assumed that all of these quantities (except the constant c) average out to 0 over i and/or j. However, since the amount of data is limited, the Mean is not a good location estimator to use and instead the Median is used as a more robust alternative. Other location estimators can be specified if necessary.

To estimate c, x_i and y_j, the algorithm first subtracts the medians from the columns of mat and adds them to the y_j and also subtracts the median from the x_i vector and adds it to c. Then the algorithm repeats the process for the rows (adding the medians to the x_i) and the y_j (adding its median to c again). This process is iterated until convergence is reached.

The argument "prop" can either be "Matrix" (default) or "PropertyAssociation".

ResourceFunction["TukeyMedianPolish"] takes the following options:

"LocationEstimator"

Median

location estimator to use

MaxIterations

100

maximum number of iterations to use

Tolerance

Scaled[10^-4]

maximum (scaled) difference between two successive steps before the algorithm terminates

"ConvergenceTest"

Automatic

function that tests if convergence has been reached

Compiled

False

whether to use function compilation to speed up the computation

If a custom "LocationEstimator" is specified, it should behave in the same way as Median and Mean when applied to a matrix, meaning that locEst[matrix] should produce the same result as locEst/@Transpose[matrix].

If an option value other than Automatic is specified for "ConvergenceTest", the Tolerance option will be ignored. The algorithm will evaluate sametest[medians,current] after each iteration, with medians being the median values computed in this step and current being the current output result if the algorithm stops. When the function returns True, the algorithm stops.

For "ConvergenceTest"→ Automatic, the Tolerance option determines the stopping criterion. If Tolerance→ Scaled[tol] is specified, the default test is Function[Max@Abs[#1] < tol*Max@Abs[#2]]. If the tolerance is instead specified as Tolerance → tol, the default test is Function[Max@Abs[#1]<tol].

If the Compiled option is used, extra compilation options can be propagated into Compile by specifying Compiled→{True,opts…}.

Examples

Basic Examples (7)

Define a matrix:

In[1]:=

Out[2]=

Calculate the median polish:

In[3]:=

Out[4]=

The medians of the columns of the polished residuals are (approximately) zero:

In[5]:=

Out[5]=

As are the medians of the rows:

In[6]:=

Out[6]=

The median of the column effects is (approximately) zero:

In[7]:=

Out[7]=

As is the median of the rows effects:

In[8]:=

Out[8]=

The overall effect is not zero, since the original matrix consists of numbers ≥ 0:

In[9]:=

Out[9]=

Scope (1)

The second argument can be used to request the result to be returned as an Association instead of a matrix:

In[10]:=

Out[10]=

Options (9)

LocationEstimator (1)

Compare the results when different location estimators are used:

In[11]:=

With[{mat = RandomVariate[CauchyDistribution[], {5, 6}]},
AssociationMap[
Grid[ResourceFunction["TukeyMedianPolish"][mat, "LocationEstimator" -> #], Dividers -> {-2 -> True, -2 -> True}] &,
{Median, Mean, TrimmedMean[#, 0.2] &, WinsorizedMean[#, 0.2] &}
]
]

Out[11]=

Tolerance (2)

The medians of the row and column residuals depends on the specified tolerance:

In[12]:=

matrix = RandomReal[100, {4, 6}];
residualsColumn = Column[{
Median@#Residuals, Median /@ #Residuals,
Median @#RowEffects, Median@#ColumnEffects
}] &;
residualsColumn[
ResourceFunction["TukeyMedianPolish"][matrix, "PropertyAssociation"]]

Out[13]=

Specify a smaller tolerance to bring the medians closer to zero:

In[14]:=

Out[14]=

Specify an absolute tolerance:

In[15]:=

Out[15]=

Use the operator form to map over lists of matrices:

In[16]:=

$ResourceFunction["TukeyMedianPolish"][Tolerance -> 10^-8] /@ RandomReal[1, {3, 2, 4}]$

Out[16]=

MaxIterations (2)

Tolerances may not be met if the matrix is large or the tolerance is small:

In[17]:=

$matrix = RandomReal[1, {40, 60}]; residualsColumn = Column[{ Max@Abs[Median@#Residuals], Max@Abs[Median /@ #Residuals], Median @#RowEffects, Median@#ColumnEffects }] &; residualsColumn[ ResourceFunction["TukeyMedianPolish"][matrix, "PropertyAssociation", Tolerance -> 10^-10]]$