Function Repository Resource:

PermutationTest


Compare two data samples using a specified test statistic

Contributed by: Felipe Amorim

ResourceFunction["PermutationTest"][{data1,data2}]

tests whether data1 and data2 differ according to a specified test statistic and returns a two-tailed p-value.

ResourceFunction["PermutationTest"][{data1,data2},"property"]

returns the "property" of the test.

ResourceFunction["PermutationTest"][{data1,data2},{"prop1","prop2",…}]

returns an association of the values of each property "propi".

Details and Options

A permutation test is a non-parametric hypothesis test that compares two data samples. It pools the two samples, repeatedly shuffles the pooled values, splits each shuffle into two groups with the original sample sizes and computes the test statistic of interest for each split. The resulting permuted statistics approximate the distribution of the statistic under the null hypothesis that both samples come from the same distribution (no difference between them). Comparing the original test statistic against this distribution shows how unusual the observed value is.
By default, ResourceFunction["PermutationTest"] uses the difference of means μ1-μ2 as the test statistic, where μi is the Mean of datai.
ResourceFunction["PermutationTest"] makes no assumptions about the distribution of the data and accepts any test statistic computed from the two samples.
By default, ResourceFunction["PermutationTest"] returns a two-tailed p-value. This value estimates the probability, under the null hypothesis, of observing a test statistic at least as extreme as the one calculated from the original samples. If the p-value is smaller than a chosen significance level α, commonly 0.05, the null hypothesis is rejected.
ResourceFunction["PermutationTest"] supports the following properties as the second argument:
"PValue"                    two-tailed p-value relative to the observed statistic
"PValueRight"               one-tailed p-value for permuted statistics greater than the observed statistic
"PValueLeft"                one-tailed p-value for permuted statistics smaller than the observed statistic
"OriginalTestStatistic"     test statistic calculated from the original samples
"TestStatisticValues"       test statistic values calculated for each permutation
"TestStaticticHistogram"    Histogram of the test statistic values, with the original test statistic marked by a vertical InfiniteLine
"PermutationCount"          number of permutations used in the test
All                         Association containing all permutation test data
ResourceFunction["PermutationTest"] supports the following options:
option                     default                                  description
"TestStatisticFunction"    Function[{d1,d2},Mean[d1]-Mean[d2]]      statistical function used to compare the data
"PermutationCount"         10000                                    number of permutations used in the test
RandomSeeding              Automatic                                seeding value for the random generation of permutations
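
The procedure described above can be sketched from scratch in a few lines. This is a minimal illustration under stated assumptions, not the resource function's source. The helper name permTestSketch is hypothetical, and permutations are assumed to be drawn with RandomSample. The two-tailed p-value is assumed to be the fraction of permuted statistics whose absolute value is at least that of the observed statistic, which suits statistics centered at 0 under the null hypothesis, such as the default mean difference.

```wolfram
(* permTestSketch is a hypothetical helper, written only to illustrate the idea *)
permTestSketch[{d1_List, d2_List}, stat_, n_Integer?Positive] :=
 Module[{pooled = Join[d1, d2], n1 = Length[d1], observed, permuted},
  (* statistic of the original, unshuffled samples *)
  observed = stat[d1, d2];
  (* shuffle the pooled values n times; the first n1 values form the new first sample *)
  permuted = Table[
    With[{p = RandomSample[pooled]}, stat[Take[p, n1], Drop[p, n1]]],
    {n}];
  (* assumed two-tailed p-value: share of permuted statistics at least as extreme as the observed one *)
  <|"OriginalTestStatistic" -> observed,
   "PValue" -> N[Count[permuted, s_ /; Abs[s] >= Abs[observed]]/n],
   "PermutationCount" -> n|>]

SeedRandom[1];
res = permTestSketch[{{320, 480, 290, 350}, {180, 260, 220, 275}},
   Mean[#1] - Mean[#2] &, 10000];
res["OriginalTestStatistic"]  (* 505/4, i.e. 126.25 *)
```

Some references add 1 to both the count and the number of permutations so that the estimated p-value is never exactly 0. Whether the resource function does this is not stated here.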

Examples

Basic Examples (2) 

Generate some data:

In[1]:=
data1 = {320, 480, 290, 350};
data2 = {180, 260, 220, 275};

Test whether their means differ significantly by calculating a p-value:

In[2]:=
ResourceFunction["PermutationTest"][{data1, data2}]
Out[2]=
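
These samples are small. The 8 pooled values can be split into two groups of 4 in only Binomial[8, 4] = 70 ways, so every split can be enumerated instead of sampled. The sketch below computes this exact permutation p-value for the data above. It assumes the same two-tailed convention as a Monte Carlo test (absolute value at least as large as the observed one). The resource function itself samples random permutations, with the number set by "PermutationCount".

```wolfram
data1 = {320, 480, 290, 350};
data2 = {180, 260, 220, 275};
pooled = Join[data1, data2];
observed = Mean[data1] - Mean[data2];

(* all ways to choose which 4 of the 8 positions form the first sample *)
splits = Subsets[Range[Length[pooled]], {Length[data1]}];
exactStats = Table[
   Mean[pooled[[s]]] - Mean[pooled[[Complement[Range[Length[pooled]], s]]]],
   {s, splits}];

(* exact two-tailed p-value over all 70 splits *)
exactP = Count[exactStats, t_ /; Abs[t] >= Abs[observed]]/Length[exactStats]
(* 1/35 *)
```

Only the original split and its mirror image reach a mean difference of magnitude 126.25, which gives 2/70 = 1/35 ≈ 0.029. A random-permutation estimate for the same data should fall close to this value.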

Generate many properties for the permutation test:

In[3]:=
ResourceFunction[
 "PermutationTest"][{{320, 480, 210, 310}, {180, 260, 220, 291}}, All]
Out[3]=

Scope (5) 

Retrieve the two-tailed and both one-tailed p-values for a given test:

In[4]:=
ResourceFunction[
 "PermutationTest"][{{320, 480, 210, 310}, {180, 260, 220}}, {"PValue",
   "PValueRight", "PValueLeft"}]
Out[4]=
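
The three p-values can be computed from the permuted statistics and the observed statistic. The sketch below shows one common way to do this. The helper pValuesSketch is hypothetical. It assumes the two-tailed value compares absolute values and that the one-tailed values count ties as extreme, which may differ in detail from the resource function's conventions. The input is a toy list of five permuted statistics, used only for illustration.

```wolfram
(* hypothetical helper: p-values from permuted statistics and the observed statistic *)
pValuesSketch[permuted_List, observed_] := With[{n = Length[permuted]},
  <|"PValue" -> Count[permuted, s_ /; Abs[s] >= Abs[observed]]/n,
   "PValueRight" -> Count[permuted, s_ /; s >= observed]/n,
   "PValueLeft" -> Count[permuted, s_ /; s <= observed]/n|>]

(* toy input: five permuted statistics and an observed value of 2 *)
pv = pValuesSketch[{-3, -1, 0, 1, 2}, 2]
(* <|"PValue" -> 2/5, "PValueRight" -> 1/5, "PValueLeft" -> 1|> *)

(* reject the null hypothesis at alpha = 0.05 only if the two-tailed p-value is smaller *)
rejectQ = pv["PValue"] < 0.05
```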

The "TestStatisticValues" property returns the test statistic calculated for each permutation:

In[5]:=
Short[ResourceFunction[
  "PermutationTest"][{{320, 480, 210, 310}, {180, 260, 220}}, "TestStatisticValues"]]
Out[5]=

Retrieve a Histogram of the test statistic values over all permutations. The red line marks the original test statistic:

In[6]:=
ResourceFunction[
 "PermutationTest"][{{320, 480, 210, 310}, {180, 260, 220}}, "TestStaticticHistogram"]
Out[6]=
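
A similar plot can be built by hand by overlaying a red vertical InfiniteLine on a Histogram. This is a minimal sketch, not the resource function's plotting code. It assumes 1000 random permutations of the samples used above, and the variable names are illustrative.

```wolfram
SeedRandom[7];
pooled = {320, 480, 210, 310, 180, 260, 220};
observed = Mean[{320, 480, 210, 310}] - Mean[{180, 260, 220}];  (* 110 *)

(* mean differences for 1000 random splits into groups of 4 and 3 *)
permuted = Table[
   With[{p = RandomSample[pooled]}, Mean[Take[p, 4]] - Mean[Drop[p, 4]]],
   {1000}];

(* vertical red line through the observed statistic *)
plot = Show[
  Histogram[permuted],
  Graphics[{Red, InfiniteLine[{observed, 0}, {0, 1}]}]]
```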

Use All to retrieve a summary Association containing all test data:

In[7]:=
testAssoc = ResourceFunction[
  "PermutationTest"][{{320, 480, 210, 310}, {180, 260, 220, 291}}, All]
Out[7]=

Use the "OriginalTestStatistic" property to retrieve the test statistic computed from the original data samples:

In[8]:=
data1 = {320, 480, 290, 350};
data2 = {180, 260, 220, 275};
ResourceFunction[
 "PermutationTest"][{data1, data2}, "OriginalTestStatistic"]
Out[10]=

It matches the difference of the sample means:

In[11]:=
N[Mean[data1] - Mean[data2]]
Out[11]=
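
As a hand check, the two sample means and their difference are:

```latex
\bar{x}_1 = \frac{320 + 480 + 290 + 350}{4} = \frac{1440}{4} = 360, \qquad
\bar{x}_2 = \frac{180 + 260 + 220 + 275}{4} = \frac{935}{4} = 233.75, \qquad
\bar{x}_1 - \bar{x}_2 = 126.25
```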

Options (3) 

Specify the test statistic of interest:

In[12]:=
ResourceFunction[
 "PermutationTest"][{{320, 480, 290, 250}, {180, 260, 220}}, "TestStatisticFunction" -> (Median[#1] - Median[#2] &)]
Out[12]=
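
Any function of the two samples can serve as the statistic. The sketch below evaluates the median-based statistic from this example and the default mean difference on the same data, to show the observed values each test starts from. The names medianDiff and meanDiff are illustrative only.

```wolfram
(* median-based statistic, as passed to "TestStatisticFunction" above *)
medianDiff = Median[#1] - Median[#2] &;
medianDiff[{320, 480, 290, 250}, {180, 260, 220}]  (* 305 - 220 = 85 *)

(* the documented default statistic, for comparison *)
meanDiff = Function[{d1, d2}, Mean[d1] - Mean[d2]];
meanDiff[{320, 480, 290, 250}, {180, 260, 220}]    (* 335 - 220 = 115 *)
```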

Change the number of permutations performed:

In[13]:=
ResourceFunction[
 "PermutationTest"][{{320, 480, 290, 250}, {180, 260, 220}}, "PermutationCount" -> 100]
Out[13]=
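
Fewer permutations give a noisier p-value. Each random permutation is an independent trial that is either at least as extreme as the observed statistic or not. A common approximation for the standard error of an estimated p-value p from n permutations is therefore Sqrt[p (1 - p)/n]. The sketch below applies it, with the helper name pValueSE chosen for illustration.

```wolfram
(* approximate Monte Carlo standard error of an estimated p-value *)
pValueSE[p_, n_] := Sqrt[p (1 - p)/n]

pValueSE[0.05, 100]    (* about 0.022 *)
pValueSE[0.05, 10000]  (* about 0.0022 *)
```

Near the usual 0.05 threshold, 100 permutations leave an uncertainty of roughly ±0.02. The default of 10000 reduces this tenfold.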

Specify a fixed RandomSeeding value to use the same set of permutations, and thus obtain the same result, in every test:

In[14]:=
Table[ResourceFunction[
  "PermutationTest"][{{320, 480, 290, 250}, {180, 260, 220}}, RandomSeeding -> 1], 3]
Out[14]=

Setting RandomSeeding to Automatic generates different permutations for each test, so the results vary from run to run:

In[15]:=
Table[ResourceFunction[
  "PermutationTest"][{{320, 480, 290, 250}, {180, 260, 220}}, RandomSeeding -> Automatic], 3]
Out[15]=
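
The effect of a fixed seed can be reproduced with BlockRandom and its RandomSeeding option. This is a minimal sketch that assumes permutations are drawn with RandomSample. How the resource function uses its RandomSeeding option internally is not documented here, and the helper name shuffles is hypothetical.

```wolfram
pooled = {320, 480, 290, 250, 180, 260, 220};

(* five random permutations of the pooled data, generated under a given seed *)
shuffles[seed_] :=
 BlockRandom[Table[RandomSample[pooled], {5}], RandomSeeding -> seed]

shuffles[1] === shuffles[1]  (* True: same seed, same permutations *)
```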

Properties and Relations (2) 

TTest assumes that the data are normally distributed, which is not the case for these heavy-tailed Pareto samples:

In[16]:=
data1 = RandomVariate[ParetoDistribution[1, 2], 50];
data2 = RandomVariate[ParetoDistribution[3, 2], 50];
TTest[{data1, data2}]
Out[18]=

PermutationTest, by contrast, makes no assumptions about the data distribution:

In[19]:=
ResourceFunction["PermutationTest"][{data1, data2}]
Out[19]=
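
The comparison can also be run without the resource function by pairing TTest with a hand-rolled Monte Carlo permutation test on the mean difference. This is a sketch with an arbitrary seed and 5000 permutations. The permutation part is an illustration of the technique, not the resource function's code, and the two p-values will generally differ.

```wolfram
SeedRandom[2024];
d1 = RandomVariate[ParetoDistribution[1, 2], 50];
d2 = RandomVariate[ParetoDistribution[3, 2], 50];

(* two-sample t-test p-value, which relies on approximate normality *)
pT = TTest[{d1, d2}];

(* Monte Carlo permutation p-value for the mean difference *)
pooled = Join[d1, d2];
observed = Mean[d1] - Mean[d2];
permuted = Table[
   With[{p = RandomSample[pooled]}, Mean[Take[p, 50]] - Mean[Drop[p, 50]]],
   {5000}];
pPerm = N[Count[permuted, s_ /; Abs[s] >= Abs[observed]]/5000];

{pT, pPerm}
```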

Possible Issues (1) 

The value of the "TestStatisticFunction" option must be a Function expression:

In[20]:=
ResourceFunction[
 "PermutationTest"][{{320, 480, 290, 250}, {180, 260, 220}}, "TestStatisticFunction" -> 1]
Out[20]=
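
A check of this kind can be written as a simple pattern test. The sketch below is hypothetical: the name validStatisticQ is illustrative, and the resource function's actual validation logic is not shown on this page.

```wolfram
(* hypothetical check: accept only Function expressions as the test statistic *)
validStatisticQ[f_] := MatchQ[f, _Function]

validStatisticQ[Median[#1] - Median[#2] &]  (* True *)
validStatisticQ[1]                          (* False *)
```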

Neat Examples (4) 

Retrieve data on tennis match statistics:

In[21]:=
data = ResourceData["ATP Tennis Singles Matches"]
Out[21]=

Select valid entries:

In[22]:=
validTab = Select[data, And @@ NumericQ /@ {#W1stServeIn, #WServePoints, #L1stServeIn, #LServePoints} && (#WServePoints > 0 && #LServePoints > 0) &]
Out[22]=

Calculate the first-serve percentage (first serves in divided by service points) of the winner and the loser in each match:

In[23]:=
{statWinners, statLosers} = {validTab[All, "W1stServeIn"]/
   validTab[All, "WServePoints"], validTab[All, "L1stServeIn"]/validTab[All, "LServePoints"]}
Out[23]=

Check whether the winners' first-serve percentage is significantly higher than the losers':

In[24]:=
AbsoluteTiming[
 ResourceFunction[
  "PermutationTest"][{Normal@statWinners, Normal@statLosers}, "PermutationCount" -> 300]]
Out[24]=

Publisher

Felipe Amorim

Requirements

Wolfram Language 13.2 (December 2022) or above

Version History

  • 1.0.0 – 05 December 2025
