Function Repository Resource:

PermutationTest

Compare two data samples using a specified test statistic

Contributed by: Felipe Amorim

ResourceFunction["PermutationTest"][{data₁,data₂}]

tests whether data₁ and data₂ differ according to a specified test statistic.

ResourceFunction["PermutationTest"][{data₁,data₂},"property"]

returns the value of "property" for the test.

ResourceFunction["PermutationTest"][{data₁,data₂},{"prop₁","prop₂",…}]

returns an association of values for the "propᵢ".

Details and Options

A permutation test is a non-parametric hypothesis test that compares two data samples. It works by repeatedly shuffling the pooled data, splitting it back into two groups of the original sizes, and computing the test statistic of interest for each such permutation. The observed test statistic is then compared with the distribution of the permuted statistics to assess how unusual the observed value would be under the null hypothesis (that the two samples come from the same distribution).
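
The procedure described above can be sketched in a few lines of plain Wolfram Language. This is only an illustration of the general technique, not the implementation used by the resource function; the helper name permutedStatistics and the sample values are assumptions introduced here.

```wolfram
(* Hypothetical helper: NOT the resource function's actual implementation *)
permutedStatistics[{d1_List, d2_List}, stat_, n_Integer] :=
 Module[{pooled = Join[d1, d2], n1 = Length[d1]},
  Table[
   (* shuffle the pooled data, then split it into groups of the original sizes *)
   stat @@ TakeDrop[RandomSample[pooled], n1],
   n]]

meanDiff = Function[{a, b}, Mean[a] - Mean[b]];
sample1 = {320, 480, 290, 350};
sample2 = {180, 260, 220, 275};

(* statistic on the original split, then on 1000 random re-splits *)
observed = meanDiff[sample1, sample2];
SeedRandom[1];
permuted = permutedStatistics[{sample1, sample2}, meanDiff, 1000];
```

Comparing observed with the values in permuted (for example with a Histogram) shows how unusual the original split is among random re-splits of the same pooled data.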

By default, ResourceFunction["PermutationTest"] uses μ₁-μ₂ as the test statistic, where μᵢ is the Mean of dataᵢ.

ResourceFunction["PermutationTest"] does not make assumptions about the data distribution and can use any test statistic on the data.

By default, ResourceFunction["PermutationTest"] returns a two-tailed p-value: the probability, under the null hypothesis, of observing a test statistic more extreme than the one calculated from the original samples. If the p-value is smaller than a chosen significance level α (commonly 0.05), the null hypothesis is rejected.
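
One common way to turn permuted statistics into p-values is to take tail proportions, as in the sketch below. The helper name tailPValues is hypothetical, and two conventions here are assumptions rather than documented behavior of the resource function: ties are counted as extreme (>= and <=), and the two-tailed value compares absolute values, which suits statistics such as a mean difference that are centered at zero under the null hypothesis.

```wolfram
(* Hypothetical helper: tail proportions of permuted statistics
   relative to the observed statistic (ties counted as extreme) *)
tailPValues[permuted_List, observed_] :=
 <|
  "Right" -> N[Count[permuted, s_ /; s >= observed]/Length[permuted]],
  "Left" -> N[Count[permuted, s_ /; s <= observed]/Length[permuted]],
  "TwoTailed" ->
   N[Count[permuted, s_ /; Abs[s] >= Abs[observed]]/Length[permuted]]
  |>

(* toy input: 2 of the 6 values are >= 2, 5 are <= 2, 3 have |s| >= 2 *)
tailPValues[{-3, -1, 0, 1, 2, 4}, 2]
```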

ResourceFunction["PermutationTest"] supports the following properties as the second argument:

"PValue"                    two-tailed p-value relative to the observed statistic
"PValueRight"               one-tailed p-value for measurements greater than the observed statistic
"PValueLeft"                one-tailed p-value for measurements smaller than the observed statistic
"OriginalTestStatistic"     test statistic calculated from the original samples
"TestStatisticValues"       test statistic values calculated for each permutation
"TestStaticticHistogram"    Histogram of the test statistic values with the original test statistic represented by an infinite line
"PermutationCount"          number of permutations used in the test
All                         Association containing all permutation test data
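
As a hedged illustration of the property interface, several of the properties listed above can be requested in one call. The sample values are made up for this sketch, and it assumes that the returned Association is keyed by the property names.

```wolfram
sample1 = {320, 480, 290, 350};
sample2 = {180, 260, 220, 275};

(* request several properties at once; the result is an Association *)
result = ResourceFunction["PermutationTest"][{sample1, sample2},
   {"PValue", "OriginalTestStatistic", "PermutationCount"}];

(* assumption: values are looked up by property name *)
result["OriginalTestStatistic"]
```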

ResourceFunction["PermutationTest"] supports the following options:

"TestStatisticFunction"    Function[{d₁,d₂},Mean[d₁]-Mean[d₂]]    statistical function used to compare the data
"PermutationCount"         10000                                  number of permutations used in the test
RandomSeeding              Automatic                              seeding value for the random generation of permutations
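
The three options can be combined in a single call. The following is a minimal sketch with made-up sample values; the median-difference statistic, the permutation count of 2000 and the seed 1234 are arbitrary choices for illustration.

```wolfram
sample1 = {320, 480, 290, 350};
sample2 = {180, 260, 220, 275};

ResourceFunction["PermutationTest"][{sample1, sample2}, "PValue",
 (* compare medians instead of the default difference of means *)
 "TestStatisticFunction" -> Function[{d1, d2}, Median[d1] - Median[d2]],
 (* use fewer permutations than the default 10000 *)
 "PermutationCount" -> 2000,
 (* fix the seed so that repeated evaluations use the same permutations *)
 RandomSeeding -> 1234]
```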

Examples

Basic Examples (2)

Generate some data:

In[1]:=

Test whether their means differ statistically, by calculating a p-value:

In[2]:=

Out[2]=

Generate many properties for the permutation test:

In[3]:=

Out[3]=

Scope (5)

Retrieve the two-tailed and both one-tailed p-values for a given test:

In[4]:=

Out[4]=

The "TestStatisticValues" property returns the test statistic calculated for each permutation:

In[5]:=

Out[5]=

Retrieve a Histogram representing the distribution of test statistic values for each permutation. The red line represents the original test statistic:

In[6]:=

Out[6]=

Use All to retrieve a summary Association containing all test data:

In[7]:=

Out[7]=

Use the "OriginalTestStatistic" property to retrieve the test statistic computed from the given data samples:

In[8]:=

data1 = {320, 480, 290, 350};
data2 = {180, 260, 220, 275};
ResourceFunction["PermutationTest"][{data1, data2}, "OriginalTestStatistic"]

Out[10]=

Notice that it matches the difference of the sample means:

In[11]:=

Out[11]=
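
A minimal sketch of that check, assuming the same two samples as in the preceding input, computes the default statistic by hand:

```wolfram
data1 = {320, 480, 290, 350};
data2 = {180, 260, 220, 275};

(* default test statistic: difference of the sample means *)
Mean[data1] - Mean[data2]
(* 505/4, i.e. 126.25 *)
```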

Options (3)

Specify the test statistic of interest:

In[12]:=

Out[12]=

Change the number of permutations performed:

In[13]:=

Out[13]=

Specify a RandomSeeding value in order to have the same set of permutations for each test:

In[14]:=

Out[14]=

Setting RandomSeeding to Automatic will create different permutations for each test, leading to different test results:

In[15]:=

Out[15]=

Properties and Relations (2)

TTest assumes that the data is normally distributed:

In[16]:=

data1 = RandomVariate[ParetoDistribution[1, 2], 50];
data2 = RandomVariate[ParetoDistribution[3, 2], 50];
TTest[{data1, data2}]

Out[18]=

On the other hand, PermutationTest does not make assumptions about the data distribution:

In[19]:=

Out[19]=

Possible Issues (1)

The test statistic must be a Function expression:

In[20]:=

Out[20]=
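
A statistic defined through pattern-based definitions can still be used by wrapping it in an explicit Function, as sketched below. The name medianDiff and the sample values are hypothetical and serve only to illustrate the wrapping.

```wolfram
(* hypothetical statistic defined with a pattern-based definition *)
medianDiff[a_List, b_List] := Median[a] - Median[b];

sample1 = {320, 480, 290, 350};
sample2 = {180, 260, 220, 275};

(* wrap the symbol so that the option value is a Function expression *)
ResourceFunction["PermutationTest"][{sample1, sample2}, "PValue",
 "TestStatisticFunction" -> Function[{a, b}, medianDiff[a, b]]]
```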

Neat Examples (4)

Retrieve data on tennis match statistics:

In[21]:=

data = ResourceData["ATP Tennis Singles Matches"]