Function Repository Resource:

TrainTestSplit

Split data into training and testing sets

Contributed by: Michael Sollami

ResourceFunction["TrainTestSplit"][data]

splits data into a pair of shuffled training and testing sets.

Details and Options

ResourceFunction["TrainTestSplit"] accepts the following options:
"TrainingSetSize"    Scaled[0.8]    size of the training set
"TestSetSize"        Scaled[0.2]    size of the testing set
"Shuffle"            True           whether to shuffle the sets
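
For orientation, the behavior described above can be approximated in a few lines. This is a minimal sketch and not the repository source: the name sketchSplit, its positional arguments, and the rounding rule are assumptions made for illustration.

```wolfram
(* Hypothetical sketch, not the actual TrainTestSplit implementation. *)
(* frac is the fraction of elements assigned to the training set. *)
sketchSplit[data_List, frac_ : 0.8, shuffle_ : True] :=
 Module[{items, n},
  (* Optionally permute the data before splitting. *)
  items = If[TrueQ[shuffle], RandomSample[data], data];
  (* Assumed rule: round the scaled size to the nearest integer. *)
  n = Round[frac*Length[data]];
  (* TakeDrop returns {first n elements, remaining elements}. *)
  TakeDrop[items, n]
 ]

sketchSplit[Range[10], 0.8, False]
(* {{1, 2, 3, 4, 5, 6, 7, 8}, {9, 10}} *)
```

With shuffling left on, the two parts have the same lengths but their contents vary from call to call.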

Examples

Basic Examples (1) 

The default test size is 20%:

In[1]:=
ResourceFunction["TrainTestSplit"][# & /@ Range[10]]
Out[1]=

Scope (3) 

Specify a non-default test set size as a scaled value:

In[2]:=
ResourceFunction["TrainTestSplit"][# -> EvenQ[#] & /@ Range[10], "TestSetSize" -> Scaled[0.5]]
Out[2]=

Specify a non-default test set size as an explicit value:

In[3]:=
ResourceFunction["TrainTestSplit"][# -> EvenQ[#] & /@ Range[10], "TestSetSize" -> 3]
Out[3]=

Specify a non-default training set size (a real value is interpreted as a Scaled fraction):

In[4]:=
ResourceFunction["TrainTestSplit"][# -> EvenQ[#] & /@ Range[10], "TrainingSetSize" -> 0.9]
Out[4]=
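
The three kinds of size specification used in this section (Scaled, an explicit integer, and a bare real) can each be reduced to an element count. The helper below is a hypothetical sketch of one way to do that; the name resolveSize and the use of Round are assumptions, and the actual function may round differently.

```wolfram
(* Hypothetical helper: convert a size specification into an element count. *)
(* n is the total number of elements in the data. *)
resolveSize[Scaled[s_], n_Integer] := Round[s*n]  (* fraction of the data *)
resolveSize[s_Real, n_Integer] := Round[s*n]      (* bare real treated like Scaled *)
resolveSize[k_Integer, n_Integer] := k            (* explicit count used as given *)

resolveSize[Scaled[0.5], 10]  (* 5 *)
resolveSize[3, 10]            (* 3 *)
resolveSize[0.9, 10]          (* 9 *)
```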

Options (2) 

By default, samples are shuffled:

In[5]:=
ResourceFunction["TrainTestSplit"][Range[10]]
Out[5]=

Use "Shuffle" -> False to ensure that a sample keeps its original ordering:

In[6]:=
ResourceFunction["TrainTestSplit"][Range[10], "Shuffle" -> False]
Out[6]=
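
When shuffling is wanted but results need to be repeatable, a common Wolfram Language pattern is to seed the random generator inside BlockRandom. The sketch below demonstrates the pattern on RandomSample directly. Whether it also fixes the output of ResourceFunction["TrainTestSplit"] depends on the function drawing from the standard generator, which is an assumption here.

```wolfram
(* Seeding inside BlockRandom makes the shuffle repeatable *)
(* without disturbing the global random state. *)
shuffleOnce := BlockRandom[SeedRandom[1234]; RandomSample[Range[10]]]

(* Two evaluations with the same seed give the same permutation. *)
shuffleOnce === shuffleOnce
(* True *)
```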

Possible Issues (2) 

Sizes must be sensible; for example, an infinite training set size is not valid:

In[7]:=
ResourceFunction["TrainTestSplit"][# -> EvenQ[#] & /@ Range[10], "TrainingSetSize" -> \[Infinity]]
Out[7]=

The option "TrainingSetSize" takes precedence over "TestSetSize":

In[8]:=
ResourceFunction["TrainTestSplit"][Range[10], "TrainingSetSize" -> 10,
  "TestSetSize" -> 10, "Shuffle" -> False]
Out[8]=
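
One possible reading of this precedence rule is that the training size is resolved first and the test set is taken from whatever remains. The sketch below illustrates that reading only; splitWithPrecedence is a hypothetical name, and the clamping with Min is an assumption rather than documented behavior.

```wolfram
(* Hypothetical illustration of "training size wins" when both sizes are given. *)
splitWithPrecedence[data_List, train_Integer, test_Integer] :=
 Module[{nTrain, nTest},
  (* The training set is filled first, up to the available data. *)
  nTrain = Min[train, Length[data]];
  (* The test set can only use what is left over. *)
  nTest = Min[test, Length[data] - nTrain];
  {Take[data, nTrain], Take[Drop[data, nTrain], nTest]}
 ]

splitWithPrecedence[Range[10], 10, 10]
(* {{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {}} *)
```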

Publisher

Michael Sollami

Version History

  • 1.0.0 – 13 February 2020
