Function Repository Resource:

ExampleDataset

Retrieve example data as a dataset

Contributed by: Anton Antonov

ResourceFunction["ExampleDataset"][arg]

returns the ExampleData collection specified by arg as a Dataset.

Details and Options

ResourceFunction["ExampleDataset"] takes one argument, which should be a valid first argument for ExampleData. Only data collections in the "MachineLearning" and "Statistics" domains are supported.
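
The domain restriction can be checked before calling the function. The following is a minimal sketch, not part of the resource function: supportedDomainQ is a hypothetical helper name, and it assumes the argument has the {domain, name} form used in the examples on this page.

```wolfram
(* supportedDomainQ is a hypothetical helper, not part of ExampleDataset. *)
(* It tests only the shape of the spec and its domain; it does not check *)
(* that the named collection actually exists in ExampleData. *)
supportedDomainQ[spec_] := 
 MatchQ[spec, {"MachineLearning" | "Statistics", _String}]

supportedDomainQ[{"Statistics", "AnimalWeights"}]
supportedDomainQ[{"Text", "AliceInWonderland"}]
```

A spec that passes this check can still produce a Failure, for example when the name is unknown or the data type is not supported.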

Examples

Basic Examples (2) 

Get an example dataset:

In[1]:=
ResourceFunction["ExampleDataset"][{"Statistics", "AnimalWeights"}]
Out[1]=

Get another dataset:

In[2]:=
ResourceFunction["ExampleDataset"][{"MachineLearning", "WineQuality"}]
Out[2]=

Applications (2) 

Find homes in Boston with an age greater than 98 years:

In[3]:=
ResourceFunction["ExampleDataset"][{"Statistics", "BostonHomes"}][
 Select[#AGE > 98 &]]
Out[3]=

Cross tabulate odor and edibility for mushrooms (we can see that odor is a good indicator of edibility):

In[4]:=
ResourceFunction["CrossTabulate"][
 ResourceFunction["ExampleDataset"][{"MachineLearning", "Mushroom"}][
  All, {#odor, Last[#]} &]]
Out[4]=

Possible Issues (3) 

If an unknown dataset name is specified, then the result is a Failure:

In[5]:=
ResourceFunction["ExampleDataset"][{"Statistics", "BlahBlah"}]
Out[5]=
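
In a program, this case can be detected with the built-in FailureQ instead of by inspecting the output. The following is a minimal sketch: orMissing and datasetOrMissing are hypothetical helper names, and replacing the Failure with a Missing marker is just one possible convention.

```wolfram
(* orMissing and datasetOrMissing are hypothetical helpers, *)
(* not part of ExampleDataset. *)

(* Replace a failure with a Missing marker that records the spec. *)
orMissing[res_, spec_] := 
 If[FailureQ[res], Missing["NotAvailable", spec], res]

(* Call the resource function quietly and normalize the result. *)
datasetOrMissing[spec_] := 
 orMissing[Quiet[ResourceFunction["ExampleDataset"][spec]], spec]

datasetOrMissing[{"Statistics", "BlahBlah"}]
```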

ExampleDataset expects "Statistics" datasets to have one of the data types "MultivariateSample", "TimeSeries" or "EventSeries". The "Statistics" collection in ExampleData contains the following data types:

In[6]:=
Union[ExampleData[#, "DataType"] & /@ ExampleData["Statistics"]]
Out[6]=

A Failure is returned for other data types:

In[7]:=
ResourceFunction["ExampleDataset"][{"Statistics", "ScientificDiscoveries"}]
Out[7]=

Here is a summary of the successes and failures for the different data types in the "Statistics" example data collection:

In[8]:=
Quiet[ResourceFunction["RecordsSummary"]@
  Map[ExampleData[#, "DataType"] -> Head[ResourceFunction["ExampleDataset"][#]] &, ExampleData["Statistics"]]]
Out[8]=

Some "MachineLearning" example data have data shapes and variable names that do not match. In those cases, ExampleDataset returns a Failure:

In[9]:=
ResourceFunction["ExampleDataset"][{"MachineLearning", "BostonHomes"}]
Out[9]=

Compare the number of variable names:

In[10]:=
Length[Flatten@
  Apply[List, ExampleData[{"MachineLearning", "BostonHomes"}, "VariableDescriptions"]]]
Out[10]=

With the dimensions of the data:

In[11]:=
Dimensions[
 Map[Flatten, List @@@ ExampleData[{"MachineLearning", "BostonHomes"}, "Data"]]]
Out[11]=

Here is an association that shows the successes and failures over the "MachineLearning" datasets:

In[12]:=
Association[# -> Head[ResourceFunction["ExampleDataset"][#]] & /@ ExampleData["MachineLearning"]]
Out[12]=
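
An association of this kind can be split into the specifications that convert and those that do not. The following is a minimal sketch: successfulSpecs and failedSpecs are hypothetical helper names, and the association is assumed to map each specification to the head of the result, as in the input above.

```wolfram
(* successfulSpecs and failedSpecs are hypothetical helpers, *)
(* not part of ExampleDataset. *)
(* Select on an association filters by value, so these keep the keys *)
(* whose result head is (or is not) Dataset. *)
successfulSpecs[heads_Association] := Keys[Select[heads, # === Dataset &]]
failedSpecs[heads_Association] := Keys[Select[heads, # =!= Dataset &]]

(* Same computation as above, with failure messages suppressed. *)
heads = Quiet[Association[# -> Head[ResourceFunction["ExampleDataset"][#]] & /@ 
     ExampleData["MachineLearning"]]];

successfulSpecs[heads]
failedSpecs[heads]
```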

Neat Examples (1) 

Summaries for all "Statistics" datasets in ExampleData that have six columns:

In[13]:=
Block[{resAll},
 resAll = Quiet[Association@
    Map[# -> ResourceFunction["ExampleDataset"][#] &, ExampleData["Statistics"]]];
 ResourceFunction["RecordsSummary"] /@ Select[resAll, Head[#] === Dataset && Dimensions[#][[2]] == 6 &]
 ]
Out[13]=

Publisher

Anton Antonov

Version History

  • 1.0.0 – 24 November 2020

Related Resources

Author Notes

One way to implement the functionality of ExampleDataset is to overload ExampleData with an additional signature, for example ExampleData[spec,"Dataset"]. Alternatively, instead of adding definitions to the built-in symbol ExampleData, the functionality can be provided as a resource function.

License Information