Function Repository Resource:

ResetDataset

Force a reanalysis of the types contained in a Dataset, sometimes leading to a different presentation of the data

Contributed by: Seth J. Chandler

ResourceFunction["ResetDataset"][d]

computes the Normal form of the Dataset d and then wraps Dataset around the result, causing the type-deduction heuristics to run anew on the contained data.
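
The behavior described above can be approximated in a single definition. This is only a sketch based on that description: the name resetDatasetSketch is hypothetical, and the published function may differ in details such as argument checking.

```wolfram
(* Hypothetical stand-in for ResetDataset, based only on the description above. *)
(* Normal strips the Dataset wrapper and its stored type information; *)
(* wrapping Dataset around the raw data makes type deduction run again. *)
resetDatasetSketch[d_Dataset] := Dataset[Normal[d]]

(* Usage: apply it to any Dataset, for example a part extracted from a larger one. *)
resetDatasetSketch[Dataset[<|"a" -> <|"c" -> 8, "d" -> 5|>, "b" -> <|"c" -> 8, "d" -> 5|>|>]]
```

Because the sketch goes through Normal, the data itself is unchanged; only the inferred type, and therefore possibly the presentation, can differ.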

Details and Options

Improved deduction of types in the data will often improve the visual presentation of the information in the Dataset. It may also permit better compression of the data stored therein, as well as optimization of a Query executed on the data.

Examples

Basic Examples (1) 

When the type inferred on extracting the "y" part of the original Dataset fails to capture all the regularities in the data, ResetDataset deduces the types anew and produces a presentation with column headers:

In[1]:=
With[{ds = Dataset[<|"x" -> <|"a" -> <|"c" -> 9, "d" -> 4|>, "b" -> <|"c" -> 3, "d" -> 4|>|>, "y" -> <|"a" -> <|"c" -> 8, "d" -> 5|>, "b" -> <|"c" -> 8, "d" -> 5|>|>|>]},
 {ds["y"], ResourceFunction["ResetDataset"][ds["y"]]}
 ]
Out[1]=

Scope (1) 

Here the passengers on the Titanic are grouped by sex and cabin class and then counted. The type inferred for the grouped result does not recognize that the data is better presented in a “spread” format with row headers and column headers rather than in a narrow format:

In[2]:=
titanic = ExampleData[{"Dataset", "Titanic"}];
In[3]:=
Module[{beforeReset, afterReset},
 beforeReset = titanic[GroupBy[#sex &], GroupBy[#class &], Length, #survived &];
 afterReset = ResourceFunction["ResetDataset"][beforeReset];
 {Labeled[beforeReset, "before reset"], Labeled[afterReset, "after reset"]}
 ]
Out[3]=

Properties and Relations (1) 

The internal type representation of the data can change after ResetDataset. Here it changes from a struct whose values are associations to an association whose values are structs:

In[4]:=
With[{ds = Dataset[<|"x" -> <|"a" -> <|"c" -> 9, "d" -> 4|>, "b" -> <|"c" -> 3, "d" -> 4|>|>, "y" -> <|"a" -> <|"c" -> 8, "d" -> 5|>, "b" -> <|"c" -> 8, "d" -> 5|>|>|>]},
 Column[{Framed@TreeForm[Dataset`GetType[ds["y"]], ImageSize -> 500], Framed@TreeForm[
     Dataset`GetType[ResourceFunction["ResetDataset"][ds["y"]]], ImageSize -> 500]}]
 ]
Out[4]=

Possible Issues (3) 

Often ResetDataset changes nothing, although applying it appears to do no harm:

In[5]:=
With[{d = Dataset[{3, 4}]}, {d, ResourceFunction["ResetDataset"][d]}]
Out[5]=
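
One way to check whether a reset had any effect is to compare the inferred types before and after. The following is a sketch rather than a supported technique: typeChangedQ is a hypothetical helper name, and Dataset`GetType is the same internal function used in the Properties and Relations example, so its behavior may vary between versions.

```wolfram
(* Hypothetical helper: gives True when resetting changes the inferred type. *)
(* Dataset`GetType is internal and undocumented; treat the result as a hint only. *)
typeChangedQ[d_Dataset] :=
 Dataset`GetType[d] =!= Dataset`GetType[ResourceFunction["ResetDataset"][d]]

(* Usage: test the simple Dataset from the example above. *)
typeChangedQ[Dataset[{3, 4}]]
```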

There are limits to what ResetDataset can do. It does not, for example, sort the keys in a way that would produce a more attractive presentation:

In[6]:=
With[{ds = Dataset[{Association["a" -> 4, "b" -> 5], Association["b" -> 6, "a" -> 2]}]},
 {ds, ResourceFunction["ResetDataset"][ds], ds[Map[KeySort]]}]
Out[6]=

There are methods other than ResetDataset for transforming the presentation of a Dataset, such as sorting the keys:

In[7]:=
Module[{f1 = GroupBy[#class &], f2 = GroupBy[#sex &], f3 = Mean/*Round, f4 = Quantity[#age, "Years"] &, queryResult1, queryResult2},
 queryResult1 = titanic[f1, f2, f3, f4];
 queryResult2 = titanic[f1, f2/*KeySort, f3, f4];
 {queryResult1, queryResult2}
 ]
Out[7]=

Publisher

Seth J. Chandler

Version History

  • 1.0.0 – 19 June 2019

Author Notes

Future versions of the Wolfram Language may feature improved heuristics for inferring types following various operations and make this function less useful, but for now it can greatly help in data analysis and presentation.

License Information