Wolfram Research

Function Repository Resource:

ResetDataset

Force a reanalysis of the types contained in a Dataset, sometimes leading to a different presentation of the data

Contributed by: Seth J. Chandler

ResourceFunction["ResetDataset"][d]

computes the Normal form of d, a Dataset, and then wraps Dataset around the Normal form, causing various heuristics to attempt anew to deduce the types of the contained data.
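
The description above suggests a one-line equivalent. The following is a minimal sketch based only on that description; resetDatasetSketch is a hypothetical local name, and the published implementation may differ in details such as argument checking:

```wolfram
(* Hypothetical stand-in for the resource function, assumed from the
   description: take the Normal form, then wrap Dataset around it again *)
resetDatasetSketch[d_Dataset] := Dataset[Normal[d]]

(* Apply it to a part extracted from a nested Dataset *)
ds = Dataset[<|
    "x" -> <|"a" -> <|"c" -> 9, "d" -> 4|>, "b" -> <|"c" -> 3, "d" -> 4|>|>,
    "y" -> <|"a" -> <|"c" -> 8, "d" -> 5|>, "b" -> <|"c" -> 8, "d" -> 5|>|>|>];
resetDatasetSketch[ds["y"]]
```

Because the sketch only round-trips through Normal, the underlying data is left untouched; only the inferred types, and hence possibly the display, are recomputed.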

Details and Options

Improved deduction of types in the data will often improve the visual presentation of the information in the Dataset. It may also permit better compression of the data stored therein, as well as optimization of a Query executed on the data.

Examples

Basic Examples

When the type inference performed while extracting the "y" part of the original Dataset fails to capture all the regularities in the data, ResetDataset deduces the types anew and provides attractive column headers:

In[1]:=
With[{ds = Dataset[<|"x" -> <|"a" -> <|"c" -> 9, "d" -> 4|>, "b" -> <|"c" -> 3, "d" -> 4|>|>, "y" -> <|"a" -> <|"c" -> 8, "d" -> 5|>, "b" -> <|"c" -> 8, "d" -> 5|>|>|>]},
 {ds["y"], ResourceFunction["ResetDataset"][ds["y"]]}
 ]
Out[1]=

Scope

Here the passengers on the Titanic are grouped by sex and cabin class and then counted. The result of the original query is not recognized as data that is better presented in a “spread” format with row headers and column headers, so it is displayed in a narrow way until it is reset:

In[2]:=
titanic = ExampleData[{"Dataset", "Titanic"}];
In[3]:=
Module[{beforeReset, afterReset},
 beforeReset = titanic[GroupBy[#sex &], GroupBy[#class &], Length, #survived &];
 afterReset = ResourceFunction["ResetDataset"][beforeReset];
 {Labeled[beforeReset, "before reset"], Labeled[afterReset, "after reset"]}
 ]
Out[3]=

Properties and Relations

The internal type representation of the data can change after ResetDataset. Here it is transformed from a struct whose values are associations into an association whose values are structs:

In[4]:=
With[{ds = Dataset[<|"x" -> <|"a" -> <|"c" -> 9, "d" -> 4|>, "b" -> <|"c" -> 3, "d" -> 4|>|>, "y" -> <|"a" -> <|"c" -> 8, "d" -> 5|>, "b" -> <|"c" -> 8, "d" -> 5|>|>|>]},
 Column[{Framed@TreeForm[Dataset`GetType[ds["y"]], ImageSize -> 500], Framed@TreeForm[
     Dataset`GetType[ResourceFunction["ResetDataset"][ds["y"]]], ImageSize -> 500]}]
 ]
Out[4]=

Possible Issues

Often ResetDataset changes nothing, though applying it does not appear to do any harm:

In[5]:=
With[{d = Dataset[{3, 4}]}, {d, ResourceFunction["ResetDataset"][d]}]
Out[5]=
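
One way to check whether a reset changed anything is to compare the data and the inferred types before and after. This is a sketch under stated assumptions: the reset is written inline as Dataset[Normal[d]], following the description of ResetDataset rather than calling the resource function, and Dataset`GetType is the same internal function used under Properties and Relations:

```wolfram
d = Dataset[{3, 4}];

(* Assumed equivalent of the reset, per the function description *)
r = Dataset[Normal[d]];

(* First element: is the underlying data unchanged?
   Second element: is the inferred type unchanged? *)
{Normal[d] === Normal[r], Dataset`GetType[d] === Dataset`GetType[r]}
```

If both comparisons hold, the reset was a no-op for that Dataset.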

There are limits to what ResetDataset can do; it does not, for example, sort the keys in a way that would produce a more attractive presentation:

In[6]:=
With[{ds = Dataset[{Association["a" -> 4, "b" -> 5], Association["b" -> 6, "a" -> 2]}]},
 {ds, ResourceFunction["ResetDataset"][ds], ds[Map[KeySort]]}]
Out[6]=

There are methods other than ResetDataset of transforming the presentation of a Dataset, such as sorting the keys:

In[7]:=
Module[{f1 = GroupBy[#class &], f2 = GroupBy[#sex &], f3 = Mean /* Round, f4 = Quantity[#age, "Years"] &, queryResult1, queryResult2},
 queryResult1 = titanic[f1, f2, f3, f4];
 queryResult2 = titanic[f1, f2 /* KeySort, f3, f4];
 {queryResult1, queryResult2}
 ]
Out[7]=
