Function Repository Resource:

DatasetQuery

Source Notebook

A version of Query that maintains the options of a Dataset

Contributed by: Seth J. Chandler

ResourceFunction["DatasetQuery"][operator1,operator2,][ds]

operates just like Query (in which the successor operatori are applied at successively deeper levels of a Dataset ds) except that options from ds are inherited.

Details and Options

The following options can be given:
"Inheritance"Allcontrols inheritance of options from the original Dataset
"Output""Query"determines the form of the output
Possible settings for "Inheritance" include:
Allthe result inherits all options from the original Dataset
Nonethe result inherits no options from the original Dataset (like Query)
<|"Additions"add,"Deletions"del|>explicitly add or remove options
When using an Association for the "Inheritance" option, the two permitted keys are "Additions" and "Deletions". The "Additions" option lets you add or override presentation options that apply to the result. The "Deletions" options specifies that the listed options are not to be inherited from the underlying dataset.
Possible settings for "Output" include:
"Query"the output is the same as it would be had Query been run, except for presentation options
"OptionsOnly"the output is just the inherited presentation options; the Query is not actually run
Allthe output is an Association in which the keys are the "InheritedOptions", "ResultsWithoutInheritance", and "Result"
ResourceFunction["DatasetQuery"] can do almost everything Query can do, but not absolutely everything. The following ResourceFunction["DatasetQuery"] will not behave as expected because it interprets the list of rules in the second argument as options rather than as the application of different operators to specific parts of the result: ResourceFunction["DatasetQuery"][All,{"key1"f,"key2"g}].
Because ResourceFunction["DatasetQuery"] has no options "key1" or "key2", the kernel will issue an OptionValue::nodef error and return the input Dataset without change. If one wishes to apply different operators to specific parts of the result, one can use this code instead: ResourceFunction["DatasetQuery"][All,MapAt[f,"key1"]/*MapAt[g,"key2"]]. One can then use whatever styling options one wants: ResourceFunction["DatasetQuery"][All,MapAt[f,"key1"]/*MapAt[g,"key2"],"Inheritance"Association["Additions"{BackgroundYellow}]].

Examples

Basic Examples (2) 

Take a highly formatted Dataset and use DatasetQuery to select just the females but have the resulting Dataset inherit all options:

In[1]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/4205af68-3672-4c5e-995c-1b5ad39ae3f2"]
Out[1]=

Select just the females but do not inherit options for HeaderAlignment and override the ItemStyle so that items appear Blue:

In[2]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/447c9cc1-1a2e-46b8-854e-5914cde03e80"]
Out[2]=

Scope (2) 

You can apply Normal to a DatasetQuery in which only the query operators and any DatasetQuery options are present and there is no underlying Dataset to which DatasetQuery is applied. In this case, Normal operates the same way it does on Query:

In[3]:=
Normal[ResourceFunction["DatasetQuery"][Select[#sex == "female" &], "Inheritance" -> <|"Additions" -> {ItemStyle -> RGBColor[
       0, 0, 1]}|>, "Mode" -> "Query"]]
Out[3]=

A Query, however, can both increase or decrease the number of levels in the Dataset. Although various formatting overrides and deletions may be appropriate in these circumstances (see "Inheritance" option), Dataset does not generally "break" when option specifications assume levels and descriptions of data that no longer exist:

Format the Titanic data:

In[4]:=
formattedTitanic = ResourceFunction[
ResourceObject[<|"Name" -> "FormatDataset", "ShortName" -> "FormatDataset", "UUID" -> "76670bca-1587-4e7e-9e89-5b698a30759d", "ResourceType" -> "Function", "Version" -> "1.0.0", "Description" -> "Format a dataset using a given set of option values", "RepositoryLocation" -> URL[
       "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"], "SymbolName" -> "FunctionRepository`$66a3086203b4405b88cdb0de8a5c3128`FormatDataset", "FunctionLocation" -> CloudObject[
       "https://www.wolframcloud.com/obj/70389ad6-7dbc-48c8-b898-72c65c00f14e"]|>, ResourceSystemBase -> Automatic]][
   HeaderStyle -> Red, ItemStyle -> {None, {{Blue, Orange}}, {{2, "age"} -> Purple}}, MaxItems -> 10][ExampleData[{"Dataset", "Titanic"}]]
Out[4]=

Write a sequence of operators that reduces the number of levels in the data:

In[5]:=
ResourceFunction["DatasetQuery"][GroupBy[#class &], Mean, #age &, "Output" -> All][formattedTitanic]
Out[5]=

Write a sequence of operators that increases the number of levels in the data:

In[6]:=
ResourceFunction["DatasetQuery"][1 ;; 5, Association[#, "extra" -> Association["r1" -> RandomInteger[{0, 9}], "r2" -> RandomInteger[{0, 9}]]] &][formattedTitanic]
Out[6]=

Options (7) 

Inheritance (4) 

Setting the "Inheritance" option to None, makes DatasetQuery perform like Query:

In[7]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/9c0681d4-2f53-44d8-8e15-425b0ed14d8c"]
Out[7]=

Setting the "Inheritance" option to an Association with keys "Additions" and/or "Deletions" allows one to override formatting inheritance that would otherwise occur:

Use an ItemStyle option to format data to make the class column green and to emphasize that a 65 year old male in 1st class did not survive:

In[8]:=
ds2 = Dataset[{<|"class" -> "3rd", "age" -> 16, "sex" -> "female", "survived" -> True|>, <|"class" -> "1st", "age" -> 65, "sex" -> "male", "survived" -> False|>, <|"class" -> "3rd", "age" -> 2, "sex" -> "female", "survived" -> False|>, <|"class" -> "3rd", "age" -> Missing[], "sex" -> "female", "survived" -> True|>, <|"class" -> "3rd", "age" -> 18, "sex" -> "female", "survived" -> False|>}, ItemStyle -> {{2, _} -> Red, {_, "class"} -> Green}, HeaderStyle -> Blue]
Out[8]=

Use an "Additions" key-value pair with the "Inheritance" option to preserve the green coloring of the class column but eliminate the special coloring of the second row:

In[9]:=
ResourceFunction["DatasetQuery"][Select[#sex == "female" &], "Inheritance" -> Association["Additions" -> {ItemStyle -> {None, {Green}}}]][ds2]
Out[9]=

You can use "Additions" and "Deletions" where the sequence of operators changes the number of levels in the data.

Output (3) 

Setting the "Output"option to "OptionsOnly" yields the presentation options that the new Dataset would have but does not perform the actual Query on the Dataset. This option value may be useful in determining how to hand-tweak the presentation options, particularly where running the underlying Query is time-consuming or where one anticipates the nature of the Query will pose option inheritance complications:

In[10]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/f4d09f8d-c05a-478a-a0b9-2bc14288a440"]
Out[10]=

Hand - tweak the inherited options to replace Brown in the HeaderBackground with Purple:

In[11]:=
tweaked = Normal@Association[dsops, HeaderBackground -> {None, {{RGBColor[1, 1, 0], Purple}}}]
Out[11]=
In[12]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/21df74b7-2a6e-41b0-8654-266721c2d41a"]
Out[12]=

Setting the "Output" option to All yields an Association in which the inherited options, the query result (stripped of options), and the query result (with options) are shown:

In[13]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/fdea7bf2-718d-454e-9a3f-cb501d012538"]
Out[13]=

Applications (4) 

Format data so that the row backgrounds cycle between different shades of green, so that only 10 rows appear, so that survival is hidden, and so that the "sex" header is displayed as "gender":

In[14]:=
sexToGender[item_, _, _] := If[item == "sex", "gender", item]
In[15]:=
titanicGreen = Dataset[ExampleData[{"Dataset", "Titanic"}], Background -> {{{Lighter[Green, 0.98], Lighter[Green, 0.95]}}}, MaxItems -> 10, HeaderDisplayFunction -> (sexToGender[#1, #2, #3] &), HiddenItems -> {"survived"}]
Out[15]=

Select only those passengers age 18 or less but preserve the formatting options from the original Dataset:

In[16]:=
ResourceFunction["DatasetQuery"][Select[#age <= 18 &]][titanicGreen]
Out[16]=

Perform the same selection but reveal the hidden survival column:

In[17]:=
ResourceFunction["DatasetQuery"][Select[#age <= 18 &], "Inheritance" -> Association["Deletions" -> HiddenItems]][titanicGreen]
Out[17]=

Write a sequence of operators that computes survival on the Titanic by cabin class and gender but add new item styling and delete formatting resulting from the HeaderDisplayFunction:

In[18]:=
ResourceFunction["DatasetQuery"][GroupBy[#class &]/*KeySort, GroupBy[#sex &]/*KeySort, GroupBy[#survived &]/*KeySort, Length, "Inheritance" -> Association[
    "Additions" -> {ItemStyle -> {None, None, {Red, Darker[Green, 0.7]}}}, "Deletions" -> HeaderDisplayFunction]][titanicGreen]
Out[18]=

Possible Issues (3) 

DatasetQuery will not transform any Wolfram Language expression except a Dataset:

In[19]:=
ResourceFunction["DatasetQuery"][Select[EvenQ]][{3, 4, 5, 6, 7, 8}]
Out[19]=

Queries that change the number of levels in an expression can sometimes create issues where, for example, application of an ItemDisplayFunction or HeaderDisplayFunction no longer makes sense:

In[20]:=
sexToGender[item_, _, _] := If[item == "sex", "gender", item]
In[21]:=
titanicGreen = Dataset[ExampleData[{"Dataset", "Titanic"}], Background -> {{{Lighter[Green, 0.98], Lighter[Green, 0.95]}}}, MaxItems -> 10, HeaderDisplayFunction -> (sexToGender[#1, #2, #3] &), HiddenItems -> {"survived"}]

The HeaderDisplayFunction fails to perform as desired because, as a result of the grouping, the second level header is now a Boolean value which, when compared with a string ("sex"), fails to evaluate to True or False:

In[22]:=
ResourceFunction["DatasetQuery"][GroupBy[#class &]/*KeySort, GroupBy[#sex &]/*KeySort, GroupBy[#survived &]/*KeySort, Length][titanicGreen]
Out[22]=

We can fix this by deleting the HeaderDisplayFunction:

In[23]:=
ResourceFunction["DatasetQuery"][GroupBy[#class &]/*KeySort, GroupBy[#sex &]/*KeySort, GroupBy[#survived &]/*KeySort, Length, "Inheritance" -> Association["Deletions" -> HeaderDisplayFunction]][titanicGreen]
Out[23]=

Or, we could write a HeaderDisplayFunction that anticipates the issue:

In[24]:=
sexToGender2[item_, _, _] := If[item === "sex", "gender", item]
In[25]:=
titanicGreen2 = Dataset[ExampleData[{"Dataset", "Titanic"}], Background -> {{{Lighter[Green, 0.98], Lighter[Green, 0.95]}}}, MaxItems -> 10, HeaderDisplayFunction -> (sexToGender2[#1, #2, #3] &), HiddenItems -> {"survived"}];
In[26]:=
ResourceFunction["DatasetQuery"][GroupBy[#class &]/*KeySort, GroupBy[#sex &]/*KeySort, GroupBy[#survived &]/*KeySort, Length][titanicGreen2]
Out[26]=

The following operation works in Query:

In[27]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/e778b924-4e7e-4eee-a788-8e07602e0c27"]
Out[27]=

But not in DatasetQuery:

In[28]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/84baf44f-a893-4b2a-9aba-29761a1c99ac"]
Out[28]=

Use MapAt instead:

In[29]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/e391ac63-1d04-4ac7-bcbe-6d51a564204c"]
Out[29]=

Publisher

Seth J. Chandler

Version History

  • 1.0.0 – 29 July 2022

Related Resources

License Information