Function Repository Resource:

GroupMerge

Group rows in data by key values and summarize the remaining keys with a merging operator

Contributed by: Sjoerd Smit

ResourceFunction["GroupMerge"][{assoc1, assoc2, …}, key, Merge[f]]

creates a new list of associations by first grouping the data by key and then merging the remaining columns of the grouped data with Merge[f].

ResourceFunction["GroupMerge"][{assoc1, assoc2, …}, key, Automatic]

creates a new list of associations by first grouping the data by key and then populating each remaining column with its value if that value is unique within the group, and with Missing[…] otherwise.

ResourceFunction["GroupMerge"][data, key -> g, …]

groups the data by the values that the function g returns for each row and stores those values in the column key.

ResourceFunction["GroupMerge"][data, {spec1, spec2, …}, …]

combines multiple simultaneous groupings.

ResourceFunction["GroupMerge"][data, spec, {lbl1 -> f1, lbl2 -> f2, …}]

adds new columns lbl1, lbl2, … to the data, obtained by applying the corresponding functions f1, f2, … to the groups.

ResourceFunction["GroupMerge"][key, f]

represents an operator form of ResourceFunction["GroupMerge"] that can be applied to an expression.

Details

In addition to lists of associations, ResourceFunction["GroupMerge"] also works on Dataset objects.
In the second argument, key can be any specification that can be used with Part: a string, an integer or a Key[…] expression.
In the third argument, the merging operator does not need to be of the form Merge[f]. Any operator (such as CountsBy) that returns an association when applied to a list of associations is valid.
The Automatic merging function returns a unique value for a column if it exists and Missing["NotAvailable"] otherwise.
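
The group-then-merge behavior described above can be approximated with built-in functions. The following is a minimal sketch under stated assumptions, not the repository implementation: groupMergeSketch and sketchRows are hypothetical names introduced here, only a single string key is handled, and the column order of the real function may differ.

```wolfram
(* Hypothetical helper for illustration only; not the GroupMerge source. *)
groupMergeSketch[data_List, key_String, merger_] :=
 KeyValueMap[
  (* Put the grouping value first, then the merged remaining columns. *)
  Function[{value, rows},
   Join[<|key -> value|>, merger[KeyDrop[rows, {key}]]]],
  GroupBy[data, Key[key]]
  ]

sketchRows = {
   <|"a" -> 1, "b" -> 1|>,
   <|"a" -> 1, "b" -> 2|>,
   <|"a" -> 2, "b" -> 3|>
   };

groupMergeSketch[sketchRows, "a", Merge[Total]]
(* {<|"a" -> 1, "b" -> 3|>, <|"a" -> 2, "b" -> 3|>} *)
```

Any operator that maps a list of associations to an association can take the place of Merge[Total] in this sketch, which mirrors the rule stated for the third argument.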

Examples

Basic Examples (2) 

Perform a simple group merge:

In[1]:=
ResourceFunction["GroupMerge"][
 {
  <|"a" -> 1, "b" -> 1|>,
  <|"a" -> 1, "b" -> 2, "c" -> "x"|>,
  <|"a" -> 2, "b" -> 3, "c" -> "y"|>,
  <|"a" -> 2, "b" -> 3, "c" -> "z"|>
  },
 "a",
 Automatic
 ]
Out[1]=

Create a new dataset with rows corresponding to the unique values of the "a" column:

In[2]:=
data = {
   <|"a" -> 1, "b" -> 1|>,
   <|"a" -> 1, "b" -> 2, "c" -> "x"|>,
   <|"a" -> 1, "b" -> 2, "c" -> "y"|>,
   <|"a" -> 2, "b" -> 3|>,
   <|"a" -> 2|>
   };
In[3]:=
ResourceFunction["GroupMerge"][
 data,
 "a",
 Merge[f]
 ]
Out[3]=
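
To see what Merge[f] does to a single group, it can help to apply the built-in Merge directly to the rows with "a" equal to 1 after the "a" column has been dropped. This sketch assumes f has no definition, so the result stays symbolic; group1 is a name introduced here for illustration.

```wolfram
(* Rows of the group "a" -> 1 with the grouping column removed. *)
group1 = {
   <|"b" -> 1|>,
   <|"b" -> 2, "c" -> "x"|>,
   <|"b" -> 2, "c" -> "y"|>
   };

(* Merge collects the values of each key into a list and wraps it in f. *)
Merge[group1, f]
(* <|"b" -> f[{1, 2, 2}], "c" -> f[{"x", "y"}]|> *)
```

Note that the list passed to f for "c" has only two elements, because the first row has no "c" key.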

Count the number of rows for each value of "a":

In[4]:=
ResourceFunction["GroupMerge"][
 data,
 "a",
 "counts" -> Length
 ]
Out[4]=
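
As a rough cross-check, the same counts can be computed with the built-in GroupBy, which applies Length to each group of rows. This is only a sketch of the idea: countRows is a hypothetical copy of the data above, and the result is an association keyed by the values of "a" rather than a list of rows.

```wolfram
(* Same rows as the data defined above, under a separate name. *)
countRows = {
   <|"a" -> 1, "b" -> 1|>,
   <|"a" -> 1, "b" -> 2, "c" -> "x"|>,
   <|"a" -> 1, "b" -> 2, "c" -> "y"|>,
   <|"a" -> 2, "b" -> 3|>,
   <|"a" -> 2|>
   };

(* Group by the "a" column and reduce each group to its length. *)
GroupBy[countRows, Key["a"], Length]
(* <|1 -> 3, 2 -> 2|> *)
```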

Combine merged rows and counts:

In[5]:=
ResourceFunction["GroupMerge"][
 data,
 "a",
 {Automatic, "counts" -> Length}
 ]
Out[5]=
In[6]:=
ResourceFunction["GroupMerge"][
 data,
 "a",
 {Merge[f], "counts" -> Length}
 ]
Out[6]=

Scope (5) 

Summarize data in the Titanic example Dataset:

In[7]:=
titanic = ExampleData[{"Dataset", "Titanic"}];

Group by sex and class, then compute the mean age and the number of rows in each group:

In[8]:=
Sort@ResourceFunction["GroupMerge"][titanic,
  {"sex", "class"},
  {"average_age" -> Query[Lookup["age"]/*Mean/*N], "count" -> Length}
  ]
Out[8]=

Group by an arbitrary grouping function:

In[9]:=
Sort@ResourceFunction["GroupMerge"][titanic,
  {"under 30?" -> Function[TrueQ[#age < 30]], "class"},
  {
   "count" -> Length
   }
  ]
Out[9]=
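
The grouping function wraps the comparison in TrueQ so that every row lands in either the True or the False group, even when the age is not a number. A small sketch with built-in functions only, using hypothetical rows (passengers and under30Q are names introduced here):

```wolfram
(* Hypothetical rows; the last one has a missing age. *)
passengers = {
   <|"age" -> 25|>,
   <|"age" -> 40|>,
   <|"age" -> Missing["NotAvailable"]|>
   };

under30Q = Function[TrueQ[#age < 30]];

(* A comparison with a missing age stays unevaluated, so TrueQ returns False. *)
under30Q /@ passengers
(* {True, False, False} *)

GroupBy[passengers, under30Q, Length]
(* <|True -> 1, False -> 2|> *)
```

Without TrueQ, the unevaluated comparison would itself become a group key.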

Any function that returns an Association when applied to a list of associations can be used in the third argument:

In[10]:=
Sort@ResourceFunction["GroupMerge"][titanic,
  {"under 30?" -> Function[TrueQ[#age < 30]], "class"},
  {
   "count" -> Length,
   CountsBy[Key["sex"]]
   }
  ]
Out[10]=
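
The operator CountsBy[Key["sex"]] qualifies because, applied to a list of associations, it returns an association of counts. A minimal sketch on hypothetical rows (sexRows is a name introduced here):

```wolfram
(* Hypothetical rows standing in for one group of passengers. *)
sexRows = {
   <|"sex" -> "female"|>,
   <|"sex" -> "male"|>,
   <|"sex" -> "female"|>
   };

(* The operator form takes the list of rows and tallies the "sex" values. *)
CountsBy[Key["sex"]][sexRows]
(* <|"female" -> 2, "male" -> 1|> *)
```

The keys of the returned association ("female" and "male" here) are what would presumably appear as additional columns next to "count".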

GroupMerge can also be used as an operator:

In[11]:=
titanic[
 ResourceFunction["GroupMerge"][
  {"under 30?" -> Function[TrueQ[#age < 30]], "class"},
  {
   "count" -> Length,
   CountsBy[Key["sex"]]
   }
  ]
 ]
Out[11]=

Properties and Relations (1) 

The Automatic merging function is equivalent to:

In[12]:=
data = {
   <|"a" -> 1, "b" -> 1|>,
   <|"a" -> 1, "b" -> 2, "c" -> "x"|>,
   <|"a" -> 2, "b" -> 3, "c" -> "y"|>,
   <|"a" -> 2, "b" -> 3, "c" -> "z"|>
   };
In[13]:=
ResourceFunction["GroupMerge"][
 data,
 "a",
 Merge[
  Replace[
   {
    list : {__} /; SameQ @@ list :> First[list],
    _ :> Missing["NotAvailable"]
    }
   ]
  ]
 ]
Out[13]=
In[14]:=
ResourceFunction["GroupMerge"][
 data,
 "a",
 Automatic
 ]
Out[14]=
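
The replacement rules used in the Merge specification above can be tried on plain lists of values to see when a column keeps its value. In this sketch, uniqueOrMissing is a hypothetical name for the same Replace operator:

```wolfram
(* Same rules as in the Merge specification above, bound to a name. *)
uniqueOrMissing = Replace[{
    list : {__} /; SameQ @@ list :> First[list],
    _ :> Missing["NotAvailable"]
    }];

(* Identical values and single values are kept; differing values are not. *)
uniqueOrMissing /@ {{3, 3}, {"x"}, {"y", "z"}}
(* {3, "x", Missing["NotAvailable"]} *)
```

A single-element list such as {"x"} passes the SameQ test, so a column that appears in only one row of a group keeps that row's value.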

Neat Examples (1) 

Use the resource function MergeByKey to summarize different columns in different ways:

In[15]:=
ResourceFunction["GroupMerge"][
 ExampleData[{"Dataset", "Titanic"}],
 "under 30?" -> Function[TrueQ[#age < 30]],
 ResourceFunction["MergeByKey"][
  {"age" -> Histogram},
  BarChart[KeySort@Counts[#], ChartLabels -> Placed[Automatic, Top]] &
  ]
 ]
Out[15]=

Publisher

Sjoerd Smit

Version History

  • 1.1.0 – 29 March 2023
  • 1.0.0 – 27 March 2023
