Function Repository Resource:

MergeByKey


Merge a list of associations using different merge functions for different keys

Contributed by: Sjoerd Smit

ResourceFunction["MergeByKey"][{assoc1, assoc2, …}, {key1 -> f1, key2 -> f2, …}]

merges the associations assoc_i, using the functions f_i to combine the values of the keys key_i.

ResourceFunction["MergeByKey"][{assoc1, assoc2, …}, {key1 -> f1, key2 -> f2, …}, f_default]

uses f_default as the merging function for any key not specified.

ResourceFunction["MergeByKey"][{assoc1, assoc2, …}, {…, {key_{i,1}, key_{i,2}, …} -> f_i, …}, …]

uses f_i to merge all of the keys key_{i,j}.

ResourceFunction["MergeByKey"][{key1 -> f1, key2 -> f2, …}]

represents an operator form of ResourceFunction["MergeByKey"] that can be applied to an expression.

ResourceFunction["MergeByKey"][{key1 -> f1, key2 -> f2, …}, f_default]

gives an operator with a default merging function.
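The {keys} -> f form and the Key wrapper suggest that a specification is normalized before any merging happens. The following is a minimal sketch under that assumption, not the resource function's actual implementation. The helper name expandSpec is hypothetical. It expands a specification into an Association from each individual key to its merging function:

```wolfram
(* Hypothetical helper, not part of MergeByKey: normalize a specification into
   an Association from each individual key to its merging function *)
expandSpec[rules_List] := Association @ Catenate @ Replace[rules,
   {
    (* Key[k] always denotes the single key k, even when k is a List *)
    (Key[k_] -> f_) :> {k -> f},
    (* A bare List on the left-hand side denotes several keys sharing f *)
    (keys_List -> f_) :> Map[# -> f &, keys],
    (* Anything else is a single key *)
    (k_ -> f_) :> {k -> f}
   },
   {1}
  ];

expandSpec[{{a, b} -> Total, c -> Mean, Key[{d}] -> Max}]
(* <|a -> Total, b -> Total, c -> Mean, {d} -> Max|> *)
```

Under this reading, {a} -> Total expands to the key a, not to the key {a}. This is why a key that is itself a List needs the Key wrapper.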

Details and Options

If no default merging function is specified, any key without a merging function will be assigned a list of all values found for that key. This is equivalent to merging with the function Identity.
The Key wrapper can be used to disambiguate between multiple keys that use the same merging function and a single key that is a List.
Merging functions that are specified for keys that do not exist in the data are ignored.
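The details above can be illustrated with a small sketch that uses only built-in functions. It is a hypothetical helper (mergeWithFunctions), not the resource function's code. It assumes the specification has already been turned into an Association from individual keys to functions. Merge with Identity collects the values of each key, and each list of values is then passed to that key's function or, if there is none, to the default:

```wolfram
(* Hypothetical sketch, not the resource function's actual code.
   fns: Association from individual keys to merging functions (assumed already expanded).
   default: used for keys without an entry; Identity keeps the list of all values. *)
mergeWithFunctions[assocs_List, fns_Association, default_ : Identity] :=
  Association @ KeyValueMap[
    (* Key[...] makes Lookup treat list-valued keys as single keys;
       entries of fns for keys absent from the data are never looked up *)
    (#1 -> Lookup[fns, Key[#1], default][#2]) &,
    (* Merge with Identity collects every value of each key, in order *)
    Merge[assocs, Identity]
  ];

mergeWithFunctions[{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>}, <|a -> Total, z -> Max|>]
(* <|a -> 6, b -> {2, 10}|> *)
```

The function association is only consulted for keys that occur in the data, so the unused entry z -> Max has no effect.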

Examples

Basic Examples (6) 

Merge the values of keys a and b with different functions to combine the values:

In[1]:=
ResourceFunction[
 "MergeByKey"][{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>}, {a -> Total,
   b -> RootMeanSquare}]
Out[1]=

Not all associations need to have the same keys:

In[2]:=
ResourceFunction[
 "MergeByKey"][{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10, c -> Pi|>, <|
   c -> E|>}, {a -> Total, b -> RootMeanSquare, c -> Mean}]
Out[2]=

Specify multiple keys that use the same merging function:

In[3]:=
ResourceFunction[
 "MergeByKey"][{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10, c -> Pi|>, <|
   c -> E|>}, {{a, b} -> Total, c -> Mean}]
Out[3]=

If no function is specified for a given key, all values are returned as a List:

In[4]:=
ResourceFunction[
 "MergeByKey"][{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>}, {a -> Total}]
Out[4]=

Specify a default function for merging unspecified keys:

In[5]:=
ResourceFunction[
 "MergeByKey"][{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>}, {a -> Total}, RootMeanSquare]
Out[5]=

Define two operator forms and data:

In[6]:=
merge1 = ResourceFunction["MergeByKey"][{a -> Total}];
merge2 = ResourceFunction["MergeByKey"][{a -> Total}, RootMeanSquare];
data = {<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>};

Apply them to the data:

In[7]:=
merge1@data
Out[7]=
In[8]:=
merge2@data
Out[8]=
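Operator forms like merge1 and merge2 can be modeled with curried definitions (SubValues). The sketch below uses a hypothetical symbol mergeOp and handles only simple key -> f rules. It is not necessarily how the resource function is implemented:

```wolfram
(* Hypothetical operator-form sketch; mergeOp is not part of the resource function.
   mergeOp[rules] alone has no definition, so it stays unevaluated and acts as
   an operator that waits for the data. *)
ClearAll[mergeOp];
mergeOp[rules_List][data_List] := mergeOp[rules, Identity][data];
mergeOp[rules_List, default_][data_List] :=
  With[{fns = Association[rules]},
   (* Each key's list of values goes to its own function, else to the default *)
   Association @ KeyValueMap[
     (#1 -> Lookup[fns, Key[#1], default][#2]) &,
     Merge[data, Identity]]
  ];

op = mergeOp[{a -> Total}, RootMeanSquare];
op[{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>}]
(* <|a -> 6, b -> 2 Sqrt[13]|> *)
```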

Applications (3) 

Quickly summarize categorical and numerical columns of a dataset:

In[9]:=
data = Normal@ExampleData[{"Dataset", "Titanic"}];
In[10]:=
ResourceFunction["MergeByKey"][data,
 {"age" -> Histogram},
 BarChart[Counts[#], ChartLabels -> Automatic] &
 ]
Out[10]=

Turn the columns into distributions:

In[11]:=
ResourceFunction["MergeByKey"][data,
 {"age" -> DeleteMissing/*EmpiricalDistribution},
 CategoricalDistribution
 ]
Out[11]=
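The "age" rule uses RightComposition (/*), so (DeleteMissing /* EmpiricalDistribution)[values] means EmpiricalDistribution[DeleteMissing[values]]. Presumably this is because some ages in the dataset are missing. Here is a small illustration with made-up ages:

```wolfram
(* Made-up ages with one missing entry, standing in for the "age" column *)
ages = {22, Missing["NotAvailable"], 38, 26};
(* (f /* g)[x] is g[f[x]]: drop the missing value first, then build the distribution *)
dist = (DeleteMissing /* EmpiricalDistribution)[ages];
Mean[dist]   (* 86/3, the mean of the three known ages *)
```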

Draw samples from these distributions. This differs from drawing a random row of the original dataset because each column is sampled independently, which ignores correlations between the columns:

In[12]:=
RandomVariate /@ %
Out[12]=
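A tiny made-up table in which two columns always agree shows why independent sampling loses correlations. This is a sketch with hypothetical data. Instead of drawing random samples, it enumerates every row that per-column sampling could produce:

```wolfram
(* Two made-up rows in which the columns "x" and "y" always agree *)
rows = {<|"x" -> 1, "y" -> 1|>, <|"x" -> 2, "y" -> 2|>};
(* Values seen in each column: <|"x" -> {1, 2}, "y" -> {1, 2}|> *)
columnValues = Merge[rows, DeleteDuplicates];
(* Every row that sampling each column independently could produce *)
possibleRows = AssociationThread[Keys[columnValues], #] & /@ Tuples[Values[columnValues]];
(* Rows that independent sampling can produce but that never occur in the data *)
Complement[possibleRows, rows]
```

Four rows are possible, but only two of them appear in the data. The two mixed rows, where "x" and "y" disagree, can only arise because the columns are treated as independent.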

Properties and Relations (2) 

Specifying only a default merging function is equivalent to using Merge:

In[13]:=
ResourceFunction[
 "MergeByKey"][{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>}, {}, Total]
Out[13]=
In[14]:=
Merge[{<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>}, Total]
Out[14]=
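To see why these agree, note that when every key falls back to the same default, merging reduces to two steps. First, collect each key's values. Second, apply one function to every collected list. The following short check uses plain Merge only and needs no resource function:

```wolfram
sample = {<|a -> 1, b -> 2|>, <|a -> 5, b -> 10|>};
(* Collect all values per key; this is what a default of Identity gives *)
grouped = Merge[sample, Identity]   (* <|a -> {1, 5}, b -> {2, 10}|> *)
(* Applying one function to every collected list reproduces Merge with that function *)
Map[Total, grouped] === Merge[sample, Total]   (* True *)
```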

For large datasets, MergeByKey can be a faster alternative to Merge:

In[15]:=
data = Join @@ ConstantArray[Normal @ ExampleData[{"Dataset", "Titanic"}], 10];
In[16]:=
merge1 = Merge[data, Counts]; // RepeatedTiming
Out[16]=
In[17]:=
merge2 = ResourceFunction["MergeByKey"][data, {}, Counts]; // RepeatedTiming
Out[17]=
In[18]:=
merge1 === merge2
Out[18]=

Merging an empty list returns an empty Association:

In[19]:=
ResourceFunction["MergeByKey"][{}, {a -> 1}]
Out[19]=
In[20]:=
ResourceFunction["MergeByKey"][{}, {a -> 1}, f]
Out[20]=

Similarly for a list of empty associations:

In[21]:=
ResourceFunction["MergeByKey"][{<||>, <||>, <||>}, {a -> 1}]
Out[21]=
In[22]:=
ResourceFunction["MergeByKey"][{<||>, <||>, <||>}, {a -> 1}, f]
Out[22]=

Possible Issues (2) 

If the associations have keys that are lists, you need the Key wrapper to indicate this. The following does not work as intended, because {a} -> Total is read as a list containing the single key a rather than as the key {a}:

In[23]:=
ResourceFunction[
 "MergeByKey"][{<|{a} -> 1, b -> 2|>, <|{a} -> 5, b -> 10|>}, {{a} -> Total}]
Out[23]=

Use Key to specify a List as a key:

In[24]:=
ResourceFunction[
 "MergeByKey"][{<|{a} -> 1, b -> 2|>, <|{a} -> 5, b -> 10|>}, {Key[{a}] -> Total}]
Out[24]=
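The built-in function Lookup follows the same convention. A bare List is read as several keys, and Key marks a single key that is a List:

```wolfram
assoc = <|{a} -> 1, b -> 2|>;
(* A bare list is read as a list of keys, so the key a is looked up (and is missing) *)
Lookup[assoc, {a}]        (* {Missing["KeyAbsent", a]} *)
(* Key[{a}] denotes the single key {a} *)
Lookup[assoc, Key[{a}]]   (* 1 *)
```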

Publisher

Sjoerd Smit

Version History

  • 2.0.0 – 23 July 2020
  • 1.0.0 – 10 July 2020

License Information