Function Repository Resource:

AssociateColumns

Source Notebook

Create associations among columns of a tabular object

Contributed by: Daniele Gregori

ResourceFunction["AssociateColumns"][tab,{col1,col2}]

associates one column of tab to another column.

ResourceFunction["AssociateColumns"][tab,{col1,col2},f]

specify a merging function for possible duplicate keys.

ResourceFunction["AssociateColumns"][tab,{{col1,col2,},{col3,col4,}}]

associates multiple columns as keys to multiple columns as values.

ResourceFunction["AssociateColumns"][tab,{{col1,col2,},{col3,col4,}},f]

specify a merging function.

ResourceFunction["AssociateColumns"][tab,{cols1,cols2,cols3,}]

creates an arbitrarily nested Association.

ResourceFunction["AssociateColumns"][tab,cols1cols2cols3]

alternative syntax with a rule for each level of nesting.

ResourceFunction["AssociateColumns"][tab,cols1cols2cols3,f]

specify the same merging function for each nesting level.

ResourceFunction["AssociateColumns"][tab,cols1cols2cols3,{f12,f23,}]

specify a different merging function for each nesting level.

Details and Options

Data is well collected in Tabular objects, but its manipulation and analysis often requires the use of associations. Thus ResourceFunction["AssociateColumns"] allows one to create associations among columns of a Tabular.
ResourceFunction["AssociateColumns"] takes as first argument a Tabular (or a list of associations, or a Dataset of the same Normal form) and as second argument a list of columns or a nested list of columns.
In the case only two columns or two lists of columns are specified, the association will have the former as keys and the latter as values. If the column of the keys contains duplicate elements, the corresponding values should be merged through a merging function, specified as third argument.
The association can be arbitrarily nested by specifying more than two columns or lists of columns in the second argument. Different merging functions between each nesting level can also be specified as a list in the third argument.
An alternative and arguably clearer syntax for the second argument is obtained by replacing the list at first level by a possibly nested Rule.
ResourceFunction["AssociateColumns"] accepts the following option:
"DuplicatesWarning"Falsewhether to output a warning message if the column of the keys contains duplicates
This message is automatically triggered if no merging function is specified as a third argument, and in that case the default Automatic merging is done through the Identity function.

Examples

Basic Examples (4) 

Create a Tabular expression:

In[1]:=
tabHurricanes = Tabular[ResourceFunction["ExampleDataset"][{"Statistics", "USHurricaneLoss"}]]
Out[1]=

Create an association between two columns of the Tabular:

In[2]:=
ResourceFunction[
 "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][tabHurricanes, {"Name", "Year"}]
Out[2]=

Create a nested association with three columns:

In[3]:=
ResourceFunction[
  "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][
  tabHurricanes, {"Year", "Name", "ReportedLosses"}] // KeySort
Out[3]=

If the first column contains duplicate keys, you can choose to merge them through a function specified as third argument:

In[4]:=
ResourceFunction[
  "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][tabHurricanes, {"Year", "Name"}, Sort] // KeySort
Out[4]=

Scope (4) 

Use an alternative syntax with a Rule for each level of nesting:

In[5]:=
tabSpeed = Tabular[ResourceFunction["ExampleDataset"][{"Statistics", "SpeedOfLight"}]]
Out[5]=
In[6]:=
ResourceFunction[
 "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][tabSpeed, "Experiment" -> "Run" -> "Speed"]
Out[6]=

Create an association with multiple columns as Values:

In[7]:=
tabHurricanes = Tabular[ResourceFunction["ExampleDataset"][{"Statistics", "USHurricaneLoss"}]]
Out[7]=
In[8]:=
ResourceFunction[
 "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][tabHurricanes, "Name" -> {"ReportedLosses", "AdjustedLosses"}]
Out[8]=

Use multiple columns as Keys:

In[9]:=
tabEarthquakes = Tabular[ResourceFunction["ExampleDataset"][{"Statistics", "USEarthquakes"}]]
Out[9]=
In[10]:=
ResourceFunction[
  "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][
  tabEarthquakes, {"Latitude", "Longitude"} -> "Magnitude", Median] // Shallow
Out[10]=
In[11]:=
ResourceFunction[
  "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][
  tabEarthquakes, {"Year", "Month", "Day"} -> {"Latitude", "Longitude"},
   MapApply[GeoPosition, #] &] // Shallow
Out[11]=

Create an arbitrarily nested association and visualize it as a Dataset:

In[12]:=
tabOrders = 
Tabular[<|"RawSchema" -> <|"ColumnProperties" -> <|"id" -> <|"ElementType" -> "Integer64"|>, "order_id" -> <|"ElementType" -> "Integer64"|>,
         "activity_id" -> <|"ElementType" -> "Integer64"|>, "entity_name" -> <|"ElementType" -> "String"|>, "entity_code" -> <|"ElementType" -> "String"|>, "periodicity" -> <|"ElementType" -> "Integer64"|>, "year" -> <|"ElementType" -> "Integer64"|>|>, "KeyColumns" -> None, "Backend" -> "WolframKernel"|>, "Options" -> {},
     "BackendData" -> <|"ColumnData" -> DataStructure["ColumnTable", {{
TabularColumn[<|"Data" -> {{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
               14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26}, {},
               None}, "ElementType" -> "Integer64"|>], 
TabularColumn[<|"Data" -> {{237, 242, 264, 270, 276, 320, 350, 353, 357, 359, 368, 383, 383, 385, 385, 400, 433, 439, 439, 446, 447, 451, 451, 451, 451, 455}, {}, None}, "ElementType" -> "Integer64"|>], 
TabularColumn[<|"Data" -> {{9, 28, 9, 28, 28, 28, 9, 29, 28, 10, 17, 29, 29, 10, 10, 17, 10, 36, 36, 28, 10, 28, 28, 28, 28, 17}, {}, None}, "ElementType" -> "Integer64"|>], 
TabularColumn[<|"Data" -> {{3, {0, 12, 26, 39, 56, 81, 99, 111, 124, 131, 136, 153, 166, 179, 183, 187, 201, 208, 215, 222, 236, 243, 250, 264, 271, 285, 299}, "Valle SabbiaIsola DovareseVallecamonicaPessina CremoneseTerre di Pievi e CastelliTorre de PicenardiValle SabbiaCosta VolpinoDossenaCainoBorgo San GiacomoCosta VolpinoCosta VolpinoIseoIseoCastelviscontiBarbataDossenaDossenaCastelviscontiBarbataDossenaCastelviscontiDossenaCastelviscontiCastelvisconti"}, {}, None},
             "ElementType" -> "String"|>], 
TabularColumn[<|"Data" -> {{3, {0, 6, 15, 25, 34, 44, 53, 59, 68, 77, 86, 95, 104, 113, 122, 131, 140, 149, 158, 167, 176, 185, 194, 203, 212, 221, 230}, "90302110301905310301561101030190701030266100103019107903021103016086103016092103017031103017020103016086103016086103017085103017085103019027103016019103016092103016092103019027103016019103016092103019027103016092103019027103019027"}, {}, None}, "ElementType" -> "String"|>], 
TabularColumn[<|"Data" -> {{3, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3}, {}, None}, "ElementType" -> "Integer64"|>], 
TabularColumn[<|"Data" -> {{2024, 2024, 2024, 2024, 2024, 2024, 2025, 2025, 2025, 2025, 2025, 2024, 2025, 2024, 2025, 2025, 2025, 2024, 2025, 2025, 2025, 2024, 2024, 2025, 2025, 2025}, {}, None}, "ElementType" -> "Integer64"|>]}}]|>|>]
  ;
In[13]:=
ResourceFunction[
  "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][
  tabOrders, {"year", "periodicity"} -> "activity_id" -> "order_id" -> {"entity_code", "entity_name"}, Sort] // KeySort
Out[13]=
In[14]:=
Dataset[%]
Out[14]=

Specify a different merging function for each level of nesting:

In[15]:=
ResourceFunction[
 "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][tabOrders, {"year", "periodicity"} -> "activity_id" -> "order_id" -> {"entity_code", "entity_name"}, {First, Identity, First}]
Out[15]=

Options (2) 

DuplicatesWarning (2) 

Return a warning message when the column of the Keys contains duplicates and no merging function is specified:

In[16]:=
tabHurricanes = Tabular[ResourceFunction["ExampleDataset"][{"Statistics", "USHurricaneLoss"}]]
Out[16]=
In[17]:=
ResourceFunction[
  "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][tabHurricanes, {"Year", "Name"}] // KeySort
Out[17]=

To avoid this message, you can just specify a merging function as the third argument, or set the option "DuplicatesWarning" to False, to obtain the automatic merging through Identity without error messages:

In[18]:=
ResourceFunction[
  "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][tabHurricanes, {"Year", "Name"}, Identity] // KeySort
Out[18]=
In[19]:=
ResourceFunction[
  "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][tabHurricanes, {"Year", "Name"}, "DuplicatesWarning" -> False] // KeySort
Out[19]=
In[20]:=
% == %%
Out[20]=

Applications (2) 

Create an association between the coordinates and magnitudes of earthquakes:

In[21]:=
tabEarthquakes = Tabular[ResourceFunction["ExampleDataset"][{"Statistics", "USEarthquakes"}]]
Out[21]=
In[22]:=
(assocGeoMagn = ResourceFunction[
    "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][
    tabEarthquakes, {"Latitude", "Longitude"} -> "Magnitude", Total]) // Shallow
Out[22]=

Use it to create a GeoHistogram of US seismicity:

In[23]:=
GeoHistogram[KeyMap[GeoPosition, assocGeoMagn], GeoRange -> Entity["Country", "UnitedStates"]]
Out[23]=

Properties and Relations (2) 

In the simplest case with only 2 columns, if there are no duplicates in the column of the keys, AssociateColumn[tab,{col1,col2}] is equivalent to Association[Rule@@@Values@Normal@tab[[All,{col1,col2}]]]:

In[24]:=
tabHurricanes = Tabular[ResourceFunction["ExampleDataset"][{"Statistics", "USHurricaneLoss"}]]
Out[24]=
In[25]:=
ResourceFunction[
 "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][tabHurricanes, {"Name", "Year"}]
Out[25]=
In[26]:=
Association[Rule @@@ Values@Normal@tabHurricanes[[All, {"Name", "Year"}]]]
Out[26]=
In[27]:=
% === %%
Out[27]=

If there are duplicates in the column of the keys, AssociateColumns[tab, {col1,col2},f] is equivalent to Map[f,GroupBy[Values@Normal@tab[[All,{col1,col2}]],FirstLast]]:

In[28]:=
ResourceFunction[
 "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][tabHurricanes, {"Year", "Name"}, Sort]
Out[28]=
In[29]:=
Map[Sort, GroupBy[Values@Normal@tabHurricanes[[All, {"Year", "Name"}]], First -> Last]]
Out[29]=
In[30]:=
% === %%
Out[30]=

Possible Issues (2) 

The merging functions should be actually allowed by the structure of the data:

In[31]:=
tabOrders = 
Tabular[<|"RawSchema" -> <|"ColumnProperties" -> <|"id" -> <|"ElementType" -> "Integer64"|>, "order_id" -> <|"ElementType" -> "Integer64"|>,
         "activity_id" -> <|"ElementType" -> "Integer64"|>, "entity_name" -> <|"ElementType" -> "String"|>, "entity_code" -> <|"ElementType" -> "String"|>, "periodicity" -> <|"ElementType" -> "Integer64"|>, "year" -> <|"ElementType" -> "Integer64"|>|>, "KeyColumns" -> None, "Backend" -> "WolframKernel"|>, "Options" -> {},
     "BackendData" -> <|"ColumnData" -> DataStructure["ColumnTable", {{
TabularColumn[<|"Data" -> {{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
               14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26}, {},
               None}, "ElementType" -> "Integer64"|>], 
TabularColumn[<|"Data" -> {{237, 242, 264, 270, 276, 320, 350, 353, 357, 359, 368, 383, 383, 385, 385, 400, 433, 439, 439, 446, 447, 451, 451, 451, 451, 455}, {}, None}, "ElementType" -> "Integer64"|>], 
TabularColumn[<|"Data" -> {{9, 28, 9, 28, 28, 28, 9, 29, 28, 10, 17, 29, 29, 10, 10, 17, 10, 36, 36, 28, 10, 28, 28, 28, 28, 17}, {}, None}, "ElementType" -> "Integer64"|>], 
TabularColumn[<|"Data" -> {{3, {0, 12, 26, 39, 56, 81, 99, 111, 124, 131, 136, 153, 166, 179, 183, 187, 201, 208, 215, 222, 236, 243, 250, 264, 271, 285, 299}, "Valle SabbiaIsola DovareseVallecamonicaPessina CremoneseTerre di Pievi e CastelliTorre de PicenardiValle SabbiaCosta VolpinoDossenaCainoBorgo San GiacomoCosta VolpinoCosta VolpinoIseoIseoCastelviscontiBarbataDossenaDossenaCastelviscontiBarbataDossenaCastelviscontiDossenaCastelviscontiCastelvisconti"}, {}, None},
             "ElementType" -> "String"|>], 
TabularColumn[<|"Data" -> {{3, {0, 6, 15, 25, 34, 44, 53, 59, 68, 77, 86, 95, 104, 113, 122, 131, 140, 149, 158, 167, 176, 185, 194, 203, 212, 221, 230}, "90302110301905310301561101030190701030266100103019107903021103016086103016092103017031103017020103016086103016086103017085103017085103019027103016019103016092103016092103019027103016019103016092103019027103016092103019027103019027"}, {}, None}, "ElementType" -> "String"|>], 
TabularColumn[<|"Data" -> {{3, 1, 3, 1, 1, 1, 3, 1, 1, 1, 3, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3}, {}, None}, "ElementType" -> "Integer64"|>], 
TabularColumn[<|"Data" -> {{2024, 2024, 2024, 2024, 2024, 2024, 2025, 2025, 2025, 2025, 2025, 2024, 2025, 2024, 2025, 2025, 2025, 2024, 2025, 2025, 2025, 2024, 2024, 2025, 2025, 2025}, {}, None}, "ElementType" -> "Integer64"|>]}}]|>|>]
  ;
In[32]:=
ResourceFunction[
 "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][tabOrders, {"year", "periodicity"} -> "activity_id" -> "order_id" -> {"entity_code", "entity_name"}, First]
Out[32]=
In[33]:=
ResourceFunction[
 "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][tabOrders, {"year", "periodicity"} -> "activity_id" -> "order_id" -> {"entity_code", "entity_name"}, {First, Identity, First}]
Out[33]=
In[34]:=
ResourceFunction[
 "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][tabOrders, {"year", "periodicity"} -> "activity_id" -> "order_id" -> {"entity_code", "entity_name"}, {Identity, First, First}]
Out[34]=

Sometimes the error messages suggest what to change:

In[35]:=
tabHurricanes = Tabular[ResourceFunction["ExampleDataset"][{"Statistics", "USHurricaneLoss"}]]
Out[35]=
In[36]:=
ResourceFunction[
  "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][tabHurricanes, "Year" -> "Name" -> "ReportedLosses", Sort] // KeySort
Out[36]=
In[37]:=
ResourceFunction[
  "AssociateColumns", ResourceSystemBase -> "https://www.wolframcloud.com/obj/resourcesystem/api/1.0"][tabHurricanes, "Year" -> "Name" -> "ReportedLosses", {Sort, Identity}] // KeySort
Out[37]=

Publisher

Daniele Gregori

Version History

  • 1.0.0 – 29 August 2025

Related Resources

Author Notes

The development of this function benefitted from suggestions by the Wolfram Review Team.

License Information