Function Repository Resource:

BinListsBy

Bin data into lists based on applying a function to each item

Contributed by: Sander Huisman

ResourceFunction["BinListsBy"][{x1,x2,…},{f,xmin,xmax}]

gives lists of the elements xi for which the values of f[xi] lie in bins from xmin to xmax with unit width.

ResourceFunction["BinListsBy"][{x1,x2,…},{f,xmin,xmax,Δx}]

gives lists of the elements xi for which the values of f[xi] lie in bins from xmin to xmax with width Δx.

ResourceFunction["BinListsBy"][{x1,x2,…},binspec1,binspec2,…]

gives a nested array of lists in which the index of each element xi along dimension j is determined by applying the function fj from binning specification binspecj to xi.

Details and Options

ResourceFunction["BinListsBy"] drops those elements for which the function evaluates to values that are outside the binning specification.
ResourceFunction["BinListsBy"] drops those elements for which the function evaluates to complex numbers or to non-numerical data.
Within each bin, elements appear in the same order as in the original data.
ResourceFunction["BinListsBy"] places elements x in bin i when their function values satisfy xmin + (i - 1) Δx ≤ f[x] < xmin + i Δx.
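
The bin assignment described above can be sketched with built-in functions only. The helper binIndex below is a hypothetical name, not part of the resource function, and it assumes the half-open convention of the built-in BinLists (lower bin edge included, upper edge excluded). It handles real numeric values only:

```wolfram
(* binIndex is a hypothetical helper for illustration only.
   It returns the 1-based bin index of a real value v, assuming
   half-open bins [xmin + (i - 1) dx, xmin + i dx). *)
binIndex[v_, {xmin_, xmax_, dx_ : 1}] /; TrueQ[xmin <= v < xmax] :=
 Floor[(v - xmin)/dx] + 1

(* Anything else (out of range or non-numeric) has no bin *)
binIndex[_, _] := Missing["OutOfRange"]

binIndex[5.1, {1, 10}]       (* 5 *)
binIndex[76, {0, 100, 10}]   (* 8 *)
binIndex[10, {1, 10}]        (* Missing["OutOfRange"] *)
```

With inexact bin widths, values that lie exactly on a bin edge may land in a neighboring bin because of rounding in the division.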

Examples

Basic Examples (4) 

Bin some data based only on the first element of each list:

In[1]:=
data = {{5.1, 2}, {4.2, 5}, {2.2, 5}, {7.1, 6}, {2.9, 3}, {1.4, 1}, {5.3, 2}};
ResourceFunction["BinListsBy"][data, {First, 1, 10}] // Column
Out[2]=

Bin a list of data based on the string length of each item:

In[3]:=
data = {"This", "is", "just", "some", "test", "data", "from", "Sander", "ok?"};
ResourceFunction["BinListsBy"][data, {StringLength, 1, 10}] // Column
Out[4]=

Bin some data based on the total of each list:

In[5]:=
data = {{1, 2}, {4, 5}, {-2, 5}, {3, 6}, {2, 3}, {1, 1}};
ResourceFunction["BinListsBy"][data, {Total, 1, 10}] // Column
Out[6]=

Bin some data with some "auxiliary" data:

In[7]:=
data = {
   {7.2, <|"Name" -> "Sander", "Age" -> Missing["Disputed"]|>},
   {5.1, <|"Name" -> "Tanya", "Age" -> 25|>},
   {2.5, <|"Name" -> "Eric", "Age" -> 41|>},
   {3.1, <|"Name" -> "Paul", "Age" -> 20|>},
   {5.4, <|"Name" -> "Henry", "Age" -> 72|>},
   {7.1, <|"Name" -> "Emma", "Age" -> 42|>},
   {6.2, <|"Name" -> "Natalie", "Age" -> 31|>}
   };
ResourceFunction["BinListsBy"][data, {First, 1, 10}] // Column
Out[8]=

Scope (3) 

Perform a 2-dimensional binning based on the value of the first element, and the string length of the last element:

In[9]:=
data = {{1.2, 2.3, "This"}, {5.1, 1.2, "is"}, {4.5, 3.1, "just"}, {3.1, 1.87, "some"}, {2.4, 1.6, "test"}, {7.1, 1.4, "data"}, {6.2, 7.3, "from"}, {3.14, 9.1, "Sander"}, {-0.7, 0.8, "ok?"}};
ResourceFunction["BinListsBy"][
  data, {First, -2, 8}, {StringLength@*Last, 1, 7, 1}];
Grid[%, Frame -> All]
Out[11]=

Perform a 3-dimensional binning based on the value of the first element, the value of the middle element, and the length of the last element:

In[12]:=
data = {{1.2, 2.3, "This"}, {5.1, 1.2, "is"}, {4.5, 3.1, "just"}, {3.1, 1.87, "some"}, {2.4, 1.6, "test"}, {7.1, 1.4, "data"}, {6.2, 7.3, "from"}, {3.14, 9.1, "Sander"}, {-0.7, 0.8, "ok?"}};
ResourceFunction[
 "BinListsBy"][data, {First, -2, 8}, {#[[2]] &, 0, 10, 5}, {StringLength@*Last, 1, 7, 1}]
Out[13]=

The lists do not need to have the same length, structure, or type, so data can be binned together with arbitrary auxiliary data:

In[14]:=
data = {
   {1.2, "This"},
   {6.1, <|"a" -> 3, "b" -> 5|>},
   {4.5, 10 + 2 Pi},
   {3.1, Plot[Sin[x], {x, 0, Pi}, ImageSize -> 125]},
   {2.4, 1.6, "test"},
   {7.1},
   {6.2, {7.3, "from"}},
   {3.14, Expand[(x + y)^3]}
   };
Grid[ResourceFunction["BinListsBy"][data, {First, 0, 9}], Frame -> All]
Out[15]=

Applications (5) 

Bin some data based on the number of unique letters in each string:

In[16]:=
data = {"This", "is", "just", "some", "test", "data", "from", "Sander", "ok?"};
ResourceFunction[
 "BinListsBy"][data, {Length@*DeleteDuplicates@*Characters@*
   ToLowerCase, 1, 10}]
Out[17]=

Bin complex numbers based on their imaginary and real parts:

In[18]:=
SeedRandom[1234];
data = RandomComplex[{0, 4 + 4 I}, 30];
out = ResourceFunction["BinListsBy"][data, {Im, 0, 4}, {Re, 0, 4}];
Grid[Map[Column, out, {2}], Frame -> All]
Out[21]=

Bin some strings based on their first letter:

In[22]:=
data = {"about", "votive", "pink-slipped", "ungodly", "perceptual", "zenzizenzizenzic", "superconducting", "snared", "comate"};
out = ResourceFunction["BinListsBy"][
   data, {First@*LetterNumber@*ToLowerCase, 1, 27}];
TableForm[out, TableHeadings -> {CharacterRange["a", "z"]}]
Out[24]=

Bin some person entities based on their birth century:

In[25]:=
data = {Entity["Person", "AlbertEinstein::6tb7g"], Entity["Person", "JaneGoodall::4k9zt"], Entity["Person", "Claude-LouisNavier::2g48v"], Entity["Person", "MarieCurie::v9f84"], Entity["Person", "IsaacNewton::bhz5x"], Entity["Person", "GraceBrewsterMurrayHopper::g9sk6"], Entity["Person", "LeonardoDaVinci::47w36"], Entity["Person", "HendrikLorentz::765z3"]};
out = ResourceFunction["BinListsBy"][
   data, {DateValue[#["BirthDate"], "Year"] &, 1401, 2001, 100}];
TableForm[Row /@ out, TableHeadings -> {ToString[#] <> "th" & /@ Range[15, 20]}]
Out[27]=

Bin some associations based on age:

In[28]:=
data = {
   <|"Name" -> "Albert", "Age" -> 76, "Occupation" -> "Physicist"|>,
   <|"Name" -> "Bernard", "Age" -> 64, "Occupation" -> "Physicist"|>,
   <|"Name" -> "Claude", "Age" -> 51, "Occupation" -> "Mathematical Physicist"|>,
   <|"Name" -> "Marie", "Age" -> 66, "Occupation" -> "Physicist, Chemist"|>,
   <|"Name" -> "Grace", "Age" -> 85, "Occupation" -> "Computer scientist"|>};
Grid[ResourceFunction["BinListsBy"][data, {#["Age"] &, 0, 100, 10}], Frame -> All]
Out[29]=

Properties and Relations (2) 

BinLists can sometimes perform the same task as BinListsBy if the "other" dimensions are given a single large bin:

In[30]:=
data = {{29, 48}, {76, 70}, {37, 78}, {60, 63}, {83, 0}, {42, 44}, {26, 77}, {59, 91}, {93, 46}, {38, 24}, {30, 97}, {76, 60}, {98, 50}, {35, 2}, {22, 17}, {90, 90}, {90, 67}, {34, 22}, {97, 26}, {78, 85}, {70, 55}, {59, 92}, {15, 66}, {30, 84}, {45, 48}, {91, 13}, {69, 94}, {10, 100}, {97, 40}, {91, 87}, {74, 24}, {66, 91}, {9, 93}, {67, 2}, {44, 22}, {54, 96}, {67, 13}, {23, 12}, {88, 2}, {35, 82}, {76, 64}, {84, 32}, {62, 5}, {56, 84}, {88, 68}, {25, 99}, {41, 13}, {23, 4}, {40, 14}, {89, 90}};
ResourceFunction["BinListsBy"][data, {First, 0, 100, 10}] === BinLists[data, {0, 100, 10}, {-1000, 1000, 2000}][[All, 1]]
Out[31]=

GatherBy gives output similar to that of BinListsBy, but BinListsBy always returns output with the same dimensions, keeps the bins in sorted order, and includes empty lists where necessary:

In[32]:=
data = {"This", "is", "just", "some", "test", "data", "from", "Sander", "ok?"};
In[33]:=
out1 = ResourceFunction["BinListsBy"][data, {StringLength, 1, 10}]
Out[33]=
In[34]:=
out2 = GatherBy[data, StringLength]
Out[34]=

Check explicitly that both return the same groups once the empty bins are removed:

In[35]:=
Sort[DeleteCases[out1, {}]] === Sort[out2]
Out[35]=
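
The relation to grouping functions can be made concrete with a rough one-dimensional emulation built from GroupBy and Lookup. This is only a sketch and not the repository implementation: binListsBySketch is a hypothetical name, it handles a single binning specification, and it assumes half-open bins and a range that is a whole number of bin widths:

```wolfram
(* binListsBySketch is a hypothetical name, not the repository implementation.
   It assumes that xmax - xmin is a whole number of bin widths. *)
binListsBySketch[data_List, {f_, xmin_, xmax_, dx_ : 1}] :=
 Module[{n, index},
  n = Round[(xmax - xmin)/dx];
  (* Bin index of a value, or 0 for values that must be dropped *)
  index = Function[v,
    If[TrueQ[NumericQ[v] && Im[v] == 0 && xmin <= v < xmax],
     Floor[(v - xmin)/dx] + 1, 0]];
  (* GroupBy keeps the original order within each group;
     Lookup supplies {} for bins that received no elements *)
  Lookup[GroupBy[data, index[f[#]] &], Range[n], {}]]

words = {"This", "is", "just", "some", "test", "data", "from", "Sander", "ok?"};
binListsBySketch[words, {StringLength, 1, 10}]
(* {{}, {"is"}, {"ok?"}, {"This", "just", "some", "test", "data", "from"},
    {}, {"Sander"}, {}, {}, {}} *)
```

Looking up every bin index in order is what provides the fixed dimensions, the sorted bins, and the empty lists that GatherBy alone does not give.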

Possible Issues (2) 

If the function does not return a number for an element, that element is not binned:

In[36]:=
data = {1, 4, -2, 3, 2, 1};
ResourceFunction["BinListsBy"][data, {g, 1, 10}]
Out[37]=

Values outside the binning range are discarded:

In[38]:=
data = {1, 4, 2, 5, 2, 6, 2, 8};
ResourceFunction["BinListsBy"][data, {Identity, 1, 3}]
Out[39]=
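
Because discarded elements leave no trace in the output, it can help to list them separately before binning. The following sketch uses only built-in functions; the names lo, hi and dropped are illustrative, and the upper edge is assumed to be excluded from the range, as in the built-in BinLists:

```wolfram
data = {1, 4, 2, 5, 2, 6, 2, 8};
{lo, hi} = {1, 3};

(* Elements whose value falls outside the half-open range [lo, hi) *)
dropped = Select[data, ! (lo <= # < hi) &]
(* {4, 5, 6, 8} *)
```

When binning by a function f other than Identity, the test would be applied to f[#] instead of #.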

Neat Examples (2) 

Bin some datasets based on the slope of a linear fit to each one:

In[40]:=
SeedRandom[1];
data = Table[t = RandomReal[{2, 8}]; Table[t x + RandomReal[{-1, 1}], {x, 0, 10}], 6];
out = ResourceFunction["BinListsBy"][
  data, {LinearModelFit[#, x, x]["BestFitParameters"][[2]] &, 1, 10, 1}]
Out[42]=

Plot the datasets with similar slopes together:

In[43]:=
ListPlot /@ Select[out, Length[#] > 0 &]
Out[43]=

Publisher

SHuisman

Requirements

Wolfram Language 11.3 (March 2018) or above

Version History

  • 1.0.0 – 01 April 2019
