Function Repository Resource:

BinListsBy


Bin data into lists based on applying a function to each item

Contributed by: Sander Huisman

ResourceFunction["BinListsBy"][{x1,x2,…},{f,xmin,xmax}]

gives lists of the elements xi for which the values of f[xi] lie in bins from xmin to xmax with unit width.

ResourceFunction["BinListsBy"][{x1,x2,…},{f,xmin,xmax,Δx}]

gives lists of the elements xi for which the values of f[xi] lie in bins from xmin to xmax with width Δx.

ResourceFunction["BinListsBy"][{x1,x2,…},binspec1,binspec2,…]

gives a multidimensional array of lists of the elements xj, where the index of an element along dimension i is determined by applying the function fi of the binning specification binspeci to that element.

ResourceFunction["BinListsBy"] drops those elements for which the function evaluates to values that are outside the binning specification.
ResourceFunction["BinListsBy"] drops those elements for which the function evaluates to complex numbers or to non-numerical data.
Within each bin, elements appear in the same order as in the original data.
ResourceFunction["BinListsBy"] places elements in bin i when their function values satisfy xmin + (i − 1) Δx ≤ f[x] < xmin + i Δx.
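
The rules above can be illustrated with a minimal one-dimensional sketch. It is not the repository implementation: binListsBySketch and binIndex are hypothetical names, and the sketch assumes half-open bins that include their lower edge (as in BinLists) and a range xmax − xmin that is a whole multiple of Δx.

```wolfram
(* Hypothetical helper, not the repository implementation *)
binListsBySketch[data_List, {f_, xmin_, xmax_, dx_ : 1}] :=
 Module[{n, binIndex},
  (* Number of bins; assumes xmax - xmin is a whole multiple of dx *)
  n = Ceiling[(xmax - xmin)/dx];
  (* Bin index of a function value, or None when the value is not
     a real number inside the half-open range from xmin to xmax *)
  binIndex[v_] := If[MatchQ[N[v], _Real] && xmin <= v < xmax,
    Floor[(v - xmin)/dx] + 1,
    None];
  (* Select keeps the original order of the elements within each bin *)
  Table[Select[data, binIndex[f[#]] === i &], {i, n}]
 ]

(* Values outside the range 1 to 3 are dropped *)
binListsBySketch[{1, 4, 2, 5, 2, 6, 2, 8}, {Identity, 1, 3}]
(* {{1}, {2, 2, 2}} *)

(* g is undefined, so no function value is numeric and every bin stays empty *)
binListsBySketch[{1, 4, -2, 3}, {g, 1, 10}]
(* {{}, {}, {}, {}, {}, {}, {}, {}, {}} *)
```

The sketch evaluates f once per element per bin, which keeps the code short but would be wasteful for large data.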

Examples

Basic Examples (4) 

Bin some data based only on the first element of each list:

In[1]:=
data = {{5.1, 2}, {4.2, 5}, {2.2, 5}, {7.1, 6}, {2.9, 3}, {1.4, 1}, {5.3, 2}};
ResourceFunction["BinListsBy"][data, {First, 1, 10}] // Column
Out[2]=

Bin a list of data based on the string length of each item:

In[3]:=
data = {"This", "is", "just", "some", "test", "data", "from", "Sander", "ok?"};
ResourceFunction["BinListsBy"][data, {StringLength, 1, 10}] // Column
Out[4]=

Bin some data based on the total of each list:

In[5]:=
data = {{1, 2}, {4, 5}, {-2, 5}, {3, 6}, {2, 3}, {1, 1}};
ResourceFunction["BinListsBy"][data, {Total, 1, 10}] // Column
Out[6]=

Bin some data with some "auxiliary" data:

In[7]:=
data = {{7.2, <|"Name" -> "Sander", "Age" -> Missing["Disputed"]|>}, {5.1, <|"Name" -> "Tanya", "Age" -> 25|>}, {2.5, <|"Name" -> "Eric", "Age" -> 41|>}, {3.1, <|
     "Name" -> "Paul", "Age" -> 20|>}, {5.4, <|"Name" -> "Henry", "Age" -> 72|>}, {7.1, <|"Name" -> "Emma", "Age" -> 42|>}, {6.2, <|
     "Name" -> "Natalie", "Age" -> 31|>}};
ResourceFunction["BinListsBy"][data, {First, 1, 10}] // Column
Out[8]=

Perform a 2-dimensional binning based on the value of the first element, and the string length of the last element:

In[9]:=
data = {{1.2, 2.3, "This"}, {5.1, 1.2, "is"}, {4.5, 3.1, "just"}, {3.1, 1.87, "some"}, {2.4, 1.6, "test"}, {7.1, 1.4, "data"}, {6.2, 7.3, "from"}, {3.14, 9.1, "Sander"}, {-0.7, 0.8, "ok?"}};
ResourceFunction["BinListsBy"][
  data, {First, -2, 8}, {StringLength@*Last, 1, 7, 1}];
Grid[%, Frame -> All]
Out[11]=
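
To see how an element ends up in a particular cell of the grid, its index along each dimension can be worked out by hand. The following is only a sketch under the assumption of half-open bins; axisIndex is a hypothetical helper, and it performs no range check, whereas out-of-range elements are dropped by BinListsBy.

```wolfram
(* Hypothetical helper: index along one dimension for a specification
   {f, min, max} or {f, min, max, width}; the width defaults to 1 *)
axisIndex[x_, {f_, lo_, _, dx_ : 1}] := Floor[(f[x] - lo)/dx] + 1

specs = {{First, -2, 8}, {StringLength@*Last, 1, 7, 1}};

(* {row, column} of the first two elements of the data above *)
Table[axisIndex[item, spec],
 {item, {{1.2, 2.3, "This"}, {5.1, 1.2, "is"}}}, {spec, specs}]
(* {{4, 4}, {8, 2}} *)
```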

Perform a 3-dimensional binning based on the value of the first element, the value of the middle element, and the string length of the last element:

In[12]:=
data = {{1.2, 2.3, "This"}, {5.1, 1.2, "is"}, {4.5, 3.1, "just"}, {3.1, 1.87, "some"}, {2.4, 1.6, "test"}, {7.1, 1.4, "data"}, {6.2, 7.3, "from"}, {3.14, 9.1, "Sander"}, {-0.7, 0.8, "ok?"}};
ResourceFunction[
 "BinListsBy"][data, {First, -2, 8}, {#[[2]] &, 0, 10, 5}, {StringLength@*Last, 1, 7, 1}]
Out[13]=

The lists do not need to have the same length, structure, or type, which allows data with arbitrary auxiliary content to be binned:

In[14]:=
data = {
   {1.2, "This"},
   {6.1, <|"a" -> 3, "b" -> 5|>},
   {4.5, 10 + 2 Pi},
   {3.1, Plot[Sin[x], {x, 0, Pi}, ImageSize -> 125]},
   {2.4, 1.6, "test"},
   {7.1},
   {6.2, {7.3, "from"}},
   {3.14, Expand[(x + y)^3]}
   };
Grid[ResourceFunction["BinListsBy"][data, {First, 0, 9}], Frame -> All]
Out[15]=

Bin some data based on the number of unique letters in each string:

In[16]:=
data = {"This", "is", "just", "some", "test", "data", "from", "Sander", "ok?"};
ResourceFunction[
 "BinListsBy"][data, {Length@*DeleteDuplicates@*Characters@*
   ToLowerCase, 1, 10}]
Out[17]=

Bin complex numbers based on their imaginary and real parts:

In[18]:=
SeedRandom[1234];
data = RandomComplex[{0, 4 + 4 I}, 30];
out = ResourceFunction["BinListsBy"][data, {Im, 0, 4}, {Re, 0, 4}];
Grid[Map[Column, out, {2}], Frame -> All]
Out[21]=

Bin some strings based on their first letter:

In[22]:=
data = {"about", "votive", "pink-slipped", "ungodly", "perceptual", "zenzizenzizenzic", "superconducting", "snared", "comate"};
out = ResourceFunction["BinListsBy"][
   data, {First@*LetterNumber@*ToLowerCase, 1, 27}];
TableForm[out, TableHeadings -> {CharacterRange["a", "z"]}]
Out[24]=

Bin some person entities based on their birth century:

In[25]:=
data = {Entity["Person", "AlbertEinstein::6tb7g"], Entity["Person", "JaneGoodall::4k9zt"], Entity["Person", "Claude-LouisNavier::2g48v"], Entity["Person", "MarieCurie::v9f84"], Entity["Person", "IsaacNewton::bhz5x"], Entity["Person", "GraceBrewsterMurrayHopper::g9sk6"], Entity["Person", "LeonardoDaVinci::47w36"], Entity["Person", "HendrikLorentz::765z3"]};
out = ResourceFunction["BinListsBy"][
   data, {DateValue[#["BirthDate"], "Year"] &, 1401, 2001, 100}];
TableForm[Row /@ out, TableHeadings -> {ToString[#] <> "th" & /@ Range[15, 20]}]
Out[27]=
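
The choice of 1401 as the lower bound makes each bin line up with an ordinal century. A short check of that arithmetic, assuming half-open bins; century and binIndex are hypothetical helpers introduced only for this illustration:

```wolfram
(* Hypothetical helper: ordinal century of a year, counting 1401 to 1500 as the 15th *)
century[year_Integer] := Quotient[year - 1, 100] + 1

(* Hypothetical helper: bin index under the specification {f, 1401, 2001, 100} *)
binIndex[year_Integer] := Floor[(year - 1401)/100] + 1

{century[#], binIndex[#]} & /@ {1401, 1500, 1501, 1879, 2000}
(* {{15, 1}, {15, 1}, {16, 2}, {19, 5}, {20, 6}} *)
```

The century is always the bin index plus 14, which is why the six bins are labeled with Range[15, 20].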

Bin some associations based on age:

In[28]:=
data = {
   <|"Name" -> "Albert", "Age" -> 76, "Occupation" -> "Physicist"|>,
   <|"Name" -> "Bernard", "Age" -> 64, "Occupation" -> "Physicist"|>,
   <|"Name" -> "Claude", "Age" -> 51, "Occupation" -> "Mathematical Physicist"|>,
   <|"Name" -> "Marie", "Age" -> 66, "Occupation" -> "Physicist, Chemist"|>,
   <|"Name" -> "Grace", "Age" -> 85, "Occupation" -> "Computer scientist"|>};
Grid[ResourceFunction["BinListsBy"][data, {#["Age"] &, 0, 100, 10}], Frame -> All]
Out[29]=

BinLists can sometimes do the same job as BinListsBy if each of the "other" dimensions is given a single large bin:

In[30]:=
data = {{29, 48}, {76, 70}, {37, 78}, {60, 63}, {83, 0}, {42, 44}, {26, 77}, {59, 91}, {93, 46}, {38, 24}, {30, 97}, {76, 60}, {98, 50}, {35, 2}, {22, 17}, {90, 90}, {90, 67}, {34, 22}, {97, 26}, {78, 85}, {70, 55}, {59, 92}, {15, 66}, {30, 84}, {45, 48}, {91, 13}, {69, 94}, {10, 100}, {97, 40}, {91, 87}, {74, 24}, {66, 91}, {9, 93}, {67, 2}, {44, 22}, {54, 96}, {67, 13}, {23, 12}, {88, 2}, {35, 82}, {76, 64}, {84, 32}, {62, 5}, {56, 84}, {88, 68}, {25, 99}, {41, 13}, {23, 4}, {40, 14}, {89, 90}};
ResourceFunction["BinListsBy"][data, {First, 0, 100, 10}] === BinLists[data, {0, 100, 10}, {-1000, 1000, 2000}][[All, 1]]
Out[31]=

GatherBy gives output similar to that of BinListsBy, but BinListsBy always returns output with the same dimensions for a given binning specification, with the bins in sorted order and with empty lists where necessary:

In[32]:=
data = {"This", "is", "just", "some", "test", "data", "from", "Sander", "ok?"};
In[33]:=
out1 = ResourceFunction["BinListsBy"][data, {StringLength, 1, 10}]
Out[33]=
In[34]:=
out2 = GatherBy[data, StringLength]
Out[34]=

Explicitly check that both contain the same groups once the empty bins are removed:

In[35]:=
Sort[DeleteCases[out1, {}]] === Sort[out2]
Out[35]=
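
For an integer-valued function such as StringLength, a fixed-length result with empty bins can also be assembled from built-in functions. This is a sketch of an alternative route rather than a description of how BinListsBy works internally; it assumes unit-width bins starting at 1, so that the bin index equals the function value.

```wolfram
data = {"This", "is", "just", "some", "test", "data", "from", "Sander", "ok?"};

(* Group by string length; keys appear in order of first occurrence *)
groups = GroupBy[data, StringLength]
(* <|4 -> {"This", "just", "some", "test", "data", "from"},
     2 -> {"is"}, 6 -> {"Sander"}, 3 -> {"ok?"}|> *)

(* Look up lengths 1 to 9 in sorted order, using {} for missing keys *)
Lookup[groups, Range[1, 9], {}]
(* {{}, {"is"}, {"ok?"}, {"This", "just", "some", "test", "data", "from"},
    {}, {"Sander"}, {}, {}, {}} *)
```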

If the function does not return a number for an element, that element is not binned:

In[36]:=
data = {1, 4, -2, 3, 2, 1};
ResourceFunction["BinListsBy"][data, {g, 1, 10}]
Out[37]=

Values outside the binning range are discarded:

In[38]:=
data = {1, 4, 2, 5, 2, 6, 2, 8};
ResourceFunction["BinListsBy"][data, {Identity, 1, 3}]
Out[39]=

Bin some datasets based on their regressed slope:

In[40]:=
SeedRandom[1];
data = Table[t = RandomReal[{2, 8}]; Table[t x + RandomReal[{-1, 1}], {x, 0, 10}], 6];
out = ResourceFunction["BinListsBy"][
  data, {LinearModelFit[#, x, x]["BestFitParameters"][[2]] &, 1, 10, 1}]
Out[42]=

Plot the datasets with similar slopes together:

In[43]:=
ListPlot /@ Select[out, Length[#] > 0 &]
Out[43]=

Publisher

SHuisman

Requirements

Wolfram Language 11.3 (March 2018) or above

Version History

  • 1.0.0 – 01 April 2019
