Function Repository Resource:

SelectByColumnValues

Source Notebook

Select rows from Tabular data based on explicit column values

Contributed by: Sjoerd Smit

ResourceFunction["SelectByColumnValues"][tab,colval]

selects all rows in Tabular data tab where column col has value val.

ResourceFunction["SelectByColumnValues"][tab,col{val1,}]

selects rows where col has one of the given values vali.

ResourceFunction["SelectByColumnValues"][spec]

is an operator form of ResourceFunction["SelectByColumnValues"] that can be applied to expressions.

Details

ResourceFunction["SelectByColumnValues"] only works for Tabular data with explicit column headers as given by ColumnKeys.
ResourceFunction["SelectByColumnValues"] is intended for conveniently and quickly looking up rows by specific (potentially many) column values, such as when looking up particular IDs in a database.
ResourceFunction["SelectByColumnValues"] is more efficient that using Select for the same operations on large Tabular datasets.

Examples

Basic Examples (3) 

Create tabular data:

In[1]:=
tab = Tabular[{{1, x, Today}, {4, y, Tomorrow}, {5, x, Yesterday}}, {"col1", "col2", "col3"}]
Out[1]=

Select the rows where "col2" is x:

In[2]:=
ResourceFunction["SelectByColumnValues"][tab, "col2" -> x]
Out[2]=

Select rows where "col1" is any of the given values. Values that aren't present will be ignored:

In[3]:=
ResourceFunction["SelectByColumnValues"][tab, "col1" -> {1, 2, 4, 10}]
Out[3]=

Scope (5) 

Define an operator:

In[4]:=
selector = ResourceFunction["SelectByColumnValues"]["class" -> "3rd"]
Out[4]=

Define a dataset:

In[5]:=
data = Tabular[ExampleData[{"Dataset", "Titanic"}]]
Out[5]=

Use the operator on the dataset:

In[6]:=
selector@data
Out[6]=

Select based on a large number of possible values:

In[7]:=
data // ResourceFunction["SelectByColumnValues"][
  "age" -> Join[Range[10], Range[50, 100]]]
Out[7]=

An empty Tabular is returned if no matching rows were found:

In[8]:=
data // ResourceFunction["SelectByColumnValues"]["age" -> 1001]
Out[8]=

Properties and Relations (5) 

SelectByColumnValues is designed to be fast for large datasets:

In[9]:=
n = 10^6;
SeedRandom[1234];
tab = ToTabular[<|"a" -> Range[n], "b" -> RandomChoice[Alphabet[], n],
    "c" -> RandomReal[1, n]|>, "Columns"]
Out[10]=

Selecting based on values in the a column is much faster with SelectByColumnValues than using Select and MemberQ because MemberQ doesn't compile internally:

In[11]:=
ids = RandomSample[Range[n], 10];
ResourceFunction["SelectByColumnValues"][tab, "a" -> ids] // AbsoluteTiming
Out[12]=
In[13]:=
Select[tab, MemberQ[ids, #a] &] // AbsoluteTiming
Out[13]=

You can optimize Select by writing a query function that does compile:

In[14]:=
Select[tab, Function[Evaluate[Or @@ Thread[#a == ids]]]]; // AbsoluteTiming
Out[14]=
In[15]:=
Select[tab, MatchQ[#a, Alternatives @@ ids] &]; // AbsoluteTiming
Out[15]=

However, this doesn't scale well for larger sets of query values:

In[16]:=
ids = RandomSample[Range[n], 5000];
Select[tab, Function[Evaluate[Or @@ Thread[#a == ids]]]]; // AbsoluteTiming
Out[17]=
In[18]:=
Select[tab, MatchQ[#a, Alternatives @@ ids] &]; // AbsoluteTiming
Out[18]=

SelectByColumnValues is still fast:

In[19]:=
ResourceFunction["SelectByColumnValues"][tab, "a" -> ids]; // AbsoluteTiming
Out[19]=

Possible Issues (2) 

Tabular data without column headings cannot be used:

In[20]:=
tab = Tabular[{{1, "a"}, {2, "b"}, {3, "c"}}]
Out[20]=
In[21]:=
ResourceFunction["SelectByColumnValues"][tab, 2 -> "b"]
Out[21]=

Use RenameColumns to add column headings:

In[22]:=
ResourceFunction["SelectByColumnValues"][RenameColumns[tab, {1, 2}], 2 -> "b"]
Out[22]=

Publisher

Sjoerd Smit

Version History

  • 1.0.0 – 26 September 2025

Related Resources

License Information