Function Repository Resource:

SelectByColumnValues


Select rows from Tabular data based on explicit column values

Contributed by: Sjoerd Smit

ResourceFunction["SelectByColumnValues"][tab,col→val]

selects all rows in Tabular data tab where column col has value val.

ResourceFunction["SelectByColumnValues"][tab,col→{val₁,…}]

selects rows where col has any of the given values valᵢ.

ResourceFunction["SelectByColumnValues"][spec]

is an operator form of ResourceFunction["SelectByColumnValues"] that can be applied to Tabular data.
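As a minimal sketch of the three call forms above, the following uses a small made-up table; its values are illustrative only. It assumes a Wolfram Language version with Tabular support (Tabular was introduced in Version 14.2) and network access so that ResourceFunction can retrieve the function.

```wolfram
(* Made-up example data; column names and values are illustrative only *)
tab = ToTabular[<|"col1" -> {1, 2, 3, 4}, "col2" -> {"x", "y", "x", "z"}|>, "Columns"];
sbcv = ResourceFunction["SelectByColumnValues"];

(* Single value: rows 1 and 3, where "col2" is "x" *)
sbcv[tab, "col2" -> "x"]

(* Several values: rows where "col1" is 2 or 4; 5 does not occur and is ignored *)
sbcv[tab, "col1" -> {2, 4, 5}]

(* Operator form: build the selector once, then apply it to the table *)
selectX = sbcv["col2" -> "x"];
selectX[tab]
```

Each call returns a Tabular object containing only the matching rows.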

Details

ResourceFunction["SelectByColumnValues"] only works for Tabular data with explicit column keys, as returned by ColumnKeys.

ResourceFunction["SelectByColumnValues"] is intended for fast, convenient lookup of rows by specific (potentially many) column values, such as retrieving the records with particular IDs from a database.

ResourceFunction["SelectByColumnValues"] is more efficient than using Select for the same operations on large Tabular datasets.
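To make the comparison with Select concrete, here is a sketch of the Select-based formulation that SelectByColumnValues is meant to replace, on a hypothetical table with "id" and "name" columns. The second variant, which builds an Association once so that each row needs only a hashed key lookup instead of a linear MemberQ scan, is only an illustration of the general idea; it is not taken from the SelectByColumnValues source and should not be read as its implementation.

```wolfram
(* Hypothetical table of records keyed by an "id" column *)
tab = ToTabular[<|"id" -> Range[10], "name" -> CharacterRange["a", "j"]|>, "Columns"];
ids = {2, 5, 11};  (* 11 does not occur in the table *)

(* Plain Select: MemberQ scans the whole ids list for every row *)
viaMemberQ = Select[tab, MemberQ[ids, #id] &];

(* Illustrative variant: build a hash-based Association once, then test keys *)
idSet = AssociationMap[True &, ids];
viaLookup = Select[tab, KeyExistsQ[idSet, #id] &];

(* Documented alternative on this page:
   ResourceFunction["SelectByColumnValues"][tab, "id" -> ids] *)
```

Both Select variants still evaluate a function on every row, so this sketch only shows the equivalent selection logic; the speed advantage on large Tabular data is what this page attributes to SelectByColumnValues.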

Examples

Basic Examples (3)

Create tabular data:

In[1]:=

Out[1]=

Select the rows where "col2" is x:

In[2]:=

Out[2]=

Select rows where "col1" is any of the given values. Values that do not occur in the column are ignored:

In[3]:=

Out[3]=

Scope (5)

Define an operator:

In[4]:=

Out[4]=

Define a dataset:

In[5]:=

Out[5]=

Use the operator on the dataset:

In[6]:=

Out[6]=

Select based on a large number of possible values:

In[7]:=

Out[7]=

An empty Tabular is returned if no matching rows are found:

In[8]:=

Out[8]=

Properties and Relations (5)

SelectByColumnValues is designed to be fast for large datasets:

In[9]:=

n = 10^6;
SeedRandom[1234];
(* one million rows: unique integer IDs, random letters and random reals *)
tab = ToTabular[<|"a" -> Range[n], "b" -> RandomChoice[Alphabet[], n],
   "c" -> RandomReal[1, n]|>, "Columns"]

Out[10]=

Selecting rows by values in the "a" column is much faster with SelectByColumnValues than with Select and MemberQ, because MemberQ does not compile internally:

In[11]:=