Function Repository Resource:

TableSet

Source Notebook

Create a lightweight, named column store of tabular data that is compatible with Query, Select and other SQL-like functions

Contributed by: Edmund B Robinson

ResourceFunction["TableSet"][names, data]

creates a data structure with columns named names and tabular data data.

Details

Tabular data is the most common data structure seen in data analysis. This form of data does not require much of the functionality needed to manage hierarchical data that is in Dataset. ResourceFunction["TableSet"] fills the gap between accessing a lightweight List with indices and a memory-hungry Dataset with named columns.
ResourceFunction["TableSet"] takes a list of strings for the column names.
ResourceFunction["TableSet"] takes any expression that satisfies MatrixQ for data.
The length of names must equal the number of columns in data.
ResourceFunction["TableSet"][]["prop"] can take the following string properties as "prop":
"ColumnNames"list of the column names
"Data"list of the data
"RowCount"number of rows
ResourceFunction["TableSet"] is compatible with the following functions:
Deletedelete a row or multiple rows
DeleteDuplicatesdelete duplicate rows
Insertadd a row at a specified position
JoinAcrossjoin two ResourceFunction["TableSet"] objects by "Left", "Inner" and "Right" specifications
Partreturn parts of the rows and columns
Queryperform a query using the column names as named slots
Selectpick out rows according to a criterion
TableFormdisplay a ResourceFunction["TableSet"] arranged in an array of rectangular cells
Unionreturn a sorted list of all the distinct elements that appear in a ResourceFunction["TableSet"] object
For JoinAcross, "Outer" join and the KeyCollisionFunction option are not implemented.
For Query, use named slots to reference columns.
Since ResourceFunction["TableSet"] is optimised for tabular data, it has a smaller memory footprint than Dataset, which is optimized for hierarchical data. When working with large tabular data, a significant decrease in memory usage can be gained by using ResourceFunction["TableSet"] as compared to Dataset.

Examples

Basic Examples (5) 

Create a TableSet:

In[1]:=
table = ResourceFunction["TableSet"][{"A", "B", "E"}, Partition[Range@12, 3, 1]]
Out[1]=

View the number of rows:

In[2]:=
table["RowCount"]
Out[2]=

Insert a row:

In[3]:=
table = Insert[table, Range@3, 4]
Out[3]=

View the updated number of rows:

In[4]:=
table["RowCount"]
Out[4]=

View the TableSet in TableForm:

In[5]:=
TableForm[table]
Out[5]=

Scope (10) 

TableSet has a summary form:

In[6]:=
table = ResourceFunction["TableSet"][{"A", "B", "E"}, Partition[Range@12, 3, 1]]
Out[6]=

Return the column names:

In[7]:=
table["ColumnNames"]
Out[7]=

Return the row count:

In[8]:=
table["RowCount"]
Out[8]=

Return the data in the TableSet:

In[9]:=
table["Data"]
Out[9]=

Delete a row or rows from a TableSet:

In[10]:=
deleteTable = ResourceFunction["TableSet"][{"A", "B", "E"}, Partition[Range@15, 3, 1]]
Out[10]=

TableSet rows before deleting a row:

In[11]:=
deleteTable["RowCount"]
Out[11]=

Delete row 3:

In[12]:=
deleteTable = Delete[deleteTable, 3]
Out[12]=

Row count after deleting row 3:

In[13]:=
deleteTable["RowCount"]
Out[13]=

Delete row 7 and the last row:

In[14]:=
deleteTable = Delete[deleteTable, {{7}, {-1}}]
Out[14]=

Row count after deleting row 7 and the last row:

In[15]:=
deleteTable["RowCount"]
Out[15]=

Delete duplicate rows:

In[16]:=
deleteTable = ResourceFunction["TableSet"][{"A", "B"}, Partition[Flatten[Range /@ {4, 6}], 2]]
Out[16]=

The first two data rows are identical to the next two rows:

In[17]:=
TableForm[deleteTable]
Out[17]=

Delete duplicate rows:

In[18]:=
deleteTable = DeleteDuplicates[deleteTable];
TableForm[deleteTable]
Out[13]=

Add a row at a specified position:

In[19]:=
insertTable = ResourceFunction["TableSet"][{"A", "B"}, Partition[Flatten[Range@6], 2]]
Out[19]=

Insert a row at position 3:

In[20]:=
insertTable = Insert[insertTable, Range@2, 3];
TableForm[insertTable]
Out[18]=

Insert rows into position 3 and the last position:

In[21]:=
insertTable = Insert[insertTable, Range@2, {{3}, {-1}}];
TableForm[insertTable]
Out[14]=

For JoinAcross, "Outer" and KeyCollisionFunction are not implemented:

In[22]:=
table1 = ResourceFunction["TableSet"][{"A", "B", "E"}, Partition[Range@10, 3, 1]];
table2 = ResourceFunction["TableSet"][{"B", "C", "D"}, Partition[Range[6, 15], 3, 1]];
Grid[{TableForm /@ {table1, table2}}
 , Spacings -> {3, Automatic}
 ]
Out[22]=

Evaluate "Left", "Inner" and "Right" joins:

In[23]:=
Function[{jspec},
   Labeled[
    TableForm@JoinAcross[table1, table2, "B", jspec],
    jspec, Top
    ]] /@ {"Left", "Inner", "Right"} //
 Grid[{#}, Alignment -> {Center, Top}, Spacings -> {3, Automatic}] &
Out[23]=

Return parts of the rows and columns:

In[24]:=
partTable = ResourceFunction["TableSet"][{"A", "B", "E"}, Partition[Range@10, 3, 1]]
Out[24]=

Return the first two rows:

In[25]:=
partTable[[;; 2]]
Out[25]=

Return every other row:

In[26]:=
partTable[[;; ;; 2]]
Out[26]=

Return the last column:

In[27]:=
partTable[[All, -1]]
Out[27]=

Use named slots to reference TableSet columns in Query:

In[28]:=
queryTable = ResourceFunction["TableSet"][{"A", "B"}, Partition[RandomInteger[{1, 100}, 10^6], 2]]
Out[28]=

Query a Histogram of the sum of the columns:

In[29]:=
Query[Histogram, #A + #B &]@queryTable
Out[29]=

Add columns based on calculations on existing columns:

In[30]:=
queryTable = Query[All, <|#, "Total" -> #A + #B, "Difference" -> Differences@{#A, #B}|> &]@queryTable;
queryTable["ColumnNames"]
Out[27]=

Pick out rows according to a criterion:

In[31]:=
selectTable = ResourceFunction["TableSet"][{"A", "B"}, Partition[RandomInteger[{1, 100}, 10^6], 2]]
Out[31]=

Select rows with column A > 4 and B < 12:

In[32]:=
resultTable = Select[selectTable, #A > 4 && #B < 12 &]
Out[32]=

Compare the row count of the original and result table:

In[33]:=
selectTable["RowCount"]
Out[33]=
In[34]:=
resultTable["RowCount"]
Out[34]=

Display the TableSet arranged in an array of rectangular cells:

In[35]:=
displayTable = ResourceFunction["TableSet"][{"A", "B"}, Partition[Flatten[Range@6], 2]]
Out[35]=
In[36]:=
TableForm[displayTable]
Out[36]=

Create TableSets with example data:

In[37]:=
table1 = ResourceFunction["TableSet"][{"A", "B", "E"}, Partition[Range@10, 3, 1]];
table2 = ResourceFunction["TableSet"][{"B", "C", "D"}, Partition[Range[6, 15], 3, 1]];

View TableSetobjects in TableForm:

In[38]:=
Grid[{TableForm /@ {table1, table2}}
 , Spacings -> {3, Automatic}
 ]
Out[38]=

Union gives the union of the rows maintaining the column names from the first table:

In[39]:=
TableForm@Union[table1, table2]
Out[39]=

Properties and Relations (4) 

When working with large tabular data, a significant decrease in memory usage can be gained by using TableSet as compared to Dataset. For example, with a two-column table of integer values, the ByteCount difference quickly widens with the record count as the graph tooltips and the following table demonstrate:

In[40]:=
aBytes = {};
dBytes = {};
tsBytes = {};
obs = Power[2, Range@24];

Create the data:

In[41]:=
Module[
  {data, a}
  , Scan[
   Function[n
    , data = Partition[RandomReal[{1, 100}, n], 2]
    ; a = AssociationThread[{"A", "B"}, #] & /@ data
    ; AppendTo[aBytes, ByteCount[a]]
    ; AppendTo[dBytes, ByteCount[Dataset@a]]
    ; AppendTo[tsBytes, ByteCount[ResourceFunction["TableSet"][{"A", "B"}, data]]];
    ]
   , obs
   ]
  ];

Plot it:

In[42]:=
ListPlot[
 Tooltip[{##}, Grid@{{"Obs", ":", QuantityMagnitude@#1}, {"MB", ":", N@#2}}] & @@@ Transpose@{Quantity[obs, "Unities"], #} & /@
  UnitConvert[ Quantity[{aBytes, dBytes, tsBytes}, "Bytes"], "Megabytes"]
 , TargetUnits -> {"Mega", Automatic}
 , PlotStyle -> {Directive[Thickness[.02], Opacity[.4]], Directive[Red, Thickness[.005]], Automatic}
 , PlotLabel -> "ByteCount Comparison"
 , Joined -> True
 , FrameLabel -> Automatic
 , PlotLegends -> {Association, Dataset, ResourceFunction[
   "TableSet"]}
 , Mesh -> All
 , MeshStyle -> Purple
 , PlotRange -> Full
 , PlotTheme -> "Detailed"
 , ImageSize -> Large
 ]
Out[42]=

See the data in a table:

In[43]:=
TableForm[
 MapAt[UnitConvert[N@Quantity[#, "Bytes"], "Megabytes"] &, {2 ;;, All}]@{obs, aBytes, dBytes, tsBytes}[[All, -11 ;; -6]]
 , TableHeadings -> {{"Records", Association, Dataset, ResourceFunction["TableSet"]}, None}
 , TableAlignments -> {Decimal, Automatic}
 ]
Out[43]=

Possible Issues (2) 

A Query to add columns returns a TableSet:

In[44]:=
queryTable = ResourceFunction["TableSet"][{"A", "B", "E"}, Partition[RandomInteger[10, 15], 3, 1]]
Out[44]=
In[45]:=
Query[All, <|#, "ABTotal" -> #A + #B|> &]@queryTable
Out[45]=

However, a general query does not return a TableSet:

In[46]:=
Query[GroupBy[#A - #B &]]@queryTable
Out[46]=

Publisher

Edmund B Robinson

Version History

  • 1.0.0 – 10 August 2021

Related Resources

License Information