Wolfram Function Repository
Instant-use add-on functions for the Wolfram Language
Function Repository Resource: CrossTabulate
Compute the contingency table for a two- or three-column dataset or array
ResourceFunction["CrossTabulate"][data] finds the contingency table for the Dataset or array data.
Here is an array of random integer-word pairs:
| In[1]:= | ![]() |
| Out[3]= |
Compute the contingency table:
| In[4]:= |
| Out[4]= | ![]() |
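The original input cells were not preserved, so the following is only a minimal sketch of the basic call. The sample data (`pairs`, the seed, and the word list) is hypothetical and not taken from the original page.

```wolfram
(* Hypothetical sample data: 20 pairs of a random integer and a random word *)
SeedRandom[12];
pairs = Transpose[{
    RandomInteger[{1, 3}, 20],
    RandomChoice[{"apple", "pear", "plum"}, 20]}];

(* Count how often each integer co-occurs with each word;
   the result is a Dataset by default *)
ResourceFunction["CrossTabulate"][pairs]
```

The rows of the result correspond to the distinct values of the first column and the columns to the distinct values of the second.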
Here is a Dataset whose first two columns are categorical and whose third column is numeric:
| In[5]:= | ![]() |
| Out[5]= | ![]() |
Compute the contingency table:
| In[6]:= |
| Out[6]= | ![]() |
For large contingency tables, sparse arrays are faster and more convenient than a Dataset. Request them with the option setting "Sparse"→True:
| In[7]:= | ![]() |
| Out[7]= | ![]() |
| In[8]:= |
| Out[8]= |
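As a rough illustration of the sparse form, the sketch below uses hypothetical random data (`bigPairs` is an illustrative name) and inspects the result with `Keys` rather than assuming particular key names.

```wolfram
(* Hypothetical data: 10000 pairs drawn from 200 x 300 possible values *)
SeedRandom[7];
bigPairs = Transpose[{
    RandomInteger[{1, 200}, 10000],
    RandomInteger[{1, 300}, 10000]}];

(* Ask for the sparse representation instead of a Dataset *)
res = ResourceFunction["CrossTabulate"][bigPairs, "Sparse" -> True];

(* List the elements of the returned Association *)
Keys[res]
```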
Here is a full array with three columns:
| In[9]:= | ![]() |
| Out[9]= | ![]() |
Compute the contingency table of the co-occurrences of each letter with each word by cross tabulating over the first two columns only:
| In[10]:= |
| Out[10]= | ![]() |
Here the cross tabulation uses the third column; for each unique letter-word pair, the corresponding values of the third column are summed:
| In[11]:= |
| Out[11]= | ![]() |
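The difference between the two calls can be sketched as follows. The array `triples` is hypothetical sample data, not the array used on the original page.

```wolfram
(* Hypothetical three-column array: letter, word, numeric value *)
triples = {
   {"a", "cat", 2}, {"a", "dog", 5}, {"b", "cat", 1},
   {"a", "cat", 4}, {"b", "dog", 3}};

(* Two columns only: each cell counts occurrences of a letter-word pair *)
ResourceFunction["CrossTabulate"][triples[[All, {1, 2}]]]

(* All three columns: each cell sums the third-column values of a pair *)
ResourceFunction["CrossTabulate"][triples]
```

With this data the pair {"a", "cat"} occurs twice, so the first table should show a count of 2 for it and the second a sum of 6.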
If any of the columns have missing values, they are shown in the contingency table:
| In[12]:= | ![]() |
| Out[12]= | ![]() |
| In[13]:= |
| Out[13]= | ![]() |
The result of CrossTabulate is a Dataset by default. With the option setting "Sparse"→True the result is an Association with three elements: a sparse matrix with the contingency values, row names, and column names.
Here is an example:
| In[14]:= | ![]() |
| In[15]:= |
| Out[15]= | ![]() |
Using MatrixForm we can visualize the result:
| In[16]:= |
| Out[16]= | ![]() |
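One possible way to produce such a display is sketched below. It assumes the three elements of the Association come in the order described above (sparse matrix, row names, column names); the data is hypothetical.

```wolfram
(* Hypothetical two-column data *)
pairs = {{"a", "x"}, {"a", "y"}, {"b", "x"}, {"a", "x"}};

res = ResourceFunction["CrossTabulate"][pairs, "Sparse" -> True];

(* Apply passes the three values of the Association as #1, #2, #3;
   this relies on the order: matrix, row names, column names *)
MatrixForm[#1, TableHeadings -> {#2, #3}] & @@ res
```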
Take the Titanic dataset:
| In[17]:= |
Find how many males and females are in each passenger class:
| In[18]:= |
| Out[18]= | ![]() |
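A sketch of how this breakdown might be computed is given below. It assumes the dataset is the built-in `ExampleData[{"Dataset", "Titanic"}]` with the columns "class" and "sex"; the original page may have obtained the data differently.

```wolfram
(* Assumption: the built-in Titanic example Dataset *)
titanic = ExampleData[{"Dataset", "Titanic"}];

(* Keep only the two categorical columns of interest and cross tabulate *)
ResourceFunction["CrossTabulate"][titanic[All, {"class", "sex"}]]
```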
Find how many males and females survived:
| In[19]:= |
| Out[19]= | ![]() |
Find the summed ages for each class-sex combination:
| In[20]:= |
| Out[20]= | ![]() |
Here is a function to plot sparse contingency tables:
| In[21]:= | ![]() |
| In[22]:= |
| Out[22]= | ![]() |
Start with movie review data:
| In[23]:= | ![]() |
| Out[22]= |
For each movie review we make a list of word-sentiment pairs and then join them into one big list:
| In[24]:= | ![]() |
| Out[20]= |
Here is a sample:
| In[25]:= |
| Out[25]= |
Here we find the word-sentiment contingency table as a sparse matrix in order to plot it below:
| In[26]:= |
Here is a function to plot sparse contingency tables:
| In[27]:= | ![]() |
Plot the contingency table:
| In[28]:= |
| Out[28]= | ![]() |
Find the contingency table Dataset:
| In[29]:= |
Show the most prominent words for negative reviews:
| In[30]:= |
| Out[30]= | ![]() |
The functionality of CrossTabulate can be emulated with Tally or GroupBy.
Here is a contingency matrix of a two-column array:
| In[31]:= | ![]() |
| Out[31]= | ![]() |
Obtain the contingency value triplets using Tally:
| In[32]:= |
| Out[32]= | ![]() |
Obtain the contingency values as rules using GroupBy:
| In[33]:= |
| Out[33]= | ![]() |
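The two emulations above can be sketched with built-in functions only. The list `pairs` is hypothetical sample data.

```wolfram
(* Hypothetical two-column data *)
pairs = {{"a", "x"}, {"a", "y"}, {"b", "x"}, {"a", "x"}};

(* Tally gives {pair, count}; Flatten turns each into a {row, column, count} triplet *)
Flatten /@ Tally[pairs]
(* {{"a", "x", 2}, {"a", "y", 1}, {"b", "x", 1}} *)

(* GroupBy on the pair itself, reducing each group to its size *)
GroupBy[pairs, Identity, Length]
(* <|{"a", "x"} -> 2, {"a", "y"} -> 1, {"b", "x"} -> 1|> *)
```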
GroupBy generalizes better than Tally; for example, it can produce the contingency values for three-column data:
| In[34]:= |
| Out[34]= | ![]() |
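For three-column data, a minimal GroupBy sketch groups on the first two columns and totals the third. The array `triples` is hypothetical.

```wolfram
(* Hypothetical three-column data: two categorical columns and a numeric one *)
triples = {{"a", "x", 1}, {"a", "y", 2}, {"a", "x", 3}};

(* Most extracts the {row, column} key, Last the value, Total sums each group *)
GroupBy[triples, Most -> Last, Total]
(* <|{"a", "x"} -> 4, {"a", "y"} -> 2|> *)
```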
Find the corresponding result of CrossTabulate:
| In[35]:= |
| Out[35]= | ![]() |
Convert the Association obtained with the option setting "Sparse"→True into a Dataset:
| In[36]:= | ![]() |
| Out[36]= | ![]() |
| In[37]:= |
| Out[37]= | ![]() |
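One way such a conversion could look is sketched below. It assumes the Association's values come in the order sparse matrix, row names, column names, as described earlier; the variable names are illustrative.

```wolfram
(* Hypothetical two-column data *)
pairs = {{"a", "x"}, {"a", "y"}, {"b", "x"}, {"a", "x"}};

res = ResourceFunction["CrossTabulate"][pairs, "Sparse" -> True];

(* Assumption: values are ordered as matrix, row names, column names *)
{mat, rowNames, colNames} = Values[res];

(* Build an association of row associations and wrap it in a Dataset *)
Dataset[
 AssociationThread[rowNames,
  AssociationThread[colNames, #] & /@ Normal[mat]]]
```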
If the second variable is numerical or has missing values, the resulting Dataset does not have a tabular form:
| In[38]:= | ![]() |
| Out[38]= | ![]() |
One way to get a tabular form is to replace Missing[___] with a string:
| In[39]:= |
| Out[39]= | ![]() |
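The replacement step might look like the following sketch. The data and the placeholder string "NA" are hypothetical choices.

```wolfram
(* Hypothetical data with missing values in the second column *)
pairs = {{"a", "x"}, {"b", Missing[]}, {"a", Missing[]}, {"b", "x"}};

(* Replace every Missing[...] expression with a plain string *)
cleaned = pairs /. Missing[___] -> "NA";

ResourceFunction["CrossTabulate"][cleaned]
```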
Find the co-occurrences of integers in the range [1, 3] in a list of random integer pairs:
| In[40]:= |
| Out[40]= | ![]() |
Again, replacing the integer values with strings produces a tabular form:
| In[41]:= |
| Out[41]= | ![]() |
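A minimal sketch of the string conversion, using hypothetical random pairs:

```wolfram
(* Hypothetical data: 30 random pairs of integers between 1 and 3 *)
SeedRandom[3];
intPairs = RandomInteger[{1, 3}, {30, 2}];

(* Convert every entry (level 2 of the array) to a string before cross tabulating *)
ResourceFunction["CrossTabulate"][Map[ToString, intPairs, {2}]]
```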
Here is a grid of contingency tables showing various breakdown perspectives of the Titanic data:
| In[42]:= | ![]() |
| Out[38]= | ![]() |
This work is licensed under a Creative Commons Attribution 4.0 International License