Wolfram Function Repository
Instant-use add-on functions for the Wolfram Language
Function Repository Resource: CrossTabulate
Compute the contingency table for a two- or three-column dataset or array
ResourceFunction["CrossTabulate"][data] finds the contingency table for the Dataset or array data.
Here is an array of random integer-word pairs:
In[1]:= | ![]() |
Out[3]= | ![]() |
Compute the contingency table:
In[4]:= | ![]() |
Out[4]= | ![]() |
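A minimal sketch of these two steps. The seed, the word list, and the variable name `pairs` are illustrative assumptions, not the data used in the cells above.

```wolfram
(* Hypothetical data: 20 random integer-word pairs *)
SeedRandom[12];
pairs = Transpose[{
    RandomInteger[{1, 3}, 20],
    RandomChoice[{"apple", "pear", "plum"}, 20]}];

(* Contingency table of integers (rows) versus words (columns);
   the result is a Dataset by default *)
ResourceFunction["CrossTabulate"][pairs]
```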
Here is a Dataset whose first two columns are categorical and whose third column is numeric:
In[5]:= | ![]() |
Out[5]= | ![]() |
Compute the contingency table:
In[6]:= | ![]() |
Out[6]= | ![]() |
For large contingency tables, it is faster and more convenient to use sparse arrays instead of a Dataset. This is specified with the option "Sparse":
In[7]:= | ![]() |
Out[7]= | ![]() |
In[8]:= | ![]() |
Out[8]= | ![]() |
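As a rough sketch of the sparse form, assuming a hypothetical array `bigPairs` with many distinct row values (the sizes and seed are made up for illustration):

```wolfram
(* Hypothetical data: 10^4 pairs with up to 200 distinct row values *)
SeedRandom[7];
bigPairs = Transpose[{
    RandomInteger[{1, 200}, 10^4],
    RandomChoice[CharacterRange["a", "z"], 10^4]}];

(* Request the sparse-array form of the result instead of a Dataset *)
sparseRes = ResourceFunction["CrossTabulate"][bigPairs, "Sparse" -> True];

(* Inspect the names of the elements of the returned Association *)
Keys[sparseRes]
```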
Here is a full array with three columns:
In[9]:= | ![]() |
Out[9]= | ![]() |
Compute the contingency table of the co-occurrences of each letter with each word by cross tabulating over the first two columns only:
In[10]:= | ![]() |
Out[10]= | ![]() |
Here the cross tabulation also uses the third column, summing its values for each unique letter-word pair:
In[11]:= | ![]() |
Out[11]= | ![]() |
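The difference between the two calls can be sketched as follows. The array `triples` and its contents are hypothetical stand-ins for the three-column array shown above.

```wolfram
(* Hypothetical three-column data: letter, word, numeric value *)
SeedRandom[3];
triples = Transpose[{
    RandomChoice[{"a", "b", "c"}, 30],
    RandomChoice[{"red", "green"}, 30],
    RandomInteger[{1, 10}, 30]}];

(* Two columns only: each cell counts letter-word co-occurrences *)
ResourceFunction["CrossTabulate"][triples[[All, {1, 2}]]]

(* All three columns: each cell sums the third-column values
   of the rows that share that letter-word pair *)
ResourceFunction["CrossTabulate"][triples]
```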
If any of the columns have missing values, those values are shown in the contingency table:
In[12]:= | ![]() |
Out[12]= | ![]() |
In[13]:= | ![]() |
Out[13]= | ![]() |
The result of CrossTabulate is a Dataset by default. With the option setting "Sparse"→True the result is an Association with three elements: a sparse matrix with the contingency values, row names, and column names.
Here is an example:
In[14]:= | ![]() |
In[15]:= | ![]() |
Out[15]= | ![]() |
Using MatrixForm we can visualize the result:
In[16]:= | ![]() |
Out[16]= | ![]() |
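One way this could look, as a sketch: it assumes the Association holds its three elements in the order stated above (sparse matrix, row names, column names), so that applying a function to it passes them as `#1`, `#2`, and `#3`. The data is hypothetical.

```wolfram
(* Hypothetical data *)
SeedRandom[5];
samplePairs = Transpose[{
    RandomChoice[{"a", "b", "c"}, 25],
    RandomChoice[{"x", "y"}, 25]}];

res = ResourceFunction["CrossTabulate"][samplePairs, "Sparse" -> True];

(* Apply replaces the Association head, so the values arrive in order:
   #1 = matrix, #2 = row names, #3 = column names (assumed order) *)
MatrixForm[#1, TableHeadings -> {#2, #3}] & @@ res
```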
Take the Titanic dataset:
In[17]:= | ![]() |
Find how many males and females are in each passenger class:
In[18]:= | ![]() |
Out[18]= | ![]() |
Find how many males and females survived:
In[19]:= | ![]() |
Out[19]= | ![]() |
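A sketch of these two queries, assuming the Titanic data is loaded from `ExampleData[{"Dataset", "Titanic"}]` and that its columns are named "class", "sex", and "survived". The cells above may have used a differently prepared copy of the data. The selected columns are converted to a plain array before cross tabulating.

```wolfram
(* Assumption: this ExampleData entry is the Titanic dataset meant above *)
titanic = ExampleData[{"Dataset", "Titanic"}];

(* Hypothetical helper: extract the named columns as a plain array *)
columnsAsArray[ds_Dataset, cols_List] := Values /@ Normal[ds[All, cols]];

(* Passenger class versus sex *)
ResourceFunction["CrossTabulate"][columnsAsArray[titanic, {"class", "sex"}]]

(* Sex versus survival *)
ResourceFunction["CrossTabulate"][columnsAsArray[titanic, {"sex", "survived"}]]
```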
Find the aggregated ages of the class-sex breakdown:
In[20]:= | ![]() |
Out[20]= | ![]() |
Here is a function to plot sparse contingency tables:
In[21]:= | ![]() |
In[22]:= | ![]() |
Out[22]= | ![]() |
Start with movie review data:
In[23]:= | ![]() |
Out[22]= | ![]() |
For each movie review we make a list of word-sentiment pairs and then join them into one big list:
In[24]:= | ![]() |
Out[20]= | ![]() |
Here is a sample:
In[25]:= | ![]() |
Out[25]= | ![]() |
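The pairing step can be sketched with built-in functions only. The two reviews below are made-up placeholders in a text -> sentiment form, and splitting on whitespace is a simplification of whatever tokenization the cells above use.

```wolfram
(* Hypothetical reviews, each given as text -> sentiment *)
reviews = {
   "a fine and moving film" -> "positive",
   "a dull and tedious film" -> "negative"};

(* For each review, pair every distinct lowercase word with the
   review's sentiment, then join all pairs into one flat list *)
wordSentimentPairs = Flatten[
   Function[r,
     Thread[{DeleteDuplicates[StringSplit[ToLowerCase[First[r]]]], Last[r]}]
     ] /@ reviews,
   1];

Take[wordSentimentPairs, 3]
(* {{"a", "positive"}, {"fine", "positive"}, {"and", "positive"}} *)
```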
Here we find the word-sentiment contingency table as a sparse matrix in order to plot it below:
In[26]:= | ![]() |
Here is a function to plot sparse contingency tables:
In[27]:= | ![]() |
Plot the contingency table:
In[28]:= | ![]() |
Out[28]= | ![]() |
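A plotting helper of this kind might look like the sketch below. The name `CrossTabulatePlot` is hypothetical, and the code assumes the Association holds, in order, the sparse matrix, the row names, and the column names. It is not the definition used in the cells above.

```wolfram
(* Hypothetical helper: MatrixPlot with row and column names as ticks *)
CrossTabulatePlot[res_Association, opts : OptionsPattern[MatrixPlot]] :=
  Module[{mat, rows, cols},
   (* Assumed element order: matrix, row names, column names *)
   {mat, rows, cols} = Values[res];
   MatrixPlot[mat, opts,
    FrameTicks -> {
      {Transpose[{Range[Length[rows]], rows}], None},
      {None, Transpose[{Range[Length[cols]], cols}]}}]];

(* Usage with a small hand-built Association; the key names here are
   placeholders, since only the order of the values is used *)
demo = <|
   "SparseMatrix" -> SparseArray[{{1, 0}, {2, 3}}],
   "RowNames" -> {"a", "b"},
   "ColumnNames" -> {"x", "y"}|>;
CrossTabulatePlot[demo, ColorFunction -> "Rainbow"]
```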
Find the contingency table Dataset:
In[29]:= | ![]() |
Show the most prominent words for negative reviews:
In[30]:= | ![]() |
Out[30]= | ![]() |
The functionality of CrossTabulate can be emulated with Tally or GroupBy.
Here is a contingency matrix of a two-column array:
In[31]:= | ![]() |
Out[31]= | ![]() |
Obtain the contingency value triplets using Tally:
In[32]:= | ![]() |
Out[32]= | ![]() |
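A minimal sketch of the Tally approach on a small hypothetical array: Tally returns each distinct pair with its count, and flattening each entry gives {row value, column value, count} triplets.

```wolfram
(* Hypothetical two-column data *)
tallyPairs = {{1, "a"}, {2, "b"}, {1, "a"}, {1, "b"}, {2, "b"}, {2, "b"}};

(* Tally gives {{pair, count}, ...}; Flatten turns each into a triplet *)
triplets = Flatten /@ Tally[tallyPairs]
(* {{1, "a", 2}, {2, "b", 3}, {1, "b", 1}} *)
```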
Obtain the contingency value rules using GroupBy:
In[33]:= | ![]() |
Out[33]= | ![]() |
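The GroupBy variant can be sketched the same way, again on a small hypothetical array: grouping by the pair itself and reducing each group with Length gives pair -> count rules.

```wolfram
(* Hypothetical two-column data *)
groupPairs = {{1, "a"}, {2, "b"}, {1, "a"}, {1, "b"}, {2, "b"}, {2, "b"}};

(* Group identical pairs and count the members of each group *)
countRules = Normal[GroupBy[groupPairs, Identity, Length]]
(* {{1, "a"} -> 2, {2, "b"} -> 3, {1, "b"} -> 1} *)
```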
GroupBy generalizes better than Tally, since it can also produce the contingency values for three-column data:
In[34]:= | ![]() |
Out[34]= | ![]() |
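For three-column data, a sketch under the same assumptions (small hypothetical data): the key is the first two columns, the value is the third, and each group is reduced with Total.

```wolfram
(* Hypothetical three-column data: two categories and a numeric value *)
valueTriples = {{"a", "x", 2}, {"a", "x", 3}, {"b", "y", 5}, {"a", "y", 1}};

(* Most extracts the {first, second} key, Last the value to aggregate *)
sums = GroupBy[valueTriples, Most -> Last, Total]
(* <|{"a", "x"} -> 5, {"b", "y"} -> 5, {"a", "y"} -> 1|> *)
```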
Find the corresponding result of CrossTabulate:
In[35]:= | ![]() |
Out[35]= | ![]() |
Convert the Association obtained with the option setting "Sparse"→True into a Dataset:
In[36]:= | ![]() |
Out[36]= | ![]() |
In[37]:= | ![]() |
Out[37]= | ![]() |
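One possible conversion, sketched with a hypothetical helper `SparseResultToDataset`. It assumes the Association holds, in order, the sparse matrix, the row names, and the column names. The key names in the hand-built example are placeholders.

```wolfram
(* Hypothetical helper: build a Dataset keyed by row and column names *)
SparseResultToDataset[res_Association] :=
  Module[{mat, rows, cols},
   (* Assumed element order: matrix, row names, column names *)
   {mat, rows, cols} = Values[res];
   Dataset[
    AssociationThread[rows,
     AssociationThread[cols, #] & /@ Normal[mat]]]];

(* Usage with a small hand-built Association *)
sparseDemo = <|
   "SparseMatrix" -> SparseArray[{{1, 0}, {2, 3}}],
   "RowNames" -> {"a", "b"},
   "ColumnNames" -> {"x", "y"}|>;
dsDemo = SparseResultToDataset[sparseDemo]
```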
If the second variable is numerical or has missing values, the resulting Dataset does not have a tabular form:
In[38]:= | ![]() |
Out[38]= | ![]() |
One way to get a tabular form is to replace Missing[___] with a string:
In[39]:= | ![]() |
Out[39]= | ![]() |
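The replacement itself is a one-line rule. The data and the replacement string "NA" below are hypothetical; the cleaned array can then be passed to CrossTabulate.

```wolfram
(* Hypothetical data with missing values in the second column *)
withMissing = {{"a", "x"}, {"b", Missing[]}, {"a", Missing["NotAvailable"]}};

(* Replace every Missing[...] expression with a plain string *)
cleaned = withMissing /. Missing[___] -> "NA"
(* {{"a", "x"}, {"b", "NA"}, {"a", "NA"}} *)
```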
Find the co-occurrences of the integers in the range [1, 3] in a list of random integer pairs:
In[40]:= | ![]() |
Out[40]= | ![]() |
Again, replacing the integer values with strings produces tabular form:
In[41]:= | ![]() |
Out[41]= | ![]() |
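A sketch of the string conversion, assuming a hypothetical list `intPairs` of random integer pairs: mapping ToString at level 2 converts every entry while keeping the pair structure.

```wolfram
(* Hypothetical data: 20 random pairs of integers between 1 and 3 *)
SeedRandom[9];
intPairs = RandomInteger[{1, 3}, {20, 2}];

(* Convert each integer to a string; the list-of-pairs shape is unchanged *)
stringPairs = Map[ToString, intPairs, {2}];

ResourceFunction["CrossTabulate"][stringPairs]
```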
Here is a grid of contingency tables showing various breakdown perspectives of the Titanic data:
In[42]:= | ![]() |
Out[38]= | ![]() |
This work is licensed under a Creative Commons Attribution 4.0 International License