Wolfram Function Repository
Instant-use add-on functions for the Wolfram Language
Function Repository Resource: CrossTabulate
Compute the contingency table for a two- or three-column dataset or array
ResourceFunction["CrossTabulate"][data] finds the contingency table for the Dataset or array data.
Here is an array of random integer-word pairs:
In[1]:= |
Out[3]= |
Compute the contingency table:
In[4]:= |
Out[4]= |
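As a minimal sketch of this step, the following uses a small hand-written array in place of the random data; the values are illustrative assumptions, not the page's original input.

```wolfram
(* Illustrative integer-word pairs (hypothetical data) *)
data = {{1, "apple"}, {2, "pear"}, {1, "apple"}, {3, "pear"}, {2, "apple"}};

(* Rows correspond to the distinct values of the first column,
   columns to the distinct values of the second column *)
ResourceFunction["CrossTabulate"][data]
```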
Here is a Dataset whose first two columns are categorical and whose third column is numeric:
In[5]:= |
Out[5]= |
Compute the contingency table:
In[6]:= |
Out[6]= |
For large contingency tables, sparse arrays are faster and more convenient than a Dataset. Sparse output is requested with the option "Sparse":
In[7]:= |
Out[7]= |
In[8]:= |
Out[8]= |
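A short sketch of the sparse form, using hypothetical data. Keys is used to inspect the element names of the result rather than assuming them.

```wolfram
(* Illustrative integer-word pairs (hypothetical data) *)
data = {{1, "apple"}, {2, "pear"}, {1, "apple"}, {3, "pear"}, {2, "apple"}};

(* Request the sparse representation instead of a Dataset *)
res = ResourceFunction["CrossTabulate"][data, "Sparse" -> True];

(* List the names of the elements of the resulting Association *)
Keys[res]
```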
Here is a full array with three columns:
In[9]:= |
Out[9]= |
Compute the contingency table of the co-occurrences of each letter with each word, found by cross tabulating over the first two columns only:
In[10]:= |
Out[10]= |
Here the cross tabulation also uses the third column, adding up its values for each unique letter-word pair:
In[11]:= |
Out[11]= |
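The difference between the two calls can be sketched with a small three-column array. The data below is a hypothetical stand-in for the array used on this page.

```wolfram
(* Illustrative letter-word-value triplets (hypothetical data) *)
data3 = {{"a", "cat", 2}, {"a", "dog", 3}, {"b", "cat", 1}, {"a", "cat", 5}};

(* First two columns only: each cell counts occurrences of the pair,
   so the pair {"a", "cat"} should contribute a count of 2 *)
ResourceFunction["CrossTabulate"][data3[[All, {1, 2}]]]

(* All three columns: each cell adds up the third-column values,
   so the pair {"a", "cat"} should contribute 2 + 5 = 7 *)
ResourceFunction["CrossTabulate"][data3]
```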
If any of the columns have missing values, they are shown in the contingency table:
In[12]:= |
Out[12]= |
In[13]:= |
Out[13]= |
The result of CrossTabulate is a Dataset by default. With the option setting "Sparse"→True the result is an Association with three elements: a sparse matrix with the contingency values, row names, and column names.
Here is an example:
In[14]:= |
In[15]:= |
Out[15]= |
Using MatrixForm we can visualize the result:
In[16]:= |
Out[16]= |
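A sketch of this visualization with hypothetical data. It assumes the three elements of the Association come in the order described above (sparse matrix, row names, column names) and reads them positionally with Values.

```wolfram
(* Illustrative pairs (hypothetical data) *)
data = {{"a", "x"}, {"b", "y"}, {"a", "x"}, {"b", "x"}};

res = ResourceFunction["CrossTabulate"][data, "Sparse" -> True];

(* Assumption: the elements are ordered as matrix, row names, column names *)
{mat, rows, cols} = Values[res];

(* Display the matrix with the row and column names as headings *)
MatrixForm[mat, TableHeadings -> {rows, cols}]
```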
Take the Titanic dataset:
In[17]:= |
Find how many males and females are in each passenger class:
In[18]:= |
Out[18]= |
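One way this query might be written is sketched below. It uses the built-in Titanic example dataset, whose columns include "class" and "sex". Converting the two selected columns to a plain array first is a choice made for this sketch, not necessarily what the original input did.

```wolfram
(* Built-in example dataset with columns "class", "age", "sex", "survived" *)
titanic = ExampleData[{"Dataset", "Titanic"}];

(* Extract the class and sex columns as a plain two-column array *)
pairs = Values /@ Normal[titanic[All, {"class", "sex"}]];

(* Count passengers for each class-sex combination *)
ResourceFunction["CrossTabulate"][pairs]
```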
Find how many males and females survived:
In[19]:= |
Out[19]= |
Find the total of the ages for each class-sex combination:
In[20]:= |
Out[20]= |
Here is a function to plot sparse contingency tables:
In[21]:= |
In[22]:= |
Out[22]= |
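A plotting helper along these lines could look like the following sketch. The name plotCrossTab is hypothetical, and the code assumes the sparse result lists its elements in the order matrix, row names, column names; it is not the function defined on this page.

```wolfram
(* Hypothetical helper: plot a sparse contingency table with labeled ticks *)
plotCrossTab[res_Association, opts : OptionsPattern[]] :=
  Module[{mat, rows, cols},
    (* Assumption: elements are ordered as matrix, row names, column names *)
    {mat, rows, cols} = Values[res];
    MatrixPlot[mat, opts,
      FrameTicks -> {
        (* left ticks carry the row names, no ticks on the right *)
        {Transpose[{Range[Length[rows]], rows}], None},
        (* no ticks at the bottom, top ticks carry the column names *)
        {None, Transpose[{Range[Length[cols]], cols}]}}]]

(* Usage with illustrative data *)
data = {{"a", "x"}, {"b", "y"}, {"a", "x"}, {"b", "x"}};
plotCrossTab[ResourceFunction["CrossTabulate"][data, "Sparse" -> True]]
```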
Start with movie review data:
In[23]:= |
Out[22]= |
For each movie review, we make a list of word-sentiment pairs and then join the lists into one big list:
In[24]:= |
Out[20]= |
Here is a sample:
In[25]:= |
Out[25]= |
Here we find the word-sentiment contingency table as a sparse matrix in order to plot it below:
In[26]:= |
Here is a function to plot sparse contingency tables:
In[27]:= |
Plot the contingency table:
In[28]:= |
Out[28]= |
Find the contingency table Dataset:
In[29]:= |
Show the most prominent words for negative reviews:
In[30]:= |
Out[30]= |
The functionality of CrossTabulate can be emulated with Tally or GroupBy.
Here is a contingency matrix of a two-column array:
In[31]:= |
Out[31]= |
Obtain the contingency value triplets using Tally:
In[32]:= |
Out[32]= |
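A minimal sketch of the Tally approach, using a small hypothetical array:

```wolfram
(* Illustrative two-column data (hypothetical) *)
data = {{1, "a"}, {1, "b"}, {2, "a"}, {1, "a"}};

(* Tally returns {pair, count} entries; Flatten turns each into a triplet *)
Flatten /@ Tally[data]
(* {{1, "a", 2}, {1, "b", 1}, {2, "a", 1}} *)
```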
Obtain the contingency value rules using GroupBy:
In[33]:= |
Out[33]= |
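The same counts can be sketched with GroupBy, grouping identical rows and taking the size of each group. The data is hypothetical.

```wolfram
(* Illustrative two-column data (hypothetical) *)
data = {{1, "a"}, {1, "b"}, {2, "a"}, {1, "a"}};

(* Group identical pairs and count the members of each group *)
GroupBy[data, Identity, Length]
(* <|{1, "a"} -> 2, {1, "b"} -> 1, {2, "a"} -> 1|> *)
```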
GroupBy generalizes better than Tally: we can use GroupBy to get the contingency values for three-column data:
In[34]:= |
Out[34]= |
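For three-column data, one possible sketch groups by the first two columns and totals the third. The data is hypothetical.

```wolfram
(* Illustrative three-column data (hypothetical) *)
data3 = {{"x", "u", 2}, {"x", "v", 3}, {"x", "u", 5}, {"y", "u", 1}};

(* Key each row by its first two elements, keep the last element, and total *)
GroupBy[data3, Most -> Last, Total]
(* <|{"x", "u"} -> 7, {"x", "v"} -> 3, {"y", "u"} -> 1|> *)
```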
Find the corresponding result of CrossTabulate:
In[35]:= |
Out[35]= |
Convert the Association obtained with the option setting "Sparse"→True into a Dataset:
In[36]:= |
Out[36]= |
In[37]:= |
Out[37]= |
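One way such a conversion might be sketched is shown below. It assumes the Association lists its elements in the order sparse matrix, row names, column names, and uses hypothetical data with string values in both columns.

```wolfram
(* Illustrative pairs (hypothetical data) *)
data = {{"a", "x"}, {"b", "y"}, {"a", "x"}, {"b", "x"}};

res = ResourceFunction["CrossTabulate"][data, "Sparse" -> True];

(* Assumption: the elements are ordered as matrix, row names, column names *)
{mat, rows, cols} = Values[res];

(* Build one association per matrix row, keyed by column name,
   then key those rows by row name *)
Dataset[
  AssociationThread[rows,
    Map[AssociationThread[cols, #] &, Normal[mat]]]]
```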
If the second variable is numerical or has missing values, the resulting Dataset does not have a tabular form:
In[38]:= |
Out[38]= |
One way to get a tabular form is to replace Missing[___] with a string:
In[39]:= |
Out[39]= |
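The replacement can be sketched as follows. The data and the replacement string "NA" are illustrative choices.

```wolfram
(* Illustrative data with missing values in the second column (hypothetical) *)
data = {{"a", "x"}, {"b", Missing[]}, {"a", Missing[]}, {"b", "x"}};

(* Replace every Missing[...] expression with a plain string *)
clean = data /. _Missing -> "NA";

ResourceFunction["CrossTabulate"][clean]
```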
Find the co-occurrences of the integers from 1 to 3 in a list of random integer pairs:
In[40]:= |
Out[40]= |
Again, replacing the integer values with strings produces tabular form:
In[41]:= |
Out[41]= |
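A sketch of the string conversion, assuming random pairs generated with RandomInteger; the seed and the number of pairs are arbitrary choices for illustration.

```wolfram
(* Fix the seed so the illustrative data is reproducible *)
SeedRandom[1];
pairs = RandomInteger[{1, 3}, {20, 2}];

(* Convert every integer (level 2 of the array) to a string *)
strPairs = Map[ToString, pairs, {2}];

ResourceFunction["CrossTabulate"][strPairs]
```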
Here is a grid of contingency tables showing various breakdown perspectives of the Titanic data:
In[42]:= |
Out[38]= |
This work is licensed under a Creative Commons Attribution 4.0 International License