Basic Examples (5)
Summarize a vector of numbers:
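A minimal sketch of such a call, assuming the function is reached through ResourceFunction["RecordsSummary"]; the random vector is illustrative data, not taken from the original example:

```wolfram
(* Illustrative data: 100 random reals between -10 and 10 *)
vec = RandomReal[{-10, 10}, 100];

(* Assumption: the function is accessed through the Function Repository *)
ResourceFunction["RecordsSummary"][vec]
```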
Summarize a matrix of strings and specify the column names:
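One way this might look, assuming the column names are passed as a second argument in the form of a list of strings; the matrix contents and the names "first" and "second" are made up for illustration:

```wolfram
(* Illustrative data: a 40 x 2 matrix of single-letter strings *)
mat = RandomChoice[CharacterRange["a", "e"], {40, 2}];

(* Assumption: the second argument is a list with one name per column *)
ResourceFunction["RecordsSummary"][mat, {"first", "second"}]
```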
Summarize a vector of numbers with missing values:
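A sketch under the assumption that missing values are represented with Missing[]; the positions that are overwritten are arbitrary:

```wolfram
(* Illustrative data: 30 random reals, three of them replaced by Missing[] *)
vec = RandomReal[{0, 10}, 30];
vec[[{3, 7, 12}]] = Missing[];

(* Assumption: accessed as a resource function *)
ResourceFunction["RecordsSummary"][vec]
```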
Summarize a full 2D array with numerical and categorical columns (numbers, strings, and symbols):
Summarize a dataset:
Summarize a dataset with column names:
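A possible example, assuming that "column names" here means a Dataset whose rows are associations with named keys; the keys "id", "group", and "score" are hypothetical:

```wolfram
(* Hypothetical dataset: 20 rows with one integer, one categorical,
   and one real-valued column *)
ds = Dataset[
   Table[
    <|"id" -> i,
      "group" -> RandomChoice[{"a", "b", "c"}],
      "score" -> RandomReal[100]|>,
    {i, 20}]];

ResourceFunction["RecordsSummary"][ds]
```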
Summarize an association of vectors:
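A minimal sketch with made-up keys and random values, assuming each value of the association is a plain vector:

```wolfram
(* Illustrative association: each key maps to a vector of 20 numbers *)
assoc = <|
   "x" -> RandomReal[1, 20],
   "y" -> RandomInteger[10, 20]
   |>;

ResourceFunction["RecordsSummary"][assoc]
```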
Scope (4)
Define a dataset:
A larger number of categorical values can be seen using the option "MaxTallies":
The function works with missing values and summarizes them separately from the rest of the values in a column:
Here we make a list of date objects with missing values:
Here is the summary of the list of date objects, with a specified column name:
Here we make an association of random images:
This summarizes the list of rules in the association:
We can summarize the association’s keys and values separately using the option setting Thread→True:
A dataset does not have to have named columns:
Options (7)
MaxTallies (1)
The option "MaxTallies" specifies how many summarized items are shown for each column (variable):
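A sketch of the option in use; the option name comes from this page, while the value 12 and the data are arbitrary choices for illustration:

```wolfram
(* Illustrative data: one categorical column with 26 possible values *)
letters = RandomChoice[CharacterRange["a", "z"], {300, 1}];

(* Assumption: the option takes a positive integer *)
ResourceFunction["RecordsSummary"][letters, "MaxTallies" -> 12]
```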
NumberedColumns (2)
By default the summarized columns (variables) are automatically numbered:
With the option "NumberedColumns" the automatic numbering can be prevented:
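The two cases might be compared as follows. The option name comes from this page; that False is the value that switches numbering off is an assumption:

```wolfram
(* Illustrative data: a 50 x 2 numeric matrix *)
mat = RandomReal[1, {50, 2}];

(* Default behavior: column headings are numbered automatically *)
ResourceFunction["RecordsSummary"][mat, {"u", "v"}]

(* Assumption: False turns the automatic numbering off *)
ResourceFunction["RecordsSummary"][mat, {"u", "v"}, "NumberedColumns" -> False]
```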
Thread (4)
The option Thread specifies whether the summarization should be "threaded" when the data to be summarized is an association or a list of rules.
Here is an association of 3D points:
Summarizing without threading:
Summarizing with threading:
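Both calls can be sketched together. The keys and the random points are illustrative, and treating the unthreaded form as the default is an assumption:

```wolfram
(* Illustrative association: four keys, each mapped to a random 3D point *)
points = AssociationThread[{"a", "b", "c", "d"} -> RandomReal[1, {4, 3}]];

(* Without threading: the rules are summarized as they are *)
ResourceFunction["RecordsSummary"][points]

(* With threading: keys and values are summarized separately *)
ResourceFunction["RecordsSummary"][points, Thread -> True]
```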
Optionally, column names can be added:
Applications (3)
Summarize Classify-ready data (2)
Here we summarize the Titanic data:
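One possible way to do this, not necessarily the approach of the original example: flatten each rule of the Classify-ready data into a row, then summarize the resulting 2D array. The column names below are labels chosen for illustration:

```wolfram
(* Classify-ready data: a list of rules of the form features -> label *)
titanic = ExampleData[{"MachineLearning", "Titanic"}, "Data"];

(* Turn each rule into a flat row so the result is a full 2D array *)
rows = Flatten[List @@ #] & /@ titanic;

(* Assumption: hypothetical column names, in feature order then label *)
ResourceFunction["RecordsSummary"][rows, {"class", "age", "sex", "survived"}]
```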
Here we summarize the Mushroom data:
Summaries browser (1)
Given a set of datasets, we can easily build an interactive interface for browsing their summaries:
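A minimal sketch of such a browser built with Manipulate; the three datasets and their names are hypothetical stand-ins for a real collection:

```wolfram
(* Hypothetical collection of datasets, keyed by name *)
datasets = <|
   "uniform" -> Dataset[RandomReal[1, {50, 2}]],
   "normal" -> Dataset[RandomVariate[NormalDistribution[], {50, 2}]],
   "letters" -> Dataset[RandomChoice[{"a", "b", "c"}, {50, 2}]]
   |>;

(* Pick a dataset by name and show its summary *)
With[{names = Keys[datasets]},
 Manipulate[
  ResourceFunction["RecordsSummary"][datasets[name]],
  {name, names}]]
```

Injecting the key list through With keeps the control specification independent of how Manipulate holds its arguments.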
Possible Issues (7)
It is expected that the first argument of RecordsSummary is an object that can be converted to a full array of atomic objects:
This fails because the dataset cannot be converted to a full 2D array:
A workaround is to use HoldForm for the columns that are not vectors:
If the numerical columns have Quantity values those columns are treated as categorical:
A summary of numerical values can be obtained by using QuantityMagnitude:
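A sketch of this workaround on a single column of quantities; the unit and the values are made up for illustration:

```wolfram
(* Illustrative data: 20 lengths with units *)
lengths = Quantity[#, "Meters"] & /@ RandomReal[10, 20];

(* Strip the units so the column is summarized as numerical *)
ResourceFunction["RecordsSummary"][QuantityMagnitude[lengths]]
```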
For associations whose values are not full arrays, using the option setting Thread→True produces a failure:
This works though:
Neat Examples (1)
Summarize subsets of Titanic data that correspond to each passenger class: