Wolfram Language Paclet Repository

Community-contributed installable additions to the Wolfram Language

Mosaic plots for data visualization
Introduction
This notebook gives a description and examples of using the function MosaicPlot of the Mathematica package MosaicPlot.m provided by the project MathematicaForPrediction at GitHub, see [1].
The function MosaicPlot summarizes the conditional probabilities of co-occurrence of the categorical values in a list of records of the same length. The list of records is assumed to be a full array and the columns to represent categorical values. (Note that if a column is numerical but has a small number of different values, then it can be seen as categorical.)
I have read the descriptions of mosaic plots in the book “R in Action” by Robert Kabacoff, [2], and one of the references provided in the book (“What is a mosaic plot?” by Steve Simon, [3]). I was impressed by how informative mosaic plots are, and I figured they could be implemented relatively easily using prefix trees (also known as “tries”) [4,5]. I implemented MosaicPlot while working on a document analyzing the census income data from 1998, [6]. This is the reason that data set is used in this document. A good alternative data set provided by WL is ExampleData[{"Statistics","USCars1993"}].
Load the paclet
In[1]:=
Needs["AntonAntonov`MosaicPlot`"]
Load the dataset
In[1]:=
dsCensusData=ResourceFunction["ImportCSVToDataset"]["~/Downloads/adult/dfAdult.csv"]
Data set
The data set can be found at http://archive.ics.uci.edu/ml/datasets/Census+Income, [6].
The description of the data set is given in the file “adult.names” of the data folder. The data folder provides two sets with the same type of data, “adult.data” and “adult.test”; the former is used for training, the latter for testing.
The total number of records in the file “adult.data” is 32561; the total number of records in the file “adult.test” is 16281.
Here is what the data looks like:
In[15]:=
Magnify[RandomSample[dsCensusData,12],0.6]
Out[15]=
age
workclass
fnlwgt
education
education-num
marital-status
occupation
relationship
race
sex
55.0
Federal-gov
31728.0
Some-college
10.0
Married-civ
-spouse
Adm-clerical
Wife
White
Female
61.0
?
394534.
HS-grad
9.0
Married-civ
-spouse
?
Husband
Black
Male
48.0
Self-emp-inc
181307.
HS-grad
9.0
Married-civ
-spouse
Craft-repair
Husband
White
Male
43.0
State-gov
30824.0
Some-college
10.0
Divorced
Adm-clerical
Not-in-fam
ily
White
Female
42.0
Self-emp-n
ot-inc
198692.
Some-college
10.0
Married-civ
-spouse
Craft-repair
Husband
White
Male
22.0
Private
115244.
Assoc-acdm
12.0
Married-civ
-spouse
Prof-special
ty
Wife
White
Female
48.0
Private
149640.
HS-grad
9.0
Married-civ
-spouse
Transport-
moving
Husband
White
Male
56.0
Self-emp-inc
75214.0
Prof-school
15.0
Married-civ
-spouse
Prof-special
ty
Husband
White
Male
45.0
Private
89028.0
HS-grad
9.0
Divorced
Craft-repair
Not-in-fam
ily
Asian-Pac-I
slander
Male
28.0
Private
108574.
Some-college
10.0
Married-civ
-spouse
Adm-clerical
Wife
White
Female
20.0
Private
451996.
HS-grad
9.0
Never-marr
ied
Handlers-cl
eaners
Own-child
White
Male
18.0
Private
96445.0
Some-college
10.0
Never-marr
ied
Other-service
Own-child
White
Female
columns 1–10 of
15
Since I did not understand the meaning of the column “fnlwgt”, I dropped it from the data:
In[17]:=
dsCensusData=dsCensusData[All,KeyDrop[#,"fnlwgt"]&];
Here is the summary table of the data:
In[18]:=
ResourceFunction["RecordsSummary"][dsCensusData[]]
Out[18]=

1 age: Min 17.; 1st Qu 28.; Median 37.; Mean 38.5816; 3rd Qu 48.; Max 90.
2 workclass: Private 22696; Self-emp-not-inc 2541; Local-gov 2093; ? 1836; State-gov 1298; Self-emp-inc 1116; (Other) 981
3 education: HS-grad 10501; Some-college 7291; Bachelors 5355; Masters 1723; Assoc-voc 1382; 11th 1175; (Other) 5134
4 education-num: Min 1.; 1st Qu 9.; Median 10.; Mean 10.0807; 3rd Qu 12.; Max 16.
5 marital-status: Married-civ-spouse 14976; Never-married 10683; Divorced 4443; Separated 1025; Widowed 993; Married-spouse-absent 418; Married-AF-spouse 23
6 occupation: Prof-specialty 4140; Craft-repair 4099; Exec-managerial 4066; Adm-clerical 3770; Sales 3650; Other-service 3295; (Other) 9541
7 relationship: Husband 13193; Not-in-family 8305; Own-child 5068; Unmarried 3446; Wife 1568; Other-relative 981
8 race: White 27816; Black 3124; Asian-Pac-Islander 1039; Amer-Indian-Eskimo 311; Other 271
9 sex: Male 21790; Female 10771
10 capital-gain: 1st Qu 0.; 3rd Qu 0.; Median 0.; Min 0.; Mean 1077.65; Max 99999.
11 capital-loss: 1st Qu 0.; 3rd Qu 0.; Median 0.; Min 0.; Mean 87.3038; Max 4356.
12 hours-per-week: Min 1.; 1st Qu 40.; Median 40.; Mean 40.4375; 3rd Qu 45.; Max 99.
13 native-country: United-States 29170; Mexico 643; ? 583; Philippines 198; Germany 137; Canada 121; (Other) 1709
14 income: <=50K 24720; >50K 7841

In the summary table the numerical variables are described with min, max, mean, and quartiles. The categorical variables are described with the tallies of their values. The tallies are listed in decreasing order. The tallies of the truncated values are summed under the value “(Other)”.
Note that:
-- only 24% of the labels are “>50K”;
-- 2/3 of the records are for males;
-- “capital-gain” and “capital-loss” are very skewed.
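The first two observations follow directly from the tallies in the summary table. Here is a minimal check, with the counts copied by hand from the table; the variable names incomeCounts and sexCounts are introduced only for this illustration:

```wolfram
(* Tallies copied by hand from the summary table above *)
incomeCounts = <|"<=50K" -> 24720, ">50K" -> 7841|>;
sexCounts = <|"Male" -> 21790, "Female" -> 10771|>;

(* Fraction of each value among all records *)
incomeFractions = N[incomeCounts/Total[incomeCounts]]
sexFractions = N[sexCounts/Total[sexCounts]]
```

The fraction for “>50K” comes out at roughly 0.24 and the fraction for “Male” at roughly 0.67.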
Explanations
If we pick a categorical variable, say “sex”, we can visualize the frequencies of the appearance of the variable values with the following plot:
In[20]:=
MosaicPlot[dsCensusData[[All,{"sex"}]],ColorRules->{_->GrayLevel[0.7]},ImageSize->250]
Out[20]=
The size of the rectangles depends on the frequencies of appearance of the values “Male” and “Female” in the data records. From the rectangle sizes we can see what we already knew from the data summary table: approximately 2/3 of the records are about males.
We can subdivide every rectangle r according to the frequencies of co-occurrence of r’s value with the values of a second categorical variable, say “relationship”:
In[22]:=
MosaicPlot[dsCensusData[[All,{"sex","relationship"}]],"LabelRotation"->{{2,1.1},{0,1}},"ColumnNamesOffset"->0.075,ColorRules->{_->GrayLevel[0.7]},ImageSize->300]
Out[22]=
The labels corresponding to the values of “relationship” are rotated for legibility. The “relationship” labels are placed according to the co-occurrence with the value “Male” of the variable “sex”. The corresponding fractions of the pairs (“Female”,“Husband”), (“Female”,“Not-in-family”), etc., are deduced from the order of the “relationship” labels.
Using colored mosaic plots can help distinguish which rectangles correspond to which values. Here is the last plot with rectangles colored across the “relationship” data variable:
In[26]:=
MosaicPlot[dsCensusData[[All,{"sex","relationship"}]],"LabelRotation"->{{2,1.1},{0,1}},"ColumnNamesOffset"->0.09,ColorRules->Automatic,ImageSize->Large]
Out[26]=
From the visual representations of the “sex vs. relationship” mosaic plot we can see that a large fraction of the males are husbands and none (or a very small fraction) of them are wives. We can also see that none (or a very small fraction) of the females are husbands, that the largest fraction of them are “Not-in-family”, and that there are approximately three times as many “Not-in-family” females as females that are wives.
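The rectangle sizes in a two-variable mosaic plot correspond to conditional frequencies: the first variable splits the whole, and each part is split again by the values of the second variable. Here is a rough sketch of that computation on a few made-up records. The records and the names firstSplit and secondSplit are hypothetical, and this is not necessarily how MosaicPlot computes the sizes internally:

```wolfram
(* Hypothetical toy records of the form {sex, relationship} *)
records = {
   {"Male", "Husband"}, {"Male", "Husband"}, {"Male", "Own-child"},
   {"Male", "Not-in-family"}, {"Female", "Wife"},
   {"Female", "Not-in-family"}, {"Female", "Not-in-family"},
   {"Female", "Own-child"}};

(* First split: fraction of the records for each value of the first column *)
firstSplit = N[Counts[records[[All, 1]]]/Length[records]]

(* Second split: within each value of the first column,
   the fractions of the values of the second column *)
secondSplit = GroupBy[records, First -> Last, N[Counts[#]/Length[#]] &]
```

With these toy records half of the “Female” records are “Not-in-family”, so that rectangle would take half of the “Female” part of the plot.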
Let us make another mosaic plot for a different kind of relationship, “sex vs. education”:
By comparing the sizes of the rectangles corresponding to the values “Bachelors”, “Doctorate”, “Masters”, and “Some-college” on the “sex vs. education” mosaic plot we can see that the fraction of men that have finished college is larger than the fraction of women that have finished college.
We can further subdivide the rectangles according to the co-occurrence frequencies with a third categorical variable. We are going to choose that third variable to be “income”, the values of which can be seen as outcomes or consequents of the values of the first two variables of the mosaic plot.
(The exact numbers of these observations can be seen in the tooltip table shown when hovering with the mouse over the rectangles.)
Instead of having the consequent (or outcome) variable be the last variable in the mosaic plot, it is also useful to start with the consequent variable to get a perspective of how the attributes break down for it. Here is an example of a mosaic plot for “income vs. relationship vs. sex” (using a different color scheme):
It might be useful to make a mosaic plot for a subset of the records. Here is an example of a mosaic plot with splitting across four columns made only for people who have bachelor, master, or doctorate degrees:
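Selecting such a subset is ordinary Dataset filtering done before plotting. Here is a minimal sketch on a small made-up dataset; dsToy, degrees, and dsDegrees are hypothetical names, and with the census data the same Select query would be applied to dsCensusData:

```wolfram
(* Hypothetical toy dataset with the same column names as the census data *)
dsToy = Dataset[{
    <|"sex" -> "Male", "education" -> "Bachelors", "income" -> ">50K"|>,
    <|"sex" -> "Female", "education" -> "HS-grad", "income" -> "<=50K"|>,
    <|"sex" -> "Female", "education" -> "Masters", "income" -> ">50K"|>,
    <|"sex" -> "Male", "education" -> "Doctorate", "income" -> ">50K"|>,
    <|"sex" -> "Male", "education" -> "11th", "income" -> "<=50K"|>}];

degrees = {"Bachelors", "Masters", "Doctorate"};

(* Keep only the records whose education value is one of the degrees *)
dsDegrees = dsToy[Select[MemberQ[degrees, #education] &]]
```

The filtered dataset can then be passed to MosaicPlot in the same way as dsCensusData in the examples above.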
Similar to the previous mosaic plot is this plot of “sex vs. education vs. marital-status vs. income”:
Options
MosaicPlot takes the following options:
In addition, MosaicPlot takes all the options of Graphics, because MosaicPlot is implemented with Graphics.
The options are explained in the sub-sections below.

Visualizing categorical columns + a numerical column (“ExpandLastColumn”)

If the last data column is numerical then MosaicPlot can use it as pre-computed contingency statistics. This functionality is specified with the option setting "ExpandLastColumn"->True.
In order to explain the functionality we are going to use the following interpretation. If the last column of the data is numerical then we can treat the data as a contracted version of a longer list of records made only of the categorical columns. For example, consider the following table with observations of people’s hair and eye color:
The table above can be considered as a contracted version of this table:
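As an illustration of this contracted-versus-expanded interpretation, here is a small sketch with made-up counts. The data and the names contracted and expanded are hypothetical, and the sketch covers only integer counts:

```wolfram
(* Hypothetical contracted table: {hair, eyes, count} *)
contracted = {
   {"Black", "Brown", 3},
   {"Blond", "Blue", 2},
   {"Brown", "Brown", 1}};

(* Repeat each categorical record as many times as its count *)
expanded = Join @@ (ConstantArray[Most[#], Last[#]] & /@ contracted)

(* Tallying the expanded records recovers the contracted table *)
roundTrip = Sort[Append @@@ Tally[expanded]] === Sort[contracted]
```

The expanded list has one record per observation (six here), and tallying it gives back the original counts.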
Setting the option “ExpandLastColumn” to True gives a mosaic plot corresponding to that latter, observations-expanded table:
The last data column (which is numerical) does not need to be made of integers:

Controlling the size of the gap between the rectangles (“Gap” and “GapFactor”)

Contingency values labels (“LabelRotation” and “LabelStyle”)

The labels derived from the distinct values (levels) of each column of the data can be rotated and given style options.
The option “LabelRotation” takes a direction specification for Text (the fourth argument of Text). The option “LabelStyle” takes options and arguments for the function Style.

Labels for categorical variables (“ColumnNames” and “ColumnNamesOffset”)

The names of the data columns (data’s variables) are specified with the option “ColumnNames”. (The list of names given to “ColumnNames” can be formatted with Style.) The distance of the column names from the rectangles is specified with the option “ColumnNamesOffset”.

Start of the rectangle splitting (“FirstAxis”)

The starting axis of the data splitting is specified by “FirstAxis”.

Tooltips with exact contingency statistics (“Tooltips”)

MosaicPlot has an interactive feature using Tooltip that gives a table with the exact co-occurrence (contingency) values when hovering with the mouse over the rectangles. The option “Tooltips” takes the values True or False.

Visualizing non-existing contingencies (“ZeroProbability”)

The non-existing contingencies have to be represented in the mosaic plot. MosaicPlot uses very thin rectangles for them and the size of these rectangles is controlled with the option “ZeroProbability”.

Coloring of the rectangles (ColorRules)

The rectangles can be colored using the option ColorRules which specifies how the colors of the rectangles are determined from the indices of the data columns.
If coloring for only one column index is specified the value of ColorRules can be of the form
The colors are used with Blend in order to color the rectangles according to the order of the unique values of the specified data columns.
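To illustrate the idea of blending by order, here is a minimal sketch that assigns each unique value a color according to its position. The values, the pair of colors, and the even spacing over the interval from 0 to 1 are assumptions made for this illustration, and MosaicPlot may compute the blending fractions differently:

```wolfram
(* Hypothetical unique values of a data column, in their plotting order *)
values = {"Husband", "Not-in-family", "Own-child", "Unmarried", "Wife"};

(* Assumed pair of colors to blend between *)
colors = {Blue, Red};

(* Blending fractions spread evenly over [0, 1], one per value *)
fractions = Subdivide[0, 1, Length[values] - 1];

(* Map each value to the blended color for its position *)
valueColors = AssociationThread[values, Blend[colors, #] & /@ fractions]
```

The first value gets the first color, the last value gets the last color, and the values in between get intermediate blends.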
The grid of plots below shows mosaic plots of the same data with different values for the option ColorRules (given as plot labels).
The default value for ColorRules is Automatic. When Automatic is given to ColorRules, MosaicPlot finds the data column with the largest number of unique values and colors them according to their order using ColorData[7,"ColorList"].
References
