Basic Examples (4) 
Start with a categorical distribution:
Condition that distribution on the first dimension taking on a value of "A":
Find the probabilities of that categorical distribution conditioned on the second dimension taking the value "E":
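The arithmetic behind conditioning on one dimension can be sketched in Python. This is a minimal illustration, assuming a joint distribution stored as a dict from category tuples to probabilities. `condition` is a hypothetical helper, not the function's API, and the probabilities are made up. Outcomes that disagree with the conditioning value are discarded, the conditioned dimension is dropped, and the remaining mass is renormalized.

```python
from fractions import Fraction


def condition(joint, dim, value):
    """Condition a joint categorical distribution on dimension `dim` == value.

    `joint` maps tuples of categories to probabilities (hypothetical
    representation). Outcomes that disagree with `value` are discarded,
    the conditioned dimension is dropped from the keys, and the remaining
    mass is renormalized so it sums to 1.
    """
    kept = {k: p for k, p in joint.items() if k[dim] == value}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("conditioning event has probability zero")
    return {k[:dim] + k[dim + 1:]: p / total for k, p in kept.items()}


# Hypothetical 2-D joint distribution over {"A", "B"} x {"E", "F"}.
joint = {
    ("A", "E"): Fraction(1, 10), ("A", "F"): Fraction(3, 10),
    ("B", "E"): Fraction(2, 10), ("B", "F"): Fraction(4, 10),
}

given_a = condition(joint, 0, "A")  # E: 1/4, F: 3/4
given_e = condition(joint, 1, "E")  # A: 1/3, B: 2/3
```

Exact `Fraction` arithmetic is used so the renormalized probabilities sum to exactly 1.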
Use a list of patterns instead of rules to impose the same condition as above:
Scope (3) 
Patterns used in the conditions can be complex:
The first category must be a letter in the word "ABLE":
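A condition such as "the first category is a letter in ABLE" admits several categories at once. The sketch below models such a pattern as a predicate per dimension. It is an analogy in Python, not Wolfram pattern matching. `condition_where` is a hypothetical helper and the data are invented. Keeping every dimension in the result is an assumption of this sketch, since the conditioned dimension may still take several values.

```python
from fractions import Fraction


def condition_where(joint, predicates):
    """Condition on an event described by one predicate per dimension.

    `predicates` maps a dimension index to a function that returns True for
    allowed categories; unlisted dimensions are unconstrained. All
    dimensions are kept (a choice made for this sketch only).
    """
    kept = {
        k: p for k, p in joint.items()
        if all(pred(k[d]) for d, pred in predicates.items())
    }
    total = sum(kept.values())
    if total == 0:
        raise ValueError("conditioning event has probability zero")
    return {k: p / total for k, p in kept.items()}


# Hypothetical joint distribution whose first dimension ranges over letters.
joint = {
    ("A", "x"): Fraction(1, 8), ("B", "x"): Fraction(1, 8),
    ("C", "y"): Fraction(2, 8), ("E", "y"): Fraction(1, 8),
    ("Z", "x"): Fraction(3, 8),
}

letters = set("ABLE")  # a set, so only single letters match (not substrings)
in_able = condition_where(joint, {0: lambda c: c in letters})
# ("A", "x"), ("B", "x") and ("E", "y") each get 1/3; "C" and "Z" are excluded.
```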
A CategoricalDistribution of arbitrary dimension works with the function:
The function also works with univariate categorical distributions, returning a CategoricalDistribution with potentially fewer categories:
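For a univariate distribution, conditioning amounts to discarding the categories that fail the condition and renormalizing the rest, which is why the result can have fewer categories. A minimal Python sketch, with a hypothetical helper `condition_univariate` and invented probabilities:

```python
from fractions import Fraction


def condition_univariate(dist, allowed):
    """Restrict a univariate categorical distribution to allowed categories.

    Categories for which `allowed` returns False are dropped and the rest
    are renormalized, so the result can have fewer categories.
    """
    kept = {c: p for c, p in dist.items() if allowed(c)}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("conditioning event has probability zero")
    return {c: p / total for c, p in kept.items()}


# Hypothetical univariate distribution over three categories.
dist = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
not_a = condition_univariate(dist, lambda c: c != "a")  # b: 1/2, c: 1/2
```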
Options (2) 
The value of the "FlattenUnivariate" option (True by default) determines whether the categories of a univariate result of ConditionalCategoricalDistribution are given without a List wrapper:
Setting the "Marginalize" option to False preserves the dimensionality of the original distribution:
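The effect of both options can be mimicked in a Python sketch. The keyword arguments `marginalize` and `flatten_univariate` are hypothetical stand-ins for "Marginalize" and "FlattenUnivariate", chosen to mirror their described behavior; they are not the function's actual interface, and the data are invented. With marginalization off, the conditioned dimension stays in every key; with flattening on, single-element keys such as `("E",)` are unwrapped to `"E"`.

```python
from fractions import Fraction


def condition(joint, dim, value, marginalize=True, flatten_univariate=True):
    """Condition `joint` on dimension `dim` == value.

    marginalize=True drops the conditioned dimension from the keys;
    False keeps it, preserving the original dimensionality.
    flatten_univariate=True unwraps 1-tuples such as ("E",) to "E" when
    the result has a single dimension.
    """
    kept = {k: p for k, p in joint.items() if k[dim] == value}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("conditioning event has probability zero")
    if marginalize:
        # Safe: every kept key has the same value at `dim`, so no collisions.
        kept = {k[:dim] + k[dim + 1:]: p for k, p in kept.items()}
    result = {k: p / total for k, p in kept.items()}
    if flatten_univariate and all(len(k) == 1 for k in result):
        result = {k[0]: p for k, p in result.items()}
    return result


joint = {  # hypothetical 2-D distribution
    ("A", "E"): Fraction(1, 10), ("A", "F"): Fraction(3, 10),
    ("B", "E"): Fraction(2, 10), ("B", "F"): Fraction(4, 10),
}

flat = condition(joint, 0, "A")                               # keys "E", "F"
wrapped = condition(joint, 0, "A", flatten_univariate=False)  # keys ("E",), ("F",)
full = condition(joint, 0, "A", marginalize=False)            # keys ("A", "E"), ("A", "F")
```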
Applications (2) 
Here is the joint distribution of people in group A or B who do or do not have some disease and whom a test classifies as negative or positive for it. Find the joint distribution of disease status and test result for people in group A:
Find the probability that a person in group A who tests positive is actually sick:
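These two steps chain two conditionings: first on the group, then on the test result. The sketch below shows the arithmetic in Python, using a hypothetical `condition` helper and invented probabilities (the page's own table is not reproduced here). The final number is the positive predictive value for group A.

```python
from fractions import Fraction


def condition(joint, dim, value):
    """Keep outcomes with key[dim] == value, drop that dimension, renormalize."""
    kept = {k: p for k, p in joint.items() if k[dim] == value}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("conditioning event has probability zero")
    return {k[:dim] + k[dim + 1:]: p / total for k, p in kept.items()}


# Hypothetical joint distribution over (group, disease status, test result).
joint = {
    ("A", "sick", "pos"): Fraction(4, 100),
    ("A", "sick", "neg"): Fraction(1, 100),
    ("A", "healthy", "pos"): Fraction(9, 100),
    ("A", "healthy", "neg"): Fraction(36, 100),
    ("B", "sick", "pos"): Fraction(8, 100),
    ("B", "sick", "neg"): Fraction(2, 100),
    ("B", "healthy", "pos"): Fraction(4, 100),
    ("B", "healthy", "neg"): Fraction(36, 100),
}

# Joint distribution of (disease, test) within group A.
group_a = condition(joint, 0, "A")
# Condition on a positive test (dimension 1 of the reduced keys).
group_a_pos = condition(group_a, 1, "pos")
ppv = group_a_pos[("sick",)]  # 4/13
```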
Compute the true positive rate (sensitivity), the true negative rate (specificity) and the false positive rate (1 - specificity) for a mixture of categorical distributions:
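A mixture of categorical distributions is a pointwise weighted sum of their probabilities. Sensitivity is then P(positive | sick) and specificity is P(negative | healthy), both obtained by conditioning the mixture on disease status. A Python sketch follows. The helpers `mixture`, `condition` and `diagnostic_rates` are hypothetical, and so are the per-group numbers and the 50/50 weights.

```python
from fractions import Fraction


def mixture(components, weights):
    """Weighted pointwise sum of categorical distributions (weights sum to 1)."""
    out = {}
    for dist, w in zip(components, weights):
        for k, p in dist.items():
            out[k] = out.get(k, 0) + w * p
    return out


def condition(joint, dim, value):
    """Keep outcomes with key[dim] == value, drop that dimension, renormalize."""
    kept = {k: p for k, p in joint.items() if k[dim] == value}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("conditioning event has probability zero")
    return {k[:dim] + k[dim + 1:]: p / total for k, p in kept.items()}


def diagnostic_rates(joint):
    """Return (sensitivity, specificity, false positive rate) for (disease, test) keys."""
    sensitivity = condition(joint, 0, "sick").get(("pos",), 0)     # P(pos | sick)
    specificity = condition(joint, 0, "healthy").get(("neg",), 0)  # P(neg | healthy)
    return sensitivity, specificity, 1 - specificity


# Hypothetical per-group distributions over (disease, test), mixed 50/50.
group_a = {("sick", "pos"): Fraction(8, 100), ("sick", "neg"): Fraction(2, 100),
           ("healthy", "pos"): Fraction(18, 100), ("healthy", "neg"): Fraction(72, 100)}
group_b = {("sick", "pos"): Fraction(16, 100), ("sick", "neg"): Fraction(4, 100),
           ("healthy", "pos"): Fraction(8, 100), ("healthy", "neg"): Fraction(72, 100)}

mixed = mixture([group_a, group_b], [Fraction(1, 2), Fraction(1, 2)])
sens, spec, fpr = diagnostic_rates(mixed)  # 4/5, 72/85, 13/85
```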
Neat Examples (4) 
The following application comes from causal inference, in particular the "do-calculus" approach to computing the effects of interventions. Assume the joint probability distribution below over the size of a kidney stone, the treatment it receives and the outcome of that treatment:
Compute the probability of a good outcome conditioned on the treatment. B appears to have better outcomes than A:
Now emulate a randomized controlled trial by deriving the interventional distribution obtained when the treatment is forced to be "A". Compute it as a MixtureCategoricalDistribution over stone size, weighting each stone size by its marginal probability and using as components the categorical distributions conditioned on that stone size and on the treatment being "A":
Do exactly the same thing but force the treatment to be "B":
Although the observational distribution suggests that treatment B is superior, treatment A is in fact superior. The observational distribution is distorted because small kidney stones, which generally have better outcomes, are more often treated with treatment B, an instance of Simpson's paradox.
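The reversal can be reproduced with a short Python sketch. The observational quantity conditions on the treatment alone. The interventional one mirrors the mixture over stone size described above: P(good | do(T = t)) is the sum over sizes s of P(s) times P(good | s, t). The counts below are illustrative, patterned on the classic kidney-stone data, and are not the values used on this page. The function names are hypothetical.

```python
from fractions import Fraction

# Illustrative counts: (stone size, treatment) -> (good outcomes, patients).
# Patterned on the classic kidney-stone data; not this page's own values.
counts = {
    ("small", "A"): (81, 87),
    ("small", "B"): (234, 270),
    ("large", "A"): (192, 263),
    ("large", "B"): (55, 80),
}
total = sum(n for _, n in counts.values())


def observational(treatment):
    """P(good | T = treatment): condition on the treatment alone."""
    good = sum(g for (s, t), (g, n) in counts.items() if t == treatment)
    patients = sum(n for (s, t), (g, n) in counts.items() if t == treatment)
    return Fraction(good, patients)


def p_size(size):
    """Marginal probability of a stone size."""
    return Fraction(sum(n for (s, t), (g, n) in counts.items() if s == size), total)


def interventional(treatment):
    """P(good | do(T = treatment)): mixture over sizes weighted by P(size)."""
    return sum(
        p_size(s) * Fraction(*counts[(s, treatment)]) for s in ("small", "large")
    )


# observational("A") = 39/50 < observational("B") = 289/350,
# but interventional("A") > interventional("B").
```

Weighting by P(size) instead of P(size | treatment) is what removes the distortion: both treatments are evaluated on the same mix of stone sizes.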