Function Repository Resource:

MixtureCategoricalDistribution

Create a mixture distribution of categorical distributions and output it as a new CategoricalDistribution

Contributed by: Seth J. Chandler

ResourceFunction["MixtureCategoricalDistribution"][mixture,components]

combines the categorical distribution mixture with the list of categorical distributions components to create a mixture distribution.

Details and Options

ResourceFunction["MixtureCategoricalDistribution"] takes as input mixture, a univariate CategoricalDistribution, and components, a list of CategoricalDistribution objects that all share the same categories, with one component for each category of mixture. It creates a new multivariate CategoricalDistribution whose first variable ranges over the categories of mixture and whose remaining variables range over the categories of components. The probability of each combination is the probability of the mixture category multiplied by the probability that the corresponding component assigns to the remaining categories.
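
The rule above can be illustrated by building the joint probability array by hand with built-in functions only. This is a minimal sketch, not the resource function's actual implementation; the names weights, components, joint and manual are hypothetical, and the numbers are those of the first Basic Example below.

```wolfram
(* Hypothetical names; a hand-built version of the construction described above *)
weights = {7, 3}/10;                   (* probabilities of the mixture categories *)
components = {{4, 6}/10, {7, 13}/20};  (* one probability vector per mixture category *)

(* Times is Listable, so each weight scales its own component vector *)
joint = weights*components;

(* First variable: mixture categories; second variable: component categories *)
manual = CategoricalDistribution[{{"\[Mars]", "\[Venus]"}, {"\[Wolf]", "\[LightBulb]"}}, joint];

(* Probability of one combination, the product 7/10 * 6/10 *)
PDF[manual, {"\[Mars]", "\[LightBulb]"}]
```

Under these assumptions, the entries of joint sum to 1, and Information[manual, "ProbabilityArray"] can be compared with the array obtained from the resource function.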

Examples

Basic Examples (4) 

Take two categorical distributions whose common domain is {"\[Wolf]","\[LightBulb]"} and combine them with a categorical distribution whose domain is {"♂","♀"}:

In[1]:=
mix = ResourceFunction["MixtureCategoricalDistribution"][
  CategoricalDistribution[{"\[Mars]", "\[Venus]"}, {7, 3}], {CategoricalDistribution[{"\[Wolf]", "\[LightBulb]"}, {4, 6}], CategoricalDistribution[{"\[Wolf]", "\[LightBulb]"}, {7, 13}]}]
Out[1]=

Produce five random draws from the resulting distribution:

In[2]:=
RandomVariate[mix, 5]
Out[2]=

Represent the probabilities of the resulting distribution as an array:

In[3]:=
Information[mix, "ProbabilityArray"]
Out[3]=

Compute the entropy of the resulting distribution:

In[4]:=
Information[mix, "Entropy"]
Out[4]=

Scope (3) 

Use a symbolic univariate CategoricalDistribution to create a new bivariate CategoricalDistribution from two components that are themselves symbolic univariate categorical distributions:

In[5]:=
ResourceFunction["MixtureCategoricalDistribution"][
 CategoricalDistribution[{"\[Mars]", "\[Venus]"}, {a, b}], {CategoricalDistribution[{"\[Wolf]", "\[LightBulb]"}, {c, d}],
   CategoricalDistribution[{"\[Wolf]", "\[LightBulb]"}, {e, f}]}]
Out[5]=

See the probability values:

In[6]:=
Information[%, "Probabilities"] // Map[FullSimplify]
Out[6]=
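
For orientation, the probabilities that the rule in Details and Options implies for this symbolic input can be written out by hand. This is a sketch that assumes each weight list is normalized to sum to 1, as CategoricalDistribution does for numeric weights; the form returned by FullSimplify may be arranged differently.

```latex
\begin{aligned}
P(\text{Mars},\text{Wolf})  &= \frac{a}{a+b}\cdot\frac{c}{c+d}, &
P(\text{Mars},\text{Bulb})  &= \frac{a}{a+b}\cdot\frac{d}{c+d},\\
P(\text{Venus},\text{Wolf}) &= \frac{b}{a+b}\cdot\frac{e}{e+f}, &
P(\text{Venus},\text{Bulb}) &= \frac{b}{a+b}\cdot\frac{f}{e+f}.
\end{aligned}
```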

The function works when the components are multivariate:

In[7]:=
cd1 = CategoricalDistribution[{{"A", "B", "C"}, {"D", "E"}}, {{1, 2}, {3, 4}, {5, 6}}];
cd2 = CategoricalDistribution[{{"A", "B", "C"}, {"D", "E"}}, {{1, 5}, {2, 4}, {9, 2}}];
ResourceFunction["MixtureCategoricalDistribution"][
 CategoricalDistribution[{"\[Mars]", "\[Venus]"}, {a, b}], {cd1, cd2}]
Out[8]=

See the probability values:

In[9]:=
Information[%, "ProbabilityArray"] // FullSimplify
Out[9]=

The function can be nested:

In[10]:=
ResourceFunction["MixtureCategoricalDistribution"][
 CategoricalDistribution[{"x", "y", "z"}, {0.2, 0.7, 0.1}], {ResourceFunction["MixtureCategoricalDistribution"][
   CategoricalDistribution[{"\[Mars]", "\[Venus]"}, {a, b}], {cd1, cd2}],
  ResourceFunction["MixtureCategoricalDistribution"][
   CategoricalDistribution[{"\[Mars]", "\[Venus]"}, {c, d}], {cd1, cd2}], ResourceFunction["MixtureCategoricalDistribution"][
   CategoricalDistribution[{"\[Mars]", "\[Venus]"}, {e, f}], {cd1, cd2}]}]
Out[10]=
In[11]:=
Information[%, "Probabilities"] // Map[FullSimplify] // ResourceFunction["Terse"][5]
Out[11]=

Applications (1) 

In some nation, 70% of the people prefer looking at Mars (♂) and 30% of the people prefer looking at Venus (♀). Among Mars supporters, 6 out of 10 prefer the "Wolf" soccer team and 4 out of 10 prefer the "Bulb" soccer team. Among Venus supporters, 7 out of 20 prefer the "Wolf" and 13 out of 20 prefer the "Bulb". What is the joint distribution of planetary preference and favorite soccer team?

In[12]:=
soccerDistribution = ResourceFunction["MixtureCategoricalDistribution"][
  CategoricalDistribution[{"\[Mars]", "\[Venus]"}, {7, 3}], {CategoricalDistribution[{"\[Wolf]", "\[LightBulb]"}, {6, 4}], CategoricalDistribution[{"\[Wolf]", "\[LightBulb]"}, {7, 13}]}]
Out[12]=

Now assume a crowd of one thousand randomly selected individuals from the nation attends a game. Compute the probability that a majority (more than 500) will be Bulb fans:

In[13]:=
SurvivalFunction[
  BinomialDistribution[1000, Query["\[LightBulb]"][
    Information[MarginalDistribution[soccerDistribution, 2], "Probabilities"]]], 500] // N
Out[13]=
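
The success probability passed to BinomialDistribution is the marginal probability of a Bulb fan. As a hand check, using only the figures stated in the problem:

```latex
P(\text{Bulb}) = 0.7 \cdot \tfrac{4}{10} + 0.3 \cdot \tfrac{13}{20}
               = 0.28 + 0.195 = 0.475,
\qquad
X \sim \mathrm{Binomial}(1000,\ 0.475)
```

Here X denotes the number of Bulb fans in the crowd. SurvivalFunction evaluated at x gives P(X > x).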

Properties and Relations (3) 

One can construct a mixture of CategoricalDistribution objects using the MixtureDistribution function, but many desired properties of a distribution are not presently available:

In[14]:=
conventionalMixture = MixtureDistribution[{0.2, 0.8}, {CategoricalDistribution[{{"A", "B", "C"}, {"D", "E"}}, {{1, 2}, {3, 4}, {5, 6}}],
   CategoricalDistribution[{{"A", "B", "C"}, {"D", "E"}}, {{1, 5}, {2,
       4}, {9, 2}}]}]
Out[14]=

The following code, for example, attempts to get the probability that the mixture returns {"A","E"} and to produce two random variates from the mixture. Neither attempt succeeds:

In[15]:=
Through[{PDF[#, {"A", "E"}] &, RandomVariate[#, 2] &}[
  conventionalMixture]]
Out[15]=

By contrast, when one uses MixtureCategoricalDistribution, the same sort of information can be successfully obtained:

In[16]:=
mcd = ResourceFunction["MixtureCategoricalDistribution"][
  CategoricalDistribution[{"\[Mars]", "\[Venus]"}, {0.2, 0.8}], {CategoricalDistribution[{{"A", "B", "C"}, {"D", "E"}}, {{1, 2}, {3, 4}, {5, 6}}],
   CategoricalDistribution[{{"A", "B", "C"}, {"D", "E"}}, {{1, 5}, {2,
       4}, {9, 2}}]}]
Out[16]=
In[17]:=
Through[{PDF[MarginalDistribution[#, {2, 3}], {"A", "E"}] &, RandomVariate[#, 2] &}[mcd]]
Out[17]=
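
Marginalizing over variables 2 and 3 sums out the mixture label, which leaves the weighted average of the two component probabilities. As a hand check for {"A","E"}, where the component weights sum to 21 and 23 respectively:

```latex
P(\text{A},\text{E}) = 0.2 \cdot \frac{2}{21} + 0.8 \cdot \frac{5}{23}
                     \approx 0.0190 + 0.1739 \approx 0.1930
```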

Possible Issues (1) 

The function returns $Failed if the domains of the components are not identical:

In[18]:=
ResourceFunction["MixtureCategoricalDistribution"][
 CategoricalDistribution[{"\[Mars]", "\[SadSmiley]"}, {a, b}], {CategoricalDistribution[{"\[Wolf]", "\[LightBulb]"}, {c, d}],
   CategoricalDistribution[{"\[Wolf]", "\[WarningSign]"}, {e, f}]}]
Out[18]=

Neat Examples (1) 

Here is an example from the Wikipedia entry for Bayes' theorem. "A particular test for whether someone has been using marijuana is 90% sensitive and 80% specific, meaning it leads to 90% true 'positive' results (meaning, 'Yes, he used marijuana') for marijuana users and 80% true negative results for non-users, but also generates 20% false positives for non-users. Only 5% of people actually do use marijuana. What is the probability that a random person who tests positive is really a drug user?" This can be solved in a clear fashion using MixtureCategoricalDistribution:

In[19]:=
population = ResourceFunction["MixtureCategoricalDistribution"][
  CategoricalDistribution[{"users", "nonusers"}, {0.05, 0.95}], {CategoricalDistribution[{"negative", "positive"}, {0.1, 0.9}], CategoricalDistribution[{"negative", "positive"}, {0.8, 0.2}]}]
Out[19]=
In[20]:=
PDF[population, {"users", "positive"}]/
 PDF[MarginalDistribution[population, 2], "positive"]
Out[20]=
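
The ratio of the two PDF values is Bayes' theorem applied to the joint distribution. Written out by hand with the numbers from the problem statement:

```latex
P(\text{user}\mid\text{positive})
  = \frac{P(\text{user})\,P(\text{positive}\mid\text{user})}
         {P(\text{user})\,P(\text{positive}\mid\text{user})
          + P(\text{nonuser})\,P(\text{positive}\mid\text{nonuser})}
  = \frac{0.05 \cdot 0.9}{0.05 \cdot 0.9 + 0.95 \cdot 0.2}
  = \frac{0.045}{0.235} \approx 0.1915
```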

Publisher

Seth J. Chandler

Version History

  • 1.0.0 – 11 June 2020
