Function Repository Resource:

MissingDataLogLikelihood

Compute a log-likelihood for data with missing values

Contributed by: Sjoerd Smit

ResourceFunction["MissingDataLogLikelihood"][dist, data]

computes the log-likelihood of the observations data under the distribution dist, assuming that the probability that a value is Missing is independent of the observed values.

Details

ResourceFunction["MissingDataLogLikelihood"] uses MarginalDistribution to obtain the distribution of the observed elements in each data row that contains Missing values.
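As a sketch of the quantity this describes (the notation is an assumption introduced here, not taken from the function's source): write x_i for the i-th data row, O_i for the set of positions in that row that are observed, and f_θ for the density or probability mass function of dist with parameters θ. Under the independence assumption above, the log-likelihood is plausibly the sum of the marginal log-densities of the observed parts:

```latex
\ell(\theta) \;=\; \sum_{i=1}^{n} \log f_{\theta}^{(O_i)}\!\bigl(x_{i,O_i}\bigr),
\qquad
f_{\theta}^{(O)}(x_O) \;=\; \int f_{\theta}(x_O, x_M)\,\mathrm{d}x_M
```

Here x_M denotes the missing coordinates, which are integrated out (summed out for discrete distributions). A row with no missing elements contributes its ordinary log-density, and under this formula a row in which every element is missing contributes zero, which is consistent with the univariate example below, where deleting the missing values gives the same result.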

Examples

Basic Examples (2) 

Compute a log-likelihood from data with missing values:

In[1]:=
ResourceFunction["MissingDataLogLikelihood"][
 BernoulliDistribution[p], {0, 1, 1, Missing[], 1, Missing[], 0}]
Out[1]=

Since this is univariate data, this is equivalent to:

In[2]:=
LogLikelihood[BernoulliDistribution[p], DeleteMissing@{0, 1, 1, Missing[], 1, Missing[], 0}]
Out[2]=

Compute a log-likelihood for a multivariate distribution:

In[3]:=
ResourceFunction["MissingDataLogLikelihood"][
 DirichletDistribution[{a1, a2, a3, a4}],
 {{0.45013577777665764, 0.29706770109764097, Missing[]},
  {0.5780027759958483, 0.0273840870029496, Missing[]},
  {Missing[], Missing[], 0.37139167367775067},
  {0.011594090904890523, Missing[], 0.3509791338002293},
  {0.060738304995325135, 0.32695977786381486, 0.2685435470656679},
  {0.2309612456768055, Missing[], 0.5456982769697271}}]
Out[3]=

Applications (1) 

Find a maximum-likelihood estimate for a BinormalDistribution with randomly missing values:

In[4]:=
SeedRandom[1234];
data = Table[RandomChoice[{1, 1} -> {Missing[], j}],
   {i, RandomVariate[BinormalDistribution[{-1, 2}, {3, 1}, 0.5], 200]},
   {j, i}
   ];
In[5]:=
dist = BinormalDistribution[{m1, m2}, {s1, s2}, r];
assum = DistributionParameterAssumptions[dist];
vars = DeleteDuplicates@Flatten[List @@ dist];
NMaximize[
 {ResourceFunction["MissingDataLogLikelihood"][dist, data], assum},
 vars,
 Method -> "RandomSearch"
 ]
Out[8]=

Publisher

Sjoerd Smit

Version History

  • 1.0.1 – 21 June 2023
  • 1.0.0 – 07 June 2023
