Wolfram Function Repository
Instant-use add-on functions for the Wolfram Language
Function Repository Resource: SupportSizeEstimate
Estimate the full size of a set given the number of distinct results in a sample
ResourceFunction["SupportSizeEstimate"][samples,distincts] estimates the size of the full population from the number of distinct values, distincts, observed in a sample of size samples.
Ask five hundred people when their birthday is and count the number of distinct results:
In[1]:=
Out[1]=
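The original input cell is not preserved here, so the following is only a minimal sketch of the experiment in Python rather than the Wolfram Language. It assumes birthdays are uniform over 365 days; the helper name `count_distinct_birthdays` and the seed are hypothetical choices made for illustration.

```python
import random

def count_distinct_birthdays(people: int, days_in_year: int, rng: random.Random) -> int:
    """Draw one uniformly random birthday per person and count the distinct days."""
    birthdays = [rng.randrange(days_in_year) for _ in range(people)]
    return len(set(birthdays))

rng = random.Random(0)  # fixed seed (arbitrary), used only so reruns repeat
# Repeat the survey of 500 people a few times to see how the count varies.
trials = [count_distinct_birthdays(500, 365, rng) for _ in range(5)]
print(trials)
```

Each entry in `trials` is one possible value for the number of distinct results; it is always well below 500 because many people share a birthday.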
Based on that result, make an estimate for the number of days in a year:
In[2]:=
Out[2]=
Calculate the number of possible birthdays on Saturn (the number of Saturn days in a Saturn year), but keep the number secret:
In[3]:=
Count the number of distinct results in fifty thousand random birthdays on Saturn:
In[4]:=
Out[4]=
With a sample size of 50,000 and 21,265 distinct results, estimate how many days per year there are on Saturn:
In[5]:=
Out[5]=
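One way such an estimate could be computed is sketched below in Python. It is not necessarily the method the resource function uses. The sketch assumes uniform sampling with replacement, under which n equally likely values are expected to yield n(1 - (1 - 1/n)^s) distinct values in s draws, and it solves that relation for n by bisection. The names `expected_distinct` and `estimate_support_size` are hypothetical.

```python
import math

def expected_distinct(n: float, s: int) -> float:
    """Expected distinct values in s uniform draws (with replacement) from n values."""
    if n <= 1.0:
        return 1.0  # a single-valued population always yields one distinct value
    # Numerically stable form of n * (1 - (1 - 1/n) ** s).
    return -n * math.expm1(s * math.log1p(-1.0 / n))

def estimate_support_size(samples: int, distincts: int) -> float:
    """Find n whose expected distinct count matches the observed one (bisection)."""
    if not 1 <= distincts < samples:
        raise ValueError("need 1 <= distincts < samples")
    lo, hi = float(distincts), 2.0 * distincts
    while expected_distinct(hi, samples) < distincts:  # grow the bracket
        hi *= 2.0
    for _ in range(200):  # expected_distinct is increasing in n
        mid = (lo + hi) / 2.0
        if expected_distinct(mid, samples) < distincts:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

estimate = estimate_support_size(50_000, 21_265)
print(round(estimate))
```

With the figures quoted above, this sketch lands near 24,400; the value produced by the resource function itself may differ if it uses another model.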
Sample sorted subsets and use that to estimate the full support size:
In[6]:=
Out[6]=
The actual answer:
In[7]:=
Out[7]=
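The sizes used in the original cells are not preserved, so the Python sketch below uses hypothetical values (3-element subsets of a 20-element set, 1,000 draws) to show how sorted subsets might be sampled and how the actual support size follows from a binomial coefficient. The helper name `sample_sorted_subsets` is an assumption for illustration.

```python
import math
import random

def sample_sorted_subsets(n: int, k: int, samples: int, rng: random.Random):
    """Draw k-element subsets of range(n) uniformly at random, each in sorted order."""
    return [tuple(sorted(rng.sample(range(n), k))) for _ in range(samples)]

n, k, samples = 20, 3, 1000  # hypothetical sizes, not taken from the original page
rng = random.Random(1)       # fixed seed (arbitrary) for repeatability
draws = sample_sorted_subsets(n, k, samples, rng)
distincts = len(set(draws))  # number of distinct subsets seen in the sample

actual = math.comb(n, k)     # exact number of k-element subsets of an n-element set
print(samples, distincts, actual)
```

The pair `(samples, distincts)` is what would be handed to a support size estimator, and `actual` is the figure the estimate can be checked against. Because `random.sample` picks every subset with equal probability, this sampling scheme is uniform.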
Sample sorted 4-tuples and use that to estimate the full support size:
In[8]:=
Out[8]=
This sampling method does not draw the sorted 4-tuples uniformly, so the support size estimate is an undercount:
In[9]:=
Out[9]=
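The non-uniformity can be seen exactly on a small case. The Python sketch below assumes the 4-tuples come from four independent uniform draws that are then sorted, and uses a hypothetical alphabet of 3 values so that every outcome can be enumerated.

```python
from collections import Counter
from itertools import product

n, k = 3, 4  # hypothetical small sizes so the enumeration stays tiny

# Each of the n**k ordered draws is equally likely; sorting merges many of them.
weights = Counter(tuple(sorted(t)) for t in product(range(n), repeat=k))

total = n ** k
for outcome, count in sorted(weights.items(), key=lambda item: item[1]):
    print(outcome, f"{count}/{total}")

print("distinct sorted tuples:", len(weights))
print("least likely:", min(weights.values()), "most likely:", max(weights.values()))
```

Here a tuple such as `(0, 0, 0, 0)` arises from a single ordered draw, while `(0, 0, 1, 2)` arises from twelve. Likely outcomes repeat more often than they would under uniform sampling, so fewer distinct values appear in a sample, and an estimator that assumes uniformity comes out low.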
If the number of distinct items is the same as the sample size, no value was repeated and there is nothing to base an estimate on; you will need a larger sample.
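A short Python sketch of why this case gives no finite estimate, assuming the uniform model in which s draws from n equally likely values are expected to yield n(1 - (1 - 1/n)^s) distinct values. The function name `expected_distinct` and the sample size are hypothetical.

```python
import math

def expected_distinct(n: float, s: int) -> float:
    """Expected distinct values in s uniform draws (with replacement) from n values."""
    return -n * math.expm1(s * math.log1p(-1.0 / n))

s = 100  # hypothetical sample size
# Shortfall between the sample size and the expected distinct count as n grows.
gaps = [(n, s - expected_distinct(n, s)) for n in (10**3, 10**4, 10**5, 10**6)]
for n, gap in gaps:
    print(f"n = {n:>9,}: expected repeats = {gap:.4f}")
```

The shortfall shrinks toward zero as n grows but never reaches it, so observing no repeats at all is compatible with an arbitrarily large population.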
This work is licensed under a Creative Commons Attribution 4.0 International License