Wolfram Function Repository
Instant-use add-on functions for the Wolfram Language
Function Repository Resource: SupportSizeEstimate
Estimate the full size of a set given the number of distinct results in a sample
ResourceFunction["SupportSizeEstimate"][samples,distincts] estimates the size of the full population, given a sample of size samples that contains distincts distinct values.
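One way to compute such an estimate, assumed here rather than taken from the Wolfram implementation, is to treat the samples as uniform draws with replacement from N equally likely values. The expected number of distinct values in n draws is then N(1 - (1 - 1/N)^n), and the estimate is the N at which that expectation equals the observed count. A minimal Python sketch of this idea (`support_size_estimate` and `expected_distinct` are hypothetical names):

```python
import math


def expected_distinct(support: float, samples: int) -> float:
    """Expected number of distinct values seen in `samples` uniform draws,
    with replacement, from `support` equally likely values."""
    return support * (1.0 - (1.0 - 1.0 / support) ** samples)


def support_size_estimate(samples: int, distincts: float) -> float:
    """Hypothetical sketch: find N with expected_distinct(N, samples) == distincts."""
    if not 1 <= distincts <= samples:
        raise ValueError("need 1 <= distincts <= samples")
    if distincts == samples:
        # Every draw was distinct, so no finite N fits the data.
        return math.inf
    # expected_distinct grows with N, from 1 at N = 1 toward `samples`,
    # so double `hi` until the root is bracketed, then bisect.
    lo, hi = 1.0, 2.0 * distincts
    while expected_distinct(hi, samples) < distincts:
        lo, hi = hi, 2.0 * hi
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if expected_distinct(mid, samples) < distincts:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(support_size_estimate(50_000, 21_265)))
```

Bisection is used because the expected distinct count is monotone in N, which makes the bracket easy to maintain; a real implementation may use a different model or solver.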
Ask five hundred people when their birthday is and count the number of distinct results:
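Assuming the 500 birthdays are uniform over 365 days (leap days ignored), a short Python simulation shows why the distinct count carries information about the year length: averaged over many trials, it lands close to the expected value 365(1 - (364/365)^500), about 272. The seed and trial count below are arbitrary choices.

```python
import random


def distinct_birthdays(people: int, days: int, rng: random.Random) -> int:
    """Count distinct birthdays among `people` drawn uniformly from `days` days."""
    return len({rng.randrange(days) for _ in range(people)})


rng = random.Random(2024)  # fixed seed so the run is reproducible
trials = [distinct_birthdays(500, 365, rng) for _ in range(200)]
simulated = sum(trials) / len(trials)

# Expected distinct count under uniform sampling with replacement.
expected = 365 * (1 - (1 - 1 / 365) ** 500)
print(f"simulated mean {simulated:.1f}, expected {expected:.1f}")
```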
Based on that result, estimate the number of days in a year:
Calculate the number of possible birthdays on Saturn (the number of Saturn days in a Saturn year), but keep the number hidden:
Count the number of distinct results in fifty thousand random birthdays on Saturn:
With a sample size of 50,000 and 21,265 distinct results, estimate how many days there are in a Saturn year:
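Under the assumption of uniform sampling with replacement, (1 - 1/N)^n is close to e^(-n/N) when N is large, so the Saturn estimate can also be sketched as the root of N(1 - e^(-n/N)) = d. The Python version below finds that root with Newton's method for n = 50,000 and d = 21,265; `newton_support_estimate` is a hypothetical name, and Newton's method is a swapped-in technique, not necessarily what SupportSizeEstimate uses.

```python
import math


def newton_support_estimate(samples: int, distincts: int, iterations: int = 50) -> float:
    """Hypothetical sketch: solve support * (1 - exp(-samples / support)) = distincts
    by Newton's method, starting from support = distincts (below the root)."""
    if distincts >= samples:
        return math.inf  # all draws distinct: no finite solution
    support = float(distincts)
    for _ in range(iterations):
        decay = math.exp(-samples / support)
        residual = support * (1.0 - decay) - distincts
        slope = 1.0 - decay * (1.0 + samples / support)  # always > 0
        support -= residual / slope
    return support


estimate = newton_support_estimate(50_000, 21_265)
print(round(estimate))
```

Because N(1 - e^(-n/N)) is increasing and concave in N, each Newton step taken from below the root stays below it while moving closer, so the iteration needs no bracketing.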
Sample sorted subsets and use them to estimate the full support size:
The actual answer:
Sample sorted 4-tuples and use them to estimate the full support size:
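This sampling method does not pick every sorted 4-tuple with equal probability. Non-uniform sampling repeats the likelier items more often and so yields fewer distinct results than uniform sampling would, which makes the support size estimate an undercount: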
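The undercount can be reproduced in a small Python experiment with hypothetical parameters: sort 4 uniform draws from 10 values. There are C(13, 4) = 715 possible sorted 4-tuples, but they are not equally likely; for example, (0, 1, 2, 3) arises from 24 orderings while (0, 0, 0, 0) arises from only one. With 2,000 samples, fewer distinct tuples appear than uniform sampling of the 715 tuples would give on average, so an estimator that assumes uniform sampling, and grows with the distinct count, comes out low.

```python
import math
import random

VALUES, K, SAMPLES = 10, 4, 2_000  # hypothetical sizes for illustration

rng = random.Random(7)  # fixed seed for reproducibility
# Each sample: K uniform draws, sorted, so only the multiset of values matters.
seen = {tuple(sorted(rng.randrange(VALUES) for _ in range(K))) for _ in range(SAMPLES)}

# Number of sorted K-tuples (multisets of size K) over VALUES values.
support = math.comb(VALUES + K - 1, K)

# Distinct count expected if all `support` sorted tuples were equally likely.
uniform_expected = support * (1 - (1 - 1 / support) ** SAMPLES)

print(support, len(seen), round(uniform_expected, 1))
```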
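If the number of distinct items equals the sample size, you will need a larger sample: when every sample is distinct, the data are consistent with an arbitrarily large support, so they cannot pin down a finite estimate.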
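Under the uniform-sampling model, the expected number of distinct values, N(1 - (1 - 1/N)^n), stays below n for every finite N and only approaches n as N grows, so a distinct count equal to the sample size matches no finite N. A quick Python check for n = 500 (the chosen support sizes are arbitrary):

```python
samples = 500  # sample size n, as in the birthday example


def expected_distinct(support: float) -> float:
    """Expected distinct count in `samples` uniform draws from `support` values."""
    return support * (1 - (1 - 1 / support) ** samples)


# Stays strictly below `samples`, creeping toward it as the support grows.
values = [expected_distinct(s) for s in (1e3, 1e4, 1e5, 1e6)]
print(values)
```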
This work is licensed under a Creative Commons Attribution 4.0 International License