Function Repository Resource:

# SupportSizeEstimate

Estimate the full size of a set given the number of distinct results in a sample

Contributed by: Ed Pegg Jr
 ResourceFunction["SupportSizeEstimate"][samples,distincts] estimates the size of the full population from the number of samples drawn and the number of distinct values, distincts, observed among them.

## Examples

### Basic Examples (2)

Ask five hundred people when their birthday is and count the number of distinct results:

 In[1]:=
 Out[1]=
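
The survey can be imitated with a short simulation. The following is a Python sketch, not the Wolfram Language input of the original cell; the function name `count_distinct_birthdays`, the 365-day year, the uniform-birthday assumption, and the seed are all illustrative choices.

```python
import random

def count_distinct_birthdays(people, days_in_year=365, seed=None):
    """Draw one uniformly random birthday per person and count the distinct days."""
    rng = random.Random(seed)  # seeded generator so a run can be repeated
    birthdays = [rng.randint(1, days_in_year) for _ in range(people)]
    return len(set(birthdays))

# Hypothetical run: 500 people, arbitrary seed.
distinct = count_distinct_birthdays(500, seed=2019)
print(distinct)
```

Because several people usually share a day, the count comes out well below both 500 and 365.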

Based on that result, make an estimate for the number of days in a year:

 In[2]:=
 Out[2]=

Calculate the number of possible birthdays on Saturn (the number of Saturn days in one Saturn year), but keep the number secret:

 In[3]:=

Count the number of distinct results in fifty thousand random birthdays on Saturn:

 In[4]:=
 Out[4]=

With a sample size of 50,000 and 21,265 distinct results, estimate how many days per year there are on Saturn:

 In[5]:=
 Out[5]=
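
One way such an estimate could be computed is sketched below in Python. It assumes uniform sampling with replacement, under which the expected number of distinct values in n draws from N items is N(1 - (1 - 1/N)^n), and it solves that relation for N by bisection. This is an assumption about a reasonable model, not the documented implementation of SupportSizeEstimate; the names `expected_distinct` and `estimate_support_size` are hypothetical.

```python
import math

def expected_distinct(support, samples):
    """Expected distinct values in `samples` uniform draws (with replacement) from `support` items."""
    return support * (1.0 - (1.0 - 1.0 / support) ** samples)

def estimate_support_size(samples, distincts):
    """Solve expected_distinct(N, samples) == distincts for N by bisection (assumed model)."""
    if not 1 <= distincts <= samples:
        raise ValueError("need 1 <= distincts <= samples")
    if distincts == samples:
        return math.inf  # no repeats observed: no finite N satisfies the relation
    lo = float(distincts)  # the support has at least `distincts` items
    hi = 2.0 * lo
    while expected_distinct(hi, samples) < distincts:
        hi *= 2.0  # grow the bracket until it contains the solution
    while hi - lo > 1e-9 * hi:
        mid = (lo + hi) / 2.0
        if expected_distinct(mid, samples) < distincts:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Figures from the Saturn example: 50,000 samples, 21,265 distinct results.
print(round(estimate_support_size(50000, 21265)))
```

Under this model the Saturn figures give a value near 24,400. When every sampled value is distinct, the sketch returns infinity, because the expected distinct count stays below the sample size for every finite support.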

### Applications (2)

Sample sorted subsets and use that to estimate the full support size:

 In[6]:=
 Out[6]=

 In[7]:=
 Out[7]=

### Possible Issues (2)

Sample sorted 4-tuples and use that to estimate the full support size:

 In[8]:=
 Out[8]=

This sampling method does not draw the possible tuples with equal probability, so the support size estimate is an undercount:

 In[9]:=
 Out[9]=
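
The effect can be seen in a small simulation. The Python sketch below assumes the tuples are produced by sorting four independent uniform draws, which may differ from the original input; the alphabet size 20, the sample size, the seed, and the function names are illustrative. Sorted tuples with repeated entries arise from fewer orderings, so they are drawn less often, and fewer distinct tuples appear than under uniform sampling from the same support.

```python
import random
from math import comb

def distinct_sorted_tuples(samples, k, size, rng):
    """Sort `size` independent uniform draws from 1..k and count the distinct sorted tuples."""
    seen = {tuple(sorted(rng.randint(1, k) for _ in range(size)))
            for _ in range(samples)}
    return len(seen)

def distinct_uniform(samples, support, rng):
    """Count distinct values when each of `support` items is equally likely."""
    return len({rng.randrange(support) for _ in range(samples)})

rng = random.Random(9)  # arbitrary seed for repeatability
k, size, samples = 20, 4, 20000
support = comb(k + size - 1, size)  # number of sorted 4-tuples with repeats allowed

skewed = distinct_sorted_tuples(samples, k, size, rng)
uniform = distinct_uniform(samples, support, rng)
print(support, skewed, uniform)
```

For a fixed sample size, an estimator that assumes uniform sampling and grows with the distinct count will therefore report a smaller support for the sorted tuples than the true one.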

If the number of distinct items equals the sample size, no value was repeated, so the sample gives no basis for bounding the support size from above; you will need a larger sample.

## Version History

• 1.0.0 – 11 November 2019