Function Repository Resource:

NYTimesCOVID19Data

Import the New York Times COVID-19 county-by-county data for the United States

Contributed by: Bob Sandheinrich and Jesse Friedman

ResourceFunction["NYTimesCOVID19Data"][type]

retrieves the specified data type from the New York Times GitHub repository.

Details and Options

Available types are "USCounties" and "USCountiesTimeSeries".
ResourceFunction["NYTimesCOVID19Data"][] gives "USCounties" data.
Every evaluation of ResourceFunction["NYTimesCOVID19Data"] retrieves fresh data from the web. Calling the function more often than the New York Times updates the data serves no purpose; store the result in a variable or use Once to avoid repeated downloads.
ResourceFunction["NYTimesCOVID19Data"] accepts a MaxItems option. When it is set to an integer n, the n most recent entries are returned. By default, all entries are included, which can make evaluation slow.
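
As a minimal sketch of the caching advice above (the variable names recent and recentAgain are illustrative only, and the MaxItems value is arbitrary), the call can be wrapped in Once so that an identical call later in the same session reuses the stored result instead of downloading again:

```wolfram
(* Sketch only: Once stores the result for the rest of the session. *)
(* "USCounties" is the default type; MaxItems -> 1000 keeps the download small. *)
recent = Once[ResourceFunction["NYTimesCOVID19Data"]["USCounties", MaxItems -> 1000]];

(* The same expression wrapped in Once again returns the stored value *)
(* rather than triggering a new download. *)
recentAgain = Once[ResourceFunction["NYTimesCOVID19Data"]["USCounties", MaxItems -> 1000]];
```

Because Once keys on the expression it wraps, changing the type or the MaxItems value would lead to a fresh download.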

Examples

Basic Examples (2) 

Retrieve a formatted dataset with the latest ten thousand data points:

In[1]:=
AbsoluteTiming[
 data = ResourceFunction["NYTimesCOVID19Data"][MaxItems -> 10000];]
Out[1]=

Get the twenty entries with the most deaths:

In[2]:=
TakeLargestBy[data, "Deaths", 20]
Out[2]=

Show each county only once:

In[3]:=
data[Reverse/*DeleteDuplicatesBy[#County &]/*
  TakeLargestBy["Deaths", 20]]
Out[3]=

Get the full time series data for each county:

In[4]:=
AbsoluteTiming[
 timeseries = ResourceFunction["NYTimesCOVID19Data"]["USCountiesTimeSeries"];]
Out[4]=
In[5]:=
timeseries[[1 ;; 5]]
Out[5]=

Plot the number of deaths over time for the ten counties with the most deaths:

In[6]:=
DateListPlot[
 timeseries[TakeLargestBy[#Deaths["LastValue"] &, 10], "Deaths"], PlotRange -> Full]
Out[6]=

Plot the number of cases:

In[7]:=
DateListPlot[
 timeseries[TakeLargestBy[#Cases["LastValue"] &, 10], "Cases"], PlotRange -> Full]
Out[7]=

Scope (2) 

Create a map of case density in each county:

In[8]:=
countyLatestCaseCounts = ResourceFunction["NYTimesCOVID19Data"]["USCountiesTimeSeries", MaxItems -> 100000][All, #Cases["LastValue"] &];
In[9]:=
GeoRegionValuePlot[
 Normal@countyLatestCaseCounts[Select[IntegerQ]],
 PlotLegends -> Automatic,
 GeoRange -> Entity["Country", "UnitedStates"]
 ]
Out[9]=

Select only the cases from Florida:

In[10]:=
floridaCases = ResourceFunction["NYTimesCOVID19Data"]["USCountiesTimeSeries"][
   Select[#State === Entity["AdministrativeDivision", {"Florida", "UnitedStates"}] &], "Cases"];
In[11]:=
DateListLogPlot[
 floridaCases[
  KeyMap[StringTrim[First@StringSplit[CommonName@#, ","], " County"] &]],
 PlotRange -> All, PlotLegends -> Automatic, PlotLabel -> "COVID-19 cases in Florida by county"
 ]
Out[11]=

Properties and Relations (2) 

NYTimesCOVID19Data relies on the Entity framework to provide some caching. The first evaluation in each session can be slow:

In[12]:=
AbsoluteTiming[
 ResourceFunction["NYTimesCOVID19Data"][MaxItems -> 10000];]
Out[12]=

The second evaluation uses the cache for improved speed:

In[13]:=
AbsoluteTiming[
 ResourceFunction["NYTimesCOVID19Data"][MaxItems -> 10000];]
Out[13]=

Version History

  • 2.0.0 – 12 October 2020
  • 1.0.0 – 27 March 2020

Author Notes

Updates: Added the MaxItems option. Added a progress bar. Improved the speed of date handling.
