covidcast package for COVID-19-related data

(This is a PSA post, where I share a package that I think might be of interest to the community but that I haven’t looked too deeply into myself.)

Today I learnt of the covidcast R package, which provides access to the COVIDcast Epidata API published by the Delphi group at Carnegie Mellon University. According to the covidcast R package website,

This API provides daily access to a range of COVID-related signals that Delphi builds and maintains, from sources like symptom surveys and medical claims data, and also standard signals that we simply mirror, like confirmed cases and deaths.

(There is a corresponding Python package with similar functionality.) The Delphi group has done a huge amount of work in logging a wide variety of COVID-related data and making it available, along with tools to visualize and make sense of the data.

For those interested in doing COVID-related analyses, I think this is a treasure trove of information for you to use. The covidcast package contains several different types of data (which they call “signals”), including public behavior (e.g. COVID searches on Google), early indicators (e.g. COVID-related doctor visits) and late indicators (e.g. deaths). Documentation on the signals available can be found here. (Note: The data is US-focused right now; I don’t know if there are plans to include data from other countries.)

Let me end off with a simple example showing what you can do with this package. This example is modified from one of the package vignettes; see the Articles section of the package website for more examples.

The package is not available on CRAN yet but can be installed from GitHub:

devtools::install_github("cmu-delphi/covidcast", ref = "main",
                         subdir = "R-packages/covidcast")

The code below pulls data on cumulative COVID cases per 100k people on 2020-12-31 at the county level. covidcast_signal is the function to use for pulling data, and it returns an object of class c("covidcast_signal", "data.frame").

library(covidcast)

# Cumulative COVID cases per 100k people on 2020-12-31
df <- covidcast_signal(data_source = "usa-facts",
                       signal = "confirmed_cumulative_prop",
                       start_day = "2020-12-31", end_day = "2020-12-31")
summary(df)
# A `covidcast_signal` data frame with 3142 rows and 9 columns.
# 
# data_source : usa-facts
# signal      : confirmed_cumulative_prop
# geo_type    : county
# 
# first date                          : 2020-12-31
# last date                           : 2020-12-31
# median number of geo_values per day : 3142
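
The same function can also pull a time series for specific locations. The sketch below assumes that the geo_type and geo_values arguments behave as described in the package documentation (they are not used in the example above), and that the usa-facts source offers a confirmed_7dav_incidence_prop signal at the state level. The signal, states and dates are my own illustrative choices.

```r
library(covidcast)

# Assumption: 7-day average of new cases per 100k people, at the state
# level, for California and New York over December 2020.
df_state <- covidcast_signal(data_source = "usa-facts",
                             signal = "confirmed_7dav_incidence_prop",
                             start_day = "2020-12-01", end_day = "2020-12-31",
                             geo_type = "state",
                             geo_values = c("ca", "ny"))

# The result is still a data frame, so the usual tools apply.
head(df_state[, c("geo_value", "time_value", "value")])
```

Restricting geo_values keeps the download small, which is handy when you only care about a few locations.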

There is a plot method for objects of class covidcast_signal:

plot(df)

The automatic plot is usually not bad. The plot method comes with some arguments that the user can use to customize the plot (full documentation here):

breaks <- c(0, 500, 1000, 5000, 10000)
colors <- c("#D3D3D3", "#FEDDA2",  "#FD9950", "#C74E32", "#800026")
plot(df, choro_col = colors, choro_params = list(breaks = breaks),
     title = "Cumulative COVID cases per 100k people on 2020-12-31")

The plot returned is actually created using the ggplot2 package, so it is possible to add your own ggplot2 code on top of it:

library(ggplot2)
plot(df, choro_col = colors, choro_params = list(breaks = breaks),
     title = "Cumulative COVID cases per 100k people on 2020-12-31") +
  theme(title = element_text(face = "bold"))
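
Since the result is a ggplot2 object, the same pattern should carry over to the other plot types. Here is a small sketch, assuming the plot_type = "line" option of the plot method and a state-level time series. The signal, states, title and axis labels are my own illustrative choices and do not come from the vignette.

```r
library(covidcast)
library(ggplot2)

# Assumption: state-level 7-day average of new cases per 100k people
# for two states over December 2020.
df_state <- covidcast_signal(data_source = "usa-facts",
                             signal = "confirmed_7dav_incidence_prop",
                             start_day = "2020-12-01", end_day = "2020-12-31",
                             geo_type = "state",
                             geo_values = c("ca", "ny"))

# Draw one line per state, then layer ordinary ggplot2 code on top.
p <- plot(df_state, plot_type = "line",
          title = "New COVID cases per 100k people (7-day average)") +
  labs(x = "Date", y = "Cases per 100k") +
  theme(legend.position = "bottom")
p
```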