Introduction
Responses to infectious disease epidemics use a growing body of data sources to inform decision making (Cori et al., 2017; Fraser et al., 2009; WHO Ebola Response Team et al., 2014; WHO Ebola Response Team et al., 2015). While new data types (such as whole genome pathogen sequences) are increasingly useful complements to epidemiological data (Gire et al., 2014), epidemic curves, which describe the number of new cases through time (incidence), remain the most important source of information, particularly early in an outbreak. Specifically, epidemic curves (often referred to as ‘epicurves’) represent the number of new cases per time unit based on the date or time of symptom onset.
While conceptually simple, epicurves are useful in many respects. They provide a simple visual outline of epidemic dynamics, which can be used to assess the growth or decline of an outbreak (Barrett et al., 2016; Fitzgerald et al., 2014; Jernberg et al., 2015; Lanini et al., 2014; Nhan et al., 2018) and therefore to inform intervention measures (Meltzer et al., 2014; WHO Ebola Response Team et al., 2014; WHO Ebola Response Team et al., 2015). Epicurves also form the raw material for a range of modelling techniques used in short-term forecasting (Cori et al., 2013; Funk et al., 2018; Nouvellet et al., 2018; Viboud et al., 2018), as well as for outbreak detection algorithms applied to syndromic surveillance data (Farrington & Andrews, 2003; Unkel et al., 2012).
Because of the increasing need to analyse various types of epidemiological data in a single environment using free, transparent and reproducible procedures, the R software (R Core Team, 2017) has been proposed as a platform of choice for epidemic analysis (Jombart et al., 2014). However, despite the existence of packages dedicated to time series analysis (Shumway & Stoffer, 2010) and to surveillance data (Höhle, 2007), a lightweight and well-tested package solely dedicated to building, handling and plotting epidemic curves directly from linelist data (e.g. a spreadsheet where each row represents an individual case) is still lacking.
Here, we introduce incidence, an R package developed as part of the toolbox for epidemics analysis of the R Epidemics Consortium (RECON), which aims to fill this gap. In this paper, we outline the package’s design and illustrate its functionalities using reproducible worked examples.
Methods
Package overview
The philosophy underpinning the development of incidence is to ‘do the basics well’. The objective of this package is to provide simple, user-friendly and robust tools for computing, manipulating, and plotting epidemic curves, with some additional facilities for basic models of incidence over time.
The general workflow (Figure 1) revolves around a single type of object, formalised as the S3 class incidence. incidence objects are lists that separately store a matrix of case counts (with dates in rows and groups in columns), the dates used as breaks, the time interval used, and an indication of whether incidence is cumulative or not (Figure 1). The incidence object is obtained by running the function
Figure 1.
Generalized workflow from incidence object construction to modeling and visualization.
The raw data are depicted in the top left as either a vector of dates for each individual case (typical usage) or a combination of dates and a matrix of group counts. The incidence object is created from these, and its construction checks and validates the timespan and the interval between dates. Data subsetting and export are depicted in the upper right, data visualization in the lower right, and the addition of log-linear models in the lower left.
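To make the object structure described above more concrete, the following Python sketch models the fields an incidence object is said to hold: a dates-by-groups count matrix, the dates used as breaks, the interval, and a cumulative flag. It is an illustration only, not the package's R implementation; the class `IncidenceSketch` and its `pool` and `cumulate` methods are hypothetical names chosen to show two typical handler operations (summing across groups and computing running totals).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class IncidenceSketch:
    """Hypothetical stand-in for the fields an incidence object stores."""
    dates: List[date]          # left edge of each time bin (the breaks)
    counts: List[List[int]]    # one row per date, one column per group
    groups: List[str]          # column labels
    interval: int              # bin width in days
    cumulative: bool = False   # True once counts are running totals

    def pool(self) -> "IncidenceSketch":
        """Collapse all groups into a single column by summing each row."""
        pooled = [[sum(row)] for row in self.counts]
        return IncidenceSketch(self.dates, pooled, ["all"],
                               self.interval, self.cumulative)

    def cumulate(self) -> "IncidenceSketch":
        """Replace counts by running totals down each column."""
        if self.cumulative:
            raise ValueError("incidence is already cumulative")
        running = [0] * len(self.groups)
        out = []
        for row in self.counts:
            running = [r + c for r, c in zip(running, row)]
            out.append(running)
        return IncidenceSketch(self.dates, out, self.groups,
                               self.interval, True)


# Usage: three weekly bins, two hypothetical groups.
start = date(2014, 4, 7)
inc = IncidenceSketch(
    dates=[start + timedelta(days=7 * k) for k in range(3)],
    counts=[[1, 0], [2, 1], [0, 3]],
    groups=["hospital_a", "hospital_b"],
    interval=7,
)
print(inc.pool().counts)      # [[1], [3], [3]]
print(inc.cumulate().counts)  # [[1, 0], [3, 1], [3, 4]]
```

Keeping the counts as a matrix with one column per group is what makes pooling and cumulating simple row and column operations.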
This package facilitates the manipulation of incidence objects by providing a set of handler functions for the most common tasks. The function
The function
In line with RECON’s development guidelines, the incidence package is thoroughly tested via automatic tests implemented using testthat (Wickham, 2011), with an overall coverage nearing 100% at all times. We use the continuous integration services Travis CI and AppVeyor to ensure that new versions of the code maintain all existing functionalities and give expected results on known datasets, including matching reference graphics via the visual regression testing implemented in vdiffr (Henry et al., 2018). Overall, these practices aim to maximise the reliability of the package and to support its sustainable development and maintenance over time.
Modeling utilities
Many different approaches can be used to model, and possibly derive predictions from, incidence data (e.g. Cori et al., 2013; Nouvellet et al., 2018; Wallinga & Teunis, 2004), and these are best implemented in separate packages (e.g. Cori et al., 2013). Here, we highlight three simple functionalities in incidence for estimating parameters via modeling or bootstrap, and the two specialized data classes used to store the models and parameter estimates.
As a basic model, we implement the simple log-linear regression approach in the function
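The log-linear model treats incidence as exponential in time, log(y) = r * t + b, so that r is the growth rate and log(2) / r the doubling time. The Python sketch below fits this model by ordinary least squares on log counts. It is an illustration, not the package's R code: the function names are hypothetical, and skipping zero-count bins (whose logarithm is undefined) is an assumption of the sketch.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(times: Sequence[float],
                   counts: Sequence[int]) -> Tuple[float, float]:
    """Ordinary least squares fit of log(count) = r * t + b; returns (r, b).

    Bins with zero cases are skipped because log(0) is undefined.
    """
    pts = [(t, math.log(y)) for t, y in zip(times, counts) if y > 0]
    if len(pts) < 2:
        raise ValueError("need at least two non-zero counts")
    n = len(pts)
    t_mean = sum(t for t, _ in pts) / n
    y_mean = sum(y for _, y in pts) / n
    sxx = sum((t - t_mean) ** 2 for t, _ in pts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in pts)
    r = sxy / sxx                  # growth rate per time unit
    b = y_mean - r * t_mean        # intercept on the log scale
    return r, b


def doubling_time(r: float) -> float:
    """log(2) / r: a doubling time if r > 0; if r < 0 its absolute
    value is the halving time."""
    if r == 0:
        raise ValueError("r == 0: incidence is neither growing nor declining")
    return math.log(2) / r


# Usage: counts doubling every 2 time units.
r, b = fit_log_linear([0, 2, 4, 6], [1, 2, 4, 8])
print(round(r, 4), round(doubling_time(r), 4))  # 0.3466 2.0
```

Because the fit is linear on the log scale, a negative r from a declining phase yields a halving time through the same formula.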
In the presence of both growing and decreasing phases of an epidemic, the date representing the peak of the epidemic can be estimated. In incidence, this can be done in two ways. The function
The
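Two common ways of locating a peak can be sketched in Python: take the time bin with the highest observed count, or resample cases from the observed epicurve (a bootstrap) and examine how the peak position varies across replicates. The function names, the default of 1,000 replicates and the 2.5% and 97.5% percentile interval are assumptions of this sketch rather than the package's documented behaviour. Results are bin indices, which map back to dates through the breaks.

```python
import random
from typing import List, Sequence, Tuple


def observed_peak(counts: Sequence[int]) -> int:
    """Index of the bin with the most cases (the earliest one if tied)."""
    return max(range(len(counts)), key=lambda i: counts[i])


def bootstrap_peak(counts: Sequence[int], n_boot: int = 1000,
                   seed: int = 1) -> Tuple[int, int]:
    """Approximate 95% interval for the peak bin by resampling cases.

    Each replicate redraws sum(counts) cases with replacement; a case
    lands in bin i with probability counts[i] / sum(counts).
    """
    rng = random.Random(seed)
    total = sum(counts)
    peaks: List[int] = []
    for _ in range(n_boot):
        draws = rng.choices(range(len(counts)), weights=counts, k=total)
        resampled = [0] * len(counts)
        for i in draws:
            resampled[i] += 1
        peaks.append(observed_peak(resampled))
    peaks.sort()
    return peaks[int(0.025 * (n_boot - 1))], peaks[int(0.975 * (n_boot - 1))]


# Usage on a small hypothetical epicurve.
counts = [1, 3, 8, 20, 12, 5, 2]
print(observed_peak(counts))  # 3
print(bootstrap_peak(counts, n_boot=200))
```

The observed peak is a single point estimate; the bootstrap adds a rough measure of how much that estimate could shift given the number of cases.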
Operation
The minimal system requirement for running this package is R version 3.1.
Use cases
Two worked examples are used to demonstrate the functionality and flexibility of the incidence package. The first example illustrates how to compute and manipulate stratified weekly incidence directly from a line-list, while the second example shows how to import pre-computed daily incidence and fit a log-linear model to estimate the growth rate (r) and doubling time for the growing phase.
Example 1: computing and manipulating stratified weekly incidence
In this first example, we use the dataset
1) Importing data
First, we load the dataset
2) Building the incidence object
The weekly incidence stratified by hospitals is computed by running the function
The generic
Note that when weekly incidence is computed from dates, as in this example, ISO 8601 standard weeks are used by default with the argument
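The following Python sketch shows one way to compute stratified weekly incidence from a line-list, binning onset dates by the Monday that starts their ISO 8601 week (`date.isoweekday()` returns 1 for Monday). Matching the default behaviour described for Figure 2, a missing group is kept as its own "NA" column, and weeks without cases are retained as zero rows. The function names and example data are hypothetical; this is not the package's implementation.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, List, Optional, Tuple


def iso_week_start(d: date) -> date:
    """Monday of the ISO 8601 week containing d."""
    return d - timedelta(days=d.isoweekday() - 1)


def weekly_incidence(onsets: Iterable[date],
                     groups: Iterable[Optional[str]]
                     ) -> Tuple[List[date], List[str], List[List[int]]]:
    """Count cases per ISO week and group; a missing group becomes "NA"."""
    tally: Counter = Counter()
    for d, g in zip(onsets, groups):
        tally[(iso_week_start(d), "NA" if g is None else g)] += 1
    if not tally:
        return [], [], []
    first = min(w for w, _ in tally)
    last = max(w for w, _ in tally)
    # Every week between the first and last case, including empty ones.
    weeks = [first + timedelta(weeks=k)
             for k in range((last - first).days // 7 + 1)]
    labels = sorted({g for _, g in tally})
    counts = [[tally[(w, g)] for g in labels] for w in weeks]
    return weeks, labels, counts


# Usage: four cases, one with no recorded hospital.
onsets = [date(2014, 4, 7), date(2014, 4, 9), date(2014, 4, 13),
          date(2014, 4, 28)]
hospitals = ["hospital_a", None, "hospital_a", "hospital_b"]
weeks, labels, counts = weekly_incidence(onsets, hospitals)
print(labels)  # ['NA', 'hospital_a', 'hospital_b']
print(counts)  # [[1, 2, 0], [0, 0, 0], [0, 0, 0], [0, 0, 1]]
```

Sunday 13 April 2014 falls in the same ISO week as Monday 7 April, which is why both land in the first row.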
Figure 2.
Weekly epicurves stratified by hospitals for the simulated outbreak of EVD.
3) Manipulate the incidence object
In the above visualisation, it can be difficult to see what the dynamics were in the early stages of the epidemic. If we want to see the first 18 weeks of the outbreak in the four major hospitals, we can use the [ operator to subset the rows and columns, which represent weeks and hospitals, respectively, in this particular incidence object.
Here, because of the small number of cases in the first few weeks, we have also highlighted each case using
Figure 3.
Weekly epicurves stratified by hospitals representing the first 18 weeks of the simulated outbreak of EVD.
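Subsetting by weeks and hospitals amounts to selecting rows and columns of the dates-by-groups count matrix. A hypothetical Python helper illustrating this idea, keeping the first n time bins and a chosen list of groups in the requested order, might look like this; it is a sketch of the operation, not the behaviour of the package's `[` method.

```python
from typing import List, Sequence, Tuple


def subset(weeks: Sequence, labels: Sequence[str],
           counts: Sequence[Sequence[int]], n_first: int,
           keep: Sequence[str]) -> Tuple[list, List[str], List[List[int]]]:
    """Keep the first n_first time bins (rows) and the named groups
    (columns), in the order given by keep."""
    cols = [list(labels).index(g) for g in keep]  # ValueError if unknown
    rows = range(min(n_first, len(weeks)))
    return ([weeks[i] for i in rows],
            [labels[j] for j in cols],
            [[counts[i][j] for j in cols] for i in rows])


# Usage with a small hypothetical matrix.
weeks = ["w1", "w2", "w3"]
labels = ["NA", "hospital_a", "hospital_b"]
counts = [[1, 2, 0], [0, 4, 1], [2, 0, 3]]
print(subset(weeks, labels, counts, n_first=2,
             keep=["hospital_b", "hospital_a"]))
# (['w1', 'w2'], ['hospital_b', 'hospital_a'], [[0, 2], [1, 4]])
```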
As shown in Figure 2, the missing hospital name (NA) is treated as a separate group, resulting from the default of the argument
Example 2: importing pre-computed daily incidence and fitting log-linear model
The datasets
1) Import pre-computed daily incidence
Figure 4.
(A) Stratified and (B) pooled daily incidence plots of ZVD in Colombia, September 2015 to January 2016.
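When counts have already been aggregated, as with the daily Zika data in this example, building stratified and pooled incidence is mainly a matter of aligning the series on one daily axis and summing across groups. The Python sketch below is illustrative only: the function name and the town labels are hypothetical, and treating dates absent from a series as zero cases is an assumption.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def combine_precomputed(series: Dict[str, Dict[date, int]]
                        ) -> Tuple[List[date], List[str], List[List[int]], List[int]]:
    """Align pre-computed daily counts from several groups on one daily axis.

    Returns the dates, the group labels, the stratified count matrix and the
    pooled counts (row sums). Dates absent from a series count as zero.
    """
    labels = sorted(series)
    all_dates = [d for s in series.values() for d in s]
    start, end = min(all_dates), max(all_dates)
    dates = [start + timedelta(days=k) for k in range((end - start).days + 1)]
    counts = [[series[g].get(d, 0) for g in labels] for d in dates]
    pooled = [sum(row) for row in counts]
    return dates, labels, counts, pooled


# Usage with two hypothetical towns.
town_a = {date(2015, 10, 1): 3, date(2015, 10, 3): 5}
town_b = {date(2015, 10, 2): 1, date(2015, 10, 3): 2}
dates, labels, counts, pooled = combine_precomputed(
    {"town_a": town_a, "town_b": town_b})
print(counts)  # [[3, 0], [0, 1], [5, 2]]
print(pooled)  # [3, 1, 7]
```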
As shown in Figure 4B, the pooled daily incidence in Colombia shows approximately exponential phases before and after the epidemic peak. Therefore, we fit two log-linear regression models around the peak to characterize the epidemic dynamics of ZVD in Colombia. Such models can be separately fitted to the two phases of the epicurve of
The returned object
The predictions and their 95% CIs from the two incidence_fit objects, ‘before’ and ‘after’, can be added to the existing incidence plot of
Figure 5.
Two log-linear regression models fitted before and after the optimal splitting date.
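One way to choose a splitting date, sketched here in Python as an illustration rather than the package's exact criterion, is to try each candidate split, fit a log-linear model to each side, and keep the split with the highest mean R² on the log scale. The minimum of three bins per side, the use of plain rather than adjusted R², and dropping zero counts are all assumptions of this sketch, and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def _ols(ts: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least squares fit of y = r * t + b; returns (r, b, R^2)."""
    n = len(ts)
    tm, ym = sum(ts) / n, sum(ys) / n
    sxx = sum((t - tm) ** 2 for t in ts)
    sxy = sum((t - tm) * (y - ym) for t, y in zip(ts, ys))
    syy = sum((y - ym) ** 2 for y in ys)
    r = sxy / sxx
    # A constant series is fitted exactly by a flat line, so report R^2 = 1.
    r2 = 1.0 if syy == 0 else sxy * sxy / (sxx * syy)
    return r, ym - r * tm, r2


def best_split(counts: Sequence[int], min_bins: int = 3
               ) -> Optional[Tuple[float, int, float, float]]:
    """Try each split index s (before = bins [0, s), after = [s, n)), fit
    log(count) ~ t on both sides and keep the split with the highest mean R^2.

    Returns (mean R^2, s, r_before, r_after), or None if no split is usable.
    """
    best = None
    for s in range(min_bins, len(counts) - min_bins + 1):
        fits = []
        for lo, hi in ((0, s), (s, len(counts))):
            pts = [(t, math.log(counts[t])) for t in range(lo, hi) if counts[t] > 0]
            if len(pts) < 2:
                break
            ts, ys = zip(*pts)
            fits.append(_ols(ts, ys))
        if len(fits) < 2:
            continue
        score = (fits[0][2] + fits[1][2]) / 2
        if best is None or score > best[0]:
            best = (score, s, fits[0][0], fits[1][0])
    return best


# Usage: doubling up to a peak at bin 4, then halving.
result = best_split([1, 2, 4, 8, 16, 8, 4, 2, 1])
print(round(result[0], 6), round(result[2], 4), round(result[3], 4))
# 1.0 0.6931 -0.6931
```

On this symmetric curve, splitting on either side of the peak bin leaves a perfect exponential on each side, giving a growth rate of log(2) before and -log(2) after, which corresponds to a halving time of one time unit.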
Conclusion
This article has described the package incidence and its features, which include three lightweight data classes and utilities for data manipulation, plotting, and modeling. We have shown that an incidence object can flexibly be defined at different datetime intervals with any number of stratifications and be subset by groups or dates. The most important aspects of this package are usability and interoperability. For both field epidemiologists and academic modellers, the data received are often in the form of line-lists where each row represents a single case. We have shown that these data can easily be converted to an incidence object and then plotted with sensible defaults in two lines of code.
We have additionally shown that because the data are aggregated into a matrix of counts, it becomes simple to perform operations related to peak-finding, model-fitting, and export (e.g. using
Software availability
incidence is available from: https://www.repidemicsconsortium.org/incidence
Code to reproduce all figures can be found by running
Source code available from: https://github.com/reconhub/incidence
Archived source code as at time of publication: https://doi.org/10.5281/zenodo.2540217 (Jombart et al., 2019)
Software license: MIT
Data availability
Underlying data
Datasets used in the worked examples are from the outbreaks package:
ebola_sim_clean: https://github.com/reconhub/outbreaks/blob/master/data/ebola_sim_clean.RData
zika_girardot_2015: https://github.com/reconhub/outbreaks/blob/master/data/zika_girardot_2015.RData
zika_sanandres_2015: https://github.com/reconhub/outbreaks/blob/master/data/zika_sanandres_2015.RData
Copyright: © 2019 Kamvar ZN et al. This work is published under http://creativecommons.org/licenses/by/4.0/.
Abstract
The epidemiological curve (epicurve) is one of the simplest yet most useful tools used by field epidemiologists, modellers, and decision makers for assessing the dynamics of infectious disease epidemics. Here, we present the free, open-source package incidence for the R programming language, which allows users to easily compute, handle, and visualise epicurves from unaggregated linelist data. This package was built in accordance with the development guidelines of the R Epidemics Consortium (RECON), which aim to ensure robustness and reliability through extensive automated testing, documentation, and good coding practices. As such, it fills an important gap in the toolbox for outbreak analytics using the R software, and provides a solid building block for further developments in infectious disease modelling. incidence is available from https://www.repidemicsconsortium.org/incidence.