Using a latent Hawkes process for epidemiological

Full text

Turn on search term navigation

Introduction

The novel coronavirus disease (COVID-19) has been declared a Global Health Emergency of International Concern with over 557 million cases and 6.36 million deaths as of 3 August 2022 according to the World Health Organization. In the absence of vaccines, countries initially followed mitigation strategies or countermeasures to prevent the rapid spread of COVID-19, such as social distancing, quarantine, mask wearing, and lock-downs.

A large number of studies have been carried out to understand the spread of COVID-19, forecast new cases and when the peak of the pandemic will occur, and investigate “what-if-scenarios”. For example, Ferguson et al. [1] presented the results of epidemiological modelling looking at a variety of nonpharmaceutical interventions. Several compartmental models [2–5] using ordinary differential equations (ODE) have been proposed for modelling the spread of COVID-19. Various models using Hawkes processes [6–12], widely used to model contagion patterns, have been introduced as an alternative to ODE models. Others have used a Poisson autoregression model of the daily new observed cases [13] and a Bayesian model linking the infection cycle to observed deaths [14].

We introduce a novel epidemic model using a latent Hawkes process [15] with temporal covariates for modelling the infections. Unlike other Hawkes models, the Hawkes process is used as a latent, i.e. for modelling the actual unobserved infection cases. Observations, such as reported infection cases, are then modelled as random quantities driven by the latent Hawkes process. Other models that use the latent processes in epidemiological models (e.g. [14]) usually have time-aggregated counts of infections as latent process, i.e. the latent process works on a discrete scale. We propose using a Kernel Density Particle Filter (KDPF) [16, 17] for inference of both latent cases and reproduction number and for predicting the new cases in the near future. It is feasible to employ particle filter type algorithms, like the KDPF, because the computational effort is linear to the number of infections. Modelling the infections via a Hawkes process allows us to estimate by whom an infected individual was infected. We demonstrate the performance of the proposed algorithm on synthetic data and COVID-19 reported cases in various local authorities in the UK. The methods [10, 18] provide similar estimates of reproduction number to the proposed algorithm. The ability of our model to estimate individual latent cases and reveal epidemic dynamics provides an important advantage over other models.

Related work

The Hawkes process is a well known self-exciting process in which the intensity function depends on all previous events assuming infinite population that allow for parametric or non-parametric estimation of the reproduction number (that is, the expected number of infections triggered per infected individual). Hawkes processes have been widely used in numerous applications such as social media, criminology and earthquake modelling. In this section, we present the application of the Hawkes processes in the modelling of COVID-19.

First, we briefly review basic compartmental models and their connection with Hawkes process and COVID. The Susceptible-Infected-Recovered (SIR) and Susceptible-Exposed-Infected-Recovered (SEIR) models are the two basic compartmental epidemic models for modelling the spread of infectious disease [5, 19]. The SIR model defines three classes of individuals: those susceptible to infection (S), those currently infected (I) and those recovered (R). The SEIR model involves an additional compartment (E) that models the exposed individuals without having obvious symptoms. For many diseases, including COVID-19, there is an incubation period during which exposed individuals to the virus may not be as contagious as the infectious individuals. A variant of the SEIR model called SuEIR was introduced by Zou et al. [5] for modelling and forecasting the spread of COVID. The SuEIR compared to SEIR has an additional compartment (u) that models the unreported cases. Estimates based on compartmental models can be unreliable, as they are highly sensitive to initial conditions and parameters such as transmission and recovery rates [8].

A stochastic formulation of SIR called Stochastic SIR [20] is a point process having events that are either the recovery times or the infection times of individuals with exponentially distributed recovery times. Rizoiu et al. [21] introduced the SIR-Hawkes process (also known as HawkesN), which is a generalization of the Hawkes process concerning finite population. They showed that the conditional intensity of the SIR-Hawkes process with no background events and exponential infectious period distribution is identical to the expected conditional intensity of Stochastic SIR with respect to the recovery period distribution. The Hawkes process with gamma infectious period distribution can approximate stage compartment models if the average waiting times in the compartments follow an independent exponential distribution [12, 22].

Kresin et al. [7] claim that although the SEIR model is mostly used for COVID modelling compared to the Hawkes process, a Hawkes model offers more accurate forecasts. Specifically, they suggest a SEIR-Hawkes model in which the intensity of newly exposed cases is a function of infection times and size of the population. Chiang et al. [12] introduced a Hawkes process model of COVID-19 that estimates the intensity of cases and the reproduction number. The reported cases are modelled via a Hawkes process. The reproduction number is estimated via a Poisson regression with spatial-temporal covariates including mobility indices and demographic features. Based on the branching nature of the Hawkes process, Escobar [8] derived a simple expression for the intensities of reported and unreported COVID-19 cases. The key to this model is that at the beginning of a generation the infectious will either (1) be registered, (2) not be registered but continue being contagious, or (3) recover with fixed probabilities. However, we believe that the probability of remaining contagious and not being registered infectious should be a decreasing function of time and not fixed.

Garetto et al [6] proposed a modulated marked Hawkes process for modelling the spread of COVID-19 under the impact of countermeasures. Each mark corresponds to a different class of infectious individuals with specific kernel functions. Three classes of infectious are considered: symptomatic, asymptomatic and superspreader, for obtaining the average intensity function and the average total number of points up to a specific time. Symptomatic people are those who will develop evident symptoms and by extension they will be quarantined. Asymptomatic people are those who will not develop strong enough symptoms to be quarantined. Superspreaders are individuals who exert a high infection rate but do not get quarantined. The model estimates the reproduction number taking into account the amount of recourses employed by the health service to discover the infected population, the countermeasures, as well as the stages that all infectious go through: random incubation time, presymptomatic period, random disease period and random residual phase.

Koyama et al. [10] developed a discrete-time Hawkes model for estimating the temporally changing reproduction number, and hence detecting the change points via assuming a negative binomial distribution for the daily cases. Further analysis in [9, 23] examined the daily death data to avoid the issues raised from the reported cases. Browning et al. [9] modelled the reported daily deaths using a discrete-time Hawkes process, where the cases are assumed Poisson distributed. They considered one fixed change point that breaks the period of analysis into two phases: the initial period where the virus is spreading rapidly and the period after the introduction of preventative measures. The model provides accurate predictions for short-time intervals.

All the aforementioned stochastic Hawkes models use the Hawkes process for modelling either the reported infections or the newly exposed cases. Herein, we provide a novel epidemic model for the infections using a latent Hawkes process with temporal covariates and, in turn, the reported cases using a probability distribution driven by the underlying Hawkes process. Working on a continuous scale offers the inference of individual latent cases and reveals unobserved transmission paths of the epidemic. We apply particle methods for inferring the latent cases and the reproduction number and predicting observed cases over short time horizons. The simulation analysis shows that the estimated reproduction number and the intensity of latent cases depict the epidemic’s development and capture the trajectory of cases.

Methods

Model

We introduce a novel epidemic model using a latent Hawkes process of infections that then trigger a process of reported infection cases.

We focus on an infinite homogeneous population and restrict our attention to an epidemic process over a horizon [T₀, T), T₀ < T, in which we assume immunity to re-infection. This immunity is a reasonable assumption over the time scales we consider. We break the horizon [T₀, T) into k subintervals for j = 1, .., k with T_k = T. We assume that the epidemic is triggered by a set of infectious individuals at the beginning of the process, the times of their infections denoted by a finite set .

The epidemic process is seen as a counting process N(t) with a set of jump times and intensity given byfor t > 0 with being the set of all infection events prior to time t. The kernel h(t − t_i) represents the relative infectiousness at time t of an infection at time t_i. We assume that the transition kernel h is a probability density function with non-negative real-valued support: h: [0, ∞)→[0, ∞) and . The process R(t) represents the instantaneous reproduction number that is the average number of newly infected people that each infected individual would infect if the conditions, such as interventions and control measures for restriction of epidemic, remained as they were at time t [18].

It is natural to see the reported infections as a counting process M(t) with a set of jump times and intensity of observed cases at time τ as a function of the times of infection up to time τ, namely(1)for τ > 0, where β is the expected number of observed cases per infected individual at time τ (also known as ascertainment rate). The transition kernel g(τ − t_i) represents the relative delay between the infection at time t_i and the time at τ the infection is detected. Similar to the transition kernel of latent cases h, we specify the transition kernel of observed cases g to be a probability density function with non-negative real-valued support.

M(t) is usually only observable in daily or weekly aggregates. We will use as aggregation intervals and let Y_n be the number of reported cases in . We model Y_n via a distribution G having mean μ_n equal to the expected observed cases in given by

The usual options of G are Negative Binomial (NB) [10, 24] and Poisson distribution [9, 18]. We model the reproduction number R(t) as a stepwise function having as many weights as the number of subintervals, that is,where {R_n} is assumed to be a Markov process. Usually, a random walk on a logarithmic scale [25] or a normal scale [10] is imposed as a prior on the weights {R_n}.

The model is described by the equations: (2) (3) (4) (5) (6)

Inference algorithm

Given a set of observed infections {Y₁, .., Y_k}, we seek to infer the counting process N(t) and the reproduction number R(t).

The proposed epidemic model described by the Eqs (2)–(6) is seen as a state-space model with a latent state process {X_n: 1 ≤ n ≤ k} and an observed process {Y_n: 1 ≤ n ≤ k}. Each hidden state X_n consists of the reproduction number’s weight R_n associated to and the set of latent cases falling into . The time-constant parameters are the parameters associated with the distribution G and the prior imposed on the weights . We apply a KDPF [16, 17] for inferring the counting process N(t), the weights , and the time-constant parameters. We consider that the ascertainment rate β is given.

We focus on illustrating the performance of our model on COVID-19. As the COVID-19 reported cases are subject to erroneous observation and for the data we observe that the sample variance is larger than the sample mean, we model the observed cases Y_n via a negative binomial distribution (NB) with mean μ_n and dispersion v > 0. We use the following form of the negative binomial distributionwith mean E(Y_n) = μ_n and variance var(Y_n) = μ_n(1 + vμ_n). Before we discuss the KDPF, we define the transition kernels of the observed and latent cases and the prior on weights for COVID-19. We also suggest a simple method to initialize .

Transition kernels.

The dynamics of the latent and observed cases are determined by the generation interval (GI) and incubation period (IP) [26]. The generation interval is the time interval between the time of infection of the infector (the primary case) and that of the infectee (the secondary case generated by the primary case). The incubation period is the time interval between the infection and the onset of symptoms in a specific case. Zhao et al. [27] assume that the GI and IP follow a gamma distribution. They infer that the mean and SD of GI are equal at 6.7 days and 1.8 days and those of IP at 6.8 and 4.1 days by using a maximum likelihood estimation approach and contact tracing data of COVID-19 cases without considering COVID-19 variants. We follow the same assumption for the GI (namely, the transition kernel of latent cases is a gamma density with a mean at 6.7 days and SD of 1.8 days). We model the time interval between the observed time and actual time of infection as a gamma density with a mean at 8.8 days and SD of 4.1 days (that is, the transition kernel of observed cases is a gamma density having mean equal at 8.8 days and SD of 4.1 days). For the transition kernel of the observed events, we adopt the values inferred by Zhao et al. [27] for IP with a slightly increased mean to consider the necessary time for conducting a test against COVID-19. Fig 1 illustrates the transition kernels. We also conduct a sensitivity analysis in the mean of GI and the period between observed and actual infection times using the real cases in the local authority Ashford (19/12/2021—9/4/2022) [28] available in S1 Appendix.

[Figure omitted. See PDF.]

Set of infectious at the beginning of the process, .

We adopt a heuristic approach to initialize . The transition kernel of latent cases illustrated in Fig 1 shows that a latent case at t_w can influence the latent intensity at t if t_w has occurred at most 21 days before t. Otherwise, the influence of t_w is negligible. Therefore, as the history of the process, we consider the latent cases of 21 days/3 weeks before the beginning of the process. The mode of the transition kernel of the observed cases equal to 6.216 demonstrates that an event is most likely to be observed 7 days after the actual infection time. Considering the observed cases are daily, we initialize the history of latent case, by uniformly spreading on the day −i the number of cases occurred on the day (−i + 7) times 1/β. In simulation analysis, we propose initialization of when we deal with weekly reported cases.

Imposed prior on weights .

A geometric random walk (RW) is imposed as prior on weights :

We impose a gamma prior on the noise of RW ϵ_n with equal shape and rate at d. This induces that the weight R_n is gamma distributed with a mean equal to R_n−1 and standard deviation . The stronger fluctuations in the observed data, the more flexible modelling we need. Smaller values of d have higher standard deviation and lead to a wider range of possible values of R_n increasing the flexibility of the model.

Kernel Density Particle Filter.

We apply a KDPF (Algorithm 2) for inferring the counting process N(t), the weights , and the time-constant parameters. The time-constant parameters for modelling COVID-19 infections are the shape d of the noise ϵ_n and the dispersion parameter v.

The KDPF builds on the auxiliary particle filter (APF) [29–31] by adding small random perturbations to all the parameter particles to reduce the sample degeneracy by modelling the time-constant parameters as random quantities and their posterior via a mixture of normal distributions. We assume independence among the time-constant parameters, and, following Sheinson et al. [16], we use logarithms for the time-constant parameters, as they have positive support:

The posteriors p(log d_n+1|Y_1:(n+1)) and p(log v_n+1|Y_1:(n+1)) are smoothly approximated via a mixture of normal distributions weighted by the sample weights w_jn given bywhere is a Gaussian pdf with mean μ and variance σ². The KDPF uses a tuning parameter Δ ∈ (0, 1] and two quantities as a function of that parameter: h² = 1−((3Δ − 1)/(2/Δ))² and a² = 1 − h². The parameter Δ is typically taken to be between 0.95 and 0.99 for reducing the chance of degeneracy [16, 32].

The mean values and the variances of the posteriors of time-constant parameters are defined as follows [16, 32]: with and .

Following Sheinson et al. [16], we define the initial densities of parameters d and v to be log-normal: considering that d_min ≤ d ≤ d_max and v_min ≤ v ≤ v_max. The transition densities of the time-constant parameters are given by

The initial density of the hidden process is given bywhile the transition density is given by

U(R₁;α, b) denotes that R₁ is uniformly distributed within the interval [α, b].

Sampling the latent cases.

We sample the latent cases falling into the subinterval by applying Algorithm 1, which is a simulation procedure based on the branching structure of the Hawkes process [15]. The proposed algorithm is a superposition of Poisson processes in the interval ; the descendants of each latent event at t_i form an inhomogeneous Poisson process with intensityfor t > t_i and t ∈ [T_n−1, T_n). This induces that:

* The number of events n_i triggered by an event at t_i in the interval is Poisson distributed with parameter

* The arrival times of the n_i descendants are t_i + E_i with E_i being iid random variables with pdf the truncated distribution h(t) in [max(t_i, T_n−1), T_n).

The computational cost of Algorithm 1 is linear to the number of infections falling into (η + 1) consecutive subintervals, that is , with η being the number of former subintervals that influence the latent cases falling into determined by the transition kernel of latent cases. The O-notation denotes the asymptotic upper bound [33].

Algorithm 1 Sample

Who infected whom.

The Hawkes process is an excellent option for modelling the evolution of an epidemic due to its mutually exciting nature, making it feasible to estimate by whom an infected individual was infected. Bertozzi et al. [11] describe how we can infer the primary infection i that triggered a secondary infection j using a self-exciting branching process. The parent of each infection j falling into is assumed to be sampled from a multinomial distribution parameterized by π_j, where withbeing the probability of secondary infection j having been caused by primary infection i, where , and η the number of former subintervals that influence the latent cases falling into determined by the transmission kernel of latent cases (η = 21 days for COVID-19). Alternatively, by recording the parent of each latent infection at step 7 of Algorithm 1, the proposed model can show the branching structure of the process. This approach increases the computational complexity of the algorithm, as more memory units will be required.

Computational complexity.

The computational cost of each propagation step (steps 7 and 12 of Algorithm 1) at state (interval) n is equal to the cost of Algorithm 1 times the number of particles (N), that is . The cost of finding weights (steps 8 and 13 of Algorithm 2) at state (interval) n is also . Hence, the computational cost of Algorithm 2 over all states (intervals) is . is the set of latent cases falling into subinterval and . The algorithm is easily parallelized over N.

Model complexity.

The set of parameters for inference includes the two time-constant parameters, d governing the variability of the noise in the reproduction number and v the dispersion parameter of the observed counts, the latent process, , and the steps of instantaneous reproduction number, . There is a set of model parameters, including the ascertainment rate β, the transition kernels of latent and observed cases, which we consider as given. The set of infectious at the beginning of the process, , is described applying the heuristic approach described above. We rely on the Bayesian paradigm for regularizing the parameters for inference.

Fixed-lag smoothing densities.

As the resampling step leads to path degeneracy, it is difficult to obtain a good approximation of the smoothing density p(x_1:T|y_1:T) for large T via SMC. Therefore, we use SMC to sample from the fixed-lag smoothing densities with lag L. Resampling results in replicating samples, and in the long run results in a lack of diversity called particle degeneracy [34]. We apply the multinomial resampling step when the Effective Sample Size (ESS) is less than the 80% of the number of particles, to avoid unnecessary resampling steps.

Algorithm 2 Kernel density particle filter

Results

Simulation analysis

We carried out a simulation study on synthetic data to illustrate the performance of the KDPF (Algorithm 2) for inferring the intensity of latent cases, the reproduction number and the time-constant parameters.

Two different scenarios illustrated in Fig 2 were simulated as follows:

* Scenario A: The process is triggered by 1745 infectious and the times of their infections, , are uniformly allocated in 3 weeks ([0, 21)) with a day being the time unit. We generate weekly latent and observed cases according to the model Eqs 2–6 for weeks 1–20 ([21, 161)) given , v = 0.014, d = 14.44, β = 0.5 and R₁ = 1.79. We are interested in inferring the latent cases in weeks 4−19 with being the set of times of latent infections in weeks 1–3. Using the generated observed cases in weeks 2–4, we estimate the latent infections in weeks 1–3 as follows: The latent cases in the week i are equal to the number of observed events in the week (i + 1) times 1/β, and are spread uniformly in [(i − 1) × 7 + 21, i × 7 + 21) for 1 ≤ i ≤ 3. We assume α = 1, b = 2, d_min = 10, d_max = 20, v_min = 0.0001 and v_max = 0.5. The ground truth is characterized by consisting of 4855 seeds, while the estimated seeds are 4228. The observed cases in weeks 4–20 are 17540 (see Fig 2).

* Scenario B: The process is triggered by 1176 seeds and generated as described above. We assume d = 15.28, v = 0.001, β = 0.5, R₁ = 1.51, d_min = 10, d_max = 20, v_min = 0.001, v_max = 0.5, α = 1 and b = 2. The observed cases in weeks 4–20 are 15448 (see Fig 2).

[Figure omitted. See PDF.]

We deal with 16 hidden states . Each state X_n is associated with the latent cases falling during the week and the parameter R_n associated with that week. We infer the latent intensity λ^N(t) and the weights as well as the weekly latent cases via the particle sample derived by drawing samples from the smoothing density with lag equal to 4.

Figs 3 and 4 illustrate the estimated latent intensity, the estimated weekly hidden cases and the estimated weights of the reproduction number for both scenarios using 40000 particles. We note that the 99% Credible Intervals (CIs) of the time-constant parameters include the actual values of the parameters. The simulation analysis shows that the KDPF approaches well the ground truth.

[Figure omitted. See PDF.]

The vertical dotted lines show the beginning of each week in the period we examine. (a) . (b) Weekly latent cases. (c) Intensity of latent cases.

[Figure omitted. See PDF.]

The vertical dotted lines show the beginning of each week in the period we examine. (a) . (b) Weekly latent cases. (c) Intensity of latent cases.

To confirm the convergence of posterior estimates of weights and weekly hidden cases concerning the number of particles (N), we find the associated Monte Carlo Standard Errors (MCSEs) that give a sense of the variability of particle mean per state. The MCSEs of the average of posterior means of weights and weekly latent cases are given byandwhere var(z) is the variance of z and Y_i the aggregate latent cases in i_th week. The MCSE verifies the convergence of posterior estimates concerning the number of particles (see Tables 1 and 2).

[Figure omitted. See PDF.]

Finally, we compare the performance of the KDPF (Algorithm 2), APF (Algorithm 3), bootstrap filter (BF) (Algorithm 4) and particle marginal Metropolis-Hastings (PMMH) (Algorithm 5) [35] for inferring the latent intensity λ^N(t) and the reproduction number R(t) illustrated in a new simulation scenario C (see Fig 5). Scenario C concerns a process triggered by 661 infectious and generated similar to scenario A assuming that α = 0.5, b = 2, d = 15.11, v = 0.01, R₁ = 1.57, d_min = 10, d_max = 20, v_min = 0.0001 and v_max = 0.5. The time-constant parameters d and v are known for BF and APF. We used 10000 iterations of the PMMH sampler with a burn-in of 5000 iterations. We use APF using 50 particles as an SMC sampler. The average acceptance ratio is about 0.1844 resulting in a Markov chain that mixes well. For the KDPF, Δ was set to 0.99.

[Figure omitted. See PDF.]

The vertical dotted lines show the beginning of each week in the period we examine. (a) The weekly observed cases. (b) The estimated intensity of latent cases. (c) The estimated weights . (d) The estimated weekly hidden cases.

We find the Average Absolute Error (AAE) and the Root Mean Square Error (RMSE) of the computed estimates defined in Forecasting. Table 3 shows the errors related to KDPF, APF, BF and PMMH for scenario C. The errors associated with KDPF are comparable to those obtained using BF and APF for which the time-constant parameters are known. The performance of KDPF compares well with PMMH, having the advantage that it is a more computationally efficient algorithm than PMMH.

[Figure omitted. See PDF.]

Algorithm 3 Auxiliary particle filter

Algorithm 4 Bootstrap filter

Real data

We apply the KDPF (Algorithm 2) to real cases in the local authorities: Leicester (4/9/2021—24/12/2021) [36], Kingston upon Thames (11/12/2021—8/4/2022) [37] and Ashford (19/12/2021—9/4/2022) [28] available from the government in the UK. Fig 6 illustrates the daily and weekly observed cases in the local authorities. We deal with 16 hidden states and 16 subintervals ; each subinterval corresponds to the duration of one week. We infer the latent intensity λ^N(t), the reproduction number R(t), and the weekly and daily latent cases via the particle sample derived by drawing samples from the smoothing density with lag equal to 4. We demonstrate that the proposed model can be applied to predict the new observed cases over short time horizons.

[Figure omitted. See PDF.]

(a) The weekly observed cases in Ashford. (b) The daily observed cases in Ashford. (c) The weekly observed cases in Kingston upon Thames. (d) The daily observed cases in Kingston upon Thames. (e) The weekly observed cases in Leicester. (f) The daily observed cases in Leicester.

We assume that the initial reproduction number during the first week is uniformly distributed over the interval from 0.5 to 2. Our initialization includes the 90% Confidence Interval published from the government in the UK: 0.9–1.1 on 4/9/2021 and 11/12/2021, 1–1.2 on 19/12/2021 [38]. We also assume d_min = 1, d_max = 10, v_min = 0.0001 and v_max = 0.5.

Figs 7–9 show the estimated latent and observed intensity, the estimated weekly and daily hidden cases, the estimated reproduction number and the time-constant parameters in the local authorities. We illustrate the intensity of observed cases, approximating via Eq 1. We note that the estimated latent intensity and the estimated latent cases are in agreement with the reported cases. According to the analysis, the instantaneous reproduction number R(t) depicts the pandemic’s development and capture dynamics. For the COVID-19 pandemic, there is a maximum delay of 21 days between the reported and actual infection times, which provides information regarding the progression of the epidemic. As a result, estimates have become more uncertain towards the end of the horizon.

[Figure omitted. See PDF.]

The time interval between two successive pink vertical dashed lines corresponds to a week. (a) The estimated weekly latent cases (posterior median (black line); 99% CI (ribbon)), and the weekly observed cases (cyan line). (b) The estimated daily latent cases (posterior median (black line); 99% CI (ribbon)), and the daily observed cases (cyan line). (c) The estimated reproduction number (posterior median (red line); 95% CI (ribbon)). (d) The estimated intensity of latent cases (posterior median (blue line); 99% CI (ribbon)) and the estimated intensity of observed cases (posterior median (red line); 99% CI (ribbon)). (e) The 99% CI of d. (f) The 99% CI of v.

[Figure omitted. See PDF.]

To assess the performance of our algorithm, we compute the mean absolute percentage error (MAPE) of the computed estimate of weekly observed cases (see Algorithm 6):where Y_i and are the true and estimated weekly observed cases via the posterior median in week i, respectively. The metric is 1.46%, 1.08% and 2.09% for Ashford, Leicester and Kingston upon Thames. Fig 10 shows the estimated weekly observed cases for the local authorities. The analysis demonstrates that our algorithm provides a good approximation of the weekly reported cases.

[Figure omitted. See PDF.]

(a) Ashford. (b) Kingston upon Thames. (c) Leicester.

Algorithm 5 Particle marginal Metropolis-Hastings sampler

Algorithm 6 Estimating the weekly observed cases

We compare the proposed algorithm with two methods of estimating the reproduction number. The method suggested by Cori et al. [18] (EpiEstim) estimates the reproduction number from incidence time series using a Bayesian framework with a gamma distributed prior imposed on the reproduction number. An alternative method suggested by Koyama et al. [10] is a state-space method for estimating the daily reproduction number from a time series of daily reported infections using a random walk prior to the reproduction number and log-normal distribution as the distribution of the serial interval (SI). We assume that the mean and standard deviation of the SI distribution is at 6.9 days and 5.6 days following Zhao et al. [27]. We apply EpiEstim by using the gamma and log-normal distribution as the distribution of SI. Both choices lead to identical results.

Fig 11 shows the weekly average of daily estimates of the reproduction number via posterior median derived by the method of Koyama et al. [10] and the posterior medians of R(t) given by EpiEstim and the proposed algorithm following the course of the pandemic. The method of Koyama et al. [10] and EpiEstim provide similar estimates to those of Algorithm 2 most of the time. Koyama et al. [10] and EpiEstim do not build a delay between reported and actual infection time in their models, which is why there are variations in their estimations compared to our algorithm. Therefore, the reproduction number given by EpiEstim responds later to changes compared to our estimation. Koyama et al. [10] shows a bit less of a time lag, which we conjecture to be due to it working with daily reproduction numbers and cases (which are being shown averaged in Fig 11). In the first week, the estimates of Koyama et al. [10] and EpiEstim have essential higher values than one of the proposed algorithm due to different assumptions about the initialization of the epidemic.

[Figure omitted. See PDF.]

(a) Ashford. (b) Kingston upon Thames. (c) Leicester.

We also compare the estimated rate of latent cases λ^N(t) and observed cases λ^M(t) with the estimated daily number of events derived by Koyama et al. [10]. Fig 12 shows that the expected daily number of events is almost identical to λ^M(t) and in agreement with λ^N(t) after the end of the 3rd week. The differences in the first three weeks are due to different initializations of the methods.

[Figure omitted. See PDF.]

(a) Ashford. (b) Kingston upon Thames. (c) Leicester.

Forecasting.

Using the proposed model, it is also possible to predict the number of new observed cases in the near future, by fitting the model with data up to week and forecasting cases for the week using Algorithm 7. To analyse the performance of this algorithm, we conduct a rolling-window analysis and predict the observed cases in weeks . Table 4 shows the estimated numbers in the local authorities by applying Algorithm 7 and the method introduced by Koyama et al. [10], assuming that the reproduction number remains at the value obtained for the last day. Table 5 shows the metrics MAPE and AAE of the estimated cases via posterior median. The empirical coverage probability of our 80% CIs is about 86%. Our estimates are similar to those given by Koyama et al. [10] most of the time.

[Figure omitted. See PDF.]

Algorithm 7 Predicting the new observed cases in week

Discussion

In this paper, we introduce a novel epidemic model using a latent Hawkes process with temporal covariates. Unlike other Hawkes models, we model the infections via a Hawkes process and the aggregated reported cases via a probability distribution G with a mean driven by the underlying Hawkes process. The usual options of G are Negative Binomial and Poisson distribution. We propose a KDPF for inferring the latent cases and the instantaneous reproduction number and for predicting the new observed cases over short time horizons. We demonstrate the performance of the proposed algorithm on COVID-19.

The analysis of synthetic data shows that KDPF compares well with PMMH, having the advantage that it is a more computationally efficient algorithm than PMMH. We also demonstrate that our predicted new cases, and our inference for the latent intensity, the daily and weekly hidden cases are consistent with the observed cases in various local authorities in the UK. The simulation analysis shows that the proposed algorithm provides comparable estimates of observed case fluctuations compared with those of Koyama et al. [10]. The method of Koyama et al. [10] and EpiEstim provide similar estimates of the reproduction number to the proposed algorithm.

The simulation analysis shows that working with daily reported infections leads to better Effective Sample Sizes using a smaller number of particles, as the data spikes are reduced. According to Cori et al. [18], the estimates of the instantaneous reproduction number are expected to be affected by the selection of the time window size. Large sizes result in more smoothing and reductions in statistical noise, whereas small sizes result in faster detection of transmission changes and more statistical noise. They suggest an appropriate way of choosing the time window size. We have selected a weekly time window to analyse the real data in line with Cori et al. [18].

Uncovering disease dynamics and tracing how and by whom an infected individual was infected is challenging due to unobservable transmission routes [39, 40]. Modelling the infections via a Hawkes process allows us to model infection dynamics.

Isham and Medley [41]; Wallinga et al. [42] contend that it is necessary to account for individual heterogeneities while modelling the transmission of an infectious disease. Individuals vary in their tendency to interact with others; personal hygiene is a key factor in the propagation of diseases; individuals’ community structure and location might be significant in spreading epidemics. The proposed epidemic model can be viewed as a turning point in deriving epidemic models that consider individual heterogeneities and provide insight into underlying dynamics that is the subject of our future work. Future work also considers the inference of ascertainment rate (β), using various transition kernels for modelling the latent and reported infection cases, as well as more sophisticated ways for initializing the set of infectious triggering the epidemic process, .

Supporting information

S1 Appendix.

https://doi.org/10.1371/journal.pone.0281370.s001

Acknowledgments

We thank Professor Young for their constructive comments on this manuscript.

Citation: Lamprinakou S, Gandy A, McCoy E (2023) Using a latent Hawkes process for epidemiological modelling. PLoS ONE 18(3): e0281370. https://doi.org/10.1371/journal.pone.0281370

About the Authors:

Stamatina Lamprinakou

Roles: Conceptualization, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

E-mail: [email protected] (SL); [email protected] (AG)

Affiliation: Department of Mathematics, Imperial College London, London, United Kingdom

Axel Gandy

Roles: Conceptualization, Supervision, Validation, Writing – review & editing

E-mail: [email protected] (SL); [email protected] (AG)

Affiliation: Department of Mathematics, Imperial College London, London, United Kingdom

ORICD: https://orcid.org/0000-0002-6777-0451

Emma McCoy

Roles: Supervision, Validation, Writing – review & editing

Affiliation: Department of Mathematics, Imperial College London, London, United Kingdom

ORICD: https://orcid.org/0000-0001-9033-7472

References

1. Ferguson, N., Laydon, D., Nedjati Gilani, G., Imai, N., Ainslie, K., Baguelin, M., et al. Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. (Imperial College in London, 2020). https://doi.org/10.25561/77482

2. Chen, Y., Cheng, J., Jiang, Y. & Liu, K. A time delay dynamic system with external source for the local outbreak of 2019-nCoV. Applicable Analysis. pp. 1-12 (2020)

3. Wangping J., Ke H., Yang S., Wenzhe C., Shengshu W., Shanshan Y., et al. Extended SIR prediction of the epidemics trend of COVID-19 in Italy and compared with Hunan, China. Frontiers In Medicine. 7 pp. 169 (2020) pmid:32435645

4. Roques L., Klein E., Papaix J., Sar A. & Soubeyrand S. Using early data to estimate the actual infection fatality ratio from COVID-19 in France. Biology. 9, 97 (2020) pmid:32397286

5. Zou, D., Wang, L., Xu, P., Chen, J., Zhang, W. & Gu, Q. Epidemic Model Guided Machine Learning for COVID-19 Forecasts in the United States. MedRxiv. (2020), https://www.medrxiv.org/content/early/2020/05/25/2020.05.24.20111989

6. Garetto, M., Leonardi, E. & Torrisi, G. A time-modulated Hawkes process to model the spread of COVID-19 and the impact of countermeasures. ArXiv Preprint ArXiv:2101.00405. (2021)

7. Kresin, C., Schoenberg, F. & Mohler, G. Comparison of the Hawkes and SEIR models for the spread of COVID-19. (2020), To appear in Advances and Applications in Statistics

8. Escobar J. A Hawkes process model for the propagation of COVID-19: Simple analytical results. EPL (Europhysics Letters). 131, 68005 (2020)

9. Browning R., Sulem D., Mengersen K., Rivoirard V. & Rousseau J. Simple discrete-time self-exciting models can describe complex dynamic processes: A case study of COVID-19. Plos One. 16, e0250015 (2021) pmid:33836020

10. Koyama S., Horie T. & Shinomoto S. Estimating the time-varying reproduction number of COVID-19 with a state-space method. PLoS Computational Biology. 17, e1008679 (2021) pmid:33513137

11. Bertozzi A., Franco E., Mohler G., Short M. & Sledge D. The challenges of modeling and forecasting the spread of COVID-19. Proceedings Of The National Academy Of Sciences. 117, 16732–16738 (2020) pmid:32616574

12. Chiang W., Liu X. & Mohler G. Hawkes process modeling of COVID-19 with mobility leading indicators and spatial covariates. International Journal Of Forecasting. (2021), https://www.sciencedirect.com/science/article/pii/S0169207021001126 pmid:34276115

13. Agosto A. & Giudici P. A Poisson autoregressive model to understand COVID-19 contagion dynamics. Risks. 8, 77 (2020)

14. Flaxman S., Mishra S., Gandy A., Unwin H., Mellan T., Coupland H., et al. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature. 584, 257–261 (2020) pmid:32512579

15. Laub, P., Taimre, T. & Pollett, P. Hawkes processes. ArXiv Preprint ArXiv:1507.02822. (2015)

16. Sheinson D., Niemi J. & Meiring W. Comparison of the performance of particle filter algorithms applied to tracking of a disease epidemic. Mathematical Biosciences. 255 pp. 21–32 (2014) pmid:25016201

17. Liu J. & West M. Combined parameter and state estimation in simulation-based filtering. Sequential Monte Carlo Methods In Practice. pp. 197–223 (2001)

18. Cori A., Ferguson N., Fraser C. & Cauchemez S. A new framework and software to estimate time-varying reproduction numbers during epidemics. American Journal Of Epidemiology. 178, 1505–1512 (2013) pmid:24043437

19. Jones, J. Notes on R0. Department of Anthropological Sciences: Stanford, CA, USA. (Stanford University Stanford, CA, USA, 2007)

20. Allen L. An introduction to stochastic epidemic models. Mathematical Epidemiology. pp. 81–130 (2008)

21. Rizoiu M., Mishra S., Kong Q., Carman M. & Xie L. SIR-Hawkes: Linking epidemic models and Hawkes processes to model diffusions in finite populations. Proceedings Of The 2018 World Wide Web Conference. pp. 419–428 (2018)

22. Lloyd A. Destabilization of epidemic models with the inclusion of realistic distributions of infectious periods. Proceedings Of The Royal Society Of London. Series B: Biological Sciences. 268, 985–993 (2001)

23. Triambak S. & Mahapatra D. A random walk Monte Carlo simulation study of COVID-19-like infection spread. Physica A: Statistical Mechanics And Its Applications. 574 pp. 126014 (2021) pmid:33875903

24. Stocks T., Britton T. & Höhle M. Model selection and parameter estimation for dynamic epidemic models via iterated filtering: application to rotavirus in Germany. Biostatistics. 21, 400–416 (2020) pmid:30265310

25. Storvik, G., Palomares, A., Engebretsen, S., Rø, G., Engø-Monsen, K., Kristoffersen, A., et al. A sequential Monte Carlo approach to estimate a time varying reproduction number in infectious disease models: the Covid-19 case. ArXiv Preprint ArXiv:2201.07590. (2022)

26. Fine P. The interval between successive cases of an infectious disease. American Journal Of Epidemiology. 158, 1039–1047 (2003) pmid:14630599

27. Zhao S., Tang B., Musa S., Ma S., Zhang J., Zeng M., et al. Estimating the generation interval and inferring the latent period of COVID-19 from the contact tracing data. Epidemics. 36 pp. 100482 (2021) pmid:34175549

28. GOV.UK Ashford Reported Cases. (2022), https://coronavirus.data.gov.uk/details/download, [Online; accessed 29-April-2022]

29. Kantas N., Doucet A., Singh S., Maciejowski J. & Chopin N. On particle methods for parameter estimation in state-space models. Statistical Science. 30, 328–351 (2015)

30. Doucet A., Johansen A. & Others A tutorial on particle filtering and smoothing: Fifteen years later. Handbook Of Nonlinear Filtering. 12, 3 (2009)

31. Pitt M. & Shephard N. Filtering via simulation: Auxiliary particle filters. Journal Of The American Statistical Association. 94, 590–599 (1999)

32. West M. Approximating posterior distributions by mixtures. Journal Of The Royal Statistical Society: Series B (Methodological). 55, 409–422 (1993)

33. Cormen, T., Leiserson, C., Rivest, R. & Stein, C. Introduction to algorithms. (MIT press, 2022)

34. Endo A., Van Leeuwen E. & Baguelin M. Introduction to particle Markov-chain Monte Carlo for disease dynamics modellers. Epidemics. 29 pp. 100363 (2019) pmid:31587877

35. Andrieu C., Doucet A. & Holenstein R. Particle Markov chain Monte Carlo methods. Journal Of The Royal Statistical Society: Series B (Statistical Methodology). 72, 269–342 (2010)

36. GOV.UK Leicester Reported Cases. (2022), https://coronavirus.data.gov.uk/details/download, [Online; accessed 12-January-2022]

37. GOV.UK Kingston upon Thames Reported Cases. (2022), https://coronavirus.data.gov.uk/details/download, [Online; accessed 21-April-2022]

38. GOV.UK The R value and growth rate. (2022), https://www.gov.uk/guidance/the-r-value-and-growth-rate, [Online; last time accessed 24-November-2022]

39. Yang, S. & Zha, H. Mixture of mutually exciting processes for viral diffusion. International Conference On Machine Learning. pp. 1-9 (2013)

40. Kim M., Paini D. & Jurdak R. Modeling stochastic processes in disease spread across a heterogeneous social system. Proceedings Of The National Academy Of Sciences. 116, 401–406 (2019) pmid:30587583

41. Isham, V. & Medley, G. Models for infectious human diseases: their structure and relation to data. (Cambridge University Press, 1996)

42. Wallinga J., Edmunds W. & Kretzschmar M. Perspective: human contact patterns and the spread of airborne infectious diseases. TRENDS In Microbiology. 7, 372–377 (1999) pmid:10470046

Word count: 7377

Show less

© 2023 Lamprinakou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Translate

Understanding the spread of COVID-19 has been the subject of numerous studies, highlighting the significance of reliable epidemic models. Here, we introduce a novel epidemic model using a latent Hawkes process with temporal covariates for modelling the infections. Unlike other models, we model the reported cases via a probability distribution driven by the underlying Hawkes process. Modelling the infections via a Hawkes process allows us to estimate by whom an infected individual was infected. We propose a Kernel Density Particle Filter (KDPF) for inference of both latent cases and reproduction number and for predicting the new cases in the near future. The computational effort is proportional to the number of infections making it possible to use particle filter type algorithms, such as the KDPF. We demonstrate the performance of the proposed algorithm on synthetic data sets and COVID-19 reported cases in various local authorities in the UK, and benchmark our model to alternative approaches.

Details

Title

Using a latent Hawkes process for epidemiological modelling

Author

Lamprinakou, Stamatina; Gandy, Axel

; McCoy, Emma

First page

e0281370

Section

Research Article

Publication year

2023

Publication date

Mar 2023

Publisher

Public Library of Science

e-ISSN

19326203

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1371/journal.pone.0281370

ProQuest document ID

2781185955

Using a latent Hawkes process for epidemiological modelling

Jump to:

Full text

Abstract

Details

Suggested sources