1. Introduction
Healthcare services are prone to periods of high burden and demand for services (‘pressures’) during the winter months each year. These pressures can lead to severe problems in delivering critical health services. During winter, healthcare pressures are exacerbated by factors that can increase demand, including cold weather, respiratory pathogens, gastrointestinal pathogens and subsequent workforce absences [1]. In particular, the role of influenza and respiratory syncytial virus (RSV) in driving winter pressures has been extensively documented.
RSV is a major cause of bronchiolitis and bronchitis amongst young children [2]. Although it mainly produces mild symptoms, RSV infection can lead to severe illness in the immunocompromised [3] and is a major cause of death in infants globally [4].
During periods of heightened influenza and RSV activity, demand can increase across a range of healthcare services, from community physicians (general practitioners; GPs) through to specialist secondary care facilities. In England, RSV accounts for approximately 30,000 paediatric admissions in children aged under 5 years annually [5].
Identifying the key drivers underlying winter pressures is critical to understanding, managing and responding to the periods of high demand. Surveillance is a cornerstone of public health, monitoring changes in community-based activity of certain pathogens, diseases and conditions. Surveillance can provide a ‘view’ of key metrics that can be used to understand the drivers of pressures. Routinely collected surveillance data provide intelligence on those factors known to cause pressures e.g. monitoring increases in influenza cases. Surveillance data can also provide the opportunity to anticipate these pressures through predictions or forecasting.
Recently, advancements in machine learning have made it possible to develop more powerful and accurate forecasting models that use larger and more complex datasets. However, the key to developing accurate and timely models is the availability of suitable surveillance data on healthcare service usage. Here, we use real-time syndromic surveillance data, routinely collected as part of the UK Health Security Agency (UKHSA) public health surveillance programme, to create short-term forecasts of peak healthcare demand during periods of rising seasonal respiratory activity. We calculated forecast reliability to describe the uncertainty around forecasts. We then piloted the forecasts during the 2022–23 winter season and compared them with actual activity.
2. Methods
We created two automated machine learning pipelines in R: the first to select and train forecast models, the second to create daily forecasts (Fig 1). The methods are described below, following the flow of the pipelines.
[Figure omitted. See PDF.]
2.1 Data selection
The UKHSA coordinates a programme of real-time syndromic surveillance that supports and augments other UKHSA health surveillance programmes [6]. The UKHSA real-time syndromic surveillance systems monitor anonymised health service contacts from across the National Health Service (NHS) in England. For this pilot study, we used two syndromic indicators that are routinely part of the ongoing UKHSA daily syndromic surveillance service: NHS 111 telehealth calls for ‘cough’ and emergency department (ED) attendances for ‘acute bronchiolitis’. Both syndromic indicators were restricted to children aged five years or under because these indicators are known to be sensitive to seasonal outbreaks of RSV [7–9]. Using established, well-understood indicators aids interpretation and enables comparison with previous years.
We used anonymised health service data that are routinely used by UKHSA for public health surveillance of respiratory illnesses, including RSV. This study was part of ongoing work to improve the capabilities of UKHSA surveillance systems. As such, no specific approvals were required to use the anonymised data included in this study.
2.2 Data cleaning and formatting
Firstly, data were smoothed to remove day-of-the-week effects caused by weekends and public holidays [10]. A simple seven-day moving average would not account for the impact of public holidays. Therefore, data were smoothed by weighting activity so that weekends and any public holidays within the week always contributed two sevenths of the weekly activity. Secondly, data were normalised so that the smoothed daily activity lay in the range zero to one. Finally, derived variables were created for the forecast models (not all derived variables were used in all the models tested; Table 1) [11].
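The smoothing and normalisation steps can be sketched in Python as below. This is a minimal reading of the description, not the UKHSA R code. It assumes consecutive daily data and a trailing seven-day window. Within each window, the mean of non-working days (weekends plus public holidays) gets weight 2/7 and the mean of working days gets weight 5/7. It also assumes min-max scaling for the normalisation to [0, 1]. The function names are hypothetical.

```python
from datetime import date


def smooth_counts(counts, dates, holidays):
    """Trailing 7-day smoother with a fixed weight for non-working days.

    Assumed reading of the method: non-working days (Saturdays, Sundays and
    any date in `holidays`) always contribute two sevenths of the weekly
    activity and working days five sevenths, whatever mix of days falls in
    the window. The first six days, which lack a full window, are None.
    """
    smoothed = []
    for i in range(len(counts)):
        if i < 6:
            smoothed.append(None)
            continue
        working, non_working = [], []
        for j in range(i - 6, i + 1):
            is_off = dates[j].weekday() >= 5 or dates[j] in holidays
            (non_working if is_off else working).append(counts[j])
        # With consecutive daily dates every window contains a weekend,
        # so non_working is never empty.
        off_mean = sum(non_working) / len(non_working)
        # Guard for the unrealistic case of a window with no working days.
        work_mean = sum(working) / len(working) if working else off_mean
        smoothed.append(5 / 7 * work_mean + 2 / 7 * off_mean)
    return smoothed


def min_max_normalise(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Toy data: 100 on weekdays, 40 at weekends and on a hypothetical
# Monday public holiday.
start = date(2022, 1, 3)  # a Monday
days = [date.fromordinal(start.toordinal() + k) for k in range(14)]
holiday = {date(2022, 1, 10)}
raw = [40 if d.weekday() >= 5 or d in holiday else 100 for d in days]
smoothed = smooth_counts(raw, days, holiday)
```

Every complete window in the toy series smooths to 580/7 (about 82.9), including the window that contains the holiday Monday. A plain seven-day mean of that window would give (4 × 100 + 3 × 40)/7, about 74.3. That gap is the holiday bias the weighting is meant to remove.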
[Figure omitted. See PDF.]
2.3 Training models
Rather than restrict forecasts to a single methodology, we tested a wide range of alternative models for both indicators, using the data to select the best method for each indicator [11]. Firstly, we chose seven supervised machine learning methods:
* linear regression;
* generalised linear models with elastic net regularisation, both with and without internal optimisation of the parameter lambda (counted as two methods);
* k-nearest-neighbour regression;
* random forest regression;
* support vector machine regression;
* eXtreme Gradient Boosting regression.
We created four options for alternative models, in order to test which of our derived variables were most useful for producing accurate forecasts:
* Option 1 (seasonality): no seasonality, a day of month variable, or a Fourier transformation.
* Option 2 (secular trend): no trend, a linear trend, or a quadratic trend.
* Option 3 (quadratic activity): with or without the square of daily activity as an additional variable, so that a quadratic rather than only a linear relationship with daily activity can be modelled.
* Option 4 (averaging): with or without a 3-day moving average replacing the daily activity, trend and rate of change variables, to protect against single-day spikes having an undue influence.
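The four options can be read as switches on a feature builder, sketched below in Python. The sketch is hypothetical: Table 1's exact variables are not reproduced here. It assumes a single annual sine/cosine pair for the Fourier seasonality and a day index for the secular trend. The three-day average is applied to the activity level only, and the rate of change variables are omitted.

```python
import math
from datetime import date


def derived_features(day, t, activity, seasonality, trend,
                     quadratic_activity, three_day_average):
    """Hypothetical feature builder for one model specification.

    day: calendar date of the forecast origin.
    t: integer day index of the forecast origin, used for the secular trend.
    activity: normalised smoothed activity up to and including index t.
    seasonality: "none", "day_of_month" or "fourier" (Option 1).
    trend: "none", "linear" or "quadratic" (Option 2).
    quadratic_activity: add the square of activity (Option 3).
    three_day_average: replace daily activity by a 3-day mean (Option 4).
    """
    if three_day_average:
        window = activity[max(0, t - 2): t + 1]
        level = sum(window) / len(window)
    else:
        level = activity[t]
    features = {"activity": level}
    if quadratic_activity:
        features["activity_sq"] = level ** 2
    if trend in ("linear", "quadratic"):
        features["trend"] = t
    if trend == "quadratic":
        features["trend_sq"] = t ** 2
    if seasonality == "day_of_month":
        features["day_of_month"] = day.day
    elif seasonality == "fourier":
        # A single annual sine/cosine pair (assumed number of harmonics).
        angle = 2 * math.pi * day.timetuple().tm_yday / 365.25
        features["sin_annual"] = math.sin(angle)
        features["cos_annual"] = math.cos(angle)
    return features


example = derived_features(date(2022, 1, 15), 3, [0.1, 0.2, 0.3, 0.6],
                           "fourier", "quadratic", True, True)
```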
Combining the seven machine learning methods with the four options above gave 7 × 3 × 3 × 2 × 2 = 252 alternative ‘model specifications’ to be tested. Furthermore, 28 models were created and trained for each model specification, one for each forecast horizon from 1 to 28 days ahead. Models were trained using historical data prior to October 2022. For supervised learning, the actual data 1 to 28 days after each forecast date served as the ‘labels’ for the corresponding target forecast. Historical data were split randomly into training and test data sets, with 80% of the historical data used for training.
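The size of the specification grid, the horizon-specific labels and the random 80/20 split can be checked with the Python sketch below. The option labels are our own shorthand. Treating each forecast date (origin) as the unit that is randomly split is an assumption.

```python
import itertools
import random

# Shorthand labels for the options in the text (our own naming).
METHODS = ["linear", "elastic_net", "elastic_net_tuned_lambda", "knn",
           "random_forest", "svm", "xgboost"]
SEASONALITY = ["none", "day_of_month", "fourier"]   # Option 1
TREND = ["none", "linear", "quadratic"]              # Option 2
QUADRATIC_ACTIVITY = [False, True]                   # Option 3
THREE_DAY_AVERAGE = [False, True]                    # Option 4
HORIZONS = range(1, 29)                              # 1 to 28 days ahead

specifications = list(itertools.product(
    METHODS, SEASONALITY, TREND, QUADRATIC_ACTIVITY, THREE_DAY_AVERAGE))


def horizon_examples(series, h):
    """Pair each forecast origin t with the value observed h days later,
    which is the supervised 'label' for the h-day-ahead model."""
    return [(t, series[t + h]) for t in range(len(series) - h)]


def split_80_20(origins, seed=2022):
    """Randomly assign 80% of forecast origins to training and 20% to test.
    Splitting by origin is an assumption; the seed is arbitrary."""
    shuffled = list(origins)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


print(len(specifications), len(specifications) * len(HORIZONS))
```

The printed values are 252 specifications and 252 × 28 = 7,056 individual horizon models to train.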
2.4 Selection of ensemble model
For each of the 252 alternative model specifications, an ensemble forecast was created that predicted when, and how high, activity would peak over the next 28 days. The ‘forecast peak’ for each model specification was defined as the highest value forecast by the individual 1 to 28-day-ahead forecast models. Using the test data set, the forecast peaks were compared with the actual highest value, or ‘peak’, that occurred in the 28 days following the forecast. An ‘intensity error’ was calculated as the difference between the height of the forecast peak and the actual peak. Similarly, a ‘timing error’ was calculated as the difference in days between the date when activity actually peaked and the date when it was forecast to peak. The intensity and timing errors were combined to give a single ‘forecast peak error’, which includes weighting to emphasise the importance of accurately forecasting peaks when activity is high [12].
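The forecast peak and its two component errors can be sketched in Python as below. The combined, weighted forecast peak error is defined by an equation in the published article that is not reproduced in this text, so the sketch stops at the intensity and timing errors. The sign convention (forecast minus actual) is an assumption.

```python
def forecast_peak(values):
    """Return (days_ahead, height) of the highest value in a 1..28-day sequence.

    values[k] refers to k + 1 days after the forecast date; ties resolve to
    the earliest day because max() keeps the first maximum it finds.
    """
    k = max(range(len(values)), key=lambda i: values[i])
    return k + 1, values[k]


def peak_errors(forecasts, actuals):
    """Intensity and timing errors of a forecast peak against the actual peak.

    Both errors are forecast minus actual (a sign convention assumed here):
    a positive intensity error is an over-estimate, and a positive timing
    error means the peak was forecast later than it occurred.
    """
    f_day, f_height = forecast_peak(forecasts)
    a_day, a_height = forecast_peak(actuals)
    return f_height - a_height, f_day - a_day
```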
The peak error measure can be described as [Equation omitted. See PDF.] where y_d is the peak error on day d, x_d is the actual smoothed count on day d, f_d is the forecast peak intensity on day d, max(x) is the maximum of all actual and predicted smoothed counts, t_d is the difference in days between the date of the peak forecast and the date the actual peak occurred, i_d is the difference between the peak forecast’s intensity and the actual peak, and max(i) is the maximum error seen in predicting forecast intensity. The peak error measure is zero if the peak forecast correctly identifies both the date and the intensity of the peak. The measure increases as the difference between the forecast and actual peak intensities increases. It also increases as the difference between the forecast and actual peak dates increases, but this increase is smaller when both actual and forecast activity are low.
The model specification which resulted in the smallest mean forecast peak error was selected for daily forecasts.
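The selection rule is simple. A minimal Python sketch, assuming the forecast peak errors for each specification have already been computed, picks the specification with the smallest mean error. The labels and error values are hypothetical.

```python
from statistics import mean


def select_specification(errors_by_spec):
    """Return the specification with the smallest mean forecast peak error.

    errors_by_spec maps a specification label to the forecast peak errors
    it produced over the forecast dates in the test data.
    """
    return min(errors_by_spec, key=lambda spec: mean(errors_by_spec[spec]))


# Hypothetical error values for three specifications.
toy_errors = {"spec_a": [0.2, 0.4], "spec_b": [0.1, 0.3], "spec_c": [0.5]}
chosen = select_specification(toy_errors)
```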
2.5 Model validation
Once the best model specification had been selected based on the training data, models were retrained using all the available historical data. The historical intensity and timing errors of the forecast peaks were calculated and used to estimate forecast uncertainty. The standard deviation of errors may change as activity approaches a peak. To allow for this, a generalised additive model for location, scale and shape (gamlss) was used to model the standard deviation of errors against the current level of activity [13]. Thus, we created uncertainty intervals that could vary as activity approaches a seasonal peak.
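The way activity-dependent uncertainty intervals could be built from fitted coefficients is sketched in Python below. The original analysis fitted gamlss models in R. Here we simply assume normally distributed errors whose standard deviation follows a log-linear function of current activity. The coefficients b0 and b1 are hypothetical placeholders, since Table 3 is not reproduced in this text.

```python
import math
from statistics import NormalDist


def error_sd(activity, b0, b1):
    """Error standard deviation at the current activity level, assuming a
    log link for the scale parameter: log(sd) = b0 + b1 * activity."""
    return math.exp(b0 + b1 * activity)


def peak_interval(forecast, activity, b0, b1, level=0.95, floor=0.0):
    """Normal interval around a forecast peak height.

    The lower bound is clipped at `floor` (0 for counts, which cannot be
    negative; pass -math.inf when no clipping is wanted). Clipping makes
    intensity intervals asymmetric when activity is low.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * error_sd(activity, b0, b1)
    return max(floor, forecast - half_width), forecast + half_width
```

A positive b1 widens intervals as activity rises, which is the pattern reported for intensity errors in Section 3.2. A negative b1 narrows them, the pattern reported for timing errors. Calling the function with level=0.5 and level=0.95 gives the two bands shown in the figures.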
2.6 Creating daily reports
To produce daily forecasts, recent data are extracted and formatted using the same data processes as for training models. The validated model for each indicator is used to produce 1 to 28-day ahead forecasts based on the latest data available. These forecasts are used to create daily reports which predict when activity will peak in the next 28 days and at what level, including the uncertainty intervals.
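Turning a day's 1 to 28-day-ahead forecasts into a report entry mostly means mapping the peak horizon back to a calendar date. A minimal Python sketch follows. The one-line report format and the function name are hypothetical.

```python
from datetime import date, timedelta


def daily_report_line(indicator, report_date, forecasts):
    """One-line summary of when and how high activity is forecast to peak.

    forecasts[k] is the forecast for report_date + (k + 1) days.
    The wording of the line is a hypothetical report format.
    """
    k = max(range(len(forecasts)), key=lambda i: forecasts[i])
    peak_date = report_date + timedelta(days=k + 1)
    return (f"{indicator}: forecast peak of {forecasts[k]:.1f} "
            f"on {peak_date.isoformat()}")


line = daily_report_line("Toy indicator", date(2022, 11, 26), [1.0, 3.0, 2.0])
```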
3. Results
3.1 Model selection
The model specifications with the lowest forecast peak errors for both indicators used a random forest learning method, with seasonality modelled by Fourier transformations. However, the other specification options differed between the two indicators. For NHS 111 cough calls, the lowest forecast peak errors came from a specification that included a quadratic term for activity, a quadratic secular trend and averaging over three consecutive days’ data points. For ED acute bronchiolitis attendances, the lowest errors involved a linear trend, with no quadratic term for activity and no averaging over consecutive days. S1 Table shows the forecast peak errors for each model specification. In general, errors were lower for NHS 111 calls than for ED bronchiolitis attendances, with 52 NHS 111 model specifications performing better than the best ED specification. Overall, including seasonality improved peak forecasts, with Fourier transformations performing better than seasonality based on months. The learning method with the lowest mean errors was random forest, followed by k-nearest neighbour regression (Table 2).
[Figure omitted. See PDF.]
3.2 Model validation
The gamlss models show that the variation in intensity errors increases as actual counts increase (Table 3). By contrast, the variation in timing errors decreases as actual counts increase.
[Figure omitted. See PDF.]
The gamlss model coefficients were used to create confidence intervals around the timing and intensity errors, which varied with the actual count at the time of the forecast (Figs 2 and 3). Intensity errors are not symmetric because forecast peaks cannot be negative; thus, when actual counts are low, a forecast peak can over-estimate by more than it can under-estimate.
[Figure omitted. See PDF.]
Lines show 50% and 95% confidence intervals.
[Figure omitted. See PDF.]
Lines show 50% and 95% confidence intervals.
3.3 Pilot season 2022–23
During October 2022, ED acute bronchiolitis attendances in children aged under 5 years, recorded by the Emergency Department Syndromic Surveillance System (EDSSS), increased to a peak of 220.0 attendances on 31 October. Attendances then decreased until 5 November before increasing again, reaching a seasonal high of 311.4 attendances on 29 November. Similarly, NHS 111 calls for cough in children under 5 years rose to a peak of 991.6 calls on 22 October, decreased until 2 November and then started rising. However, whilst the increase in NHS 111 calls slowed prior to 30 November, it was then followed by a sharp increase in calls, reaching a seasonal high of 1,842.9 calls on 6 December 2022.
The seasonal peak in ED attendances at the end of November coincided with the usual timing of peak RSV activity seen in previous years (as monitored by laboratory reporting) [14]. The additional increase in NHS 111 calls after 30 November 2022 was unprecedented. The resulting seasonal high was 39.3% above the previous highest winter peak of 1,323.4 calls on 7 December 2019. Consequently, the level of activity was outside the range of anything seen in the training data.
The pilot forecast made on 26 November 2022 predicted that ED attendances for acute bronchiolitis would peak at 340.1 attendances on 29 November (Fig 4). The forecast made on the same day for NHS 111 calls predicted that they had already peaked. The timing of the ED forecast was correct, but the level of the peak was over-estimated by 29.3 (9.4%) attendances.
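As a worked check of the 39.3% increase quoted for NHS 111 cough calls, using the 2022 seasonal high of 1,842.9 calls and the previous highest winter peak of 1,323.4 calls:

```latex
\frac{1842.9 - 1323.4}{1323.4} = \frac{519.5}{1323.4} \approx 0.393 = 39.3\%
```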
[Figure omitted. See PDF.]
Red squares show the 28-day forecast; blue lines show 50% (dark blue) and 95% (light blue) data intervals around the peak forecast.
The NHS 111 forecasts failed to predict the unprecedented rise in NHS 111 cough calls in children in December 2022 until the rise had started. However, a forecast using a linear regression learning method proved to be adaptive, forecasting a later and higher seasonal peak once activity began to rise sharply at the start of December 2022 (Fig 5).
[Figure omitted. See PDF.]
Red squares show the 28-day forecast; blue lines show 50% (dark blue) and 95% (light blue) data intervals around the peak forecast.
4. Discussion
4.1 Key findings
Machine learning pipelines can be used to train and select models and to create daily forecast reports that predict the peak in demand associated with RSV activity over the following 28 days. During 2022, our pilot forecasts were able to correctly predict the peak in ED acute bronchiolitis attendances in children under 5 years. During November 2022, our forecasts for NHS 111 cough calls in children aged under 5 years predicted a peak similar to those of previous pre-pandemic years. However, as cough calls began to increase sharply at the start of December, our forecasts also began to change, predicting a later peak in December that was higher than in previous years.
4.2 What was known before
Prior to the COVID-19 pandemic, RSV activity followed a predictable seasonal pattern, peaking in England between the end of November and the beginning of December [15–17]. However, during 2020 and 2021, the seasonality of RSV was disrupted, with no winter peak in 2020 and a deferred peak in summer 2021 [7]. Traditional surveillance methods based on historical data and recurring seasonality [18] continued to predict that syndromic indicators would rise in winter 2020 due to RSV. Short-term forecasts based on recent trends are more adaptive, but these too would not perform well during atypical seasons unless seasonality was excluded from the model variables [11].
4.3 Interpretation of findings
The unexpected, dramatic rise in NHS 111 cough calls in December 2022 coincided with unusual increases in group A streptococcus (GAS) infections in children [19,20]. This unusual seasonal activity, combined with the impact of media reporting [21], resulted in an unprecedented increase in NHS 111 calls for children, particularly for indicators linked to symptoms of GAS infection, e.g. sore throat, fever and cough. However, ED attendances were less affected by changes in patient presenting behaviour, and consequently ED syndromic indicators did not show an additional large peak. The differences between syndromic systems illustrate some of the strengths and weaknesses of syndromic surveillance. Syndromic indicators cannot identify specific causal pathogens. NHS 111 cough calls, although sensitive to RSV, are therefore not specific enough to exclude other causes. Consequently, NHS 111 cough calls were not a reliable indicator for assessing the total burden of healthcare demand attributable to RSV. However, if policy and decision-makers need to understand the current pressures on health services from all causes, then syndromic indicators are more sensitive than pathogen-specific surveillance such as laboratory reporting. Importantly, a syndromic surveillance service that comprises a range of data sources across the spectrum of health services is better able to distinguish between pressures due to changes in underlying disease incidence and those due to changing patient behaviour.
4.4 Limitations
Inevitably, forecasts trained on historical data will perform better when current data are within the range and seasonality seen previously. We have previously shown that a model that is not trained to expect a recurring seasonal pattern performs better during atypical seasons [11]. However, the unexpected peak in NHS 111 calls in December 2022 was not out of season, so the poor accuracy of the forecasts was not due to the inclusion of seasonality variables. In this case, we found that some of the regression methods that performed best on our forecast peak error measure, e.g. random forest, generated forecasts that assumed activity had already peaked. By contrast, simple linear regression generated forecasts that correctly predicted that activity would continue to increase in line with current trends. Therefore, in this instance, the simpler method outperformed the method automatically selected by our machine learning algorithm.
The unprecedented increase in NHS 111 cough calls in children during December 2022 revealed a limitation in using this indicator to forecast peak pressures due to RSV. The exceptional additional winter pressures were not due to RSV but to the reaction to GAS media reports. Thus, without additional intelligence, a report intended to show pressures due to RSV could have been misinterpreted as showing that RSV activity was exceptionally high.
4.5 Comparison with existing approaches
Previous research into machine learning forecasts has similarly compared a number of different modelling methods [22–24]. For example, Do et al. compared the mean square errors of 16 different methods [22]. Whilst Do et al. studied diarrhoeal disease, Castro Blanco et al. considered the onset of the winter influenza season in Spain. They compared random forest, support vector machine and logistic regression, and found logistic regression to be the most accurate [23].
Machine learning methods have been used for forecasting where the analysis is complicated by a large number of predictive variables [22–24]. By contrast, we deliberately restricted our predictive variables to the syndromic indicators readily available during ongoing daily surveillance. Similarly, our approach did not require mechanistic assumptions about how infectious diseases spread. Mechanistic approaches can provide accurate forecasts, but they require considerably more work to develop [25]. Our approach is in line with a US study of 22 competing influenza forecasting methods. That study found that timely reporting and integration into real-time public health decision-making are as important as forecast accuracy [26].
Our approach was to compare different machine learning models and use the method that best predicted our data. An alternative approach used elsewhere is to take a weighted average of all models [22]. Whilst these ‘ensemble forecast methods’ inevitably fit the data better, they lack transparency, and it is difficult to explain the theoretical justification for individual forecasts.
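The contrast between selecting one model and averaging all of them can be illustrated with the Python sketch below. The inverse-error weighting is our own simple stand-in and not the dynamic weighted ensemble of Do et al. The model names and numbers are hypothetical.

```python
def select_best(forecasts, errors):
    """Use only the forecast of the model with the smallest historical error."""
    best = min(errors, key=errors.get)
    return forecasts[best]


def inverse_error_average(forecasts, errors):
    """Weighted average of every model's forecast, with weights proportional
    to 1 / historical error (a simple stand-in for a weighted ensemble)."""
    weights = {m: 1.0 / errors[m] for m in forecasts}
    total = sum(weights.values())
    return sum(weights[m] * forecasts[m] for m in forecasts) / total


# Hypothetical peak forecasts and historical errors for two models.
fc = {"random_forest": 300.0, "linear": 360.0}
err = {"random_forest": 1.0, "linear": 2.0}
```

With these toy values, selection returns 300.0. The weighted average returns 320.0, a value that no single model produced. This illustrates why averaged forecasts are harder to explain.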
4.6 Public health implications
Short-term forecasts can provide additional information compared with existing surveillance baselines based on previous years, because they are more adaptive to recent changes in trends. Our automated pipeline for creating short-term forecasts of seasonal peaks is useful for identifying the timing and intensity of peaks during typical seasons. Furthermore, building the machine learning methods into a reproducible automated pipeline means that new indicators can be added to syndromic reports easily and quickly. However, automated reporting of trends and forecasts for syndromic data should always be accompanied by expert interpretation. Such interpretation can warn of emerging events, for instance where a real-time change in patient behaviour means that one or more indicators are no longer comparable with previous years. Improved automation and real-time interpretation are important, as we may need to create forecasts quickly when notified of an increase in disease incidence. The same pipeline can also be used to assess other causal pathogens, including influenza and SARS-CoV-2. Similarly, it may be possible to model non-infectious conditions such as allergic rhinitis (hay fever), where the historical data include recurring seasonal peaks.
4.7 Recommendations and future work
Importantly, forecasts should be interpreted with caution when current data are outside the range of the training data, or when seasonality does not match the training data. We recommend that any forecasts used for routine surveillance include tests for data that are outside the range of the training data. We also suggest that, when increases start to occur out of season, forecast models that do not include seasonality variables are selected.
In future, it may be possible to provide better forecasts during atypical seasons by weighting the training data to give more emphasis to rare events. Synthetic data could also be incorporated into the training data to allow for plausible events that have not yet occurred, e.g. out-of-season outbreaks or more virulent pathogens.
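A test for data outside the training range can be as simple as the Python sketch below. The function name and the optional margin are hypothetical choices. A production check might also compare the timing of rises against the seasonality seen in the training data.

```python
def outside_training_range(current, training, margin=0.0):
    """Flag each current value that lies outside the range of the training data.

    margin widens the accepted range by that fraction of the training span
    on each side (a hypothetical tuning knob; 0 keeps the strict range).
    """
    lo, hi = min(training), max(training)
    slack = margin * (hi - lo)
    return [v < lo - slack or v > hi + slack for v in current]
```

A check of this kind would have flagged the December 2022 NHS 111 cough calls, which were outside the range of anything seen in the training data.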
Supporting information
S1 Table. Forecast peak errors for each model specification.
https://doi.org/10.1371/journal.pone.0292829.s001
(DOCX)
Acknowledgments
The authors would like to thank the syndromic data providers including: NHS 111 and NHS England (NHS 111 telehealth), and emergency department clinicians and NHS Trusts and NHS England supporting emergency department syndromic surveillance. Roger A. Morbey, Dan Todkill and Alex J Elliot are affiliated with the NIHR Health Protection Research Unit (HPRU) in Emergency Preparedness and Response at King’s College London. Alex J Elliot is affiliated with the NIHR HPRU in Gastrointestinal Infections at University of Liverpool. Dan Todkill is supported by the NIHR Applied Research Collaboration (ARC) West Midlands. The views expressed are those of the author(s) and not necessarily those of the NIHR, the UK Health Security Agency or the Department of Health and Social Care.
References
1. Scobie S. Snowed under: understanding the effects of winter on the NHS https://www.nuffieldtrust.org.uk/resource/snowed-under-understanding-the-effects-of-winter-on-the-nhs accessed 2 April 2023.
2. Li Y, Wang X, Blau DM, Caballero MT, Feikin DR, Gill CJ et al. Global, regional, and national disease burden estimates of acute lower respiratory infections due to respiratory syncytial virus in children younger than 5 years in 2019: a systematic analysis. Lancet 2022, 399(10340):2047–64. Epub 20220519. pmid:35598608.
3. Obando-Pacheco P, Justicia-Grande AJ, Rivero-Calle I, Rodriguez-Tenreiro C, Sly P, Ramilo O et al. Respiratory syncytial virus seasonality: a global overview. J Infect Dis 2018, 217(9):1356–64. Epub 20181104. pmid:29390105.
4. Zhang S, Akmar LZ, Bailey F, Rath BA, Alchikh M, Schweiger B et al. Cost of respiratory syncytial virus-associated acute lower respiratory infection management in young children at the regional and global level: a systematic review and meta-analysis. J Infect Dis 2020, 222(Suppl 7):S680–S87. Epub 20201007. pmid:32227101.
5. Reeves RM, Hardelid P, Gilbert R, Warburton F, Ellis J, Pebody RG. Estimating the burden of respiratory syncytial virus (RSV) on respiratory hospital admissions in children less than five years of age in England, 2007–2012. Influenza Other Respir Viruses 2017, 11(2):122–29. Epub 20170121. pmid:28058797.
6. UK Health Security Agency. Syndromic surveillance: systems and analyses https://www.gov.uk/government/collections/syndromic-surveillance-systems-and-analyses accessed 10 May 2023.
7. Bardsley M, Morbey RA, Hughes HE, Beck CR, Watson CH, Zhao H et al. Epidemiology of respiratory syncytial virus in children younger than 5 years in England during the COVID-19 pandemic, measured by laboratory, clinical, and syndromic surveillance: a retrospective observational study. Lancet Infect Dis 2023, 23(1):56–66. Epub 20220906. pmid:36063828.
8. Hughes HE, Morbey R, Hughes TC, Locker TE, Pebody R, Green HK et al. Emergency department syndromic surveillance providing early warning of seasonal respiratory activity in England. Epidemiol Infect 2016, 144(5):1052–64. pmid:26415918.
9. Morbey RA, Harcourt S, Pebody R, Zambon M, Hutchison J, Rutter J et al. The burden of seasonal respiratory infections on a national telehealth service in England. Epidemiol Infect 2017, 145:1922–32. pmid:28413995.
10. Buckingham-Jeffery E, Morbey R, House T, Elliot AJ, Harcourt S, Smith GE. Correcting for day of the week and public holiday effects: improving a national daily syndromic surveillance service for detecting public health threats. BMC Public Health 2017, 17(1):477. pmid:28525991.
11. Morbey R, Todkill D, DeAngelis D, Charlett A, Elliot A. DiD IT?: A differences-in-differences investigation tool to quantify the impact of local incidents on public health using real-time syndromic surveillance health data. Epidemiol Infect 2023, 151:e56. Epub 20230315. pmid:36919204.
12. Morbey RA, Todkill D, Watson C, Elliot AJ. Machine learning forecasts for seasonal epidemic peaks: Lessons learnt from an atypical respiratory syncytial virus season. PLoS ONE 2023, 18(9):e0291932. Epub 20230922. pmid:37738241.
13. Nelder JA, Wedderburn RWM. Generalized Linear Models. J R Stat Soc Ser A 1972, 135(3):370–84.
14. UK Health Security Agency. Respiratory infections: laboratory reports 2022 https://www.gov.uk/government/publications/respiratory-infections-laboratory-reports-2022 accessed 29 November 2024.
15. Goddard NL, Cooke MC, Gupta RK, Nguyen-Van-Tam JS. Timing of monoclonal antibody for seasonal RSV prophylaxis in the United Kingdom. Epidemiol Infect 2007, 135(1):159–62. Epub 20060607. pmid:16753078.
16. Fleming DM, Taylor RJ, Lustig RL, Schuck-Paim C, Haguinet F, Webb DJ et al. Modelling estimates of the burden of respiratory syncytial virus infection in adults and the elderly in the United Kingdom. BMC Infect Dis 2015, 15:443. pmid:26497750.
17. Hardelid P, Pebody R, Andrews N. Mortality caused by influenza and respiratory syncytial virus by age group in England and Wales 1999–2010. Influenza Other Respir Viruses 2013, 7(1):35–45. pmid:22405488.
18. Morbey RA, Elliot AJ, Charlett A, Verlander NQ, Andrews N, Smith GE. The application of a novel ‘rising activity, multi-level mixed effects, indicator emphasis’ (RAMMIE) method for syndromic surveillance in England. Bioinformatics 2015, 31(22):3660–65. Epub 20151115. pmid:26198105.
19. Guy R, Henderson KL, Coelho J, Hughes H, Mason EL, Gerver SM et al. Increase in invasive group A streptococcal infection notifications, England, 2022. Eurosurveillance 2023, 28(1):pii = 2200942. pmid:36695450.
20. UK Health Security Agency. Group A streptococcal infections: activity during the 2022 to 2023 season https://www.gov.uk/government/publications/group-a-streptococcal-infections-activity-during-the-2022-to-2023-season/group-a-streptococcal-infections-12th-update-on-seasonal-activity-in-england accessed 31 March 2023.
21. Nikhab A, Morbey R, Todkill D, Elliot AJ. Using a novel ‘difference-in-differences’ method and syndromic surveillance to estimate the change in local healthcare utilisation during periods of media reporting in the early stages of the COVID-19 pandemic in England. Public Health 2024, 232:132–37. pmid:38776588.
22. Do TD, Nguyen TD, Ta VC, Anh DT, Tran Thi T-H, Phan D, Mai ST. Dynamic weighted ensemble for diarrhoea incidence predictions. Mach Learn 2023, 113:2129–52.
23. Castro Blanco E, Dalmau Llorca MR, Aguilar Martín C, Carrasco-Querol N, Gonçalves AQ, Hernández Rojas Z et al. A predictive model of the start of annual influenza epidemics. Microorganisms 2024, 12(7):1257. pmid:39065025.
24. Su K, Xu L, Li G, Ruan X, Li X, Deng P et al. Forecasting influenza activity using self-adaptive AI model and multi-source data in Chongqing, China. EBioMedicine 2019, 47:284–92. Epub 20190830. pmid:31477561.
25. Andronico A, Paireau J, Cauchemez S. Integrating information from historical data into mechanistic models for influenza forecasting. PLoS Comput Biol 2024, 20(10):e1012523. pmid:39475955.
26. Reich NG, Brooks LC, Fox SJ, Kandula S, McGowan CJ, Moore E et al. A collaborative multiyear, multimodel assessment of seasonal influenza forecasting in the United States. Proc Natl Acad Sci U S A 2019, 116(8):3146–54. Epub 20190117. pmid:30647115.
Citation: Morbey RA, Todkill D, Moura P, Tollinton L, Charlett A, Watson C, et al. (2025) Using machine learning to forecast peak health care service demand in real-time during the 2022–23 winter season: A pilot in England, UK. PLoS ONE 20(1): e0292829. https://doi.org/10.1371/journal.pone.0292829
About the Authors:
Roger A. Morbey
Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Writing – original draft, Writing – review & editing
Affiliation: Real-time Syndromic Surveillance Team, Field Services, Health Protection Operations, UK Health Security Agency, Birmingham, United Kingdom
ORCID: https://orcid.org/0000-0001-8543-477X
Dan Todkill
Roles: Conceptualization, Funding acquisition, Project administration, Writing – review & editing
Affiliations: Real-time Syndromic Surveillance Team, Field Services, Health Protection Operations, UK Health Security Agency, Birmingham, United Kingdom; Health Sciences, Warwick Medical School, University of Warwick, Coventry, United Kingdom
Phil Moura
Roles: Conceptualization, Methodology, Software, Writing – review & editing
Affiliation: Department of Health and Social Care, London, United Kingdom
Liam Tollinton
Roles: Software, Validation, Writing – review & editing
Affiliation: Health Analytics and Automation, Data Analytics and Surveillance, UK Health Security Agency, London, United Kingdom
Andre Charlett
Roles: Validation, Writing – review & editing
Affiliation: Statistics, Modelling and Economics Division, UK Health Security Agency, London, United Kingdom
Conall Watson
Roles: Validation, Writing – review & editing
Affiliation: Immunisation and Vaccine Preventable Diseases Division, UK Health Security Agency, London, United Kingdom
Alex J. Elliot
Roles: Conceptualization, Supervision, Validation, Writing – review & editing
Affiliation: Real-time Syndromic Surveillance Team, Field Services, Health Protection Operations, UK Health Security Agency, Birmingham, United Kingdom
1. Scobie S. Snowed under: understanding the effects of winter on the NHS https://www.nuffieldtrust.org.uk/resource/snowed-under-understanding-the-effects-of-winter-on-the-nhs accessed 2 April 2023.
2. Li Y, Wang X, Blau DM, Caballero MT, Feikin DR, Gill CJ et al. Global, regional, and national disease burden estimates of acute lower respiratory infections due to respiratory syncytial virus in children younger than 5 years in 2019: a systematic analysis. Lancet 2022, 399(10340):2047–64. Epub 20220519. pmid:35598608.
3. Obando-Pacheco P, Justicia-Grande AJ, Rivero-Calle I, Rodriguez-Tenreiro C, Sly P, Ramilo O et al. Respiratory syncytial virus seasonality: a global overview. J Infect Dis 2018, 217(9):1356–64. Epub 20181104. pmid:29390105.
4. Zhang S, Akmar LZ, Bailey F, Rath BA, Alchikh M, Schweiger B et al. Cost of respiratory syncytial virus-associated acute lower respiratory infection management in young children at the regional and global level: a systematic review and meta-analysis. J Infect Dis 2020, 222(Suppl 7):S680–S87. Epub 20201007. pmid:32227101.
5. Reeves RM, Hardelid P, Gilbert R, Warburton F, Ellis J, Pebody RG. Estimating the burden of respiratory syncytial virus (RSV) on respiratory hospital admissions in children less than five years of age in England, 2007–2012. Influenza Other Respir Viruses 2017, 11(2):122–29. Epub 20170121. pmid:28058797.
6. UK Health Security Agency. Syndromic surveillance: systems and analyses https://www.gov.uk/government/collections/syndromic-surveillance-systems-and-analyses accessed 10 May 2023.
7. Bardsley M, Morbey RA, Hughes HE, Beck CR, Watson CH, Zhao H et al. Epidemiology of respiratory syncytial virus in children younger than 5 years in England during the COVID-19 pandemic, measured by laboratory, clinical, and syndromic surveillance: a retrospective observational study. Lancet Infect Dis 2023, 23(1):56–66. Epub 20220906. pmid:36063828.
8. Hughes HE, Morbey R, Hughes TC, Locker TE, Pebody R, Green HK et al. Emergency department syndromic surveillance providing early warning of seasonal respiratory activity in England. Epidemiol Infect 2016, 144(5):1052–64. pmid:26415918.
9. Morbey RA, Harcourt S, Pebody R, Zambon M, Hutchison J, Rutter J et al. The burden of seasonal respiratory infections on a national telehealth service in England. Epidemiol Infect 2017, 145:1922–32. pmid:28413995.
10. Buckingham-Jeffery E, Morbey R, House T, Elliot AJ, Harcourt S, Smith GE. Correcting for day of the week and public holiday effects: improving a national daily syndromic surveillance service for detecting public health threats. BMC Public Health 2017, 17(1):477. pmid:28525991.
11. Morbey R, Todkill D, DeAngelis D, Charlett A, Elliot A. DiD IT?: A differences-in-differences investigation tool to quantify the impact of local incidents on public health using real-time syndromic surveillance health data. Epidemiol Infect 2023, 151:e56. Epub 20230315. pmid:36919204.
12. Morbey RA, Todkill D, Watson C, Elliot AJ. Machine learning forecasts for seasonal epidemic peaks: Lessons learnt from an atypical respiratory syncytial virus season. PLoS ONE 2023, 18(9):e0291932. Epub 20230922. pmid:37738241.
13. Nelder JA, Wedderburn RWM. Generalized Linear Models. J R Stat Soc Ser A 1972, 135(3):370–84.
14. UK Health Security Agency. Respiratory infections: laboratory reports 2022 https://www.gov.uk/government/publications/respiratory-infections-laboratory-reports-2022 accessed 29 November 2024.
15. Goddard NL, Cooke MC, Gupta RK, Nguyen-Van-Tam JS. Timing of monoclonal antibody for seasonal RSV prophylaxis in the United Kingdom. Epidemiol Infect 2007, 135(1):159–62. Epub 20060607. pmid:16753078.
16. Fleming DM, Taylor RJ, Lustig RL, Schuck-Paim C, Haguinet F, Webb DJ et al. Modelling estimates of the burden of respiratory syncytial virus infection in adults and the elderly in the United Kingdom. BMC Infect Dis 2015, 15:443. pmid:26497750.
17. Hardelid P, Pebody R, Andrews N. Mortality caused by influenza and respiratory syncytial virus by age group in England and Wales 1999–2010. Influenza Other Respir Viruses 2013, 7(1):35–45. pmid:22405488.
18. Morbey RA, Elliot AJ, Charlett A, Verlander NQ, Andrews N, Smith GE. The application of a novel ‘rising activity, multi-level mixed effects, indicator emphasis’ (RAMMIE) method for syndromic surveillance in England. Bioinformatics 2015, 31(22):3660–65. Epub 20151115. pmid:26198105.
19. Guy R, Henderson KL, Coelho J, Hughes H, Mason EL, Gerver SM et al. Increase in invasive group A streptococcal infection notifications, England, 2022. Eurosurveillance 2023, 28(1):pii = 2200942. pmid:36695450.
20. UK Health Security Agency. Group A streptococcal infections: activity during the 2022 to 2023 season https://www.gov.uk/government/publications/group-a-streptococcal-infections-activity-during-the-2022-to-2023-season/group-a-streptococcal-infections-12th-update-on-seasonal-activity-in-england accessed 31 March 2023.
21. Nikhab A, Morbey R, Todkill D, Elliot AJ. Using a novel ‘difference-in-differences’ method and syndromic surveillance to estimate the change in local healthcare utilisation during periods of media reporting in the early stages of the COVID-19 pandemic in England. Public Health 2024, 232:132–37. pmid:38776588.
22. Do TD, Nguyen TD, Ta VC, Anh DT, Tran Thi T-H, Phan D, Mai ST. Dynamic weighted ensemble for diarrhoea incidence predictions. Mach Learn 2023, 113:2129–52.
23. Castro Blanco E, Dalmau Llorca MR, Aguilar Martín C, Carrasco-Querol N, Gonçalves AQ, Hernández Rojas Z et al. A predictive model of the start of annual influenza epidemics. Microorganisms 2024, 12(7):1257. pmid:39065025.
24. Su K, Xu L, Li G, Ruan X, Li X, Deng P et al. Forecasting influenza activity using self-adaptive AI model and multi-source data in Chongqing, China. EBioMedicine 2019, 47:284–92. Epub 20190830. pmid:31477561.
25. Andronico A, Paireau J, Cauchemez S. Integrating information from historical data into mechanistic models for influenza forecasting. PLoS Comput Biol 2024, 20(10):e1012523. pmid:39475955.
26. Reich NG, Brooks LC, Fox SJ, Kandula S, McGowan CJ, Moore E et al. A collaborative multiyear, multimodel assessment of seasonal influenza forecasting in the United States. Proc Natl Acad Sci U S A 2019, 116(8):3146–54. Epub 20190117. pmid:30647115.
© 2025 Morbey et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
During winter months, there is increased pressure on health care systems in temperate climates due to seasonal increases in respiratory illnesses. Providing real-time short-term forecasts of the demand for health care services helps managers plan their services. During the winter of 2022–23 we piloted a new forecasting pipeline using existing surveillance indicators that are sensitive to increases in respiratory syncytial virus (RSV) activity. Indicators included telehealth cough calls and emergency department (ED) bronchiolitis attendances, both in children under 5 years. We used machine learning techniques to train and select the models that best forecast the timing and intensity of peaks up to 28 days ahead. Forecast uncertainty was modelled using a novel generalised additive model for location, scale and shape (gamlss) approach, which allowed prediction intervals to vary according to the level of forecast activity. The winter of 2022–23 was atypical because demand for healthcare services in children was exceptionally high, owing to RSV circulating in the community and heightened concern about invasive group A streptococcal (iGAS) infections. However, our short-term forecasts proved adaptive, forecasting a new, higher peak once the increase in demand due to iGAS began. Thus, we have demonstrated the utility of our approach of adding forecasts to existing surveillance systems.
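The abstract states that forecast uncertainty was modelled so that prediction intervals widen or narrow with the level of forecast activity. The exact gamlss specification (distribution family, covariates, link functions) is not given here, so the following is only a minimal sketch under stated assumptions: observed counts are treated as Normal around the point forecast, and the scale parameter is modelled as log(sigma) = a + b·log(forecast), fitted by maximum likelihood with damped Newton steps. This is a simplified stand-in for a full GAMLSS fit, not the authors' pipeline; the helper names `fit_log_scale` and `prediction_interval` and the synthetic data are hypothetical.

```python
import math
from statistics import NormalDist


def _nll(a, b, xs, rs):
    """Negative log-likelihood (up to a constant) of residuals r_i ~ N(0, sigma_i^2)
    with log(sigma_i) = a + b * x_i."""
    total = 0.0
    try:
        for x, r in zip(xs, rs):
            eta = a + b * x
            total += eta + 0.5 * r * r * math.exp(-2.0 * eta)
    except OverflowError:
        return math.inf  # trial step pushed sigma towards zero; reject it
    return total


def fit_log_scale(forecasts, observed, max_iter=200, tol=1e-12):
    """Hypothetical helper: fit log(sigma) = a + b * log(forecast) by maximum likelihood.

    The location is fixed at the point forecast; only the scale is modelled.
    Forecasts must be positive and not all equal.
    """
    xs = [math.log(f) for f in forecasts]
    rs = [y - f for f, y in zip(forecasts, observed)]
    # Start from a constant scale equal to the root-mean-square residual.
    a = 0.5 * math.log(sum(r * r for r in rs) / len(rs))
    b = 0.0
    for _ in range(max_iter):
        # w_i = r_i^2 / sigma_i^2 drives both the gradient and the Hessian.
        ws = [r * r * math.exp(-2.0 * (a + b * x)) for x, r in zip(xs, rs)]
        ga = sum(1.0 - w for w in ws)
        gb = sum(x * (1.0 - w) for x, w in zip(xs, ws))
        haa = sum(2.0 * w for w in ws)
        hab = sum(2.0 * w * x for x, w in zip(xs, ws))
        hbb = sum(2.0 * w * x * x for x, w in zip(xs, ws))
        det = haa * hbb - hab * hab
        # Newton direction: solve the 2x2 system H @ d = -g.
        da = -(hbb * ga - hab * gb) / det
        db = -(haa * gb - hab * ga) / det
        # Halve the step until the likelihood does not get worse.
        step, current = 1.0, _nll(a, b, xs, rs)
        while step > 1e-10 and _nll(a + step * da, b + step * db, xs, rs) > current:
            step *= 0.5
        a, b = a + step * da, b + step * db
        if abs(step * da) + abs(step * db) < tol:
            break
    return a, b


def prediction_interval(forecast, a, b, level=0.95):
    """Normal prediction interval whose width grows with the forecast level."""
    sigma = math.exp(a + b * math.log(forecast))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Counts cannot be negative, so the lower bound is clipped at zero.
    return max(0.0, forecast - z * sigma), forecast + z * sigma


# Usage with synthetic, illustrative data: residuals are exactly +/-10% of the forecast.
forecasts = list(range(10, 201, 10))
observed = [f + (0.1 * f if i % 2 == 0 else -0.1 * f) for i, f in enumerate(forecasts)]
a, b = fit_log_scale(forecasts, observed)
print(f"a={a:.3f}, b={b:.3f}")
print(prediction_interval(20, a, b), prediction_interval(200, a, b))
```

Because the synthetic residuals scale exactly with the forecast, the fitted slope b should be close to 1, so the interval around a forecast of 200 is about ten times wider than around a forecast of 20. The log link keeps sigma positive; a full GAMLSS fit would typically model location and scale jointly and could use a count distribution instead of the Normal assumed here.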