Abstract

Responsive, informed public health policy decisions for infectious diseases rely on understanding pathogen and population dynamics. Models built on reported case data, such as the number of positive COVID-19 diagnostic tests, are susceptible to bias from unrepresentative sampling. Phylodynamics is an alternative modeling framework that uses pathogen genetic sequences from infected individuals to infer population dynamics and can be more resilient to sampling biases. Real-time phylodynamic analysis, in which population dynamics are inferred up to the present day, suffers from a missing data problem caused by reporting delays: the time between when a sample is collected and when the pathogen genetic sequence obtained from that sample becomes available for analysis. Reporting delays vary across time and space, but it commonly takes weeks to months for a sample to be reported. As a result, recently collected samples are right-truncated: those not yet reported are unobserved at the time of analysis. Intuitively, the bias induced by the missing data grows more severe for estimates closer to the present day. This is an even greater concern when the model assumes that the frequency at which samples are collected is related to the number of infections in the population. In Chapter 3 of this dissertation, we propose a method that accounts for reporting delays in phylodynamic analyses by using available information on historical delays. In Chapter 4, we explore different methods of inferring reporting delays from historical delay data. Both chapters test the theory with simulations and apply the methodology to real SARS-CoV-2 data.

Chapter 5 shifts to an educational study centered on using real data with real context and giving students autonomy to choose the context they explore through homework assignments. In this study, we explore whether having a choice is associated with a difference in homework grades and how students perceive having a choice of data context. We also investigate the characteristics of data contexts that led students to choose one context over the other options. From these findings, we make recommendations to statistics and data science educators about the use of real data with real context and present the benefits and drawbacks of providing students with a choice.

Details

Title: Accounting for Reporting Delays in Real-Time Infectious Disease Analyses and Engaging Students Through Choice of Data Context in Statistics and Data Science Education
Author: Medina, Catalina
Publication year: 2025
Publisher: ProQuest Dissertations & Theses
ISBN: 9798291542668
Source type: Dissertation or Thesis
Language of publication: English
ProQuest document ID: 3241424008