The question “who infected whom” is a perennial one in the study of infectious disease dynamics. To understand characteristics of infectious diseases such as how many people will one case produce over the course of infection (the reproductive number), how much time between the infection of two connected cases (the generation interval), and what factors are associated with transmission, one must ascertain who infected whom. The current best practices for linking cases are contact investigations and pathogen whole genome sequencing (WGS). However, these data sources cannot perfectly link cases, are expensive to obtain, and are often not available for all cases in a study. This lack of discriminatory data limits the use of established methods in many existing infectious disease datasets.
We developed a method to estimate the relative probability of direct transmission between any two infectious disease cases. We used a subset of cases that have pathogen WGS or contact investigation data to train a model and then used demographic, spatial, clinical, and temporal data to predict the relative transmission probabilities for all case-pairs using a simple machine learning algorithm called naive Bayes. We adapted existing methods to estimate the reproductive number and generation interval to use these probabilities. Finally, we explored the associations between various covariates and transmission and how they related to the associations between covariates and pathogen genetic relatedness. We applied these methods to a tuberculosis outbreak in Hamburg, Germany and to surveillance data in Massachusetts, USA.
Through simulations we found that our estimated transmission probabilities accurately classified pairs as links and nonlinks and were able to accurately estimate the reproductive number and the generation interval. We also found that the association between covariates and genetic relatedness captures the direction but not absolute magnitude of the association between covariates and transmission, but the bias was improved by using effect estimates from the naive Bayes algorithm. The methods developed in this dissertation can be used to explore transmission dynamics and estimate infectious disease parameters in established datasets where this was not previously feasible because of a lack of highly discriminatory information, and therefore expand our understanding of many infectious diseases.
Identifer | oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/41503 |
Date | 06 October 2020 |
Creators | Leavitt, Sarah Van Ness |
Contributors | White, Laura F., Jenkins, Helen E. |
Source Sets | Boston University |
Language | en_US |
Detected Language | English |
Type | Thesis/Dissertation |
Page generated in 0.002 seconds