
A Machine-Learning Based Approach to Predicting Waterborne Disease Outbreaks Caused by Hurricanes

Climate change is increasing the frequency and intensity of (extra-)tropical cyclones, including hurricanes and winter storms, worldwide. Waterborne diseases resulting from flood-related impacts affect public health and are of major concern for society. Previous research studies have highlighted a statistically significant linear correlation between waterborne diseases and climate variables, especially precipitation and temperature. However, to the best of our knowledge, no studies have explored nonlinear models (e.g., machine learning) to predict waterborne disease outbreaks in the aftermath of hurricanes and winter storms. Here, we aim to predict waterborne disease counts as well as disease outbreaks using historic climate, demographic, and public health data for Florida, U.S., dating back to 1992. For this, we first predicted disease counts in aggregated coastal counties using multiple linear regression (MLR) and random forest regression (RFR) models. Then, we developed a binary random forest classifier (RFC) model to predict waterborne disease outbreaks (0: no outbreak, 1: outbreak). The highest coefficient of determination (R²) for the MLR model was 0.65, obtained for two aggregated county groups, namely St. Johns-Duval-Nassau and Sarasota-Charlotte-Lee. The RFR model showed its highest R² of 0.69 for the county group Sarasota-Charlotte-Lee. The highest Root Mean Square Error (RMSE) was found for the county group Miami-Dade-Broward-Palm Beach, with values of 15 and 16 people for the MLR and RFR models. The St. Johns-Duval-Nassau and Sarasota-Charlotte-Lee groups achieved the highest Kling-Gupta Efficiency (KGE) of 0.76 for the MLR model. Sarasota-Charlotte-Lee also performed best in terms of KGE for the RFR model, with a score of 0.69. The binary RFC model for Pinellas-Hillsborough-Manatee achieved an accuracy of 0.93 and an F1-score of 0.48.
We anticipate that the models' performance can be substantially improved with access to higher spatial resolution climate data as well as longer demographic and public health records. Nevertheless, we provide a solid methodology that can inform local authorities about imminent public health impacts and help mitigate negative effects on society, the economy, and the environment.

Master of Science

Climate change is increasing the frequency and intensity of tropical and extratropical cyclones, including hurricanes and winter storms, worldwide. Extreme weather events have been shown to increase the risk of waterborne disease outbreaks (i.e., outbreaks of diseases that are transmitted by water), especially due to increased flooding. Previous studies showed a correlation between climate factors, such as precipitation and temperature, and waterborne diseases, but no concrete models have been developed to predict these outbreaks. Advanced prediction models can help identify where disease outbreaks are most likely to occur and can support preparing for and mitigating the severity of these outbreaks to help save lives, protect the environment, and reduce damage to infrastructure. Our research focused on developing a model framework that uses climate and demographic data from coastal Florida counties dating back to 1992 to predict salmonellosis, a common waterborne bacterial infection, after a hurricane event. We created two regression models, a multiple linear regression (MLR) and a random forest regression (RFR), to predict the number of salmonellosis cases. Additionally, we created a random forest classifier (RFC) model to predict whether an outbreak would occur. After running analyses for these three models on groups of three counties, we found that the MLR and RFR models showed similar accuracy in predicting cases, with the MLR performing slightly better for most county groups. For the Sarasota-Charlotte-Lee county group, the RFR performed best.
The RFC model achieved its highest accuracy, 93%, for the Pinellas-Hillsborough-Manatee county group. Future improvements, such as more and higher-quality data and additional variables, can help make these models more reliable.
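
The evaluation metrics quoted in the abstracts (R², RMSE, KGE, accuracy, and F1-score) can be made concrete with a minimal sketch. The code below is not taken from the thesis: all function names and sample values are hypothetical, R² is assumed to be 1 - SS_res/SS_tot, and KGE is assumed to follow the widely used formulation of Gupta et al. (2009), which combines correlation, variability ratio, and bias ratio. The thesis may use different variants.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def rmse(obs, sim):
    # Root mean square error, in the same units as the observations.
    return math.sqrt(mean([(s - o) ** 2 for o, s in zip(obs, sim)]))


def r2(obs, sim):
    # Coefficient of determination, assumed here as 1 - SS_res / SS_tot.
    mo = mean(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def kge(obs, sim):
    # Kling-Gupta Efficiency (assumed 2009 formulation):
    # r = Pearson correlation, alpha = std ratio, beta = mean ratio.
    mo, ms = mean(obs), mean(sim)
    so = math.sqrt(mean([(o - mo) ** 2 for o in obs]))
    ss = math.sqrt(mean([(s - ms) ** 2 for s in sim]))
    cov = mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)])
    r = cov / (so * ss)
    alpha = ss / so
    beta = ms / mo
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def accuracy_and_f1(y_true, y_pred):
    # Binary labels: 0 = no outbreak, 1 = outbreak.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return acc, f1


# Hypothetical case counts (observed vs. predicted), for illustration only.
obs = [10, 12, 15, 20, 18]
sim = [11, 11, 16, 18, 19]
reg_scores = {"rmse": rmse(obs, sim), "r2": r2(obs, sim), "kge": kge(obs, sim)}

# Hypothetical outbreak labels: 2 outbreaks in 20 periods (rare positives).
y_true = [1, 1] + [0] * 18
y_pred = [1, 0, 1] + [0] * 17
acc, f1 = accuracy_and_f1(y_true, y_pred)

print(reg_scores)
print(acc, f1)
```

In the hypothetical classification example, accuracy is 0.9 while the F1-score is 0.5, because only 2 of the 20 periods are outbreaks and the classifier misses one of them and raises one false alarm. Rare positives are one way a gap between accuracy and F1-score can arise; the abstract does not state whether this explains the reported 0.93 and 0.48.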

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/119552
Date: 27 June 2024
Creators: Mansky, Christopher Immanuel
Contributors: Civil and Environmental Engineering, Munoz Pauta, David Fernando, Young, Kevin David, Shealy, Earl Wade
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Language: English
Detected Language: English
Type: Thesis
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
