Harmful Algae Blooms (HABs) in inland waterbodies (e.g., lakes and ponds) pose a serious threat to human health and natural ecosystems. Thus, it is imperative to assess HABs and their potential triggering factors over broader spatiotemporal scales. This study uses Chlorophyll-a (Chl-a) concentration in water samples collected from lakes in Illinois as an indirect measurement of HABs. The major objectives were to assess the spatiotemporal pattern of HABs across Illinois in recent decades, and to evaluate different machine learning models for predicting Chl-a concentration from publicly available water quality datasets. The Chl-a dataset was compiled from two sources: the regular monitoring program of the Illinois Environmental Protection Agency (IEPA) and the Voluntary Lake Monitoring Program (VLMP), whose data were collected primarily by citizen participants. Seven environmental and water quality zones were selected for spatial analyses. Additionally, temporal patterns were assessed using time-series decomposition of monthly Chl-a concentration datasets. The machine learning pipeline includes two tasks: a regression task for predicting Chl-a concentration, and a classification task for estimating lake trophic status. Meteorological, land use and land cover, and lake morphometry variables were used as independent variables. Four regression models, namely Partial Least Squares Regression (PLSR), Support Vector Machine Regression (SVR), Artificial Neural Network Regression (ANNR), and Random Forest Regression (RFR), were used for the first task of the modeling pipeline, and four classification models, namely Logistic Regression Classification (LRC), Support Vector Machine Classification (SVC), Artificial Neural Network Classification (ANNC), and Random Forest Classification (RFC), were used for the second task.
Results indicate that: a) the Collinsville region in the southwestern part of Illinois exhibited a higher mean Chl-a concentration in its lakes than any other region from 1998 to 2018; b) the lakes that showed increasing trends in their monthly mean Chl-a concentrations were also clustered in the southwestern region; c) Random Forest outperformed all other models in both classification (accuracy = 60.06%) and regression (R² = 38.88%); and d) the land use and land cover variables were found to be the most important set of variables in the Random Forest models.
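The time-series decomposition step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which decomposition method was used, so this example assumes a classical additive decomposition (observed = trend + seasonal + residual) with a centered moving average over a 12-month period. The function name `decompose_additive` and the synthetic `chl_a` series are hypothetical and are not data or code from the thesis.

```python
import math
from statistics import mean


def decompose_additive(series, period=12):
    """Classical additive decomposition: series = trend + seasonal + residual.

    Assumes an even period and at least two full periods of data.
    This is an illustrative method, not necessarily the one used in the thesis.
    """
    n = len(series)
    assert period % 2 == 0 and n >= 2 * period
    half = period // 2

    # Trend: centered moving average; the two end points get half weight
    # so that the weights sum to `period`.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period

    # Seasonal: mean detrended value for each position in the cycle,
    # re-centered so the seasonal component sums to zero over one period.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [mean(b) for b in buckets]
    offset = mean(raw)
    cycle = [s - offset for s in raw]
    seasonal = [cycle[t % period] for t in range(n)]

    # Residual: what remains where the trend is defined.
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Synthetic monthly series (hypothetical values, three years):
# a slow upward trend plus an annual cycle.
chl_a = [20 + 0.1 * t + 5 * math.sin(2 * math.pi * t / 12) for t in range(36)]
trend, seasonal, residual = decompose_additive(chl_a)
print("trend at month 18:", round(trend[18], 3))
print("seasonal cycle:", [round(s, 2) for s in seasonal[:12]])
```

On real monitoring data the residual would not vanish, and a lake with an increasing trend component would show up as a rising `trend` series. Irregular sampling or missing months would need to be handled before a decomposition of this kind applies.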
Identifier | oai:union.ndltd.org:siu.edu/oai:opensiuc.lib.siu.edu:theses-3887 |
Date | 01 September 2021 |
Creators | Sarkar, Supria |
Publisher | OpenSIUC |
Source Sets | Southern Illinois University Carbondale |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Theses |