
Using Mobile Monitoring and Vehicle Emissions to Develop and Validate Machine Learning Empirical Models of Particulate Air Pollution

Increasing levels of air pollution are prompting researchers to develop more reliable air pollution modeling approaches in order to protect the public and the environment from toxic contaminants and airborne pathogens. Although land use regression has long been used to assess exposure to air pollution, researchers are increasingly using machine learning algorithms to quantify the concentration of harmful pollutants, which in this study are black carbon (BC) and particle number (PN). Additionally, researchers are moving away from fixed-site data in favor of mobile monitoring data collected in a variety of locations to develop hourly empirical models of particulate air pollution.
This study uses secondary data describing BC and PN pollutant levels, obtained from roads that bikers share in the more rural setting of Blacksburg, VA. Machine learning (ML) algorithms are then built to develop accurate and reliable short-term empirical prediction models. Different preprocessing methods for the mobile monitoring data and various input variables are tested to assess how ML can be used effectively in this process. Three types of time-average models are developed: daytime, hourly average, and one-second models. Various combinations of spatial and temporal input variables are used in the short-term models, and the effect of adding more spatiotemporal variables (e.g., emissions) on model performance is assessed. Spatial and temporal autocorrelation are incorporated to develop more sophisticated validation approaches for identifying ML performance patterns, with the goal of predicting concentration levels more accurately than is possible with raw data that has not been preprocessed. The results show that the model developed using refined disaggregated data detects the spatial distribution of the pollutant concentration at levels equivalent to the smoothed-data models, although the latter display smaller errors. The performance of the short-term model including all variables is equivalent to that of the model omitting emissions. The ML results are compared to earlier stepwise regression model results, suggesting that ML can improve both long-term and short-term model accuracy.
Our findings indicate that ML demonstrates higher predictive capacity in comparison to stepwise regression. The results from this study may be useful in enhancing the performance of ML through the incorporation of different data preprocessing tasks, as well as showing how different input variables contribute to the ML modeling process. The findings from this study could be used toward the development of environmental/eco-friendly routes that would decrease the risk for exposure to harmful vehicle-related emissions.

Doctor of Philosophy

Air pollution is a major environmental threat to human health, claiming the lives of millions of people each year, primarily as a result of fine particulate matter entering the respiratory system. As such, it is important to develop reliable and accurate air pollution modeling approaches in order to protect the public and the environment from toxic contaminants and pathogens in the air. Although an approach known as land use regression has long been used to assess exposure to air pollution, researchers are increasingly using machine learning (ML) algorithms to quantify the concentration of harmful pollutants. In this study the pollutants are black carbon and particle number, which is a generic assessment that captures a number of known airborne hazards. Additionally, researchers are moving away from using fixed-site data in favor of using mobile monitoring data in a variety of locations to develop hourly empirical models of particulate air pollution.
In this study, machine learning algorithms are developed using secondary data collected from roads that bikers share, which are representative of pollution levels of particle number and black carbon in the more rural setting of Blacksburg, VA, in order to develop accurate and reliable short-term empirical prediction models. Different preprocessing methods for the mobile monitoring data and various input variables are tested to assess how machine learning can be used efficiently in this process. Our findings indicate that machine learning demonstrates higher predictive capacity in comparison to stepwise regression. The results from this study are expected to be useful in enhancing the performance of machine learning through the incorporation of different data preprocessing tasks, as well as in showing how different input variables contribute to the machine learning modeling process. The findings from this study could help transportation planners and other stakeholders better assess pollution risks for bike riders and pedestrians. As such, this study's findings could be used toward the development of environmental/eco-friendly routes that would decrease the risk for exposure to harmful vehicle-related emissions.
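The abstract names three time-average models (daytime, hourly average, and one-second) but does not describe how the one-second mobile monitoring data were aggregated. The following is only a minimal sketch of one plausible aggregation step, not the dissertation's procedure. The record layout, the segment identifiers, the sample values, and the 07:00 to 19:00 daytime window are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical one-second mobile-monitoring records (not data from the study):
# (ISO timestamp, road segment id, black carbon reading in arbitrary units).
readings = [
    ("2021-06-01T08:00:01", "seg_a", 1.0),
    ("2021-06-01T08:00:02", "seg_a", 3.0),
    ("2021-06-01T08:30:00", "seg_b", 4.0),
    ("2021-06-01T09:15:00", "seg_a", 5.0),
    ("2021-06-01T09:15:01", "seg_a", 7.0),
]


def hourly_average(records):
    """Average one-second readings per (segment, hour of day)."""
    buckets = defaultdict(list)
    for timestamp, segment, value in records:
        hour = datetime.fromisoformat(timestamp).hour
        buckets[(segment, hour)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def daytime_average(records, start_hour=7, end_hour=19):
    """Average readings per segment inside an assumed daytime window."""
    buckets = defaultdict(list)
    for timestamp, segment, value in records:
        if start_hour <= datetime.fromisoformat(timestamp).hour < end_hour:
            buckets[segment].append(value)
    return {segment: mean(values) for segment, values in buckets.items()}


hourly = hourly_average(readings)    # one value per segment and hour
daytime = daytime_average(readings)  # one value per segment
```

Under this sketch, the one-second model would train on `readings` directly, while the hourly and daytime models would train on the smoothed `hourly` and `daytime` values.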

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/113757
Date: 18 August 2021
Creators: Alazmi, Asmaa Salem
Contributors: Civil and Environmental Engineering, Rakha, Hesham A., Hankey, Steven C., Marr, Linsey C., Zhang, Wenwen
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Detected Language: English
Type: Dissertation
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
