
Physics Guided Machine Learning algorithm for MAX-DOAS retrieval

Multi Axis Differential Optical Absorption Spectroscopy (MAX-DOAS) is a passive remote sensing technique that has been widely used to derive aerosol extinction coefficient profiles and trace gas concentrations. The ill-posed nature of the MAX-DOAS inversion problem makes it almost impossible to design an inversion algorithm that provides a definite solution. One possible route to a low-error inversion algorithm is to incorporate machine learning (ML) techniques into the MAX-DOAS retrieval.
This dissertation documents the author's exploration of how to design such an ML-based inversion algorithm. The inversion problem is formulated as a supervised learning problem, and the ML models are trained on synthetic datasets simulated by radiative transfer models.
A feasibility study first shows that an ML model with an appropriate architecture (CNN+LSTM) can extract the aerosol extinction coefficient profile, single scattering albedo and asymmetry factor from a single MAX-DOAS scan.
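The abstract names the CNN+LSTM architecture but gives no details, so the following is only a minimal, untrained, pure-Python sketch of a convolution-then-LSTM forward pass, not the dissertation's model. It assumes that each elevation angle of a scan contributes a small feature vector (two hypothetical features here), treats the scan as a sequence ordered by elevation angle, omits bias terms, and uses random weights. The function names, layer sizes and the 20-level output grid are all hypothetical.

```python
import math
import random

random.seed(0)  # reproducible (untrained, random) weights

def rand_matrix(rows, cols, scale=0.1):
    """Random weight matrix stored as a list of rows."""
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv1d(seq, kernels, width=3):
    """'Valid' 1D convolution across neighboring elevation angles, then ReLU.
    seq: one feature vector per elevation angle; kernels: one weight row per channel."""
    out = []
    for t in range(len(seq) - width + 1):
        window = [x for step in seq[t:t + width] for x in step]  # flatten the window
        out.append([max(0.0, sum(w * x for w, x in zip(k, window))) for k in kernels])
    return out

def lstm_last_hidden(seq, hidden, params):
    """Single-layer LSTM (no biases) over seq; returns the final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        z = x + h  # concatenate current input and previous hidden state
        i = [sigmoid(v) for v in matvec(params["i"], z)]  # input gate
        f = [sigmoid(v) for v in matvec(params["f"], z)]  # forget gate
        o = [sigmoid(v) for v in matvec(params["o"], z)]  # output gate
        cand = [math.tanh(v) for v in matvec(params["g"], z)]  # candidate cell state
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, cand)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h

# Hypothetical sizes: 2 features per angle, 8 conv channels, 16 LSTM units, 20 levels.
N_FEATURES, N_CHANNELS, HIDDEN, N_LEVELS = 2, 8, 16, 20
kernels = rand_matrix(N_CHANNELS, 3 * N_FEATURES)
lstm_params = {gate: rand_matrix(HIDDEN, N_CHANNELS + HIDDEN) for gate in "ifog"}
head = rand_matrix(N_LEVELS + 2, HIDDEN)  # dense read-out layer

def predict_scan(scan):
    """One scan -> (extinction profile, SSA, asymmetry factor); outputs are unconstrained."""
    h = lstm_last_hidden(conv1d(scan, kernels), HIDDEN, lstm_params)
    raw = matvec(head, h)
    return raw[:N_LEVELS], raw[N_LEVELS], raw[N_LEVELS + 1]

scan = [[random.random(), random.random()] for _ in range(10)]  # 10 elevation angles
profile, ssa, g = predict_scan(scan)
print(len(profile), round(ssa, 4), round(g, 4))
```

In this sketch the convolution picks up local patterns between neighboring elevation angles, and the LSTM condenses the whole scan into one hidden state from which all targets are read out. A real model would learn its weights from simulated scans and would also need to keep its outputs within physical ranges.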
More realistic atmospheric states were then used to generate the training set. Because radiative transfer simulations are computationally expensive, a data augmentation strategy was proposed to increase the number of samples in the training set. A physics-guided machine learning (PGML) algorithm was designed to retrieve aerosol information and trace gas concentrations simultaneously. It is named the PGML model because (1) its predictions are based on the physical laws it has learned from the radiative transfer simulations and (2) it incorporates physical constraints and a pseudo-inverse layer.
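The abstract does not specify what the pseudo-inverse layer computes. One physically motivated possibility, offered here purely as an assumption, is a fixed (non-trainable) step that converts measured differential slant column densities (dSCDs) into trace gas partial columns using box air mass factors (box-AMFs) predicted by the network. If dSCD(α_i) = Σ_j [BAMF_j(α_i) - BAMF_j(90°)] · VC_j, the partial columns VC_j follow from the Moore-Penrose pseudo-inverse of that box-AMF matrix. The sketch below computes the pseudo-inverse solution for a full-rank matrix in pure Python; the function names and the numbers are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(m, v):
    """Solve the square system m x = v by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [val] for row, val in zip(m, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def pseudo_inverse_columns(dbamf, dscd):
    """Partial columns from dSCDs via the Moore-Penrose pseudo-inverse (full-rank case).
    dbamf[i][j]: box-AMF of layer j at elevation angle i minus its zenith box-AMF."""
    at = transpose(dbamf)
    if len(dbamf) >= len(dbamf[0]):
        # More angles than layers: least-squares solution x = (A^T A)^-1 A^T y.
        ata = matmul(at, dbamf)
        aty = [sum(a * y for a, y in zip(row, dscd)) for row in at]
        return solve(ata, aty)
    # More layers than angles: minimum-norm solution x = A^T (A A^T)^-1 y.
    w = solve(matmul(dbamf, at), dscd)
    return [sum(a * wi for a, wi in zip(row, w)) for row in at]

# Illustrative numbers only: 4 elevation angles, 3 layers, noise-free dSCDs.
dbamf = [[1.0, 0.5, 0.2], [2.0, 1.0, 0.3], [3.5, 1.2, 0.4], [5.0, 1.5, 0.5]]
true_columns = [1.0, 2.0, 3.0]
dscd = [sum(a * c for a, c in zip(row, true_columns)) for row in dbamf]
print(pseudo_inverse_columns(dbamf, dscd))  # recovers approximately [1.0, 2.0, 3.0]
```

Because such a step has no trainable weights, it would tie the trace gas output directly to the predicted box-AMFs and the measured dSCDs rather than leaving that mapping entirely to the network.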
The PGML model was tested on both a synthetic test set and real MAX-DOAS measurements from Pandora instruments. Evaluation on the synthetic dataset suggests that, when the test data follow a distribution similar to that of the training data, the PGML model can retrieve the aerosol extinction coefficient profile, the trace gas concentration profile and the box air mass factors (box-AMFs) with good accuracy. On real data, the model was validated by comparing its inversion results with those of other algorithms; in general, moderate linear correlations were found between the results. The limitations of the current version of the PGML model, and the factors that might cause discrepancies between its inversion results and those of other algorithms, were discussed.
Doctor of Philosophy
Multi Axis Differential Optical Absorption Spectroscopy (MAX-DOAS) is a passive remote sensing technique for deriving aerosol and trace gas information in the lower atmosphere. A MAX-DOAS instrument is a ground-based system consisting of a scanning telescope, a stepping motor and a spectrometer. It collects scattered solar photons at multiple elevation angles. Through spectral analysis and inversion algorithms, aerosol properties such as the aerosol extinction coefficient profile (a vertical profile describing how strongly aerosols attenuate light at each altitude) and the single scattering albedo (the fraction of aerosol extinction caused by scattering rather than absorption), as well as trace gas concentrations, can be retrieved. The ill-posed nature of the MAX-DOAS inversion problem makes it almost impossible to design an inversion algorithm that provides a definite solution. One possible route to a low-error inversion algorithm is to incorporate machine learning (ML) techniques into the MAX-DOAS retrieval.
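Using the standard definitions, the aerosol extinction coefficient is the sum of the scattering and absorption coefficients, the single scattering albedo is the scattering coefficient divided by the extinction coefficient, and integrating the extinction profile over altitude gives the aerosol optical depth (AOD). The minimal sketch below computes both for a layered profile. The function names are hypothetical, and the profile values are illustrative rather than data from the dissertation.

```python
def single_scattering_albedo(scattering, absorption):
    """omega_0 = sigma_sca / (sigma_sca + sigma_abs); dimensionless, between 0 and 1."""
    extinction = scattering + absorption
    return scattering / extinction

def aerosol_optical_depth(extinction_km, layer_thickness_km):
    """Vertically integrate a layered extinction profile (km^-1) over layer thicknesses (km)."""
    return sum(e * dz for e, dz in zip(extinction_km, layer_thickness_km))

# Illustrative values only (not measurements): four layers, each 0.5 km thick.
extinction = [0.20, 0.15, 0.10, 0.05]  # km^-1, decreasing with altitude
thickness = [0.5, 0.5, 0.5, 0.5]       # km
print(round(aerosol_optical_depth(extinction, thickness), 6))  # 0.25
print(round(single_scattering_albedo(0.09, 0.01), 6))          # 0.9
```

An SSA of 0.9 means that 90% of the light removed by the aerosol is scattered and 10% is absorbed.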
This dissertation documents the author's exploration of how to design such an ML-based inversion algorithm. The inversion problem is formulated as a supervised learning problem, in which a training set is used to teach the ML model to produce the desired output. The ML models are trained on synthetic datasets simulated by radiative transfer models for two reasons: (1) no reliable dataset exists that combines real MAX-DOAS measurements with observations of aerosol properties (macrophysical properties and aerosol extinction coefficient profiles) and trace gas concentrations; and (2) most existing algorithms rely to some extent on empirical knowledge, for example a priori information in optimal estimation methods, or parameters introduced to represent the state vector in parameterized retrieval algorithms, whereas the ML method relies purely on the rules it has learned from the training set. By training on simulated data, the ML model is expected to capture radiative transfer theory and to base its predictions on physical laws.
A feasibility study first shows that a machine learning model with an appropriate architecture (a combination of convolutional layers and a long short-term memory layer) can extract the aerosol extinction coefficient profile, single scattering albedo and asymmetry factor from a single MAX-DOAS scan. This architecture is also capable of retrieving elevated aerosol layers in the extinction coefficient profiles.
More realistic atmospheric states were then used to generate the training set, and a physics-guided machine learning (PGML) model was designed to retrieve aerosol information and trace gas concentrations simultaneously. It is named the PGML model because (1) its predictions are based on the physical laws it has learned from the radiative transfer simulations and (2) it incorporates physical constraints and a pseudo-inverse layer. Because running radiative transfer simulations is time-consuming, a data augmentation strategy was proposed to increase the number of samples in the training set.
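The abstract does not say how the physical constraints are imposed. One common option, shown here only as an assumed example, is to pass the network's raw outputs through bounded activation functions so that every prediction is physically admissible: softplus keeps extinction coefficients non-negative, a sigmoid keeps the single scattering albedo between 0 and 1, and tanh keeps the asymmetry factor between -1 and 1. Penalty terms in the loss function are another common choice. The function names are hypothetical.

```python
import math

def softplus(x):
    """log(1 + e^x), rearranged to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x):
    """Logistic function, branching on sign to avoid overflow in math.exp."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def constrain_outputs(raw_profile, raw_ssa, raw_g):
    """Map unconstrained network outputs onto physically valid ranges."""
    extinction = [softplus(v) for v in raw_profile]  # extinction coefficient >= 0
    ssa = sigmoid(raw_ssa)                           # 0 < single scattering albedo < 1
    g = math.tanh(raw_g)                             # -1 < asymmetry factor < 1
    return extinction, ssa, g

ext, ssa, g = constrain_outputs([-3.0, 0.0, 2.5], 1.2, -0.4)
print([round(e, 4) for e in ext], round(ssa, 4), round(g, 4))
```

Building the bounds into the output layer guarantees valid ranges for every prediction, whereas a loss penalty only discourages invalid values during training.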
The PGML model was tested on both a synthetic test set and real MAX-DOAS measurements from Pandora instruments. Evaluation on the synthetic dataset suggests that, when the test data follow a distribution similar to that of the training data, the PGML model can retrieve the aerosol extinction coefficient profile, the trace gas concentration profile and the box air mass factors (box-AMFs) with good accuracy. On real data, the model was validated by comparing its inversion results with those of other algorithms; in general, moderate linear correlations were found between the results. The limitations of the current version of the PGML model, and the factors that might cause discrepancies between its inversion results and those of other algorithms, were discussed.

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/113251
Date: 18 January 2023
Creators: Dong, Yun
Contributors: Electrical Engineering, Lind, Elena Spinei, Bailey, Scott M., Jia, Xiaoting, England, Scott L., Karpatne, Anuj
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Language: English
Detected Language: English
Type: Dissertation
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
