1 |
Physics Guided Machine Learning algorithm for MAX-DOAS retrieval
Dong, Yun 18 January 2023
Multi-Axis Differential Optical Absorption Spectroscopy (MAX-DOAS) is a passive remote sensing technique that has been widely used to derive aerosol extinction coefficient profiles and trace gas concentrations. The ill-posed nature of the MAX-DOAS inversion problem makes it almost impossible to design an inversion algorithm that provides a definite solution. One possible way to find a low-error inversion algorithm is to incorporate machine learning (ML) into the MAX-DOAS retrieval.
This dissertation serves as the author's exploration of designing such an ML-based inversion algorithm. The inversion problem is formulated as a supervised learning problem and the ML models are trained on synthetic datasets simulated by radiative transfer models.
Starting with a feasibility study, it is first shown that an ML model with an appropriate architecture (CNN+LSTM) is capable of extracting the aerosol extinction coefficient profile, single scattering albedo and asymmetry factor from one MAX-DOAS scan.
Then more realistic atmosphere states were used for generating the training set. Due to the high time cost of radiative transfer simulations, a data augmentation strategy was put forward to increase the number of samples in the training set. A physics-guided machine learning (PGML) algorithm was designed to retrieve aerosol information and trace gas concentrations simultaneously. The model is named the PGML model because (1) its prediction is based on the physical laws it has learnt from the radiative transfer simulations and (2) it introduces physical constraints and a pseudo-inverse layer.
The PGML model was tested on both a synthetic test set and real MAX-DOAS measurements from Pandora instruments. Evaluation on the synthetic dataset suggests that, given a similar data distribution, the PGML model is capable of retrieving the aerosol extinction coefficient profile, the trace gas concentration profile and the box-AMFs with good accuracy. Validation on real data was done via comparisons with inversion results given by other algorithms. Generally, moderate linear correlations were found between the inversion results. Limitations of the current version of the PGML model and factors that might lead to the discrepancies between inversion results given by the PGML model and other algorithms are discussed. / Doctor of Philosophy / Multi-Axis Differential Optical Absorption Spectroscopy (MAX-DOAS) is a passive remote sensing technique for deriving aerosol and trace gas information in the lower atmosphere. A MAX-DOAS instrument is a ground-based system consisting of a scanning telescope, a stepping motor and a spectrometer. It collects scattered solar photons at multiple elevation angles. From spectrum analysis and inversion algorithms, aerosol properties such as the aerosol extinction coefficient profile (a vertical profile describing how much the solar radiation is weakened by the atmosphere) and single scattering albedo (the ratio of scattered light to incoming light), as well as trace gas concentrations, can be retrieved. The ill-posed nature of the MAX-DOAS inversion problem makes it almost impossible to design an inversion algorithm that provides a definite solution. One possible way to find a low-error inversion algorithm is to incorporate machine learning (ML) into the MAX-DOAS retrieval.
This dissertation serves as the author's exploration of designing such an ML-based inversion algorithm. The inversion problem is formulated as a supervised learning problem. In supervised learning, a training set is used to teach the ML model to yield the desired output. The ML models are trained on synthetic datasets simulated by radiative transfer models for two reasons: (1) There is no reliable dataset combining real MAX-DOAS measurements and observations of aerosol properties (macrophysical properties and aerosol extinction coefficient profiles) and trace gas concentrations. (2) Most of the existing algorithms rely to some extent on empirical knowledge, e.g., a priori information (optimal estimation methods) or the introduction of parameters for representing the state vector (parameterized retrieval algorithms). The ML method, however, relies purely on the rules it has learned from the training set. By using simulated data, it is expected that the ML model will capture the radiative transfer theory and give predictions based on the physical laws.
Starting with a feasibility study, it is first shown that by applying a machine learning model with an appropriate architecture (a combination of convolutional layers and a long short-term memory layer), it is possible to extract the aerosol extinction coefficient profile, single scattering albedo and asymmetry factor from one MAX-DOAS scan. This architecture is also capable of retrieving elevated layers in aerosol extinction coefficient profiles.
Then more realistic atmosphere states were used for generating the training set, and a physics-guided machine learning (PGML) model was designed to retrieve aerosol information and trace gas concentrations simultaneously. The model is named the PGML model because (1) its prediction is based on the physical laws it has learnt from the radiative transfer simulations and (2) it introduces physical constraints and a pseudo-inverse layer. Due to the high time cost of running radiative transfer simulations, a data augmentation strategy was put forward to increase the number of samples in the training set.
The PGML model was tested on both a synthetic test set and real MAX-DOAS measurements from Pandora instruments. Evaluation on the synthetic dataset suggests that, given a similar data distribution, the PGML model is capable of retrieving the aerosol extinction coefficient profile, the trace gas concentration profile and the box-AMFs with good accuracy. Validation on real data was done via comparisons with inversion results given by other algorithms. Generally, moderate linear correlations were found between the inversion results. Limitations of the current version of the PGML model and factors that might lead to the discrepancies between inversion results given by the PGML model and other algorithms are discussed.
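The abstract does not say how the pseudo-inverse layer is built. As a minimal sketch, assuming the commonly used linear relation in which each slant column is the box-AMF-weighted sum of the layer partial columns, a pseudo-inverse step could look like the following. The function name, array shapes and numbers are hypothetical illustrations, not the author's implementation.

```python
import numpy as np

def retrieve_partial_columns(box_amf, slant_columns):
    """Least-squares estimate of layer partial columns.

    Assumes slant_columns = box_amf @ partial_columns, where box_amf has
    shape (n_elevation_angles, n_layers). This linear model is an
    assumption made for illustration only.
    """
    return np.linalg.pinv(box_amf) @ slant_columns

# Hypothetical synthetic example: 8 elevation angles, 4 atmospheric layers.
rng = np.random.default_rng(0)
box_amf = rng.uniform(1.0, 5.0, size=(8, 4))
true_columns = np.array([4.0, 3.0, 2.0, 1.0])
slant_columns = box_amf @ true_columns          # noise-free forward model

estimate = retrieve_partial_columns(box_amf, slant_columns)
```

With noise-free inputs and more elevation angles than layers, the estimate matches the partial columns used to generate the data. With real, noisy measurements the result would only be a least-squares fit.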
|
2 |
Science Guided Machine Learning: Incorporating Scientific Domain Knowledge for Learning Under Data Paucity and Noisy Contexts
Muralidhar, Nikhil 18 August 2022
In recent years, the large amount of labeled data available has steered machine learning (ML) research toward purely data-driven end-to-end pipelines, e.g., in deep neural network research. However, in many situations, data is limited and of poor quality. Traditional ML pipelines are known to be susceptible to various issues when trained on low volumes of non-representative, noisy datasets. We investigate whether prior domain knowledge about the problem being modeled can be employed within the ML pipeline to improve model performance under data paucity and in noisy contexts. This report presents recent developments and details novel contributions in the context of incorporating prior domain knowledge into various data-driven modeling (i.e., ML) pipelines, particularly those geared towards scientific applications. Such domain knowledge exists in various forms and can be incorporated into the machine learning pipeline using different implicit and explicit methods, termed science-guided machine learning (SGML). All the novel techniques proposed in this report are presented in the context of developing SGML to model fluid dynamics applications, but can be easily generalized to other applications. Specifically, we present SGML pipelines to (i) incorporate prior domain knowledge into the ML model architecture; (ii) incorporate knowledge about the distribution of the target process as statistical priors for improved prediction performance; (iii) leverage prior knowledge to quantify consistency of ML decisions with scientific principles; (iv) explicitly incorporate known mathematical relationships of scientific phenomena to influence the ML pipeline; and (v) develop science-guided transfer learning to improve performance under data paucity.
Each technique presented has been designed with a focus on simplicity and minimal cost of implementation, with the goal of yielding significant improvements in model performance, especially under low data volumes or noisy data conditions. In each application, we demonstrate through rigorous qualitative and quantitative experiments that our SGML pipelines achieve significant improvements in performance and interpretability over corresponding models that are purely data-driven and agnostic to scientific knowledge. / Doctor of Philosophy / In this work, we present techniques for incorporating scientific knowledge into machine learning (ML) pipelines. We demonstrate these techniques with ML models trained with low data volumes as well as with non-representative, noisy datasets. In both cases, we demonstrate through rigorous experimentation that incorporating scientific domain knowledge into the ML pipeline using our proposed science-guided machine learning (SGML) techniques leads to significant performance improvement.
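One common way to "explicitly incorporate known mathematical relationships" is to add a penalty term to the training loss. The sketch below illustrates that general idea only; it is not taken from the dissertation. The conservation-style constraint (two predicted components must sum to a known total), the function name and the weight value are all hypothetical.

```python
import numpy as np

def science_guided_loss(pred, target, known_total, weight=0.1):
    """Mean squared data error plus a weighted physics-violation penalty.

    Hypothetical constraint: the components in each row of `pred`
    should sum to the corresponding entry of `known_total`.
    """
    data_term = np.mean((pred - target) ** 2)
    physics_residual = pred.sum(axis=1) - known_total
    physics_term = np.mean(physics_residual ** 2)
    return data_term + weight * physics_term

target = np.array([[1.0, 2.0], [3.0, 4.0]])
known_total = target.sum(axis=1)                 # constraint holds for the labels

loss_consistent = science_guided_loss(target, target, known_total)
loss_violating = science_guided_loss(target + 0.5, target, known_total)
```

A prediction that matches the labels and satisfies the constraint incurs no loss, while a prediction that breaks the constraint is penalized beyond its plain data error. The penalty term needs no labels, which is why such terms are attractive when labeled data is scarce.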
|
3 |
Achieving More with Less: Learning Generalizable Neural Networks With Less Labeled Data and Computational Overheads
Bu, Jie 15 March 2023
Recent advancements in deep learning have demonstrated its incredible ability to learn generalizable patterns and relationships automatically from data in a number of mainstream applications. However, the generalization power of deep learning methods largely comes at the cost of working with very large datasets and using highly compute-intensive models. Many applications cannot afford the costs needed to ensure the generalizability of deep learning models. For instance, obtaining labeled data can be costly in scientific applications, and using large models may not be feasible in resource-constrained environments involving portable devices. This dissertation aims to improve efficiency in machine learning by exploring different ways to learn generalizable neural networks that require less labeled data and fewer computational resources. We demonstrate that using physics supervision in scientific problems can reduce the need for labeled data, thereby improving data efficiency without compromising model generalizability. Additionally, we investigate the potential of transfer learning powered by transformers in scientific applications as a promising direction for further improving data efficiency. On the computational efficiency side, we present two efforts for increasing the parameter efficiency of neural networks through novel architectures and structured network pruning. / Doctor of Philosophy / Deep learning is a powerful technique that can help us solve complex problems, but it often requires a lot of data and resources. This research aims to make deep learning more efficient, so it can be applied in more situations. We propose ways to make deep learning models require less data and less computing power. For example, we use the laws of physics as additional information when training the neural network so that it can learn from less labeled data, and we use a technique called transfer learning to leverage knowledge from data that comes from other distributions.
Transfer learning may allow us to further reduce the need for labeled data in scientific applications. We also look at ways to make the deep learning models use less computational resources, by effectively reducing their sizes via novel architectures or pruning out redundant structures.
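The abstract does not describe the pruning method itself. As a generic illustration of structured pruning, assuming a simple two-layer dense network and an L2-norm importance score (both hypothetical choices), whole hidden units can be removed like this:

```python
import numpy as np

def prune_hidden_units(w_in, b_in, w_out, keep):
    """Keep the `keep` hidden units with the largest incoming-weight L2 norm.

    w_in:  (n_hidden, n_inputs) weights into the hidden layer
    b_in:  (n_hidden,) hidden-layer biases
    w_out: (n_outputs, n_hidden) weights out of the hidden layer
    """
    norms = np.linalg.norm(w_in, axis=1)           # one score per hidden unit
    kept = np.sort(np.argsort(norms)[-keep:])      # indices of surviving units
    # Removing a unit drops its row in w_in/b_in and its column in w_out.
    return w_in[kept], b_in[kept], w_out[:, kept], kept

# Hypothetical layer with 4 hidden units, 2 inputs and 3 outputs.
w_in = np.array([[0.1, 0.1], [2.0, 2.0], [0.01, 0.0], [1.0, 1.0]])
b_in = np.zeros(4)
w_out = np.ones((3, 4))

small_w_in, small_b_in, small_w_out, kept = prune_hidden_units(w_in, b_in, w_out, keep=2)
```

Because entire units are removed, the resulting weight matrices are genuinely smaller, unlike unstructured pruning, which only zeroes individual weights.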
|
4 |
Towards Green AI: Cost-Efficient Deep Learning using Domain Knowledge
Srivastava, Sangeeta 12 August 2022
No description available.
|