241

Development and Utilization of Big Bridge Data for Predicting Deck Condition Rating Using Machine Learning Algorithms

Fard, Fariba 05 1900 (has links)
Accurately predicting the deck condition rating of a bridge is crucial for effective maintenance and repair planning. Despite significant research efforts to develop deterioration models, a nationwide model has not been developed. This study aims to identify an appropriate machine learning (ML) algorithm that can accurately predict the deck condition ratings of the nation's bridges. To achieve this, the study collected big bridge data (BBD), which includes NBI, traffic, climate, and hazard data gathered using geospatial information science (GIS) and remote sensing techniques. Two sets of data were collected: a BBD for the single year of 2020 and a historical BBD covering the five-year period from 2016 to 2020. Three ML algorithms, namely random forest, eXtreme Gradient Boosting (XGBoost), and an artificial neural network (ANN), were trained using 319,404 and 1,246,261 bridge decks in the BBD and the historical BBD, respectively. Results showed that using the historical BBD significantly improved model performance compared to the single-year BBD. Additionally, random forest and XGBoost, trained using the historical BBD, demonstrated higher overall accuracies and average F1 scores than the ANN model. Specifically, the random forest and XGBoost models achieved overall accuracies of 83.4% and 79.4%, respectively, and average F1 scores of 79.7% and 77.5%, respectively, while the ANN model achieved an overall accuracy of 58.8% and an average F1 score of 46.1%. The permutation-based variable importance revealed that the hazard data related to earthquakes did not contribute significantly to model development. In conclusion, tree-based ensemble learning algorithms, such as random forest and XGBoost, trained using updated historical bridge data, including NBI, traffic, and climate data, provide a useful tool for accurately predicting the deck condition ratings of bridges in the United States, allowing infrastructure managers to efficiently schedule inspections and allocate maintenance resources.
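As a rough illustration of the tree-ensemble workflow described in this abstract, the sketch below trains a random forest, reports overall accuracy and average (macro) F1, and computes permutation-based variable importance with scikit-learn; the features and data are hypothetical placeholders, not the actual BBD variables.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 4))          # hypothetical stand-ins for NBI/traffic/climate/hazard predictors
y = rng.integers(3, 10, size=n)      # deck condition ratings on the NBI 0-9 scale

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

pred = rf.predict(X_te)
print("overall accuracy:", accuracy_score(y_te, pred))
print("average F1:", f1_score(y_te, pred, average="macro"))

# Permutation-based variable importance, as used in the study
imp = permutation_importance(rf, X_te, y_te, n_repeats=5, random_state=0)
print("importances:", np.round(imp.importances_mean, 3))
```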
242

Vision Approach for Position Estimation Using Moiré Patterns and Convolutional Neural Networks

Alotaibi, Nawaf 05 1900 (has links)
In order for a robot to operate autonomously in an environment, it must be able to locate itself within it. A robot's position and orientation cannot be directly measured by physical sensors, so estimating them is a non-trivial problem. Some sensors provide this information, such as the Global Navigation Satellite System (GNSS) and motion capture (mo-cap). Nevertheless, these sensors are expensive to set up, or they are not useful in the environments where autonomous vehicles are often deployed. Our proposal explores a new approach to sensing for relative motion and position estimation. It consists of a vision sensor and a marker that exploits the moiré phenomenon: a convolutional neural network (CNN) is trained to estimate the position of the vision sensor from the pattern shown on the marker. We describe the data collection and network training process, as well as the hyperparameter search method used to optimize the structure of the network. We test the trained network in an experimental setup to evaluate its ability to estimate position. The system achieved an average absolute error of 1 cm, showcasing a method that could be used to overcome the current limitations of vision approaches in pose estimation.
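A minimal sketch of such a CNN-based position regressor in PyTorch; the architecture, input size, and data below are assumptions for illustration, not the network described in the thesis.

```python
import torch
import torch.nn as nn

class PoseCNN(nn.Module):
    """Regresses a relative (x, y, z) position from an image of a moiré marker."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 3)   # (x, y, z) relative position

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = PoseCNN()
img = torch.randn(8, 3, 128, 128)      # dummy batch of marker images
target = torch.randn(8, 3)             # dummy ground-truth positions
loss = nn.functional.mse_loss(model(img), target)
loss.backward()                        # one regression training step
print(loss.item())
```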
243

Investigating the Use of Convolutional Neural Networks for Prenatal Hydronephrosis Ultrasound Image Classification / Convolutional Neural Networks for Ultrasound Classification

Smail, Lauren January 2018 (has links)
Prenatal hydronephrosis is a common condition that involves the accumulation of urine, with consequent dilatation of the collecting system, in fetal infants. There are several hydronephrosis classifications; however, all grading systems suffer from reliability issues as they contain subjective criteria. The severity of hydronephrosis impacts treatment and follow-up times and can therefore directly influence a patient's well-being and quality of care. Considering the importance of accurate diagnosis, it is concerning that no accurate, reliable, or objective grading system exists. We believe that developing a convolutional neural network (CNN) based diagnostic aid for hydronephrosis will improve physicians' objectivity, inter-rater reliability, and accuracy. A CNN-based diagnostic aid for ultrasound images has not been developed before. Therefore, the current thesis conducted two studies using a database of 4670 renal ultrasound images to investigate two important methodological considerations: ultrasound image preprocessing and model architecture. We first investigated whether image segmentation and texture extraction improve performance when applied to CNN input images. Our results showed that neither preprocessing technique improved performance, and these techniques therefore might not be required when using CNNs for ultrasound image classification. Our search for an optimal architecture resulted in a model with 49% five-way classification accuracy. Further investigation revealed that images in our database had been mislabelled, which impacted model training and testing. Although our current best model is not ready for use as a diagnostic aid, it can be used to verify the accuracy of our labels. Overall, these studies have provided insight into developing a diagnostic aid for hydronephrosis. Once our images and their respective labels have been verified, we can further optimize our model architecture by conducting an exhaustive search. We hypothesize that these two changes will significantly improve model performance and bring our diagnostic aid closer to clinical application. / Thesis / Master of Science (MSc) / Prenatal hydronephrosis is a serious condition that affects the kidneys of fetal infants and is graded using renal ultrasound. The severity of hydronephrosis impacts treatment and follow-up times. However, all grading systems suffer from reliability issues. Improving diagnostic reliability is important for patient well-being. We believe that developing a computer-based diagnostic aid is a promising option to do so. We conducted two studies to investigate how ultrasound images should be processed and how the algorithm underlying the aid should be designed. We found that two common recommendations for ultrasound processing did not improve model performance and therefore need not be applied. Our best-performing algorithm had a classification accuracy of 49%. However, we found that several images in our database were mislabelled, which impacted accuracy metrics. Once our images and their labels have been verified, we can further optimize our algorithm's design to improve its accuracy.
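For illustration, a minimal five-way CNN grader of the kind investigated here might look as follows in PyTorch; the layer sizes and the grayscale 256x256 input are assumptions, not the thesis model.

```python
import torch
import torch.nn as nn

grader = nn.Sequential(
    nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 5),                      # logits over 5 hydronephrosis grades
)

scans = torch.randn(4, 1, 256, 256)        # dummy grayscale ultrasound batch
labels = torch.randint(0, 5, (4,))         # dummy grade labels
loss = nn.functional.cross_entropy(grader(scans), labels)
loss.backward()                            # one 5-way classification training step
print(loss.item())
```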
244

Toward Designing Active ORR Catalysts via Interpretable and Explainable Machine Learning

Omidvar, Noushin 22 September 2022 (has links)
The electrochemical oxygen reduction reaction (ORR) is a very important catalytic process that is directly used in carbon-free energy systems like fuel cells. However, the lack of active, stable, and cost-effective ORR cathode materials has been a major impediment to the broad adoption of these technologies. The challenge for researchers in catalysis is therefore to find catalysts that are electrochemically efficient enough to drive the reaction, made of earth-abundant elements to lower material costs and allow scalability, and stable enough to last. The majority of commercial catalysts now in use have been found through trial-and-error techniques that rely on the chemical intuition of experts. This method of empirical discovery is, however, very challenging, slow, and complicated, because the performance of a catalyst depends on a myriad of factors. Researchers have recently turned to machine learning (ML), together with emerging catalysis databases, to find and design heterogeneous catalysts faster. Many of the ML models used in the field to predict performance-relevant catalyst properties, such as adsorption energies of reaction intermediates, are black-box models. Because these black-box models are based on very complicated mathematical formulas, it is very hard to figure out how they work, and the underlying physics of the desired catalyst properties remains hidden. As a way to open up these black boxes and make them easier to understand, more attention is being paid to interpretable and explainable ML. This work aims to speed up the process of screening and optimizing Pt monolayer alloys for ORR while gaining physical insights. We use a theory-infused machine learning framework in combination with a high-throughput active screening approach to effectively find promising ORR Pt monolayer catalysts. Furthermore, a game-theoretic explainability approach is employed to find the electronic factors that control surface reactivity. The novel insights in this study can provide new design strategies that could shape the paradigm of catalyst discovery. / Doctor of Philosophy / The electrochemical oxygen reduction reaction (ORR) is a very important catalytic process that is used directly in carbon-free energy systems like fuel cells. But the lack of ORR cathode materials that are active, stable, and cheap has made it hard for these technologies to be widely used. Most commercially used catalysts have been found through trial-and-error methods that rely on the chemical intuition of experts. This way of discovering catalysts through experience is hard, slow, and complicated, because the performance of a catalyst depends on a variety of factors. Researchers are now using machine learning (ML) and new catalysis databases to find and design heterogeneous catalysts faster. But because black-box ML models are based on very complicated mathematical formulas, it is very hard to figure out how they work, and the physics behind the desired catalyst properties remains hidden. In recent years, more attention has been paid to ML that can be understood and explained, as a way to decode these "black boxes". The goal of this work is to speed up the screening and optimization of Pt monolayer alloys for ORR. We find promising ORR Pt monolayer catalysts by using a machine learning framework that is grounded in theory together with a high-throughput active screening method. A game-theory approach is also used to find the electronic factors that control surface reactivity. The new ideas in this study can lead to new design strategies that could change how researchers find catalysts.
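The game-theory attribution idea can be made concrete with exact Shapley values over a toy "surface reactivity" function; the features and coalition payoffs below are invented for illustration and are not the thesis model.

```python
from itertools import combinations
from math import factorial

features = ["d_band_center", "coordination", "charge"]

def value(subset):
    # Toy coalition payoff: individual contributions plus one interaction term
    v = 0.0
    if "d_band_center" in subset: v += 0.5
    if "coordination" in subset:  v += 0.2
    if "charge" in subset:        v += 0.1
    if {"d_band_center", "charge"} <= set(subset): v += 0.15  # synergy
    return v

n = len(features)
for f in features:
    others = [g for g in features if g != f]
    phi = 0.0
    # Shapley value: weighted average marginal contribution over all coalitions
    for k in range(n):
        for S in combinations(others, k):
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            phi += w * (value(S + (f,)) - value(S))
    print(f, round(phi, 3))
```

The printed attributions sum to the full-coalition payoff, which is the property that makes Shapley-style explanations a principled way to apportion a prediction among electronic factors.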
245

Accelerating Catalytic Materials Discovery for Sustainable Nitrogen Transformations by Interpretable Machine Learning

Pillai, Hemanth Somarajan 12 January 2023 (has links)
Computational chemistry and machine learning approaches are combined to understand the mechanisms, derive activity trends, and ultimately search for active electrocatalysts for the electrochemical oxidation of ammonia (AOR) and nitrate reduction (NO3RR). Both reactions play vital roles within the nitrogen cycle and have important applications in tackling current environmental issues. Mechanisms are studied through the use of density functional theory (DFT) for AOR and NO3RR; subsequently, a descriptor-based approach is used to understand activity trends across a wide range of electrocatalysts. For AOR, interpretable machine learning is used in conjunction with active learning to screen for active and stable ternary electrocatalysts. We find that Pt3RuCo, Pt3RuNi, and Pt3RuFe show great activity, which is further validated via experimental results. By leveraging the advantages of the interpretable machine learning model, we elucidate the underlying electronic factors behind the stronger *N binding that leads to the observed improved activity. For NO3RR, an interpretable machine learning model is used to understand ways to bypass the stringent limitations placed on the electrocatalytic activity by the *N vs. *NO3 scaling relations. It is found that the *N binding energy can be tuned while leaving the *NO3 binding energy unaffected by ensuring that the subsurface atom interacts strongly with the *N. Based on this analysis we suggest B2 CuPd as a potentially active electrocatalyst for this reaction, which is further validated by experiments. / Doctor of Philosophy / The chemical reactions that make up the nitrogen cycle have played a pivotal role in human society; consider that one of the most impactful achievements of the 20th century was the conversion of nitrogen (N2) to ammonia (NH3) via the Haber-Bosch process. The key class of materials that facilitate such transformations are catalysts, which provide a reactive surface for a reaction to proceed at reasonable rates. Using quantum chemistry we can understand how various reactions proceed on a catalyst surface and how the catalyst can be designed to maximize the reaction rate. Specifically, we are interested here in the electrochemical oxidation of ammonia (AOR) and reduction of nitrate (NO3RR), which have important energy and environmental applications. The atomistic insight provided by quantum chemistry helps us understand the reaction mechanisms and the key hurdles in developing new catalysts. Machine learning can then be leveraged in various ways to find novel catalysts. For AOR, machine learning finds novel active catalysts from a diverse design space, which are then experimentally tested and verified. Through the use of our machine learning algorithm (TinNet) we also provide new insights into why these catalysts are more active and suggest novel physics that can help design active catalysts. For NO3RR, we use machine learning as a tool to better understand the hurdles in catalyst design, which then guides our catalyst discovery. It is shown that CuPd could be a potential candidate, and this is also verified via experimental synthesis and performance testing.
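The scaling-relation analysis underlying this argument can be sketched as a linear fit between *NO3 and *N binding energies across a set of catalysts, with scaling-breaking candidates appearing as outliers from the fitted line; all energies below are made-up placeholders, not DFT values from this work.

```python
import numpy as np

E_N   = np.array([-4.8, -4.2, -3.9, -3.5, -3.1])   # *N binding energies (eV), hypothetical
E_NO3 = np.array([-2.6, -2.3, -2.2, -2.0, -1.8])   # *NO3 binding energies (eV), hypothetical

# Fit the linear scaling relation E(*NO3) = a * E(*N) + b
slope, intercept = np.polyfit(E_N, E_NO3, 1)
print(f"E(*NO3) = {slope:.2f} * E(*N) + {intercept:.2f}")

# A catalyst that breaks scaling (e.g., via subsurface tuning of *N only)
# would show a large residual from this line.
residuals = E_NO3 - (slope * E_N + intercept)
print("residuals (eV):", np.round(residuals, 2))
```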
246

Coupling Computationally Expensive Radiative Hydrodynamic Simulations with Machine Learning for Graded Inner Shell Design Optimization in Double Shell Capsules

Vazirani, Nomita Nirmal 29 December 2022 (has links)
High energy density experiments rely heavily on predictive physics simulations in the design process. Specifically, in inertial confinement fusion (ICF), predictive physics simulations, such as those of the radiation-hydrodynamics code xRAGE, are computationally expensive, limiting the design process and the ability to find an optimal design. Machine learning provides a mechanism to leverage expensive simulation data and alleviate limitations on computational time and resources in the search for an optimal design. Machine learning efficiently identifies regions of design space with high predicted performance, as well as regions with high uncertainty on which to focus simulations, which may lead to unexpected designs with great potential. This dissertation focuses on the application of Bayesian optimization to design optimization for ICF experiments conducted by the double shell campaign at Los Alamos National Lab (LANL). The double shell campaign is interested in implementing graded inner shell layers in its capsule geometry. Graded inner shell layers are expected to improve stability in the implosions, with fewer sharp density jumps, but at the cost of lower yields in comparison to the nominal bilayer inner shell targets. This work explores minimizing hydrodynamic instability and maximizing yield for the graded inner shell targets by building a multi-fidelity Bayesian optimization framework and coupling it with multi-dimensional xRAGE simulations for an improved design process. / Doctor of Philosophy / Inertial confinement fusion (ICF) is an active field of research in which a fuel is compressed to extreme temperatures and densities to achieve thermonuclear ignition. Ignition is achieved when the fuel can continuously heat itself and sustain its reactions. These fusion reactions would produce large amounts of energy. Power plants using fusion could solve many of the world's energy concerns with far less pollution than current energy sources. Although ignition has not been achieved in the lab, ICF researchers are actively working towards this goal. At Los Alamos National Lab (LANL), ICF researchers are focused on studying ignition-relevant conditions for "double shell" targets through experiments at laser facilities such as the National Ignition Facility (NIF). These experiments are extremely expensive to field, design, and analyze. To obtain the maximum information from each experiment, researchers rely on predictive physics simulations, which are computationally intensive, making it difficult to find optimal target designs. In this dissertation, better use of simulations is made by using machine learning alongside simulation data to find optimal target designs. Machine learning allows for efficient use of limited computational time and resources, such that an optimal target design can be found in a reasonable amount of time before an ICF experiment. This dissertation specifically looks at using Bayesian optimization for design optimization of LANL's double shell capsules with graded-material inner shells. Several Bayesian optimization frameworks are presented, along with a discussion of the optimal designs and the physics mechanisms that lead to high-performing capsule designs. The work from this dissertation will create an improved design process for the LANL double shell (and other) campaigns, providing high-fidelity optimization of ICF targets.
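The core Bayesian-optimization loop can be sketched as follows, with a cheap analytic function standing in for an xRAGE simulation and a Gaussian-process surrogate maximized via expected improvement; the one-dimensional design space and all numbers are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expensive_sim(x):
    # Placeholder objective (e.g., yield vs. one design parameter);
    # in practice this would be a radiation-hydrodynamics run.
    return -(x - 0.3) ** 2 + 0.05 * np.sin(20 * x)

X = np.array([[0.1], [0.9]])                 # two seed "simulations"
y = expensive_sim(X.ravel())
grid = np.linspace(0, 1, 200).reshape(-1, 1)  # candidate designs

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = grid[np.argmax(ei)]              # run the next "simulation" where EI peaks
    X = np.vstack([X, [x_next]])
    y = np.append(y, expensive_sim(x_next[0]))

print("best design:", X[np.argmax(y)].item(), "best yield:", y.max())
```

Expected improvement naturally trades off exploiting high-predicted-performance regions against exploring high-uncertainty regions, which is the behavior the dissertation leverages to spend scarce simulations efficiently.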
247

Developing machine learning tools to understand transcriptional regulation in plants

Song, Qi 09 September 2019 (has links)
Abiotic stresses constitute a major category of stresses that negatively impact plant growth and development. It is important to understand how plants cope with environmental stresses and reprogram gene responses, which in turn confers stress tolerance. Recent advances in genomic technologies have led to the generation of large amounts of genomic data for the model plant Arabidopsis. To understand gene responses activated by specific external stress signals, these large-scale data sets need to be analyzed to generate new insights into gene functions in stress responses. This poses new computational challenges in mining gene associations and reconstructing regulatory interactions from large-scale data sets. In this dissertation, several computational tools were developed to address these challenges. In Chapter 2, ConSReg was developed to infer condition-specific regulatory interactions and prioritize transcription factors (TFs) that are likely to play condition-specific regulatory roles. A comprehensive investigation was performed to optimize the performance of ConSReg, and a systematic recovery of nitrogen-response TFs was performed to evaluate it. In Chapter 3, CoReg was developed to infer co-regulation between genes, using only regulatory networks as input. CoReg was compared to other computational methods, and the results showed that CoReg outperformed them. CoReg was further applied to identify modules in a regulatory network generated from DAP-seq (DNA affinity purification sequencing). Using a large expression dataset generated under many abiotic stress treatments, many regulatory modules with common regulatory edges were found to be highly co-expressed, suggesting that target modules are structurally stable under abiotic stress conditions. In Chapter 4, exploratory analysis was performed to classify cell types in Arabidopsis root single-cell RNA-seq data. This is a first step towards the construction of a cell-type-specific regulatory network for Arabidopsis root cells, which is important for improving the current understanding of stress response. / Doctor of Philosophy / Abiotic stresses constitute a major category of stresses that negatively impact plant growth and development. It is important to understand how plants cope with environmental stresses and reprogram gene responses, which in turn confers stress tolerance to plants. Genomics technology has been used in the past decade to generate gene expression data under different abiotic stresses for the model plant Arabidopsis. Recent genomic technologies, such as DAP-seq, have generated large-scale regulatory maps that provide information about which genes have the potential to regulate other genes in the genome. However, this technology does not provide context-specific interactions: it is unknown which transcription factor can regulate which gene under a specific abiotic stress condition. To address this challenge, several computational tools were developed to identify regulatory interactions and co-regulating genes for stress response. In addition, using single-cell RNA-seq data generated from the model plant Arabidopsis, preliminary analysis was performed to build a model that classifies Arabidopsis root cell types. This analysis is the first step towards the ultimate goal of constructing a cell-type-specific regulatory network for Arabidopsis, which is important for improving the current understanding of stress response in plants.
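The co-regulation idea behind a tool like CoReg can be illustrated by scoring pairs of target genes by the overlap (Jaccard similarity) of their regulator sets in a TF-to-target network; the gene names and edges below are hypothetical, and this toy scoring is not the published CoReg algorithm.

```python
from itertools import combinations

# Hypothetical TF -> target edges, e.g., from a DAP-seq-derived network
edges = [("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneA"),
         ("TF2", "geneB"), ("TF3", "geneC"), ("TF1", "geneC")]

# Collect the set of regulators for each target gene
regulators = {}
for tf, target in edges:
    regulators.setdefault(target, set()).add(tf)

# Score each gene pair by regulator-set overlap (Jaccard similarity)
for g1, g2 in combinations(sorted(regulators), 2):
    shared = regulators[g1] & regulators[g2]
    union = regulators[g1] | regulators[g2]
    print(g1, g2, "co-regulation:", round(len(shared) / len(union), 2))
```

Genes with a high score here (geneA and geneB share both regulators) would fall into the same co-regulated module, which can then be checked against co-expression under stress treatments.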
248

Privacy Preservation for Cloud-Based Data Sharing and Data Analytics

Zheng, Yao 21 December 2016 (has links)
Data privacy is a globally recognized human right for individuals to control access to their personal information and bar negative consequences from the use of this information. As communication technologies progress, the means to protect data privacy must also evolve to address the new challenges that come into view. Our research goal in this dissertation is to develop privacy protection frameworks and techniques suitable for emerging cloud-based data services, in particular privacy-preserving algorithms and protocols for cloud-based data sharing and data analytics services. Cloud computing has enabled users to store, process, and communicate their personal information through third-party services. It has also raised privacy issues regarding loss of control over data, mass harvesting of information, and unconsented disclosure of personal content. Above all, the main concern is the lack of understanding about data privacy in cloud environments. Currently, cloud service providers either advocate the principle of the third-party doctrine and deny users' rights to protect their data stored in the cloud, or rely on the notice-and-choice framework and present users with ambiguous, incomprehensible privacy statements without any meaningful privacy guarantee. In this regard, our research makes three main contributions. First, to capture users' privacy expectations in cloud environments, we conceptually divide personal data into two categories, i.e., visible data and invisible data. Visible data refer to information users intentionally create, upload to, and share through the cloud; invisible data refer to users' information retained in the cloud that is aggregated, analyzed, and repurposed without their knowledge or understanding. Second, to address users' privacy concerns raised by cloud computing, we propose two privacy protection frameworks, namely individual control and use limitation. The individual control framework emphasizes users' capability to govern access to the visible data stored in the cloud. The use limitation framework emphasizes users' expectation to remain anonymous when the invisible data are aggregated and analyzed by cloud-based data services. Finally, we investigate various techniques to accommodate the new privacy protection frameworks in the context of four cloud-based data services: personal health record sharing, location-based proximity testing, link recommendation for social networks, and face tagging in photo management applications. For the first case, we develop a key-based protection technique to enforce fine-grained access control over users' digital health records. For the second case, we develop a key-less protection technique to achieve location-specific user selection. For the latter two cases, we develop distributed learning algorithms to prevent large-scale data harvesting. We further combine these algorithms with query regulation techniques to achieve user anonymity. The picture that emerges from the above works is a bleak one. Regarding personal data, the reality is that we can no longer control them all. As communication technologies evolve, the scope of personal data has expanded beyond local, discrete silos and become integrated into the Internet. The traditional understanding of privacy must be updated to reflect these changes. In addition, because privacy is a particularly nuanced problem that is governed by context, there is no one-size-fits-all solution. While some cases can be salvaged, either by cryptography or by other means, in others a rethinking of the trade-offs between utility and privacy appears to be necessary. / Ph. D.
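The key-based fine-grained access control idea can be sketched as follows: each record field is encrypted under its own key, so sharing a subset of keys grants access to only those fields. This sketch uses the Python cryptography package's Fernet primitive and is an illustration of the general idea, not the protocol developed in the dissertation.

```python
from cryptography.fernet import Fernet

# A toy health record with per-field encryption keys
record = {"blood_type": b"O+", "diagnosis": b"hypertension", "ssn": b"***"}
keys = {field: Fernet.generate_key() for field in record}
cipher = {field: Fernet(keys[field]).encrypt(value) for field, value in record.items()}

# Grant a researcher access to the diagnosis field only, by sharing one key
shared = {"diagnosis": keys["diagnosis"]}
for field, key in shared.items():
    print(field, "=", Fernet(key).decrypt(cipher[field]).decode())
# Without the other keys, blood_type and ssn remain unreadable ciphertext.
```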
249

VIP: Finding Important People in Images

Mathialagan, Clint Solomon 25 June 2015 (has links)
People preserve memories of events such as birthdays, weddings, or vacations by capturing photos, often depicting groups of people. Invariably, some individuals in the image are more important than others given the context of the event. This work analyzes the concept of the importance of individuals in group photographs. We address two specific questions: Given an image, who are the most important individuals in it? Given multiple images of a person, which image depicts the person in the most important role? We introduce a measure of importance of people in images and investigate the correlation between importance and visual saliency. We find not only that we can automatically predict the importance of people from purely visual cues, but also that incorporating this predicted importance results in significant improvement in applications such as im2text (generating sentences that describe images of groups of people). / Master of Science
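As a toy sketch of scoring importance from purely visual cues, one might combine features such as relative face size and centrality; the features and weights below are invented for illustration and are not the measure introduced in this work.

```python
import numpy as np

# Each person: (relative face area, normalized distance from image center)
people = {"A": (0.08, 0.10), "B": (0.03, 0.45), "C": (0.05, 0.30)}
w = np.array([10.0, -2.0])   # hypothetical weights: bigger and more central -> more important

scores = {p: float(w @ np.array(f)) for p, f in people.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print("most important:", ranked[0], scores)
```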
250

Detecting Bots using Stream-based System with Data Synthesis

Hu, Tianrui 28 May 2020 (has links)
Machine learning has shown great success in building security applications, including bot detection. However, many machine learning models are difficult to deploy since model training requires a continuous supply of representative labeled data, which is expensive and time-consuming to obtain in practice. In this thesis, we address this problem by building a bot detection system with a data synthesis method that enables detecting bots with limited data. We collected network traffic from three online services in three different months within a year (23 million network requests). We develop a novel stream-based feature encoding scheme that enables our model to perform real-time bot detection on anonymized network data. We propose a data synthesis method to synthesize unseen (or future) bot behavior distributions, enabling our system to detect bots with extremely limited labeled data. The synthesis method is distribution-aware, using two different generators in a Generative Adversarial Network to synthesize data for the clustered regions and the outlier regions of the feature space. We evaluate this idea and show that our method can train a model that outperforms existing methods with only 1% of the labeled data. We show that data synthesis also improves the model's sustainability over time and speeds up retraining. Finally, we compare data synthesis and adversarial retraining and show that they work complementarily to improve model generalizability. / Master of Science / An internet bot is computer-controlled software performing simple and automated tasks over the internet. Although some bots are legitimate, many are operated to perform malicious behaviors, causing severe security and privacy issues. To address this problem, machine learning (ML) models, which have shown great success in building security applications, are widely used in detecting bots since they can identify hidden patterns by learning from data. However, many ML-based approaches are difficult to deploy since model training requires labeled data, which are expensive and time-consuming to obtain in practice, especially for security tasks. Meanwhile, the dynamically changing nature of malicious bots means bot detection models need a continuous supply of representative labeled data to stay up to date, which makes bot detection more challenging. In this thesis, we build an ML-based bot detection system to detect advanced malicious bots in real time by processing network traffic data. We explore using a data synthesis method to detect bots with limited training data, addressing the problem of limited and unrepresentative labeled data. Our proposed data synthesis method synthesizes unseen (or future) bot behavior distributions, enabling our system to detect bots with extremely limited labeled data. We evaluate our approach using real-world datasets we collected and show that our model outperforms existing methods using only 1% of the labeled data. We show that data synthesis also improves the model's sustainability over time and makes it easier to keep the model up to date. Finally, we show that our method can work complementarily with adversarial retraining to improve model generalizability.
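A single GAN training step for synthesizing bot feature vectors can be sketched in PyTorch as follows; the thesis uses two distribution-aware generators (for clustered and outlier regions), but one generator is shown here for brevity, and the feature and noise sizes are assumptions.

```python
import torch
import torch.nn as nn

dim, z_dim = 16, 8                                   # feature / noise dimensions (assumed)
G = nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, dim))   # generator
D = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))       # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, dim)                          # stand-in for real bot feature vectors
fake = G(torch.randn(64, z_dim))                     # synthesized feature vectors

# Discriminator step: push real toward 1, fake toward 0
d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: make synthesized samples fool the discriminator
g_loss = bce(D(fake), torch.ones(64, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(f"d_loss={d_loss.item():.3f} g_loss={g_loss.item():.3f}")
```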
