141

A Strategy Oriented, Machine Learning Approach to Automatic Quality Assessment of Wikipedia Articles

De La Calzada, Gabriel 01 April 2009
This work discusses an approach to modeling and measuring the information quality of Wikipedia articles. The approach is based on the idea that the quality of Wikipedia articles with distinctly different profiles needs to be measured using different information quality models. To implement this approach, a software framework written in Java was developed to collect and analyze information about Wikipedia articles. We report on our initial study, which involved two categories of Wikipedia articles: "stabilized" (those whose content has not undergone major changes for a significant period of time) and "controversial" (those that have undergone vandalism, revert wars, or whose content is subject to internal discussions between Wikipedia editors). In addition, we present simple information quality models and compare their performance on a subset of Wikipedia articles against the information quality evaluations provided by human users. Our experiment shows that using special-purpose information quality models captures user sentiment about Wikipedia articles better than using a single model for both categories of articles.
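The central idea of this record — routing an article to a category-specific quality model rather than scoring all articles with one model — can be sketched as follows. The feature names, thresholds, and weights here are hypothetical stand-ins, not the thesis's actual models:

```python
# Illustrative sketch (not the thesis framework): route an article to a
# category-specific quality model instead of scoring every article alike.
# All feature names and weights are hypothetical.

def article_category(features):
    """Crude profile check: heavy recent churn suggests a controversial article."""
    return "controversial" if features["recent_edit_rate"] > 5.0 else "stabilized"

def quality_stabilized(features):
    # For stable articles, reward maturity: referencing and article length.
    return 0.6 * features["refs_per_section"] + 0.4 * features["length_kb"] / 10.0

def quality_controversial(features):
    # For contested articles, reward editor diversity and penalize reverts.
    return 0.7 * features["distinct_editors"] / 10.0 - 0.3 * features["revert_ratio"]

def assess(features):
    category = article_category(features)
    model = {"stabilized": quality_stabilized,
             "controversial": quality_controversial}[category]
    return category, model(features)
```

The point of the design is that the two scoring functions can weight entirely different evidence, which a single shared model cannot.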
142

Towards General Mental Health Biomarkers : Machine Learning Analysis of Multi-Disorder EEG Data

Talekar, Akshay 17 April 2023
Several studies have used EEG features to detect specific mental illnesses such as epilepsy or schizophrenia, as a supplement to the usual symptom-based diagnoses. At the same time, general mental health diagnostic tools (biomarker- or symptom-based) to identify individuals who are manifesting early signs of mental health disorders are not commonly available. This thesis explores the potential use of EEG features as a biomarker-based tool for general mental health diagnosis. Specifically, this study investigates the ability of machine learning models, using a general biomarker derived from EEG readings elicited by an oddball auditory experiment, to predict a person's mental health status (mentally ill or healthy). Given that mindfulness exercises are regularly provided as treatment for a wide range of mental illnesses, the features of interest seek to quantify mindfulness as a measure of mental health. The two feature sets developed and tested in this study were collected from a dataset of traumatic brain injury (TBI) patients and healthy controls. These feature sets were further tested on the Bipolar and Schizophrenia Network on Intermediate Phenotypes (BSNIP) dataset, containing multiple mental illnesses and healthy controls, to assess their generalizability. Feature Set 1 consisted of the average and variance of P300 and N200 ERP component peak amplitudes and latencies across the centro-parietal and fronto-central EEG channels, respectively. Feature Set 2 contains the average and variance of P300 and N200 ERP component mean amplitudes across the centro-parietal and fronto-central EEG channels, respectively. The predictive ability of these two feature sets was tested.
Logistic regression, support vector machine, decision tree, random forest, and KNN classification algorithms were used, and random forest and KNN were used in combination with oversampling, to predict the mental health status of the subjects (whether they were cases or healthy controls). Model performance was evaluated using accuracy, precision, sensitivity, specificity, F1 score, confusion matrices, and the AUC of the ROC curve. The results of this thesis show promise for the use of EEG features as biomarkers to diagnose mental illnesses or to better understand mental wellness. This technology opens doors for more accurate, biomarker-based diagnosis of mental health conditions, lowering the cost of mental health care and making it accessible to more people.
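All of the evaluation metrics listed in this abstract derive from the binary confusion matrix. A minimal sketch of how they relate (labels here use 1 = case, 0 = healthy control):

```python
# Minimal sketch of the evaluation metrics named above, all computed from a
# binary confusion matrix (1 = case, 0 = healthy control).

def confusion(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # a.k.a. recall
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "specificity": specificity, "f1": f1}
```

Reporting sensitivity and specificity separately matters here because, with oversampling in play, accuracy alone can hide poor performance on the minority class.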
143

Modeling Acute Care Utilization for Insomnia Patients

Zitong Zhu 30 August 2023
Machine learning (ML) models can help improve health care services. However, they need to be practical to gain wide adoption. This study proposes a methodology to evaluate the utility of different data modalities and cohort segmentation strategies when designing these models. The methodology is used to compare models that predict emergency department (ED) and inpatient hospital (IH) visits. The data modalities include socio-demographics, diagnoses, and medications; cohort segmentation is based on age group and disease severity. The proposed methodology is applied to models developed using a cohort of insomnia patients and a cohort of general non-insomnia patients under different data modalities and segmentation strategies. All models are evaluated using traditional intra-cohort testing. In addition, to establish the need for disease-specific segmentation, transfer testing is recommended, in which the same insomnia test patients used for intra-cohort testing are submitted to the general-patient model. The results indicate that using both diagnoses and medications as data sources does not generally improve model performance and may increase its overhead. For insomnia patients, the best ED and IH models, using both data modalities or either one of them, achieved an area under the receiver operating characteristic curve (AUC) of 0.71 and 0.78, respectively. Our results also show that an insomnia-specific model is not necessary when predicting future ED visits but may have merit when predicting IH visits. As such, we recommend the evaluation of disease-specific models using transfer testing. Based on these initial findings, a language model was pretrained using diagnosis codes; this model can be used to predict future ED and IH visits for insomnia and non-insomnia patients.
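The transfer-testing idea described in this abstract — scoring the same disease-specific test set with both the specific and the general model — can be sketched as follows. The models and patient records below are toy stand-ins, not the study's actual models:

```python
# Sketch of "transfer testing": the same insomnia test set is scored once by
# the insomnia-specific model and once by the general-patient model. If the
# specific model does not outperform the general one, disease-specific
# segmentation may be unnecessary. Models and records are toy stand-ins.

def evaluate_accuracy(model, patients):
    hits = sum(model(p["features"]) == p["outcome"] for p in patients)
    return hits / len(patients)

def transfer_test(specific_model, general_model, test_patients):
    intra = evaluate_accuracy(specific_model, test_patients)    # intra-cohort testing
    transfer = evaluate_accuracy(general_model, test_patients)  # transfer testing
    return {"intra_cohort": intra,
            "transfer": transfer,
            "specific_model_has_merit": intra > transfer}
```

A real evaluation would compare AUCs rather than raw accuracy, but the structure of the comparison is the same.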
144

Vision Approach for Position Estimation Using Moiré Patterns and Convolutional Neural Networks

Alotaibi, Nawaf 05 1900
In order for a robot to operate autonomously in an environment, it must be able to locate itself within it. A robot's position and orientation cannot be directly measured by physical sensors, so estimating them is a non-trivial problem. Some sensors provide this information, such as the Global Navigation Satellite System (GNSS) and motion capture (mo-cap). Nevertheless, these sensors are expensive to set up or are not useful in the environments where autonomous vehicles are often deployed. Our proposal explores a new sensing approach for relative motion and position estimation. It consists of a vision sensor and a marker that exploits the moiré phenomenon: a Convolutional Neural Network (CNN) is trained to estimate the position of the vision sensor from the pattern shown on the marker. We describe the data collection and network training process, along with the hyperparameter search method used to optimize the structure of the network. We test the trained network in a setup designed to evaluate its ability to estimate position. The system achieved an average absolute error of 1 cm, showcasing a method that could overcome the current limitations of vision approaches to pose estimation.
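The hyperparameter search mentioned above can take the form of a simple random search over network-structure choices. The search space and the scoring function below are stand-ins; a real run would train the CNN for each configuration and report its validation error:

```python
import random

# Sketch of a random hyperparameter search over network-structure choices.
# The search space and scoring function are hypothetical stand-ins; a real
# run would train the CNN and return its validation position error.

SPACE = {
    "conv_layers": [2, 3, 4],
    "filters": [16, 32, 64],
    "dense_units": [64, 128, 256],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def sample_config(rng):
    return {k: rng.choice(v) for k, v in SPACE.items()}

def random_search(score_fn, trials=20, seed=0):
    """Return the sampled configuration with the lowest error."""
    rng = random.Random(seed)
    best_cfg, best_err = None, float("inf")
    for _ in range(trials):
        cfg = sample_config(rng)
        err = score_fn(cfg)  # e.g. mean absolute position error in cm
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err
```

Random search is a common baseline before moving to more structured methods such as Bayesian optimization.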
145

Toward Designing Active ORR Catalysts via Interpretable and Explainable Machine Learning

Omidvar, Noushin 22 September 2022
The electrochemical oxygen reduction reaction (ORR) is a very important catalytic process that is directly used in carbon-free energy systems like fuel cells. However, the lack of active, stable, and cost-effective ORR cathode materials has been a major impediment to the broad adoption of these technologies. The challenge for researchers in catalysis is therefore to find catalysts that are electrochemically efficient enough to drive the reaction, made of earth-abundant elements to lower material costs and allow scalability, and stable enough to last. The majority of commercial catalysts now in use have been found through trial-and-error techniques that rely on the chemical intuition of experts. This method of empirical discovery is, however, very challenging, slow, and complicated, because the performance of a catalyst depends on a myriad of factors. Researchers have recently turned to machine learning (ML), together with emerging catalysis databases, to find and design heterogeneous catalysts faster. Many of the ML models used in the field to predict performance-relevant catalyst properties, such as adsorption energies of reaction intermediates, are black boxes. Because these black-box models are based on very complicated mathematical formulas, it is very hard to figure out how they work, and the underlying physics of the desired catalyst properties remains hidden. As a way to open up these black boxes and make them easier to understand, more attention is being paid to interpretable and explainable ML. This work aims to speed up the screening and optimization of Pt monolayer alloys for ORR while gaining physical insights. We use a theory-infused machine learning framework in combination with a high-throughput active screening approach to effectively find promising ORR Pt monolayer catalysts.
Furthermore, a game-theory-based explainability approach is employed to find electronic factors that control surface reactivity. The novel insights in this study can provide new design strategies that could shape the paradigm of catalyst discovery. / Doctor of Philosophy / The electrochemical oxygen reduction reaction (ORR) is a very important catalytic process that is used directly in carbon-free energy systems like fuel cells. But the lack of ORR cathode materials that are active, stable, and cheap has made it hard for these technologies to be widely used. Most commercially used catalysts have been found through trial-and-error methods that rely on the chemical intuition of experts. This way of finding catalysts through experience is hard, slow, and complicated, because catalyst performance depends on a variety of factors. Researchers are now using machine learning (ML) and new catalysis databases to find and design heterogeneous catalysts faster. But because black-box ML models are based on very complicated mathematical formulas, it is very hard to figure out how they work, and the physics behind the desired catalyst properties remains hidden. In recent years, more attention has been paid to ML that can be understood and explained as a way to decode these "black boxes". The goal of this work is to speed up the screening and optimization of Pt monolayer alloys for ORR. We find promising ORR Pt monolayer catalysts by using a machine learning framework that is grounded in theory together with a high-throughput active screening method. A game-theory approach is also used to find the electronic factors that control surface reactivity. The new insights in this study can lead to design strategies that could change how researchers discover catalysts.
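The game-theory attribution referenced in this abstract is, in spirit, a Shapley-value analysis: each electronic descriptor is credited with its average marginal contribution to the model's prediction. For a model with a handful of features this can be computed exactly. The toy "reactivity" model and descriptor names below are illustrative only, not the thesis's model:

```python
from itertools import permutations

# Exact Shapley-value attribution for a toy surface-reactivity model over
# three electronic descriptors. Game-theory explainability tools (e.g. SHAP)
# approximate this average; the model and feature names are illustrative.

def shapley_values(predict, features, baseline):
    """Average each feature's marginal contribution over all feature orderings."""
    names = list(features)
    totals = {n: 0.0 for n in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        prev = predict(current)
        for name in order:
            current[name] = features[name]   # switch this feature "on"
            now = predict(current)
            totals[name] += now - prev
            prev = now
    return {n: t / len(orders) for n, t in totals.items()}
```

A useful sanity check is the efficiency property: the Shapley values always sum to the difference between the prediction at the full feature vector and at the baseline.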
146

Accelerating Catalytic Materials Discovery for Sustainable Nitrogen Transformations by Interpretable Machine Learning

Pillai, Hemanth Somarajan 12 January 2023
Computational chemistry and machine learning approaches are combined to understand the mechanisms, derive activity trends, and ultimately search for active electrocatalysts for the electrochemical oxidation of ammonia (AOR) and nitrate reduction (NO3RR). Both reactions play vital roles within the nitrogen cycle and have important applications in tackling current environmental issues. Mechanisms are studied through the use of density functional theory (DFT) for AOR and NO3RR; subsequently, a descriptor-based approach is used to understand activity trends across a wide range of electrocatalysts. For AOR, interpretable machine learning is used in conjunction with active learning to screen for active and stable ternary electrocatalysts. We find that Pt3RuCo, Pt3RuNi, and Pt3RuFe show great activity, which is further validated via experimental results. By leveraging the advantages of the interpretable machine learning model, we elucidate the underlying electronic factors for the stronger *N binding that leads to the observed improved activity. For NO3RR, an interpretable machine learning model is used to understand ways to bypass the stringent limitations placed on electrocatalytic activity by the *N vs. *NO3 scaling relations. It is found that the *N binding energy can be tuned while leaving the *NO3 binding energy unaffected by ensuring that the subsurface atom interacts strongly with the *N. Based on this analysis, we suggest B2 CuPd as a potentially active electrocatalyst for this reaction, which is further validated by experiments. / Doctor of Philosophy / The chemical reactions that make up the nitrogen cycle have played a pivotal role in human society; consider that one of the most impactful achievements of the 20th century was the conversion of nitrogen (N2) to ammonia (NH3) via the Haber-Bosch process.
The key class of materials that facilitate such transformations is catalysts, which provide a reactive surface for the reaction to occur at reasonable rates. Using quantum chemistry, we can understand how various reactions proceed on a catalyst surface and how the catalyst can be designed to maximize the reaction rate. Specifically, we are interested here in the electrochemical oxidation of ammonia (AOR) and reduction of nitrate (NO3RR), which have important energy and environmental applications. The atomistic insight provided by quantum chemistry helps us understand the reaction mechanism and the key hurdles in developing new catalysts. Machine learning can then be leveraged in various ways to find novel catalysts. For AOR, machine learning finds novel active catalysts from a diverse design space, which are then experimentally tested and verified. Through the use of our machine learning algorithm (TinNet), we also provide new insights into why the catalysts are more active and suggest novel physics that can help design active catalysts. For NO3RR, we use machine learning as a tool to better understand the hurdles in catalyst design, which then guides our catalyst discovery. It is shown that CuPd could be a potential candidate, which is also verified via experimental synthesis and performance testing.
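The scaling relations that constrain NO3RR activity in this abstract are, concretely, linear correlations between adsorbate binding energies across surfaces. The limitation can be illustrated with a least-squares fit of *N against *NO3 binding energies; the numbers below are synthetic and chosen to lie on a line, not the thesis's DFT data:

```python
# Illustration of a linear scaling relation between two adsorbate binding
# energies: fit E(*N) = a * E(*NO3) + b by ordinary least squares. The
# energies below are synthetic; the point is the descriptor-based analysis.

def fit_scaling(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical *NO3 and *N binding energies (eV) on four alloy surfaces,
# perfectly correlated by construction: E(*N) = 2 * E(*NO3) - 0.4.
e_no3 = [-0.5, -0.3, 0.0, 0.2]
e_n = [2 * e - 0.4 for e in e_no3]

slope, intercept = fit_scaling(e_no3, e_n)
```

When such a fit is tight, the two binding energies cannot be tuned independently; breaking the correlation (e.g. via subsurface interactions, as the abstract describes) is what opens up new catalyst candidates.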
147

Coupling Computationally Expensive Radiative Hydrodynamic Simulations with Machine Learning for Graded Inner Shell Design Optimization in Double Shell Capsules

Vazirani, Nomita Nirmal 29 December 2022
High energy density experiments rely heavily on predictive physics simulations in the design process. Specifically in inertial confinement fusion (ICF), predictive physics simulations, such as in the radiation-hydrodynamics code xRAGE, are computationally expensive, limiting the design process and ability to find an optimal design. Machine learning provides a mechanism to leverage expensive simulation data and alleviate limitations on computational time and resources in the search for an optimal design. Machine learning efficiently identifies regions of design space with high predicted performance as well as regions with high uncertainty to focus simulations, which may lead to unexpected designs with great potential. This dissertation focuses on the application of Bayesian optimization to design optimization for ICF experiments conducted by the double shell campaign at Los Alamos National Lab (LANL). The double shell campaign is interested in implementing graded inner shell layers to their capsule geometry. Graded inner shell layers are expected to improve stability in the implosions with fewer sharp density jumps, but at the cost of lower yields, in comparison to the nominal bilayer inner shell targets. This work explores minimizing hydrodynamic instability and maximizing yield for the graded inner shell targets by building and coupling a multi-fidelity Bayesian optimization framework with multi-dimensional xRAGE simulations for an improved design process. / Doctor of Philosophy / Inertial confinement fusion (ICF) is an active field of research in which a fuel is compressed to extreme temperatures and densities to achieve thermonuclear ignition. Ignition is achieved when the fuel can continuously heat itself and sustain its reactions. These fusion reactions would produce large amounts of energy. Power plants using fusion could solve many of the world's energy concerns with far less pollution than current energy sources. 
Although ignition has not been achieved in the lab, ICF researchers are actively working towards this goal. At Los Alamos National Lab (LANL), ICF researchers are focused on studying ignition-relevant conditions for "double shell" targets through experiments at laser facilities, such as the National Ignition Facility (NIF). These experiments are extremely expensive to field, design, and analyze. To obtain the maximum information from each experiment, researchers rely on predictive physics simulations, which are computationally intensive, making it difficult to find optimal target designs. In this dissertation, better use is made of simulations by coupling machine learning with simulation data to find optimal target designs. Machine learning allows for efficient use of limited computational time and resources, such that an optimal target design can be found in a reasonable amount of time before an ICF experiment. This dissertation specifically looks at using Bayesian optimization for design optimization of LANL's double shell capsules with graded material inner shells. Several Bayesian optimization frameworks are presented, along with a discussion of optimal designs and the physics mechanisms that lead to high-performing capsule designs. The work from this dissertation will create an improved design process for the LANL double shell (and other) campaigns, providing high-fidelity optimization of ICF targets.
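The Bayesian optimization loop this record builds around expensive simulations can be sketched schematically. Here a cheap distance-weighted surrogate stands in for the Gaussian process, and the "expensive function" stands in for a simulation run; everything below is a deliberate simplification, not the dissertation's framework:

```python
import math
import random

# Schematic Bayesian-optimization loop: a cheap surrogate stands in for an
# expensive simulation, and an upper-confidence-bound rule picks the next
# design, trading off predicted performance against uncertainty. The
# distance-weighted surrogate is a simplification of a Gaussian process.

def surrogate(x, observed):
    """Predict a mean from observed (x, y) pairs, plus a crude uncertainty."""
    weights = [math.exp(-abs(x - xo)) for xo, _ in observed]
    mean = sum(w * yo for w, (_, yo) in zip(weights, observed)) / sum(weights)
    uncertainty = min(abs(x - xo) for xo, _ in observed)  # far from data = unsure
    return mean, uncertainty

def bayes_opt(expensive_fn, bounds, n_init=3, n_iter=10, kappa=1.0, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    observed = [(x, expensive_fn(x))
                for x in (rng.uniform(lo, hi) for _ in range(n_init))]
    for _ in range(n_iter):
        candidates = [rng.uniform(lo, hi) for _ in range(200)]

        def ucb(x):
            mean, unc = surrogate(x, observed)
            return mean + kappa * unc  # favor high prediction AND unexplored regions

        x_next = max(candidates, key=ucb)
        observed.append((x_next, expensive_fn(x_next)))
    return max(observed, key=lambda p: p[1])
```

The `kappa` knob is the exploration/exploitation trade-off the abstract alludes to: large values chase high-uncertainty regions, which is how unexpected designs with great potential get found.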
148

Developing machine learning tools to understand transcriptional regulation in plants

Song, Qi 09 September 2019
Abiotic stresses constitute a major category of stresses that negatively impact plant growth and development. It is important to understand how plants cope with environmental stresses and reprogram gene responses, which in turn confers stress tolerance. Recent advances in genomic technologies have generated much genomic data for the model plant Arabidopsis. To understand gene responses activated by specific external stress signals, these large-scale data sets need to be analyzed to generate new insights into gene function in stress responses. This poses new computational challenges in mining gene associations and reconstructing regulatory interactions from large-scale data sets. In this dissertation, several computational tools were developed to address these challenges. In Chapter 2, ConSReg was developed to infer condition-specific regulatory interactions and prioritize transcription factors (TFs) that are likely to play condition-specific regulatory roles. A comprehensive investigation was performed to optimize the performance of ConSReg, and a systematic recovery of nitrogen-response TFs was performed to evaluate it. In Chapter 3, CoReg was developed to infer co-regulation between genes using only regulatory networks as input. CoReg was compared to other computational methods, and the results showed that CoReg outperformed them. CoReg was further applied to identify modules in a regulatory network generated from DAP-seq (DNA affinity purification sequencing). Using a large expression dataset generated under many abiotic stress treatments, many regulatory modules with common regulatory edges were found to be highly co-expressed, suggesting that target modules are structurally stable under abiotic stress conditions. In Chapter 4, an exploratory analysis was performed to classify cell types in Arabidopsis root single-cell RNA-seq data.
This is a first step towards the construction of a cell-type-specific regulatory network for Arabidopsis root cells, which is important for improving the current understanding of stress response. / Doctor of Philosophy / Abiotic stresses constitute a major category of stresses that negatively impact plant growth and development. It is important to understand how plants cope with environmental stresses and reprogram gene responses, which in turn confers stress tolerance to plants. Genomics technology has been used in the past decade to generate gene expression data under different abiotic stresses for the model plant Arabidopsis. Recent genomic technologies, such as DAP-seq, have generated large-scale regulatory maps that indicate which genes have the potential to regulate other genes in the genome. However, this technology does not provide context-specific interactions: it is unknown which transcription factor can regulate which gene under a specific abiotic stress condition. To address this challenge, several computational tools were developed to identify regulatory interactions and co-regulating genes in stress response. In addition, using single-cell RNA-seq data generated from the model plant Arabidopsis, a preliminary analysis was performed to build a model that classifies Arabidopsis root cell types. This analysis is the first step towards the ultimate goal of constructing a cell-type-specific regulatory network for Arabidopsis, which is important for improving the current understanding of stress response in plants.
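Inferring co-regulation from network structure alone, as CoReg does, can be sketched in its simplest form: score each pair of transcription factors by the overlap of their target-gene sets. CoReg's actual algorithm is more sophisticated; the network below is a toy example:

```python
# Minimal sketch of inferring co-regulation from a regulatory network alone:
# score each pair of transcription factors by the Jaccard overlap of their
# target-gene sets. CoReg's actual algorithm is more involved; this network
# is a toy example.

def jaccard(a, b):
    return len(a & b) / len(a | b)

def coregulation_scores(network):
    """network: dict mapping each TF name to its set of target genes."""
    tfs = sorted(network)
    return {(t1, t2): jaccard(network[t1], network[t2])
            for i, t1 in enumerate(tfs)
            for t2 in tfs[i + 1:]}
```

Pairs with high overlap are candidate co-regulators; checking whether such modules are also highly co-expressed under stress treatments is the kind of validation the abstract describes.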
149

Privacy Preservation for Cloud-Based Data Sharing and Data Analytics

Zheng, Yao 21 December 2016
Data privacy is a globally recognized human right: the right of individuals to control access to their personal information and to bar negative consequences from its use. As communication technologies progress, the means to protect data privacy must also evolve to address new challenges as they come into view. Our research goal in this dissertation is to develop privacy protection frameworks and techniques suitable for emerging cloud-based data services, in particular privacy-preserving algorithms and protocols for cloud-based data sharing and data analytics services. Cloud computing has enabled users to store, process, and communicate their personal information through third-party services. It has also raised privacy issues regarding loss of control over data, mass harvesting of information, and unconsented disclosure of personal content. Above all, the main concern is the lack of understanding about data privacy in cloud environments. Currently, cloud service providers either advocate the third-party doctrine and deny users' rights to protect their data stored in the cloud, or rely on the notice-and-choice framework and present users with ambiguous, incomprehensible privacy statements without any meaningful privacy guarantee. In this regard, our research makes three main contributions. First, to capture users' privacy expectations in cloud environments, we conceptually divide personal data into two categories: visible data and invisible data. Visible data refer to information users intentionally create, upload to, and share through the cloud; invisible data refer to users' information retained in the cloud that is aggregated, analyzed, and repurposed without their knowledge or understanding. Second, to address users' privacy concerns raised by cloud computing, we propose two privacy protection frameworks, namely individual control and use limitation.
The individual control framework emphasizes users' capability to govern access to the visible data stored in the cloud. The use limitation framework emphasizes users' expectation to remain anonymous when the invisible data are aggregated and analyzed by cloud-based data services. Finally, we investigate various techniques to support these new privacy protection frameworks in the context of four cloud-based data services: personal health record sharing, location-based proximity testing, link recommendation for social networks, and face tagging in photo management applications. For the first case, we develop a key-based protection technique to enforce fine-grained access control over users' digital health records. For the second case, we develop a key-less protection technique to achieve location-specific user selection. For the latter two cases, we develop distributed learning algorithms to prevent large-scale data harvesting, and we further combine these algorithms with query regulation techniques to achieve user anonymity. The picture that emerges from the above work is a bleak one. Regarding personal data, the reality is that we can no longer control them all. As communication technologies evolve, the scope of personal data has expanded beyond local, discrete silos and become integrated into the Internet. The traditional understanding of privacy must be updated to reflect these changes. In addition, because privacy is a particularly nuanced problem governed by context, there is no one-size-fits-all solution. While some cases can be salvaged by cryptography or other means, in others a rethinking of the trade-offs between utility and privacy appears to be necessary. / Ph. D.
150

VIP: Finding Important People in Images

Mathialagan, Clint Solomon 25 June 2015
People preserve memories of events such as birthdays, weddings, or vacations by capturing photos, often depicting groups of people. Invariably, some individuals in the image are more important than others given the context of the event. This work analyzes the concept of the importance of individuals in group photographs. We address two specific questions: Given an image, who are the most important individuals in it? Given multiple images of a person, which image depicts the person in the most important role? We introduce a measure of the importance of people in images and investigate the correlation between importance and visual saliency. We find not only that we can automatically predict the importance of people from purely visual cues, but also that incorporating this predicted importance yields significant improvements in applications such as im2text (generating sentences that describe images of groups of people). / Master of Science
