251 |
Lifetime and Degradation Science of Polymeric Encapsulant in Photovoltaic Systems: Investigating the Role of Ethylene Vinyl Acetate in Photovoltaic Module Performance Loss with Semi-gSEM Analytics
Wheeler, Nicholas Robert 08 February 2017 (has links)
No description available.
|
252 |
Temporal Change in the Power Production of Real-world Photovoltaic Systems Under Diverse Climatic Conditions
Hu, Yang 08 February 2017 (has links)
No description available.
|
253 |
Node Centric Community Detection and Evolutional Prediction in Dynamic Networks
Oluwafolake A Ayano (13161288) 27 July 2022 (has links)
<p>Advances in technology have made data available from many platforms, such as the web and social media. Much of this data can be represented as a network consisting of a set of nodes connected by edges, where the nodes represent the items in the network and the edges represent the interactions between them. Community detection methods have been used extensively to analyze these networks. However, community detection in evolving networks remains a significant challenge because of the frequent changes to the networks and the need for real-time analysis. Static community detection methods are not appropriate for analyzing dynamic networks because they do not retain a network's history and cannot provide real-time information about the communities in the network.</p>
<p>Existing incremental methods treat changes to the network as a sequence of edge additions and/or removals; however, in many real-world networks, changes occur when a node is added with all its edges connecting simultaneously. </p>
<p>To process such large networks in a timely manner, an adaptive analytical method is needed: one that can update its results after the network evolves without recomputing the entire network, and that treats all the edges arriving with a node equally. </p>
<p>We propose a node-centric community detection method that incrementally updates the community structure using the already known structure of the network, avoiding recomputation of the entire network from scratch and consequently achieving a high-quality community structure. The results from our experiments suggest that our approach is efficient for incremental community detection in node-centric evolving networks. </p>
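As a rough sketch of the node-centric idea (hypothetical code, not the author's actual algorithm), a node arriving with its full edge set can be placed by a local vote over its neighbors' existing community labels, so no global recomputation is needed:

```python
from collections import Counter

def add_node(adj, community, new_node, edges):
    """Incrementally place new_node, which arrives with its full edge set,
    using only the already-known community labels of its neighbors."""
    adj[new_node] = set()
    for nbr in edges:
        adj[new_node].add(nbr)
        adj.setdefault(nbr, set()).add(new_node)
    # Majority vote over neighboring communities: constant local work,
    # no re-run of a static method over the whole graph.
    votes = Counter(community[nbr] for nbr in edges if nbr in community)
    if votes:
        community[new_node] = votes.most_common(1)[0][0]
    else:
        # An isolated node starts its own community.
        community[new_node] = max(community.values(), default=-1) + 1
    return community[new_node]
```

A full method would also re-examine the labels in the new node's neighborhood after insertion; this sketch shows only the placement step.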
|
254 |
DATA INTEGRITY IN THE HEALTHCARE INDUSTRY: ANALYZING THE EFFECTIVENESS OF DATA SECURITY IN GOOD DATA AND RECORD MANAGEMENT PRACTICES (A CASE STUDY OF COMPUTERIZING THE COMPETENCE MATRIX FOR A QUALITY CONTROL DRUG LABORATORY)
Marcel C Okezue (12522565) 06 October 2022 (has links)
<p>This project analyzes time efficiency in the data management process associated with personnel training and competence assessments in the quality control (QC) laboratory of Nigeria's food and drug authority (NAFDAC). Because personnel training data is managed manually, the laboratory administrators are burdened with extensive mental and paper-based record keeping, and the personnel training and competence assessments are not conducted efficiently. The Microsoft Excel spreadsheet provided by an earlier Purdue doctoral dissertation as a remedy proved deficient in handling operations on database tables; as a result, that dissertation did not adequately address the inefficiencies.</p>
<p>The problem addressed by this study is the operational inefficiency that results from the manual or Excel-based personnel training data management process in the NAFDAC laboratory. The purpose, therefore, is to reduce the time it takes to generate, obtain, manipulate, exchange, and securely store the personnel competence training and assessment data. To do this, the study developed a software system integrated with a relational database management system (RDBMS) to improve on the manual/Microsoft Excel-based data management procedure. To ascertain the new system's validity, the project compares the operational (time) efficiency of the manual and Excel-based formats with that of the newly developed system.</p>
<p>The data used in this qualitative research comes from literary sources and from simulating the difference between the times spent administering personnel training and competence assessment using the new system developed by this study and the Excel system from the earlier project, respectively. The fundamental finding of this study is that the idea of improving the operational (time) efficiency of the personnel training and competence assessment process in the QC laboratory is valid: doing so will reduce human errors, achieve more time-efficient operation, and improve the personnel training and competence assessment processes.</p>
<p>Recommendations are made as to the procedure the laboratory administrator should adopt to take advantage of the new system. The study also recommends steps by which future research could extend the capability of this project. </p>
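As an illustration of the kind of relational structure such a system relies on (the table and column names here are hypothetical, not those of the actual NAFDAC system), Python's built-in sqlite3 can express the personnel/assessment relationship that a flat Excel sheet handles poorly:

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the deployed system's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE personnel (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE assessment (
        id INTEGER PRIMARY KEY,
        personnel_id INTEGER NOT NULL REFERENCES personnel(id),
        competence TEXT NOT NULL,
        score REAL,
        assessed_on TEXT
    );
""")
conn.execute("INSERT INTO personnel (id, name) VALUES (1, 'Analyst A')")
conn.execute(
    "INSERT INTO assessment (personnel_id, competence, score, assessed_on) "
    "VALUES (1, 'HPLC operation', 92.5, '2022-06-01')"
)

# One join replaces manually cross-checking two spreadsheets.
row = conn.execute(
    "SELECT p.name, a.competence, a.score FROM assessment a "
    "JOIN personnel p ON p.id = a.personnel_id"
).fetchone()
```

Storing the competence matrix in normalized tables is what makes the time savings measurable: lookups and reports become single queries instead of manual record collation.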
|
255 |
Uncertainty-aware deep learning for prediction of remaining useful life of mechanical systems
Cornelius, Samuel J 10 December 2021 (has links)
Remaining useful life (RUL) prediction is a problem that researchers in the prognostics and health management (PHM) community have been studying for decades. Both physics-based and data-driven methods have been investigated, and in recent years, deep learning has gained significant attention. When sufficiently large and diverse datasets are available, deep neural networks can achieve state-of-the-art performance in RUL prediction for a variety of systems. However, for end users to trust the results of these models, especially as they are integrated into safety-critical systems, RUL prediction uncertainty must be captured. This work explores an approach for estimating both the epistemic and heteroscedastic aleatoric uncertainties that emerge in RUL prediction deep neural networks, and demonstrates that quantifying the overall impact of these uncertainties on predictions reveals valuable insight into model performance. Additionally, a study is carried out to observe the effects of RUL truth data augmentation on perceived uncertainties in the model.
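One common recipe for combining the two kinds of uncertainty, assuming Monte Carlo sampling (e.g. dropout) for the epistemic part and a mean/log-variance output head for the heteroscedastic aleatoric part (the abstract does not commit to a specific mechanism), is the law of total variance:

```python
import numpy as np

def combine_uncertainties(mus, log_vars):
    """mus, log_vars: shape (T, N) arrays from T stochastic forward passes
    of a network whose head predicts an RUL mean and a log-variance
    per sample."""
    mus = np.asarray(mus, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    mean_rul = mus.mean(axis=0)
    epistemic = mus.var(axis=0)                # spread across passes
    aleatoric = np.exp(log_vars).mean(axis=0)  # predicted noise variance
    total = epistemic + aleatoric              # law of total variance
    return mean_rul, epistemic, aleatoric, total
```

The total variance is what an end user of a safety-critical system would see as the error bar on a predicted RUL.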
|
256 |
Causal Inference in the Face of Assumption Violations
Yuki Ohnishi (18423810) 26 April 2024 (has links)
<p dir="ltr">This dissertation advances the field of causal inference by developing methodologies for settings where its core assumptions are violated. Traditional causal inference methodologies hinge on a core set of assumptions, which are often violated in the complex landscape of modern experiments and observational studies. This dissertation proposes novel methodologies designed to address the challenges posed by single or multiple assumption violations. By applying these innovative approaches to real-world datasets, this research uncovers valuable insights that were previously inaccessible with existing methods. </p><p><br></p><p dir="ltr">First, three increasingly important sources of complication in causal inference are interference among individuals, nonadherence of individuals to their assigned treatments, and unintended missing outcomes. Interference exists if the outcome of an individual depends not only on their assigned treatment but also on the treatments assigned to other individuals. It commonly arises when limited controls are placed on the interactions of individuals with one another during the course of an experiment. Treatment nonadherence frequently occurs in human-subject experiments, as it can be unethical to force an individual to take their assigned treatment; clinical trials, in particular, typically have subjects that do not adhere to their assigned treatments due to adverse side effects or intercurrent events. Missing values also commonly occur in clinical studies; for example, some patients may drop out of the study due to the side effects of the treatment. Failing to account for these considerations will generally yield unstable and biased inferences on treatment effects even in randomized experiments, and existing methodologies cannot address all these challenges simultaneously. We propose a novel Bayesian methodology to fill this gap. 
</p><p><br></p><p dir="ltr">My subsequent research further addresses one of the limitations of the first project: a set of assumptions about interference structures that may be too restrictive in some practical settings. We introduce the concept of the "degree of interference" (DoI), a latent variable capturing the interference structure. This concept allows for handling arbitrary, unknown interference structures to facilitate inference on causal estimands. </p><p><br></p><p dir="ltr">While randomized experiments offer a solid foundation for valid causal analysis, there is also strong interest in conducting causal inference with observational data, given the cost and difficulty of randomized experiments and the wide availability of observational data. Nonetheless, using observational data to infer causality requires additional assumptions. A central assumption is that of <i>ignorability</i>, which posits that the treatment is randomly assigned based on the variables (covariates) included in the dataset. While crucial, this assumption is often debatable, especially when treatments are assigned sequentially to optimize future outcomes. For instance, marketers typically adjust subsequent promotions based on responses to earlier ones and speculate on how customers might have reacted to alternative past promotions. This speculative behavior introduces latent confounders, which must be carefully addressed to prevent biased conclusions. </p><p dir="ltr">In the third project, we investigate these issues by studying sequences of promotional emails sent by a US retailer. We develop a novel Bayesian approach for causal inference from longitudinal observational data that accommodates noncompliance and latent sequential confounding. </p><p><br></p><p dir="ltr">Finally, we formulate the causal inference problem for privatized data. 
In the era of digital expansion, the secure handling of sensitive data poses an intricate challenge that significantly influences research, policy-making, and technological innovation. As the collection of sensitive data becomes more widespread across academic, governmental, and corporate sectors, balancing data accessibility against the protection of private information requires sophisticated methods for analysis and reporting that include stringent privacy protections. Currently, the gold standard for maintaining this balance is differential privacy. </p><p dir="ltr">Local differential privacy is a differential privacy paradigm in which individuals first apply a privacy mechanism to their data (often by adding noise) before transmitting the result to a curator. The noise added for privacy introduces additional bias and variance into analyses, so it is of great importance for analysts to incorporate the privacy noise into valid inference.</p><p dir="ltr">In this final project, we develop methodologies to infer causal effects from locally privatized data under randomized experiments. We present frequentist and Bayesian approaches and discuss the statistical properties of the estimators, such as consistency and optimality, under various privacy scenarios.</p>
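A minimal sketch of the locally private randomized-experiment setting (assuming a Laplace mechanism and a simple difference-in-means estimator; the dissertation's estimators are more sophisticated): because the Laplace noise each individual adds is mean-zero, the difference-in-means on privatized outcomes stays unbiased, but its variance inflates with the privacy level.

```python
import numpy as np

def privatize(y, epsilon, sensitivity=1.0, rng=None):
    """Local Laplace mechanism: each unit adds noise before reporting."""
    rng = np.random.default_rng(rng)
    return y + rng.laplace(scale=sensitivity / epsilon, size=len(y))

def ate_estimate(y_priv, z):
    """Difference-in-means on privatized outcomes. Laplace noise is
    mean-zero, so the estimator remains unbiased; its variance grows by
    2 * (sensitivity / epsilon)**2 per arm."""
    z = np.asarray(z, dtype=bool)
    return y_priv[z].mean() - y_priv[~z].mean()
```

Smaller epsilon (stronger privacy) means wider Laplace noise and thus noisier, though still consistent, effect estimates.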
|
257 |
Three Essays on Analysis of U.S. Infant Mortality Using Systems and Data Science Approaches
Ebrahimvandi, Alireza 02 January 2020 (has links)
High infant mortality (IM) rates in the U.S. have been a major public health concern for decades. Many studies have focused on understanding causes, risk factors, and interventions that can reduce IM. However, death of an infant is the result of the interplay between many risk factors, which in some cases can be traced to the infancy of their parents. Consequently, these complex interactions challenge the effectiveness of many interventions. The long-term goal of this study is to advance the common understanding of effective interventions for improving health outcomes and, in particular, infant mortality. To achieve this goal, I implemented systems and data science methods in three essays to contribute to the understanding of IM causes and risk factors.
In the first study, the goal was to identify patterns in the leading causes of infant mortality across states that successfully reduced their IM rates. I explore state-level trends between 2000 and 2015 to identify patterns in the leading causes of IM. This study shows that the main driver of IM rate reduction is the preterm-related mortality rate. The second study builds on these findings and investigates the risk factors of preterm birth (PTB) in the largest obstetric population ever studied in this field. Applying the latest statistical and machine learning techniques, I study the PTB risk factors that are both generalizable and identifiable during the early stages of pregnancy. A major finding of this study is that socioeconomic factors such as parent education are more important than commonly cited factors such as race in the prediction of PTB. This finding provides significant evidence for theories like the life course perspective, which postulate that the main determinants of a health trajectory are the social scaffolding that addresses the upstream roots of health. These results point to the need for more comprehensive approaches that shift the focus from medical interventions during pregnancy to the times when mothers become vulnerable to the risk factors of PTB. Therefore, in the third study, I take an aggregate approach to study the dynamics of population health that result in undesirable outcomes in major indicators like infant mortality. Based on these new explanations, I offer a systematic approach that can help address adverse birth outcomes, including high infant mortality and preterm birth rates, which is the central contribution of this dissertation.
In conclusion, this dissertation contributes to a better understanding of the complexities in infant mortality and health-related policies. This work contributes to the body of literature both in the application of statistical and machine learning techniques and in advancing health-related theories. / Doctor of Philosophy / The U.S. infant mortality rate (IMR) is 71% higher than the average rate for comparable countries in the Organization for Economic Co-operation and Development (OECD). High infant mortality and preterm birth rates (PBR) are major public health concerns in the U.S. A wide range of studies have focused on understanding the causes and risk factors of infant mortality and the interventions that can reduce it. However, infant mortality is a complex phenomenon that challenges the effectiveness of interventions, and the IMR and PBR in the U.S. are still higher than in any other advanced OECD nation. I believe that systems and data science methods can enhance our understanding of infant mortality causes, risk factors, and effective interventions.
There are more than 130 diagnosed causes of infant mortality, so tracking the trends in causes of infant mortality across 50 states over a long period is very challenging. In the first essay, I focus on the medical aspects of infant mortality to find the causes that helped reduce infant mortality rates in certain states from 2000 to 2015. In addition, I examine the relationship between different risk factors and infant mortality in a regression model to find significant correlations. This study provides critical recommendations to policymakers in states with high infant mortality rates and guides them in leveraging appropriate interventions.
Preterm birth (PTB) is the most significant contributor to the IMR. The first study showed that reductions in infant mortality happened in states that reduced their preterm births. There exists a considerable body of literature on identifying PTB risk factors in order to explain the consistently high rates of PTB and IMR in the U.S.; however, it has fallen short in two key areas: generalizability and the ability to detect PTB risk in early pregnancy. In the second essay, I investigate a wide range of risk factors in the largest obstetric population ever studied in PTB research. The predictors in this study span environmental (e.g., air pollution) to medical (e.g., history of hypertension) factors. Our objective is to increase the understanding of factors that are both generalizable and identifiable during the early stage of pregnancy. I implemented state-of-the-art statistical and machine learning techniques and improved the performance measures compared to previous studies. The results reveal the importance of socioeconomic factors such as parent education, which can be as important as biomedical indicators like the mother's body mass index in predicting preterm delivery.
The second study showed an important relationship between socioeconomic factors, such as education, and major health outcomes, such as preterm birth. Short-term interventions that focus on improving the socioeconomic status of a mother during pregnancy have limited to no effect on birth outcomes. Therefore, we need more comprehensive approaches that shift the focus from medical interventions during pregnancy to the times when mothers become vulnerable to the risk factors of PTB. Hence, in the third study I use a systems approach to explore the dynamics of health over time. This is a novel study that enhances our understanding of the complex interactions between health and socioeconomic factors over time. I explore why some communities experience a downward spiral of health deterioration, how resources are generated and allocated, how the generation and allocation mechanisms are interconnected, and why we see significantly different health outcomes across otherwise similar states. I use Ohio as the case study because it suffers from poor health outcomes despite having one of the best healthcare systems in the nation. The results identify the health-expenditure trap and show how an external financial shock can exacerbate health and socioeconomic conditions in such a community. I demonstrate how overspending or underspending in healthcare can affect a society's health outcomes in the long term.
Overall, this dissertation contributes to a better understanding of the complexities associated with major health issues in the U.S. I provide health professionals with theoretical and empirical foundations of risk assessment for reducing infant mortality and preterm birth. In addition, this study provides a systemic perspective on the health deterioration that many communities in the U.S. are experiencing, and I hope that this perspective improves policymakers' decision-making.
|
258 |
Deep Learning One-Class Classification With Support Vector Methods
Hampton, Hayden D 01 January 2024 (links) (PDF)
Through the specialized lens of one-class classification, anomalies (irregular observations that uncharacteristically diverge from normative data patterns) are comprehensively studied. This dissertation focuses on advancing boundary-based methods in one-class classification, a critical approach to anomaly detection. These methodologies delineate optimal decision boundaries, thereby facilitating a distinct separation between normal and anomalous observations. Encompassing traditional approaches such as the One-Class Support Vector Machine and Support Vector Data Description, recent adaptations in deep learning offer rich ground for innovation in anomaly detection. This dissertation proposes three novel deep learning methods for one-class classification, aiming to enhance the efficacy and accuracy of anomaly detection in an era where data volume and complexity present unprecedented challenges. The first two methods are designed for tabular data and take a least squares perspective. Formulating these optimization problems within a least squares framework offers notable advantages: it facilitates the derivation of closed-form solutions for critical gradients that largely drive the optimization procedure, and it circumvents the prevalent issue of degenerate or uninformative solutions, a challenge often associated with these types of deep learning algorithms. The third method is designed for second-order tensors; it has certain computational advantages and alleviates the need for vectorization, which can cause loss of structural information when spatial or contextual relationships exist in the data. The performance of the three proposed methods is demonstrated with simulation studies and real-world datasets. Compared to kernel-based one-class classification methods, the proposed deep learning methods achieve significantly better performance under the settings considered.
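Stripped of the learned network, the boundary idea behind SVDD-style one-class methods can be sketched in a few lines (a toy with an identity embedding; the proposed methods instead learn the embedding under a least squares objective): fit a hypersphere center to normal data, score points by distance to it, and draw the decision boundary at a quantile of the training scores.

```python
import numpy as np

def fit_center(X):
    """SVDD-style center: the mean of the (embedded) normal training data."""
    return np.asarray(X, dtype=float).mean(axis=0)

def anomaly_scores(X, c):
    """Squared distance to the center; larger means more anomalous."""
    X = np.asarray(X, dtype=float)
    return ((X - c) ** 2).sum(axis=1)

def threshold(scores, q=0.95):
    """Decision boundary: a high quantile of the normal training scores."""
    return np.quantile(scores, q)
```

A deep variant replaces the identity embedding with a network trained so that normal data maps close to the center, which is exactly where degenerate (collapsed) solutions become a risk.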
|
259 |
[en] ON THE INTERACTION BETWEEN SOFTWARE ENGINEERS AND DATA SCIENTISTS WHEN BUILDING MACHINE LEARNING-ENABLED SYSTEMS / [pt] SOBRE A INTERAÇÃO ENTRE ENGENHEIROS DE SOFTWARE E CIENTISTAS DE DADOS CONSTRUINDO SISTEMAS HABILITADOS POR APRENDIZADO DE MÁQUINA
GABRIEL DE ANDRADE BUSQUIM 18 June 2024 (links)
In recent years, Machine Learning (ML) components have been increasingly integrated into the core systems of organizations. Engineering such systems presents various challenges from both a theoretical and practical perspective. One of the key challenges is the effective interaction between actors with different backgrounds who need to work closely together, such as software engineers and data scientists. This work presents three studies investigating the current interaction and collaboration dynamics between these two roles in ML projects. Our first study is an exploratory case study with four practitioners experienced in software engineering and data science from a large ML-enabled system project. In our second study, we performed complementary interviews with members of two teams working on ML-enabled systems to acquire more insights into how data scientists and software engineers share responsibilities and communicate. Finally, our third study consists of a focus group in which we validated the relevance of this collaboration during multiple tasks related to ML-enabled systems and assessed recommendations that can foster the interaction between the actors. Our studies revealed several challenges that can hinder collaboration between software engineers and data scientists, including differences in technical expertise, unclear definitions of each role's duties, and the lack of documents that support the specification of the ML-enabled system. Potential solutions to address these challenges include encouraging team communication, clearly defining responsibilities, and producing concise system documentation. Our research contributes to understanding the complex dynamics between software engineers and data scientists in ML projects and provides insights for improving collaboration and communication in this context. We encourage future studies investigating this interaction in other projects.
|
260 |
Using machine learning to identify the occurrence of changing air masses
Bergfors, Anund January 2018 (links)
In the forecast data post-processing at the Swedish Meteorological and Hydrological Institute (SMHI), a regular Kalman filter is used to debias the two-meter air temperature forecast of the physical models by controlling it towards air temperature observations. The Kalman filter, however, diverges when encountering greater nonlinearities in shifting weather patterns, and can only be manually reset once a new air mass has stabilized within its operating region. This project aimed to automate that reset with a machine learning approach. The methodology was at its base supervised learning: first algorithmically labelling the air-mass shift occurrences in the data, then training a logistic regression model. Observational data from the latest twenty years of the Uppsala automatic meteorological station was used for the analysis. A simple pipeline for loading, labelling, training on, and visualizing the data was built. As a work in progress, the operating regime was closer to semi-supervised learning, which in the long run could also be a necessary and fruitful strategy. In conclusion, the logistic regression appeared quite able to handle and infer from the dynamics of air temperatures, albeit not robustly tested, correctly classifying 77% of the labelled data. This work was presented at Uppsala University on June 1st, 2018, and later on June 20th at SMHI.
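A condensed sketch of such a pipeline (the labelling rule and the single feature below are stand-ins; the thesis's algorithmic labelling and feature set are more elaborate): label a shift when consecutive windows of temperature differ sharply, then fit a logistic regression by plain gradient descent.

```python
import numpy as np

def label_shifts(temps, window=24, jump=5.0):
    """Heuristic labelling (a stand-in for the thesis's rules): flag an
    air-mass shift when the mean temperature of the last `window` hours
    differs from the preceding window by more than `jump` degrees."""
    t = np.asarray(temps, dtype=float)
    labels = np.zeros(len(t), dtype=int)
    for i in range(2 * window, len(t)):
        recent = t[i - window:i].mean()
        before = t[i - 2 * window:i - window].mean()
        if abs(recent - before) > jump:
            labels[i] = 1
    return labels

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Gradient-descent logistic regression on shift features."""
    X = np.c_[np.ones(len(X)), np.asarray(X, dtype=float)]  # bias column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def predict(X, w):
    X = np.c_[np.ones(len(X)), np.asarray(X, dtype=float)]
    return (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)
```

Once trained, the classifier's positive predictions would serve as the automatic reset signal for the Kalman filter.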
|