251 |
Lifetime and Degradation Science of Polymeric Encapsulant in Photovoltaic Systems: Investigating the Role of Ethylene Vinyl Acetate in Photovoltaic Module Performance Loss with Semi-gSEM Analytics
Wheeler, Nicholas Robert 08 February 2017 (has links)
No description available.
|
252 |
Temporal Change in the Power Production of Real-world Photovoltaic Systems Under Diverse Climatic Conditions
Hu, Yang 08 February 2017 (has links)
No description available.
|
253 |
Node Centric Community Detection and Evolutional Prediction in Dynamic Networks
Oluwafolake A Ayano (13161288) 27 July 2022 (has links)
<p>Advances in technology have made data available from many platforms, such as the web and social media. Much of this data can be represented as a network consisting of a set of nodes connected by edges, where the nodes represent the items in the network and the edges represent the interactions between them. Community detection methods have been used extensively to analyze these networks. However, community detection in evolving networks remains a significant challenge because of the frequent changes to the networks and the need for real-time analysis. Static community detection methods are not appropriate for analyzing dynamic networks because they do not retain a network's history and cannot provide real-time information about the communities in the network.</p>
<p>Existing incremental methods treat changes to the network as a sequence of edge additions and/or removals; however, in many real-world networks, changes occur when a node is added with all its edges connecting simultaneously. </p>
<p>To process such large networks in a timely manner, an adaptive analytical method is needed: one that can update its results after the network evolves without recomputing the entire network, and that treats all the edges arriving with a node equally. </p>
<p>We propose a node-centric community detection method that incrementally updates the community structure using the already known structure of the network, avoiding recomputation of the entire network from scratch and consequently achieving a high-quality community structure. The results from our experiments suggest that our approach is efficient for incremental community detection in node-centric evolving networks. </p>
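As a rough sketch of the node-centric idea (hypothetical code, not the author's actual algorithm), a node arriving with its full edge set can be placed by a local vote over its neighbors' existing community labels, so no global recomputation is needed:

```python
from collections import Counter

def add_node(adj, community, new_node, edges):
    """Incrementally place new_node, which arrives with its full edge set,
    using only the already-known community labels of its neighbors."""
    adj[new_node] = set()
    for nbr in edges:
        adj[new_node].add(nbr)
        adj.setdefault(nbr, set()).add(new_node)
    # Majority vote over neighboring communities: constant local work,
    # no re-run of a static method over the whole graph.
    votes = Counter(community[nbr] for nbr in edges if nbr in community)
    if votes:
        community[new_node] = votes.most_common(1)[0][0]
    else:
        # An isolated node starts its own community.
        community[new_node] = max(community.values(), default=-1) + 1
    return community[new_node]
```

A full method would also re-examine the labels in the new node's neighborhood after insertion; this sketch shows only the placement step.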
|
254 |
DATA INTEGRITY IN THE HEALTHCARE INDUSTRY: ANALYZING THE EFFECTIVENESS OF DATA SECURITY IN GOOD DATA AND RECORD MANAGEMENT PRACTICES (A CASE STUDY OF COMPUTERIZING THE COMPETENCE MATRIX FOR A QUALITY CONTROL DRUG LABORATORY)
Marcel C Okezue (12522565) 06 October 2022 (has links)
<p>This project analyzes time efficiency in the data management process associated with personnel training and competence assessments in the quality control (QC) laboratory of Nigeria's food and drug authority (NAFDAC). Because personnel training data is managed manually, the laboratory administrators are burdened with extensive mental and paper-based record keeping, and the personnel training and competence assessments are not conducted efficiently. The Microsoft Excel spreadsheet provided by an earlier Purdue doctoral dissertation as a remedy proved deficient in handling operations on database tables; as a result, that dissertation did not adequately address the inefficiencies.</p>
<p>The problem addressed by this study is the operational inefficiency that results from the manual or Excel-based personnel training data management process in the NAFDAC laboratory. The purpose, therefore, is to reduce the time it takes to generate, obtain, manipulate, exchange, and securely store the personnel competence training and assessment data. To do this, the study developed a software system integrated with a relational database management system (RDBMS) to improve on the manual/Microsoft Excel-based data management procedure. To ascertain the new system's validity, the project compares the operational (time) efficiency of the manual and Excel-based formats with that of the newly developed system.</p>
<p>The data used in this qualitative research comes from literary sources and from simulating the difference between the times spent administering personnel training and competence assessment using the new system developed by this study and the Excel system from the earlier project, respectively. The fundamental finding of this study is that the idea of improving the operational (time) efficiency of the personnel training and competence assessment process in the QC laboratory is valid: doing so will reduce human errors, achieve more time-efficient operation, and improve the personnel training and competence assessment processes.</p>
<p>Recommendations are made as to the procedure the laboratory administrator should adopt to take advantage of the new system. The study also recommends steps by which future research could extend the capability of this project. </p>
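As an illustration of the kind of relational structure such a system relies on (the table and column names here are hypothetical, not those of the actual NAFDAC system), Python's built-in sqlite3 can express the personnel/assessment relationship that a flat Excel sheet handles poorly:

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the deployed system's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE personnel (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE assessment (
        id INTEGER PRIMARY KEY,
        personnel_id INTEGER NOT NULL REFERENCES personnel(id),
        competence TEXT NOT NULL,
        score REAL,
        assessed_on TEXT
    );
""")
conn.execute("INSERT INTO personnel (id, name) VALUES (1, 'Analyst A')")
conn.execute(
    "INSERT INTO assessment (personnel_id, competence, score, assessed_on) "
    "VALUES (1, 'HPLC operation', 92.5, '2022-06-01')"
)

# One join replaces manually cross-checking two spreadsheets.
row = conn.execute(
    "SELECT p.name, a.competence, a.score FROM assessment a "
    "JOIN personnel p ON p.id = a.personnel_id"
).fetchone()
```

Storing the competence matrix in normalized tables is what makes the time savings measurable: lookups and reports become single queries instead of manual record collation.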
|
255 |
Uncertainty-aware deep learning for prediction of remaining useful life of mechanical systems
Cornelius, Samuel J 10 December 2021 (has links)
Remaining useful life (RUL) prediction is a problem that researchers in the prognostics and health management (PHM) community have been studying for decades. Both physics-based and data-driven methods have been investigated, and in recent years, deep learning has gained significant attention. When sufficiently large and diverse datasets are available, deep neural networks can achieve state-of-the-art performance in RUL prediction for a variety of systems. However, for end users to trust the results of these models, especially as they are integrated into safety-critical systems, RUL prediction uncertainty must be captured. This work explores an approach for estimating both the epistemic and heteroscedastic aleatoric uncertainties that emerge in RUL prediction deep neural networks, and demonstrates that quantifying the overall impact of these uncertainties on predictions reveals valuable insight into model performance. Additionally, a study is carried out to observe the effects of RUL truth data augmentation on perceived uncertainties in the model.
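One common recipe for combining the two kinds of uncertainty, assuming Monte Carlo sampling (e.g. dropout) for the epistemic part and a mean/log-variance output head for the heteroscedastic aleatoric part (the abstract does not commit to a specific mechanism), is the law of total variance:

```python
import numpy as np

def combine_uncertainties(mus, log_vars):
    """mus, log_vars: shape (T, N) arrays from T stochastic forward passes
    of a network whose head predicts an RUL mean and a log-variance
    per sample."""
    mus = np.asarray(mus, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    mean_rul = mus.mean(axis=0)
    epistemic = mus.var(axis=0)                # spread across passes
    aleatoric = np.exp(log_vars).mean(axis=0)  # predicted noise variance
    total = epistemic + aleatoric              # law of total variance
    return mean_rul, epistemic, aleatoric, total
```

The total variance is what an end user of a safety-critical system would see as the error bar on a predicted RUL.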
|
256 |
Causal Inference in the Face of Assumption Violations
Yuki Ohnishi (18423810) 26 April 2024 (has links)
<p dir="ltr">This dissertation advances the field of causal inference by developing methodologies for settings where its core assumptions are violated. Traditional causal inference methodologies hinge on a core set of assumptions, which are often violated in the complex landscape of modern experiments and observational studies. This dissertation proposes novel methodologies designed to address the challenges posed by single or multiple assumption violations. By applying these innovative approaches to real-world datasets, this research uncovers valuable insights that were previously inaccessible with existing methods. </p><p><br></p><p dir="ltr">First, three increasingly important sources of complication in causal inference are interference among individuals, nonadherence of individuals to their assigned treatments, and unintended missing outcomes. Interference exists if the outcome of an individual depends not only on their assigned treatment but also on the treatments assigned to other individuals. It commonly arises when limited controls are placed on the interactions of individuals with one another during the course of an experiment. Treatment nonadherence frequently occurs in human-subject experiments, as it can be unethical to force an individual to take their assigned treatment; clinical trials, in particular, typically have subjects that do not adhere to their assigned treatments due to adverse side effects or intercurrent events. Missing values also commonly occur in clinical studies; for example, some patients may drop out of the study due to the side effects of the treatment. Failing to account for these considerations will generally yield unstable and biased inferences on treatment effects even in randomized experiments, and existing methodologies cannot address all these challenges simultaneously. We propose a novel Bayesian methodology to fill this gap. 
</p><p><br></p><p dir="ltr">My subsequent research further addresses one of the limitations of the first project: a set of assumptions about interference structures that may be too restrictive in some practical settings. We introduce the concept of the "degree of interference" (DoI), a latent variable capturing the interference structure. This concept allows for handling arbitrary, unknown interference structures to facilitate inference on causal estimands. </p><p><br></p><p dir="ltr">While randomized experiments offer a solid foundation for valid causal analysis, there is also strong interest in conducting causal inference with observational data, given the cost and difficulty of randomized experiments and the wide availability of observational data. Nonetheless, using observational data to infer causality requires additional assumptions. A central assumption is that of <i>ignorability</i>, which posits that the treatment is randomly assigned based on the variables (covariates) included in the dataset. While crucial, this assumption is often debatable, especially when treatments are assigned sequentially to optimize future outcomes. For instance, marketers typically adjust subsequent promotions based on responses to earlier ones and speculate on how customers might have reacted to alternative past promotions. This speculative behavior introduces latent confounders, which must be carefully addressed to prevent biased conclusions. </p><p dir="ltr">In the third project, we investigate these issues by studying sequences of promotional emails sent by a US retailer. We develop a novel Bayesian approach for causal inference from longitudinal observational data that accommodates noncompliance and latent sequential confounding. </p><p><br></p><p dir="ltr">Finally, we formulate the causal inference problem for privatized data. 
In the era of digital expansion, the secure handling of sensitive data poses an intricate challenge that significantly influences research, policy-making, and technological innovation. As the collection of sensitive data becomes more widespread across academic, governmental, and corporate sectors, balancing data accessibility against the protection of private information requires sophisticated methods for analysis and reporting that include stringent privacy protections. Currently, the gold standard for maintaining this balance is differential privacy. </p><p dir="ltr">Local differential privacy is a differential privacy paradigm in which individuals first apply a privacy mechanism to their data (often by adding noise) before transmitting the result to a curator. The noise added for privacy introduces additional bias and variance into analyses, so it is of great importance for analysts to incorporate the privacy noise into valid inference.</p><p dir="ltr">In this final project, we develop methodologies to infer causal effects from locally privatized data under randomized experiments. We present frequentist and Bayesian approaches and discuss the statistical properties of the estimators, such as consistency and optimality, under various privacy scenarios.</p>
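A minimal sketch of the locally private randomized-experiment setting (assuming a Laplace mechanism and a simple difference-in-means estimator; the dissertation's estimators are more sophisticated): because the Laplace noise each individual adds is mean-zero, the difference-in-means on privatized outcomes stays unbiased, but its variance inflates with the privacy level.

```python
import numpy as np

def privatize(y, epsilon, sensitivity=1.0, rng=None):
    """Local Laplace mechanism: each unit adds noise before reporting."""
    rng = np.random.default_rng(rng)
    return y + rng.laplace(scale=sensitivity / epsilon, size=len(y))

def ate_estimate(y_priv, z):
    """Difference-in-means on privatized outcomes. Laplace noise is
    mean-zero, so the estimator remains unbiased; its variance grows by
    2 * (sensitivity / epsilon)**2 per arm."""
    z = np.asarray(z, dtype=bool)
    return y_priv[z].mean() - y_priv[~z].mean()
```

Smaller epsilon (stronger privacy) means wider Laplace noise and thus noisier, though still consistent, effect estimates.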
|
257 |
Three Essays on Analysis of U.S. Infant Mortality Using Systems and Data Science Approaches
Ebrahimvandi, Alireza 02 January 2020 (has links)
High infant mortality (IM) rates in the U.S. have been a major public health concern for decades. Many studies have focused on understanding causes, risk factors, and interventions that can reduce IM. However, death of an infant is the result of the interplay between many risk factors, which in some cases can be traced to the infancy of their parents. Consequently, these complex interactions challenge the effectiveness of many interventions. The long-term goal of this study is to advance the common understanding of effective interventions for improving health outcomes and, in particular, infant mortality. To achieve this goal, I implemented systems and data science methods in three essays to contribute to the understanding of IM causes and risk factors.
In the first study, the goal was to identify patterns in the leading causes of infant mortality across states that successfully reduced their IM rates. I explore state-level trends between 2000 and 2015 to identify patterns in the leading causes of IM. This study shows that the main driver of IM rate reduction is the preterm-related mortality rate. The second study builds on these findings and investigates the risk factors of preterm birth (PTB) in the largest obstetric population ever studied in this field. Applying the latest statistical and machine learning techniques, I study the PTB risk factors that are both generalizable and identifiable during the early stages of pregnancy. A major finding of this study is that socioeconomic factors such as parent education are more important than commonly cited factors such as race in the prediction of PTB. This finding provides significant evidence for theories like the life course perspective, which postulate that the main determinants of a health trajectory are the social scaffolding that addresses the upstream roots of health. These results point to the need for more comprehensive approaches that shift the focus from medical interventions during pregnancy to the times when mothers become vulnerable to the risk factors of PTB. Therefore, in the third study, I take an aggregate approach to study the dynamics of population health that result in undesirable outcomes in major indicators like infant mortality. Based on these new explanations, I offer a systematic approach that can help address adverse birth outcomes, including high infant mortality and preterm birth rates, which is the central contribution of this dissertation.
In conclusion, this dissertation contributes to a better understanding of the complexities in infant mortality and health-related policies. This work contributes to the body of literature both in the application of statistical and machine learning techniques and in advancing health-related theories. / Doctor of Philosophy / The U.S. infant mortality rate (IMR) is 71% higher than the average rate for comparable countries in the Organization for Economic Co-operation and Development (OECD). High infant mortality and preterm birth rates (PBR) are major public health concerns in the U.S. A wide range of studies have focused on understanding the causes and risk factors of infant mortality and the interventions that can reduce it. However, infant mortality is a complex phenomenon that challenges the effectiveness of interventions, and the IMR and PBR in the U.S. are still higher than in any other advanced OECD nation. I believe that systems and data science methods can enhance our understanding of infant mortality causes, risk factors, and effective interventions.
There are more than 130 diagnosed causes of infant mortality, so tracking the trends in causes of infant mortality across 50 states over a long period is very challenging. In the first essay, I focus on the medical aspects of infant mortality to find the causes that helped reduce infant mortality rates in certain states from 2000 to 2015. In addition, I examine the relationship between different risk factors and infant mortality in a regression model to find significant correlations. This study provides critical recommendations to policymakers in states with high infant mortality rates and guides them in leveraging appropriate interventions.
Preterm birth (PTB) is the most significant contributor to the IMR. The first study showed that reductions in infant mortality happened in states that reduced their preterm births. There exists a considerable body of literature on identifying PTB risk factors in order to explain the consistently high rates of PTB and IMR in the U.S.; however, it has fallen short in two key areas: generalizability and the ability to detect PTB risk in early pregnancy. In the second essay, I investigate a wide range of risk factors in the largest obstetric population ever studied in PTB research. The predictors in this study span environmental (e.g., air pollution) to medical (e.g., history of hypertension) factors. Our objective is to increase the understanding of factors that are both generalizable and identifiable during the early stage of pregnancy. I implemented state-of-the-art statistical and machine learning techniques and improved the performance measures compared to previous studies. The results reveal the importance of socioeconomic factors such as parent education, which can be as important as biomedical indicators like the mother's body mass index in predicting preterm delivery.
The second study showed an important relationship between socioeconomic factors, such as education, and major health outcomes, such as preterm birth. Short-term interventions that focus on improving the socioeconomic status of a mother during pregnancy have limited to no effect on birth outcomes. Therefore, we need more comprehensive approaches that shift the focus from medical interventions during pregnancy to the times when mothers become vulnerable to the risk factors of PTB. Hence, in the third study I use a systems approach to explore the dynamics of health over time. This is a novel study that enhances our understanding of the complex interactions between health and socioeconomic factors over time. I explore why some communities experience a downward spiral of health deterioration, how resources are generated and allocated, how the generation and allocation mechanisms are interconnected, and why we see significantly different health outcomes across otherwise similar states. I use Ohio as the case study because it suffers from poor health outcomes despite having one of the best healthcare systems in the nation. The results identify the health-expenditure trap and show how an external financial shock can exacerbate health and socioeconomic conditions in such a community. I demonstrate how overspending or underspending in healthcare can affect a society's health outcomes in the long term.
Overall, this dissertation contributes to a better understanding of the complexities associated with major health issues in the U.S. I provide health professionals with theoretical and empirical foundations of risk assessment for reducing infant mortality and preterm birth. In addition, this study provides a systemic perspective on the health deterioration that many communities in the U.S. are experiencing, and I hope that this perspective improves policymakers' decision-making.
|
258 |
Deep Learning One-Class Classification With Support Vector Methods
Hampton, Hayden D 01 January 2024 (links) (PDF)
Through the specialized lens of one-class classification, anomalies (irregular observations that uncharacteristically diverge from normative data patterns) are comprehensively studied. This dissertation focuses on advancing boundary-based methods in one-class classification, a critical approach to anomaly detection. These methodologies delineate optimal decision boundaries, thereby facilitating a distinct separation between normal and anomalous observations. Encompassing traditional approaches such as the One-Class Support Vector Machine and Support Vector Data Description, recent adaptations in deep learning offer rich ground for innovation in anomaly detection. This dissertation proposes three novel deep learning methods for one-class classification, aiming to enhance the efficacy and accuracy of anomaly detection in an era where data volume and complexity present unprecedented challenges. The first two methods are designed for tabular data and take a least squares perspective. Formulating these optimization problems within a least squares framework offers notable advantages: it facilitates the derivation of closed-form solutions for critical gradients that largely drive the optimization procedure, and it circumvents the prevalent issue of degenerate or uninformative solutions, a challenge often associated with these types of deep learning algorithms. The third method is designed for second-order tensors; it has certain computational advantages and alleviates the need for vectorization, which can cause loss of structural information when spatial or contextual relationships exist in the data. The performance of the three proposed methods is demonstrated with simulation studies and real-world datasets. Compared to kernel-based one-class classification methods, the proposed deep learning methods achieve significantly better performance under the settings considered.
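Stripped of the learned network, the boundary idea behind SVDD-style one-class methods can be sketched in a few lines (a toy with an identity embedding; the proposed methods instead learn the embedding under a least squares objective): fit a hypersphere center to normal data, score points by distance to it, and draw the decision boundary at a quantile of the training scores.

```python
import numpy as np

def fit_center(X):
    """SVDD-style center: the mean of the (embedded) normal training data."""
    return np.asarray(X, dtype=float).mean(axis=0)

def anomaly_scores(X, c):
    """Squared distance to the center; larger means more anomalous."""
    X = np.asarray(X, dtype=float)
    return ((X - c) ** 2).sum(axis=1)

def threshold(scores, q=0.95):
    """Decision boundary: a high quantile of the normal training scores."""
    return np.quantile(scores, q)
```

A deep variant replaces the identity embedding with a network trained so that normal data maps close to the center, which is exactly where degenerate (collapsed) solutions become a risk.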
|
259 |
[en] ON THE INTERACTION BETWEEN SOFTWARE ENGINEERS AND DATA SCIENTISTS WHEN BUILDING MACHINE LEARNING-ENABLED SYSTEMS / [pt] SOBRE A INTERAÇÃO ENTRE ENGENHEIROS DE SOFTWARE E CIENTISTAS DE DADOS CONSTRUINDO SISTEMAS HABILITADOS POR APRENDIZADO DE MÁQUINA
GABRIEL DE ANDRADE BUSQUIM 18 June 2024 (links)
In recent years, Machine Learning (ML) components have been increasingly integrated into the core systems of organizations. Engineering such systems presents various challenges from both a theoretical and practical perspective. One of the key challenges is the effective interaction between actors with different backgrounds who need to work closely together, such as software engineers and data scientists. This work presents three studies investigating the current interaction and collaboration dynamics between these two roles in ML projects. Our first study is an exploratory case study with four practitioners experienced in software engineering and data science from a large ML-enabled system project. In our second study, we performed complementary interviews with members of two teams working on ML-enabled systems to acquire more insights into how data scientists and software engineers share responsibilities and communicate. Finally, our third study consists of a focus group in which we validated the relevance of this collaboration during multiple tasks related to ML-enabled systems and assessed recommendations that can foster the interaction between the actors. Our studies revealed several challenges that can hinder collaboration between software engineers and data scientists, including differences in technical expertise, unclear definitions of each role's duties, and the lack of documents that support the specification of the ML-enabled system. Potential solutions to address these challenges include encouraging team communication, clearly defining responsibilities, and producing concise system documentation. Our research contributes to understanding the complex dynamics between software engineers and data scientists in ML projects and provides insights for improving collaboration and communication in this context. We encourage future studies investigating this interaction in other projects.
|
260 |
Using machine learning to identify the occurrence of changing air masses
Bergfors, Anund January 2018 (links)
In the forecast data post-processing at the Swedish Meteorological and Hydrological Institute (SMHI), a regular Kalman filter is used to debias the two-meter air temperature forecast of the physical models by controlling it towards air temperature observations. The Kalman filter, however, diverges when encountering greater nonlinearities in shifting weather patterns, and can only be manually reset once a new air mass has stabilized within its operating region. This project aimed to automate that reset with a machine learning approach. The methodology was at its base supervised learning: first algorithmically labelling the air-mass shift occurrences in the data, then training a logistic regression model. Observational data from the latest twenty years of the Uppsala automatic meteorological station was used for the analysis. A simple pipeline for loading, labelling, training on, and visualizing the data was built. As a work in progress, the operating regime was closer to semi-supervised learning, which in the long run could also be a necessary and fruitful strategy. In conclusion, the logistic regression appeared quite able to handle and infer from the dynamics of air temperatures, albeit not robustly tested, correctly classifying 77% of the labelled data. This work was presented at Uppsala University on June 1st, 2018, and later on June 20th at SMHI.
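A condensed sketch of such a pipeline (the labelling rule and the single feature below are stand-ins; the thesis's algorithmic labelling and feature set are more elaborate): label a shift when consecutive windows of temperature differ sharply, then fit a logistic regression by plain gradient descent.

```python
import numpy as np

def label_shifts(temps, window=24, jump=5.0):
    """Heuristic labelling (a stand-in for the thesis's rules): flag an
    air-mass shift when the mean temperature of the last `window` hours
    differs from the preceding window by more than `jump` degrees."""
    t = np.asarray(temps, dtype=float)
    labels = np.zeros(len(t), dtype=int)
    for i in range(2 * window, len(t)):
        recent = t[i - window:i].mean()
        before = t[i - 2 * window:i - window].mean()
        if abs(recent - before) > jump:
            labels[i] = 1
    return labels

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Gradient-descent logistic regression on shift features."""
    X = np.c_[np.ones(len(X)), np.asarray(X, dtype=float)]  # bias column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def predict(X, w):
    X = np.c_[np.ones(len(X)), np.asarray(X, dtype=float)]
    return (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)
```

Once trained, the classifier's positive predictions would serve as the automatic reset signal for the Kalman filter.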
|