21

Explaining Generative Adversarial Network Time Series Anomaly Detection using Shapley Additive Explanations

Cher Simon (18324174) 10 July 2024 (has links)
Anomaly detection is an active research field that is widely applied in commercial applications to detect unusual patterns or outliers. Time series anomaly detection provides valuable insights into mission- and safety-critical applications using ever-growing temporal data, including continuous streaming time series data from the Internet of Things (IoT), sensor networks, healthcare, stock prices, computer metrics, and application monitoring. While Generative Adversarial Networks (GANs) demonstrate promising results in time series anomaly detection, the opaque nature of generative deep learning models lacks explainability and hinders broader adoption. Understanding the rationale behind model predictions and providing human-interpretable explanations are vital for increasing confidence and trust in machine learning (ML) frameworks such as GANs. This study conducted a structured and comprehensive assessment of post-hoc local explainability in GAN-based time series anomaly detection using SHapley Additive exPlanations (SHAP). Using publicly available benchmarking datasets approved by Purdue's Institutional Review Board (IRB), this study evaluated state-of-the-art GAN frameworks, identifying their advantages and limitations for time series anomaly detection. The study demonstrates a systematic approach to quantifying the extent to which GAN-detected time series anomalies can be explained, providing insights for businesses considering the adoption of generative deep learning models. The presented results show that GANs capture complex temporal distributions in time series and are applicable to anomaly detection. The analysis shows that SHAP can identify the significance of contributing features within time series data and derive post-hoc explanations to quantify GAN-detected time series anomalies.
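As an illustration only (not the author's implementation), the sketch below shows how a model-agnostic SHAP explainer might be applied to an anomaly-scoring function; the scorer, data, and shapes are placeholders standing in for a GAN-based detector.

```python
import numpy as np
import shap

rng = np.random.default_rng(0)
background = rng.normal(size=(50, 10))        # reference (normal) time-series windows
suspects = rng.normal(loc=2.0, size=(5, 10))  # windows flagged as anomalous
center = background.mean(axis=0)

# Stand-in anomaly scorer: a real GAN-based detector would typically combine
# generator reconstruction error and the discriminator's output; here a simple
# distance-to-reference score keeps the example self-contained.
def anomaly_score(windows: np.ndarray) -> np.ndarray:
    return np.linalg.norm(windows - center, axis=1)

# Model-agnostic KernelSHAP attributes each flagged window's anomaly score
# to its individual features (time steps / channels).
explainer = shap.KernelExplainer(anomaly_score, background)
shap_values = explainer.shap_values(suspects)  # per-feature contributions, (5, 10)
print(shap_values)
```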
22

Privacy preserving software engineering for data driven development

Tongay, Karan Naresh 14 December 2020 (has links)
The exponential rise in the generation of data has introduced many new areas of research, including data science, data engineering, machine learning, and artificial intelligence, to name a few. It has become important for any industry or organization to precisely understand and analyze its data in order to extract value from it. The value of data can only be realized when it is put into practice in the real world, and the most common approach to doing this in the technology industry is through software engineering. This brings into the picture the area of privacy-oriented software engineering and, with it, the rise of data protection regulations such as the GDPR (General Data Protection Regulation) and the PDPA (Personal Data Protection Act). Many organizations, governments, and companies that have accumulated huge amounts of data over time may conveniently use the data to increase business value, but at the same time the privacy aspects associated with the sensitivity of the data, especially personal information, can easily be circumvented while designing a software engineering model for these types of applications. Even before the software engineering phase of any data processing application, there can often be one or more data sharing agreements or privacy policies in place. Every organization may have its own way of maintaining data privacy practices for data-driven development. There is a need to generalize or categorize these approaches into tactics that can be consulted by other practitioners who are trying to integrate data privacy practices into their development. This qualitative study provides an understanding of the various approaches and tactics that are practised within the industry for privacy-preserving data science in software engineering, and discusses a tool for data usage monitoring to identify unethical data access. Finally, we studied strategies for secure data publishing and conducted experiments using sample data to demonstrate how these techniques can help secure private data before publishing. / Graduate
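As a loose illustration of the kind of pre-publication safeguard discussed (not taken from the thesis), the sketch below generalizes quasi-identifiers in a toy table before release; the column names and bucket choices are hypothetical.

```python
import pandas as pd

# Toy records with quasi-identifiers (age, zip) and a sensitive attribute.
df = pd.DataFrame({
    "age": [23, 27, 31, 36, 44, 47],
    "zip": ["47906", "47907", "47906", "46205", "46208", "46202"],
    "diagnosis": ["A", "B", "A", "C", "B", "A"],
})

# Generalize quasi-identifiers before publishing: bucket ages into ranges and
# truncate ZIP codes so individual records are harder to re-identify.
df["age"] = pd.cut(df["age"], bins=[20, 30, 40, 50], labels=["20-29", "30-39", "40-49"])
df["zip"] = df["zip"].str[:3] + "**"

# Check the smallest group over the generalized quasi-identifiers (k-anonymity style).
k = df.groupby(["age", "zip"], observed=True).size().min()
print(df)
print(f"smallest equivalence class size: {k}")
```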
23

Multi-fidelity Machine Learning for Perovskite Band Gap Predictions

Panayotis Thalis Manganaris (16384500) 16 June 2023 (has links)
A wide range of optoelectronic applications demand semiconductors optimized for purpose. My research focused on data-driven identification of ABX3 halide perovskite compositions for optimum photovoltaic absorption in solar cells. I trained machine learning models on previously reported datasets of halide perovskite band gaps based on first-principles computations performed at different fidelities. Using these, I identified mixtures of candidate constituents at the A, B, or X sites of the perovskite supercell, leveraging how mixed perovskite band gaps deviate from the linear interpolations predicted by Vegard's law of mixing, to obtain a selection of stable perovskites with band gaps in the ideal range of 1 to 2 eV for visible-light absorption. These models predict the perovskite band gap using the composition and inherent elemental properties as descriptors. This enables accurate, high-fidelity prediction and screening of the much larger chemical space from which the data samples were drawn.

I utilized a recently published density functional theory (DFT) dataset of more than 1300 perovskite band gaps from four different levels of theory, added to an experimental perovskite band gap dataset of ~100 points, to train random forest regression (RFR), Gaussian process regression (GPR), and Sure Independence Screening and Sparsifying Operator (SISSO) regression models, with data fidelity added as one-hot encoded features. I found that RFR yields the best model, with a band gap root mean square error of 0.12 eV on the total dataset and 0.15 eV on the experimental points. SISSO provided compound features and functions for direct prediction of the band gap, but errors were larger than from RFR and GPR. Additional insights gained from Pearson correlation and Shapley additive explanation (SHAP) analysis of learned descriptors suggest the RFR models performed best because of (a) their focus on identifying and capturing relevant feature interactions and (b) their flexibility to represent nonlinear relationships between such interactions and the band gap. The best model was deployed to predict the experimental band gaps of 37,785 hypothetical compounds. Based on this, we identified 1,251 stable compounds with band gaps predicted to be between 1 and 2 eV at experimental accuracy, successfully narrowing the candidates to about 3% of the screened compositions.
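A minimal sketch of the modeling setup described, with synthetic stand-in data: composition descriptors plus a one-hot fidelity label feed a random forest regressor, and RMSE is reported on held-out points. The feature counts and target function are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic stand-ins: elemental/composition descriptors plus a one-hot
# fidelity label (e.g., different DFT functionals vs. experiment).
n, n_desc, n_fid = 400, 12, 3
descriptors = rng.normal(size=(n, n_desc))
fidelity = np.eye(n_fid)[rng.integers(0, n_fid, size=n)]   # one-hot fidelity columns
X = np.hstack([descriptors, fidelity])
band_gap = 1.5 + 0.4 * descriptors[:, 0] + 0.3 * fidelity[:, 2] + rng.normal(scale=0.1, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, band_gap, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(f"band-gap RMSE on held-out data: {rmse:.3f} eV")
```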
24

MEDICAL EXPERT SYSTEM FOR AXIAL SPONDYLOARTHRITIS

Laraib Fatima (19204162) 28 July 2024 (has links)
Axial spondyloarthritis (axSpA) is a disease that, due to its complexity and rarity, presents challenges in diagnosis. With a focus on integrating expert knowledge into an intelligent diagnostic system, this research explores the intricate nature of axSpA, emphasizing the challenges associated with its diverse clinical presentation. By leveraging a comprehensive knowledge base curated by domain experts, encompassing insights into the pathophysiology, genetic factors, and evolving diagnostic criteria of axSpA, the expert system strives to provide a nuanced understanding of the disease. The methodology employs a hybrid reasoning approach combining forward and backward chaining techniques. Forward chaining iteratively processes clinical data and available evidence, applying logical rules to infer potential diagnoses and refine hypotheses, while backward chaining starts with the desired diagnostic goal and works backward through the knowledge base to validate or refute hypotheses. Additionally, certainty theory is incorporated to manage uncertainty in the diagnostic process, assigning confidence levels to conclusions based on the strength of evidence and expert knowledge. By integrating the knowledge base, forward and backward chaining, and certainty theory, the research aims to enhance diagnostic precision for this less common yet impactful inflammatory rheumatic condition, emphasizing the importance of early and accurate identification for effective management and improved patient outcomes. The results indicate a significant improvement in diagnostic accuracy, sensitivity, and specificity compared to traditional methods. The system's potential to enhance early diagnosis and treatment outcomes is discussed, along with suggestions for future research to further refine and expand the system.
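The sketch below illustrates forward chaining with MYCIN-style certainty factors in the spirit of the hybrid reasoning described; the rules, facts, and certainty values are invented placeholders, not clinical criteria from the thesis.

```python
# Minimal forward-chaining sketch with MYCIN-style certainty factors (CF).
# Rules, facts, and CF values are illustrative only, not clinical guidance.
RULES = [
    ({"chronic_back_pain", "age_under_45"}, "suspect_axspa", 0.6),
    ({"suspect_axspa", "hla_b27_positive"}, "axspa_likely", 0.8),
    ({"suspect_axspa", "sacroiliitis_on_mri"}, "axspa_likely", 0.9),
]

def combine(cf_old: float, cf_new: float) -> float:
    # Combine two positive certainty factors supporting the same conclusion.
    return cf_old + cf_new * (1 - cf_old)

def forward_chain(facts: set[str]) -> dict[str, float]:
    cf = {f: 1.0 for f in facts}
    changed = True
    while changed:                      # iterate until no rule adds new certainty
        changed = False
        for premises, conclusion, rule_cf in RULES:
            if premises <= cf.keys():
                derived = min(cf[p] for p in premises) * rule_cf
                updated = combine(cf.get(conclusion, 0.0), derived)
                if updated > cf.get(conclusion, 0.0) + 1e-9:
                    cf[conclusion] = updated
                    changed = True
    return cf

print(forward_chain({"chronic_back_pain", "age_under_45", "hla_b27_positive"}))
```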
25

CyberWater: An open framework for data and model integration

Ranran Chen (18423792) 03 June 2024 (has links)
Workflow management systems (WMSs) are commonly used to organize and automate sequences of tasks as workflows to accelerate scientific discoveries. During complex workflow modeling, a local interactive workflow environment is desirable, as users usually rely on their rich local environments for fast prototyping and refinement before they consider using more powerful computing resources.

This dissertation delves into the innovative development of the CyberWater framework, which is based on workflow management systems. Against the backdrop of data-intensive and complex models, CyberWater exemplifies the transition of intricate data into insightful and actionable knowledge. The dissertation introduces the nuanced architecture of CyberWater, focusing particularly on its adaptation and enhancement of the VisTrails system, and highlights the significance of control and data flow mechanisms and the introduction of new data formats for effective data processing within the CyberWater framework.

This study presents an in-depth analysis of the design and implementation of the Generic Model Agent Toolkits. The discussion centers on template-based component mechanisms and integration with popular platforms, while emphasizing the toolkits' ability to facilitate on-demand access to High-Performance Computing (HPC) resources for large-scale data handling. In addition, the development of an asynchronously controlled workflow within CyberWater is explored. This innovative approach enhances computational performance by optimizing pipeline-level parallelism and allows for on-demand submission of HPC jobs, significantly improving the efficiency of data processing.

A comprehensive methodology for model-driven development and Python code integration within the CyberWater framework, along with innovative applications of GPT models for automated data retrieval, is also introduced in this research. It examines the implementation of GitHub Actions for automating data retrieval processes and discusses the transformation of raw data into a compatible format, enhancing the adaptability and reliability of the data retrieval component in the adaptive generic model agent toolkit.

For the development and maintenance of software within the CyberWater framework, tools such as GitHub are used for version control, and automated processes are outlined for software updates and error reporting. In addition, the discussion of user data collection emphasizes the role of the CyberWater Server in these processes.

In conclusion, this dissertation presents our comprehensive work on the CyberWater framework's advancements, setting new standards in scientific workflow management and demonstrating how technological innovation can significantly elevate the process of scientific discovery.
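As a rough illustration of pipeline-level parallelism with on-demand job submission (not CyberWater's actual code), the asyncio sketch below overlaps post-processing of one job with submission of the next; the function names and timings are hypothetical.

```python
import asyncio

# Illustrative pipeline-level parallelism: while one job's results are being
# post-processed, the next job is already being submitted, instead of running
# the whole pipeline strictly in sequence.
async def submit_job(name: str) -> str:
    await asyncio.sleep(1.0)          # stand-in for an HPC job submission/run
    return f"{name}: raw output"

async def post_process(result: str) -> str:
    await asyncio.sleep(0.5)          # stand-in for local post-processing
    return result.replace("raw", "processed")

async def run_pipeline(jobs):
    outputs = []
    pending = None
    for job in jobs:
        raw = await submit_job(job)
        if pending is not None:
            outputs.append(await pending)
        # Kick off post-processing without blocking the next submission.
        pending = asyncio.create_task(post_process(raw))
    outputs.append(await pending)
    return outputs

print(asyncio.run(run_pipeline(["model_A", "model_B", "model_C"])))
```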
26

TOWARDS A TRANSDISCIPLINARY CYBER FORENSICS GEO-CONTEXTUALIZATION FRAMEWORK

Mohammad Meraj Mirza (16635918) 04 August 2023 (has links)
Technological advances have a profound impact on people and the world in which they live. People regularly use a wide range of smart devices, such as Internet of Things (IoT) devices, smartphones, and wearables, all of which store and use location data. With this explosion of technology, these devices have come to play an essential role in digital forensics and crime investigations. Digital forensic professionals are increasingly able to acquire and assess various types of data, including location data; such data has therefore become essential for responders, practitioners, and digital investigators dealing with cases that rely heavily on devices that collect data about their users. When performing any digital/cyber forensic investigation, it is very beneficial and often critical to consider the six Ws questions (i.e., who, what, when, where, why, and how) by using location data recovered from digital devices, such as where a suspect was at the time of the crime or the deviant act. Such evidence could help convict a suspect or help prove their innocence. However, many digital forensic standards, guidelines, tools, and even the National Institute of Standards and Technology (NIST) Cyber Security Personnel Framework (NICE) lack full coverage of what location data can be, how to use such data effectively, and how to perform spatial analysis. Although current digital forensic frameworks recognize the importance of location data, only a limited number of data sources (e.g., GPS) are considered sources of location in these frameworks. Moreover, most digital forensic frameworks and tools have yet to introduce geo-contextualization techniques and spatial analysis into the digital forensic process, even though these may aid investigations and provide more information for decision-making. As a result, significant gaps remain in the digital forensics community, driven by a lack of understanding of how to properly curate geodata. Therefore, this research was conducted to develop a transdisciplinary framework that addresses the limitations of previous work and explores opportunities to deal with geodata recovered from digital evidence, improving how geodata are maintained and how the greatest value is obtained from them, using an iPhone case study. The findings of this study demonstrate the potential value of geodata in digital forensic investigations when using the created transdisciplinary framework. Moreover, the findings discuss the implications for digital spatial analytical techniques and multi-intelligence domains, including location intelligence and open-source intelligence, that aid investigators and generate an exceptional understanding of device users' spatial, temporal, and spatial-temporal patterns.
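Purely as an illustration of basic spatio-temporal handling of recovered location records (not a method from the thesis), the sketch below orders hypothetical fixes by time and estimates the distance between consecutive fixes with the haversine formula.

```python
from dataclasses import dataclass
from datetime import datetime
from math import radians, sin, cos, asin, sqrt

# Hypothetical location records such as might be recovered from a device;
# field names, coordinates, and sources are invented for illustration.
@dataclass
class LocationFix:
    timestamp: datetime
    lat: float
    lon: float
    source: str  # e.g., GPS, Wi-Fi, cell, photo EXIF

def haversine_km(a: LocationFix, b: LocationFix) -> float:
    # Great-circle distance between two fixes, in kilometers.
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

fixes = sorted([
    LocationFix(datetime(2023, 5, 1, 9, 0), 40.4237, -86.9212, "GPS"),
    LocationFix(datetime(2023, 5, 1, 9, 40), 40.4259, -86.9081, "Wi-Fi"),
    LocationFix(datetime(2023, 5, 1, 11, 5), 40.0456, -86.0086, "photo EXIF"),
], key=lambda f: f.timestamp)

for prev, cur in zip(fixes, fixes[1:]):
    hours = (cur.timestamp - prev.timestamp).total_seconds() / 3600
    print(f"{prev.source} -> {cur.source}: {haversine_km(prev, cur):.1f} km in {hours:.1f} h")
```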
27

EXPLORING GRAPH NEURAL NETWORKS FOR CLUSTERING AND CLASSIFICATION

Fattah Muhammad Tahabi (14160375) 03 February 2023 (has links)
Graph Neural Networks (GNNs) have become immensely popular and prominent deep learning techniques for analyzing structural graph data, owing to their ability to solve complex real-world problems. Because graphs provide an efficient way to represent abstract concepts, modern research overcomes a limitation of classical graph theory, which requires prior knowledge of the graph structure before traditional algorithms can be employed. GNNs, an impressive framework for representation learning on graphs, have already produced many state-of-the-art techniques for node classification, link prediction, and graph classification tasks. GNNs can learn meaningful representations of graphs, incorporating topological structure, node attributes, and neighborhood aggregation to solve supervised, semi-supervised, and unsupervised graph-based problems. In this study, the usefulness of GNNs has been analyzed primarily from two aspects: clustering and classification. We focus on these two techniques, as they are the most popular strategies in data mining for making sense of collected data and performing predictive analysis.
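To make the neighborhood-aggregation idea concrete, the sketch below performs one GCN-style propagation step on a toy graph with NumPy; the graph, features, and untrained weights are placeholders, and the resulting embeddings could feed clustering or a classification head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 4-node graph adjacency matrix and node feature matrix.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))                    # 4 nodes, 3 features each

# GCN-style propagation: add self-loops, symmetrically normalize, then mix
# each node's features with its neighbors' before a linear map and ReLU.
A_hat = A + np.eye(4)
deg_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = deg_inv_sqrt @ A_hat @ deg_inv_sqrt

W = rng.normal(size=(3, 2))                    # layer weights (untrained here)
H = np.maximum(A_norm @ X @ W, 0.0)            # one propagation step + ReLU

print(H)  # node embeddings usable for clustering or a classification head
```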
28

Computational Methods for Renewable Energies: A Multi-Scale Perspective

Diego Renan Aguilar Alfaro (19195102) 23 July 2024 (has links)
The urgent global shift towards decarbonization necessitates the development of robust frameworks to navigate the complex technological, financial, and regulatory challenges emerging in the clean energy transition. Furthermore, the increased adoption of renewable energy sources (RES) is correlated with the exponential growth in weather data research over the last few years. This circular relationship, where big data drives renewable growth, which in turn feeds back into the data pipeline, serves as the primary focus of this study: the development of computational tools across diverse spatial and temporal scales for the optimal design and operation of renewable energy-based systems. Two scales are considered, differentiated by their primary objectives and the techniques used.

In the first, the integration of probabilistic forecasts into the operations of RES microgrids (MGs) is studied in detail. It is revealed that longer scheduling horizons can reduce dispatch costs, but at the expense of forecast accuracy due to increased prediction accuracy decay (PAD). To address this, a novel method is proposed that determines how to split the time horizon into time blocks so as to minimize dispatch costs and maximize forecast accuracy. This forms the basis of an optimal rolling horizon strategy (ORoHS), which schedules distributed energy resources over varying prediction/execution horizons. Results offer Pareto-optimal fronts showing the trade-offs between cost and accuracy at varying confidence levels. Solar power proved more cost-effective than wind power due to its lower variability, despite wind's higher energy output. The ORoHS strategy outperformed common scheduling methods: in the case study, it achieved a cost of $4.68 compared to $9.89 (greedy policy) and $9.37 (two-hour RoHS). The second study proposes the Caribbean Energy Corridor (CEC) project, a novel, ambitious initiative that aims to achieve total grid connectivity between the Caribbean islands. The analysis makes use of thorough data procedures and optimization methods for the resource assessment and design tasks needed to build such an infrastructure. Renewable energy potentials are quantified under different temporal and spatial coverages to maximize usage. Prioritizing offshore wind development, the CEC could significantly surpass anticipated growth in energy demand, with an estimated installed capacity of 34 GW of clean energy upon completion. The corridor is modeled as an HVDC grid with 32 nodes and 31 links. Underwater transmission is optimized with a Submarine-Cable-Dynamic-Programming (SCDP) algorithm that determines the best routes across the bathymetry of the region. It is found that the levelized cost of electricity remains on the low end at $0.11/kWh, despite high initial capital investments. Projected savings reach $100 billion when compared with "business-as-usual" scenarios and the current social cost of carbon. Furthermore, this infrastructure has the potential to create around 50,000 jobs in construction, policy, and research within the coming decades, while simultaneously establishing a robust and sustainable energy-water nexus in the region. Finally, the broader implications of these works are explored, highlighting their potential to address global challenges such as energy accessibility and prosperity in conflict zones, and to share these discoveries with the upcoming generations.
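As an illustration only: the SCDP algorithm's actual formulation is not reproduced here, but the sketch below shows a toy dynamic program that routes a cable left-to-right through a depth-derived cost grid, with the bathymetry values and cost model invented for the example.

```python
import numpy as np

# Toy dynamic-programming cable routing over a depth-derived cost grid.
# Bathymetry and the depth-to-cost relation are hypothetical placeholders.
rng = np.random.default_rng(0)
depth = rng.uniform(10, 5000, size=(6, 8))   # hypothetical bathymetry (meters)
cost = 1.0 + depth / 1000.0                  # deeper water -> costlier cable segment

rows, cols = cost.shape
best = np.full_like(cost, np.inf)
best[:, 0] = cost[:, 0]                      # any starting row on the first column
for c in range(1, cols):
    for r in range(rows):
        # Reachable predecessors: same row or adjacent rows in the previous column.
        prev = best[max(r - 1, 0):min(r + 2, rows), c - 1].min()
        best[r, c] = prev + cost[r, c]

print(f"minimum route cost: {best[:, -1].min():.2f}")
```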
