Global ETD Search

141	Construction of a machine learning training pipeline for merging AIS data with external datasources / Utveckling av en ML-pipeline för att kombinera AIS-data med externa datakällor i träningsprocessen Yahya, Sami Said January 2022 Machine learning methods are increasingly being used in the maritime domain to predict traffic anomalies and to mitigate risk, for example by avoiding collision and grounding accidents. However, most machine learning systems used for detecting such issues have been trained predominantly on single data sources such as vessel positioning data. Hence, it is desirable to support the means to combine different sources of data - in the training phase - to allow more complex models to be built. In this thesis, we propose a multi-data pipeline for accumulating, decoding, preprocessing, and merging Automatic Identification System (AIS) data with weather data to train time series based deep learning models. The pipeline comprises several REST APIs that connect and listen to the data sources, which are then stored and merged using Structured Query Language (SQL). Specifically, the training pipeline consists of an AIS NMEA message decoder, a weather data receiver, and a Postgres database for merging and storing the data sources. Moreover, the pipeline was assessed by training a TensorFlow vRNN model. The proposed pipeline approach allows flexibility in the inclusion of new data sources to effectively build models for the maritime domain as well as other traffic domains that use positioning data. AI Machine Learning Deep learning AIS anomaly detection RNN Maritime Traffic Computer Sciences Datavetenskap (datalogi)
142	Anomaly Detection in Multi-Seasonal Time Series Data Williams, Ashton Taylor 05 June 2023 No description available. Computer Science Information Science anomaly detection moving average multiple seasonalities multi-SARIMA time series data SARIMA
143	Machine learning-based performance analytics for high-performance computing systems Aksar, Burak 17 January 2024 High-performance Computing (HPC) systems play pivotal roles in societal and scientific advancements, executing up to quintillions of calculations every second. As we shift towards exascale computing and beyond, modern HPC systems emphasize resource sharing, where various applications share processors, memory, networks, and other components. While this sharing enhances power efficiency, it complicates performance prediction and introduces significant variations in application running times, affecting overall system efficiency and operational costs. HPC systems utilize monitoring frameworks that gather numerical telemetry data on resource usage to track operational status. Given the massive complexity and volume of this data, manual analysis is often daunting and inefficient. Machine learning (ML) techniques offer automated performance anomaly diagnosis, but the transition from successful research outcomes to production-scale deployment encounters two critical obstacles. First, the scarcity of labeled training data (i.e., identifying healthy and anomalous runs) in telemetry datasets makes it hard to train these ML systems effectively. Second, runtime analysis, required for providing timely detection and diagnosis of performance anomalies, demands seamless integration of ML-based methods with the monitoring frameworks. This thesis claims that ML-based performance analytics frameworks that leverage a limited amount of labeled data and ensure runtime analysis can achieve sufficient anomaly diagnosis performance for production HPC systems. To support this claim, we undertake ML-based performance analytics on two fronts. First, we design and develop novel frameworks for anomaly diagnosis that leverage semi-supervised or unsupervised learning techniques to reduce the need for extensive labeled data. 
Second, we design a simple yet adaptable architecture to enable deployment and demonstrate that these frameworks are feasible for runtime analysis. This thesis makes the following specific contributions: First, we design a semi-supervised anomaly diagnosis framework, Proctor, which operates with hundreds of labeled samples (in contrast to tens of thousands) and a vast number of unlabeled samples. We show that Proctor outperforms the fully supervised baseline by up to 11% in F1-score for diagnosing anomalies when there are approximately 30 labeled samples. We then reframe the problem and introduce ALBADRoss to determine which samples should be labeled by experts to maximize the model performance using active learning. On a production HPC dataset, ALBADRoss achieves a 0.95 F1-score (the same score that a fully-supervised framework achieved) and near-zero false alarm rate using 24x fewer labeled samples. Finally, with Prodigy, we solve the anomaly detection problem but with a focus on deployment. Prodigy is designed for detecting performance anomalies on compute nodes using unsupervised learning. Our framework achieves a 0.95 F1-score in detecting anomalies on a production HPC system telemetry dataset. We also design a simple and adaptable software architecture and deploy it on a 1488-node production HPC system, detecting real-world performance anomalies with 88% accuracy. Computer engineering Anomaly detection Artificial intelligence High-performance computing Large-scale computing systems Machine learning
144	Some new anomaly detection methods with applications to financial data Zhao, Zhicong 06 August 2021 Novel clustering methods are presented and applied to financial data. First, a scan-statistics method for detecting price point clusters in financial transaction data is considered. The method is applied to Electronic Benefit Transfer (EBT) transaction data of the Supplemental Nutrition Assistance Program (SNAP). For a given vendor, transaction amounts are fit via maximum likelihood estimation and then converted to the unit interval via a natural copula transformation. Next, a new Markov type relation for order statistics on the unit interval is developed. The relation is used to characterize the distribution of the minimum exceedance of all copula transformed transaction amounts above an observed order statistic. Conditional on observed order statistics, independent and asymptotically identical indicator functions are constructed and the success probability as a function of the gaps in consecutive order statistics is specified. The success probabilities are shown to be a function of the hazard rate of the transformed transaction distribution. If gaps are smaller than expected, then the corresponding indicator functions are more likely to be one. A scan statistic is then applied to the sequence of indicator functions to detect locations where too many gaps are smaller than expected. These sets of gaps are then flagged as being anomalous price point clusters. It is noted that prominent price point clusters appearing in the data may be a historical vestige of previous versions of the SNAP program involving outdated paper "food stamps". The second part of the project develops a novel clustering method whereby the time series of daily total EBT transaction amounts are clustered by periodicity. The scheme works by normalizing the time series of daily total transaction amounts for two distinct vendors and taking daily differences in those two series. 
The difference series is then examined for periodicity via a novel F statistic. We find one may cluster the monthly periodicities of vendors by type of store using the F statistic, a proxy for a distance metric. This may indicate that spending preferences for SNAP benefit recipients vary by day of the month; however, this opens further questions about potential forcing mechanisms and the apparent changing appetites for spending. Anomaly Detection Time Series Clustering Scan Statistics
145	Identifying the Impact of Noise on Anomaly Detection through Functional Near-Infrared Spectroscopy (fNIRS) and Eye-tracking Gabbard, Ryan Dwight 11 August 2017 No description available. Neurosciences Biomedical Engineering functional near-infrared spectroscopy workload prefrontal cortex eye tracking noise anomaly detection
146	Performance of One-class Support Vector Machine (SVM) in Detection of Anomalies in the Bridge Data Dalvi, Aditi January 2017 No description available. Electrical Engineering Support Vector Machine unsupervised Structural Health Monitoring anomaly detection bridge in-construction one-class SVM
147	Anomaly Detection and Microstructure Characterization in Fiber Reinforced Ceramic Matrix Composites Bricker, Stephen January 2015 No description available. Electrical Engineering Materials Science anomaly detection fiber tracking ceramic matrix composites microstructure characterization gaussian mixture modeling
148	Approaches to Abnormality Detection with Constraints Otey, Matthew Eric 12 September 2006 No description available. Computer Science abnormality detection anomaly detection signature detection outlier detection data mining
149	Topology-aware Correlated Network Anomaly Detection and Diagnosis Dhanapalan, Manojprasadh 19 July 2012 No description available. Computer Engineering Computer Science anomaly detection network-wide topology-aware perfSONAR drill-down correlation network management
150	Software Performance Anomaly Detection Through Analysis Of Test Data By Multivariate Techniques Salahshour Torshizi, Sara January 2022 This thesis aims to uncover anomalies in the data describing the performance behavior of a "robot controller" as measured by software metrics. The purpose of analyzing the data is mainly to identify the changes that have resulted in different performance behaviors, which we refer to as performance anomalies. To address this issue, two separate pre-processing approaches have been developed: one that adds the principal component to the data after the cleaning steps and another that does not use the principal component. Next, Isolation Forest is employed, which uses an ensemble of isolation trees to segregate anomalous data points and generate scores that can be used to discover anomalies. Further, in the clustering procedure, the largest distances to the matching cluster centroids are used to detect anomalies. These two data preparation methods, along with the two anomaly detection algorithms, identified software builds that are very likely to be anomalies. According to an industrial evaluation based on engineers' domain knowledge, around 70% of the software builds flagged as anomalous were confirmed as correctly identified, indicating system variable deviations or software bugs. Software performance Software builds Anomaly detection Test results Isolation Forest Probability Theory and Statistics Sannolikhetsteori och statistik
