Global ETD Search

31	A visual training based approach to surface inspection Niskanen, M. (Matti) 18 June 2003 Abstract Training a visual inspection device is not straightforward but suffers from the high variation in material to be inspected. This variation causes major difficulties for a human, and this is directly reflected in classifier training. Many inspection devices utilize rule-based classifiers, the building and training of which rely mainly on human expertise. While designing such a classifier, a human tries to find the questions that would provide proper categorization. In training, an operator tunes the classifier parameters, aiming to achieve as good classification accuracy as possible. Such classifiers require a lot of time and expertise before they can be fully utilized. Supervised classifiers form another common category. These learn automatically from training material, but rely on labels that a human has set for it. However, these labels tend to be inconsistent and thus reduce the classification accuracy achieved. Furthermore, as class boundaries are learnt from training samples, they cannot in practice be later adjusted if needed. In this thesis, a visual based training method is presented. It avoids the problems related to traditional training methods by combining a classifier and a user interface. The method relies on unsupervised projection and provides an intuitive way to directly set and tune the class boundaries of high-dimensional data. As the method groups the data only by the similarities of its features, it is not affected by erroneous and inconsistent labelling made for training samples. Furthermore, it does not require knowledge of the internal structure of the classifier or iterative parameter tuning, where a combination of parameter values leading to the desired class boundaries is sought. On the contrary, the class boundaries can be set directly, changing the classification parameters. 
The time needed to take such a classifier into use is small and tuning the class boundaries can happen even on-line, if needed. The proposed method is tested with various experiments in this thesis. Different projection methods are evaluated from the point of view of visual based training. The method is further evaluated using a self-organizing map (SOM) as the projection method and wood as the test material. Parameters such as accuracy, map size, and speed are measured and discussed, and overall the method is found to be an advantageous training and classification scheme. SOM data visualization dimensionality reduction nonlinear projection unsupervised learning wood
32	Anomaly Detection with Advanced Nonlinear Dimensionality Reduction Beach, David J. 07 May 2020 Dimensionality reduction techniques such as t-SNE and UMAP are useful both for overview of high-dimensional datasets and as part of a machine learning pipeline. These techniques create a non-parametric model of the manifold by fitting a density kernel about each data point using the distances to its k-nearest neighbors. In dense regions, this approach works well, but in sparse regions, it tends to draw unrelated points into the nearest cluster. Our work focuses on a homotopy method which imposes graph-based regularization over the manifold parameters to update the embedding. As the homotopy parameter increases, so does the cost of modeling different scales between adjacent neighborhoods. This gradually imposes a more uniform scale over the manifold, resulting in a more faithful embedding which preserves structure in dense areas while pushing sparse anomalous points outward. Dimensionality Reduction Anomaly Detection Manifold Learning Unsupervised Learning
33	Introduction to fast Super-Paramagnetic Clustering Yelibi, Lionel 25 February 2020 We map stock market interactions to spin models to recover their hierarchical structure using a simulated annealing based Super-Paramagnetic Clustering (SPC) algorithm. This is directly compared to a modified implementation of a maximum likelihood approach to fast-Super-Paramagnetic Clustering (f-SPC). The methods are first applied to standard toy test-case problems, and then to a dataset of 447 stocks traded on the New York Stock Exchange (NYSE) over 1249 days. The signal to noise ratio of stock market correlation matrices is briefly considered. Our results approximately recover clusters representative of standard economic sectors, as well as mixed clusters whose dynamics shed light on the adaptive nature of financial markets and raise concerns relating to the effectiveness of industry-based static financial market classification in the world of real-time data analytics. A key result is that the standard maximum likelihood methods converge to solutions within a Super-Paramagnetic (SP) phase. We use insights arising from this to discuss the implications of using a Maximum Entropy Principle (MEP) as opposed to the Maximum Likelihood Principle (MLP) as an optimization device for this class of problems. maximum likelihood Potts Models unsupervised learning clustering maximum entropy
34	A Different Approach to Attacking and Defending Deep Neural Networks Fourati, Fares 06 1900 Adversarial examples are among the most widespread attacks in adversarial machine learning. In this work, we define new targeted and non-targeted attacks that are computationally less expensive than standard adversarial attacks. Besides practical purposes in some scenarios, these attacks can improve our understanding of the robustness of machine learning models. Moreover, we introduce a new training scheme to improve the performance of pre-trained neural networks and defend against our attacks. We examine the differences between our method, standard training, and standard adversarial training on pre-trained models. We find that our method protects the networks better against our attacks. Furthermore, unlike usual adversarial training, which reduces standard accuracy when applied to previously trained networks, our method maintains and sometimes even improves standard accuracy. adversarial machine learning adversarial examples adversarial training unsupervised learning robustness
35	Automating Log Analysis Kommineni, Sri Sai Manoj, Dindi, Akhila January 2021 Background: With the advent of the information age, a large number of services have emerged that run on several clusters of computers. Maintaining such large, complex systems is a very difficult task. Developers rely on one tool that is common to almost all software systems: console logs. To troubleshoot problems, developers refer to these logs. Identifying anomalies in the logs leads to the cause of the problem, so detecting them automatically automates the analysis of logs. This study focuses on anomaly detection in logs. Objectives: The main goal of the thesis is to identify different algorithms for anomaly detection in logs, implement the algorithms and compare them in an experiment. Methods: A literature review was conducted to identify the most suitable algorithms for anomaly detection in logs. An experiment was conducted to compare the algorithms identified in the literature review. The experiment was performed on a dataset of logs generated by Hadoop Distributed File System (HDFS) servers, which consisted of more than 11 million lines of logs. The algorithms compared are K-means, DBSCAN, Isolation Forest, and Local Outlier Factor, which are all unsupervised learning algorithms. Results: The performance of these algorithms was compared using the metrics precision, recall, accuracy, F1 score, and run time. Though DBSCAN was the fastest, it resulted in poor recall; Isolation Forest likewise resulted in poor recall. Local Outlier Factor was the fastest to predict. K-means had the highest precision, and Local Outlier Factor had the highest recall, accuracy, and F1 score. Conclusion: After comparing the metrics of the different algorithms, we conclude that Local Outlier Factor performed better than the other algorithms with respect to most of the metrics measured. 
Anomaly detection Log analysis Unsupervised learning Computer Sciences
36	Identifying phase transitions of disordered topological systems by unsupervised learning Sun, Yuanjie 30 April 2023 Phase transitions are critical in understanding the properties of different phases of matter, and their identification is an essential research focus in condensed matter physics. However, defining phase transitions for topological systems is more complex than for common mesoscale materials. This complexity is further compounded when disorders are present in the system. In this thesis work, we provide a comprehensive review of machine learning, topological insulators, and the conventional approach to classifying different topological phases. We focus on the Benalcazar, Bernevig, and Hughes (BBH) model, a higher-order topological insulator model, and investigate the challenges of identifying phase transitions in topological systems, particularly in the presence of disorders. To overcome these challenges, we implement the diffusion maps method, which accurately predicts the same transition points as traditional numerical calculations for both clean and disordered systems. Moreover, we demonstrate the efficacy of the diffusion maps method in predicting the transition point for the topological Anderson insulator. Our findings suggest that this approach has the potential to be generalized and applied to a broader range of disordered systems. Overall, this thesis work provides a novel method for identifying phase transition points in topological systems, which could have significant implications for the design and development of future topological materials. unsupervised learning topological phase transition topological Anderson insulator
37	Unsupervised Representation Learning with Clustering in Deep Convolutional Networks Caron, Mathilde January 2018 This master's thesis tackles the problem of unsupervised learning of visual representations with deep Convolutional Neural Networks (CNNs). Closing the gap between unsupervised and supervised representation learning is one of the main current challenges in image recognition. We propose a novel and simple way of training CNNs on fully unlabeled datasets. Our method jointly optimizes a grouping of the representations and trains a CNN using the groups as supervision. We evaluate the models trained with our method on standard transfer learning experiments from the literature. We find that our method outperforms all self-supervised and unsupervised state-of-the-art approaches. More importantly, our method outperforms those methods even when the unsupervised training set is not ImageNet but an arbitrary subset of images from Flickr. / This thesis addresses the problem of unsupervised learning of visual representations with deep convolutional neural networks (CNNs). Closing the gap between unsupervised and supervised representation learning is one of the most important current challenges in computer vision. We propose a new and simple way to train CNNs on fully unlabeled datasets. Our method consists of jointly optimizing a grouping of the representations and training a CNN using the groups as supervision. We evaluate the models trained with our method on standard transfer learning experiments from the literature. We find that our method outperforms all self-supervised and unsupervised state-of-the-art approaches, however sophisticated they are. More importantly, our method outperforms those methods even when the unsupervised training set is not ImageNet but an arbitrary subset of images from Flickr. Computer vision unsupervised learning Other Engineering and Technologies
38	MAP-GAN: Unsupervised Learning of Inverse Problems Campanella, Brandon S 01 December 2021 In this paper we outline a novel method for training a generative adversarial network based denoising model from an exclusively corrupted and unpaired dataset of images. Our model can learn without clean data or corrupted image pairs, and instead only requires that the noise distribution can be expressed analytically and that the noise at each pixel is independent. We utilize maximum a posteriori estimation as the underlying solution framework, using the analytically expressed noise-generating distribution as the likelihood and the GAN as the prior. We then evaluate our method on several popular datasets of varying size and levels of corruption. Further, we directly compare the numerical results of our experiments to those of the current state-of-the-art unsupervised denoising model. While our proposed approach does not achieve a new state of the art in these experiments, it provides an alternative method for unsupervised denoising and shows strong promise as an area for future research. GAN Unsupervised Learning Inverse Problems Machine Learning Denoising
39	Building Energy Profile Clustering Based on Energy Consumption Patterns Afzalan, Milad 06 1900 With the widespread adoption of smart meters in buildings, an unprecedented amount of high-resolution energy data is released, which provides opportunities to understand building consumption patterns. Accordingly, research efforts have employed data analytics and machine learning methods for the segmentation of consumers based on their load profiles, which helps utilities and energy providers with customized/personalized targeting of energy programs. However, building energy segmentation methodologies may present oversimplified representations of load shapes, which do not properly capture the realistic energy consumption patterns, in terms of temporal shapes and magnitude. In this thesis, we introduce a clustering technique that is capable of preserving both temporal patterns and total consumption of load shapes from customers’ energy data. The proposed approach first overpopulates clusters in the initial stage to preserve accuracy and then, in the second stage, merges similar clusters to reduce redundancy by integrating time-series similarity techniques. For this purpose, different time-series similarity measures based on Dynamic Time Warping (DTW) are employed. Furthermore, evaluations of different unsupervised clustering methods such as k-means, hierarchical clustering, fuzzy c-means, and self-organizing map are presented on building load shape portfolios, and their performance is quantitatively and qualitatively compared. The evaluation was carried out on real energy data of ~250 households. The comparative assessment (both qualitative and quantitative) demonstrated the applicability of the proposed approach compared to benchmark techniques for power time-series clustering of household load shapes. 
The contribution of this thesis is to: (1) present a comparative assessment of clustering techniques on household electricity load shapes and highlight the inadequacy of conventional validation indices for choosing the cluster number and (2) propose a two-stage clustering approach to improve the representation of temporal patterns and magnitude of household load shapes. / M.S. / With the unprecedented amount of data collected by smart meters, we have opportunities to systematically analyze the energy consumption patterns of households. Specifically, through using data analytics methods, one could cluster a large number of energy patterns (collected on a daily basis) into a number of representative groups, which could reveal actionable patterns for electric utilities for energy planning. However, commonly used clustering approaches may not properly show the variation of energy patterns or energy volume of customers at a neighborhood scale. Therefore, in this thesis, we introduced a clustering approach to improve the cluster representation by preserving the temporal shapes and energy volume of daily profiles (i.e., the energy data of a household collected during 1 day). In the first part of the study, we evaluated several well-known clustering techniques and validation indices in the literature and showed that they do not necessarily work well for this domain-specific problem. As a result, in the second part, we introduced a two-stage clustering technique to extract the typical energy consumption patterns of households. Different visualizations and quantitative metrics are shown for the comparison and applicability of the methods. A case study on several datasets comprising more than 250 households was considered for evaluation. The findings show that datasets with thousands of observations can be clustered into 10-50 groups through the introduced two-stage approach, while reasonably maintaining the energy patterns and energy volume of individual profiles. 
Clustering Unsupervised learning Segmentation Smart grid Energy consumption
40	Classifying Driver Behaviour For Predicting Risk For Accidents : A case study of forklift operations / Identifiera beteendemönster hos truckförare för att förutspå risk för olycka Zachrison, Unn, Winqvist, Victoria January 2024 This thesis explores the possibility of identifying risk behaviour patterns among forklift drivers through the analysis of telemetry data using unsupervised clustering algorithms. The objective is to predict whether certain behaviour patterns increase the risk of accidents. With the increasing accessibility of Internet of Things technology, data from forklifts has become more available, allowing for the study of driver behaviour. The telemetry data utilised is sourced from Toyota Material Handling Manufacturer Sweden’s internal database, collected from Data Handling Units that are installed on forklifts across Europe. This data, referred to as shock data, is triggered when a force is applied to the forklift, such as a collision. The thesis investigates combinations of various clustering algorithms and dataset modifications. The evaluation of the results is conducted using several quantitative measures and visualisation, along with analysis of time distribution, geographical placement, comparison of forklift models, and comparison with "no-shock" data. The evaluation yields K-Prototypes and K-Means as the best performing algorithms, while indicating that soft clustering and density-based clustering are not well-suited for the data. The identified best performing algorithms reveal two recurring driver behaviour patterns: the first one being driving forward at high speed with the lift motor idle, and the second pattern being driving backward at low speed while lowering the forks. Furthermore, a majority of the data points remain unclassified into specific behaviour patterns, suggesting that the dataset or methods used may not be sufficient. 
The inclusion of additional features, such as steering angle and forklift height, should be considered for exploration in future work. The thesis demonstrates the feasibility of identifying risk behaviour patterns, with potential for future research expanding on the findings to further contribute to the prevention of workplace accidents involving forklifts. Machine learning unsupervised learning driver behavior clustering Computer Engineering
