31

A visual training based approach to surface inspection

Niskanen, M. (Matti) 18 June 2003
Abstract: Training a visual inspection device is not straightforward, because the material to be inspected varies widely. This variation causes major difficulties for a human, and this is directly reflected in classifier training. Many inspection devices utilize rule-based classifiers, the building and training of which rely mainly on human expertise. While designing such a classifier, a human tries to find the questions that would provide proper categorization. In training, an operator tunes the classifier parameters, aiming to achieve as good a classification accuracy as possible. Such classifiers require a lot of time and expertise before they can be fully utilized. Supervised classifiers form another common category. These learn automatically from training material, but rely on labels that a human has set for it. However, these labels tend to be inconsistent and thus reduce the classification accuracy achieved. Furthermore, as class boundaries are learnt from training samples, they cannot in practice be adjusted later if needed. In this thesis, a visual based training method is presented. It avoids the problems related to traditional training methods by combining a classifier and a user interface. The method relies on unsupervised projection and provides an intuitive way to directly set and tune the class boundaries of high-dimensional data. As the method groups the data only by the similarities of its features, it is not affected by erroneous and inconsistent labelling made for training samples. Furthermore, it does not require knowledge of the internal structure of the classifier or iterative parameter tuning, where a combination of parameter values leading to the desired class boundaries is sought. On the contrary, the class boundaries can be set directly, changing the classification parameters. The time needed to take such a classifier into use is small, and tuning the class boundaries can happen even on-line, if needed.
The proposed method is tested with various experiments in this thesis. Different projection methods are evaluated from the point of view of visual based training. The method is further evaluated using a self-organizing map (SOM) as the projection method and wood as the test material. Parameters such as accuracy, map size, and speed are measured and discussed, and overall the method is found to be an advantageous training and classification scheme.
32

Anomaly Detection with Advanced Nonlinear Dimensionality Reduction

Beach, David J. 07 May 2020
Dimensionality reduction techniques such as t-SNE and UMAP are useful both for overview of high-dimensional datasets and as part of a machine learning pipeline. These techniques create a non-parametric model of the manifold by fitting a density kernel about each data point using the distances to its k-nearest neighbors. In dense regions, this approach works well, but in sparse regions, it tends to draw unrelated points into the nearest cluster. Our work focuses on a homotopy method which imposes graph-based regularization over the manifold parameters to update the embedding. As the homotopy parameter increases, so does the cost of modeling different scales between adjacent neighborhoods. This gradually imposes a more uniform scale over the manifold, resulting in a more faithful embedding which preserves structure in dense areas while pushing sparse anomalous points outward.
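
To make the dense-versus-sparse issue concrete, the sketch below computes a per-point local scale as the mean distance to the k nearest neighbors. This is only a simplified stand-in for the density kernels mentioned in the abstract, not the homotopy method of the thesis; the function name `knn_scales` and the toy data are assumptions made for illustration.

```python
import math


def knn_scales(points, k):
    """Return, for each point, the mean distance to its k nearest neighbors.

    Hypothetical helper: a crude proxy for the per-point kernel width
    that neighbor-based embedding methods fit around each data point.
    """
    scales = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scales.append(sum(dists[:k]) / k)
    return scales


# Four points in a tight square plus one isolated point (toy data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scales = knn_scales(points, k=2)
```

The isolated point receives a far larger scale than the clustered points, which is the mismatch between adjacent neighborhoods that the abstract describes as pulling sparse points into the nearest cluster.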
33

Introduction to fast Super-Paramagnetic Clustering

Yelibi, Lionel 25 February 2020
We map stock market interactions to spin models to recover their hierarchical structure using a simulated annealing based Super-Paramagnetic Clustering (SPC) algorithm. This is directly compared to a modified implementation of a maximum likelihood approach to fast-Super-Paramagnetic Clustering (f-SPC). The methods are first applied to standard toy test-case problems, and then to a dataset of 447 stocks traded on the New York Stock Exchange (NYSE) over 1249 days. The signal to noise ratio of stock market correlation matrices is briefly considered. Our results approximately recover clusters representative of standard economic sectors, as well as mixed clusters whose dynamics shed light on the adaptive nature of financial markets and raise concerns relating to the effectiveness of industry based static financial market classification in the world of real-time data-analytics. A key result is that the standard maximum likelihood methods are confirmed to converge to solutions within a Super-Paramagnetic (SP) phase. We use insights arising from this to discuss the implications of using a Maximum Entropy Principle (MEP) as opposed to the Maximum Likelihood Principle (MLP) as an optimization device for this class of problems.
34

A Different Approach to Attacking and Defending Deep Neural Networks

Fourati, Fares 06 1900
Adversarial examples are among the most widespread attacks in adversarial machine learning. In this work, we define new targeted and non-targeted attacks that are computationally less expensive than standard adversarial attacks. Besides practical purposes in some scenarios, these attacks can improve our understanding of the robustness of machine learning models. Moreover, we introduce a new training scheme to improve the performance of pre-trained neural networks and defend against our attacks. We examine the differences between our method, standard training, and standard adversarial training on pre-trained models. We find that our method protects the networks better against our attacks. Furthermore, unlike usual adversarial training, which reduces standard accuracy when applied to previously trained networks, our method maintains and sometimes even improves standard accuracy.
35

Building Energy Profile Clustering Based on Energy Consumption Patterns

Afzalan, Milad 06 1900
With the widespread adoption of smart meters in buildings, an unprecedented amount of high-resolution energy data is released, which provides opportunities to understand building consumption patterns. Accordingly, research efforts have employed data analytics and machine learning methods for the segmentation of consumers based on their load profiles, which helps utilities and energy providers with customized/personalized targeting for energy programs. However, building energy segmentation methodologies may present oversimplified representations of load shapes, which do not properly capture the realistic energy consumption patterns, in terms of temporal shapes and magnitude. In this thesis, we introduce a clustering technique that is capable of preserving both temporal patterns and total consumption of load shapes from customers’ energy data. The proposed approach first overpopulates clusters in an initial stage to preserve accuracy and then, in a second stage, merges similar clusters to reduce redundancy by integrating time-series similarity techniques. For this purpose, different time-series similarity measures based on Dynamic Time Warping (DTW) are employed. Furthermore, evaluations of different unsupervised clustering methods such as k-means, hierarchical clustering, fuzzy c-means, and self-organizing map were presented on building load shape portfolios, and their performance was quantitatively and qualitatively compared. The evaluation was carried out on real energy data of ~250 households. The comparative assessment (both qualitative and quantitative) demonstrated the applicability of the proposed approach compared to benchmark techniques for power time-series clustering of household load shapes.
The contribution of this thesis is to: (1) present a comparative assessment of clustering techniques on household electricity load shapes and highlight the inadequacy of conventional validation indices for choosing the cluster number and (2) propose a two-stage clustering approach to improve the representation of temporal patterns and magnitude of household load shapes. / M.S. / With the unprecedented amount of data collected by smart meters, we have opportunities to systematically analyze the energy consumption patterns of households. Specifically, through using data analytics methods, one could cluster a large number of energy patterns (collected on a daily basis) into a number of representative groups, which could reveal actionable patterns for electric utilities for energy planning. However, commonly used clustering approaches may not properly show the variation of energy patterns or energy volume of customers at a neighborhood scale. Therefore, in this thesis, we introduced a clustering approach to improve the cluster representation by preserving the temporal shapes and energy volume of daily profiles (i.e., the energy data of a household collected during 1 day). In the first part of the study, we evaluated several well-known clustering techniques and validation indices in the literature and showed that they do not necessarily work well for this domain-specific problem. As a result, in the second part, we introduced a two-stage clustering technique to extract the typical energy consumption patterns of households. Different visualizations and quantitative metrics are shown to compare the methods and demonstrate their applicability. A case study on several datasets comprising more than 250 households was considered for evaluation. The findings show that datasets with thousands of observations can be clustered into 10-50 groups through the introduced two-stage approach, while reasonably maintaining the energy patterns and energy volume of individual profiles.
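
The second, merging stage described in this abstract can be illustrated with a small sketch. It assumes a textbook Dynamic Time Warping distance with absolute-difference cost and a simple greedy merge rule; the function names, the threshold, and the toy centroids are hypothetical, and the thesis may use different DTW variants and merge criteria.

```python
def dtw_distance(a, b):
    """Textbook dynamic-programming DTW with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def merge_similar(centroids, threshold):
    """Greedy merge (an assumed rule): a centroid joins the first group
    whose representative (first member) lies within the DTW threshold."""
    groups = []
    for idx, c in enumerate(centroids):
        for g in groups:
            if dtw_distance(centroids[g[0]], c) <= threshold:
                g.append(idx)
                break
        else:
            groups.append([idx])
    return groups


# Toy centroids from an over-populated first stage: the first two share
# a shape that is shifted in time, the third is a flat high-load profile.
centroids = [[0, 1, 2, 1], [0, 0, 1, 2, 1], [5, 5, 5, 5]]
groups = merge_similar(centroids, threshold=0.5)
```

Because DTW aligns the time-shifted shapes at zero cost, the first two centroids merge while the flat profile stays separate. Note that plain DTW of this kind is sensitive to magnitude as well as shape, which matters when total consumption must be preserved.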
36

Automating Log Analysis

Kommineni, Sri Sai Manoj, Dindi, Akhila January 2021
Background: With the advent of the information age, a large number of services have emerged that run on several clusters of computers. Maintaining such large, complex systems is a very difficult task. Developers rely on one tool that is common to almost all software systems: console logs. To troubleshoot problems, developers refer to these logs to solve the issue. Identifying anomalies in the logs would lead us to the cause of the problem, thereby automating the analysis of logs. This study focuses on anomaly detection in logs. Objectives: The main goal of the thesis is to identify different algorithms for anomaly detection in logs, implement the algorithms, and compare them experimentally. Methods: A literature review was conducted to identify the most suitable algorithms for anomaly detection in logs. An experiment was conducted to compare the algorithms identified in the literature review. The experiment was performed on a dataset of logs generated by Hadoop Distributed File System (HDFS) servers, which consisted of more than 11 million lines of logs. The algorithms compared are K-means, DBSCAN, Isolation Forest, and Local Outlier Factor, all of which are unsupervised learning algorithms. Results: The performance of all these algorithms has been compared using the metrics precision, recall, accuracy, F1 score, and run time. Though DBSCAN was the fastest, it resulted in poor recall; Isolation Forest likewise resulted in poor recall. Local Outlier Factor was the fastest to predict. K-means had the highest precision, and Local Outlier Factor had the highest recall, accuracy, and F1 score. Conclusion: After comparing the metrics of different algorithms, we conclude that Local Outlier Factor performed better than the other algorithms with respect to most of the metrics measured.
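
The comparison metrics named in this abstract can be computed as in the following sketch. It assumes binary labels in which 1 marks an anomalous log entry and 0 a normal one; the function name and the toy label vectors are illustrative and are not data from the thesis.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, accuracy and F1 for binary labels (1 = anomaly)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against division by zero when a detector flags nothing.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Toy ground truth and detector output (made up for illustration).
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

On anomaly detection data, where anomalies are typically rare, accuracy alone can look high for a detector that flags nothing, which is one reason to report precision, recall, and F1 alongside it.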
37

Identifying phase transitions of disordered topological systems by unsupervised learning

Sun, Yuanjie 30 April 2023
Phase transitions are critical in understanding the properties of different phases of matter, and their identification is an essential research focus in condensed matter physics. However, defining phase transitions for topological systems is more complex than for common mesoscale materials. This complexity is further compounded when disorders are present in the system. In this thesis work, we provide a comprehensive review of machine learning, topological insulators, and the conventional approach to classifying different topological phases. We focus on the Benalcazar, Bernevig, and Hughes (BBH) model, a higher-order topological insulator model, and investigate the challenges of identifying phase transitions in topological systems, particularly in the presence of disorders. To overcome these challenges, we implement the diffusion maps method, which accurately predicts the same transition points as traditional numerical calculations for both clean and disordered systems. Moreover, we demonstrate the efficacy of the diffusion maps method in predicting the transition point for the topological Anderson insulator. Our findings suggest that this approach has the potential to be generalized and applied to a broader range of disordered systems. Overall, this thesis work provides a novel method for identifying phase transition points in topological systems, which could have significant implications for the design and development of future topological materials.
38

Unsupervised Representation Learning with Clustering in Deep Convolutional Networks

Caron, Mathilde January 2018
This master's thesis tackles the problem of unsupervised learning of visual representations with deep Convolutional Neural Networks (CNNs). Closing the gap between unsupervised and supervised representation learning is one of the main current challenges in image recognition. We propose a novel and simple way of training CNNs on fully unlabeled datasets. Our method jointly optimizes a grouping of the representations and trains a CNN using the groups as supervision. We evaluate the models trained with our method on standard transfer learning experiments from the literature. We find that our method outperforms all self-supervised and unsupervised state-of-the-art approaches. More importantly, our method outperforms those methods even when the unsupervised training set is not ImageNet but an arbitrary subset of images from Flickr.
39

MAP-GAN: Unsupervised Learning of Inverse Problems

Campanella, Brandon S 01 December 2021
In this paper we outline a novel method for training a generative adversarial network based denoising model from an exclusively corrupted and unpaired dataset of images. Our model can learn without clean data or corrupted image pairs, and instead only requires that the noise distribution can be expressed analytically and that the noise at each pixel is independent. We utilize maximum a posteriori estimation as the underlying solution framework, optimizing over the analytically expressed noise generating distribution as the likelihood and employing the GAN as the prior. We then evaluate our method on several popular datasets of varying size and levels of corruption. Further, we directly compare the numerical results of our experiments to those of the current state-of-the-art unsupervised denoising model. While our proposed approach does not achieve a new state of the art in our experiments, it provides an alternative method for unsupervised denoising and shows strong promise as an area for future research.
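
The maximum a posteriori framing in this abstract can be written out as a short worked equation. The notation is an assumption made for illustration: y is the observed corrupted image, x the clean image, and i indexes pixels; the factorization additionally assumes that the noise at pixel i depends only on the clean value x_i. How the GAN enters as the prior p(x) is specific to the thesis and is not reproduced here.

```latex
\hat{x}
  = \arg\max_{x} \; p(x \mid y)
  = \arg\max_{x} \; \bigl[ \log p(y \mid x) + \log p(x) \bigr],
\qquad
\log p(y \mid x) = \sum_{i} \log p(y_i \mid x_i).
```

The first equality chain follows from Bayes' rule after dropping the term log p(y), which does not depend on x; the sum over pixels is what the per-pixel independence requirement makes possible.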
40

Iterated learning framework for unsupervised part-of-speech induction

Christodoulopoulos, Christos January 2013
Computational approaches to linguistic analysis have been used for more than half a century. The main tools come from the field of Natural Language Processing (NLP) and are based on rule-based or corpora-based (supervised) methods. Despite the undeniable success of supervised learning methods in NLP, they have two main drawbacks: on the practical side, it is expensive to produce the manual annotation (or the rules) required and it is not easy to find annotators for less common languages. A theoretical disadvantage is that the computational analysis produced is tied to a specific theory or annotation scheme. Unsupervised methods offer the possibility to expand our analyses into more resource-poor languages, and to move beyond the conventional linguistic theories. They are a way of observing patterns and regularities emerging directly from the data and can provide new linguistic insights. In this thesis I explore unsupervised methods for inducing parts of speech across languages. I discuss the challenges in evaluation of unsupervised learning and at the same time, by looking at the historical evolution of part-of-speech systems, I make the case that the compartmentalised, traditional pipeline approach of NLP is not ideal for the task. I present a generative Bayesian system that makes it easy to incorporate multiple diverse features, spanning different levels of linguistic structure, like morphology, lexical distribution, syntactic dependencies and word alignment information that allow for the examination of cross-linguistic patterns. I test the system using features provided by unsupervised systems in a pipeline mode (where the output of one system is the input to another) and show that the performance of the baseline (distributional) model increases significantly, reaching and in some cases surpassing the performance of state-of-the-art part-of-speech induction systems.
I then turn to the unsupervised systems that provided these sources of information (morphology, dependencies, word alignment) and examine the way that part-of-speech information influences their inference. Having established a bi-directional relationship between each system and my part-of-speech inducer, I describe an iterated learning method, where each component system is trained using the output of the other system in each iteration. The iterated learning method improves the performance of both component systems in each task. Finally, using this iterated learning framework, and by using parts of speech as the central component, I produce chains of linguistic structure induction that combine all the component systems to offer a more holistic view of NLP. To show the potential of this multi-level system, I demonstrate its use ‘in the wild’. I describe the creation of a vastly multilingual parallel corpus based on 100 translations of the Bible in a diverse set of languages. Using the multi-level induction system, I induce cross-lingual clusters, and provide some qualitative results of my approach. I show that it is possible to discover similarities between languages that correspond to ‘hidden’ morphological, syntactic or semantic elements.
