21

Integrating Feature and Graph Learning with Factorization Models for Low-Rank Data Representation

Peng, Chong 01 December 2017 (has links)
Representing and handling high-dimensional data has become increasingly ubiquitous in many real-world applications, such as computer vision, machine learning, and data mining. High-dimensional data usually have intrinsic low-dimensional structures, which are suitable for subsequent data processing. As a consequence, finding low-dimensional data representations has become a common demand in many machine learning and data mining problems. Factorization methods have been impressive in recovering the intrinsic low-dimensional structures of data. When seeking a low-dimensional representation of the data, traditional methods mainly face two challenges: 1) how to discover the most variational features/information in the data; 2) how to measure accurate nonlinear relationships among the data. As a solution to these challenges, traditional methods usually adopt a two-step approach, performing feature selection and manifold construction followed by further data processing, which ignores the dependence between these learning tasks and produces inaccurate data representations. To resolve these problems, we propose to integrate feature learning and graph learning with a factorization model, which allows the goals of learning features, constructing the manifold, and seeking a new data representation to mutually enhance one another and leads to powerful data representation capability. Moreover, it has become increasingly common that 2-dimensional (2D) data have high feature dimensions, where each example is a matrix whose elements are features. For such data, traditional methods usually convert them to 1-dimensional vectorial data before processing, which severely damages the inherent structures of the data. We propose to use 2D data directly for seeking a new representation, which enables the model to preserve the inherent 2D structures of the data. We propose to seek projection directions to find subspaces in which spatial information is maximally preserved. The manifold and the new data representation are also learned in these subspaces, so that the manifold is clean and the new representation is discriminative. Consequently, seeking projections, learning the manifold, and constructing the new representation mutually enhance one another and lead to a powerful data representation technique.
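For context, a generic graph-regularized factorization objective of the family discussed above can be written as follows. The notation is illustrative rather than the thesis's exact model: X is the d x n data matrix, U (d x k) and V (k x n) are the factors, L is a graph Laplacian built from pairwise affinities, lambda and gamma are trade-off weights, and the l2,1 norm on U promotes row sparsity, i.e. feature selection.

```latex
\min_{U,\,V}\;\; \lVert X - U V \rVert_F^2
  \;+\; \lambda\,\operatorname{tr}\!\left( V L V^{\top} \right)
  \;+\; \gamma\,\lVert U \rVert_{2,1}
```

In the integrated approach described above, the affinity graph behind L (and hence the manifold) is learned jointly with U and V rather than being fixed in a separate preprocessing step.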
22

Semantic Analysis Of Multi Meaning Words Using Machine Learning And Knowledge Representation

Alirezaie, Marjan January 2011 (has links)
The present thesis addresses machine learning in a domain of natural-language phrases that are names of universities. It describes two approaches to this problem and a software implementation that has made it possible to evaluate and compare them. In general terms, the system's task is to learn to 'understand' the significance of the various components of a university name, such as the city or region where the university is located, the scientific disciplines that are studied there, or the name of a famous person that may be part of the university name. A concrete test of whether the system has acquired this understanding is whether it is able to compose a plausible university name given some components that should occur in the name. In order to achieve this capability, our system learns the structure of the available university names in a given data set, i.e. it acquires a grammar for the micro-language of university names. One of the challenges is that the system may encounter ambiguities due to multi-meaning words. This problem is addressed using a small ontology that is created during the training phase. Both domain knowledge and grammatical knowledge are represented using decision trees, which are an efficient method for concept learning. Besides inductive inference, their role is to partition the data set into a hierarchical structure which is used for resolving ambiguities. The present report also defines some modifications in the definitions of parameters, for example a parameter for entropy, which enable the system to deal with cognitive uncertainties. Our method for automatic syntax acquisition, ADIOS, is an unsupervised learning method. This method is described and discussed here, including a report on the outcome of the tests using our data set. The software that has been implemented and used in this project is written in C.
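The thesis's own software is written in C; purely as a minimal sketch of the entropy-based decision-tree idea, the following scikit-learn snippet classifies made-up university-name components into illustrative roles. All data and feature choices here are assumptions, not taken from the thesis.

```python
# Minimal sketch (not the thesis's C implementation): an entropy-based
# decision tree that classifies hypothetical university-name components.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy, hand-made examples: each token is labelled with the role it could
# play in a university name (all data here is illustrative).
tokens = ["Uppsala", "Technology", "Humboldt", "Stockholm", "Medicine", "Linnaeus"]
roles = ["city", "discipline", "person", "city", "discipline", "person"]

# Character n-grams stand in for whatever features the real system derives.
model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 3)),
    DecisionTreeClassifier(criterion="entropy", random_state=0),
)
model.fit(tokens, roles)
print(model.predict(["Gothenburg", "Engineering"]))  # predicted roles
```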
23

A visual training based approach to surface inspection

Niskanen, M. (Matti) 18 June 2003 (has links)
Training a visual inspection device is not straightforward but suffers from the high variation in the material to be inspected. This variation causes major difficulties for a human, and this is directly reflected in classifier training. Many inspection devices utilize rule-based classifiers, the building and training of which rely mainly on human expertise. While designing such a classifier, a human tries to find the questions that would provide proper categorization. In training, an operator tunes the classifier parameters, aiming to achieve as good classification accuracy as possible. Such classifiers require a lot of time and expertise before they can be fully utilized. Supervised classifiers form another common category. These learn automatically from training material, but rely on labels that a human has set for them. However, these labels tend to be inconsistent and thus reduce the classification accuracy achieved. Furthermore, as class boundaries are learnt from training samples, they cannot in practice be adjusted later if needed. In this thesis, a visual training based method is presented. It avoids the problems related to traditional training methods by combining a classifier and a user interface. The method relies on unsupervised projection and provides an intuitive way to directly set and tune the class boundaries of high-dimensional data. As the method groups the data only by the similarities of its features, it is not affected by erroneous and inconsistent labelling of training samples. Furthermore, it does not require knowledge of the internal structure of the classifier or iterative parameter tuning, where a combination of parameter values leading to the desired class boundaries is sought. Instead, the class boundaries can be set directly by changing the classification parameters. The time needed to take such a classifier into use is small, and tuning the class boundaries can happen even on-line, if needed. The proposed method is tested with various experiments in this thesis. Different projection methods are evaluated from the point of view of visual training. The method is further evaluated using a self-organizing map (SOM) as the projection method and wood as the test material. Parameters such as accuracy, map size, and speed are measured and discussed, and overall the method is found to be an advantageous training and classification scheme.
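As a minimal sketch of the kind of unsupervised projection the method builds on, the following uses the MiniSom library (a library choice assumed here, not taken from the thesis) to map synthetic feature vectors onto a 2D grid on which class boundaries could then be drawn visually. Grid size, features, and parameters are illustrative.

```python
# Minimal sketch of an unsupervised SOM projection (not the thesis's own
# implementation): feature vectors are mapped to a 2D grid on which an
# operator could then draw class boundaries visually.
import numpy as np
from minisom import MiniSom  # assumed third-party library choice

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 16))  # synthetic surface-texture features

som = MiniSom(10, 10, input_len=16, sigma=1.5, learning_rate=0.5, random_seed=0)
som.random_weights_init(features)
som.train_random(features, num_iteration=2000)

# Each sample is projected to the grid cell of its best-matching unit;
# cells (or hand-drawn regions of cells) then act as classes.
grid_positions = np.array([som.winner(x) for x in features])
print(grid_positions[:5])
```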
24

Anomaly Detection with Advanced Nonlinear Dimensionality Reduction

Beach, David J. 07 May 2020 (has links)
Dimensionality reduction techniques such as t-SNE and UMAP are useful both for obtaining an overview of high-dimensional datasets and as part of a machine learning pipeline. These techniques create a non-parametric model of the manifold by fitting a density kernel about each data point using the distances to its k nearest neighbors. In dense regions this approach works well, but in sparse regions it tends to draw unrelated points into the nearest cluster. Our work focuses on a homotopy method which imposes graph-based regularization over the manifold parameters to update the embedding. As the homotopy parameter increases, so does the cost of modeling different scales between adjacent neighborhoods. This gradually imposes a more uniform scale over the manifold, resulting in a more faithful embedding which preserves structure in dense areas while pushing sparse anomalous points outward.
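For orientation, here is a baseline sketch of the kind of pipeline the abstract starts from, not the thesis's homotopy-regularized method: embed data with a k-NN based manifold learner, then flag points whose local density in the embedding is anomalously low. Data sizes and parameters are illustrative assumptions.

```python
# Baseline sketch only (not the thesis's homotopy-regularized method):
# embed data with a k-NN based manifold learner, then flag points whose
# local neighbourhood density in the embedding is anomalously low.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
dense = rng.normal(0.0, 1.0, size=(300, 10))   # dense "normal" region
sparse = rng.normal(6.0, 3.0, size=(10, 10))   # sparse anomalous points
X = np.vstack([dense, sparse])

embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# -1 marks points the detector considers anomalous in the 2D embedding.
labels = LocalOutlierFactor(n_neighbors=20).fit_predict(embedding)
print("flagged as anomalous:", np.where(labels == -1)[0])
```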
25

Introduction to fast Super-Paramagnetic Clustering

Yelibi, Lionel 25 February 2020 (has links)
We map stock market interactions to spin models to recover their hierarchical structure using a simulated annealing based Super-Paramagnetic Clustering (SPC) algorithm. This is directly compared to a modified implementation of a maximum likelihood approach to fast Super-Paramagnetic Clustering (f-SPC). The methods are first applied to standard toy test-case problems, and then to a dataset of 447 stocks traded on the New York Stock Exchange (NYSE) over 1249 days. The signal-to-noise ratio of stock market correlation matrices is briefly considered. Our results approximately recover clusters representative of standard economic sectors, as well as mixed clusters whose dynamics shed light on the adaptive nature of financial markets and raise concerns about the effectiveness of industry-based static financial market classification in the world of real-time data analytics. A key result is that the standard maximum likelihood methods are confirmed to converge to solutions within a Super-Paramagnetic (SP) phase. We use insights arising from this to discuss the implications of using a Maximum Entropy Principle (MEP) as opposed to the Maximum Likelihood Principle (MLP) as an optimization device for this class of problems.
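As background rather than the SPC/f-SPC algorithms themselves, the following sketch recovers a hierarchy from a synthetic correlation matrix using the standard correlation-to-distance map d_ij = sqrt(2 * (1 - rho_ij)); the number of stocks, noise level, and cluster count are illustrative assumptions.

```python
# Not the SPC/f-SPC algorithms themselves: a common baseline for recovering
# hierarchical structure from stock correlations, shown on synthetic returns.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
n_stocks, n_days = 40, 1249
sector = rng.normal(size=(4, n_days))                    # 4 hidden "sectors"
returns = np.repeat(sector, 10, axis=0) + 0.8 * rng.normal(size=(n_stocks, n_days))

rho = np.corrcoef(returns)                               # correlation matrix
dist = np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))    # metric distance
np.fill_diagonal(dist, 0.0)

Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=4, criterion="maxclust")          # cut into 4 clusters
print(labels)
```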
26

A Different Approach to Attacking and Defending Deep Neural Networks

Fourati, Fares 06 1900 (has links)
Adversarial examples are among the most widespread attacks in adversarial machine learning. In this work, we define new targeted and non-targeted attacks that are computationally less expensive than standard adversarial attacks. Besides practical purposes in some scenarios, these attacks can improve our understanding of the robustness of machine learning models. Moreover, we introduce a new training scheme to improve the performance of pre-trained neural networks and defend against our attacks. We examine the differences between our method, standard training, and standard adversarial training on pre-trained models. We find that our method protects the networks better against our attacks. Furthermore, unlike usual adversarial training, which reduces standard accuracy when applied to previously trained networks, our method maintains and sometimes even improves standard accuracy.
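The thesis's own cheaper attacks are not reproduced here; as a reference point, the following is a sketch of the standard fast gradient sign method (FGSM) that such work is typically compared against. The throwaway model, random inputs, and epsilon value are illustrative assumptions.

```python
# Standard FGSM (Goodfellow et al.) shown only as a reference point; the
# thesis's own attacks are not reproduced here.
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    """Return adversarial examples x + epsilon * sign(grad_x loss)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Tiny usage example on a throwaway linear model and random "images".
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
x = torch.rand(4, 3, 8, 8)
y = torch.randint(0, 10, (4,))
x_adv = fgsm_attack(model, x, y)
print((x_adv - x).abs().max())   # perturbation bounded by epsilon
```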
27

Automating Log Analysis

Kommineni, Sri Sai Manoj, Dindi, Akhila January 2021 (has links)
Background: With the advent of the information age, a large number of services have emerged that run on clusters of computers. Maintaining such large, complex systems is a very difficult task. Developers rely on one tool that is common to almost all software systems: the console logs. To troubleshoot problems, developers refer to these logs. Identifying anomalies in the logs leads us to the cause of the problem, thereby enabling automated log analysis. This study focuses on anomaly detection in logs. Objectives: The main goal of the thesis is to identify different algorithms for anomaly detection in logs, implement the algorithms, and compare them in an experiment. Methods: A literature review was conducted to identify the most suitable algorithms for anomaly detection in logs. An experiment was then conducted to compare the algorithms identified in the literature review. The experiment was performed on a dataset of logs generated by Hadoop Distributed File System (HDFS) servers, which consisted of more than 11 million lines of logs. The algorithms compared are K-means, DBSCAN, Isolation Forest, and Local Outlier Factor, which are all unsupervised learning algorithms. Results: The performance of these algorithms was compared using the metrics precision, recall, accuracy, F1 score, and run time. Though DBSCAN was the fastest, it resulted in poor recall; Isolation Forest similarly resulted in poor recall. Local Outlier Factor was the fastest to predict. K-means had the highest precision, and Local Outlier Factor had the highest recall, accuracy, and F1 score. Conclusion: After comparing the metrics of the different algorithms, we conclude that Local Outlier Factor performed better than the other algorithms with respect to most of the metrics measured.
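As a minimal illustration of how the four detectors compared above can be run side by side (on synthetic feature vectors rather than the HDFS log dataset), a scikit-learn sketch follows; all sizes, thresholds, and parameters are illustrative assumptions.

```python
# Minimal sketch (synthetic features, not the HDFS dataset) of the four
# unsupervised detectors compared in the thesis.
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(1000, 8))   # "normal" log-event count vectors
anomalies = rng.normal(5, 1, size=(20, 8))  # injected anomalies
X = np.vstack([normal, anomalies])

# K-means: flag points far from their nearest centroid.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
dist = np.min(km.transform(X), axis=1)
km_flags = dist > np.quantile(dist, 0.98)

db_flags = DBSCAN(eps=1.5, min_samples=5).fit_predict(X) == -1   # noise points
if_flags = IsolationForest(random_state=0).fit_predict(X) == -1
lof_flags = LocalOutlierFactor(n_neighbors=20).fit_predict(X) == -1

for name, flags in [("K-means", km_flags), ("DBSCAN", db_flags),
                    ("IsolationForest", if_flags), ("LOF", lof_flags)]:
    print(name, "flagged", int(flags.sum()), "points")
```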
28

Unsupervised Representation Learning with Clustering in Deep Convolutional Networks

Caron, Mathilde January 2018 (has links)
This master thesis tackles the problem of unsupervised learning of visual representations with deep Convolutional Neural Networks (CNN). Closing the gap between unsupervised and supervised representation learning is one of the main current challenges in image recognition. We propose a novel and simple way of training CNNs on fully unlabeled datasets. Our method jointly optimizes a grouping of the representations and trains a CNN using the groups as supervision. We evaluate the models trained with our method on standard transfer learning experiments from the literature. We find that our method outperforms all self-supervised and unsupervised state-of-the-art approaches. More importantly, our method outperforms those methods even when the unsupervised training set is not ImageNet but an arbitrary subset of images from Flickr.
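A schematic sketch of the joint "cluster, then use the clusters as supervision" loop described above follows; the tiny network, random images, and hyperparameters are stand-ins, not the thesis's actual pipeline.

```python
# Schematic of the joint "cluster, then use clusters as supervision" loop;
# a tiny CNN and random images stand in for the real setup.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

n_clusters = 10
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
classifier = nn.Linear(16, n_clusters)
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(classifier.parameters()), lr=0.01)

images = torch.rand(256, 3, 32, 32)   # unlabeled images (random stand-ins)

for epoch in range(3):
    # 1) Cluster current features to obtain pseudo-labels.
    with torch.no_grad():
        feats = backbone(images).numpy()
    assignments = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=0).fit_predict(feats)
    pseudo_labels = torch.as_tensor(assignments, dtype=torch.long)
    # 2) Train the network to predict its own cluster assignments.
    logits = classifier(backbone(images))
    loss = nn.functional.cross_entropy(logits, pseudo_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```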
29

MAP-GAN: Unsupervised Learning of Inverse Problems

Campanella, Brandon S 01 December 2021 (has links) (PDF)
In this paper we outline a novel method for training a generative adversarial network (GAN) based denoising model from an exclusively corrupted and unpaired dataset of images. Our model can learn without clean data or corrupted image pairs, and instead only requires that the noise distribution can be expressed analytically and that the noise at each pixel is independent. We utilize maximum a posteriori (MAP) estimation as the underlying solution framework, optimizing over the analytically expressed noise-generating distribution as the likelihood, and employ the GAN as the prior. We then evaluate our method on several popular datasets of varying size and levels of corruption. Further, we directly compare the numerical results of our experiments to those of the current state-of-the-art unsupervised denoising model. While our experiments do not achieve a new state of the art, the proposed approach provides an alternative method for unsupervised denoising and shows strong promise as an area for future research with untapped potential.
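In generic MAP terms (notation illustrative, not copied from the thesis), the denoised image x-hat for an observation y combines the analytically known noise likelihood with the learned prior:

```latex
\hat{x} \;=\; \arg\max_{x}\; p(y \mid x)\, p_{\theta}(x)
        \;=\; \arg\min_{x}\; -\log p(y \mid x) \;-\; \log p_{\theta}(x)
```

Here p(y | x) is the analytically expressed, per-pixel independent noise model, and p_theta(x) is the image prior represented by the GAN.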
30

Building Energy Profile Clustering Based on Energy Consumption Patterns

Afzalan, Milad 06 1900 (has links)
With the widespread adoption of smart meters in buildings, an unprecedented amount of high-resolution energy data has become available, which provides opportunities to understand building consumption patterns. Accordingly, research efforts have employed data analytics and machine learning methods for the segmentation of consumers based on their load profiles, which helps utilities and energy providers with customized/personalized targeting for energy programs. However, building energy segmentation methodologies may present oversimplified representations of load shapes, which do not properly capture realistic energy consumption patterns in terms of temporal shape and magnitude. In this thesis, we introduce a clustering technique that is capable of preserving both the temporal patterns and the total consumption of load shapes from customers' energy data. The proposed approach first overpopulates clusters to preserve accuracy and then, in a second stage, merges similar clusters to reduce redundancy by integrating time-series similarity techniques. For this purpose, different time-series similarity measures based on Dynamic Time Warping (DTW) are employed. Furthermore, different unsupervised clustering methods such as k-means, hierarchical clustering, fuzzy c-means, and self-organizing maps were evaluated on building load shape portfolios, and their performance was quantitatively and qualitatively compared. The evaluation was carried out on real energy data from ~250 households. The comparative assessment (both qualitative and quantitative) demonstrated the applicability of the proposed approach compared to benchmark techniques for power time-series clustering of household load shapes. The contributions of this thesis are to: (1) present a comparative assessment of clustering techniques on household electricity load shapes, highlighting the inadequacy of conventional validation indices for choosing the cluster number, and (2) propose a two-stage clustering approach to improve the representation of the temporal patterns and magnitude of household load shapes. / M.S. / With the unprecedented amount of data collected by smart meters, we have opportunities to systematically analyze the energy consumption patterns of households. Specifically, using data analytics methods, one can cluster a large number of energy patterns (collected on a daily basis) into a number of representative groups, which can reveal actionable patterns for electric utilities for energy planning. However, commonly used clustering approaches may not properly reflect the variation in energy patterns or the energy volume of customers at a neighborhood scale. Therefore, in this thesis, we introduce a clustering approach that improves cluster representation by preserving the temporal shapes and energy volume of daily profiles (i.e., the energy data of a household collected during one day). In the first part of the study, we evaluate several well-known clustering techniques and validation indices from the literature and show that they do not necessarily work well for this domain-specific problem. As a result, in the second part, we introduce a two-stage clustering technique to extract the typical energy consumption patterns of households. Different visualizations and quantified metrics are presented to compare the methods and demonstrate their applicability. A case study on several datasets comprising more than 250 households was used for evaluation. The findings show that datasets with thousands of observations can be clustered into 10-50 groups through the introduced two-stage approach, while reasonably maintaining the energy patterns and energy volume of individual profiles.
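A minimal sketch of the two-stage idea (over-cluster, then merge by DTW similarity) is given below on synthetic daily profiles; the cluster counts, threshold, and merging rule are illustrative assumptions, not the thesis's tuned procedure.

```python
# Minimal sketch of the two-stage idea: over-cluster with k-means, then merge
# clusters whose centroids are close under Dynamic Time Warping (DTW).
import numpy as np
from sklearn.cluster import KMeans

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a)*len(b)) DTW distance between two 1D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(0)
profiles = rng.random((250, 24)) + np.sin(np.linspace(0, 2 * np.pi, 24))  # 24h load shapes

# Stage 1: deliberately over-populate the clusters.
km = KMeans(n_clusters=30, n_init=10, random_state=0).fit(profiles)
centroids = km.cluster_centers_

# Stage 2: merge clusters whose centroids are similar under DTW.
merge_threshold = 2.0                      # illustrative value
group = list(range(len(centroids)))
for i in range(len(centroids)):
    for j in range(i + 1, len(centroids)):
        if dtw(centroids[i], centroids[j]) < merge_threshold:
            group[j] = group[i]            # crude greedy relabelling of j into i's group

final_labels = np.array([group[c] for c in km.labels_])
print("clusters after merging:", len(set(final_labels)))
```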
