Global ETD Search

81	Understanding Traffic Cruising Causation : Via Parking Data Enhancement Jasarevic, Mirza January 2021 (has links) Background. Some computer scientists have recently argued that the computer science community may gain more performance by focusing on data preparation rather than exclusively on comparing modeling techniques. To test how useful this shift in focus is, this paper applies a particular data extraction technique and examines its effect on model performance. Objectives. Five recent (2016-2020) studies on modeling parking congestion used a rationalized rather than a measured approach to feature extraction, and their main focus was selecting the modeling technique with the best performance. Instead, this study takes a feature common to all of them and attempts to improve it, then compares the result with the performance of the feature in the state it had in those studies. Rather than trying several modeling techniques, weights are applied to the selected features and varied. The study focuses specifically on time series parking data, where this opportunity arose. The reusability of the data is also gauged. Methods. An experimental case study is designed in three parts. The first tests the importance of weighted sum configurations relative to drivers' expectations. The second analyzes how much data can be recycled from the real data, and whether spatial or temporal comparisons are better for synthesizing parking data. The third compares the performance of the best configuration against the default configuration using the k-means clustering algorithm and dynamic time warping distance. Results. The experimental results show performance improvements at all levels, with the improvement increasing as sample sizes grow: up to 9% average improvement per category and 6.2% for the entire city. 
The popularity of a parking lot turned out to be as important as its occupancy rate (50% importance each), while volatility was detrimental. A few months of data were recyclable, and a few small parking lots could replace each other's datasets. Temporal aspects turned out to be better than spatial aspects for parking data simulations. Conclusions. The results support the data scientists' belief that improving data quality and quantity is more important than creating more new types of models. The score can be used as a better metric of parking congestion for both drivers and managers. It can be employed in the public sphere provided that higher-quality, richer data are available. Machine Learning Unsupervised Learning Time Series Parking Dynamic Time Warping Computer Sciences Datavetenskap (datalogi)
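The comparison step pairs k-means with dynamic time warping (DTW), and the reported score weights popularity and occupancy at 50% each. The sketch below is not the thesis's code: it shows the textbook DTW recurrence and a weighted-sum score, assuming both features are already scaled to [0, 1]. The names `dtw_distance` and `congestion_score` are hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Uses the absolute difference as the local cost and the usual
    match / insertion / deletion steps, with no warping-window constraint.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def congestion_score(occupancy: float, popularity: float,
                     w_occupancy: float = 0.5, w_popularity: float = 0.5) -> float:
    """Weighted sum of two features assumed to be pre-scaled to [0, 1].

    The 50/50 default mirrors the importance split reported in the abstract;
    the feature definitions and scaling are hypothetical.
    """
    return w_occupancy * occupancy + w_popularity * popularity
```

DTW lets two occupancy curves that peak at slightly different hours still count as close, which is why it is a common choice over Euclidean distance when clustering time series.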
82	Latent analysis of unsupervised latent variable models in fault diagnostics of rotating machinery under stationary and time-varying operating conditions Balshaw, Ryan January 2020 (has links) Vibration-based condition monitoring is crucial for asset longevity and for avoiding unexpected financial losses. Current data-driven methodologies often require significant investment in data acquisition and large amounts of operational data for both healthy and unhealthy cases. Acquiring unhealthy (fault) data is often financially infeasible, so most methods described in the literature are not suitable for critical industrial applications. In this work, unsupervised latent variable models remove the need for asset fault data. These models learn a representation of healthy data and use health indicators to track deviation from this representation. Several latent variable models are compared, namely Principal Component Analysis, Variational Auto-Encoders and Generative Adversarial Network-based methods. This research investigated the relationship between time-series data and latent variable model design under a sensible interpretation of the data, as well as the influence of model complexity on performance across datasets, and it shows that the latent manifold, when untangled and traversed in a sensible manner, is indicative of damage. Three latent health indicators are proposed and used in conjunction with a proposed temporal preservation approach, and their performance is compared across the models. These latent health indicators were found to augment standard health indicators and benefit model performance. This makes it possible to compare the performance of different latent variable models, which previous work had not done because the interpretation of the latent manifold and its response to anomalous instances had not been explored. 
If all aspects of a latent variable model are systematically investigated and compared, different models can be analysed on a consistent platform. In the model analysis step, a latent variable model is used to evaluate the available data so that the health indicators used to infer the health state of an asset are available for analysis and comparison. The datasets investigated in this work cover stationary and time-varying operating conditions. The objective was to determine whether deep learning is on par with state-of-the-art signal processing techniques. The results showed that damage is detectable in both the input space and the latent space and can be trended to identify clear points of condition deviation, so both spaces are indicative of damage when analysed in a sensible manner. A key takeaway is that for data containing impulsive components that arise naturally rather than from a fault, anomaly detection may be limited by the Gaussianity assumptions inherent in the model formulations. This work illustrates how the latent manifold is useful for detecting anomalous instances, why a variety of latent variable model types must be considered, and how subtle changes to data processing can substantially benefit model performance analysis. For vibration-based condition monitoring, latent variable models offer significant improvements in fault diagnostics and reduce the need for expert knowledge. This can ultimately improve asset longevity and reduce the investment businesses must make in asset maintenance. / Dissertation (MEng (Mechanical Engineering))--University of Pretoria, 2020. / Eskom Power Plant Engineering Institute (EPPEI) / UP Postgraduate Bursary / Mechanical and Aeronautical Engineering / MEng (Mechanical Engineering) / Unrestricted Latent Variable Models Unsupervised Learning Latent Analysis Temporal Preservation Time-Varying Operating Conditions UCTD
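Of the models compared, PCA is the simplest to show. A minimal sketch, assuming feature vectors (for example, spectral features of vibration segments) have already been extracted: fit PCA on healthy data only and use the squared reconstruction error as a health indicator. This is a generic input-space indicator, not one of the three latent health indicators proposed in the dissertation, and `PCAHealthIndicator` is a hypothetical name.

```python
import numpy as np


class PCAHealthIndicator:
    """Fit PCA on healthy feature vectors only; score new data by reconstruction error."""

    def __init__(self, n_components: int):
        self.n_components = n_components

    def fit(self, healthy: np.ndarray) -> "PCAHealthIndicator":
        self.mean_ = healthy.mean(axis=0)
        centered = healthy - self.mean_
        # Rows of vt are principal directions, ordered by decreasing singular value.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def score(self, x: np.ndarray) -> np.ndarray:
        centered = x - self.mean_
        latent = centered @ self.components_.T       # project onto the healthy subspace
        reconstruction = latent @ self.components_   # map back to the input space
        # Large residuals mean the sample is poorly explained by the healthy model.
        return np.sum((centered - reconstruction) ** 2, axis=1)
```

In practice a threshold, such as a high percentile of the scores on held-out healthy data, would flag deviation, and trending the score over time is what exposes condition deviance points.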
83	Vliv selekce příznaků metodou HFS na shlukovou analýzu / Effect of HFS Based Feature Selection on Cluster Analysis Malásek, Jan January 2015 (has links) This master's thesis focuses on cluster analysis. Clustering has roots in many areas, including data mining, statistics, biology and machine learning. The aim of the thesis is to review cluster analysis methods and methods for determining the number of clusters, and to give a short survey of feature selection methods for unsupervised learning. An important part of the thesis is a software implementation for comparing different cluster analysis methods, focused on finding the optimal number of clusters and assigning data points to the correct classes. The program also includes an implementation of the HFS feature selection method. The methods were validated experimentally in the Matlab environment. The final part of the thesis compares the success of the clustering methods on data with known output classes and assesses how much the HFS feature selection method for unsupervised learning contributes to the quality of cluster analysis.
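The thesis compares methods for estimating the number of clusters. One common criterion is the mean silhouette coefficient. The sketch below, written in Python with NumPy rather than the Matlab used in the thesis, runs k-means for several values of k and keeps the k with the highest silhouette. It does not implement HFS feature selection, and the farthest-point initialization is an assumption made to keep the example deterministic.

```python
import numpy as np


def farthest_point_init(x, k, seed=0):
    """Start from a random point, then repeatedly add the point farthest from all chosen centers."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        dists = np.linalg.norm(x[:, None, :] - np.array(centers)[None, :, :], axis=2)
        centers.append(x[dists.min(axis=1).argmax()])
    return np.array(centers, dtype=float)


def kmeans(x, k, n_iter=100, seed=0):
    centers = farthest_point_init(x, k, seed)
    for _ in range(n_iter):
        labels = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        # Move each center to the mean of its points; keep empty clusters where they are.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    return labels, centers


def silhouette(x, labels):
    """Mean silhouette coefficient; assumes at least two distinct labels."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        n_same = same.sum() - 1          # exclude the point itself
        if n_same == 0:
            scores.append(0.0)           # usual convention for singleton clusters
            continue
        a = d[i, same].sum() / n_same    # d[i, i] = 0, so the sum already excludes i
        b = min(d[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return float(np.mean(scores))


def choose_k(x, k_values):
    """Return the k whose k-means clustering has the highest mean silhouette."""
    return max(k_values, key=lambda k: silhouette(x, kmeans(x, k)[0]))
```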
84	Unsupervised Attributed Graph Learning: Models and Applications January 2019 (has links) abstract: Graphs are a ubiquitous data structure that appears in a broad range of real-world scenarios. Accordingly, there has been a surge of research on representing and learning from graphs to accomplish various machine learning and graph analysis tasks. However, most of these efforts use only the graph structure, whereas nodes in real-world graphs usually come with a rich set of attributes. Typical examples of such nodes and their attributes are users and their profiles in social networks, scientific articles and their content in citation networks, protein molecules and their gene sets in biological networks, and web pages and their content on the Web. Using node features in such graphs (attributed graphs) can alleviate the graph sparsity problem and help explain various phenomena (e.g., the motives behind the formation of communities in social networks). Further study of attributed graphs is therefore required to take full advantage of node attributes. In the wild, attributed graphs are usually unlabeled. Moreover, annotating data is an expensive and time-consuming process that suffers from limitations such as annotator subjectivity and problems with reproducibility and consistency. The challenges of data annotation and the growing number of unlabeled attributed graphs in real-world applications create a strong need for unsupervised learning on attributed graphs. In this dissertation, I propose a set of novel models that learn from attributed graphs in an unsupervised manner. To better understand and represent nodes and communities in attributed graphs, I present models at the node level and at the community level. At the node level, I use node features together with the graph structure to learn distributed representations of nodes, which can be useful in a variety of downstream machine learning applications. 
At the community level, with a focus on social media, I take advantage of both node attributes and the graph structure to discover not only communities but also their sentiment-driven profiles and inter-community relations (i.e., alliance, antagonism, or no relation). The discovered community profiles and relations help to better understand the structure and dynamics of social media. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2019 Computer science Attributed Graphs Attributed Networks Graph Learning Unsupervised Attributed Graph Learning Unsupervised Learning
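The node-level idea of combining structure with attributes can be illustrated with a simple, widely used baseline: propagate node attributes over a symmetrically normalized adjacency matrix with self-loops, then compress the result with a truncated SVD. This is not one of the dissertation's models; `propagate_attributes`, `node_embeddings` and the `hops` parameter are hypothetical names for this sketch.

```python
import numpy as np


def propagate_attributes(adj: np.ndarray, features: np.ndarray, hops: int = 2) -> np.ndarray:
    """Smooth node attributes over the graph: S^hops @ X with S = D^-1/2 (A + I) D^-1/2."""
    a = adj + np.eye(adj.shape[0])            # self-loops so each node keeps its own attributes
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    s = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    h = features.astype(float)
    for _ in range(hops):
        h = s @ h                              # mix each node's attributes with its neighbours'
    return h


def node_embeddings(adj: np.ndarray, features: np.ndarray, dim: int, hops: int = 2) -> np.ndarray:
    """Low-dimensional, unsupervised node representations from propagated attributes."""
    h = propagate_attributes(adj, features, hops)
    h = h - h.mean(axis=0)
    u, sigma, _ = np.linalg.svd(h, full_matrices=False)
    return u[:, :dim] * sigma[:dim]
```

Nodes that are both connected and similar in attributes end up close in the embedding space, which is the intuition behind using attributes to counter graph sparsity.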
85	Statistical and Computational Models for Whole Word Morphology Janicki, Maciej 09 September 2019 (has links) The aim of this thesis is to formulate an approach to the machine learning of language morphology in which morphology is modeled as string transformations on whole words, rather than as the decomposition of words into smaller structural units. The contribution consists of two main parts. First, a computational model is formulated in which morphological rules are defined as functions on strings. Such functions can easily be translated into finite-state transducers, which provides a solid algorithmic foundation for the approach. Second, a statistical model for graphs of word derivations is introduced. Inference in this model is performed with the Monte Carlo Expectation Maximization algorithm, and expectations over graphs are approximated by a Metropolis-Hastings sampler. The model is evaluated on a range of practical tasks: clustering of inflected forms, learning lemmatization, predicting the part of speech of unknown words, and generating new words. info:eu-repo/classification/ddc/000 ddc:000
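To make the idea of rules as functions on whole-word strings concrete, the sketch below uses a deliberately simplified rule shape, /p1 X s1/ -> /p2 X s2/, which swaps a prefix and a suffix around an arbitrary stem X, and it extracts such a rule from a word pair by taking their longest common substring as the stem. The thesis's rules are richer and are compiled to finite-state transducers; `WholeWordRule` and `extract_rule` are hypothetical names used only here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WholeWordRule:
    """Simplified whole-word rule: /prefix_from X suffix_from/ -> /prefix_to X suffix_to/."""
    prefix_from: str
    suffix_from: str
    prefix_to: str
    suffix_to: str

    def apply(self, word: str) -> Optional[str]:
        """Return the transformed word, or None if the word is outside the rule's domain."""
        if not (word.startswith(self.prefix_from) and word.endswith(self.suffix_from)):
            return None
        stem_end = len(word) - len(self.suffix_from)
        if stem_end < len(self.prefix_from):
            return None  # prefix and suffix would overlap
        stem = word[len(self.prefix_from):stem_end]
        return self.prefix_to + stem + self.suffix_to


def extract_rule(source: str, target: str) -> WholeWordRule:
    """Derive a rule from a word pair, keeping their longest common substring as the stem."""
    best_len, best_i, best_j = 0, 0, 0
    for i in range(len(source)):
        for j in range(len(target)):
            k = 0
            while i + k < len(source) and j + k < len(target) and source[i + k] == target[j + k]:
                k += 1
            if k > best_len:
                best_len, best_i, best_j = k, i, j
    return WholeWordRule(
        prefix_from=source[:best_i],
        suffix_from=source[best_i + best_len:],
        prefix_to=target[:best_j],
        suffix_to=target[best_j + best_len:],
    )
```

For example, the German pair machen/gemacht yields a rule that also maps sagen to gesagt, while a word that does not end in -en falls outside the rule's domain.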
86	Summarization and keyword extraction on customer feedback data : Comparing different unsupervised methods for extracting trends and insight from text Skoghäll, Therése, Öhman, David January 2022 (has links) Over the last couple of months, the amount of customer feedback Polestar receives has more than doubled, and it is forecast to keep increasing. Reading this feedback manually is expensive and time-consuming, so there is a need to analyse it automatically. The company wants to understand its customers and extract the trends and topics that concern them in order to improve the customer experience. Over the last couple of years, as Natural Language Processing has developed immensely, new state-of-the-art language models have pushed the boundaries in all types of benchmark tasks. In this thesis, three extractive summarization models and three keyword extraction methods were tested and evaluated for extracting information from text, using two quantitative measures and human evaluation. The thesis shows that extractive summarization models with a Transformer-based text representation are best at capturing the context of a text. Based on the quantitative results and the company's needs, TextRank with a Transformer-based embedding was chosen as the final extractive summarization model. For keyword extraction, YAKE! was the best overall model based on the quantitative measures and human validation. Unsupervised learning Natural Language Processing Text Summarization Keyword Extraction K-means YAKE! BERT Mathematics Matematik
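TextRank scores sentences by running PageRank over a sentence-similarity graph. The thesis pairs it with Transformer-based embeddings; to stay self-contained, the sketch below assumes plain bag-of-words vectors with cosine similarity instead, so its rankings will be cruder. The function names are hypothetical, and the damping value of 0.85 is the conventional PageRank default rather than a setting taken from the thesis.

```python
import re
import numpy as np


def _bag_of_words(sentences):
    """Term-count matrix (sentences x vocabulary) using a simple lowercase word regex."""
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    index = {w: i for i, w in enumerate(vocab)}
    m = np.zeros((len(sentences), len(vocab)))
    for row, tokens in enumerate(tokenized):
        for w in tokens:
            m[row, index[w]] += 1
    return m


def textrank(sentences, top_n=1, damping=0.85, n_iter=100):
    vectors = _bag_of_words(sentences)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = vectors / norms
    sim = unit @ unit.T                  # cosine similarity between sentences
    np.fill_diagonal(sim, 0.0)           # no self-loops
    row_sums = sim.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0        # isolated sentences keep an all-zero row
    transition = sim / row_sums          # row-normalized edge weights
    n = len(sentences)
    scores = np.full(n, 1.0 / n)
    for _ in range(n_iter):              # power iteration for weighted PageRank
        scores = (1 - damping) / n + damping * (transition.T @ scores)
    ranked = np.argsort(-scores)[:top_n]
    return [sentences[i] for i in sorted(ranked)]  # keep the original sentence order
```

Sentences that share vocabulary with many others collect the most score, which is why TextRank tends to surface the recurring complaint in a batch of feedback.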
87	Graph-based Multi-view Clustering for Continuous Pattern Mining Åleskog, Christoffer January 2021 (has links) Background. In many smart monitoring applications, such as smart healthcare, smart buildings and autonomous cars, data are collected from multiple sources and contain information about different perspectives (views) of the monitored phenomenon, physical object or system. In addition, in many of these applications relevant labelled data are scarce or even nonexistent. Inspired by this, in this thesis we propose a novel algorithm for multi-view stream clustering. The algorithm can be applied to continuous pattern mining and labeling of streaming data. Objectives. The main objective of this thesis is to develop and implement a novel multi-view stream clustering algorithm. In addition, the potential of the proposed algorithm is studied and evaluated on two datasets, one synthetic and one real-world. The experiments compare the new algorithm's performance with a single-view clustering algorithm and with an algorithm that does not transfer knowledge between chunks. Finally, the results are analyzed, discussed and interpreted. Methods. First, we study state-of-the-art multi-view (stream) clustering algorithms. We then develop our multi-view clustering algorithm for streaming data by implementing a knowledge-transfer feature. We present and explain the developed algorithm in detail, motivating each choice made during the design phase. Finally, the algorithm configuration, the experimental setup and the datasets chosen for the experiments are presented and motivated. Results. Different configurations of the proposed algorithm have been studied and evaluated under different experimental scenarios on the synthetic and real-world datasets. The proposed multi-view clustering algorithm performed better on the synthetic data than on the real-world dataset. 
This is mainly due to the relatively poor quality of the real-world data used. Conclusions. The proposed algorithm performed better on the synthetic dataset than on the real-world dataset. It can generate high-quality clustering solutions with respect to the evaluation metrics used. In addition, the knowledge-transfer feature has been shown to have a positive effect on the algorithm's performance. A further study of the proposed algorithm on richer and more suitable datasets, e.g., data collected from numerous sensors monitoring a phenomenon, is planned as future work. Machine Learning Unsupervised Learning Multi-view Clustering Data Stream Mining Pattern Mining Computer Sciences Datavetenskap (datalogi)
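The thesis's algorithm is graph-based, and its details are not given in this abstract. As a much simpler stand-in, the sketch below shows the two ideas the abstract names. Multiple views: each chunk is a list of per-view feature arrays, and distances are summed across views with per-view weights. Knowledge transfer: each chunk's k-means starts from the previous chunk's centers, so cluster labels keep their meaning over the stream. All names and the weighting scheme are hypothetical.

```python
import numpy as np


def multiview_distances(views, centers, weights):
    """Weighted sum of squared Euclidean distances computed separately in each view."""
    total = 0.0
    for v, c, w in zip(views, centers, weights):
        total = total + w * ((v[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    return total


def cluster_chunk(views, k, weights, init_centers=None, n_iter=50, seed=0):
    """Multi-view k-means on one chunk; init_centers carries knowledge from the previous chunk."""
    n = views[0].shape[0]
    if init_centers is None:
        rng = np.random.default_rng(seed)
        idx = rng.choice(n, size=k, replace=False)
        init_centers = [v[idx].astype(float) for v in views]
    centers = [c.copy() for c in init_centers]
    for _ in range(n_iter):
        labels = multiview_distances(views, centers, weights).argmin(axis=1)
        for vi, v in enumerate(views):
            for j in range(k):
                if np.any(labels == j):
                    centers[vi][j] = v[labels == j].mean(axis=0)
    labels = multiview_distances(views, centers, weights).argmin(axis=1)
    return labels, centers


def cluster_stream(chunks, k, weights):
    """Cluster chunks in arrival order, warm-starting each chunk from the previous centers."""
    centers = None
    all_labels = []
    for views in chunks:
        labels, centers = cluster_chunk(views, k, weights, init_centers=centers)
        all_labels.append(labels)
    return all_labels, centers
```

Without the warm start, cluster 0 in one chunk could correspond to cluster 1 in the next, which would make continuous pattern labeling across the stream inconsistent.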
88	Apprentissage de structures dans les valeurs extrêmes en grande dimension / Discovering patterns in high-dimensional extremes Chiapino, Maël 28 June 2018 (has links) We present and study unsupervised learning methods for multivariate extreme phenomena in high dimensions. When every marginal of a random vector is heavy-tailed, its behavior in extreme regions (i.e., far from the origin) can no longer be studied with the usual methods, which assume finite means and variances. Multivariate extreme value theory provides a suitable framework for this study; in particular, it gives a theoretical basis for dimension reduction through the angular measure. The thesis is divided into two main parts: - Reduce the dimension of the problem by finding a simplified summary of the dependence structure in extreme regions. This step aims in particular to recover subgroups of features that are likely to exceed large thresholds simultaneously.
- Model the angular measure with a mixture distribution that follows a predefined dependence structure. These steps make it possible to develop new clustering methods for extreme points in high dimensions by constructing a similarity matrix for the extreme points. Extreme value theory Unsupervised learning Dimension reduction Clustering
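The first step, finding subgroups of features that are large together, can be illustrated with a simple counting heuristic. The sketch rank-transforms each marginal to an approximately standard Pareto scale, keeps the k points with the largest sup-norm, and records which coordinates of each such point exceed a fraction `eps` of its largest coordinate. It is a hedged illustration rather than the thesis's estimator; `k`, `eps` and the function names are assumptions.

```python
from collections import Counter

import numpy as np


def to_pareto_margins(x: np.ndarray) -> np.ndarray:
    """Rank-transform each column to an approximately standard Pareto scale: 1 / (1 - F_hat)."""
    n = x.shape[0]
    ranks = x.argsort(axis=0).argsort(axis=0) + 1   # ranks 1..n within each column
    return 1.0 / (1.0 - ranks / (n + 1.0))


def extreme_subgroups(x: np.ndarray, k: int, eps: float = 0.1) -> Counter:
    """Count which feature subsets are simultaneously large among the k most extreme points."""
    v = to_pareto_margins(x)
    norms = v.max(axis=1)                            # sup-norm of each transformed point
    extreme_idx = np.argsort(-norms)[:k]
    counts = Counter()
    for i in extreme_idx:
        # Features that are large relative to this point's own largest coordinate.
        active = tuple(int(j) for j in np.flatnonzero(v[i] > eps * norms[i]))
        counts[active] += 1
    return counts
```

Subsets that recur often among the extreme points are candidates for the groups of features whose dependence the angular measure should capture; rare subsets can be dropped, which is where the dimension reduction comes from.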
89	Application of Autoencoder Ensembles in Anomaly and Intrusion Detection using Time-Based Analysis Mathur, Nitin O. January 2020 (has links) No description available. Information Technology Intrusion Detection Autoencoder Unsupervised learning Anomaly Detection CICIDS2017 Neural Networks
90	Unsupervised Learning for Structure from Motion Örjehag, Erik January 2021 (has links) Perception of depth, ego-motion and robust keypoints is critical for SLAM and structure from motion applications. Neural networks have achieved great performance in perception tasks in recent years, but collecting labeled data for supervised training is labor-intensive and costly. This thesis explores recent methods for unsupervised training of neural networks that can predict depth, ego-motion and keypoints and perform geometric consensus maximization. The benefit of unsupervised training is that the networks can learn from raw data collected from the camera sensor instead of from labeled data. The thesis focuses on training on images from a monocular camera, with no stereo or LIDAR data available. The experiments compare different techniques for depth and ego-motion prediction from previous research and show how the techniques can be combined successfully. A keypoint prediction network is evaluated and its performance is compared with the ORB detector provided by OpenCV. A geometric consensus network is also implemented and its performance is compared with the RANSAC algorithm in OpenCV. The consensus maximization network is trained on the output of the keypoint prediction network. For future work, it is suggested that all networks could be combined and trained jointly to reach better overall performance. The results show (1) which techniques in unsupervised depth prediction are most effective, (2) that the keypoint prediction network outperformed the ORB detector, and (3) that the consensus maximization network was able to classify outliers with performance comparable to the RANSAC algorithm of OpenCV. sfm structure from motion depth ego-motion unsupervised learning consensus maximization Computer Sciences Datavetenskap (datalogi)
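RANSAC, the baseline for the consensus maximization network, repeatedly fits a model to a random minimal sample and keeps the hypothesis with the most inliers. In the thesis the model is geometric and is fitted to keypoint matches. To keep the sketch short, it fits a 2D line instead, and `ransac_line` and its parameters are hypothetical rather than OpenCV's API.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def fit_line(p: Point, q: Point) -> Tuple[float, float, float]:
    """Line a*x + b*y + c = 0 through p and q, with (a, b) normalized to unit length."""
    a, b = q[1] - p[1], p[0] - q[0]
    norm = (a * a + b * b) ** 0.5
    a, b = a / norm, b / norm
    c = -(a * p[0] + b * p[1])
    return a, b, c


def ransac_line(points: List[Point], threshold: float, n_iter: int = 200, seed: int = 0) -> List[int]:
    """Return indices of the largest consensus set found for a line model."""
    rng = random.Random(seed)
    best_inliers: List[int] = []
    for _ in range(n_iter):
        p, q = rng.sample(points, 2)        # minimal sample: two points define a line
        if p == q:
            continue                        # degenerate sample (duplicate point)
        a, b, c = fit_line(p, q)
        # With unit (a, b), |a*x + b*y + c| is the point-to-line distance.
        inliers = [i for i, (x, y) in enumerate(points) if abs(a * x + b * y + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```

A learned consensus network aims to replace this sample-and-count loop with a single forward pass that classifies each correspondence as inlier or outlier.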
