71
Unsupervised Attributed Graph Learning: Models and Applications. January 2019 (has links)
abstract: Graphs are ubiquitous data structures that appear in a broad range of real-world scenarios. Accordingly, there has been a surge of research on representing and learning from graphs in order to accomplish various machine learning and graph analysis tasks. However, most of these efforts only utilize the graph structure, while nodes in real-world graphs usually come with a rich set of attributes. Typical examples of such nodes and their attributes are users and their profiles in social networks, scientific articles and their content in citation networks, protein molecules and their gene sets in biological networks, as well as web pages and their content on the Web. Utilizing node features in such graphs---attributed graphs---can alleviate the graph sparsity problem and help explain various phenomena (e.g., the motives behind the formation of communities in social networks). Therefore, further study of attributed graphs is required to take full advantage of node attributes.
In the wild, attributed graphs are usually unlabeled. Moreover, annotating data is an expensive and time-consuming process that suffers from many limitations, such as annotators' subjectivity and issues of reproducibility and consistency. The challenges of data annotation and the growing volume of unlabeled attributed graphs in various real-world applications create a strong demand for unsupervised learning on attributed graphs.
In this dissertation, I propose a set of novel models to learn from attributed graphs in an unsupervised manner. To better understand and represent nodes and communities in attributed graphs, I present different models at the node and community levels. At the node level, I utilize node features as well as the graph structure of attributed graphs to learn distributed representations of nodes, which can be useful in a variety of downstream machine learning applications. At the community level, with a focus on social media, I take advantage of both node attributes and the graph structure to discover not only communities but also their sentiment-driven profiles and inter-community relations (i.e., alliance, antagonism, or no relation). The discovered community profiles and relations help to better understand the structure and dynamics of social media. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2019
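As a rough illustration of the node-level idea described above (combining the graph structure with node attributes to obtain node representations), the sketch below propagates node features over a normalized adjacency matrix and compresses the result with a truncated SVD. This is a generic stand-in written for this listing, not the models proposed in the dissertation; all function names and parameters are assumptions.

```python
import numpy as np

def embed_attributed_graph(adj, features, dim=16, hops=2):
    """Toy unsupervised node embedding for an attributed graph.

    Smooths node attributes over the symmetrically normalized adjacency
    matrix for a few hops, then reduces the result with a truncated SVD.
    Only a simple illustration of combining structure and attributes.
    """
    adj = np.asarray(adj, dtype=float)
    features = np.asarray(features, dtype=float)

    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Propagate attributes over the graph structure.
    smoothed = features
    for _ in range(hops):
        smoothed = a_norm @ smoothed

    # Low-dimensional embeddings via truncated SVD.
    u, s, _ = np.linalg.svd(smoothed, full_matrices=False)
    k = min(dim, u.shape[1])
    return u[:, :k] * s[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj = (rng.random((50, 50)) < 0.1).astype(float)
    adj = np.triu(adj, 1); adj = adj + adj.T          # undirected, no self-loops
    feats = rng.random((50, 8))
    print(embed_attributed_graph(adj, feats, dim=4).shape)  # (50, 4)
```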
72
Graph-based Multi-view Clustering for Continuous Pattern Mining. Åleskog, Christoffer, January 2021 (has links)
Background. In many smart monitoring applications, such as smart healthcare, smart buildings, and autonomous cars, data are collected from multiple sources and contain information about different perspectives/views of the monitored phenomenon, physical object, or system. In addition, in many of those applications the availability of relevant labelled data is often low or even non-existent. Inspired by this, in this thesis we propose a novel algorithm for multi-view stream clustering. The algorithm can be applied for continuous pattern mining and labeling of streaming data. Objectives. The main objective of this thesis is to develop and implement a novel multi-view stream clustering algorithm. In addition, the potential of the proposed algorithm is studied and evaluated on two datasets: one synthetic and one real-world. The conducted experiments study the new algorithm's performance compared to a single-view clustering algorithm and to a variant that does not transfer knowledge between chunks. Finally, the obtained results are analyzed, discussed and interpreted. Methods. Initially, we study the state-of-the-art multi-view (stream) clustering algorithms. Then we develop our multi-view clustering algorithm for streaming data by implementing a transfer-of-knowledge feature. We present and explain the developed algorithm in detail, motivating each choice made during the algorithm design phase. Finally, the algorithm configuration, experimental setup and the datasets chosen for the experiments are presented and motivated. Results. Different configurations of the proposed algorithm have been studied and evaluated under different experimental scenarios on the two datasets. The proposed multi-view clustering algorithm demonstrated higher performance on the synthetic data than on the real-world dataset, mainly due to the limited quality of the available real-world data. Conclusions. The proposed algorithm demonstrated higher performance on the synthetic dataset than on the real-world dataset. It can generate high-quality clustering solutions with respect to the used evaluation metrics. In addition, the transfer-of-knowledge feature has been shown to have a positive effect on the algorithm's performance. A further study of the proposed algorithm on other, richer and more suitable datasets, e.g., data collected from numerous sensors used for monitoring some phenomenon, is planned for future work.
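The thesis itself is not reproduced in this listing, but the chunk-wise processing with transfer of knowledge that the abstract describes can be sketched in simplified form: cluster each incoming chunk and seed the next chunk's clustering with the previous centroids. The sketch below uses scikit-learn's KMeans on concatenated views as a stand-in for the proposed multi-view algorithm; all names and parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def stream_cluster(chunks, n_clusters=3, random_state=0):
    """Cluster a stream of multi-view data chunks with knowledge transfer.

    Each chunk is a list of per-view arrays with the same number of rows.
    Views are concatenated feature-wise, and the centroids found on one
    chunk initialize the clustering of the next chunk (the simplified
    "transfer of knowledge" step in this sketch).
    """
    centers = None
    labels_per_chunk = []
    for chunk in chunks:
        x = np.hstack(chunk)  # naive multi-view fusion by concatenation
        if centers is None:
            km = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=random_state)
        else:
            # Warm start from the previous chunk's centroids.
            km = KMeans(n_clusters=n_clusters, init=centers, n_init=1,
                        random_state=random_state)
        labels_per_chunk.append(km.fit_predict(x))
        centers = km.cluster_centers_
    return labels_per_chunk

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    chunks = [[rng.normal(size=(100, 4)), rng.normal(size=(100, 2))]
              for _ in range(5)]
    labels = stream_cluster(chunks, n_clusters=3)
    print([len(set(l)) for l in labels])
```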
73
Apprentissage de structures dans les valeurs extrêmes en grande dimension / Discovering patterns in high-dimensional extremes. Chiapino, Maël, 28 June 2018 (has links)
We present and study unsupervised learning methods for multivariate extreme phenomena in high dimension. For a random vector whose marginal distributions are all heavy-tailed, the study of its behavior in extreme regions (i.e., far from the origin) can no longer rely on the usual methods, which assume a finite mean and variance. Multivariate extreme value theory provides a framework adapted to this study; in particular, it gives a theoretical basis for dimension reduction through the angular measure. The thesis is organized around two main steps: - Reduce the dimension of the problem by finding a summary of the dependence structure in extreme regions. This step aims in particular at recovering the subgroups of components that are likely to exceed a high threshold simultaneously. - Model the angular measure with a mixture density that follows a predefined dependence structure. Together, these steps lead to new unsupervised clustering methods for extreme points in high dimension, in particular through the construction of a similarity matrix for the extreme points.
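As a small empirical illustration of the first step (identifying subgroups of components that tend to exceed a high threshold simultaneously), the sketch below rank-transforms each marginal, flags exceedances above a high quantile, and counts which feature subsets co-occur among the extreme observations. It is not the estimator developed in the thesis; thresholds, names, and the toy data are assumptions.

```python
import numpy as np
from collections import Counter

def extreme_subgroups(x, quantile=0.95):
    """Count which subsets of features exceed a high threshold together.

    Marginals are rank-transformed so that a common quantile is meaningful
    for heavy-tailed data; each extreme observation is then summarized by
    the set of components that exceed the threshold simultaneously.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Empirical rank transform to uniform margins (ties ignored for brevity).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    u = ranks / (n + 1.0)

    exceed = u > quantile
    counts = Counter()
    for row in exceed:
        group = tuple(np.flatnonzero(row))
        if group:  # keep only observations that are extreme in at least one component
            counts[group] += 1
    return counts

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    z = rng.standard_t(df=2, size=(5000, 1))          # shared heavy-tailed factor
    noise = rng.standard_t(df=2, size=(5000, 3))
    data = np.hstack([z + 0.1 * noise[:, :2], noise[:, 2:]])  # features 0 and 1 dependent in extremes
    for group, c in extreme_subgroups(data).most_common(5):
        print(group, c)
```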
74
Application of Autoencoder Ensembles in Anomaly and Intrusion Detection using Time-Based Analysis. Mathur, Nitin O., January 2020 (has links)
No description available.
75
Unsupervised Learning for Structure from Motion. Örjehag, Erik, January 2021 (has links)
Perception of depth, ego-motion and robust keypoints is critical for SLAM and structure from motion applications. Neural networks have achieved great performance in perception tasks in recent years. But collecting labeled data for supervised training is labor intensive and costly. This thesis explores recent methods in unsupervised training of neural networks that can predict depth, ego-motion, keypoints and do geometric consensus maximization. The benefit of unsupervised training is that the networks can learn from raw data collected from the camera sensor, instead of labeled data. The thesis focuses on training on images from a monocular camera, where no stereo or LIDAR data is available. The experiments compare different techniques for depth and ego-motion prediction from previous research, and show how the techniques can be combined successfully. A keypoint prediction network is evaluated and its performance is compared with the ORB detector provided by OpenCV. A geometric consensus network is also implemented and its performance is compared with the RANSAC algorithm in OpenCV. The consensus maximization network is trained on the output of the keypoint prediction network. For future work it is suggested that all networks could be combined and trained jointly to reach a better overall performance. The results show (1) which techniques in unsupervised depth prediction are most effective, (2) that the keypoint predicting network outperformed the ORB detector, and (3) that the consensus maximization network was able to classify outliers with comparable performance to the RANSAC algorithm of OpenCV.
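The training signal commonly used in this line of unsupervised depth and ego-motion work is a photometric reconstruction loss: the source frame is inverse-warped into the target view using the predicted depth and relative pose, and the difference to the target frame is penalized. The PyTorch sketch below illustrates that loss under a pinhole camera model; it is a simplified stand-in, not the thesis implementation, and it omits occlusion handling and the SSIM term often used in practice.

```python
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, pose, K):
    """Self-supervised photometric loss for depth and ego-motion (sketch).

    target, source: images of shape (B, 3, H, W)
    depth:          predicted target-view depth, shape (B, 1, H, W)
    pose:           target-to-source transform, shape (B, 4, 4)
    K:              camera intrinsics, shape (B, 3, 3)
    """
    b, _, h, w = target.shape
    device = target.device

    # Pixel grid in homogeneous coordinates, shape (B, 3, H*W).
    ys, xs = torch.meshgrid(torch.arange(h, device=device, dtype=torch.float32),
                            torch.arange(w, device=device, dtype=torch.float32),
                            indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).view(1, 3, -1).expand(b, -1, -1)

    # Back-project to 3-D camera points and transform with the relative pose.
    cam = torch.linalg.inv(K) @ pix * depth.view(b, 1, -1)
    cam_h = torch.cat([cam, torch.ones(b, 1, h * w, device=device)], dim=1)
    cam_src = (pose @ cam_h)[:, :3]

    # Project into the source image and normalize to [-1, 1] for grid_sample.
    proj = K @ cam_src
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    u = 2.0 * uv[:, 0] / (w - 1) - 1.0
    v = 2.0 * uv[:, 1] / (h - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(b, h, w, 2)

    warped = F.grid_sample(source, grid, padding_mode="border", align_corners=True)
    return (warped - target).abs().mean()

if __name__ == "__main__":
    b, h, w = 1, 64, 96
    K = torch.tensor([[[80.0, 0.0, w / 2], [0.0, 80.0, h / 2], [0.0, 0.0, 1.0]]])
    depth = torch.full((b, 1, h, w), 5.0)
    pose = torch.eye(4).unsqueeze(0)
    img = torch.rand(b, 3, h, w)
    print(photometric_loss(img, img, depth, pose, K))  # ~0 for identity pose, identical frames
```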
76
Unsupervised Clustering of Behavior Data From a Parking Application: A Heuristic and Deep Learning Approach / Oövervakad klustring av beteendedata från en parkeringsapplikation: En heuristisk och djupinlärningsmetod. Magnell, Edvard; Nordling, Joakim, January 2023 (has links)
This report presents a project in the field of unsupervised clustering of human behavior in a parking application. With increasing opportunities to collect and store data, the demands to utilize the data in meaningful ways also increase. The purpose of this work is to explore common behaviors within the app and what those reveal about its usage. The first step was to transform event-based data into user sessions. The next step was to establish how to measure the similarity between sequences. This was achieved using two different approaches: one based on a combination of string metrics and heuristics, and one that creates vector representations of the sessions using an autoencoder. With these two ways of representing the similarity between sessions, clustering algorithms were used to assign labels to all sessions. Due to the unknown attributes of the data set, the versatile clustering algorithm HDBSCAN was applied to both representations of the sessions separately, and the resulting clusters were compared to those produced by simple partitioning algorithms. The noisy nature of human behavior allowed HDBSCAN to create better clusters, with distinct behaviors, than the simpler partitioning algorithms. Without a ground truth to rely on, evaluating the models proved to be a difficult part of the project; both quantitative metrics and qualitative methods were used for evaluation. In conclusion, our work provides a new way of evaluating user behavior, brings new insights into the different ways customers achieve their goals within the app, and lays the groundwork for connecting user behavior with transaction data in future projects.
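Neither the heuristic similarity measure nor the exact autoencoder is included in this listing, but the second pipeline the abstract outlines (encode sessions into fixed-length vectors, then cluster the embeddings with HDBSCAN) can be sketched roughly as below. The sketch assumes sessions have already been converted to fixed-length numeric feature vectors; it uses PyTorch for the autoencoder and scikit-learn (version 1.3 or later) for HDBSCAN, and all layer sizes and parameters are illustrative assumptions.

```python
import numpy as np
import torch
from torch import nn
from sklearn.cluster import HDBSCAN  # scikit-learn >= 1.3

class SessionAutoencoder(nn.Module):
    """Small dense autoencoder for fixed-length session feature vectors."""
    def __init__(self, n_features, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                     nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                     nn.Linear(32, n_features))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def embed_and_cluster(sessions, latent_dim=8, epochs=200, min_cluster_size=15):
    """Train the autoencoder on session vectors, then cluster the embeddings."""
    x = torch.tensor(np.asarray(sessions), dtype=torch.float32)
    model = SessionAutoencoder(x.shape[1], latent_dim)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        recon, _ = model(x)
        loss = loss_fn(recon, x)
        loss.backward()
        opt.step()
    with torch.no_grad():
        _, z = model(x)
    labels = HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(z.numpy())
    return labels  # -1 marks sessions HDBSCAN treats as noise

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    fake_sessions = np.vstack([rng.normal(0, 1, (200, 12)),
                               rng.normal(4, 1, (200, 12))])
    print(np.unique(embed_and_cluster(fake_sessions)))
```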
77
Unsupervised Learning Using Change Point Features Of Time-Series Data For Improved PHM. Dai, Honghao, 05 June 2023 (has links)
No description available.
78
A Data Analytic Methodology for Materials Informatics. AbuOmar, Osama Yousef, 17 May 2014 (has links)
A data analytic materials informatics methodology is proposed, in which different data mining techniques are applied to datasets from a particular domain in order to discover and model patterns, trends and behavior related to that domain. In essence, it is proposed to develop an information mining tool for vapor-grown carbon nanofiber (VGCNF)/vinyl ester (VE) nanocomposites as a case study. Formulation and processing factors (VGCNF type, use of a dispersing agent, mixing method, and VGCNF weight fraction) and testing temperature were utilized as inputs, and the storage modulus, loss modulus, and tan delta were selected as outputs or responses. The data mining and knowledge discovery algorithms and techniques included self-organizing maps (SOMs) and clustering techniques. SOMs demonstrated that temperature had the most significant effect on the output responses, followed by VGCNF weight fraction. A clustering technique, i.e., the fuzzy C-means (FCM) algorithm, was also applied to discover certain patterns in nanocomposite behavior after using principal component analysis (PCA) as a dimensionality reduction technique. In particular, these techniques were able to separate the nanocomposite specimens into different clusters based on temperature and tan delta features, as well as to place the neat VE specimens in separate clusters. In addition, an artificial neural network (ANN) model was used to explore the VGCNF/VE dataset. The ANN was able to predict/model the VGCNF/VE responses with minimal mean square error (MSE) using the resubstitution and 3-fold cross validation (CV) techniques. Furthermore, the proposed methodology was employed to acquire new information and mechanical and physical patterns and trends not only for viscoelastic VGCNF/VE nanocomposites, but also for the flexural and impact strength properties of VGCNF/VE nanocomposites. Formulation and processing factors (curing environment, use or absence of a dispersing agent, mixing method, VGCNF fiber loading, VGCNF type, high shear mixing time, sonication time) and testing temperature were utilized as inputs, and the true ultimate strength, true yield strength, engineering elastic modulus, engineering ultimate strength, flexural modulus, flexural strength, storage modulus, loss modulus, and tan delta were selected as outputs. This work highlights the significance and utility of data mining and knowledge discovery techniques in the context of materials informatics.
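The original analysis code and data are not part of this listing, but the PCA-plus-fuzzy-C-means step described above can be illustrated with a small self-contained sketch in which the FCM updates are written directly in NumPy. The cluster count, fuzzifier, and stand-in data below are assumptions for illustration, not the settings used in the study.

```python
import numpy as np
from sklearn.decomposition import PCA

def fuzzy_c_means(x, n_clusters=3, m=2.0, max_iter=200, tol=1e-5, seed=0):
    """Plain NumPy fuzzy C-means: returns (centers, membership matrix)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    u = rng.random((n, n_clusters))
    u /= u.sum(axis=1, keepdims=True)          # memberships sum to 1 per sample
    for _ in range(max_iter):
        um = u ** m
        # Weighted cluster centers.
        centers = (um.T @ x) / um.sum(axis=0)[:, None]
        # Distances from every sample to every center.
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-10)
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(new_u - u).max() < tol:
            u = new_u
            break
        u = new_u
    return centers, u

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    # Stand-in for the nanocomposite feature table (inputs + responses).
    data = np.vstack([rng.normal(0, 1, (100, 6)), rng.normal(5, 1, (100, 6))])
    reduced = PCA(n_components=2).fit_transform(data)   # dimensionality reduction first
    centers, u = fuzzy_c_means(reduced, n_clusters=2)
    hard_labels = u.argmax(axis=1)                       # crisp assignment for inspection
    print(centers.round(2), np.bincount(hard_labels))
```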
79
Improving Variational Autoencoders on Robustness, Regularization, and Task-Invariance / ロバスト性，正則化，タスク不変性に関する変分オートエンコーダの改善. Takahashi, Hiroshi, 23 March 2023 (has links)
Kyoto University / New degree system, doctoral program / Doctor of Informatics / Kou No. 24725 / Johaku No. 813 / Shinsei||Jo||137 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Hisashi Kashima, Professor Akihiro Yamamoto, Professor Masatoshi Yoshikawa / Qualifies under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
80
Correcting for Patient Breathing Motion in PET Imaging. O'Briain, Teaghan, 26 August 2022 (has links)
Positron emission tomography (PET) requires imaging times that last several minutes. Therefore, when imaging areas that are prone to respiratory motion, blurring effects are often observed. This blurring can impair our ability to use these images for diagnostic purposes as well as for treatment planning. While there are methods that account for this effect, they often rely on adjustments to the imaging protocols in the form of longer scan times or subjecting the patient to higher doses of radiation. This dissertation explores an alternative approach that leverages state-of-the-art deep learning techniques to align the PET signal acquired at different points of the breathing motion. This method does not require adjustments to standard clinical protocols and is therefore more efficient and/or safer than the most widely adopted approach. To help validate this method, Monte Carlo (MC) simulations were conducted to emulate the PET imaging process; these simulations are the focus of our first experiment. The next experiment was the development and testing of our motion correction method.
A clinical four-ring PET imaging system was modelled using GATE (v. 9.0). To validate the simulations, PET images of a cylindrical phantom, a point source, and an image quality phantom were acquired with the modelled system, and the experimental procedures were also simulated. The simulations were compared against the measurements in terms of their count rates and sensitivity as well as their image uniformity, resolution, recovery coefficients, coefficients of variation, contrast, and background variability. When compared to the measured data, the number of true detections in the MC simulations was within 5%. The scatter fraction was found to be (31.1 ± 1.1)% and (29.8 ± 0.8)% in the measured and simulated scans, respectively. Analyzing the measured and simulated sinograms, the sensitivities were found to be 10.0 cps/kBq and 9.5 cps/kBq, respectively. The fraction of random coincidences was 19% in the measured data and 25% in the simulation. When calculating the image uniformity within the axial slices, the measured image exhibited a uniformity of (0.015 ± 0.005), while the simulated image had a uniformity of (0.029 ± 0.011). In the axial direction, the uniformity was measured to be (0.024 ± 0.006) and (0.040 ± 0.015) for the measured and simulated data, respectively. Comparing the image resolution, an average percentage difference of 2.9% was found between the measurements and simulations. The recovery coefficients calculated in both the measured and simulated images were found to be within the EARL ranges, except for that of the simulation of the smallest sphere. The coefficients of variation for the measured and simulated images were found to be 12% and 13%, respectively. Lastly, the background variability was consistent between the measurements and simulations, while the average percentage difference in the sphere contrasts was found to be 8.8%. The code used to run the GATE simulations and evaluate the described metrics has been made available (https://github.com/teaghan/PET_MonteCarlo).
Next, to correct for breathing motion in PET imaging, an interpretable and unsupervised deep learning technique, FlowNet-PET, was constructed. The network was trained to predict the optical flow between two PET frames from different breathing amplitude ranges. As a result, the trained model groups different retrospectively gated PET images together into a single motion-corrected bin, providing a final image with counting statistics similar to those of a non-gated image, but without the blurring effects that were initially observed. As a proof of concept, FlowNet-PET was applied to anthropomorphic digital phantom data, which made it possible to design robust metrics to quantify the corrections. When comparing the predicted optical flows to the ground truths, the median absolute error was found to be smaller than the pixel and slice widths, even for the phantom with a diaphragm movement of 21 mm. The improvements were illustrated by comparing against images without motion and computing the intersection over union (IoU) of the tumors as well as the enclosed activity and coefficient of variation (CoV) within the no-motion tumor volume before and after the corrections were applied. The average relative improvements provided by the network were 54%, 90%, and 76% for the IoU, total activity, and CoV, respectively. The results were then compared against the conventional retrospective phase binning approach. FlowNet-PET achieved results similar to retrospective binning, but required only one sixth of the scan duration. The code and data used for training and analysis have been made publicly available (https://github.com/teaghan/FlowNet_PET).
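As a small illustration of the evaluation quantities described above (tumor IoU against the no-motion image, plus the enclosed activity and coefficient of variation within the no-motion tumor volume), the sketch below shows how such metrics could be computed from image arrays and boolean masks. Array names, the stand-in segmentation threshold, and the toy data are assumptions, not taken from the FlowNet-PET repository.

```python
import numpy as np

def tumor_iou(mask_a, mask_b):
    """Intersection over union of two boolean tumor masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

def roi_metrics(image, roi_mask):
    """Total activity and coefficient of variation inside a reference ROI."""
    voxels = image[roi_mask]
    total_activity = voxels.sum()
    cov = voxels.std() / voxels.mean() if voxels.mean() > 0 else np.inf
    return total_activity, cov

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    no_motion = rng.poisson(50, size=(64, 64, 32)).astype(float)
    corrected = no_motion + rng.normal(0, 2, no_motion.shape)
    roi = np.zeros(no_motion.shape, dtype=bool)
    roi[28:36, 28:36, 12:20] = True                  # stand-in "tumor" volume
    seg = corrected > np.percentile(corrected, 99)   # crude threshold segmentation
    print(tumor_iou(roi, seg))
    print(roi_metrics(corrected, roi))
```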
The encouraging results provided by our motion correction method open up many possible future applications. For instance, this method can be transferred to clinical patient PET images or applied to alternative imaging modalities that would benefit from similar motion corrections. When applied to clinical PET images, FlowNet-PET would provide the capability of acquiring high-quality images without requiring either longer scan times or higher doses of radiation. Accordingly, the imaging process would likely become more efficient and/or safer, which would benefit both health care institutions and their patients. / Graduate