91 |
Unsupervised Clustering of Behavior Data From a Parking Application : A Heuristic and Deep Learning Approach / Oövervakad klustring av beteendedata från en parkeringsapplikation : En heuristisk och djupinlärningsmetod
Magnell, Edvard, Nordling, Joakim January 2023
This report presents a project on unsupervised clustering of human behavior in a parking application. As opportunities to collect and store data increase, so does the demand to use that data in meaningful ways. The purpose of this work is to explore common behaviors within the app and what they reveal about its usage. The first step was to transform event-based data into user sessions. The next step was to establish how to measure the similarity between sequences, which was done using two approaches: one based on a combination of string metrics and heuristics, the other using an autoencoder to create array representations of the sessions. With these two ways of representing the similarity between sessions, we used clustering algorithms to assign labels to all sessions. Because the attributes of the data set were unknown, the versatile clustering algorithm HDBSCAN was applied to each representation of the sessions separately. The clusters produced by HDBSCAN were compared to those produced by simple partitioning algorithms. Given the noisy nature of human behavior, HDBSCAN produced better clusters, with more distinct behaviors, than the simpler partitioning algorithms. Without a ground truth to rely on, evaluating the models proved to be a difficult part of the project; we used both quantitative metrics and qualitative methods for evaluation. In conclusion, our work provides a new way of evaluating user behavior, brings new insights into the different ways customers achieve their goals within the app, and lays the groundwork for connecting user behavior with transaction data in future projects.
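The string-metric approach to session similarity described in this abstract can be illustrated with a minimal sketch. The abstract does not say which string metric or heuristics the authors used; the Levenshtein (edit) distance below is only one common choice, and the event names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two event sequences (lists of event names)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] by the length of the longer session."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Hypothetical sessions; the event names are invented for illustration.
sessions = [
    ["open_app", "search_zone", "start_parking", "close_app"],
    ["open_app", "search_zone", "start_parking", "extend_parking", "close_app"],
    ["open_app", "view_receipt", "close_app"],
]

# Pairwise distance matrix over all sessions.
distance_matrix = [[normalized_distance(s, t) for t in sessions] for s in sessions]
```

A matrix like this could then be handed to a clustering algorithm that accepts precomputed distances, which is one plausible way to connect a string metric to a density-based method such as HDBSCAN.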
|
92 |
Unsupervised Learning Using Change Point Features Of Time-Series Data For Improved PHM
Dai, Honghao 05 June 2023
No description available.
|
93 |
A Data Analytic Methodology for Materials Informatics
AbuOmar, Osama Yousef 17 May 2014
A data analytic materials informatics methodology is proposed after applying different data mining techniques to datasets from a particular domain in order to discover and model patterns, trends, and behavior related to that domain. In essence, it is proposed to develop an information mining tool for vapor-grown carbon nanofiber (VGCNF)/vinyl ester (VE) nanocomposites as a case study. Formulation and processing factors (VGCNF type, use of a dispersing agent, mixing method, and VGCNF weight fraction) and testing temperature were used as inputs, and the storage modulus, loss modulus, and tan delta were selected as outputs or responses. The data mining and knowledge discovery algorithms and techniques included self-organizing maps (SOMs) and clustering techniques. SOMs demonstrated that temperature had the most significant effect on the output responses, followed by VGCNF weight fraction. A clustering technique, the fuzzy C-means (FCM) algorithm, was also applied to discover patterns in nanocomposite behavior after using principal component analysis (PCA) as a dimensionality reduction technique. In particular, these techniques were able to separate the nanocomposite specimens into different clusters based on temperature and tan delta features, as well as to place the neat VE specimens in separate clusters. In addition, an artificial neural network (ANN) model was used to explore the VGCNF/VE dataset. The ANN was able to predict/model the VGCNF/VE responses with minimal mean square error (MSE) using the resubstitution and 3-fold cross-validation (CV) techniques. Furthermore, the proposed methodology was employed to acquire new information and mechanical and physical patterns and trends about not only viscoelastic VGCNF/VE nanocomposites, but also the flexural and impact strength properties of VGCNF/VE nanocomposites.
Formulation and processing factors (curing environment, use or absence of dispersing agent, mixing method, VGCNF fiber loading, VGCNF type, high shear mixing time, sonication time) and testing temperature were utilized as inputs and the true ultimate strength, true yield strength, engineering elastic modulus, engineering ultimate strength, flexural modulus, flexural strength, storage modulus, loss modulus, and tan delta were selected as outputs. This work highlights the significance and utility of data mining and knowledge discovery techniques in the context of materials informatics.
|
94 |
Improving Variational Autoencoders on Robustness, Regularization, and Task-Invariance / ロバスト性,正則化,タスク不変性に関する変分オートエンコーダの改善
Hiroshi, Takahashi 23 March 2023
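Kyoto University / New-system doctorate by coursework / Doctor of Informatics / 甲第24725号 / 情博第813号 / 新制||情||137 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor 鹿島 久嗣, Professor 山本 章博, Professor 吉川 正俊 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM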
|
95 |
Correcting for Patient Breathing Motion in PET Imaging
O'Briain, Teaghan 26 August 2022
Positron emission tomography (PET) requires imaging times of several minutes. Therefore, when imaging areas that are prone to respiratory motion, blurring effects are often observed. This blurring can impair our ability to use these images for diagnostic purposes as well as for treatment planning. While there are methods that account for this effect, they often rely on adjustments to the imaging protocols in the form of longer scan times or subjecting the patient to higher doses of radiation. This dissertation explores an alternative approach that leverages state-of-the-art deep learning techniques to align the PET signal acquired at different points of the breathing motion. This method does not require adjustments to standard clinical protocols and is therefore more efficient and/or safer than the most widely adopted approach. To help validate this method, Monte Carlo (MC) simulations were conducted to emulate the PET imaging process; these simulations are the focus of our first experiment. The next experiment was the development and testing of our motion correction method.
A clinical four-ring PET imaging system was modeled using GATE (v. 9.0). To validate the simulations, PET images of a cylindrical phantom, a point source, and an image quality phantom were acquired with the modeled system, and the experimental procedures were also simulated. The simulations were compared against the measurements in terms of their count rates and sensitivity as well as their image uniformity, resolution, recovery coefficients, coefficients of variation, contrast, and background variability. When compared to the measured data, the number of true detections in the MC simulations was within 5%. The scatter fraction was found to be (31.1 ± 1.1)% and (29.8 ± 0.8)% in the measured and simulated scans, respectively. Analyzing the measured and simulated sinograms, the sensitivities were found to be 10.0 cps/kBq and 9.5 cps/kBq, respectively. The fraction of random coincidences was 19% in the measured data and 25% in the simulation. When calculating the image uniformity within the axial slices, the measured image exhibited a uniformity of (0.015 ± 0.005), while the simulated image had a uniformity of (0.029 ± 0.011). In the axial direction, the uniformity was found to be (0.024 ± 0.006) and (0.040 ± 0.015) for the measured and simulated data, respectively. Comparing the image resolution, an average percentage difference of 2.9% was found between the measurements and simulations. The recovery coefficients calculated in both the measured and simulated images were found to be within the EARL ranges, except for that of the simulation of the smallest sphere. The coefficients of variation for the measured and simulated images were found to be 12% and 13%, respectively. Lastly, the background variability was consistent between the measurements and simulations, while the average percentage difference in the sphere contrasts was found to be 8.8%.
The code used to run the GATE simulations and evaluate the described metrics has been made available (https://github.com/teaghan/PET_MonteCarlo).
Next, to correct for breathing motion in PET imaging, an interpretable and unsupervised deep learning technique, FlowNet-PET, was constructed. The network was trained to predict the optical flow between two PET frames from different breathing amplitude ranges. As a result, the trained model groups different retrospectively-gated PET images together into a single motion-corrected bin, providing a final image with counting statistics similar to those of a non-gated image, but without the blurring effects that were initially observed. As a proof of concept, FlowNet-PET was applied to anthropomorphic digital phantom data, which made it possible to design robust metrics to quantify the corrections. When comparing the predicted optical flows to the ground truths, the median absolute error was found to be smaller than the pixel and slice widths, even for the phantom with a diaphragm movement of 21 mm. The improvements were illustrated by comparing against images without motion and computing the intersection over union (IoU) of the tumors as well as the enclosed activity and coefficient of variation (CoV) within the no-motion tumor volume before and after the corrections were applied. The average relative improvements provided by the network were 54%, 90%, and 76% for the IoU, total activity, and CoV, respectively. The results were then compared against the conventional retrospective phase binning approach. FlowNet-PET achieved results similar to those of retrospective binning, but required only one sixth of the scan duration. The code and data used for training and analysis have been made publicly available (https://github.com/teaghan/FlowNet_PET).
The encouraging results provided by our motion correction method present the opportunity for many possible future applications. For instance, this method can be transferred to clinical patient PET images or applied to alternative imaging modalities that would benefit from similar motion corrections. When applied to clinical PET images, FlowNet-PET would provide the capability of acquiring high quality images without the requirement for either longer scan times or subjecting the patients to higher doses of radiation. Accordingly, the imaging process would likely become more efficient and/or safer, which would be appreciated by both the health care institutions and their patients. / Graduate
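The intersection over union (IoU) metric named in this abstract can be illustrated with a short sketch. This is not the author's evaluation code (which is in the linked repositories); the flattened binary masks below are invented toy data standing in for tumor volumes.

```python
def intersection_over_union(mask_a, mask_b):
    """IoU of two equally sized binary masks given as flat sequences of 0/1."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    # Two empty masks are treated as a perfect match (a convention chosen here).
    return intersection / union if union else 1.0


# Toy 1-D "tumor" masks (hypothetical data, flattened voxels).
no_motion = [0, 1, 1, 1, 0, 0]   # reference tumor without breathing motion
blurred   = [0, 0, 1, 1, 1, 1]   # tumor extent smeared by motion
corrected = [0, 1, 1, 1, 1, 0]   # tumor extent after a motion correction

iou_before = intersection_over_union(no_motion, blurred)
iou_after = intersection_over_union(no_motion, corrected)
print(iou_before, iou_after)
```

In this toy case the IoU against the no-motion reference rises after correction, which is the kind of before/after comparison the abstract describes.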
|
96 |
Unsupervised learning with mixed type data : for detecting money laundering / Klusteranalys av heterogen data
Engardt, Sara January 2018
The purpose of this master's thesis is to perform a cluster analysis on parts of Handelsbanken's customer database. The ambition is to explore whether this could help identify customer types at risk of involvement in illegal activities such as money laundering. A literature study is conducted to help determine which of the clustering methods described in the literature are most suitable for the problem. The most important constraints are that the data consists of mixed-type attributes (categorical and numerical) and that it contains a large number of outliers. An extension of the self-organising map and the k-prototypes algorithm were chosen for the clustering. It is concluded that clusters exist in the data, albeit in the presence of outliers. More work is needed on handling missing values in the dataset.
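To show how a k-prototypes style algorithm can handle mixed numerical and categorical attributes, here is a minimal sketch of the dissimilarity commonly used in that family of methods: squared Euclidean distance on the numerical part plus a weighted count of categorical mismatches. The weight `gamma`, the feature names, and the values are assumptions for illustration, not details taken from the thesis.

```python
def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma):
    """Squared Euclidean distance on numeric attributes plus
    gamma times the number of mismatching categorical attributes."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    categorical = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric + gamma * categorical


def nearest_prototype(x_num, x_cat, prototypes, gamma):
    """Index of the prototype with the smallest mixed dissimilarity."""
    return min(
        range(len(prototypes)),
        key=lambda k: mixed_dissimilarity(
            x_num, x_cat, prototypes[k][0], prototypes[k][1], gamma
        ),
    )


# Hypothetical customer: two scaled numeric features, two categorical ones.
customer_num = (0.2, 0.9)
customer_cat = ("private", "SE")

# Hypothetical cluster prototypes: (numeric part, categorical part).
prototypes = [
    ((0.0, 1.0), ("private", "SE")),
    ((1.0, 0.0), ("corporate", "SE")),
]

assigned = nearest_prototype(customer_num, customer_cat, prototypes, gamma=0.5)
```

The choice of `gamma` controls how much a categorical mismatch counts relative to numeric distance, so it usually has to be tuned to the scale of the numeric features.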
|
97 |
Domain Expertise–Agnostic Feature Selection for the Analysis of Breast Cancer Data
Pozzoli, Susanna January 2019
High-dimensional data sets are becoming more and more common, and the curse of dimensionality has made feature selection a widespread problem. Unfortunately, feature selection is largely based on ground truth and domain expertise. Either may be unavailable, so there is a growing need for unsupervised feature selection in multiple fields, such as marketing and proteomics. Unlike in the past, biologists can now measure the amount of protein in a cancer cell. It is no wonder that the data is high-dimensional: the human body is composed of thousands and thousands of proteins. Intuitively, only a handful of proteins cause the onset of the disease. It might be desirable to cluster the cancer patients, but at the same time we want to find the proteins that produce good partitions. We propose a methodology designed to find the features that maximize clustering performance. After dividing the proteins into different groups, we clustered the patients and then evaluated the clustering performance. We developed two pipelines: the first focuses on the data provided by the laboratory, while the second takes advantage of both external data on protein complexes and the internal data. We set the threshold of clustering performance with the help of the biologists at Karolinska Institutet who contributed to the project. In the thesis we show how to make a good selection of features without domain expertise in the case of breast cancer data. This experiment illustrates how feature selection can yield a clustering performance up to eight times better than the baseline.
|
98 |
Machine learning and spending patterns : A study on the possibility of identifying riskily spending behaviour / Maskininlärning och utgiftsmönster
Holm, Mathias January 2018
The aim of this study is to investigate whether customer transaction data can be used to identify spending patterns among individuals, which in turn can be used to assess creditworthiness. Two approaches to unsupervised clustering are used and compared in the study: K-means and a hierarchical approach. The features used in both clustering techniques are extracted from customer transaction data collected from the customers' banks. Internal cluster validity indices and credit scores, calculated by credit institutes, are used to evaluate the results of the clustering techniques. Based on the experiments in this report, we believe that the approach shows interesting results and that further research, with evaluation on a larger dataset, is desirable. Proposed future work is to add further features to the models and study the effect on the resulting clusters.
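As a rough illustration of the K-means approach mentioned in this abstract, the sketch below runs Lloyd's algorithm on a few two-dimensional points. The features, values, and initial centroids are invented for the example; the study's actual features and configuration are not given in the abstract.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # New centroid is the per-dimension mean of the members.
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                updated.append(old)
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical per-customer features, e.g. (monthly spend, share spent on leisure),
# already scaled to comparable ranges.
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
labels, centers = kmeans(customers, centroids=[customers[0], customers[3]])
```

Fixing the initial centroids keeps the example deterministic; in practice K-means is usually run from several random initializations, and the resulting partitions are compared with an internal validity index.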
|
99 |
Generative Image Transformer (GIT): unsupervised continuous image generative and transformable model for [¹²³I]FP CIT SPECT images / 画像生成Transformer(GIT):[¹²³I]FP-CIT SPECT画像における教師なし連続画像生成変換モデル
Watanabe, Shogo 23 March 2022
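Kyoto University / New-system doctorate by coursework / Doctor of Human Health Sciences / 甲第23825号 / 人健博第96号 / 新制||人健||7 (University Library) / Department of Human Health Sciences, Graduate School of Medicine, Kyoto University / (Chief examiner) Professor 椎名 毅, Professor 精山 明敏, Professor 中本 裕士 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Human Health Sciences / Kyoto University / DFAM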
|
100 |
Attentional Parsing Networks
Karr, Marcus 01 December 2020
Convolutional neural networks (CNNs) have dominated the computer vision field since the early 2010s, when deep learning largely replaced previous approaches like hand-crafted feature engineering and hierarchical image parsing. Meanwhile transformer architectures have attained preeminence in natural language processing, and have even begun to supplant CNNs as the state of the art for some computer vision tasks.
This study proposes a novel transformer-based architecture, the attentional parsing network, that reconciles the deep learning and hierarchical image parsing approaches to computer vision. We recast unsupervised image representation as a sequence-to-sequence translation problem in which image patches are mapped to successive layers of latent variables, and we enforce symmetry and sparsity constraints to encourage these mappings to take the form of a parse tree.
We measure the quality of learned representations by passing them to a classifier and find high accuracy (> 90%) for even small models. We also demonstrate controllable image generation: first by “back translating” from latent variables to pixels, and then by selecting subsets of those variables with attention masks. Finally we discuss our design choices and compare them with alternatives, suggesting best practices and possible areas of improvement.
|