  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Vascular plaque detection using texture based segmentation of optical coherence tomography images

Ocaña Macias Mariano 14 September 2015 (has links)
Abstract: Cardiovascular disease is one of the leading causes of death in Canada, and atherosclerosis is considered its primary cause. Optical coherence tomography (OCT) provides a means of minimally invasive imaging and assessment of the textural features of atherosclerotic plaque. However, detecting atherosclerotic plaque by visual inspection of OCT images is usually difficult. We therefore developed unsupervised segmentation algorithms to detect atherosclerotic plaque automatically in OCT images. Our method involves preprocessing of the raw OCT images, feature selection and texture feature extraction using the spatial gray-level dependence matrix (SGLDM) method, and the application of three clustering techniques (K-means, Fuzzy C-means and Gustafson-Kessel) to segment the plaque regions and to map the cluster regions (background, vascular tissue, OCT degraded-signal region and atherosclerotic plaque) from feature space back onto the original preprocessed OCT image. We validated our results by comparing the segmented OCT images with photographs of vascular tissue containing plaque. / October 2015
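The abstract above outlines a texture-plus-clustering pipeline; the sketch below is a minimal, hypothetical illustration of that idea (not the thesis code), computing SGLDM/GLCM statistics per image patch and clustering them with K-means. The patch size, gray-level count, GLCM properties and the random stand-in image are all assumptions.

import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.cluster import KMeans

def patch_features(img, patch=32, levels=64):
    """Slide a non-overlapping window over the image and compute GLCM statistics."""
    img = (img / img.max() * (levels - 1)).astype(np.uint8)
    feats, coords = [], []
    for r in range(0, img.shape[0] - patch + 1, patch):
        for c in range(0, img.shape[1] - patch + 1, patch):
            win = img[r:r + patch, c:c + patch]
            glcm = graycomatrix(win, distances=[1, 2], angles=[0, np.pi / 2],
                                levels=levels, symmetric=True, normed=True)
            feats.append([graycoprops(glcm, p).mean()
                          for p in ("contrast", "homogeneity", "energy", "correlation")])
            coords.append((r, c))
    return np.array(feats), coords

# Four clusters as in the thesis: background, vascular tissue,
# degraded-signal region and plaque (mapping clusters to labels is manual).
oct_img = np.random.rand(256, 256)          # stand-in for a preprocessed OCT frame
X, coords = patch_features(oct_img)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)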
2

The development of a methodology for automated sorting in the minerals industry

Fitzpatrick, Robert Stuart January 2008 (has links)
The objective of this research project was to develop a methodology for establishing the potential of automated sorting for a minerals application. Such methodologies have been developed for test work in many established mineral processing disciplines; they ensure that data are reproducible and that testing can be undertaken quickly and efficiently. Because automated sorters are a relatively recent addition to mineral processing, equivalent guidelines have yet to be established. The methodology developed was applied to two practical applications, including the separation of a Ni/Cu sulphide ore. This experimentation also highlighted the advantages of multi-sensor sorting and illustrated a means by which sorters can be used as multi-output machines, generating a number of tailored concentrates for downstream processing. This contrasts with the traditional view of sorters as a simple binary concentrate/waste pre-concentration technique. A further key result of the research was the emulation of expert-based training using unsupervised clustering techniques and neural networks for colour quantisation. These techniques add flexibility and value to sorters in the minerals industry: they do not require a trained expert, so machines can be optimised by mine operators as conditions vary. They also complete the task of colour quantisation in a fraction of the time taken by an expert, and so lend themselves well to the quick and efficient assessment of automated sorting for a minerals application. Future research should focus on the advancement and application of neural networks to colour quantisation in conjunction with traditional training methods. Further to this, research should concentrate on practical applications utilising a multi-sensor, multi-output approach to automated sorting.
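As a small illustration of the expert-free training route mentioned above, the following sketch performs colour quantisation by unsupervised clustering (K-means). It is an assumption-laden toy, not the thesis implementation; the palette size and the random stand-in frame are arbitrary.

import numpy as np
from sklearn.cluster import KMeans

def quantise(image_rgb, n_colours=8):
    """Reduce an RGB image to n_colours representative colours."""
    h, w, _ = image_rgb.shape
    pixels = image_rgb.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=n_colours, n_init=4, random_state=0).fit(pixels)
    palette = km.cluster_centers_
    return palette[km.labels_].reshape(h, w, 3), palette

frame = np.random.rand(120, 160, 3) * 255    # stand-in for a camera frame of ore particles
quantised, palette = quantise(frame)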
3

Target Discrimination Against Clutter Based on Unsupervised Clustering and Sequential Monte Carlo Tracking

January 2016 (has links)
abstract: The radar performance of detecting a target and estimating its parameters can deteriorate rapidly in the presence of heavy clutter, because radar measurements due to clutter returns can be falsely detected as if they originated from the actual target. Various data association methods and multiple-hypothesis filtering approaches have been considered to solve this problem; such methods, however, can be too computationally intensive for real-time radar processing. This work proposes a new approach based on unsupervised clustering of target and clutter detections before target tracking with particle filtering. In particular, Gaussian mixture modeling is first used to separate the detections into two distinct Gaussian mixtures. Using eigenvector analysis, the eccentricity of the covariance matrix of each mixture component is computed and compared to threshold values obtained a priori; this thresholding allows only target detections to be used for target tracking. Simulations demonstrate the performance of the new algorithm and compare it with using k-means for clustering instead of Gaussian mixture modeling. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2016
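A hedged sketch of the clustering stage described above: fit a two-component Gaussian mixture to the detections, compute an eccentricity measure from the eigenvalues of each component's covariance, and keep only detections from the component whose eccentricity falls below a pre-set threshold. The synthetic detections and the threshold value are assumptions, not values from the thesis.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
target = rng.normal([0, 0], [0.3, 0.3], size=(50, 2))        # tight target returns
clutter = rng.normal([2, 1], [2.0, 0.4], size=(400, 2))      # elongated clutter cloud
detections = np.vstack([target, clutter])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
labels = gmm.fit_predict(detections)

def eccentricity(cov):
    """Ratio of largest to smallest eigenvalue of a 2x2 covariance matrix."""
    w = np.linalg.eigvalsh(cov)
    return w[-1] / w[0]

ECC_THRESHOLD = 5.0                                          # assumed a-priori value
target_comp = [k for k in range(2) if eccentricity(gmm.covariances_[k]) < ECC_THRESHOLD]
target_detections = detections[np.isin(labels, target_comp)]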
4

Unsupervised Categorical Clustering on Labor Markets

Steffen, Matthew James 10 April 2023 (has links)
During this "white collar recession,'' there is a flooded labor market of workers. For employers seeking to hire, there is a need to identify potential qualified candidates for each job. The current state of the art is LinkedIn Recruiting or elastic search on Resumes. The current state of the art lacks efficiency and scalability along with an intuitive ranking of candidates. We believe this can be fixed with multi-layer categorical clustering via modularity maximization. To test this, we gathered a dataset that is extensive and representative of the job market. Our data comes from PeopleDataLabs and LinkedIn and is sampled from 153 million individuals. As such, this data represents one of the most informative datasets for the task of ranking and clustering job titles and skills. Properly grouping individuals will help identify more candidates to fulfill the multitude of vacant positions. We implement a novel framework for categorical clustering, involving these attributes to deliver a reliable pool of candidates. We develop a metric for clustering based on commonality to rank clustering algorithms. The metric prefers modularity-based clustering algorithms like the Louvain algorithm. This allows us to use such algorithms to outperform other unsupervised methods for categorical clustering. Our implementation accurately clusters emergency services, health-care and other fields while managerial positions are interestingly swamped by soft or uninformative features thereby resulting in dominant ambiguous clusters.
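The following sketch illustrates modularity-based categorical clustering in the spirit described above: categorical attributes (here, skills listed on the same profile) are linked in a weighted co-occurrence graph and partitioned with the Louvain algorithm. The toy profiles are assumptions standing in for the PeopleDataLabs/LinkedIn data, and this is not the thesis implementation.

from itertools import combinations
import networkx as nx
from networkx.algorithms.community import louvain_communities

profiles = [
    {"emt", "cpr", "triage"},
    {"cpr", "triage", "nursing"},
    {"python", "sql", "etl"},
    {"sql", "etl", "spark"},
]

G = nx.Graph()
for skills in profiles:
    for a, b in combinations(sorted(skills), 2):
        # increment the co-occurrence weight, creating the edge if needed
        w = G[a][b]["weight"] + 1 if G.has_edge(a, b) else 1
        G.add_edge(a, b, weight=w)

communities = louvain_communities(G, weight="weight", seed=0)
for i, members in enumerate(communities):
    print(i, sorted(members))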
5

Experimental time-domain controlled source electromagnetic induction for highly conductive targets detection and discrimination

Benavides Iglesias, Alfonso 17 September 2007 (has links)
The response of geological materials at the scale of meters and the response of buried targets of different shapes and sizes are investigated using controlled-source electromagnetic induction (CSEM). This dissertation focuses on three topics: i) fractal properties of electric conductivity data from near-surface geology and processing techniques for enhancing man-made target responses, ii) non-linear inversion of spatiotemporal data using a continuation method, and iii) classification of CSEM transient and spatiotemporal data. In the first topic, apparent conductivity profiles and maps were studied to determine self-affine properties of the geological noise and the effects of man-made conductive metal targets. 2-D Fourier transforms and omnidirectional variograms showed that variations in apparent conductivity exhibit self-affinity, corresponding to fractional Brownian motion. Self-affinity no longer holds when targets are buried in the near-surface, making spectral methods feasible for determining their presence. The difference between the geology and target responses can be exploited using wavelet decomposition; a series of experiments showed that wavelet filtering is able to separate target responses from the geological background. In the second topic, a continuation-based inversion approach is adopted, based on path-tracking in model space, to solve the non-linear least-squares problem for unexploded ordnance (UXO) data. The model corresponds to a stretched-exponential decay of eddy currents induced in a magnetic spheroid. Fast inversion of actual field multi-receiver CSEM responses of inert, buried ordnance is also shown. Software based on the continuation method could be installed within a multi-receiver CSEM sensor and used for near-real-time UXO decisions. In the third topic, unsupervised self-organizing maps (SOM) were adapted for data clustering and classification. The use of SOM for central-loop CSEM transients shows the potential to perform classification, discriminating background and non-dangerous items (clutter) from, for instance, unexploded ordnance. Implementation of a merge-SOM algorithm showed that clustering and classification of spatiotemporal CSEM data is possible. The ability to extract target signals from a background-contaminated pattern is desirable to avoid dealing with forward models containing the subsurface response, or to implement processing algorithms that remove, to some degree, the effects of the background response and the target-host interactions.
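For the third topic, a rough sketch of SOM-based clustering of decay transients is given below, assuming the minisom package rather than the author's implementation. The map size, time gates and synthetic stretched-exponential decays are assumptions.

import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(1)
t = np.linspace(0.05, 2.0, 32)                       # time gates (ms), illustrative
background = np.exp(-t[None, :] / rng.uniform(0.1, 0.2, (60, 1)))
ordnance = 5.0 * np.exp(-t[None, :] / rng.uniform(0.6, 0.9, (20, 1)))
transients = np.vstack([background, ordnance])
transients /= np.linalg.norm(transients, axis=1, keepdims=True)

som = MiniSom(4, 4, transients.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(transients, num_iteration=2000)
labels = [som.winner(x) for x in transients]         # (row, col) of best-matching unit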
6

Speech Segregation in Background Noise and Competing Speech

Hu, Ke 17 July 2012 (has links)
No description available.
7

Aide au diagnostic de cancers cutanés et de la leucémie lymphoïde chronique par microspectroscopies vibrationnelles couplées à des analyses numériques multivariées / Vibrational spectroscopies coupled with numerical multivariate analyzes as an aid to diagnose skin cancers and chronic lymphocytic leukemia

Happillon, Teddy 12 December 2013 (has links)
Vibrational spectroscopy is a technology able to record a large amount of molecular information from the samples studied. Coupled with chemometric and classification methods, it is an efficient tool for identifying sample structures and substructures, and when applied to the biomedical field it shows high potential for disease diagnosis. It is in this context that the work presented in this thesis was carried out. In a first study, dealing with algorithmic development, an automatic unsupervised classification algorithm (based on Fuzzy C-Means), developed by our laboratory to aid skin cancer diagnosis using infrared imaging, was improved in order to i) considerably reduce the computational time needed for clustering, ii) increase the quality of the results obtained on infrared data, and iii) extend its field of application to the real and simulated datasets commonly used in the literature. This tool was tested on 16 infrared spectral images of skin cancers (BCC, SCC, Bowen's disease and melanoma) and on 49 real and simulated datasets. The results showed the ability of this new algorithm to estimate realistic data partitions regardless of the type of data studied. The second study aimed at developing an autonomous chemometric tool to assist in the diagnosis of chronic lymphocytic leukemia by Raman spectroscopy. In this second work, numerical preprocessing steps and a supervised classification algorithm, Support Vector Machines, were applied to data recorded on blood cells from 27 healthy controls and 49 patients with chronic lymphocytic leukemia. The classification results showed a sensitivity of 80% and a specificity of 100% in detecting the disease.
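As a hedged illustration of the second study's classification step, the sketch below trains a linear-kernel Support Vector Machine on preprocessed spectra and reports sensitivity and specificity. The random "spectra", the injected class difference and the train/test split are assumptions standing in for the Raman data.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(76, 500))            # 76 subjects x 500 wavenumber channels
y = np.array([0] * 27 + [1] * 49)         # 0 = control, 1 = CLL patient
X[y == 1, 100:110] += 0.8                 # synthetic spectral difference for the sketch

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
print("sensitivity", tp / (tp + fn), "specificity", tn / (tn + fp))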
8

Structuration du modèle acoustique pour améliorer les performance de reconnaissance automatique de la parole / Acoustic model structuring for improving automatic speech recognition performance

Gorin, Arseniy 26 November 2014 (has links)
This thesis focuses on acoustic model structuring for improving HMM-based automatic speech recognition. The structuring relies on unsupervised clustering of the speech utterances of the training data in order to handle speaker and channel variability; the idea is to split the data into acoustically similar classes. In the conventional multi-modeling (or class-based) approach, separate class-dependent models are built by adapting a speaker-independent model. When the number of classes increases, less data is available for estimating each class-based model, and its parameters become less reliable. One way to handle this problem is to modify the classification criterion applied to the training data, allowing a given utterance to belong to more than one class; this is obtained by relaxing the classification decision through a soft margin, and is investigated in the first part of the thesis. In the main part of the thesis, a novel approach is proposed that uses the clustered data more efficiently in a class-structured GMM. Instead of adapting all HMM-GMM parameters separately for each class of data, the class information is explicitly introduced into the GMM structure by associating each density component with a given class. To exploit such a structured HMM-GMM efficiently, two different approaches are proposed. The first combines the class-structured GMM with class-dependent mixture weights: the Gaussian components are shared across speaker classes but are class-structured, while the mixture weights are class-dependent; when decoding an utterance, the set of mixture weights is selected according to the estimated class. In the second approach, the mixture weights are replaced by density component transition probabilities. The approaches proposed in the thesis are analyzed and evaluated on various speech corpora covering different sources of variability (age, gender, accent and noise).
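A much-simplified sketch of the class-based front end discussed above: each training utterance is represented by a mean acoustic feature vector, and utterances are grouped into speaker/channel classes by unsupervised clustering; a class-dependent model (or weight set) would then be estimated per class. The feature dimension, class count and synthetic utterances are assumptions, and K-means stands in for whichever clustering criterion the thesis actually uses.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# 200 utterances, each a (frames x 13) MFCC-like matrix of varying length
utterances = [rng.normal(loc=rng.normal(0, 1, 13), size=(rng.integers(80, 300), 13))
              for _ in range(200)]
X = np.stack([u.mean(axis=0) for u in utterances])   # one vector per utterance

n_classes = 4
classes = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit_predict(X)
# Utterances of class k would feed the adaptation / weight estimation of model k.
for k in range(n_classes):
    print("class", k, "gets", int((classes == k).sum()), "utterances")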
9

基於大數據資料的非監督分散式分群演算法 / An Effective Distributed GHSOM Algorithm for Unsupervised Clustering on Big Data

邱垂暉, Chiu, Chui Hui Unknown Date (has links)
Clustering techniques that group samples based on their attribute similarity have been widely used in many fields, such as pattern recognition, feature extraction and malicious behavior characterization. Because of this importance, various clustering techniques have been reimplemented on distributed frameworks for scalable computation, for example K-means with Hadoop in Apache Mahout. While K-means requires the number of clusters to be given and self-organizing maps (SOM) require the map size, the growing hierarchical self-organizing map (GHSOM), which clusters samples dynamically to satisfy a requirement on the tolerance of variation between samples, offers an attractive unsupervised learning solution for data with too little prior information to fix the number of clusters in advance. However, GHSOM is not a distributed algorithm and does not scale with sequential computation, which limits its application to big data. In this paper, we present a novel distributed GHSOM algorithm. We take advantage of parallel computation with the Scala actor model for GHSOM construction, distributing the vertical and horizontal expansion tasks to actors and showing significant performance improvement. To evaluate the presented approach, we collect and analyze the execution behaviors of thousands of malware samples observed in real life, and derive malware detection rules by applying the presented unsupervised clustering to millions of samples, showing its performance improvement, rule effectiveness and potential usage in practice.
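The sketch below gives a very reduced, sequential illustration of the GHSOM growth test (not the distributed Scala/actor implementation presented in the paper): a map grows horizontally while the mean quantization error of its units exceeds tau1 times the parent unit's error, and a unit spawns a child map when its own error exceeds tau2 times that reference. The tau values, the 2x2 starter map and the random data are assumptions.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 8))

def mqe(samples, weight):
    """Mean quantization error of a set of samples against one unit's weight vector."""
    return np.mean(np.linalg.norm(samples - weight, axis=1)) if len(samples) else 0.0

root_qe = mqe(data, data.mean(axis=0))
units = data[rng.choice(len(data), 4, replace=False)]          # 2x2 starter map

# assign each sample to its best-matching unit
bmu = np.argmin(np.linalg.norm(data[:, None, :] - units[None, :, :], axis=2), axis=1)
unit_qe = np.array([mqe(data[bmu == k], units[k]) for k in range(len(units))])

TAU1, TAU2 = 0.6, 0.3                                          # assumed tolerances
grow_horizontally = unit_qe.mean() > TAU1 * root_qe            # add a row/column of units
expand_vertically = unit_qe > TAU2 * root_qe                   # these units get child maps
print(grow_horizontally, expand_vertically)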
10

Les collections volumineuses de documents audiovisuels : segmentation et regroupement en locuteurs / Speaker diarization : the voluminous collections of audiovisual recordings

Dupuy, Grégor 03 July 2015 (has links)
The task of speaker diarization, as defined by NIST, treats the recordings of a corpus as independent problems: the recordings are processed separately, and the overall error rate on the corpus is a weighted average. In this context, the speakers detected by the system are identified by anonymous labels specific to each recording, so a speaker appearing in several recordings is given a different label in each one. Yet this situation is very common in broadcast news: hosts, journalists and other guests appear recurrently. Consequently, speaker diarization has recently been considered in a broader context, where recurring speakers must be identified uniquely across all the recordings that compose a corpus. This generalization of the speaker partitioning problem goes hand in hand with the emergence of the concept of a collection, which refers, in the context of speaker diarization, to a set of recordings sharing one or more common characteristics. The work proposed in this thesis concerns speaker clustering over large audiovisual collections (several tens of hours of recordings). The main objective is to propose (or adapt) clustering approaches that process large volumes of data efficiently while detecting recurrent speakers. The effectiveness of the proposed approaches is examined from two points of view: the quality of the produced clustering (in terms of error rate), and the time required to perform the processing. To this end, we propose two architectures designed for cross-show speaker diarization over collections of recordings. We also propose a simplification in which the clustering problem is represented as an undirected graph; decomposing this graph into connected components splits the clustering problem into a number of independent sub-problems. Solving these sub-problems is experimented with two different clustering approaches (HAC and ILP) that take advantage of recent advances in speaker modeling (i-vectors and PLDA).
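A hedged sketch of the graph-based simplification described above: recordings (or within-show speaker clusters) whose models look similar are linked, the graph is split into connected components, and agglomerative clustering (HAC) runs inside each component only. The random "i-vector-like" embeddings, the similarity threshold and the HAC distance threshold are assumptions, not values from the thesis.

import numpy as np
import networkx as nx
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(40, 50))            # one vector per within-show speaker cluster
sim = cosine_similarity(embeddings)

G = nx.Graph()
G.add_nodes_from(range(len(embeddings)))
links = np.argwhere(np.triu(sim, 1) > 0.2)        # assumed linking threshold
G.add_edges_from(map(tuple, links))

cross_show_label = {}
for comp_id, comp in enumerate(nx.connected_components(G)):
    idx = sorted(comp)
    if len(idx) == 1:                             # singleton component: nothing to merge
        cross_show_label[idx[0]] = (comp_id, 0)
        continue
    hac = AgglomerativeClustering(n_clusters=None, distance_threshold=10.0,
                                  linkage="average").fit(embeddings[idx])
    for i, lab in zip(idx, hac.labels_):
        cross_show_label[i] = (comp_id, int(lab))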
