171 |
[en] HYBRID GENETIC ALGORITHM FOR THE MINIMUM SUM-OF-SQUARES CLUSTERING PROBLEM / [pt] ALGORITMO GENÉTICO HÍBRIDO PARA O PROBLEMA DE CLUSTERIZAÇÃO MINIMUM SUM-OF-SQUARES
DANIEL LEMES GRIBEL 27 July 2017
[en] Clustering plays an important role in data mining and is useful in many fields that involve exploratory data analysis, such as information retrieval, document extraction, and image segmentation. Although clustering algorithms are essential in data mining applications, most of them are ad hoc methods. They offer no guarantee on solution quality, which in many cases is related to premature convergence to a local minimum of the search space. In this research, we address data clustering from an optimization perspective and propose a hybrid genetic algorithm to solve the Minimum Sum-of-Squares Clustering (MSSC) problem. This metaheuristic is capable of escaping from local minima and generating near-optimal solutions to the MSSC problem. Results show that the proposed method outperformed the best current results in the literature, in terms of solution quality, for almost all considered sets of benchmark instances for the MSSC objective.
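To make the MSSC objective and the local-minimum problem mentioned in the abstract concrete, here is a minimal sketch in plain Python. It is not the thesis's hybrid genetic algorithm: it only evaluates the sum-of-squares objective and runs a Lloyd-style (k-means) local descent. All function names and the toy data are illustrative assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mssc_objective(points, centers, labels):
    """MSSC cost: sum of squared distances from each point to its assigned center."""
    return sum(squared_distance(p, centers[k]) for p, k in zip(points, labels))

def assign(points, centers):
    """Assign every point to the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points]

def update(points, labels, centers):
    """Move each center to the mean of its members; empty clusters stay in place."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers

def local_descent(points, centers, max_iter=100):
    """Lloyd-style iterations: stops at a local (not necessarily global) minimum."""
    labels = assign(points, centers)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels

# Toy data (assumed): two tight pairs of points, far apart along the x axis.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# A good start separates left from right.
good_centers, good_labels = local_descent(points, [(0.0, 0.0), (10.0, 0.0)])
good_cost = mssc_objective(points, good_centers, good_labels)

# A poor start splits top from bottom and gets stuck there.
bad_centers, bad_labels = local_descent(points, [(5.0, 0.0), (5.0, 1.0)])
bad_cost = mssc_objective(points, bad_centers, bad_labels)
```

On this toy instance the two starting points converge to different partitions with very different costs, which is the kind of premature convergence a metaheuristic is meant to escape.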
|
172 |
Uma abordagem interativa guiada por semântica para identificação e recuperação de imagens / A semantic guided interactive image retrieval approach
Gonçalves, Filipe Marcel Fernandes [UNESP] 17 August 2016
Previous issue date: 2016-08-17 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
A large volume of images is currently generated in many domains, and identifying and analyzing them requires specialized knowledge. On the one hand, many advances have been made in image retrieval techniques based on visual image properties. However, the semantic gap between low-level image features and high-level concepts remains a major challenge. On the other hand, knowledge in many fields has been structured by ontologies. A promising way to narrow the semantic gap is to combine the information from low-level features with semantic domain knowledge. This work proposes a new approach, Semantic Interactive Image Retrieval (SIIR), which combines Content-Based Image Retrieval (CBIR) and unsupervised learning with knowledge defined in ontologies. The approach aims to simulate the biologist's role in classifying Angiosperm families from an image and its content. To this end, we developed a domain ontology of the structures and properties of flowering and fruiting plants, relating these attributes to the classification of Angiosperm families. Low-level feature extraction methods were used to analyze the visual characteristics of the images. For unsupervised learning, we used the RL-Sim algorithm to improve retrieval effectiveness. The proposed approach combines CBIR techniques with ontologies using a bipartite graph and a discriminative attribute graph. The discriminative attribute graph supports a semantic analysis that selects the attribute that best classifies the plant in the query image. The selected attributes are used to formulate interactions with the user, improving retrieval effectiveness and reducing the effort required to identify the plant. The proposed method was evaluated on the public Oxford Flowers 17 and 102 Classes datasets and achieved high effectiveness on both when compared with other approaches.
|
173 |
Statistiques en grande dimension pour la détection d'anomalies dans les données fonctionnelles issues des satellites / High Dimension Statistics for Space Applications on functional data deriving from satellites
Barreyre, Clementine 18 May 2018
In this PhD thesis, we developed statistical methods to detect abnormal events in the functional data produced by a satellite throughout its lifecycle. The data come from two main phases of the satellite's life: telemetry and test data. A first task was to understand how to highlight outliers through projections onto functional bases. On these projections, we applied several outlier detection methods, such as the One-Class SVM and the Local Outlier Factor (LOF). In addition to these two methods, we developed our own outlier detection method, which takes into account the seasonality of the curves under consideration. Building on this study, we developed an original procedure to automatically select, from a given projection, the most informative coefficients for outlier detection in a semi-supervised framework. Our method is a multiple testing procedure in which a two-sample test is applied to all levels of coefficients. We also analyzed the covariance matrices of the telemetry signals for outlier detection in multivariate data. To this end, we compare the covariance of a group of telemetry signals between two consecutive days or consecutive orbit periods. We applied three statistical tests that address this question from different angles. We also developed an original asymptotic test, inspired by the first two tests. In addition to proving the convergence of this test, we demonstrate through examples that it is in practice the most powerful on the data available to us. This thesis thus tackles several aspects of anomaly detection in functional data from satellites. For each method, we detected all the major anomalies while significantly improving the false discovery rate.
|
174 |
Eficiência energética de redes de sensores sem fio aplicada ao conforto térmico em ambientes fechados
Souza, Thales Ruano Barros de 21 February 2014
Previous issue date: 2014-02-21 / CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Wireless Sensor Networks (WSNs) can be used in applications such as monitoring and studying thermal comfort in indoor environments, owing to their sensing capability, low cost, and rapid deployment. However, because the sensor nodes are battery powered, the cost of network maintenance can quickly exceed the cost of the whole monitoring system. In this work, we propose and evaluate a routing protocol that takes the monitored variables into account when forming groups of sensor nodes, in order to reduce intra-cluster communication. Our solution uses not only location but also the temperature and relative humidity collected by each sensor node to elect cluster heads (CHs). The k-means algorithm, which groups similar instances, was used to cluster sensor nodes with correlated measurements. Experiments show that, compared with other algorithms in the literature, the proposed scheme performs up to 50% better in terms of residual energy and network lifetime. There is therefore a substantial gain in exploiting the correlation structure of the collected data to form groups and to elect CHs.
|
175 |
Collective dynamics in complex networks for machine learning / Dinâmica coletiva em redes complexas para aprendizado de máquina
Filipe Alves Neto Verri 19 March 2018
Machine learning enables machines to learn automatically from data. In the literature, graph-based methods have received increasing attention due to their ability to learn from both local and global information. In these methods, each data instance is represented by a vertex and is linked to other vertices according to a predefined affinity rule. However, they usually have a prohibitive time cost for large problems. To overcome this, techniques can employ a heuristic to find suboptimal solutions in feasible time. Early heuristic optimization methods exploited nature-inspired collective processes, such as ants looking for food sources and swarms of bees. Nowadays, advances in the field of complex systems provide powerful tools to assess and understand dynamical systems. Complex networks, which are graphs with nontrivial topology, are among these theoretical tools and can describe the interplay of topology, structure, and dynamics in complex systems. Machine learning methods based on complex networks and collective dynamics have therefore been proposed. They comprise three steps. First, a complex network is constructed from the input data. Then, the simulation of a distributed collective system on the network generates rich information. Finally, the collected information is used to solve the learning problem. The coordination of the individuals in the system makes it possible to achieve dynamics far more complex than the behavior of any single individual. In this research, I explored collective dynamics in machine learning tasks, in both unsupervised and semi-supervised scenarios. Specifically, I proposed a new collective system of competing particles that shifts the traditional vertex-centric dynamics to a more informative edge-centric one. Moreover, it is the first particle competition system applied to machine learning tasks that has deterministic behavior. Results show several advantages of the edge-centric model, including the ability to acquire more information about overlapping areas, better exploration behavior, and faster convergence. I also proposed a new network formation technique that is not based on similarity and has low computational cost. Since adding and removing samples in the network is cheap, it can be used in real-time applications. Finally, I conducted an analytical investigation of a flocking-like system, which was needed to guarantee the expected behavior in community detection tasks. In conclusion, the results of this research contribute to many areas of machine learning and complex systems.
|
176 |
Modélisation non-supervisée de signaux sociaux / Unsupervised modelisation of social signals
Michelet, Stéphane 10 March 2016
In a social interaction, we adapt our behavior to our interlocutors. Studying and understanding the underlying mechanisms of this adaptation is at the center of Social Signal Processing. The goal of this thesis is to propose methods of study and models for the analysis of social signals in the context of interaction, exploiting both signal processing and pattern recognition techniques. First, an unsupervised method for measuring imitation between two partners, in terms of delay and degree, is proposed using only gestural data. Spatio-temporal interest points are first detected in order to select the most important regions of the videos. They are then described by histograms in order to build bag-of-words models into which spatial information is reintroduced. The degree of imitation and the delay between partners are then estimated continuously through cross-correlation between the two bag-of-words models. The second part of this thesis focuses on the automatic extraction of features that characterize group interactions. After gathering all the features commonly used in the literature, we propose the use of non-negative matrix factorization. Beyond extracting the most relevant features, it also made it possible to group meetings automatically, in an unsupervised manner, into three classes corresponding to the three types of leadership defined by psychologists. Finally, the last part focuses on the unsupervised extraction of features that characterize groups. The relevance of these features, compared with ad hoc features from the state of the art, is then validated in a role recognition task.
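The delay estimate by cross-correlation mentioned in the abstract can be illustrated with a small sketch. This is an assumption-laden toy, not the thesis's pipeline: it works on two one-dimensional activity signals rather than bag-of-words models, and the function names, the mean-centering, and the normalization are illustrative choices.

```python
import math

def cross_correlation(x, y, lag):
    """Sum of x[t] * y[t + lag] over the samples where both signals are defined."""
    return sum(x[t] * y[t + lag] for t in range(len(x)) if 0 <= t + lag < len(y))

def estimate_delay(x, y, max_lag):
    """Return (lag, score): the lag in samples at which y best matches x,
    and the normalized correlation at that lag (a rough 'degree' of imitation)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    xc = [v - mean_x for v in x]  # remove the mean so flat segments do not dominate
    yc = [v - mean_y for v in y]
    norm = math.sqrt(sum(v * v for v in xc) * sum(v * v for v in yc))
    best_lag = max(range(-max_lag, max_lag + 1),
                   key=lambda lag: cross_correlation(xc, yc, lag))
    score = cross_correlation(xc, yc, best_lag) / norm if norm else 0.0
    return best_lag, score

# Toy signals (assumed): the follower repeats the leader's gesture two samples later.
leader = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
follower = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]

delay, degree = estimate_delay(leader, follower, max_lag=4)
```

A positive lag means the second signal trails the first; here the estimated delay is 2 samples. Applying the same computation over a sliding window would give a continuous estimate over time.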
|
177 |
GCC-NMF : séparation et rehaussement de la parole en temps-réel à faible latence / GCC-NMF: low latency real-time speech separation and enhancement
Wood, Sean January 2017
The cocktail party phenomenon refers to our remarkable ability to focus on a single voice in noisy environments. In this thesis, we design, implement, and evaluate a computational approach to solving this problem named GCC-NMF. GCC-NMF combines unsupervised machine learning via non-negative matrix factorization (NMF) with the generalized cross-correlation (GCC) spatial localization method. Individual NMF dictionary atoms are attributed to the target speaker or background interference at each point in time based on their estimated spatial locations. We begin by studying GCC-NMF in the offline context, where entire 10-second mixtures are treated at once. We then develop an online, real-time variant of GCC-NMF and subsequently reduce its inherent algorithmic latency from 64 ms to 2 ms with an asymmetric short-time Fourier transform (STFT) windowing method. We show that latencies as low as 6 ms, within the range of tolerable delays for hearing aids, are possible on current embedded hardware platforms.
We evaluate the performance of GCC-NMF on publicly available data from the Signal Separation Evaluation Campaign (SiSEC), where objective separation quality is quantified using the signal-to-noise ratio (SNR)-based BSS Eval and perceptually motivated PEASS toolboxes. Though offline GCC-NMF underperformed other methods from the SiSEC challenge in terms of the SNR-based metrics, its PEASS scores were comparable with the best results. In the case of online GCC-NMF, while SNR-based metrics again favoured other methods, GCC-NMF outperformed all but one of the previous approaches in terms of overall PEASS scores, achieving results comparable to the ideal binary mask (IBM) baseline. Furthermore, we show that GCC-NMF increases objective speech quality and the STOI and ESTOI speech intelligibility metrics over a wide range of input SNRs from -30 dB to 20 dB, with only minor reductions for input SNRs greater than 20 dB.
GCC-NMF exhibits a number of desirable characteristics when compared with existing approaches. Unlike computational auditory scene analysis (CASA) methods, GCC-NMF requires no prior knowledge about the nature of the input signals, and may thus be suitable for source separation and denoising applications in a wide range of fields. In the case of online GCC-NMF, only a small amount of unlabeled data is required to pre-train the NMF dictionary. This results in much greater flexibility and significantly faster training when compared to supervised approaches, including NMF and deep neural network-based solutions that rely on large, labeled datasets. Finally, in contrast with blind source separation (BSS) methods that rely on accumulated signal statistics, GCC-NMF operates independently for each time frame, allowing for low-latency, real-time applications.
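As background for the NMF component named in the abstract, the sketch below factorizes a small non-negative matrix V into a dictionary W and activations H using the standard multiplicative updates for the squared-error cost. It is only a generic NMF illustration in plain Python: the GCC localization step, the attribution of atoms to speakers, and the STFT front end are not shown, and the matrix, rank, and function names are assumptions.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF; W (dictionary atoms as columns) and H stay non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Toy "spectrogram" (assumed): two spectral patterns active at different times.
v = [[1.0, 1.0, 0.0, 0.0],
     [2.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 3.0],
     [0.0, 0.0, 1.0, 1.0]]

w0, h0 = nmf(v, rank=2, iterations=0)  # random initialization only
w, h = nmf(v, rank=2)                  # same initialization, then 200 updates
initial_error = frobenius_error(v, w0, h0)
final_error = frobenius_error(v, w, h)
```

Because no labels are involved, the dictionary is learned in an unsupervised way; in a GCC-NMF-style system each learned atom would then be assigned to the target or to interference by a separate spatial criterion.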
|
178 |
機器學習分類方法DCG 與其他方法比較(以紅酒為例) / A supervised learning study of comparison between DCG tree and other machine learning methods in a wine quality dataset
楊俊隆, Yang, Jiun Lung Unknown Date
Machine learning has become a popular topic with the arrival of the big data era. Machine learning algorithms are usually categorized as supervised or unsupervised, that is, as classification or clustering methods. In this study, we first review common classification and clustering algorithms, together with their advantages, disadvantages, and limiting assumptions. Next, we introduce the data cloud geometry tree (DCG-tree) algorithm and its weighted variant (WDCG), and describe the steps involved. We extend the idea of WDCG to the case of three class labels. The distance matrix is weighted using the fitted results of logistic regression. Lastly, using a real wine dataset that contains categorical variables, we cluster first and then classify, and compare the prediction accuracy of WDCG with that of traditional classification methods. The study showed that an unsupervised learning algorithm combined with logistic regression as a classifier performs better than the traditional classification methods alone.
|
179 |
Computer aided identification of biological specimens using self-organizing maps
Dean, Eileen J 12 January 2011
For scientific or socio-economic reasons it is often necessary or desirable that biological material be identified. Given that there are an estimated 10 million living organisms on Earth, the identification of biological material can be problematic. Consequently the services of taxonomist specialists are often required. However, if such expertise is not readily available it is necessary to attempt an identification using an alternative method. Some of these alternative methods are unsatisfactory or can lead to a wrong identification. One of the most common problems encountered when identifying specimens is that important diagnostic features are often not easily observed, or may even be completely absent. A number of techniques can be used to try to overcome this problem, one of which, the Self Organizing Map (or SOM), is a particularly appealing technique because of its ability to handle missing data. This thesis explores the use of SOMs as a technique for the identification of indigenous trees of the Acacia species in KwaZulu-Natal, South Africa. The ability of the SOM technique to perform exploratory data analysis through data clustering is utilized and assessed, as is its usefulness for visualizing the results of the analysis of numerical, multivariate botanical data sets. The SOM’s ability to investigate, discover and interpret relationships within these data sets is examined, and the technique’s ability to identify tree species successfully is tested. These data sets are also tested using the C5 and CN2 classification techniques. Results from both these techniques are compared with the results obtained by using a SOM commercial package. These results indicate that the application of the SOM to the problem of biological identification could provide the start of the long-awaited breakthrough in computerized identification that biologists have eagerly been seeking. / Dissertation (MSc)--University of Pretoria, 2011. / Computer Science / unrestricted
|
180 |
Učení bez učitele / Unsupervised learning
Kantor, Jan January 2008
The purpose of this work is to describe techniques commonly used for cluster analysis in unsupervised learning. The thesis consists of two parts. The first part covers the theory of several algorithms, describing the advantages and disadvantages of each method discussed, and the validation of cluster quality. It also reviews the many ways of estimating and computing clustering quality based on internal and external knowledge. A good technique for validating clustering quality is one of the most important parts of cluster analysis. The second part deals with the implementation of different clustering techniques and programs on real datasets, and compares the results with the true dataset partitioning and with published related work.
|