  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Diagnosis of nonlinear systems using kernel Principal Component Analysis

Anani, Kwami Dodzivi 21 March 2019 (has links)
In this thesis, the diagnosis of a nonlinear system was performed using data analysis. Initially designed for data linked by linear relations, Principal Component Analysis (PCA) is coupled with kernel methods to detect, isolate and estimate the magnitude of faults in nonlinear systems. Kernel PCA consists in projecting the data, through a nonlinear mapping, into a higher-dimensional space called the feature space, where linear PCA is applied. Because the projection is carried out with kernels, detection can easily be performed in the feature space. Estimating the magnitude of a fault, however, requires solving a nonlinear optimization problem. A contribution analysis makes it possible to isolate faults and estimate their magnitudes: the variable with the largest contribution is the one most likely to be affected by a fault. In our work, we proposed new methods for the isolation and estimation phases, for which existing work has limitations. The new proposed method is based on contributions under constraints, yielding a sparse reconstruction of the variables. The effectiveness of the proposed methods is illustrated on a simulated continuous stirred-tank reactor (CSTR).
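The kernel PCA procedure this abstract describes — map the data into a feature space via a kernel, then run linear PCA there — can be sketched in a few lines. This is a generic numpy illustration, not code from the thesis; the RBF kernel and its `gamma` value are assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=1.0):
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    # Centre the kernel matrix, i.e. centre the data in feature space
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)          # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]   # reorder to descending
    # Normalise eigenvectors so projections have unit-norm principal axes
    alphas = vecs[:, :n_components] / np.sqrt(np.maximum(vals[:n_components], 1e-12))
    return Kc @ alphas                       # scores of the training points

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
scores = kernel_pca(X, n_components=2)
print(scores.shape)  # (50, 2)
```

Fault detection would then monitor a statistic (e.g. the squared prediction error) over these feature-space scores; that part is omitted here.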
52

A distributed kernel summation framework for machine learning and scientific applications

Lee, Dong Ryeol 11 May 2012 (has links)
The computational problems I consider in this thesis share the common trait of requiring consideration of pairs (or higher-order tuples) of data points. I focus on the kernel summation operations ubiquitous in many data mining and scientific algorithms. In machine learning, kernel summations appear in popular kernel methods which can model nonlinear structures in data; these include many non-parametric methods such as kernel density estimation, kernel regression, Gaussian process regression, kernel PCA, and kernel support vector machines (SVM). In computational physics, kernel summations occur inside the classical N-body problem for simulating the positions of a set of celestial bodies or atoms. This thesis attempts to marry, for the first time, the best relevant techniques in parallel computing, where kernel summations arise in low dimensions, with the best general-dimension algorithms from the machine learning literature. We provide a unified, efficient parallel kernel summation framework that can utilize: (1) various types of deterministic and probabilistic approximations suitable for both low- and high-dimensional problems with a large number of data points; (2) indexing the data using any multi-dimensional binary tree with both distributed-memory (MPI) and shared-memory (OpenMP/Intel TBB) parallelism; (3) a dynamic load balancing scheme to adjust work imbalances during the computation. I first summarize my previous research in serial kernel summation algorithms, which started from Greengard and Rokhlin's earlier work on fast multipole methods for approximating the potential sums of many particles. The contributions of this part of the thesis include: (1) a reinterpretation of Greengard and Rokhlin's work for the computer science community; (2) the extension of the algorithms to a larger class of approximation strategies, i.e. probabilistic error bounds via Monte Carlo techniques; (3) the multibody series expansion, a generalization of the theory of fast multipole methods to handle interactions of more than two entities; (4) the first O(N) proof of the batch approximate kernel summation using a notion of intrinsic dimensionality. I then move on to parallelizing the kernel summations and to tackling the scaling of two other kernel methods: Gaussian process regression (kernel matrix inversion) and kernel PCA (kernel matrix eigendecomposition). The software artifact of this thesis contributed to an open-source machine learning package called MLPACK, first demonstrated at NIPS 2008 and subsequently at the NIPS 2011 Big Learning Workshop. Completing a portion of this thesis involved the use of high-performance computing resources at XSEDE (eXtreme Science and Engineering Discovery Environment) and NERSC (National Energy Research Scientific Computing Center).
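The kernel summation at the heart of this thesis is, in its naive form, an O(N²) operation: for each point, sum a kernel value against every other point. A minimal numpy sketch of that baseline — the quantity the tree-based and Monte Carlo approximations above are designed to speed up; the Gaussian kernel and unit weights are illustrative assumptions:

```python
import numpy as np

def naive_kernel_sum(X, q, gamma=0.5):
    # S_i = sum_j exp(-gamma * ||x_i - x_j||^2) * q_j
    # The O(N^2) direct evaluation that fast multipole / tree methods approximate.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2) @ q

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))   # 200 points in 3 dimensions
q = np.ones(200)                # per-point weights (charges / masses)
S = naive_kernel_sum(X, q)
print(S.shape)  # (200,)
```

With unit weights and a Gaussian kernel this is an unnormalised kernel density estimate evaluated at every data point; swapping the kernel gives N-body potentials and the other summations listed above.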
53

Kernel Methods for Genes and Networks to Study Genome-Wide Associations of Lung Cancer and Rheumatoid Arthritis

Freytag, Saskia 08 January 2014 (has links)
No description available.
54

Image Analysis Applications of the Maximum Mean Discrepancy Distance Measure

Diu, Michael January 2013 (has links)
The need to quantify the distance between two groups of objects is prevalent throughout the signal processing world. The difference of group means computed using the Euclidean (L2) distance is one of the predominant distance measures used to compare feature vectors and groups of vectors, but many problems arise with it in high-dimensional data. Maximum mean discrepancy (MMD) is a recent unsupervised kernel-based pattern recognition method which, when paired with the proper feature representations and kernels, may differentiate between two distinct populations better than many commonly used methods such as the difference of means. MMD-based distance computation combines powerful concepts from the machine learning literature, such as similarity measures that leverage the data distribution, and kernel methods. Due to this heritage, we posit that dissimilarity-based classification and changepoint detection using MMD can lead to enhanced separation between different populations. To test this hypothesis, we conduct studies comparing MMD and the difference of means in two subareas of image analysis and understanding: first, detecting scene changes in video in an unsupervised manner, and second, in the biomedical imaging field, using clinical ultrasound to assess tumor response to treatment. We leverage effective computer vision descriptors, such as bag-of-visual-words and sparse combinations of SIFT descriptors, and choose from an assessment of several similarity kernels (e.g. Histogram Intersection, Radial Basis Function) to engineer useful systems using MMD. Promising improvements over the difference of means, measured primarily by precision/recall for scene change detection and k-nearest-neighbour classification accuracy for tumor response assessment, are obtained in both applications.
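The unbiased estimate of squared MMD between two samples — the statistic compared against the difference of means above — has a standard closed form. A sketch in numpy with an assumed RBF kernel (not the descriptors or kernels used in the thesis):

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2_unbiased(X, Y, gamma=1.0):
    # Unbiased estimate of MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = rbf(X, X, gamma), rbf(Y, Y, gamma), rbf(X, Y, gamma)
    # Drop diagonal (self-similarity) terms for the within-sample averages
    sx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    sy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return sx + sy - 2.0 * Kxy.mean()

rng = np.random.default_rng(2)
same = mmd2_unbiased(rng.normal(size=(100, 2)), rng.normal(size=(100, 2)))
diff = mmd2_unbiased(rng.normal(size=(100, 2)), rng.normal(loc=2.0, size=(100, 2)))
print(same < diff)  # shifted distributions give a larger discrepancy
```

In a scene-change or treatment-response setting, `X` and `Y` would hold feature vectors (e.g. bag-of-visual-words histograms) from the two groups being compared.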
55

Online support vector algorithms for classical and orthogonal regression

Souza, Roberto Carlos Soares Nalon Pereira 21 February 2013 (has links)
In this work, we introduce a new formulation for orthogonal regression. The problem is defined as the minimization of the empirical risk with respect to a tube loss function developed for orthogonal regression, named ρ-insensitive. The method is constructed via a stochastic gradient descent approach. The algorithm can be used in primal or dual variables; the latter formulation allows the introduction of kernels and soft margins. To the best of our knowledge, this is the first method that allows the introduction of kernels, via the so-called "kernel trick", for orthogonal regression. We also present an algorithm to solve the classical regression problem using the ε-insensitive loss function, together with a convergence proof that guarantees a finite number of updates. In addition, an incremental strategy is introduced, which can be coupled with both algorithms to obtain sparse solutions and also an approximation to the "minimal tube" containing the data. Numerical experiments are shown and the results compared with other methods from the literature.
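The ε-insensitive loss with stochastic gradient descent that this abstract describes for classical regression can be illustrated with a primal, linear version (the thesis also covers dual/kernelized and orthogonal variants; the learning rate, regularization strength, and toy data below are assumptions):

```python
import numpy as np

def epsilon_insensitive_sgd(X, y, eps=0.1, lr=0.01, lam=1e-3, epochs=50, seed=0):
    # Stochastic subgradient descent on the per-sample objective
    #   max(0, |y_i - w.x_i - b| - eps) + (lam/2) ||w||^2
    # Points inside the eps-tube trigger no update (the source of sparsity).
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            r = y[i] - (X[i] @ w + b)
            if abs(r) > eps:              # outside the tube: correct the model
                s = np.sign(r)
                w += lr * (s * X[i] - lam * w)
                b += lr * s
    return w, b

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * X[:, 0] + 0.5                   # noiseless linear target
w, b = epsilon_insensitive_sgd(X, y)
print(w[0], b)                            # approaches slope 2, intercept 0.5
```

Replacing the inner product `X[i] @ w` with a kernel expansion over support vectors gives the dual, kernelized variant the abstract refers to.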
56

Learning discriminative models from structured multi-sensor data for human context recognition

Suutala, J. (Jaakko) 17 June 2012 (has links)
In this work, statistical machine learning and pattern recognition methods were developed and applied to sensor-based human context recognition. More precisely, we concentrated on an effective discriminative learning framework, where the input-output mapping is learned directly from a labeled dataset. Non-parametric discriminative classification and regression models based on kernel methods were applied, including support vector machines (SVM) and Gaussian processes (GP), which play a central role in modern statistical machine learning. Based on these established models, we propose various extensions for handling the structured data that usually arises in real-life applications, for example in the field of context-aware computing. We applied both SVM and GP techniques to data with multiple classes in a structured multi-sensor domain. Moreover, a framework for combining data from several sources in this setting was developed using multiple classifiers and fusion rules, with kernel methods as base classifiers. We developed two novel methods for handling sequential input and output data: for sequential time-series data, a novel kernel based on a graph representation, called the weighted walk-based graph kernel (WWGK); for sequential output labels, discriminative temporal smoothing (DTS). Again, the proposed algorithms are modular, so different kernel classifiers can be used as base models. Finally, we propose a group of techniques based on Gaussian process regression (GPR) and particle filtering (PF) to learn to track multiple targets. We applied the proposed methodology to three human-motion-based context recognition applications: person identification, person tracking, and activity recognition, where floor sensors (pressure-sensitive and binary-switch) and wearable acceleration sensors are used to measure human motion and gait during walking and other activities. Furthermore, we extracted a useful set of application-specific high-level features from raw sensor measurements in the time, frequency, and spatial domains. As a result, we developed practical extensions to kernel-based discriminative learning for many kinds of structured data applied to human context recognition.
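Gaussian process regression, one of the two kernel models at the core of the framework above, has a closed-form posterior. A minimal numpy sketch (the RBF kernel, noise level, and sine-wave data are illustrative assumptions, unrelated to the floor-sensor setup):

```python
import numpy as np

def gp_posterior(X, y, Xs, gamma=10.0, noise=1e-2):
    # Closed-form GP regression posterior with an RBF kernel:
    #   mean = K* (K + sigma^2 I)^{-1} y,  cov = K** - K* (K + sigma^2 I)^{-1} K*^T
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks, Kss = k(Xs, X), k(Xs, Xs)
    alpha = np.linalg.solve(K, y)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

X = np.linspace(0, 1, 20)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
mean, var = gp_posterior(X, y, X)
print(float(np.abs(mean - y).max()))  # small: the posterior mean fits the data
```

In the tracking application described above, a model like this would map sensor observations to target positions, with the particle filter handling the temporal dynamics.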
57

A multi-source perspective on inter-subject learning: Contributions to neuroimaging

Takerkart, Sylvain 24 September 2015 (has links)
Inter-subject learning consists in giving predictions on data from a subject not present in the training database, as in computer-aided diagnosis, where the computer has to guess whether an unknown individual is healthy or sick. In this thesis, we argue that inter-subject learning should be handled in the multi-source framework, where each subject is a different source of data. We then introduce three original contributions for neuroimaging applications. The first is a method for inter-subject prediction on fMRI data. Because of inter-subject variability, the original feature spaces are all different. Using graphs and a graph kernel, the input patterns are implicitly projected into a common reproducing kernel Hilbert space. We show the effectiveness of this method on tonotopy data recorded in the auditory cortex. The second is a cortical morphometry method. We build graphs from the deepest points of the cortical sulci and project them into a common space using a graph kernel. A spatial inference method is then proposed to detect the cortical zones where populations differ. Using this method, we study cortical asymmetries and gender differences. The third contribution is a multi-source domain adaptation technique: an extension of kernel mean matching to the case where the training set comprises several sources of data. We present preliminary results on an inter-subject prediction task used to analyse data from a magneto-encephalography experiment.
58

Forecasting hourly electricity consumption for sets of households using machine learning algorithms

Linton, Thomas January 2015 (has links)
To address inefficiency, waste, and the negative consequences of electricity generation, companies and government entities are looking to behavioural change among residential consumers. To drive behavioural change, consumers need better feedback about their electricity consumption: a monthly or quarterly bill provides almost no useful information about the relationship between their behaviours and their consumption. Smart meters are now widely deployed in developed countries and are capable of providing electricity consumption readings at an hourly resolution, but this data is mostly used as a basis for billing, not as a tool to assist consumers in reducing their consumption. One component required to deliver innovative feedback mechanisms is the capability to forecast hourly electricity consumption at the household scale. This thesis evaluates the effectiveness of a selection of kernel-based machine learning methods at forecasting the hourly aggregate electricity consumption of different-sized sets of households. The work demonstrates that k-Nearest Neighbour Regression and Gaussian Process Regression are the most accurate methods within the constraints of the problem considered. In addition to accuracy, the advantages and disadvantages of each machine learning method are evaluated, and a simple comparison of each algorithm's computational performance is made.
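k-Nearest Neighbour Regression, one of the two methods the thesis found most accurate, is straightforward to sketch for hourly load data. The toy below is an assumption-laden illustration (cyclic hour-of-day features and a synthetic sinusoidal load), not the thesis's actual feature set:

```python
import numpy as np

def knn_forecast(train_X, train_y, query, k=5):
    # Predict load as the mean of the k historically most similar feature vectors
    d2 = ((train_X - query) ** 2).sum(axis=1)
    nearest = np.argsort(d2)[:k]
    return train_y[nearest].mean()

# Synthetic history: daily sinusoidal load pattern plus noise
rng = np.random.default_rng(4)
hours = rng.integers(0, 24, size=500)
load = 10 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.5, 500)

# Encode hour-of-day cyclically so hour 23 is close to hour 0
X = np.column_stack([np.sin(2 * np.pi * hours / 24), np.cos(2 * np.pi * hours / 24)])
y_hat = knn_forecast(X, load, np.array([np.sin(0.0), np.cos(0.0)]))  # forecast hour 0
print(y_hat)  # close to the mean load at hour 0 (about 10 here)
```

A real forecaster would extend the feature vector with lagged consumption, calendar, and weather variables; the nearest-neighbour averaging step stays the same.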
59

Nonlinear System Identification with Kernels: Applications of Derivatives in Reproducing Kernel Hilbert Spaces

Bhujwalla, Yusuf 05 December 2017 (has links)
This thesis focuses exclusively on the application of kernel-based nonparametric methods to nonlinear identification problems. As with other nonlinear methods, two key questions in kernel-based identification are how to define a nonlinear model (kernel selection) and how to tune the complexity of the model (regularisation). The following chapter discusses how these questions are usually dealt with in the literature. The principal contribution of this thesis is the presentation and investigation of two optimisation criteria (one existing in the literature and one novel proposition) for structural approximation and complexity tuning in kernel-based nonlinear system identification. Both methods are based on the idea of incorporating feature-based complexity constraints into the optimisation criterion by penalising derivatives of functions. Essentially, such methods offer the user flexibility in the definition of a kernel function and the choice of regularisation term, which opens new possibilities for how nonlinear models can be estimated in practice. Both methods bear strong links with other methods from the literature, which are examined in detail in Chapters 2 and 3 and form the basis of the subsequent developments of the thesis. Whilst analogies are made with parallel frameworks, the discussion is rooted in the framework of Reproducing Kernel Hilbert Spaces (RKHS), which allows the methods presented to be analysed from both a theoretical and a practical point of view. Furthermore, the methods developed are applied to several identification case studies, comprising both simulation and real-data examples, notably:
  • Structural detection in static nonlinear systems.
  • Controlling smoothness in LPV models.
  • Complexity tuning using structural penalties in NARX systems.
  • Internet traffic modelling using kernel methods.
60

Supervised machine learning models using kernel methods, probability measures and fuzzy sets

Guevara Díaz, Jorge Luis 04 May 2015 (has links)
This thesis proposes a methodology based on kernel methods, probability measures and fuzzy sets to analyze datasets whose individual observations are themselves sets of points rather than individual points. Fuzzy sets and probability measures are used to model the observations, and kernel methods to analyze the data: fuzzy sets when the observation contains imprecise, vague or linguistic values; probability measures when the observation is given as a set of multidimensional points in a D-dimensional Euclidean space. Thanks to kernels defined on probability measures, or on fuzzy sets, these objects are implicitly mapped into reproducing kernel Hilbert spaces, where the analysis can be carried out with any kernel method. Using this methodology, it is possible to address a wide range of machine learning problems for such datasets. In particular, this work presents data description models for observations modeled by probability measures, obtained by embedding the measures into Hilbert spaces and constructing minimum enclosing balls there; these models are used as one-class classifiers and applied to the group anomaly detection task. This work also proposes a new class of kernels, the kernels on fuzzy sets, which are reproducing kernels able to map fuzzy sets into geometric feature spaces and act as similarity measures between fuzzy sets. We give everything from basic definitions to applications of these kernels in machine learning problems such as supervised classification on interval data and a kernel two-sample test for data with imprecise attributes. Potential applications include machine learning and pattern recognition tasks over fuzzy data, and computational tasks requiring the estimation of a similarity measure between fuzzy sets.
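A simple kernel between probability measures, each represented by a sample of points, is the inner product of their empirical kernel mean embeddings — the kind of construction used above to embed measures into an RKHS. A hedged numpy sketch (the RBF base kernel and toy Gaussian samples are assumptions):

```python
import numpy as np

def mean_embedding_kernel(P, Q, gamma=1.0):
    # K(P, Q) = mean over x in P, y in Q of k(x, y): the inner product of the
    # empirical kernel mean embeddings of the two sample sets in the RKHS.
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2).mean()

rng = np.random.default_rng(5)
A = rng.normal(size=(60, 2))            # sample from one distribution
B = rng.normal(size=(60, 2))            # another sample from the same one
C = rng.normal(loc=3.0, size=(60, 2))   # sample from a shifted distribution
print(mean_embedding_kernel(A, B) > mean_embedding_kernel(A, C))  # True
```

Feeding this kernel matrix (computed between all pairs of observations, each observation being a set of points) into an SVM or a minimum-enclosing-ball model yields classifiers and one-class detectors over set-valued data.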
