371

多項分配之分類方法比較與實證研究 / An empirical study of classification on multinomial data

高靖翔, Kao, Ching Hsiang Unknown Date (has links)
With the rapid development of computer technology, the World Wide Web (WWW) has made sharing and searching for information far more convenient, and search engines have become the main tool for locating it; the best-known example, Google, was built on its search engine. Web search usually relies on characteristics of each page, entropy being one of the most commonly used indices: given the keywords a user selects, the engine returns the pages most similar to the query, that is, the pages with the highest value of a similarity index function. Classification by similarity indices is also common in biology and ecology, but there the similarity is usually computed between two communities to decide whether they are alike, which differs from a search engine's use of an index computed on a single community. The goal of this thesis is to determine, for data following a multinomial distribution, in particular a geometric-like multinomial (an assumption satisfied by many ecological communities), whether single-community indices or pairwise similarity indices achieve better classification accuracy. The indices considered include the entropy and Simpson index of a single community, the pairwise entropy and similarity index between two communities (Yue and Clayton, 2005), the support vector machine, and logistic regression; the methods are compared through computer simulation and cross-validation. The single-community entropy index performs well in the simulation studies, often even better than the support vector machine, but its results are unstable, which is the main drawback of that classification method.
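For illustration, a minimal Python sketch of the two single-community indices compared above (Shannon entropy and the Simpson index) on a simulated geometric-like multinomial sample; the parameter values and function names are illustrative and not taken from the thesis:

```python
import numpy as np

def shannon_entropy(counts):
    """Shannon entropy of a community's category-count vector."""
    p = counts / counts.sum()
    p = p[p > 0]                        # skip empty categories
    return -np.sum(p * np.log(p))

def simpson_index(counts):
    """Simpson diversity index: probability that two random draws differ."""
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# Simulate a multinomial sample whose cell probabilities decay geometrically,
# the kind of community structure the thesis assumes.
rng = np.random.default_rng(0)
probs = 0.4 * 0.6 ** np.arange(10)
probs /= probs.sum()
sample = rng.multinomial(n=200, pvals=probs)

print(shannon_entropy(sample), simpson_index(sample))
```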
372

應用共變異矩陣描述子及半監督式學習於行人偵測 / Semi-supervised learning for pedestrian detection with covariance matrix feature

黃靈威, Huang, Ling Wei Unknown Date (has links)
Pedestrian detection is a highly challenging problem in object detection: human poses and clothing vary widely, and illumination conditions differ greatly, which makes recognition considerably harder. In this thesis we employ the covariance matrix descriptor together with an on-line learning classifier that combines a naïve Bayes classifier with a cascade support vector machine (SVM) to improve the precision and recall of pedestrian detection in still images. Experimental results show that the proposed on-line learning strategy improves precision and recall by about 14% on data sets where detection conditions are difficult. Moreover, even under the same initial training conditions, the method outperforms HOG with AdaBoost on the USC Pedestrian Detection Test Set, the INRIA Person dataset, and the Penn-Fudan Database for Pedestrian Detection and Segmentation.
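For illustration only, a rough sketch of a region covariance descriptor of the kind used above, assuming a simple per-pixel feature vector (position, intensity, gradient magnitudes); this is not the thesis's implementation:

```python
import numpy as np

def covariance_descriptor(patch):
    """Region covariance descriptor of a grayscale patch.

    Per-pixel features: (x, y, intensity, |dI/dx|, |dI/dy|); the descriptor
    is the 5x5 covariance matrix of these features over the region.
    """
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gy, gx = np.gradient(patch.astype(float))
    feats = np.stack([xs.ravel(), ys.ravel(), patch.ravel(),
                      np.abs(gx).ravel(), np.abs(gy).ravel()], axis=0)
    return np.cov(feats)                 # symmetric positive semi-definite

# Example on a random patch standing in for a candidate detection window.
rng = np.random.default_rng(1)
window = rng.random((64, 32))
print(covariance_descriptor(window).shape)   # (5, 5)
```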
373

Décoder la localisation de l'attention visuelle spatiale grâce au signal EEG / Decoding the location of visuospatial attention from the EEG signal

Thiery, Thomas 09 1900 (has links)
Visuospatial attention can be deployed to different locations in space independently of ocular fixation, and studies have shown that event-related potential (ERP) components can effectively index whether such covert visuospatial attention is deployed to the left or right visual field. However, it is not clear whether a more precise spatial localization of the focus of attention can be obtained from the EEG signal during central fixation. In this study, we used a modified Posner cueing task with an endogenous cue to determine the degree to which information in the EEG signal can be used to track visuospatial attention in presentation sequences lasting 200 ms. We used a machine learning classification method to evaluate how well EEG signals discriminate between four different locations of the focus of attention, employing a multi-class support vector machine (SVM) within a leave-one-out cross-validation framework to evaluate decoding accuracy (DA), i.e., the percentage of correct predictions of the known spatial location of attention. ERP-based features from occipital and parietal regions yielded a statistically significant prediction of the location of the focus of visuospatial attention (DA = 57%, p < .001, chance level 25%). The mean distance between the predicted and true focus of attention was 0.62 letter positions, corresponding to a mean error of 0.55 degrees of visual angle. In addition, ERP responses successfully predicted whether or not spatial attention was allocated to a given location, with an accuracy of 79% (p < .001). These findings are discussed in terms of their implications for decoding visuospatial attention, and future directions for research are proposed.
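A minimal sketch of the decoding pipeline described above, a multi-class linear SVM evaluated with leave-one-out cross-validation, using synthetic data in place of the actual ERP features; the feature dimensions and SVM settings are assumptions, not those of the study:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for ERP features (trials x flattened channel/time values)
# with labels coding four possible locations of the attentional focus.
rng = np.random.default_rng(2)
n_trials, n_features = 120, 64
X = rng.normal(size=(n_trials, n_features))
y = rng.integers(0, 4, size=n_trials)        # 4 locations, 25% chance level

clf = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1.0))
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print(f"decoding accuracy: {scores.mean():.2%}")   # ~chance on random data
```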
374

PROCESSING AND CLASSIFICATION OF PHYSIOLOGICAL SIGNALS USING WAVELET TRANSFORM AND MACHINE LEARNING ALGORITHMS

Bsoul, Abed Al-Raoof 27 April 2011 (has links)
Over the last century, physiological signals have been broadly analyzed and processed not only to assess the function of the human physiology, but also to better diagnose illnesses or injuries and provide treatment options for patients. In particular, the electrocardiogram (ECG), blood pressure (BP) and impedance are among the most important biomedical signals processed and analyzed. The majority of studies that utilize these signals attempt to diagnose important irregularities, such as arrhythmia or blood loss, by processing one of these signals; however, the relationship between them has not yet been fully studied using computational methods. Therefore, a system that extracts and combines features from all physiological signals representative of states such as arrhythmia and loss of blood volume, in order to predict the presence and severity of such complications, is of paramount importance for caregivers. This will not only enhance diagnostic methods but also enable physicians to make more accurate decisions, thereby significantly improving the overall quality of care provided to patients. In the first part of the dissertation, the analysis and processing of the ECG signal to detect its most important waves, i.e. the P, QRS, and T waves, are described. A wavelet-based method is implemented to facilitate and enhance the detection process. The method not only provides high detection accuracy, but is also efficient with respect to memory and execution time. In addition, the method is robust against noise and baseline drift, as supported by the results. The second part outlines a method that extracts features from the ECG signal in order to classify and predict the severity of arrhythmia. Arrhythmia can be life-threatening or benign. Several methods exist to detect abnormal heartbeats, but a clear criterion for identifying whether a detected arrhythmia is malignant or benign is still an open problem; the method discussed in this dissertation presents a novel solution to this important issue. In the third part, a classification model that predicts the severity of loss of blood volume by incorporating multiple physiological signals is elaborated. The features are extracted in the time and frequency domains after transforming the signals with the wavelet transform (WT). The results support the desirable reliability and accuracy of the system.
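As a hedged illustration of the wavelet-based detection idea (a generic sketch rather than the dissertation's algorithm), one can reconstruct the ECG from mid-scale wavelet details, where most of the QRS energy lies, and pick peaks of the squared reconstruction:

```python
import numpy as np
import pywt
from scipy.signal import find_peaks

def detect_qrs(ecg, fs):
    """Rough R-peak detector: keep only mid-scale wavelet details (cD4, cD3),
    reconstruct, square, and pick peaks with a ~250 ms refractory period."""
    coeffs = pywt.wavedec(ecg, 'db4', level=5)       # [cA5, cD5, ..., cD1]
    kept = [np.zeros_like(c) for c in coeffs]
    kept[2], kept[3] = coeffs[2], coeffs[3]
    rec = pywt.waverec(kept, 'db4')[:len(ecg)]
    energy = rec ** 2
    peaks, _ = find_peaks(energy, height=0.3 * energy.max(),
                          distance=int(0.25 * fs))
    return peaks

# Synthetic ECG-like signal: one spike per second plus noise and baseline drift.
fs = 360
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
ecg = np.zeros_like(t)
ecg[np.arange(0, len(t), fs)] = 1.0
ecg += 0.05 * rng.standard_normal(len(t)) + 0.1 * np.sin(2 * np.pi * 0.3 * t)
print(detect_qrs(ecg, fs))
```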
375

Classification of Carpiodes Using Fourier Descriptors: A Content Based Image Retrieval Approach

Trahan, Patrick 06 August 2009 (has links)
Taxonomic classification has always been important to the study of any biological system. At the current rate of classification, many biological species will go unclassified and become lost forever. The current state of computer technology makes image storage and retrieval possible on a global level; as a result, computer-aided taxonomy is now feasible. Content-based image retrieval techniques utilize visual features of the image for classification. By utilizing image content and computer technology, the gap between taxonomic classification and species destruction is shrinking. This content-based study utilizes the Fourier descriptors of fifteen known landmark features on three Carpiodes species: C. carpio, C. velifer, and C. cyprinus. Classification analysis involves both unsupervised and supervised machine learning algorithms. Fourier descriptors of the fifteen known landmarks provide strong classification power on image data. Feature reduction analysis indicates that the feature set can be reduced, which proves useful for increasing the generalization power of the classification.
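A minimal sketch of how Fourier descriptors can be computed from an ordered set of landmark coordinates; the landmark set and normalization choices below are illustrative assumptions, not those of the study:

```python
import numpy as np

def fourier_descriptors(landmarks, n_keep=8):
    """Translation-, scale- and rotation-invariant Fourier descriptors of an
    ordered set of 2-D landmark points (the generic recipe, as a sketch)."""
    z = landmarks[:, 0] + 1j * landmarks[:, 1]   # complex boundary signal
    F = np.fft.fft(z)
    F[0] = 0                      # drop the DC term -> translation invariance
    mags = np.abs(F)              # keep magnitudes  -> rotation invariance
    mags = mags / mags[1]         # normalise        -> scale invariance
    return mags[1:1 + n_keep]

# Fifteen landmarks roughly outlining an ellipse, standing in for the
# landmark features digitised on each specimen.
theta = np.linspace(0, 2 * np.pi, 15, endpoint=False)
pts = np.stack([2.0 * np.cos(theta), 1.0 * np.sin(theta)], axis=1)
print(fourier_descriptors(pts))
```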
376

Bearing Diagnosis Using Fault Signal Enhancing Techniques and Data-driven Classification

Lembke, Benjamin January 2019 (has links)
Rolling element bearings are a vital part of much rotating machinery, including vehicles. A defective bearing can be a symptom of other problems in the machinery and is itself subject to a high failure rate. Early detection of bearing defects can therefore help to prevent malfunctions which could ultimately lead to a total collapse. The thesis was carried out in collaboration with Scania, which wants a better understanding of how external sensors, such as accelerometers, can be used for condition monitoring in their gearboxes. Defective bearings create vibrations with specific frequencies, known as Bearing Characteristic Frequencies, BCF [23]. A key component of the proposed method is the identification and extraction of these frequencies from vibration signals recorded by accelerometers mounted near the monitored bearing. Three solutions are proposed for automatic bearing fault detection: two are based on data-driven classification using a set of machine learning methods called Support Vector Machines, and one uses only the computed characteristic frequencies of the considered bearing faults. Two types of features are developed as inputs to the data-driven classifiers: one is based on the extracted amplitudes of the BCF, and the other on statistical properties of the Intrinsic Mode Functions generated by an improved Empirical Mode Decomposition algorithm. In order to enhance the diagnostic information in the vibration signals, two pre-processing steps are proposed. Separation of the bearing signal from masking noise is done with the Cepstral Editing Procedure, which removes discrete frequencies from the raw vibration signal. Enhancement of the bearing signal is achieved by band-pass filtering and amplitude demodulation, where the frequency band is produced by the band selection algorithms Kurtogram and Autogram. The proposed methods are evaluated on two large public data sets for bearing fault classification using accelerometer data, and on a smaller data set collected from a Scania gearbox. The produced features achieved significant separation on both the public and the collected data. Manual detection of the induced defect on the outer race of the bearing from the gearbox was achieved. Due to the small amount of training data, the automatic solutions were only tested on the public data sets. Isolation of the correct bearing and fault mode among multiple bearings was investigated. One of the best trade-offs achieved was a 76.39% fault detection rate with an 8.33% false alarm rate; another was a 54.86% fault detection rate with a 0% false alarm rate.
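For illustration, the two building blocks named above (the theoretical bearing characteristic frequencies, and envelope analysis via band-pass filtering plus amplitude demodulation) might be sketched as follows; the bearing geometry and filter settings are made-up values, and the Kurtogram/Autogram band-selection step is omitted:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bearing_frequencies(fr, n_balls, d_ball, d_pitch, phi=0.0):
    """Theoretical bearing characteristic frequencies (Hz) for shaft speed fr,
    rolling-element count, element diameter, pitch diameter, contact angle."""
    r = d_ball / d_pitch * np.cos(phi)
    return {'BPFO': n_balls / 2 * fr * (1 - r),                # outer race
            'BPFI': n_balls / 2 * fr * (1 + r),                # inner race
            'FTF':  fr / 2 * (1 - r),                          # cage
            'BSF':  d_pitch / (2 * d_ball) * fr * (1 - r**2)}  # ball spin

def envelope_spectrum(x, fs, band):
    """Band-pass filter, demodulate with the Hilbert envelope, and return the
    spectrum of the envelope (frequencies, amplitudes)."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype='band')
    env = np.abs(hilbert(filtfilt(b, a, x)))
    spec = np.abs(np.fft.rfft(env - env.mean()))
    return np.fft.rfftfreq(len(env), 1 / fs), spec

print(bearing_frequencies(fr=30.0, n_balls=9, d_ball=7.9e-3, d_pitch=39e-3))

# A 3 kHz carrier amplitude-modulated at 107 Hz: the envelope spectrum should
# peak near the modulation frequency.
fs = 51200
t = np.arange(0, 1, 1 / fs)
vib = np.sin(2 * np.pi * 3000 * t) * (1 + 0.5 * np.sin(2 * np.pi * 107 * t))
freqs, spec = envelope_spectrum(vib, fs, band=(2000, 4000))
print(freqs[np.argmax(spec)])
```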
377

SPARSE DISCRETE WAVELET DECOMPOSITION AND FILTER BANK TECHNIQUES FOR SPEECH RECOGNITION

Jingzhao Dai (6642491) 11 June 2019 (has links)
Speech recognition is widely applied to speech-to-text transcription, voice-driven commands, human-machine interfaces and so on [1]-[8], and it has become increasingly prevalent in modern life. To improve the accuracy of speech recognition, various algorithms such as artificial neural networks and hidden Markov models have been developed [1], [2]. In this thesis work, speech recognition with various classifiers is investigated. The classifiers employed include the support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF) and convolutional neural network (CNN). Two novel feature extraction methods, sparse discrete wavelet decomposition (SDWD) and bandpass filtering (BPF) based on the Mel filter banks [9], are developed and proposed. To accommodate the diversity of classification algorithms, both one-dimensional (1D) and two-dimensional (2D) features are obtained. The 1D features are arrays of power coefficients in frequency bands, used for training the SVM, KNN and RF classifiers, while the 2D features capture both frequency content and temporal variation: each 2D feature consists of the power values in the decomposed bands across consecutive speech frames. Most importantly, the 2D features, after geometric transformation, are adopted to train the CNN. Speech recordings from both male and female speakers are taken from a recorded data set as well as a standard data set. First, the proposed feature extraction methods are applied to recordings with little noise and clear pronunciation; after many trials and experiments on this dataset, high recognition accuracy is achieved. The feature extraction methods are then applied to the standard recordings, which have more varied characteristics, ambient noise and less clear pronunciation. Extensive experimental results validate the effectiveness of the proposed feature extraction techniques.
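A rough sketch of the band-power idea (Mel-spaced bands, power per band per frame, giving the 2D band-versus-frame feature); this is not the thesis's SDWD or BPF implementation, and all filter-bank parameters below are assumptions:

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular Mel filter bank, shape (n_filters, n_fft // 2 + 1)."""
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    edges = inv(np.linspace(mel(0), mel(fs / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def band_power_features(signal, fs, n_fft=512, hop=256, n_filters=26):
    """2-D feature map: log band power per Mel band (rows) per frame (cols)."""
    fb = mel_filterbank(n_filters, n_fft, fs)
    frames = [signal[i:i + n_fft] * np.hamming(n_fft)
              for i in range(0, len(signal) - n_fft, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power @ fb.T + 1e-10).T        # shape: bands x frames

x = np.random.default_rng(0).standard_normal(16000)   # 1 s of "audio" at 16 kHz
print(band_power_features(x, fs=16000).shape)
```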
378

Empirical RF Propagation Modeling of Human Body Motions for Activity Classification

Fu, Ruijun 19 December 2012 (has links)
"Many current and future medical devices are wearable, using the human body as a conduit for wireless communication, which implies that human body serves as a crucial part of the transmission medium in body area networks (BANs). Implantable medical devices such as Pacemaker and Cardiac Defibrillators are designed to provide patients with timely monitoring and treatment. Endoscopy capsules, pH Monitors and blood pressure sensors are used as clinical diagnostic tools to detect physiological abnormalities and replace traditional wired medical devices. Body-mounted sensors need to be investigated for use in providing a ubiquitous monitoring environment. In order to better design these medical devices, it is important to understand the propagation characteristics of channels for in-body and on- body wireless communication in BANs. The IEEE 802.15.6 Task Group 6 is officially working on the standardization of Body Area Network, including the channel modeling and communication protocol design. This thesis is focused on the propagation characteristics of human body movements. Specifically, standing, walking and jogging motions are measured, evaluated and analyzed using an empirical approach. Using a network analyzer, probabilistic models are derived for the communication links in the medical implant communication service band (MICS), the industrial scientific medical band (ISM) and the ultra- wideband (UWB) band. Statistical distributions of the received signal strength and second order statistics are presented to evaluate the link quality and outage performance for on-body to on- body communications at different antenna separations. The Normal distribution, Gamma distribution, Rayleigh distribution, Weibull distribution, Nakagami-m distribution, and Lognormal distribution are considered as potential models to describe the observed variation of received signal strength. Doppler spread in the frequency domain and coherence time in the time domain from temporal variations is analyzed to characterize the stability of the channels induced by human body movements. The shape of the Doppler spread spectrum is also investigated to describe the relationship of the power and frequency in the frequency domain. All these channel characteristics could be used in the design of communication protocols in BANs, as well as providing features to classify different human body activities. Realistic data extracted from built-in sensors in smart devices were used to assist in modeling and classification of human body movements along with the RF sensors. Variance, energy and frequency domain entropy of the data collected from accelerometer and orientation sensors are pre- processed as features to be used in machine learning algorithms. Activity classifiers with Backpropagation Network, Probabilistic Neural Network, k-Nearest Neighbor algorithm and Support Vector Machine are discussed and evaluated as means to discriminate human body motions. The detection accuracy can be improved with both RF and inertial sensors."
379

Avaliação do estado nutricional de nitrogênio e estimativa da produtividade de biomassa de trigo por meio de mineração de dados de sensoriamento remoto / Assessment of nitrogen nutritional status and estimation of wheat biomass productivity through data mining of remote sensing data

Stachak, Alessandro 15 March 2018 (has links)
Estimating biomass productivity in agriculture is a key tool in crop management, generating information that can support complex decision-making in the field. Nitrogen (N), a nutrient that participates in the structure and in cellular functions vital to the plant, is closely correlated with biomass productivity, especially in wheat (Triticum aestivum L.). Remote sensing (RS), the acquisition of information about an object without contact between the sensor and the target, is a widely used technique for estimating biomass and N nutritional status. RS data can be obtained from three platforms: orbital, with satellites; aerial, with aircraft, helicopters and remotely piloted aircraft (RPA); and terrestrial, with optical sensors and spectroradiometers. All three RS platforms are used to build models for estimating biomass productivity and leaf N content, and commercial products for these purposes already exist; however, there is a lack of information on the efficiency of these platforms within the same field study. Traditionally, predictive models from RS data in agriculture are generated with classical statistical techniques such as linear regression, but data mining (DM) techniques can produce more relevant results. Among the promising DM techniques, support vector regression (SVR) has been applied to RS data because of its strong generalization capacity and its ability to build both linear and nonlinear models. The goals of this work were (i) to evaluate the correlation between the data obtained from the three RS platforms and the estimates of above-ground dry biomass productivity and N concentration in wheat leaves, and (ii) to compare the results obtained with the classical linear regression technique against those generated by SVR. Wheat plants of the cultivar TBIO Sinuelo were grown in different environments under distinct nitrogen fertilization managements. The sensors were evaluated in two ways: (i) with random samples at different development stages of the wheat crop within each nitrogen fertilization treatment, verifying the sensor's ability to detect variability in areas under the same treatment, and (ii) with the sample means of each treatment, evaluating the sensor's ability to detect the differences caused by the different nitrogen fertilization managements. The results showed that the data generated by the equipment used (GREENSEEKER terrestrial sensor, RAPIDEYE satellites and RPA EBEE) were correlated with above-ground dry biomass productivity and N concentration in wheat leaves. SVR produced more expressive correlation coefficients (r) than linear regression on the data obtained with all the equipment. Among the platforms, under the random-sampling approach the data generated with the RPA EBEE showed the closest correlation with above-ground biomass and leaf N concentration. When the means of the nitrogen fertilization treatments were considered, the RPA EBEE and the RAPIDEYE satellites gave similar results for estimating above-ground biomass productivity, but for predicting leaf N content the RPA EBEE provided better results than the RAPIDEYE satellites. It was concluded that the RPA EBEE platform was more efficient than the terrestrial (GREENSEEKER) and orbital (RAPIDEYE satellites) platforms for estimating above-ground biomass productivity and N concentration in wheat leaves when there is greater variability in the study area, and that SVR was a more efficient technique than linear regression for analyzing the data from the three platforms: orbital, aerial and terrestrial.
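To illustrate the SVR-versus-linear-regression comparison made in this study, a sketch with synthetic, made-up vegetation-index and biomass values rather than the thesis data, and arbitrary SVR hyperparameters:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVR

# Synthetic stand-in: an NDVI-like vegetation index from a sensor and the
# "measured" above-ground dry biomass (Mg/ha) with a saturating response.
rng = np.random.default_rng(4)
ndvi = rng.uniform(0.2, 0.9, size=120)
biomass = 12 * (1 - np.exp(-3 * ndvi)) + rng.normal(0, 0.4, size=120)
X = ndvi.reshape(-1, 1)

for name, model in [("linear regression", LinearRegression()),
                    ("SVR (RBF kernel)", SVR(kernel='rbf', C=10, gamma='scale'))]:
    pred = cross_val_predict(model, X, biomass, cv=5)
    r, _ = pearsonr(biomass, pred)
    print(f"{name}: r = {r:.3f}")
```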
380

Modelos de aprendizado supervisionado usando métodos kernel, conjuntos fuzzy e medidas de probabilidade / Supervised machine learning models using kernel methods, probability measures and fuzzy sets

Guevara Díaz, Jorge Luis 04 May 2015 (has links)
This thesis proposes a methodology based on kernel methods, probability measures and fuzzy sets to analyze datasets whose individual observations are themselves sets of points rather than single points. Fuzzy sets and probability measures are used to model the observations, and kernel methods to analyze the data: fuzzy sets are used when an observation contains imprecise, vague or linguistic values, whereas probability measures are used when an observation is a set of multidimensional points in a D-dimensional Euclidean space. Thanks to kernels defined on probability measures, or on fuzzy sets, these objects are implicitly mapped into reproducing kernel Hilbert spaces, where the analysis can be carried out with any kernel method. With this methodology, a wide range of machine learning problems for such datasets can be addressed. In particular, the thesis presents data description models for observations modeled by probability measures, obtained by embedding the measures into a Hilbert space and constructing minimum enclosing balls there; these models are used as one-class classifiers for the group anomaly detection task. The thesis also proposes a new class of kernels, the kernels on fuzzy sets, which are reproducing kernels that map fuzzy sets into geometric feature spaces and act as similarity measures between fuzzy sets. It covers these kernels from basic definitions to applications in machine learning problems such as classification, regression and the definition of distances between fuzzy sets, including supervised classification on interval data and a kernel two-sample test for data containing imprecise attributes. Potential applications of these kernels include machine learning and pattern recognition tasks over fuzzy data, as well as computational tasks requiring the estimation of a similarity measure between fuzzy sets.
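As a hedged sketch of the general idea of kernels on sets of points, using a mean-embedding-style kernel and a one-class SVM as a stand-in for the minimum-enclosing-ball data description; this is not the thesis's kernels on fuzzy sets or its exact formulation:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import OneClassSVM

def mean_map_kernel(A, B, gamma=1.0):
    """Kernel between two sets of points: inner product of their kernel mean
    embeddings, estimated as the average of pairwise RBF kernel values."""
    return rbf_kernel(A, B, gamma=gamma).mean()

# Each observation is itself a set of points; a one-class SVM trained on the
# precomputed set-level Gram matrix plays the role of a minimum-enclosing-ball
# data description for group anomaly detection.
rng = np.random.default_rng(5)
groups = [rng.normal(0, 1, size=(30, 2)) for _ in range(40)]
K = np.array([[mean_map_kernel(a, b) for b in groups] for a in groups])
model = OneClassSVM(kernel='precomputed', nu=0.1).fit(K)

anomaly = rng.normal(3, 1, size=(30, 2))          # a shifted, anomalous group
k_new = np.array([[mean_map_kernel(anomaly, g) for g in groups]])
print(model.predict(k_new))                       # -1 would flag the group
```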
