Global ETD Search

121	Functional data mining with multiscale statistical procedures Lee, Kichun 01 July 2010 Hurst exponent and variance are two quantities that often characterize real-life, high-frequency observations. We develop a method for the simultaneous estimation of a time-changing Hurst exponent H(t) and a constant scale (variance) parameter C in a multifractional Brownian motion model in the presence of white noise, based on the asymptotic behavior of the local variation of its sample paths. We also discuss the accuracy of this stable, simultaneous estimator compared with a few selected methods, and the stability of computations that use adapted wavelet filters. Multifractals have become popular as flexible models for real-life, high-frequency data. We develop a method for testing whether high-frequency data are consistent with monofractality, using meaningful descriptors derived from a wavelet-generated multifractal spectrum. We discuss the theoretical properties of the descriptors, their computational implementation, their use in data mining, and their effectiveness in simulations, in an application to turbulence, and in the analysis of coding/noncoding regions in DNA sequences. Wavelet thresholding is a simple and effective operation in the wavelet domain that selects a subset of wavelet coefficients from a noisy signal. We propose selecting this subset in a semi-supervised fashion, using a neighborhood structure and a classification function appropriate for wavelet domains. The decision to include an unlabeled coefficient in the model depends not only on its magnitude but also on the labeled and unlabeled coefficients in its neighborhood. The theoretical properties of the method are discussed, and its performance is demonstrated on simulated examples.
Multifractality Wavelets Hurst exponent Fractional Brownian motion Multifractional Brownian motion Semi-supervised learning Data mining Correlation (Statistics) Wavelets (Mathematics) Supervised learning (Machine learning) Machine learning
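As a rough illustration of the local-variation idea in entry 121, the sketch below estimates a constant Hurst exponent and scale from the ratio of mean squared increments at lags 2 and 1, using the fractional Brownian motion property E[(X(t+k) - X(t))^2] = C k^(2H) with a unit sampling step. It ignores the white-noise term and the adapted wavelet filters the thesis deals with, and the windowed `local_hurst` helper is a hypothetical, crude stand-in for a time-varying H(t) estimator, not the author's method.

```python
import numpy as np


def hurst_and_scale(x: np.ndarray) -> tuple[float, float]:
    """Estimate (H, C) for a path sampled at unit step, assuming no added noise.

    For fBm, E[(X(t+k) - X(t))**2] = C * k**(2H), so the ratio of mean squared
    increments at lag 2 and lag 1 equals 2**(2H), and lag 1 alone estimates C.
    """
    v1 = np.mean(np.diff(x) ** 2)        # mean squared increment at lag 1
    v2 = np.mean((x[2:] - x[:-2]) ** 2)  # mean squared increment at lag 2
    h = 0.5 * np.log2(v2 / v1)
    return float(h), float(v1)


def local_hurst(x: np.ndarray, window: int) -> np.ndarray:
    """Hypothetical crude H(t): apply the estimator on non-overlapping windows."""
    starts = range(0, len(x) - window + 1, window)
    return np.array([hurst_and_scale(x[s:s + window])[0] for s in starts])


# Usage: Brownian motion has H = 0.5 and, with N(0, 1) steps, C = 1.
rng = np.random.default_rng(0)
path = np.cumsum(rng.standard_normal(200_000))
h_hat, c_hat = hurst_and_scale(path)
print(round(h_hat, 2), round(c_hat, 2))
print(local_hurst(path, 10_000).shape)
```

The two-lag ratio cancels the unknown scale C, which is why H and C can be read off separately; a real multifractional estimator would also need to correct for the noise contribution to the increments.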
122	Semi-Supervised Classification Using Gaussian Processes Patel, Amrish 01 1900 Gaussian Processes (GPs) are promising Bayesian methods for classification and regression problems. They have also been used for semi-supervised classification tasks. In this thesis, we propose new algorithms for solving the semi-supervised binary classification problem using GP regression (GPR) models. The algorithms are closely related to semi-supervised classification based on support vector regression (SVR) and to maximum margin clustering. The proposed algorithms are simple and easy to implement, and their hyper-parameters are estimated without resorting to expensive cross-validation. Unlike the SVR-based algorithm, the algorithm based on a sparse GPR model directly yields a sparse solution, which makes the proposed algorithm scalable. Experiments on synthetic and real-world datasets demonstrate the efficacy of the proposed sparse GP-based algorithm for semi-supervised classification. Classification (A I) Gaussian Processes Gaussian Process Regression (GPR) Support Vector Regression (SVR) Classification Models Semi-supervised Learning Computer Science
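To make the GPR-for-classification idea in entry 122 concrete, here is a minimal NumPy sketch that regresses on ±1 targets with an RBF-kernel GP and classifies by the sign of the posterior mean. The outer loop that moves the most confident unlabeled points into the labeled set is a generic self-labeling stand-in chosen for illustration, not the thesis's algorithm; it also uses fixed, assumed hyper-parameters (`noise`, `length_scale`) instead of the estimation procedure the abstract mentions, and no sparse approximation.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between the rows of a and the rows of b."""
    d2 = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)


def gpr_mean(x_train, y_train, x_test, noise=0.1, length_scale=1.0):
    """GP regression posterior mean: K(x_test, X) (K(X, X) + noise^2 I)^-1 y."""
    k = rbf_kernel(x_train, x_train, length_scale) + noise**2 * np.eye(len(x_train))
    alpha = np.linalg.solve(k, y_train)
    return rbf_kernel(x_test, x_train, length_scale) @ alpha


def self_label_gpr(x_lab, y_lab, x_unl, rounds=10, per_round=2, **gp_kwargs):
    """Hypothetical self-labeling loop around GPR with +/-1 regression targets."""
    x_lab, y_lab, x_unl = x_lab.copy(), y_lab.astype(float), x_unl.copy()
    for _ in range(rounds):
        if len(x_unl) == 0:
            break
        mean = gpr_mean(x_lab, y_lab, x_unl, **gp_kwargs)
        pick = np.argsort(-np.abs(mean))[:per_round]  # most confident points
        x_lab = np.vstack([x_lab, x_unl[pick]])
        y_lab = np.concatenate([y_lab, np.sign(mean[pick])])
        x_unl = np.delete(x_unl, pick, axis=0)
    return x_lab, y_lab


# Usage: two well-separated clusters, one labeled point in each.
rng = np.random.default_rng(1)
x_unl = np.vstack([rng.normal(-2, 0.5, (10, 2)), rng.normal(2, 0.5, (10, 2))])
x_lab = np.array([[-2.0, -2.0], [2.0, 2.0]])
y_lab = np.array([-1, 1])
xl, yl = self_label_gpr(x_lab, y_lab, x_unl)
pred = np.sign(gpr_mean(xl, yl, np.array([[-2.0, -2.5], [2.5, 2.0]])))
```

Classifying by the sign of a regression mean is the simplest way to use GPR for a binary task; the magnitude of the mean then serves as a confidence score for deciding which unlabeled points to trust.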
123	Large-scale semi-supervised learning for natural language processing Bergsma, Shane A Unknown Date No description available. natural language processing semi-supervised learning NLP web-scale N-gram selectional preference string similarity non-referential pronoun pleonastic pronoun non-anaphoric pronoun computational linguistics
124	Adaptive Graph-Based Algorithms for Conditional Anomaly Detection and Semi-Supervised Learning Valko, Michal 01 August 2011 We develop graph-based methods for semi-supervised learning based on label propagation on a data similarity graph. When data are abundant or arrive in a stream, computation and data storage become problems for any graph-based method. We propose a fast approximate online algorithm that solves for the harmonic solution on an approximate graph. We show, both empirically and theoretically, that good behavior can be achieved by collapsing nearby points into a set of local representative points that minimize distortion. Moreover, we regularize the harmonic solution to achieve better stability properties. We also present graph-based methods for detecting conditional anomalies and apply them to identifying unusual clinical actions in hospitals. Our hypothesis is that patient-management actions that are unusual with respect to past patients may be due to errors, and that it is worthwhile to raise an alert when such a condition is encountered. Conditional anomaly detection extends the standard unconditional anomaly detection framework but also faces new problems, known as fringe and isolated points. We devise novel nonparametric graph-based methods to tackle these problems; they rely on graph connectivity analysis and the soft harmonic solution. Finally, we conduct an extensive human evaluation study of our conditional anomaly detection methods with 15 critical-care experts. [STAT:OT] Statistics/Other Statistics Machine Learning Anomaly Detection Graph-Based Learning Online Learning Adaptive Learning Semi-Supervised Learning
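The harmonic solution that entry 124 builds on reduces to a linear solve: with graph Laplacian L = D - W, the unlabeled scores satisfy L_uu f_u = W_ul f_l. The NumPy sketch below computes it on a small dense graph. The optional `gamma` term, which adds gamma to the diagonal of L_uu, is an assumed simple form of regularization in the spirit of the abstract, not necessarily the thesis's exact formulation, and the sketch omits the online graph approximation.

```python
import numpy as np


def harmonic_solution(w, labeled_idx, labels, gamma=0.0):
    """Harmonic label propagation on a similarity graph.

    w: symmetric (n, n) nonnegative weight matrix.
    labeled_idx, labels: indices of labeled vertices and their scores (e.g. 0/1).
    gamma: assumed regularizer added to the diagonal of L_uu; 0 gives the plain
    harmonic solution, larger values shrink the unlabeled scores toward 0.
    """
    n = w.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    f_l = np.asarray(labels, dtype=float)
    lap = np.diag(w.sum(axis=1)) - w                 # graph Laplacian L = D - W
    unl = np.setdiff1d(np.arange(n), labeled_idx)
    l_uu = lap[np.ix_(unl, unl)] + gamma * np.eye(len(unl))
    w_ul = w[np.ix_(unl, labeled_idx)]
    f = np.empty(n)
    f[labeled_idx] = f_l
    f[unl] = np.linalg.solve(l_uu, w_ul @ f_l)       # L_uu f_u = W_ul f_l
    return f


# Usage: a 5-vertex path graph labeled 0 at one end and 1 at the other.
w = np.zeros((5, 5))
for i in range(4):
    w[i, i + 1] = w[i + 1, i] = 1.0
f = harmonic_solution(w, [0, 4], [0.0, 1.0])
```

On a path graph each unlabeled vertex takes the average of its neighbors, so the harmonic scores interpolate linearly between the two labeled ends.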
125	Enhanced classification approach with semi-supervised learning for reliability-based system design Patel, Jiten 02 July 2012 Traditionally, design engineers have used the Factor of Safety method to ensure that designs do not fail in the field. Access to advanced computational tools and resources has made this process obsolete, and new methods to introduce higher levels of reliability into engineering systems are currently being investigated. However, even though substantial computational resources are available, the computational cost of reliability analysis procedures remains a limitation. Furthermore, regression-based surrogate modeling techniques fail when the design space contains discontinuities, caused by failure mechanisms when the design is required to perform under severe external conditions. Hence, in this research we propose efficient surrogate modeling techniques based on semi-supervised learning that enable accurate estimation of a system's response, even under discontinuity. These methods combine the available labeled and unlabeled data and provide better models than labeled data alone. Labeled data are expensive to obtain, since the responses have to be evaluated, whereas unlabeled data are plentiful during reliability estimation, since the probability density function (PDF) of the uncertain variables is assumed to be known. This superior performance comes from combining the efficiency of Probabilistic Neural Networks (PNN) for classification with the Expectation-Maximization (EM) algorithm, which treats the unlabeled data as labeled data with hidden labels. Labeled and unlabeled data Semi-supervised learning Probability of failure Structural reliability System design Classification Safety factor in engineering Reliability (Engineering) Surrogate-based optimization Supervised learning (Machine learning) Expectation-maximization algorithms
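A rough sketch of the PNN + EM combination described in entry 125: the probabilistic neural network here is a Gaussian Parzen-window classifier whose training points carry class weights, and the hidden labels of the unlabeled points are handled by alternating soft posteriors (E-step) with refitting on those soft weights (M-step). The kernel width `sigma`, the iteration count, and the function names `pnn_posteriors` and `semi_supervised_pnn` are assumptions for illustration; this is not the thesis's surrogate model.

```python
import numpy as np


def pnn_posteriors(x_train, weights, x_query, sigma=1.0):
    """Class posteriors of a probabilistic neural network (Parzen classifier).

    weights: (n_train, n_classes) class memberships of the training points,
    one-hot for labeled points and soft (EM responsibilities) for unlabeled ones.
    Class priors are implicitly proportional to the (soft) class counts.
    """
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=-1)
    kern = np.exp(-d2 / (2.0 * sigma**2))   # pattern-layer activations
    scores = kern @ weights + 1e-12         # summation layer, one score per class
    return scores / scores.sum(axis=1, keepdims=True)


def semi_supervised_pnn(x_lab, y_lab, x_unl, n_classes, iters=10, sigma=1.0):
    """EM-style loop treating the labels of x_unl as hidden variables."""
    onehot = np.eye(n_classes)[y_lab]
    resp = np.full((len(x_unl), n_classes), 1.0 / n_classes)  # uninformed start
    x_all = np.vstack([x_lab, x_unl])
    for _ in range(iters):
        weights = np.vstack([onehot, resp])                   # M-step: refit PNN
        resp = pnn_posteriors(x_all, weights, x_unl, sigma)   # E-step
    return x_all, np.vstack([onehot, resp])


# Usage: two clusters, one labeled sample per class.
rng = np.random.default_rng(2)
x_unl = np.vstack([rng.normal(-2, 0.5, (10, 2)), rng.normal(2, 0.5, (10, 2))])
x_lab = np.array([[-2.0, -2.0], [2.0, 2.0]])
y_lab = np.array([0, 1])
x_all, w_all = semi_supervised_pnn(x_lab, y_lab, x_unl, n_classes=2)
post = pnn_posteriors(x_all, w_all, np.array([[-2.0, -1.5], [1.5, 2.0]]))
```

Because a PNN is just a weighted kernel density per class, soft labels slot in naturally as fractional weights, which is what makes the EM treatment of unlabeled points straightforward.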
126	Construções de comitês de classificadores multirrótulos no aprendizado semissupervisionado multidescrição / Building multi-label classifier committees in multi-view semi-supervised learning Silva, Wilamis Kleiton Nunes da 18 August 2017 Multi-label problems, in which an instance can be assigned more than one label, have become increasingly common; they are called multi-label classification problems. Among the different multi-label classification methods are BR (Binary Relevance), LP (Label Powerset), and RAkEL (RAndom k-labELsets). These are known as problem transformation methods, since they turn the multi-label problem into several traditional (single-label) classification problems. The use of classifier committees in multi-label classification is still very recent, leaving much to be explored. This work studies the construction of multi-label classifier committees built with multi-view (multi-description) semi-supervised learning techniques, in order to verify whether applying this type of learning to committee construction improves the results. The committees used in the experiments were Bagging, Boosting, and Stacking; the problem transformation methods were BR, LP, and RAkEL; and Co-Training was used for multi-view semi-supervised multi-label classification. The experimental analysis showed that the semi-supervised approach produced satisfactory results, since the supervised and semi-supervised approaches used in this work achieved similar results. Machine learning Multi-label classification Classification committees CNPQ::CIENCIAS EXATAS E DA TERRA
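The problem transformation methods named in entry 126 are easy to state as data transformations. The pure-Python sketch below shows Binary Relevance (one binary target per label) and Label Powerset (each distinct label set becomes one class), plus a per-label majority vote as a simple, assumed way of combining committee members' outputs. Bagging, Boosting, and Stacking each combine members in their own way, and RAkEL (LP applied to random label subsets) is not shown.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def binary_relevance(y: Sequence[Set[str]], labels: Sequence[str]) -> Dict[str, List[int]]:
    """BR: one binary target vector per label (1 if the instance carries it)."""
    return {lab: [int(lab in row) for row in y] for lab in labels}


def label_powerset(y: Sequence[Set[str]]) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """LP: every distinct label set becomes one class of a single-label problem."""
    classes: Dict[FrozenSet[str], int] = {}
    targets: List[int] = []
    for row in y:
        key = frozenset(row)
        if key not in classes:
            classes[key] = len(classes)  # new class id for an unseen label set
        targets.append(classes[key])
    return targets, classes


def majority_vote(member_preds: Sequence[Dict[str, int]]) -> Dict[str, int]:
    """Simple stand-in combiner: per-label majority over committee members."""
    n = len(member_preds)
    return {lab: int(2 * sum(p[lab] for p in member_preds) > n)
            for lab in member_preds[0]}


# Usage with a toy label matrix.
y = [{"sports"}, {"sports", "politics"}, {"politics"}, {"sports", "politics"}]
print(binary_relevance(y, ["sports", "politics"]))
print(label_powerset(y)[0])
print(majority_vote([{"sports": 1, "politics": 0},
                     {"sports": 1, "politics": 1},
                     {"sports": 0, "politics": 0}]))
```

BR ignores correlations between labels but keeps the number of problems equal to the number of labels, whereas LP captures label co-occurrence at the cost of potentially many rare classes; RAkEL sits between the two.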
127	Utilizando aprendizado semissupervisionado multidescrição em problemas de classificação hierárquica multirrótulo / Using multi-view semi-supervised learning in hierarchical multi-label classification problems Araújo, Hiury Nogueira de 17 November 2017 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Data classification is applied in many areas of knowledge and is therefore the focus of ongoing research. Classification can be divided according to the available data, which may be labeled or unlabeled. Semi-supervised learning has proven very effective on data sets containing both labeled and unlabeled data; its objective is to label the unlabeled data using the labeled data in the set, improving the hit rate. Such data can be assigned more than one label, which is known as multi-label classification. Furthermore, the classes can be organized hierarchically, with relations among them, which is called hierarchical classification. This work proposes using multi-view semi-supervised learning, one of the branches of semi-supervised learning, in hierarchical multi-label classification problems, with the objective of investigating whether semi-supervised learning is an appropriate approach to the problem of low data dimensionality. An experimental analysis showed that supervised learning performed better than the semi-supervised approaches; however, semi-supervised learning may still become a widely used approach, as there is much left to contribute in this area. Semi-supervised learning Hierarchical multi-label classification Co-training Self-training CNPQ::CIENCIAS EXATAS E DA TERRA
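Entry 127 relies on multi-view (co-training) semi-supervised learning. The pure-Python sketch below shows the basic co-training loop: a classifier trained on each feature view labels its most confident unlabeled instance, and that label becomes training data for both views. The nearest-centroid base classifiers, the distance-gap confidence score, and the function names are assumptions made to keep the example self-contained; the hierarchical multi-label aspect of the dissertation is not modeled.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dist2(a: Vector, b: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def centroids(xs: List[Vector], ys: List[int]) -> Dict[int, List[float]]:
    """Mean feature vector of each class (assumed nearest-centroid base model)."""
    out = {}
    for c in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == c]
        out[c] = [sum(col) / len(pts) for col in zip(*pts)]
    return out


def predict(cents: Dict[int, List[float]], x: Vector) -> Tuple[int, float]:
    """Nearest-centroid label and a confidence margin (gap to the runner-up)."""
    ranked = sorted((dist2(x, m), c) for c, m in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def co_train(view1, view2, labels: Dict[int, int], unlabeled: List[int],
             rounds: int) -> Dict[int, int]:
    """Co-training: each view's classifier labels its most confident instance,
    and the new label is shared with the other view in later steps."""
    labels = dict(labels)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (view1, view2):
            if not pool:
                return labels
            idx = sorted(labels)
            cents = centroids([view[i] for i in idx], [labels[i] for i in idx])
            best = max(pool, key=lambda i: predict(cents, view[i])[1])
            labels[best] = predict(cents, view[best])[0]
            pool.remove(best)
    return labels


# Usage: six instances described by two one-dimensional views.
view1 = [[0.0], [10.0], [1.0], [9.0], [2.0], [8.0]]
view2 = [[0.0], [10.0], [0.5], [9.5], [1.0], [9.0]]
result = co_train(view1, view2, {0: 0, 1: 1}, [2, 3, 4, 5], rounds=2)
```

Self-training, also listed in the keywords, is the single-view special case: one classifier repeatedly labels its own most confident predictions.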
128	Apports des ontologies à l'analyse exploratoire des images satellitaires / Contribution of ontologies to the exploratory analysis of satellite images Chahdi, Hatim 04 July 2017 Satellite images have become a valuable source of information for Earth observation. They are used to address and analyze many environmental issues, such as deforestation, landscape characterization, urban planning, and biodiversity conservation. Despite the large number of existing knowledge extraction techniques, the complexity of satellite images, their large volume, and the specific needs of each community of practice give rise to new challenges and require the development of highly efficient approaches. In this thesis, we investigate the potential of intelligently combining knowledge representation systems with statistical learning. Our goal is to develop novel methods that allow the automatic analysis of remote sensing images. In this context, we develop two new approaches that treat the images as unlabeled quantitative data and examine how the available domain knowledge can be used. Our first contribution is a hybrid approach that successfully combines ontology-based reasoning and semi-supervised clustering for semantic classification. An inference engine first reasons over the available domain knowledge to obtain semantically labeled instances. These instances are then used to generate constraints that guide and enhance the clustering. In this way, our method improves the labeling of existing classes while discovering new ones. Our second contribution focuses on scaling ontology reasoning to large datasets. We propose a two-step approach in which topographic clustering is first applied to summarize the data as a set of prototypes, thereby reducing the number of instances to be processed by the reasoner. The representative prototypes are then labeled using the ontology, and the labels are automatically propagated to all the input data. We applied our methods to the real-world problem of satellite image classification and interpretation, and the results are very promising. They show, on the one hand, that the quality of the classification can be improved by automatic knowledge integration and that the involvement of experts can be reduced. On the other hand, applying topographic clustering upstream avoids computing inferences over all the pixels of the image. Ontology Semi-Supervised learning Constrained clustering Reasoning Satellite images Semantic gap
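The second contribution of entry 128 (summarize the pixels with prototypes, reason only over the prototypes, then propagate the labels) can be sketched as follows. Plain k-means stands in for the topographic clustering, and `ndvi_rule` is a hypothetical single rule standing in for the ontology reasoner, with pixels assumed to hold (red, near-infrared) reflectances. The point is only that the reasoner is called once per prototype rather than once per pixel.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain k-means, used here as a stand-in for topographic (SOM) clustering."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = x[assign == j].mean(axis=0)
    return centers, assign


def ndvi_rule(prototype):
    """Hypothetical stand-in for the ontology reasoner; bands are (red, nir)."""
    red, nir = prototype
    ndvi = (nir - red) / (nir + red)
    return "vegetation" if ndvi > 0.3 else "non-vegetation"


def classify_via_prototypes(pixels, k, reason=ndvi_rule):
    """Label k prototypes with the reasoner, then propagate to every pixel."""
    centers, assign = kmeans(pixels, k)
    proto_labels = [reason(c) for c in centers]  # k reasoner calls, not len(pixels)
    return [proto_labels[a] for a in assign]


# Usage: 50 vegetation-like and 50 soil-like synthetic pixels.
rng = np.random.default_rng(3)
veg = np.column_stack([rng.uniform(0.05, 0.10, 50), rng.uniform(0.45, 0.50, 50)])
soil = np.column_stack([rng.uniform(0.30, 0.35, 50), rng.uniform(0.30, 0.35, 50)])
pixels = np.vstack([veg, soil])
```

The trade-off is that every pixel inherits its prototype's label, so the summary must be fine-grained enough (large enough k) that each prototype covers a semantically homogeneous group.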
129	Agrupamento de dados semissupervisionado na geração de regras fuzzy / Semi-supervised data clustering in the generation of fuzzy rules Lopes, Priscilla de Abreu 27 August 2010 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Inductive learning is traditionally categorized as supervised or unsupervised. In supervised learning, the learning method is given a labeled data set (the classes of the data are known); such data sets are suitable for classification and regression problems. In unsupervised learning, unlabeled data are analyzed to identify structures embedded in the data set. Typically, clustering methods do not use prior knowledge, such as class labels, to perform their task. The characteristics of current data sets, namely large volume and mixed attribute structures, motivate research on better solutions for machine learning tasks. The proposed research fits into this context: it applies semi-supervised fuzzy clustering methods to the generation of fuzzy rule bases. Semi-supervised clustering methods perform their task by incorporating some prior knowledge about the data set. The clustering results are then used to label the remaining unlabeled data, after which supervised learning algorithms generate the fuzzy rules. This document presents the theoretical concepts needed to understand the research proposal and discusses the context in which the proposal fits. Experiments were carried out to show that this may be an interesting solution for machine learning tasks that face difficulties due to the lack of available information about the data. Semi-Supervised Learning Fuzzy Data Clustering Fuzzy Rules Generation
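A minimal sketch of the semi-supervised fuzzy clustering step in entry 129, assuming the prior knowledge takes the form of a few labeled points: the usual fuzzy c-means updates, except that labeled points keep fixed crisp memberships and the centers start at the labeled class means. The remaining points are then labeled by their highest membership. This variant and the name `semi_supervised_fcm` are illustrative choices, not the specific algorithms evaluated in the dissertation, and the subsequent fuzzy-rule generation is not shown.

```python
import numpy as np


def semi_supervised_fcm(x, labeled_idx, labels, n_clusters, m=2.0, iters=50):
    """Fuzzy c-means in which labeled points keep crisp, fixed memberships.

    Illustrative variant: returns the membership matrix u (n, c) and the centers.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    fixed = np.zeros((len(labeled_idx), n_clusters))
    fixed[np.arange(len(labeled_idx)), labels] = 1.0
    # Prior knowledge: start each center at the mean of its labeled points.
    centers = np.array([x[labeled_idx][labels == c].mean(axis=0)
                        for c in range(n_clusters)])
    for _ in range(iters):
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # FCM membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        u = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        u[labeled_idx] = fixed                      # supervision overrides
        w = u ** m
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
    return u, centers


# Usage: label one point per group, then read labels off the memberships.
x = [[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]]
u, centers = semi_supervised_fcm(x, [0, 3], [0, 1], n_clusters=2)
predicted = u.argmax(axis=1)
```

Keeping the fuzzy memberships, rather than only the argmax, is what a later rule-generation step could exploit, since the membership degrees can seed fuzzy sets.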
130	Investigando a combinação de técnicas de aprendizado semissupervisionado e classificação hierárquica multirrótulo / Investigating the combination of semi-supervised learning techniques and hierarchical multi-label classification Santos, Araken de Medeiros 25 May 2012 Data classification is a task with high applicability in many domains. Most classification methods found in the literature deal with traditional (single-label) problems. In recent years, a series of classification tasks has been identified in which samples can be assigned to more than one class simultaneously (multi-label classification). Additionally, these classes can be hierarchically organized (hierarchical classification and hierarchical multi-label classification). A new category of learning, called semi-supervised learning, has also been studied; it combines labeled data (supervised learning) and unlabeled data (unsupervised learning) during the training phase, thus reducing the need for a large amount of labeled data when only a small set of labeled samples is available. Since both multi-label and hierarchical multi-label classification techniques and semi-supervised learning have shown favorable results, this work proposes and applies semi-supervised learning to hierarchical multi-label classification tasks, in order to efficiently meet the main needs of both areas. An experimental analysis of the proposed methods found that using semi-supervised learning in hierarchical multi-label methods produced satisfactory results, since the two approaches produced statistically similar results. Multi-label classification Hierarchical multi-label classification Semi-supervised learning
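In hierarchical multi-label classification, as in entry 130, predictions are usually required to respect the class hierarchy: an instance that belongs to a class also belongs to all of that class's ancestors. The pure-Python sketch below enforces this by adding ancestors to a predicted label set, assuming a tree given as a hypothetical child-to-parent map; the abstract does not state whether the thesis uses this exact post-processing step.

```python
from typing import Dict, Iterable, Optional, Set


def ancestors(label: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """All ancestors of a class in a tree given as a child -> parent map."""
    found: Set[str] = set()
    p = parent.get(label)
    while p is not None:
        found.add(p)
        p = parent.get(p)
    return found


def make_consistent(predicted: Iterable[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Add every ancestor of each predicted class (hierarchy constraint)."""
    result = set(predicted)
    for lab in list(result):
        result |= ancestors(lab, parent)
    return result


# Hypothetical toy hierarchy: science -> {cs -> ml, physics}.
parent = {"science": None, "cs": "science", "ml": "cs", "physics": "science"}
print(sorted(make_consistent({"ml"}, parent)))
```

For a hierarchy that is a directed acyclic graph rather than a tree, the parent map would become a map to sets of parents and the ancestor walk a graph traversal.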
