  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
121

Large-scale semi-supervised learning for natural language processing

Bergsma, Shane A Unknown Date
No description available.
122

Adaptive Graph-Based Algorithms for Conditional Anomaly Detection and Semi-Supervised Learning

Valko, Michal 01 August 2011
We develop graph-based methods for semi-supervised learning based on label propagation on a data similarity graph. When data are abundant or arrive in a stream, problems of computation and data storage arise for any graph-based method. We propose a fast approximate online algorithm that solves for the harmonic solution on an approximate graph. We show, both empirically and theoretically, that good behavior can be achieved by collapsing nearby points into a set of local representative points that minimize distortion. Moreover, we regularize the harmonic solution to achieve better stability properties. We also present graph-based methods for detecting conditional anomalies and apply them to the identification of unusual clinical actions in hospitals. Our hypothesis is that patient-management actions that are unusual with respect to past patients may be due to errors, and that it is worthwhile to raise an alert if such a condition is encountered. Conditional anomaly detection extends the standard unconditional anomaly detection framework but also faces new problems known as fringe and isolated points. We devise novel nonparametric graph-based methods to tackle these problems. Our methods rely on graph connectivity analysis and the soft harmonic solution. Finally, we conduct an extensive human evaluation study of our conditional anomaly methods with 15 experts in critical care.
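As a rough illustration of the harmonic solution mentioned in this abstract, the sketch below propagates labels on a small similarity graph by repeatedly setting each unlabeled node to the weighted average of its neighbors. It is a generic textbook-style iteration, not the approximate online algorithm developed in the thesis; the function name `harmonic_solution`, its parameters, and the toy graph are assumptions made for this example.

```python
def harmonic_solution(weights, labels, iters=1000, tol=1e-9):
    """Iteratively compute a harmonic function on a weighted graph.

    weights: symmetric n x n list of lists of non-negative similarities.
    labels:  dict mapping labeled node index -> fixed value (e.g. 0.0 or 1.0).
    Returns a list of n scores; labeled nodes keep their given value.
    """
    n = len(weights)
    # Unlabeled nodes start from a neutral guess (hypothetical choice).
    f = [labels.get(i, 0.5) for i in range(n)]
    for _ in range(iters):
        delta = 0.0
        for i in range(n):
            if i in labels:
                continue  # labeled nodes are clamped
            degree = sum(w for j, w in enumerate(weights[i]) if j != i)
            if degree == 0:
                continue  # isolated node: nothing to average over
            new = sum(w * f[j] for j, w in enumerate(weights[i]) if j != i) / degree
            delta = max(delta, abs(new - f[i]))
            f[i] = new
        if delta < tol:
            break
    return f


# Toy chain graph 0 - 1 - 2 - 3 with the two end points labeled.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = harmonic_solution(chain, {0: 0.0, 3: 1.0})
```

On the chain, the scores of the two unlabeled nodes interpolate between the labeled end points; thresholding such scores at 0.5 is one common way to turn them into class predictions.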
123

Enhanced classification approach with semi-supervised learning for reliability-based system design

Patel, Jiten 02 July 2012
Traditionally, design engineers have used the Factor of Safety method to ensure that designs do not fail in the field. Access to advanced computational tools and resources has made this process obsolete, and new methods for introducing higher levels of reliability into engineering systems are currently being investigated. However, even though substantial computational resources are available, the computational cost of reliability analysis procedures leaves much to be desired. Furthermore, regression-based surrogate modeling techniques fail when there is a discontinuity in the design space, caused by failure mechanisms, when the design is required to perform under severe externalities. Hence, in this research we propose efficient Semi-Supervised Learning based surrogate modeling techniques that enable accurate estimation of a system's response, even under discontinuity. These methods combine the available labeled and unlabeled data and provide better models than using labeled data alone. Labeled data is expensive to obtain since the responses have to be evaluated, whereas unlabeled data is plentiful during reliability estimation, since the PDF information of the uncertain variables is assumed to be known. This superior performance is gained by combining the efficiency of Probabilistic Neural Networks (PNN) for classification with the Expectation-Maximization (EM) algorithm, which treats the unlabeled data as labeled data with hidden labels.
124

Construções de comitês de classificadores multirrótulos no aprendizado semissupervisionado multidescrição / Construction of multi-label classifier committees in multi-view semi-supervised learning

Silva, Wilamis Kleiton Nunes da 18 August 2017
Multi-label problems, in which an instance can be assigned more than one label, have become increasingly common; they are known as multi-label classification problems. Among the different multi-label classification methods are BR (Binary Relevance), LP (Label Powerset), and RAkEL (RAndom k-labELsets). These are called problem transformation methods, since they turn the multi-label problem into several traditional (single-label) classification problems. The adoption of classifier committees (ensembles) in multi-label classification problems is still very recent, with much left to explore. This work studies the construction of multi-label classifier committees built by applying multi-view semi-supervised learning techniques, in order to verify whether applying this type of learning to the construction of committees improves the results. The classifier committees used in the experiments were Bagging, Boosting, and Stacking; the problem transformation methods were BR, LP, and RAkEL; and Co-Training was used for multi-view semi-supervised multi-label classification. The experimental analysis found that the semi-supervised approach gave satisfactory results, since the supervised and semi-supervised approaches used in this work produced similar results.
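To make the problem transformation idea behind BR and LP concrete, here is a minimal sketch of the two transformations on a toy multi-label data set. It only shows how the training data are reshaped and trains no classifier; the function names and the sample data are hypothetical and are not taken from the dissertation.

```python
def binary_relevance(samples):
    """Split a multi-label data set into one binary data set per label.

    samples: list of (features, set_of_labels) pairs.
    Returns a dict mapping each label to a list of (features, 0 or 1) pairs.
    """
    all_labels = sorted(set().union(*(labels for _, labels in samples)))
    return {
        label: [(x, int(label in labels)) for x, labels in samples]
        for label in all_labels
    }


def label_powerset(samples):
    """Turn each distinct label combination into a single class.

    Returns a list of (features, class) pairs, where the class is the
    sorted tuple of labels attached to the sample.
    """
    return [(x, tuple(sorted(labels))) for x, labels in samples]


# Hypothetical toy data: three samples, two possible labels.
toy = [((1.0,), {"a", "b"}), ((2.0,), {"b"}), ((3.0,), set())]
br_tasks = binary_relevance(toy)
lp_task = label_powerset(toy)
```

Binary Relevance yields one independent binary problem per label, whereas Label Powerset yields a single multi-class problem whose classes are label combinations. RAkEL, as its name suggests, applies the Label Powerset idea to random subsets of k labels.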
125

Utilizando aprendizado semissupervisionado multidescrição em problemas de classificação hierárquica multirrótulo / Using multi-view semi-supervised learning in hierarchical multi-label classification problems

Araújo, Hiury Nogueira de 17 November 2017
Data classification is a task applied in many areas of knowledge and is therefore the focus of ongoing research. Data classification can be divided according to the available data, which may be labeled or unlabeled. One approach that has proven very effective on data sets containing both labeled and unlabeled data is semi-supervised learning, whose objective is to label the unlabeled data using the labeled data in the set, improving the success rate. Such data can be assigned more than one label, which is known as multi-label classification. Furthermore, the data can be organized hierarchically, with relations among them, which is known as hierarchical classification. This work proposes the use of multi-view semi-supervised learning, one of the branches of semi-supervised learning, in hierarchical multi-label classification problems, with the objective of investigating whether semi-supervised learning is an appropriate approach to the problem of low data dimensionality. An experimental analysis of the methods found that supervised learning performed better than the semi-supervised approaches; however, semi-supervised learning may yet become a widely used approach, since there is still much to be contributed in this area.
Sponsor: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
126

Apports des ontologies à l'analyse exploratoire des images satellitaires / Contribution of ontologies to the exploratory analysis of satellite images

Chahdi, Hatim 04 July 2017
Satellite images have become a valuable source of information for Earth observation. They are used to address and analyze multiple environmental issues such as landscape characterization, urban planning, and biodiversity conservation, to cite a few. Despite the large number of existing knowledge extraction techniques, the complexity of satellite images, their large volume, and the specific needs of each community of practice give rise to new challenges and require the development of highly efficient approaches. In this thesis, we investigate the potential of intelligently combining knowledge representation systems with statistical learning. Our goal is to develop novel methods that allow automatic analysis of remote sensing images. In this context, we develop two new approaches that treat the images as unlabeled quantitative data and examine the possible use of the available domain knowledge. Our first contribution is a hybrid approach that combines ontology-based reasoning and semi-supervised clustering for semantic classification. An inference engine first reasons over the available domain knowledge in order to obtain semantically labeled instances. These instances are then used to generate constraints that guide and enhance the clustering. In this way, our method improves the labeling of existing classes while discovering new ones. Our second contribution focuses on scaling ontology reasoning to large datasets. We propose a two-step approach in which topographic clustering is first applied to summarize the data as a set of prototypes, thereby reducing the number of instances to be processed by the reasoner. The representative prototypes are then labeled using the ontology, and the labels are automatically propagated to all the input data. We applied our methods to the real-world problem of satellite image classification and interpretation, and the results obtained are very promising. They show, on the one hand, that the quality of the classification can be improved by automatic knowledge integration and that the involvement of experts can be reduced and, on the other hand, that the upstream use of topographic clustering avoids computing inferences over all the pixels of the image.
127

Agrupamento de dados semissupervisionado na geração de regras fuzzy / Semi-supervised data clustering in the generation of fuzzy rules

Lopes, Priscilla de Abreu 27 August 2010
Inductive learning is traditionally categorized as supervised or unsupervised. In supervised learning, the learning method is given a labeled data set (the classes of the data are known). Such data sets are suited to classification and regression problems. In unsupervised learning, unlabeled data are analyzed in order to identify structures embedded in the data set. Typically, clustering methods do not make use of prior knowledge, such as class labels, to perform their task. The characteristics of current data sets, namely large volume and mixed attribute structures, motivate the search for better solutions to machine learning tasks. The proposed research fits into this context. It concerns semi-supervised fuzzy clustering applied to the generation of fuzzy rule bases. Semi-supervised clustering methods perform their task by incorporating some prior knowledge about the data set. The clustering result is then used to label the remaining unlabeled data in the set. Supervised learning algorithms aimed at generating fuzzy rules are then applied. This document contains the theoretical concepts needed to understand the research proposal and a discussion of the context in which the proposal fits. Some experiments were carried out to show that this may be an interesting solution for machine learning tasks that encounter difficulties due to a lack of available information about the data.
Sponsor: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
128

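Investigando a combinação de técnicas de aprendizado semissupervisionado e classificação hierárquica multirrótulo / Investigating the combination of semi-supervised learning techniques and hierarchical multi-label classification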

Santos, Araken de Medeiros 25 May 2012
Data classification is a task with high applicability in many areas. Most methods for classification problems found in the literature deal with traditional, single-label problems. In recent years, a series of classification tasks has been identified in which samples can be assigned to more than one class simultaneously (multi-label classification). Additionally, these classes can be hierarchically organized (hierarchical classification and hierarchical multi-label classification). On the other hand, a new category of learning, called semi-supervised learning, has also been studied; it combines labeled data (supervised learning) and unlabeled data (unsupervised learning) during the training phase, thus reducing the need for a large amount of labeled data when only a small set of labeled samples is available. Since both multi-label and hierarchical multi-label classification techniques and semi-supervised learning have shown favorable results, this work proposes and applies semi-supervised learning to hierarchical multi-label classification tasks, so as to efficiently exploit the main advantages of the two areas. An experimental analysis of the proposed methods found that using semi-supervised learning in hierarchical multi-label classification methods gave satisfactory results, since the two approaches produced statistically similar results.
129

Semi-supervised co-selection : instances and features : application to diagnosis of dry port by rail / Co-selection instances-variables en mode semi-supervisé : application au diagnostic de transport ferroviaire.

Makkhongkaew, Raywat 15 December 2016
We are drowning in massive data but starved for knowledge retrieval. It is well known, through the dimensionality tradeoff, that more data are more informative but come at a price in computational complexity, which has to be made up in some way. When the labeled sample is too small to carry sufficient information about the target concept, supervised learning fails. Unsupervised learning can be an alternative for this problem. However, as these algorithms ignore label information, important hints from labeled data are left out, which generally degrades the performance of unsupervised learning algorithms. Using both labeled and unlabeled data, as in semi-supervised learning, is expected to work better and is better suited to large-domain applications where labels are hard and costly to obtain. In addition, when data are large, feature selection and instance selection are two important dual operations for removing irrelevant information. Both tasks, combined with semi-supervised learning, pose distinct challenges to the machine learning and data mining communities for data dimensionality reduction and knowledge retrieval. In this thesis, we focus on co-selection of instances and features in the context of semi-supervised learning. In this context, co-selection becomes a more challenging problem, as the data contain labeled and unlabeled examples sampled from the same population. To perform such semi-supervised co-selection, we propose two unified frameworks that efficiently integrate the labeled and unlabeled parts into the co-selection process. The first framework is based on weighted constrained clustering and the second on similarity-preserving selection. Both approaches evaluate the usefulness of features and instances in order to select the most relevant ones simultaneously. Finally, we present a variety of empirical studies over high-dimensional data sets that are well known in the literature. The results are promising and demonstrate the efficiency and effectiveness of the proposed approaches. In addition, the developed methods are validated on a real-world application, over data provided by the State Railway of Thailand (SRT). The purpose is to derive application models from our methodological contributions to diagnose the performance of rail dry port systems. First, we present the results of some ensemble methods applied to a first data set, which is fully labeled. Second, we show how our co-selection approaches can improve the performance of learning algorithms over partially labeled data provided by SRT.
130

Constrained graph-based semi-supervised learning with higher order regularization / Aprendizado semissupervisionado restrito baseado em grafos com regularização de ordem elevada

Celso Andre Rodrigues de Sousa 10 August 2017 (has links)
Graph-based semi-supervised learning (SSL) algorithms have been widely studied in the last few years. Most of these algorithms were designed from unconstrained optimization problems using a Laplacian regularizer term as the smoothness functional, in an attempt to reflect the intrinsic geometric structure of the data's marginal distribution. Although a number of recent research papers still focus on unconstrained methods for graph-based SSL, a recent statistical analysis showed that many of these algorithms may be unstable on transductive regression. Therefore, we focus on providing new constrained methods for graph-based SSL. We begin by analyzing the regularization framework of existing unconstrained methods. Then, we incorporate two normalization constraints into the optimization problem of three of these methods. We show that the proposed optimization problems have closed-form solutions. By generalizing one of these constraints to any distribution, we provide generalized methods for constrained graph-based SSL. The proposed methods have a more flexible regularization framework than the corresponding unconstrained methods. More precisely, our methods can deal with any graph Laplacian and use higher order regularization, which is effective on general SSL tasks. In order to show the effectiveness of the proposed methods, we provide comprehensive experimental analyses. Specifically, our experiments are subdivided into two parts. In the first part, we evaluate existing graph-based SSL algorithms on time series data to find their weaknesses. In the second part, we evaluate the proposed constrained methods against six state-of-the-art graph-based SSL algorithms on benchmark data sets. Since the widely used best-case analysis may hide useful information concerning the SSL algorithms' performance with respect to parameter selection, we used recently proposed empirical evaluation models to evaluate our results. Our results show that our methods outperform the competing methods on most parameter settings and graph construction methods. However, we found a few experimental settings in which our methods showed poor performance. In order to facilitate the reproduction of our results, the source code, data sets, and experimental results are freely available.
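For readers unfamiliar with higher order Laplacian regularization, the sketch below may help. It minimizes ||f - y||^2 + mu * f^T L^p f on a tiny graph, whose closed-form minimizer satisfies (I + mu * L^p) f = y. This is a generic unconstrained illustration under assumed names (`regularized_scores`, `mu`, `order`); it is not the constrained formulation proposed in the thesis and does not reproduce its normalization constraints.

```python
def graph_laplacian(weights):
    """Unnormalized Laplacian L = D - W of a symmetric weight matrix."""
    n = len(weights)
    lap = [[-float(weights[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        lap[i][i] = float(sum(weights[i][j] for j in range(n) if j != i))
    return lap


def mat_mul(a, b):
    """Product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def regularized_scores(weights, y, mu=1.0, order=2):
    """Return f solving (I + mu * L^order) f = y for mu >= 0."""
    n = len(weights)
    lap = graph_laplacian(weights)
    power = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(order):
        power = mat_mul(power, lap)  # builds L^order one factor at a time
    system = [[float(i == j) + mu * power[i][j] for j in range(n)]
              for i in range(n)]
    return solve(system, y)


# Path graph 0 - 1 - 2 with initial scores +1, 0, -1 (hypothetical data).
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
f = regularized_scores(path, [1.0, 0.0, -1.0], mu=1.0, order=2)
```

Setting `order=1` gives ordinary Laplacian smoothing; larger values penalize rough score vectors more strongly. The system matrix is positive definite for `mu >= 0`, so the elimination never meets a zero pivot.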
