Global ETD Search

31	On sparse representations and new meta-learning paradigms for representation learning Mehta, Nishant A. 27 August 2014
Given the "right" representation, learning is easy. This thesis studies representation learning and meta-learning, with a special focus on sparse representations. Meta-learning is fundamental to machine learning, and it translates to learning to learn itself. The presentation unfolds in two parts. In the first part, we establish learning theoretic results for learning sparse representations. The second part introduces new multi-task and meta-learning paradigms for representation learning. On the sparse representations front, our main pursuits are generalization error bounds to support a supervised dictionary learning model for Lasso-style sparse coding. Such predictive sparse coding algorithms have been applied with much success in the literature; even more common have been applications of unsupervised sparse coding followed by supervised linear hypothesis learning. We present two generalization error bounds for predictive sparse coding, handling the overcomplete setting (more learned features than original dimensions) and the infinite-dimensional setting. Our analysis led to a fundamental stability result for the Lasso that shows the stability of the solution vector to design matrix perturbations. We also introduce and analyze new multi-task models for (unsupervised) sparse coding and predictive sparse coding, allowing for one dictionary per task but with sharing between the tasks' dictionaries. The second part introduces new meta-learning paradigms to realize unprecedented types of learning guarantees for meta-learning. Specifically sought are guarantees on a meta-learner's performance on new tasks encountered in an environment of tasks.
Nearly all previous work produced bounds on the expected risk, whereas we produce tail bounds on the risk, thereby providing performance guarantees on the risk for a single new task drawn from the environment. The new paradigms include minimax multi-task learning (minimax MTL) and sample variance penalized meta-learning (SVP-ML). Regarding minimax MTL, we provide a high probability learning guarantee on its performance on individual tasks encountered in the future, the first of its kind. We also present two continua of meta-learning formulations, each interpolating between classical multi-task learning and minimax multi-task learning. The idea of SVP-ML is to minimize the task average of the training tasks' empirical risks plus a penalty on their sample variance. Controlling this sample variance can potentially yield a faster rate of decrease for upper bounds on the expected risk of new tasks, while also yielding high probability guarantees on the meta-learner's average performance over a draw of new test tasks. An algorithm is presented for SVP-ML with feature selection representations, as well as a quite natural convex relaxation of the SVP-ML objective.
Keywords: Learning theory; Data-dependent complexity; Luckiness; Dictionary learning; Sparse coding; Lasso; Multi-task learning; Meta-learning; Learning to learn
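The three objectives named in the abstract of entry 31 can be contrasted with a minimal Python sketch. It assumes the per-task empirical risks are already computed, that minimax MTL scores a hypothesis by its worst training task, and that the SVP-ML penalty is linear in the sample variance with a hypothetical weight `penalty`; the exact form used in the thesis may differ.

```python
from statistics import mean, variance


def mtl_objective(task_risks):
    """Classical multi-task learning: average empirical risk over the tasks."""
    return mean(task_risks)


def minimax_mtl_objective(task_risks):
    """Minimax MTL (as assumed here): empirical risk of the worst task."""
    return max(task_risks)


def svp_ml_objective(task_risks, penalty):
    """Task-average risk plus a penalty on the sample variance across tasks.

    `penalty` is a hypothetical trade-off weight, not taken from the thesis.
    """
    return mean(task_risks) + penalty * variance(task_risks)


# Illustrative risks for three training tasks (made-up numbers).
risks = [0.1, 0.2, 0.6]
average = mtl_objective(risks)
worst = minimax_mtl_objective(risks)
penalized = svp_ml_objective(risks, penalty=2.0)
```

A hypothesis with a low average but one badly served task scores well under the first objective and poorly under the other two, which is the behavior the tail-bound guarantees are meant to control.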
32	Uma hiper-heurística híbrida para a otimização de algorítmos / A hybrid hyper-heuristic for the optimization of algorithms MIRANDA, Pericles Barbosa Cunha de 22 August 2016
Designing an algorithm or heuristic to solve a given problem is a challenging task due to the variety of possible design choices and the lack of clear guidelines on how to choose and/or combine them. For instance, the performance of an optimization algorithm depends on the design of its search operators as well as an adequate setting of specific hyper-parameters, each of them with many possible options to choose from. Because of that, there is a growing research interest in automating the design of algorithms by exploring mainly optimization and machine learning approaches, aiming to make the algorithm design process more independent from human interaction. Different approaches have dealt with the task of optimizing algorithms as another (meta)optimization problem. These approaches are commonly called hyper-heuristics, where each solution of the search space is a possible algorithm evaluated on a given problem. Initially, hyper-heuristics were applied for the selection of parameters in a predefined and limited search space. Nonetheless, recently, generation hyper-heuristics have been developed to generate algorithms from a set of specified components and functions. Generation hyper-heuristics are considered more flexible than the selection ones due to their capacity to create new and customized algorithms for a given problem. Hyper-heuristics have been widely used for the optimization of meta-heuristics. However, the search process becomes expensive because the evaluation of each solution depends on the execution of an algorithm on a problem. In this work, a novel hyper-heuristic was developed to optimize algorithms considering a given problem. The proposed approach aims to provide optimized algorithms for the input problem and to reduce the computational cost of the optimization process significantly when compared to other hyper-heuristics. The proposed hyper-heuristic combines an automated algorithm selection method with a generation hyper-heuristic. The generation hyper-heuristic is responsible for the creation of the knowledge base, which contains previously built algorithms for a set of problems. Once the knowledge base is available, it is used as a source of algorithms to be recommended by the automated algorithm selection method. The idea is to reuse the algorithms already built by the generation hyper-heuristic on similar problems. It is worth mentioning that the creation of hyper-heuristics aiming to reduce the cost of algorithm generation without harming the quality of these algorithms has not been studied yet. Besides, hybrid hyper-heuristics which combine an algorithm selection approach with a generation hyper-heuristic for algorithm optimization, as proposed in this thesis, are a novelty. To evaluate the proposed algorithm, the optimization of the Particle Swarm Optimization (PSO) algorithm was considered as a case study. In our experiments, we considered 32 optimization problems. The proposed system was evaluated regarding its capacity to recommend adequate algorithms for an input problem, the quality of the recommended algorithms, and, finally, its accuracy in recommending algorithms. The results showed that the proposed system recommends useful algorithms for the input problem, and does so efficiently. Besides, the algorithms achieved competitive results when compared to state-of-the-art algorithms, and the system presented a high percentage of accuracy in the recommendation.
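The reuse idea in entry 32 (recommend an algorithm already generated for a similar problem) can be sketched as a nearest-neighbour lookup in the knowledge base. This is only an illustration under stated assumptions: the abstract does not say how problem similarity is measured, so Euclidean distance over hypothetical numeric meta-features is used, and the PSO configuration fields are invented for the example.

```python
import math


def recommend(knowledge_base, meta_features):
    """Return the stored algorithm whose source problem is closest to the query.

    Similarity is assumed to be Euclidean distance between meta-feature vectors.
    """
    closest = min(
        knowledge_base,
        key=lambda entry: math.dist(entry["meta_features"], meta_features),
    )
    return closest["algorithm"]


# Hypothetical knowledge base: each entry pairs the meta-features of a solved
# problem with the PSO design that the generation hyper-heuristic built for it.
knowledge_base = [
    {"meta_features": (2.0, 0.1), "algorithm": {"inertia": 0.7, "topology": "ring"}},
    {"meta_features": (30.0, 0.9), "algorithm": {"inertia": 0.4, "topology": "global"}},
]

# A new problem reuses the design built for its nearest stored neighbour,
# so no new generation run is needed.
recommended = recommend(knowledge_base, (25.0, 0.8))
```

The lookup costs one distance computation per stored problem, whereas generating a new algorithm requires many executions of candidate algorithms, which is where the claimed cost reduction would come from.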
33	Seleção de algoritmos para a tarefa de agrupamento de dados: uma abordagem via meta-aprendizagem / Algorithm selection for the data clustering task: a meta-learning approach Ferrari, Daniel Gomes 27 March 2014 / Natcomp Informatica e Equipamentos Eletronicos LTDA
Data clustering is an important data mining task that aims to segment a database into groups of objects based on their similarity or dissimilarity. Due to the unsupervised nature of clustering, the search for a good quality solution can become a complex process. There is currently a wide range of clustering algorithms in the academic literature, and selecting the most suitable one for a given problem can be a slow and costly process. In 1976, Rice formulated the algorithm selection problem (PSA, from the Portuguese "Problema de Seleção de Algoritmos"), postulating that a well-performing algorithm can be chosen according to the structural characteristics of the problem to which it will be applied. Meta-learning brings the concept of learning about learning, that is, the meta-knowledge obtained from the algorithms' learning process makes it possible to improve the performance of that process. Meta-learning has a major intersection with data mining in classification problems, where it is used to build algorithm selection systems. This thesis proposes an approach to the algorithm selection problem that uses meta-learning techniques for clustering. The characterization of 84 problems is performed by a classical approach, based on the problems, and by a new proposal based on the similarity among the objects. Ten internal indices are used to provide different performance assessments of seven algorithms, where the combination of the indices determines the ranking of the algorithms. Several analyses are performed to assess the quality of the obtained meta-knowledge in facilitating the mapping between the problem's features and the performance of the algorithms. The results show that the new characterization approach and the method for combining the indices provide a good quality algorithm selection mechanism for data clustering problems.
Keywords: data clustering; meta-learning; meta-knowledge; algorithm selection; CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
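Entry 33 states that a combination of internal indices determines the ranking of the clustering algorithms, without saying how the indices are combined. One common choice, assumed here purely for illustration, is to rank the algorithms under each index and average the ranks. The index names, algorithm names, and values below are made up, and ties are not handled.

```python
def average_ranks(scores, higher_is_better):
    """Rank algorithms under each index, then sort them by mean rank.

    scores[index][algorithm] holds the index value for that algorithm.
    higher_is_better[index] says whether larger values mean better clusterings.
    Returns a list of (mean_rank, algorithm) pairs, best first.
    """
    algorithms = list(next(iter(scores.values())))
    totals = {name: 0.0 for name in algorithms}
    for index, values in scores.items():
        ordered = sorted(
            algorithms, key=lambda name: values[name], reverse=higher_is_better[index]
        )
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return sorted((totals[name] / len(scores), name) for name in algorithms)


# Illustrative values for two internal indices and three algorithms.
scores = {
    "silhouette": {"kmeans": 0.6, "dbscan": 0.4, "single_link": 0.2},
    "davies_bouldin": {"kmeans": 0.4, "dbscan": 0.5, "single_link": 1.4},
}
higher_is_better = {"silhouette": True, "davies_bouldin": False}

ranking = average_ranks(scores, higher_is_better)
```

The resulting ranking is what a meta-learner would be trained to predict from the problem characterization.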
34	Meta-learning / Meta-learning Hovorka, Martin January 2008
The goal of this work is to become acquainted with and study meta-learning methods, to implement an algorithm, and to compare it with other machine learning methods.
35	A Systematic Literature Review on Meta Learning for Predictive Maintenance in Industry 4.0 Fisenkci, Ahmet January 2022
Recent refinements in Industry 4.0 and Machine Learning demonstrate the positive effects of using deep learning models for intelligent maintenance. The primary benefit of Deep Learning (DL) is its capability to extract attributes and make fast, accurate, and automated predictions without supervision. However, DL requires high computational power, significant data preprocessing, and vast amounts of data to make accurate predictions for intelligent maintenance. Given these considerable obstacles, meta-learning has been developed as a novel way to overcome them. As a learning technique, meta-learning aims to quickly acquire knowledge of new tasks using the minimal available data by learning through meta-knowledge. There has been little research on using meta-learning for Predictive Maintenance (PdM), and we considered it necessary to conduct this review to understand the applicability of meta-learning's capabilities and functions to PdM, since the outcomes of this technique seem rather promising. The review started with the development of a methodology and four research questions: (1) What is the taxonomy of meta-learning for PdM? (2) What are the current state-of-the-art methodologies? (3) Which datasets are available for meta-learning in PdM? (4) What are the open issues, challenges, and opportunities of meta-learning in PdM? To answer the first and second questions, a new taxonomy was proposed and meta-learning's role in predictive maintenance was identified from 55 selected papers. To answer the third question, we determined which types of datasets exist for this domain and what their characteristics are. Finally, the challenges, open issues, and opportunities of meta-learning in predictive maintenance were examined to answer the final question. The answers to the research questions provided suggestions for future research topics.
Keywords: Meta-learning; Few-Shot; Predictive Maintenance; Industry 4.0; Fault detection; Fault Prognostics; Fault Diagnosis; Computer Sciences
36	Machine learning under budget constraints / Apprentissage statistique sous contraintes de budget Contardo, Gabriella 10 July 2017
This thesis studies the problem of machine learning under budget constraints; in particular, we focus on the cost of the information used by the system to predict accurately. Most machine learning methods define the quality of a model as its performance (e.g., accuracy) on the task at hand, but ignore the cost of the model itself: for instance, the number of examples and/or labels needed during learning, the memory used, or the number of features required to predict at test time. More specifically, we propose in this manuscript several methods for cost-sensitive prediction with respect to the quantity of features used. We present three models that learn to predict under such a constraint, i.e., that learn a strategy to gather only the information necessary to predict well at a small cost. The first model is a static feature selection approach applied to cold-start recommendation. We then define two adaptive feature acquisition methods that allow for a better trade-off between cost and accuracy in a more generic setting. We rely on representation learning techniques, along with recurrent neural network architectures and gradient descent algorithms for learning. In the last part of the thesis, we study the problem of active learning, where one aims at constraining the number of labels used to train a model. We present our work on a novel approach to this problem using meta-learning, with a first instantiation based on bi-directional recurrent neural networks.
Keywords: Machine learning; Feature acquisition; Learning under constraints; Active learning; Meta-learning; 004
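The cost/accuracy trade-off described in entry 36 can be made concrete with a small sketch of a cost-penalized objective. The additive form, the trade-off weight `trade_off`, and the feature names and costs are all assumptions made for illustration; the abstract does not give the actual objective used in the thesis.

```python
def budgeted_objective(prediction_loss, acquired, feature_costs, trade_off):
    """Prediction loss plus a weighted cost of the features actually acquired.

    `trade_off` is a hypothetical weight balancing accuracy against cost.
    """
    acquisition_cost = sum(feature_costs[name] for name in acquired)
    return prediction_loss + trade_off * acquisition_cost


# Made-up feature costs: some measurements are cheap, one is expensive.
feature_costs = {"age": 0.0, "blood_test": 1.0, "mri": 5.0}

# Strategy A buys the expensive feature and predicts slightly better;
# strategy B settles for cheaper features and a somewhat higher loss.
objective_a = budgeted_objective(0.30, ["age", "mri"], feature_costs, trade_off=0.01)
objective_b = budgeted_objective(0.33, ["age", "blood_test"], feature_costs, trade_off=0.01)
```

With these numbers the cheaper strategy wins; an adaptive method would make that choice per input rather than once for the whole dataset.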
37	EXPLORING TEST CASE DESIGN APPROACHES FOR META-LEARNING MODELS Seyedshahi, Farzaneh Alsadat January 2022
Meta-learning, which allows individuals to learn from a collection of algorithms, is currently one of the most essential and cutting-edge topics in deep learning. Because of their widespread applicability, these algorithms are inextricably linked to essential systems and human lives, and the necessity to test and debug such crucial systems is apparent. We investigated the use of common software tools for the production of test cases to ensure the quality of meta-learning models. The goal of this study is to look at some of the challenges and benefits of the test approaches used to develop test cases for meta-learning system models. As a case study, we use a model-agnostic meta-learning method and a combination of comparative studies to extract the obstacles and benefits of each technique. We highlighted the challenges and drawbacks of each testing strategy for the Black-box, White-box, and Gray-box categories by comparing post-train tests to pre-train tests. The results suggest that traditional testing procedures can help analyze meta-learning models, and that using these kinds of tests saves testing time while also improving the performance of meta-learner models.
Keywords: Meta-learning; Test; Engineering and Technology; Electrical and Electronic Engineering; Computer Systems
38	Uso de meta-aprendizado na recomendação de meta-heurísticas para o problema do caixeiro viajante / Using meta-learning on the recommendation of meta-heuristics for the traveling salesman problem Kanda, Jorge Yoshio 07 December 2012
The traveling salesman problem (TSP) is a classical optimization problem that has several variations, applications and instances. Finding the optimal solution for many instances of this problem is usually very hard due to the high computational cost. Various optimization methods, known as metaheuristics (MHs), are capable of generating good solutions for the TSP. Many algorithms based on different MHs have been proposed and investigated for different variations of the TSP. Since there is no universal algorithm able to find the best solution for all instances, different MHs can provide the best solution for different TSP instances. Thus, a priori selection of the MH that produces the best solution for a given instance is a hard task. The research developed in this thesis investigates the use of meta-learning approaches to select the most promising MHs for new TSP instances. These approaches induce predictive meta-models by training machine learning techniques on a set of meta-data. In our meta-data, each meta-example is a TSP instance described by problem characteristics (meta-features) and by the performance of the MHs (target meta-feature) on this instance. The induced meta-models are used to indicate the values of the target meta-feature for new TSP instances. Several experiments were performed during this research, and important results were obtained.
Keywords: Algorithm selection problem; Machine learning; Meta-heuristics; Meta-learning; Traveling salesman problem
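The meta-example described in entry 38 (a TSP instance represented by meta-features plus a target derived from metaheuristic performance) can be sketched as follows. The three meta-features, the metaheuristic names, and the simplification of the target to the single best MH are assumptions for illustration; the abstract does not list the actual meta-features or target encoding.

```python
import math
from statistics import mean, pstdev


def meta_features(cities):
    """Describe a TSP instance by its size and pairwise-distance statistics."""
    distances = [
        math.dist(a, b) for i, a in enumerate(cities) for b in cities[i + 1:]
    ]
    return {
        "n_cities": len(cities),
        "mean_dist": mean(distances),
        "std_dist": pstdev(distances),
    }


def meta_example(cities, tour_lengths):
    """Attach the target to the meta-features: the MH with the shortest tour.

    tour_lengths maps a (hypothetical) metaheuristic name to the tour length
    it obtained on this instance.
    """
    example = meta_features(cities)
    example["best_mh"] = min(tour_lengths, key=tour_lengths.get)
    return example


# A three-city instance forming a 3-4-5 right triangle, with made-up results.
cities = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
example = meta_example(cities, {"tabu_search": 12.0, "simulated_annealing": 12.5})
```

A collection of such rows forms the meta-data on which any standard classifier can be trained to act as the meta-model.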
39	Uma abordagem para a escolha do melhor método de seleção de instâncias usando meta-aprendizagem / An approach for choosing the best instance selection method using meta-learning MOURA, Shayane de Oliveira 21 August 2015 / IF Sertão - PE
Knowledge Discovery in Databases systems (better known as KDD systems) and Machine Learning methods predict situations and recognize and group (cluster) patterns, among other tasks demanded by a world in which most services are offered through virtual means. Although these applications aim to generate fast, reliable, and easy-to-interpret information, the extensive databases they use make it difficult to achieve accuracy at a low computational cost. To solve this problem, the databases can be reduced in order to decrease processing time and facilitate storage, as well as to keep only information that is sufficient and relevant for knowledge extraction. In this context, Instance Selection Methods (ISMs) have been proposed to reduce and filter databases by selecting or creating new instances that best describe them. Nevertheless, the No Free Lunch Theorem applies: the performance of ISMs varies with the database, and no method always outperforms the others. Therefore, this work proposes an architecture, called Meta-CISM (Meta-learning for Choosing Instance Selection Method), to select the "best" ISM for a given database (the best suited with respect to accuracy). Meta-learning strategies are used to train a meta-classifier that learns the relationship between the accuracy rate of ISMs and the structure of the databases. Meta-CISM also uses resampling and feature selection methods to improve the performance of the meta-classifier. The proposal was evaluated with the ISMs C-pruner, DROP3, IB3, ICF and ENN-CNN. The resampling methods used were Bagging and Combination (a method proposed in this work). The feature selection methods used were Relief-F, CFS, Chi Square Feature Evaluation and Consistency-Based Subset Evaluation. Five classifiers contributed to labeling the meta-instances: C4.5, PART, MLP-BP, SMO and KNN. The meta-classifier was trained by an MLP-BP. Experiments were carried out with sixteen public databases. The proposed method (Meta-CISM) was better than all the ISMs studied in most of the experiments performed. Since it efficiently selects one of the three best ISMs in more than 85% of cases, the approach is suitable for automatic use in the pre-processing phase of databases.
Keywords: Machine Learning; Data Mining; Meta-learning; Instance Selection Method (ISM); Meta-CISM
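The headline figure in entry 39 (the recommended ISM is among the three best in more than 85% of cases) is a top-k hit rate. A minimal sketch of how such a rate could be computed follows; the recommendations and per-database rankings are made-up data, not results from the dissertation.

```python
def top_k_hit_rate(recommended, rankings, k=3):
    """Fraction of databases whose recommended method is among the k best.

    recommended[i] is the method chosen for database i; rankings[i] lists
    all methods for that database, ordered from best to worst.
    """
    hits = sum(
        1 for choice, ranking in zip(recommended, rankings) if choice in ranking[:k]
    )
    return hits / len(recommended)


# Hypothetical recommendations and true rankings for four databases.
recommended = ["DROP3", "ICF", "IB3", "ENN-CNN"]
rankings = [
    ["DROP3", "ICF", "IB3", "C-pruner", "ENN-CNN"],
    ["IB3", "ICF", "DROP3", "ENN-CNN", "C-pruner"],
    ["C-pruner", "DROP3", "ICF", "ENN-CNN", "IB3"],
    ["ENN-CNN", "IB3", "ICF", "DROP3", "C-pruner"],
]

hit_rate = top_k_hit_rate(recommended, rankings, k=3)
```

Setting `k=1` gives the stricter measure of how often the single best method is recommended.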
40	Similaridade de algoritmos em cenários sensíveis a custo / Similarity of algorithms in cost-sensitive scenarios MELO, Carlos Eduardo Castor de 27 August 2015 / FACEPE
The analysis of the similarity between machine learning algorithms is an important aspect of Meta-Learning, where knowledge gathered from known learning processes can be used to guide the selection of algorithms to tackle new learning problems. This similarity is usually calculated through global performance metrics that omit important information about algorithm behavior. There are also approaches in which performance is verified individually on each instance of a problem. Neither approach considers the costs associated with each problem class, so both neglect information that can be very important in different learning contexts. In this study, metrics are presented to evaluate the performance of algorithms in cost-sensitive scenarios. Each scenario is described by a threshold choice method, used to build a crisp classifier from a learned model. Based on the performance values for each problem instance, a method is proposed to measure the similarity between algorithms at a local level (for each problem) and at a global level (across all problems observed). The experiments used to illustrate the metrics presented in this work were performed in a Meta-Learning study using 19 algorithms for the classification of the instances of 152 learning problems. The similarity measures were used to create hierarchical clusters. The clusters created show how the behavior of the algorithms varies according to the cost scenario being treated.
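The pipeline outlined in entry 40 (threshold a learned model into a crisp classifier, record a cost per instance, then compare algorithms on those per-instance values) can be sketched as below. The fixed threshold stands in for the threshold choice methods studied in the dissertation, and the mean absolute difference is an assumed dissimilarity measure; neither is taken from the source.

```python
def instance_costs(scores, labels, threshold, cost_fn, cost_fp):
    """Cost incurred on each instance after thresholding the model's scores.

    A score at or above `threshold` is predicted positive (1). A missed
    positive costs `cost_fn`; a false alarm costs `cost_fp`.
    """
    costs = []
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == label:
            costs.append(0.0)
        else:
            costs.append(cost_fn if label == 1 else cost_fp)
    return costs


def dissimilarity(costs_a, costs_b):
    """Mean absolute difference between two algorithms' per-instance costs."""
    return sum(abs(a - b) for a, b in zip(costs_a, costs_b)) / len(costs_a)


# Made-up scores from two models on the same four instances.
labels = [1, 0, 1, 0]
costs_a = instance_costs([0.9, 0.2, 0.4, 0.6], labels, 0.5, cost_fn=2.0, cost_fp=1.0)
costs_b = instance_costs([0.8, 0.7, 0.6, 0.1], labels, 0.5, cost_fn=2.0, cost_fp=1.0)
distance = dissimilarity(costs_a, costs_b)
```

A matrix of such pairwise distances is the kind of input a hierarchical clustering routine takes, and changing the cost ratio changes the distances, which matches the abstract's observation that the groupings depend on the cost scenario.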
