1. Seleção Ativa de Exemplos de Treinamento para Meta-Aprendizado (Active Selection of Training Examples for Meta-Learning). Sousa, Arthur Fernandes Minduca de, 29 July 2013.
Several approaches have been applied to the algorithm selection task. In this context, Meta-Learning has emerged as an effective approach for predicting algorithm performance by adopting a supervised strategy. The training examples for Meta-Learning (or meta-examples) are built from a repository of problem instances (for example, a repository of classification datasets). Each meta-example stores descriptive features of a problem instance and a label indicating the best algorithm for that problem (identified empirically among a set of candidate algorithms). The best algorithms for new problems can then be predicted based only on their descriptive features, with no need for any additional empirical evaluation of the candidate algorithms. Despite these results, Meta-Learning requires a sufficient number of problem instances in order to produce a rich set of meta-examples. Recent approaches for generating synthetic or manipulated datasets have been adopted successfully in the Meta-Learning context. These proposals include the Datasetoids approach, a simple data-manipulation technique that generates new datasets from existing ones. Although these proposals produce data relevant to Meta-Learning, they may eventually produce redundant or even irrelevant problem instances. Active Meta-Learning arises in this context to select only the most informative instances for meta-example generation. In this work, we investigate the use of Active Meta-Learning combined with Datasetoids, focusing on the use of the Random Forest algorithm in Meta-Learning. To select problem instances, we implement an entropy-based uncertainty criterion specific to Random Forest. We also investigate the use of an outlier-detection technique to remove, a priori, the problems considered outliers, aiming to improve the performance of the Active Learning methods. Our experiments revealed an improvement in Meta-Learning performance and a reduction in the computational cost of generating meta-examples.
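The abstract does not include an implementation; as a minimal sketch of what an entropy-based uncertainty criterion over a Random Forest meta-learner could look like (assuming scikit-learn's RandomForestClassifier and hypothetical helper names, not the thesis's actual code), the selection step might be:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def entropy_uncertainty(forest: RandomForestClassifier, X_pool: np.ndarray) -> np.ndarray:
    """Entropy of the forest's class-probability estimates for each candidate problem instance."""
    proba = forest.predict_proba(X_pool)          # (n_candidates, n_classes), averaged over trees
    proba = np.clip(proba, 1e-12, 1.0)            # guard against log(0)
    return -(proba * np.log(proba)).sum(axis=1)   # higher entropy = less certain meta-prediction

def select_most_informative(forest, X_pool, n_queries=1):
    """Pick the candidate datasets/datasetoids whose best-algorithm label is most uncertain."""
    scores = entropy_uncertainty(forest, X_pool)
    return np.argsort(scores)[-n_queries:]        # indices of the n_queries highest-entropy instances
```

Here the forest-averaged class probabilities stand in for the per-tree vote distribution; the thesis describes its criterion only as entropy-based and specific to Random Forest, so the per-tree votes could equally well be used.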
2. An Empirical Active Learning Study for Temporal Segment Networks. Mao, Jilei, January 2022.
Video classification is the task of producing a label relevant to a video given its frames. Active learning aims to achieve greater accuracy with fewer labeled training instances through a designed query strategy that selects representative instances from the unlabeled pool and sends them to an oracle for labeling; it has been used successfully in many modern machine learning problems. To examine how different active learning strategies behave on video classification, we test several strategies, including margin sampling, standard deviation sampling, and center sampling, on Temporal Segment Networks (TSN), a classic neural network designed for video classification. We profile these three strategies in systematic control experiments to obtain the respective models, then compare each model's confusion matrix, data distribution, and training log with those of the baseline models after the first round of queries. We observe that the comparison among models differs under different evaluation criteria. Across all the criteria we use, the average performance of center sampling is better than that of random sampling, while margin sampling and standard deviation sampling perform much worse than both random sampling and center sampling. The training logs and data distributions indicate that margin sampling and standard deviation sampling tend to select outliers in the data, which are hard to learn and apparently do not help improve model performance. Center sampling readily outperforms random sampling in terms of F1-score. Therefore, the evaluation criteria should be formulated according to the actual application requirements.
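The query strategies are only named in the abstract; a minimal sketch of margin sampling and of one plausible reading of standard deviation sampling, scored over the current TSN model's softmax outputs (function names and array shapes are assumptions, not the study's code), is:

```python
import numpy as np

def margin_sampling(probs: np.ndarray, n_queries: int) -> np.ndarray:
    """Indices of the unlabeled clips whose two most likely classes are hardest to separate.

    probs: (n_unlabeled, n_classes) softmax outputs of the current TSN model (assumed given).
    """
    sorted_probs = np.sort(probs, axis=1)
    margins = sorted_probs[:, -1] - sorted_probs[:, -2]   # small margin = ambiguous prediction
    return np.argsort(margins)[:n_queries]

def std_sampling(probs: np.ndarray, n_queries: int) -> np.ndarray:
    """Indices of clips whose class-probability vector is flattest (lowest standard deviation)."""
    return np.argsort(probs.std(axis=1))[:n_queries]
```

Center sampling, which depends on feature-space distances to class centers, is omitted from the sketch because the abstract does not specify how those centers are computed.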
3. Enhancing Deep Active Learning Using Selective Self-Training For Image Classification. Panagiota Mastoropoulou, Emmeleia, January 2019.
A high-quality, large-scale training dataset is an important prerequisite for training an ideal classifier for image classification. Manually constructing a training dataset with appropriate labels is an expensive and time-consuming task. Active learning techniques have been used to improve existing models by reducing the number of required annotations. The present work investigates how to build a model for identifying and utilizing potentially informative and representative unlabeled samples. To this end, two approaches for deep image classification using active learning are proposed, implemented, and evaluated. The two versions differ in how they explore the input space, so as to investigate how classifier performance varies when high-confidence unlabeled samples are labeled automatically. Active learning heuristics based on uncertainty measurements over low-confidence predictions, a pseudo-labeling technique that boosts active learning by reducing the number of human interactions, and knowledge transfer from pre-trained models are proposed and combined into our methodology. Experimental results on two benchmark image classification datasets verify the effectiveness of the proposed methodology. In addition, a new pool-based active learning query strategy is proposed: dealing with retraining-based algorithms, we define a "forgetting event" to have occurred when an individual training example's maximum-probability predicted class changes over the course of retraining. We integrated this new approach with the semi-supervised learning method to tackle the above challenges and observed good performance compared with existing methods.
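Neither the forgetting-event counter nor the pseudo-labeling step is spelled out in the abstract; a minimal sketch, under the assumption that both operate on per-example predictions and softmax confidences (the 0.95 threshold and function names are illustrative, not the thesis's code), could be:

```python
import numpy as np

def update_forgetting_counts(prev_preds: np.ndarray,
                             new_preds: np.ndarray,
                             counts: np.ndarray) -> np.ndarray:
    """Increment a per-example counter whenever its predicted class flips between retraining rounds."""
    counts[prev_preds != new_preds] += 1
    return counts

def pseudo_label_high_confidence(probs: np.ndarray, threshold: float = 0.95):
    """Return indices and pseudo-labels of unlabeled samples predicted above an assumed
    confidence threshold; selective self-training would add these without querying the oracle."""
    confident = probs.max(axis=1) >= threshold
    idx = np.flatnonzero(confident)
    return idx, probs[idx].argmax(axis=1)
```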
4. A Bayesian Decision Theoretical Approach to Supervised Learning, Selective Sampling, and Empirical Function Optimization. Carroll, James Lamond, 10 March 2010.
Many have used the principles of statistics and Bayesian decision theory to model specific learning problems. It is less common to see models of the process of learning in general. One exception is the model of the supervised learning process known as the "Extended Bayesian Formalism" or EBF. This model is descriptive, in that it can describe and compare learning algorithms; thus the EBF is capable of modeling both effective and ineffective learning algorithms. We extend the EBF to model unsupervised learning, semi-supervised learning, supervised learning, and empirical function optimization. We also generalize the utility model of the EBF to deal with non-deterministic outcomes and with utility functions other than 0-1 loss. Finally, we modify the EBF to create a "prescriptive" learning model, meaning that, instead of describing existing algorithms, our model defines how learning should optimally take place. We call the resulting model the Unified Bayesian Decision Theoretical Model, or UBDTM. We show that this model can serve as a cohesive theory and framework in which a broad range of questions can be analyzed and studied. Such a broadly applicable unified theoretical framework is one of the major missing ingredients of machine learning theory. Using the UBDTM, we concentrate on supervised learning and empirical function optimization. We then use the UBDTM to reanalyze many important theoretical issues in machine learning, including No-Free-Lunch, utility implications, and active learning. We also point forward to future directions for using the UBDTM to model learnability, sample complexity, and ensembles. We also provide practical applications of the UBDTM by using the model to train a Bayesian variation of the CMAC supervised learner in closed form, to perform a practical empirical function optimization task, and as part of the guiding principles behind an ongoing project to create an electronic and print corpus of tagged ancient Syriac texts using active learning.
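The abstract states the framework in prose only; as a hedged illustration of the Bayesian decision-theoretic rule it generalizes (textbook form, assuming amsmath/amssymb, not necessarily the UBDTM's own notation), the prescriptive choice of a learning action under a posterior over hypotheses and an arbitrary utility can be sketched as:

```latex
% Bayes-optimal action a* under posterior p(\theta | D) and a general utility U(a, \theta);
% with U as negative 0-1 loss this reduces to predicting the posterior-mode class.
\[
  a^{*} \;=\; \arg\max_{a \in \mathcal{A}} \;
  \mathbb{E}_{\theta \sim p(\theta \mid D)}\bigl[\, U(a, \theta) \,\bigr]
  \;=\; \arg\max_{a \in \mathcal{A}} \int U(a, \theta)\, p(\theta \mid D)\, \mathrm{d}\theta .
\]
```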