Global ETD Search

1	Active learning in cost-sensitive environments
	Liu, Alexander Yun-chung, 21 June 2010
	Active learning techniques aim to reduce the amount of labeled data a supervised learner needs to reach a given level of performance. This is especially useful in domains where unlabeled data is easy to obtain but labeling is costly. In this dissertation, I introduce methods for creating computationally efficient active learning techniques that handle different misclassification costs, different evaluation metrics, and different label acquisition costs. This is accomplished in part by developing techniques from utility-based data mining that are typically not studied in conjunction with active learning. I first address supervised learning problems where labeled data may be scarce, especially for one particular class. I revisit claims about cost-sensitive learning and about resampling, a particularly popular approach to handling imbalanced data. The presented research shows that although resampling and cost-sensitive learning can be equivalent in some cases, the two approaches are not identical. This work on resampling and cost-sensitive learning motivates the need for active learners that can handle different misclassification costs. After presenting a cost-sensitive active learning algorithm, I show that it can be combined with a proposed framework for analyzing evaluation metrics to create an active learning approach that can optimize any evaluation metric expressible as a function of the terms of a confusion matrix. Finally, I address active learning when labeling different types of points incurs different costs, particularly when label acquisition costs are spatially driven.
	Keywords: Active learning; Labeled data; Supervised learners; Utility-based data mining; Resampling; Cost-sensitive learning; Label acquisition
2	Active Learning using a Sample Selector Network / Aktiva Inlärning med ett Provväljarnätverk
	Tan, Run Yan, January 2020
	In this work, we assume a limited labelling budget and propose using a sample selector network to learn to select effective training samples, whose labels are then acquired to train the target model on the required machine learning task. We assume that the sample features, the state of the target model, and the training loss of the target model are informative for training the sample selector network. In addition, we approximate the state of the target model with its intermediate and final network outputs. We investigate whether, under a limited labelling budget, the sample selector network can learn to select training samples that train the target model at least as effectively as a training subset of the same size sampled uniformly at random from the full training dataset. The latter is the common procedure for training machine learning models without active learning; we refer to it as the traditional uniform random sampling method. We perform experiments on the MNIST and CIFAR-10 datasets and provide empirical evidence that, under a constrained labelling budget and certain other conditions, active learning using a sample selector network enables the target model to learn more effectively.
	Keywords: Active Learning; Machine Learning; Label budget; Label acquisition; Informative samples
	Subject: Computer and Information Sciences
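Both abstracts rest on ideas that are easy to make concrete. The Python sketch below is illustrative only and is not taken from either thesis. All function names are hypothetical, and the toy probabilities and labels are made up. The first part illustrates standard facts the first abstract builds on: metrics such as F1 and total misclassification cost are functions of the four confusion-matrix terms, and with calibrated probabilities the cost-minimizing decision threshold is cost_fp / (cost_fp + cost_fn). The second part contrasts the uniform random sampling baseline from the second abstract with budgeted selection by a scoring function. The thesis learns that score with a sample selector network. Here, a fixed predictive-entropy score (plain uncertainty sampling, a swapped-in technique) stands in for it.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass
from typing import Callable, Sequence


# --- Entry 1 (illustrative): metrics as functions of confusion-matrix terms ---

@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts (positive = rare, costly class)."""
    tp: int
    fp: int
    fn: int
    tn: int


def confusion_at(probs: Sequence[float], labels: Sequence[int], threshold: float) -> Confusion:
    # Predict positive when the estimated P(y=1) strictly exceeds the threshold.
    tp = fp = fn = tn = 0
    for p, y in zip(probs, labels):
        pred = p > threshold
        if pred and y:
            tp += 1
        elif pred:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def f1(c: Confusion) -> float:
    # F1 = 2TP / (2TP + FP + FN); returns 0.0 if the denominator is zero.
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def total_cost(c: Confusion, cost_fp: float, cost_fn: float) -> float:
    # Misclassification cost charges only the off-diagonal cells.
    return cost_fp * c.fp + cost_fn * c.fn


def cost_sensitive_threshold(cost_fp: float, cost_fn: float) -> float:
    # Expected cost is minimized by predicting positive when
    # p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn).
    return cost_fp / (cost_fp + cost_fn)


# --- Entry 2 (illustrative): budgeted selection vs. uniform random baseline ---

def uniform_random_subset(pool_size: int, budget: int, seed: int = 0) -> list[int]:
    # Baseline: `budget` distinct indices drawn uniformly at random from the pool.
    rng = random.Random(seed)
    return sorted(rng.sample(range(pool_size), budget))


def entropy(probs: Sequence[float]) -> float:
    # Predictive entropy (in nats) of one output distribution of the target model.
    return -sum(p * math.log(p) for p in probs if p > 0)


def scored_subset(outputs: Sequence[Sequence[float]], budget: int,
                  score: Callable[[Sequence[float]], float] = entropy) -> list[int]:
    # Stand-in for a learned selector (hypothetical): keep the `budget` samples
    # with the highest score; with the default, this is uncertainty sampling.
    ranked = sorted(range(len(outputs)), key=lambda i: score(outputs[i]), reverse=True)
    return sorted(ranked[:budget])


if __name__ == "__main__":
    probs, labels = [0.9, 0.4, 0.25, 0.3, 0.05], [1, 1, 0, 1, 0]
    t = cost_sensitive_threshold(cost_fp=1.0, cost_fn=4.0)  # 0.2
    for thr in (0.5, t):
        c = confusion_at(probs, labels, thr)
        print(thr, c, total_cost(c, 1.0, 4.0), round(f1(c), 3))
    outputs = [[0.98, 0.02], [0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
    print(scored_subset(outputs, budget=2))  # [1, 3]
    print(uniform_random_subset(len(outputs), budget=2))
```

On the toy data, a false negative costs four times as much as a false positive. Lowering the threshold from 0.5 to 0.2 therefore trades one extra false positive for two fewer false negatives and cuts the total cost from 8 to 1. A familiar argument holds that resampling the positive class by a factor of cost_fn / cost_fp has a similar effect. This is the kind of equivalence claim the first dissertation revisits.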
