Minimizing Dataset Size Requirements for Machine Learning

abstract: Machine learning methodologies are widely used in almost all aspects of software engineering. An effective machine learning model requires large amounts of data to achieve high accuracy. Classification relies mostly on labeled data, which is difficult to obtain: accurately labeling data into different classes requires both high cost and considerable effort. With an abundance of data available, all of it must be labeled before it can be properly utilized, and this work focuses on reducing the labeling effort for large datasets. The thesis compares the performance of different classifiers to test whether a small set of labeled data can be used to build accurate models with a high prediction rate. The use of a small dataset for classification is then extended to an active machine learning methodology: a one-class classifier first predicts the outliers in the data, and the outlier samples are then added to the training set of a support vector machine (SVM) classifier, which labels the unlabeled data. This labeling approach can be scaled up to avoid manual labeling and to build more robust machine learning methodologies.

Dissertation/Thesis / Masters Thesis Engineering 2017
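The active-learning loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not name the one-class classifier, so a deliberately simple centroid-plus-radius model stands in for it (a point is an outlier when it lies farther from the training centroid than every training point). The SVM is a linear SVM trained with the Pegasos sub-gradient method. We also assume that flagged outliers are labeled by an oracle (a human annotator) before being added to the training set. The names `fit_one_class`, `active_label`, `oracle`, the per-round query budget, and the toy two-cluster dataset are all hypothetical.

```python
import math
import random

Point = tuple[float, ...]


def fit_one_class(points: list[Point]) -> tuple[Point, float]:
    """Toy one-class model (hypothetical stand-in): the centroid of the training
    points and the radius of the smallest centroid-centered ball containing them."""
    dim = len(points[0])
    centroid = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    radius = max(math.dist(p, centroid) for p in points)
    return centroid, radius


def outlier_score(model: tuple[Point, float], x: Point) -> float:
    """Positive when x lies outside the ball, i.e. the model flags it as an outlier."""
    centroid, radius = model
    return math.dist(x, centroid) - radius


def train_linear_svm(xs: list[Point], ys: list[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> list[float]:
    """Linear SVM via Pegasos: stochastic sub-gradient descent on the
    L2-regularized hinge loss. A constant 1.0 feature acts as the bias term."""
    data = [(tuple(x) + (1.0,), y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step (regularizer)
            if margin < 1.0:  # hinge loss is active: step toward y * x
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w: list[float], x: Point) -> int:
    score = sum(wj * xj for wj, xj in zip(w, tuple(x) + (1.0,)))
    return 1 if score >= 0 else -1


def active_label(labeled, unlabeled, oracle, budget_per_round=2, rounds=3):
    """Query the oracle only for pool points the one-class model flags as
    outliers, add them to the SVM training set, then label the whole pool."""
    xs = [x for x, _ in labeled]
    ys = [y for _, y in labeled]
    pool = list(unlabeled)
    queries = 0
    for _ in range(rounds):
        model = fit_one_class(xs)
        flagged = [x for x in pool if outlier_score(model, x) > 0]
        if not flagged:
            break  # the training set already covers the remaining pool
        flagged.sort(key=lambda x: outlier_score(model, x), reverse=True)
        for x in flagged[:budget_per_round]:
            xs.append(x)
            ys.append(oracle(x))  # manual (oracle) label, spent only on outliers
            pool.remove(x)
            queries += 1
    w = train_linear_svm(xs, ys)
    return [predict(w, x) for x in unlabeled], queries


def demo():
    """Hypothetical toy data: two 3x3 grids of points around (2, 2) and (-2, -2)."""
    def oracle(p: Point) -> int:  # hypothetical ground-truth labeler
        return 1 if p[0] + p[1] > 0 else -1

    offsets = (-0.5, 0.0, 0.5)
    pos = [(2.0 + dx, 2.0 + dy) for dx in offsets for dy in offsets]
    neg = [(-x, -y) for x, y in pos]
    labeled = [((2.0, 2.0), 1), ((-2.0, -2.0), -1)]  # one seed label per class
    seen = {x for x, _ in labeled}
    unlabeled = [p for p in pos + neg if p not in seen]
    preds, queries = active_label(labeled, unlabeled, oracle)
    accuracy = sum(p == oracle(x) for p, x in zip(preds, unlabeled)) / len(unlabeled)
    return accuracy, queries, len(unlabeled)


if __name__ == "__main__":
    print(demo())
```

Running the module prints `(1.0, 2, 16)`: the oracle is asked for only two of the sixteen unlabeled points, and the SVM trained on the four labeled points labels all sixteen consistently with the oracle. On real data, a proper one-class model such as scikit-learn's `OneClassSVM` and a kernel `SVC` would be the natural replacements for the toy components used here.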

http://hdl.handle.net/2286/R.I.44214

Computer science

Active Learning

Machine Learning

One Class Classification

Identifier	oai:union.ndltd.org:asu.edu/item:44214
Date	January 2017
Contributors	Batra, Salil (Author), Femiani, John (Advisor), Amresh, Ashish (Advisor), Bansal, Ajay (Committee member), Arizona State University (Publisher)
Source Sets	Arizona State University
Language	English
Detected Language	English
Type	Masters Thesis
Format	60 pages
Rights	http://rightsstatements.org/vocab/InC/1.0/, All Rights Reserved
