
Multiple classifier combination through ensembles and data generation

This thesis introduces two new approaches, the DataBoost and DataBoost-IM algorithms, to improve the predictive performance of Boosting algorithms.
The DataBoost algorithm is designed to help Boosting algorithms avoid over-emphasizing hard examples. In the DataBoost algorithm, new synthetic examples, biased towards the hard examples, are added to the original training set when training the component classifiers. The DataBoost approach was evaluated on ten data sets, using both decision trees and neural networks as base classifiers. The experiments show promising results in terms of overall accuracy when compared to a standard benchmark Boosting algorithm.
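The abstract does not spell out the generation procedure, but the core idea of adding synthetic examples biased towards hard (highly weighted) examples can be sketched as below. The function name, the top-third weight cutoff, and the Gaussian-jitter generation are illustrative assumptions, not the thesis's exact method.

```python
import random

def generate_biased_synthetic(X, y, weights, n_new, seed=0):
    """Sketch of generating n_new synthetic examples biased towards
    'hard' examples, taken here to be the most highly weighted ones
    under the current boosting distribution (an assumption -- the
    thesis's actual procedure may differ)."""
    rng = random.Random(seed)
    # Rank training examples by boosting weight; keep the top third
    # (at least one) as the pool of hard seed examples.
    order = sorted(range(len(X)), key=lambda i: weights[i], reverse=True)
    hard = order[: max(1, len(X) // 3)]
    new_X, new_y = [], []
    for _ in range(n_new):
        i = rng.choice(hard)
        # Each synthetic point is a small Gaussian jitter of a hard seed.
        new_X.append([v + rng.gauss(0.0, 0.1) for v in X[i]])
        new_y.append(y[i])  # the synthetic example keeps the seed's label
    return new_X, new_y
```

The synthetic examples would then be appended to the original training set before the next component classifier is trained.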
The DataBoost-IM algorithm is developed to learn from two-class imbalanced data sets. In the DataBoost-IM approach, the class frequencies and the total class weights within the ensemble's training set are rebalanced by adding new synthetic data. The DataBoost-IM method was evaluated, in terms of F-measure, G-mean, and overall accuracy, on seventeen highly and moderately imbalanced data sets, using decision trees as base classifiers. (Abstract shortened by UMI.)
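The F-measure and G-mean mentioned above are the standard evaluation measures for two-class imbalanced learning; a minimal sketch of both (using their textbook definitions, with the function name chosen here for illustration) is:

```python
import math

def imbalanced_metrics(y_true, y_pred, positive=1):
    """F-measure and G-mean for a two-class problem.

    F-measure = 2*P*R / (P + R) on the positive (minority) class;
    G-mean    = sqrt(TPR * TNR), the geometric mean of the
                per-class accuracies.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # TPR (sensitivity)
    specificity = tn / (tn + fp) if tn + fp else 0.0   # TNR
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    return f_measure, g_mean
```

Unlike overall accuracy, both measures penalize a classifier that ignores the minority class, which is why they are used alongside accuracy in the evaluation.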

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/26648
Date: January 2004
Creators: Guo, Hong Yu
Publisher: University of Ottawa (Canada)
Source Sets: Université d’Ottawa
Language: English
Detected Language: English
Type: Thesis
Format: 114 p.
