
Multiple classifier combination through ensembles and data generation

This thesis introduces two new approaches, the DataBoost and DataBoost-IM algorithms, to improve the predictive performance of Boosting algorithms.
The DataBoost algorithm is designed to help Boosting algorithms avoid over-emphasizing hard examples. In the DataBoost algorithm, new synthetic examples, biased towards the hard examples, are added to the original training set when training the component classifiers. The DataBoost approach was evaluated on ten data sets, using both decision trees and neural networks as base classifiers. The experiments show promising results in terms of overall accuracy when compared to a standard benchmark Boosting algorithm.
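The abstract does not spell out the generation procedure, but the core idea of adding synthetic examples biased towards hard (highly weighted) examples can be sketched as below. The function name, the top-third weight cutoff, and the Gaussian-jitter generation are illustrative assumptions, not the thesis's exact method.

```python
import random

def generate_biased_synthetic(X, y, weights, n_new, seed=0):
    """Sketch of generating n_new synthetic examples biased towards
    'hard' examples, taken here to be the most highly weighted ones
    under the current boosting distribution (an assumption -- the
    thesis's actual procedure may differ)."""
    rng = random.Random(seed)
    # Rank training examples by boosting weight; keep the top third
    # (at least one) as the pool of hard seed examples.
    order = sorted(range(len(X)), key=lambda i: weights[i], reverse=True)
    hard = order[: max(1, len(X) // 3)]
    new_X, new_y = [], []
    for _ in range(n_new):
        i = rng.choice(hard)
        # Each synthetic point is a small Gaussian jitter of a hard seed.
        new_X.append([v + rng.gauss(0.0, 0.1) for v in X[i]])
        new_y.append(y[i])  # the synthetic example keeps the seed's label
    return new_X, new_y
```

The synthetic examples would then be appended to the original training set before the next component classifier is trained.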
The DataBoost-IM algorithm is developed to learn from two-class imbalanced data sets. In the DataBoost-IM approach, the class frequencies and the total class weights within the ensemble's training set are rebalanced by adding new synthetic data. The DataBoost-IM method was evaluated, in terms of F-measure, G-mean, and overall accuracy, on seventeen highly and moderately imbalanced data sets, using decision trees as base classifiers. (Abstract shortened by UMI.)
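The F-measure and G-mean mentioned above are the standard evaluation measures for two-class imbalanced learning; a minimal sketch of both (using their textbook definitions, with the function name chosen here for illustration) is:

```python
import math

def imbalanced_metrics(y_true, y_pred, positive=1):
    """F-measure and G-mean for a two-class problem.

    F-measure = 2*P*R / (P + R) on the positive (minority) class;
    G-mean    = sqrt(TPR * TNR), the geometric mean of the
                per-class accuracies.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # TPR (sensitivity)
    specificity = tn / (tn + fp) if tn + fp else 0.0   # TNR
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    return f_measure, g_mean
```

Unlike overall accuracy, both measures penalize a classifier that ignores the minority class, which is why they are used alongside accuracy in the evaluation.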

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/26648
Date: January 2004
Creators: Guo, Hong Yu
Publisher: University of Ottawa (Canada)
Source Sets: Université d’Ottawa
Language: English
Detected Language: English
Type: Thesis
Format: 114 p.
