
Ensemble Learning Algorithms for the Analysis of Bioinformatics Data

Developments in advanced technologies, such as DNA microarrays, have made
tremendous amounts of data available to researchers in the field of bioinformatics.
These state-of-the-art technologies present not only unprecedented opportunities to
study biological phenomena of interest, but also significant challenges in processing
the data. Furthermore, these datasets inherently exhibit a number of challenging
characteristics, such as class imbalance, high dimensionality, small dataset size, noisy
data, and data complexity in the form of hard-to-distinguish decision boundaries
between classes.
To address these challenges, this dissertation applies a variety of machine-learning
and data-mining techniques, such as ensemble classification algorithms combined with
data sampling and feature selection, to alleviate these problems while improving the
classification performance of models built on these datasets. However, researchers and
practitioners who build classification models face the fact that no single classifier
performs well in all cases. Numerous classification approaches, including ensemble
learning methods, have therefore been developed, and they address this problem
successfully in most circumstances. Ensemble learning is a promising technique that
generates multiple classification models and then combines their decisions into a
single final result. Ensembles often outperform single base classifiers on
classification tasks.
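As a minimal sketch of the idea that an ensemble builds several models and merges their decisions, the Python code below uses bagging (one of the techniques the dissertation studies) with decision stumps as base learners and a majority vote as the combination rule. The stump learner, the function names `train_stump`, `bagging_fit`, and `ensemble_predict`, and the toy data are hypothetical illustrations chosen for brevity, not the dissertation's implementation.

```python
import random
from collections import Counter


def train_stump(X, y):
    """Fit a decision stump (one-level tree) by exhaustive search over
    features and thresholds, minimizing the number of training errors.
    Hypothetical base learner used only for illustration."""
    overall = Counter(y).most_common(1)[0][0]  # fallback label for an empty side
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0] if left else overall
            right_label = Counter(right).most_common(1)[0][0] if right else overall
            errors = (sum(l != left_label for l in left)
                      + sum(l != right_label for l in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def bagging_fit(X, y, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def ensemble_predict(models, row):
    """Combine the individual decisions into one final label by majority vote."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy one-feature dataset with two well-separated classes (hypothetical).
    X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
    y = [0, 0, 0, 1, 1, 1]
    models = bagging_fit(X, y, n_models=25, seed=0)
    print(ensemble_predict(models, [0.05]), ensemble_predict(models, [0.95]))
```

Because each stump sees a different bootstrap sample, individual stumps can disagree, and the vote smooths out their individual errors. Random forests extend this scheme by also sampling candidate features at each split, while boosting instead trains models sequentially and increases the weight of instances that earlier models misclassified.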
This dissertation conducts thorough empirical research through a series of case
studies that evaluate how ensemble learning techniques can be used to enhance overall
classification performance and to improve the generalization ability of ensemble
models. It investigates the boosting, bagging, and random forest ensemble techniques,
and proposes a number of modifications to these existing techniques to further improve
classification results. The dissertation examines how effectively ensemble learning
techniques account for the challenging characteristics of class imbalance and
difficult-to-learn class decision boundaries. Next, it looks into ensemble methods
that are relatively tolerant to class noise, which not only account for class noise
but also improve classification performance. Finally, it examines the joint effects of
data sampling and ensemble techniques, asking whether sampling can further improve the
classification performance of the resulting ensemble models.
Includes bibliography. Dissertation (Ph.D.)--Florida Atlantic University, 2016. FAU Electronic Theses and Dissertations Collection

Identifier: oai:union.ndltd.org:fau.edu/oai:fau.digital.flvc.org:fau_33446
Contributors: Fazelpour, Alireza (author), Khoshgoftaar, Taghi M. (Thesis advisor), Florida Atlantic University (Degree grantor), College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science
Publisher: Florida Atlantic University
Source Sets: Florida Atlantic University
Language: English
Type: Electronic Thesis or Dissertation, Text
Format: 177 p., application/pdf
Rights: Copyright © is held by the author, with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder., http://rightsstatements.org/vocab/InC/1.0/
