New ant colony optimisation algorithms for hierarchical classification of protein functions

Ant colony optimisation (ACO) is a metaheuristic for solving optimisation problems, inspired by the foraging behaviour of ant colonies. It has been successfully applied to several types of optimisation problems, such as scheduling and routing, and more recently to the discovery of classification rules. The classification task in data mining aims at predicting the value of a given goal attribute for an example, based on the values of a set of predictor attributes for that example. Since real-world classification problems are generally described by both nominal (categorical or discrete) and continuous (real-valued) attributes, classification algorithms must be able to cope with both types of attribute. However, current ACO classification algorithms can only discover rules using the nominal attributes describing the data. Furthermore, they cannot cope with more complex types of classification problems, e.g., hierarchical multi-label classification problems. This thesis investigates the extension of ACO classification algorithms to overcome these limitations. Firstly, a method is proposed to extend the rule construction process of ACO classification algorithms so that it copes with continuous attributes directly. Four new ACO classification algorithms are presented, together with a comparison between them and well-known classification algorithms from the literature. Secondly, an ACO classification algorithm is presented for the hierarchical problem of protein function prediction, a major type of bioinformatics problem and the one addressed in this thesis. Finally, three different approaches to extending ACO classification algorithms to the more complex case of hierarchical multi-label classification are described, elaborating on the ideas of the proposed hierarchical classification ACO algorithm.
These algorithms are compared against state-of-the-art decision tree induction algorithms for hierarchical multi-label classification in the context of protein function prediction. The computational results of experiments with a wide range of data sets, including challenging protein function prediction data sets with a very large number.
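The first contribution described above, extending rule construction so that ACO classification algorithms handle continuous attributes directly, can be illustrated with a minimal sketch. It assumes a simplified Ant-Miner-style ant that picks rule terms with probability proportional to pheromone × heuristic and, when a continuous attribute is involved, creates a threshold term on the fly by choosing the cut point that minimises the class entropy of the resulting split. The function names (`entropy`, `best_threshold`, `term_probabilities`, `choose_term`) are hypothetical, and the entropy-based cut point is one common choice rather than a confirmed description of the procedure used in the thesis.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Hypothetical helper: pick the cut point t for a continuous attribute
    that minimises the weighted class entropy of {v < t} and {v >= t}.

    Candidate cut points are midpoints between consecutive distinct values.
    Returns None when all values are equal (no split is possible)."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best_t, best_h = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # not a boundary between distinct values
        left = [c for _, c in pairs[:i]]   # examples with value < t
        right = [c for _, c in pairs[i:]]  # examples with value >= t
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if h < best_h:
            best_t, best_h = (lo + hi) / 2, h
    return best_t


def term_probabilities(pheromone, heuristic):
    """Simplified Ant-Miner-style rule (assumption):
    P(term) = tau(term) * eta(term) / sum over candidates of tau * eta."""
    weights = {t: pheromone[t] * heuristic[t] for t in pheromone}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def choose_term(probabilities, rng):
    """Roulette-wheel selection of one candidate term."""
    terms = list(probabilities)
    return rng.choices(terms, weights=[probabilities[t] for t in terms], k=1)[0]


if __name__ == "__main__":
    # Toy continuous attribute: low values are class "a", high values class "b".
    values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(best_threshold(values, labels))  # 6.5

    probs = term_probabilities({"x = 1": 1.0, "y = 2": 3.0},
                               {"x = 1": 0.5, "y = 2": 0.5})
    print(probs)  # {'x = 1': 0.25, 'y = 2': 0.75}
    print(choose_term(probs, random.Random(42)))
```

In a full algorithm the threshold would more plausibly be computed only over the training examples covered by the partially built rule, and pheromone would be updated after each rule is constructed; both steps are left out of this sketch for brevity.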

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:516208
Date: January 2010
Creators: Otero, Fernando E. B.
Publisher: University of Kent
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: http://www.cs.kent.ac.uk/pubs/2010/3057
