A High Growth-Rate Emerging Pattern for Data Classification in Microarray Datasets

Data classification is one of the important techniques in data mining. It has been
applied widely in many applications, e.g., disease diagnosis. Recently, data
classification has been used for microarray datasets, where a microarray is a
useful tool for studying gene expression levels in bioinformatics. For the data
classification problem on microarray datasets, we consider two biological datasets
that reflect two extremely different classes for the same given sets of tests.
Basically, the classification process contains two phases: (1) the training phase, and
(2) the testing phase. The purpose of the training phase is to find the representative
Emerging Patterns (EPs) in each of these two datasets, where an EP is an itemset
that satisfies certain conditions on the growth rate from one dataset to the other.
Note that the growth rate represents the difference between these two datasets. After
the training phase, we take the collection of EPs in each dataset as a classifier. In
the testing phase, a test sample is assigned to one of the two datasets based on
the result of a similarity function, which takes the growth rate and the support into
consideration. The evaluation criterion for a classifier is its accuracy. Obviously, the
higher the accuracy of a classifier, the better its performance. Therefore, several
EP-based classifiers, e.g., the EJEP and the NEP strategies, have been proposed to
achieve this goal. The EJEP strategy considers only those itemsets whose growth
rates are infinite, on the grounds that high growth rates may lead to high
accuracy. However, the EJEP strategy does not keep useful EPs whose growth
rates are very high but not infinite. Moreover, real-world data always
contains noise. The NEP strategy takes noise into account and provides higher
accuracy than the EJEP strategy. However, it may still miss some itemsets with high
growth rates, which may result in low accuracy. Therefore, in this thesis, we propose
a High Growth-rate EP (HGEP) strategy to overcome the disadvantages of the NEP
and the EJEP strategies. In addition to considering itemsets whose growth rates
are infinite, as in the EJEP strategy, and noise patterns, as in the NEP strategy,
our HGEP strategy considers those itemsets whose growth rate is finite and higher
than that of all their proper subsets. In this way, itemsets with high growth
rates can yield high similarity, and high similarity assigns the sets of tests
to the correct class. Therefore, our HGEP strategy can provide high accuracy. In our
performance study, we use several real datasets to evaluate the average accuracy
of these strategies. Moreover, we also conduct a simulation study with increasing
noise. The experimental results show that the average accuracy of our HGEP strategy
is higher than that of the NEP strategy.
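
The growth-rate idea described above can be illustrated with a small sketch. The abstract does not spell out the formulas, so the code below assumes the definition commonly used in the emerging-pattern literature: the growth rate of an itemset from one dataset to another is the ratio of its supports, taken as infinite when the itemset occurs only in the target dataset and as zero when it occurs in neither. The function names, the `min_growth` threshold, and the toy data are all hypothetical, and the last function is only one possible reading of "growth rate higher than all its proper subsets", not the mining algorithm of the thesis.

```python
from itertools import combinations
from math import inf


def support(itemset, dataset):
    """Fraction of samples in `dataset` that contain every item of `itemset`."""
    if not dataset:
        return 0.0
    items = frozenset(itemset)
    return sum(1 for sample in dataset if items <= sample) / len(dataset)


def growth_rate(itemset, d_from, d_to):
    """Growth rate of `itemset` from `d_from` to `d_to` (assumed standard definition)."""
    s_from = support(itemset, d_from)
    s_to = support(itemset, d_to)
    if s_from == 0:
        # Absent from the source dataset: infinite if present in the target, else zero.
        return inf if s_to > 0 else 0.0
    return s_to / s_from


def proper_subsets(itemset):
    """Yield every non-empty proper subset of `itemset`."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for combo in combinations(items, size):
            yield frozenset(combo)


def exceeds_all_proper_subsets(itemset, d_from, d_to, min_growth=1.0):
    """True if the growth rate is finite, above `min_growth` (a hypothetical
    threshold), and strictly higher than that of every non-empty proper subset."""
    gr = growth_rate(itemset, d_from, d_to)
    if gr == inf or gr <= min_growth:
        return False
    return all(gr > growth_rate(sub, d_from, d_to) for sub in proper_subsets(itemset))


# Toy data (made up for illustration): each sample is a set of discretized items.
D_FROM = [frozenset("ab"), frozenset("a"), frozenset("bc"), frozenset("ac")]
D_TO = [frozenset("ab"), frozenset("abc"), frozenset("ab"), frozenset("c")]

print(growth_rate({"a", "b"}, D_FROM, D_TO))       # 0.75 / 0.25 = 3.0
print(growth_rate({"a", "b", "c"}, D_FROM, D_TO))  # inf: absent from D_FROM
print(exceeds_all_proper_subsets({"a", "b"}, D_FROM, D_TO))  # True: 3.0 > 1.0 and 1.5
```

In this toy example, {a, b} has a finite growth rate of 3.0, which is higher than that of {a} (1.0) and {b} (1.5), so it is the kind of pattern a strategy restricted to infinite growth rates would discard. A single-item itemset has no non-empty proper subsets, so in this sketch it passes whenever its growth rate is finite and above the threshold.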

Identifier: oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0713107-180703
Date: 13 July 2007
Creators: Yang, Tsung-Bin
Contributors: Ren-Hung Hwang, Tei-Wei Kuo, Gen-huey Chen, Ye-In Chang, Chien-I Lee
Publisher: NSYSU
Source Sets: NSYSU Electronic Thesis and Dissertation Archive
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0713107-180703
Rights: not_available, Copyright information available at source archive
