
High Specificity Literature Mining Method Based on Microarray Expression Profile for Discovering Hidden Connections among Diseases, Genes, and Drugs

In recent years, with the microarray technique widely adopted, a large body of biomedical literature has been published, providing a great deal of useful information. However, some relationships among diseases, genes, and drugs remain unexplored. Authors typically focus on only part of the picture, either the genes significant to a disease or the genes significant to a drug, and do not connect the two to obtain new relationships. Several methods have been proposed for finding such hidden relationships, but many of them require manual involvement. The main objective of this dissertation is to discover the hidden connections between human diseases and genes, and the connections between drugs and the same genes. To achieve this goal, the intermediate nodes (significant genes) must be found first. When a gene's expression in the observed group (abnormal patients) differs significantly from its expression in the control group (normal persons), the gene is called a significant gene for the disease. These significant genes often play a crucial role in cancer diagnosis and treatment. By classifying microarray gene expression data to find these significant genes, doctors can obtain feasible and appropriate information for treating patients according to their cancer symptoms. A variety of classifiers have been proposed for this problem. However, most of them work inefficiently when the number of attributes grows into the thousands. To further improve the accuracy and speed of existing classifiers, a novel microarray attribute reduction scheme (MARS) is proposed for selecting the genes significant to the disease.
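The abstract states that a gene is significant when its expression differs between the observed and control groups, but it does not give the scoring rule MARS uses. As a minimal sketch only, assuming Golub's signal-to-noise score, (mean_observed - mean_control) / (sd_observed + sd_control), as a stand-in for that rule, genes can be ranked by how strongly they separate the two groups. The function names and toy data below are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(observed: list[float], control: list[float]) -> float:
    """Golub-style signal-to-noise score for one gene (a stand-in, not MARS).

    A large absolute value means the mean expression differs strongly between
    the observed (patient) group and the control group relative to the spread
    within each group.
    """
    spread = stdev(observed) + stdev(control)
    if spread == 0:
        return 0.0  # hypothetical convention: a constant gene is uninformative
    return (mean(observed) - mean(control)) / spread


def rank_genes(expr: dict[str, tuple[list[float], list[float]]], top_k: int) -> list[str]:
    """Return the top_k gene names ordered by |signal-to-noise|, largest first."""
    scores = {g: abs(signal_to_noise(obs, ctl)) for g, (obs, ctl) in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy expression values (hypothetical data, not from the dissertation).
    toy = {
        "GENE_A": ([9.1, 8.7, 9.4], [2.0, 2.3, 1.8]),  # strongly up in patients
        "GENE_B": ([5.0, 5.2, 4.9], [5.1, 4.8, 5.0]),  # essentially unchanged
        "GENE_C": ([1.2, 1.0, 1.5], [4.4, 4.0, 4.6]),  # strongly down in patients
    }
    print(rank_genes(toy, top_k=2))  # ['GENE_A', 'GENE_C']
```

Because MARS instead selects genes with an ROC-derived threshold, this ranking only illustrates the idea of scoring genes by group separation; it is not the dissertation's method.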
Experimental results demonstrate that combining the proposed scheme with a multiclass support vector machine (MCSVM) yields better performance than other gene selection methods paired with the same MCSVM. In addition, the proposed scheme with MCSVM outperforms the results reported in the existing literature. Furthermore, 19 of the 22 genes selected by the proposed scheme on the acute lymphoblastic leukemia and acute myeloid leukemia (AML-ALL) dataset have been reported in the literature as related to AML and ALL. Thus, the proposed scheme not only significantly reduces the large number of attributes (genes) in the gene expression classification problem, but also increases classification accuracy.
MARS finds the related gene set according to a threshold determined using a receiver operating characteristic (ROC) curve. However, determining the best threshold requires repeating the experiment many times. Hence, we propose a novel disease-oriented feature selection algorithm (DOFA) to improve MARS. DOFA uses a genetic algorithm (GA) as the selection method to pick the related genes automatically, with a support vector machine (SVM) and k-nearest neighbors (KNN) as the classifiers. DOFA is tested on picking related genes for the AML-ALL and Colon datasets, for which it selects 21 and 25 genes, respectively. According to the literature, 20 of the 21 genes selected for the AML-ALL dataset are related to the disease or to related cancers, and the remaining gene is still uncertain. For the Colon dataset, 20 of the 25 genes are directly related to colon cancer or to related cancers, and the remaining 5 are still uncertain. Three further experiments are conducted to verify the discriminability of the genes selected by DOFA, and all of them indicate that DOFA performs better than the competing methods. Thus, DOFA not only selects genes related to the diseases, but also increases classification accuracy.
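The abstract does not specify DOFA's chromosome encoding, genetic operators, or fitness function. As a minimal sketch only, assuming a binary mask over genes, tournament selection, one-point crossover, bit-flip mutation, elitism, and a hypothetical fitness equal to leave-one-out 1-NN accuracy minus a small per-gene penalty, a GA wrapper for gene selection could look like this. All function names and parameter values are illustrative, and the SVM classifier is omitted.

```python
import random


def knn_loo_accuracy(X, y, mask, k=1):
    """Leave-one-out accuracy of k-NN using only the columns where mask[j] == 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty gene set cannot classify anything
    correct = 0
    for i in range(len(X)):
        # (squared Euclidean distance, label) for every other sample, nearest first.
        dists = sorted(
            (sum((X[i][j] - X[t][j]) ** 2 for j in cols), y[t])
            for t in range(len(X)) if t != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)  # majority vote
        correct += predicted == y[i]
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    # Hypothetical fitness: accuracy minus a small penalty per selected gene,
    # so smaller gene sets win when accuracy is tied.
    return knn_loo_accuracy(X, y, mask) - penalty * sum(mask)


def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Return the best binary gene mask found by a simple generational GA."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def fit(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(X, y, mask)
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if fit(a) >= fit(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fit)[:]
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        candidate = max(pop, key=fit)
        if fit(candidate) > fit(best):
            best = candidate[:]
    return best
```

In a real experiment the fitness should be computed on training data only (for example with an inner cross-validation loop) so that gene selection does not leak information from the test samples; an SVM could replace k-NN inside `fitness` without changing the GA itself.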
After obtaining the significant gene group, we can use these genes to uncover hidden connections. We propose a high specificity literature mining method based on microarray expression profiles for discovering hidden connections among diseases, drugs, and genes. The proposed method automatically selects related genes from disease or drug microarray expression profiles, and uses the disease or drug names, together with the names or aliases of the selected genes, to retrieve the related collections of abstracts. An alias expansion scheme and a weight function are used to eliminate unrelated literature. Three scenarios are performed to verify the proposed method. Experimental results show that the proposed method can obtain the hidden connections among diseases, genes, and drugs, and the ROC curve shows that it not only finds the hidden connections between diseases and drugs but also has high specificity.
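The abstract does not define the alias expansion scheme or the weight function. A minimal sketch, assuming a hypothetical co-mention weight (the smaller of the number of disease or drug mentions and the number of gene-name-or-alias mentions in an abstract) and a rule that keeps an abstract when its weight is at least a threshold, shows how sweeping that threshold produces the sensitivity and specificity values behind an ROC curve. The function names, gene aliases, and relevance labels are illustrative only.

```python
import re


def mentions(text, terms):
    """Count case-insensitive whole-word occurrences of any term in text.

    Terms are assumed to start and end with a letter or digit so that the
    word-boundary anchors behave as expected.
    """
    return sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE))
        for term in terms
    )


def abstract_weight(text, entity_terms, gene_aliases):
    # Hypothetical weight: 0 unless the disease/drug and the gene (under any
    # alias) are both mentioned; otherwise the smaller mention count.
    return min(mentions(text, entity_terms), mentions(text, gene_aliases))


def roc_points(weights, labels, thresholds):
    """Return (threshold, sensitivity, specificity) for each threshold.

    An abstract is predicted "related" when its weight >= threshold;
    labels are 1 for truly related abstracts and 0 otherwise.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both related and unrelated abstracts")
    points = []
    for th in thresholds:
        tp = sum(1 for w, lab in zip(weights, labels) if w >= th and lab == 1)
        tn = sum(1 for w, lab in zip(weights, labels) if w < th and lab == 0)
        points.append((th, tp / pos, tn / neg))  # specificity = TN / (TN + FP)
    return points


if __name__ == "__main__":
    # Toy abstracts; "GX1" and "geneX" are hypothetical aliases of one gene.
    abstracts = [
        "GX1 expression is elevated in leukemia blasts; geneX knockdown slows leukemia growth.",
        "We review leukemia treatment outcomes.",
        "GX1 appears in a yeast screen.",
        "geneX and leukemia were both mentioned once.",
    ]
    labels = [1, 0, 0, 1]
    weights = [abstract_weight(a, ["leukemia"], ["GX1", "geneX"]) for a in abstracts]
    print(weights)  # [2, 0, 0, 1]
    print(roc_points(weights, labels, [0, 1, 2]))  # [(0, 1.0, 0.0), (1, 1.0, 1.0), (2, 0.5, 1.0)]
```

Matching on every alias is what lets the second gene name ("geneX") count toward the weight; without alias expansion the fourth abstract would score 0 and be discarded even though it is labeled related.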
In conclusion, the goal of this dissertation is to discover the hidden connections between diseases and drugs. To achieve this goal, we first proposed MARS to select the genes significant to the diseases, and then proposed DOFA to improve on MARS. Finally, we proposed a high specificity literature mining method based on microarray expression profiles for discovering the hidden connections among diseases, genes, and drugs. This method builds on DOFA's ability to find the genes significant to a disease in order to obtain the hidden connections. Experimental results show that the proposed method not only obtains the hidden connections among diseases, genes, and drugs, but also has high specificity.

Identifier: oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0905111-174611
Date: 05 September 2011
Creators: Wu, Jain-Shing
Contributors: Chang-Biau Yang, Hsueh-Wei Chang, E-Fong Kao, Shin-Mu Tseng, Chuan-Wen Chiang, Chung-Nan Lee, Tzung-Pei Hong
Publisher: NSYSU
Source Sets: NSYSU Electronic Thesis and Dissertation Archive
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0905111-174611
Rights: user_define, Copyright information available at source archive
