Clustering Genes by Using Different Types of Genomic Data and Self-Organizing Maps

The aim of the project was to identify novel, biologically relevant gene clusters by using combined genomic data rather than gene expression data alone. The clustering algorithm based on self-organizing maps (Kasturi et al., 2005) was extended and implemented so that gene location data could be used together with gene expression and motif occurrence data for gene clustering. A distance function was defined for gene location data, and the algorithm was also extended to support vector angle distance for gene expression data. Arabidopsis thaliana was chosen as the data source for evaluating the developed algorithm. A test data set was created from 100 Arabidopsis genes, each with gene expression data at seven time points under cold stress, motif occurrence data giving the occurrence frequency of 614 different motifs, and chromosomal location data. The Gene Ontology (http://www.geneontology.org) and TAIR (http://arabidopsis.org) databases were used to look up the molecular function and biological process of each gene in order to assess the biological accuracy of the clusters discovered with combined genomic data. The biological evaluation showed that clustering genes with combined genomic data produced new biologically relevant clusters.
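
As an illustration of the distance measures mentioned in the abstract, the following is a minimal Python sketch, not the thesis implementation. The vector angle distance is the standard angle between two expression profiles. The abstract does not say how the location distance or the combination of data types was defined, so `location_distance`, its `max_gap` parameter, the `(chromosome, position)` representation, `combined_distance`, and its `weight` parameter are all hypothetical choices made only for this example.

```python
import math


def vector_angle_distance(a, b):
    """Angle in radians between two expression profiles (0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def location_distance(loc_a, loc_b, max_gap=1_000_000):
    """Hypothetical location distance in [0, 1].

    Assumption: each location is a (chromosome, position) pair. Genes on
    different chromosomes get the maximum distance 1.0; genes on the same
    chromosome get their base-pair gap scaled by max_gap and capped at 1.0.
    """
    chrom_a, pos_a = loc_a
    chrom_b, pos_b = loc_b
    if chrom_a != chrom_b:
        return 1.0
    return min(abs(pos_a - pos_b) / max_gap, 1.0)


def combined_distance(expr_a, loc_a, expr_b, loc_b, weight=0.5):
    """Hypothetical weighted mix of expression and location distances.

    The angle is divided by pi so both terms lie in [0, 1].
    """
    expr_term = vector_angle_distance(expr_a, expr_b) / math.pi
    loc_term = location_distance(loc_a, loc_b)
    return weight * expr_term + (1.0 - weight) * loc_term
```

One property worth noting about the angle-based measure: two profiles that rise and fall together but differ in overall magnitude have a distance near zero, whereas a Euclidean distance would separate them.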

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:his-2265
Date: January 2008
Creators: Özdogan, Alper
Publisher: Högskolan i Skövde, Institutionen för vård och natur
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
