Global ETD Search

Clustering Genes by Using Different Types of Genomic Data and Self-Organizing Maps

The aim of the project was to identify novel, biologically relevant gene clusters by using combined genomic data rather than gene expression data in isolation. The clustering algorithm based on self-organizing maps (Kasturi et al., 2005) was extended and implemented so that gene location data could be used together with gene expression and motif occurrence data for gene clustering. A distance function was defined for gene location data. The algorithm was also extended to use vector angle distance for gene expression data. Arabidopsis thaliana was chosen as the data source for evaluating the developed algorithm. A test data set was created from 100 Arabidopsis genes, each with gene expression data at seven time points under cold stress, motif occurrence data giving the occurrence frequency of 614 different motifs, and chromosomal location data. The Gene Ontology (http://www.geneontology.org) and TAIR (http://arabidopsis.org) databases were used to find the molecular function and biological process of each gene, in order to examine the biological accuracy of the clusters newly discovered with combined genomic data. The biological evaluation of the results showed that clustering genes with combined genomic data produced new biologically relevant clusters.
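As an illustration of the vector angle distance mentioned in the abstract, the following is a minimal sketch that treats the distance between two expression profiles as the angle between them. The function name and the sample profiles are hypothetical, and the thesis may define or scale the measure differently (for example, as one minus the cosine).

```python
import math


def vector_angle_distance(a, b):
    """Return the angle in radians between two expression profiles.

    Assumption: the distance is the plain angle between the vectors;
    this is an illustrative definition, not taken from the thesis.
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of time points")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("angle is undefined for an all-zero profile")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


# Hypothetical profiles with seven time points each.
gene_a = [0.1, 0.4, 0.9, 1.3, 1.1, 0.7, 0.2]
gene_b = [0.2, 0.8, 1.8, 2.6, 2.2, 1.4, 0.4]  # same shape, twice the magnitude
print(vector_angle_distance(gene_a, gene_b))
```

Because the angle ignores vector length, two genes whose expression rises and falls in the same pattern are treated as close even when their absolute expression levels differ, which is the usual motivation for preferring it over Euclidean distance on expression data.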

http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-2265

Identifier	oai:union.ndltd.org:UPSALLA/oai:DiVA.org:his-2265
Date	January 2008
Creators	Özdogan, Alper
Publisher	University of Skövde, School of Life Sciences
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, text
