Global ETD Search

Return to search

Proximity based association rules for spatial data mining in genomes

Our knowledge discovery algorithm employs a combination of association rule mining and graph mining to identify frequent spatial proximity relationships in genomic data where the data is viewed as a one-dimensional space. We apply mining techniques and metrics from association rule mining to identify frequently co-occurring features in genomes followed by graph mining to extract sets of co-occurring features. Using a case study of ab initio repeat finding, we have shown that our algorithm, ProxMiner, can be successfully applied to identify weakly conserved patterns among features in genomic data. The application of pairwise spatial relationships increases the sensitivity of our algorithm while the use of a confidence threshold based on false discovery rate reduces the noise in our results. Unlike available defragmentation algorithms, ProxMiner discovers associations among ab initio repeat families to identify larger more complete repeat families. ProxMiner will increase the effectiveness of repeat discovery techniques for newly sequenced genomes where ab initio repeat finders are only able to identify partial repeat families. In this dissertation, we provide two detailed examples of ProxMiner-discovered novel repeat families and one example of a known rice repeat family that has been extended by ProxMiner. These examples encompass some of the different types of repeat families that can be discovered by our algorithm. We have also discovered many other potentially interesting novel repeat families that can be further studied by biologists.

association rule mining

association rule mining

Identifer	oai:union.ndltd.org:MSSTATE/oai:scholarsjunction.msstate.edu:td-4675
Date	08 August 2009
Creators	Saha, Surya
Publisher	Scholars Junction
Source Sets	Mississippi State University
Detected Language	English
Type	text
Format	application/pdf
Source	Theses and Dissertations

Page generated in 0.0155 seconds

Proximity based association rules for spatial data mining in genomes

Description

Links & Downloads

Tags

Additional Fields