
Parallelisation of EST clustering

Master of Science - Science

The field of bioinformatics has been developing steadily, with computational problems related to biology taking on increased importance as further advances are sought. The large data sets involved in problems within computational biology have dictated a search for good, fast approximations to computationally complex problems. This research aims to improve a method used to discover and understand genes, which are small subsequences of DNA. A difficulty arises because genes contain parts we know to be functional and other parts we assume are non-functional because their functions have not been determined. Isolating the functional parts requires the use of natural biological processes which perform this separation. However, these processes cannot read long sequences, forcing biologists to break a long sequence into a large number of small sequences and then read these. This creates the computational difficulty of categorizing the short fragments according to gene membership.

Expressed Sequence Tag Clustering is a technique used to facilitate the identification of expressed genes by grouping together similar fragments with the assumption that they belong to the same gene.
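
The grouping idea can be illustrated with a minimal sketch. It is not the method used in the thesis: the similarity test (a count of shared k-mers), the threshold, and the names `kmers` and `cluster_ests` are all assumptions chosen for illustration. Fragments that pass the test are merged transitively (single linkage) using a union-find structure.

```python
from itertools import combinations


def kmers(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(seqs, k=4, min_shared=3):
    """Group sequences that share at least min_shared k-mers.

    Hypothetical similarity rule, for illustration only. Returns a
    sorted list of clusters, each a list of indices into seqs.
    """
    parent = list(range(len(seqs)))

    def find(x):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = [kmers(s, k) for s in seqs]
    # Compare every pair of fragments; merge clusters on a match.
    for i, j in combinations(range(len(seqs)), 2):
        if len(profiles[i] & profiles[j]) >= min_shared:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy fragments: the first two overlap heavily, the third does not.
fragments = ["ACGTACGTTAGC", "GTACGTTAGCCA", "TTTTGGGGCCCC"]
print(cluster_ests(fragments))
```

The all-pairs comparison loop is the expensive part: its cost grows quadratically with the number of fragments, which is what makes large EST sets a candidate for parallelisation.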

The aim of this research was to investigate the usefulness of distributed memory parallelisation for the Expressed Sequence Tag Clustering problem. This was investigated empirically, with a distributed system tested for speed against a sequential one. It was found that distributed memory parallelisation can be very effective in this domain.
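
The abstract does not say how the work was divided among processors. As a sketch of one possible scheme, assuming the dominant cost is comparing pairs of fragments, the pairs can be dealt out round-robin so that each worker receives a near-equal share. The function name `partition_pairs` is hypothetical, and the sketch only computes the assignment; in a distributed memory system each chunk would then be sent to a separate process, for example by message passing.

```python
def partition_pairs(n_sequences, n_workers):
    """Assign every index pair (i, j) with i < j to a worker, round-robin.

    Illustrative assumption, not the scheme from the thesis.
    Returns one list of pairs per worker.
    """
    chunks = [[] for _ in range(n_workers)]
    pair_index = 0
    for i in range(n_sequences):
        for j in range(i + 1, n_sequences):
            chunks[pair_index % n_workers].append((i, j))
            pair_index += 1
    return chunks


# 5 fragments give 10 pairs, split over 3 workers.
for worker, chunk in enumerate(partition_pairs(5, 3)):
    print(worker, chunk)
```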

The results showed super-linear speedup for up to 100 processors; larger processor counts were not tested, but are likely to produce further speedups. The system was able to cluster 500000 ESTs in 641 minutes using 101 processors.
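
For readers unfamiliar with the term, speedup is conventionally the sequential running time divided by the parallel running time, and it is called super-linear when it exceeds the number of processors. The sketch below states this in code. The function names are hypothetical and the timings in the usage lines are invented for illustration; they are not measurements from the thesis.

```python
def speedup(t_sequential, t_parallel):
    """Ratio of sequential running time to parallel running time."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, processors):
    """Speedup per processor; values above 1.0 mean super-linear speedup."""
    return speedup(t_sequential, t_parallel) / processors


def is_super_linear(t_sequential, t_parallel, processors):
    """True when the speedup exceeds the processor count."""
    return speedup(t_sequential, t_parallel) > processors


# Invented timings (minutes), for illustration only.
t_seq, t_par, p = 1200.0, 10.0, 100
print(speedup(t_seq, t_par), efficiency(t_seq, t_par, p),
      is_super_linear(t_seq, t_par, p))
```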

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:wits/oai:wiredspace.wits.ac.za:10539/281
Date: 23 March 2006
Creators: Ranchod, Pravesh
Source Sets: South African National ETD Portal
Language: English
Detected Language: English
Type: Thesis
Format: 325670 bytes, application/pdf, application/pdf
