Apart from contributing to Computer Science, this research also contributes to Bioinformatics, a subset of the subject discipline Computational Biology. The main focus of this dissertation is the development of a data-analytical and theoretical algorithm to contribute to the analysis of DNA, and in particular, to detect microsatellites. Microsatellites, considered in the context of this dissertation, refer to consecutive patterns contained by genomic sequences. A perfect tandem repeat is defined as a string of nucleotides which is repeated at least twice in a sequence. An approximate tandem repeat is a string of nucleotides repeated consecutively at least twice, with small differences between the instances. The research presented in this dissertation was inspired by molecular biologists who were discovered to be visually scanning genetic sequences in search of short approximate tandem repeats or so called microsatellites. The aim of this dissertation is to present three algorithms that search for short approximate tandem repeats. The algorithms comprise the implementation of finite automata. Thus the hypothesis posed is as follows: Finite automata can detect microsatellites effectively in DNA. "Effectively" includes the ability to fine-tune the detection process so that redundant data is avoided, and relevant data is not missed during search. In order to verify whether the hypothesis holds, three theoretical related algorithms have been proposed based on theorems from finite automaton theory. They are generically referred to as the FireìSat algorithms. These algorithms have been implemented, and the performance of FireìSat2 has been investigated and compared to other software packages. From the results obtained, it is clear that the performance of these algorithms differ in terms of attributes such as speed, memory consumption and extensibility. In respect of speed performance, FireìSat outperformed rival software packages. It will be seen that the FireìSat algorithms have several parameters that can be used to tune their search. It should be emphasized that these parameters have been devised in consultation with the intended user community, in order to enhance the usability of the software. It was found that the parameters of FireìSat can be set to detect more tandem repeats than rival software packages, but also tuned to limit the number of detected tandem repeats. Copyright / Dissertation (MSc)--University of Pretoria, 2010. / Computer Science / unrestricted
Identifer | oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:up/oai:repository.up.ac.za:2263/27335 |
Date | 17 August 2010 |
Creators | De Ridder, Corne |
Contributors | Prof D G Kourie, driddc@unisa.ac.za |
Source Sets | South African National ETD Portal |
Detected Language | English |
Type | Dissertation |
Rights | © 2010, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. |
Page generated in 0.0029 seconds