
Comparative genomics of microsatellite abundance: a critical analysis of methods and definitions

This PhD dissertation is focused on short tandemly repeated nucleotide patterns which
occur extremely often across DNA sequences, called microsatellites. The main characteristic
of microsatellites, and probably the reason why they are so abundant across genomes, is the
extremely high frequency of specific replication errors occurring within their sequences,
which usually cause addition or deletion of one or more complete tandem repeat units. Due
to these errors, frequent fluctuations in the number of repetitive units can be observed
among cellular and organismal generations. The molecular mechanisms as well as the
consequences of these microsatellite mutations, on both a generational and an
evolutionary scale, have sparked debate and controversy among the scientific community.
Furthermore, the bioinformatic approaches used to study microsatellites and the ways
microsatellites are referred to in the general literature are often not rigorous, leading to
misinterpretations and inconsistencies among studies. As an introduction to this complex
topic, in Chapter I, I present a review of the knowledge accumulated on microsatellites
during the past two decades. A major part of this chapter has been published in the
Encyclopedia of Life Sciences in a chapter about microsatellite evolution (see Publication 1
in Appendix II).
The ongoing controversy about the rates and patterns of microsatellite mutation was
evident to me even before I started this PhD thesis. However, the subtler problems inherent
to the computational analyses of microsatellites within genomes only became apparent
when retrieving information on microsatellite distribution and abundance for the design of
comparative genomic analyses. There are numerous publications analyzing the
microsatellite content of genomes but, in most cases, the results presented can neither be
reliably compared nor reproduced, mainly due to the lack of details on the microsatellite
search process (particularly the program’s algorithm and the search parameters used) and
because the results are expressed in terms that are relative to the search process (i.e.
measures based on the absolute number of microsatellites). Therefore, in Chapter II, I
present a critical review of all available software tools designed to scan DNA sequences for
microsatellites. My aim in undertaking this review was to assess the comparability of search
results among microsatellite programs, and to identify the programs most suitable for the
generation of microsatellite datasets for a thorough and reproducible comparative analysis
of microsatellite content among genomic sequences. Using sequence data where the
number and types of microsatellites were empirically known, I compared the ability of 19
programs to accurately identify and report microsatellites. I then chose the two programs
which, based on the algorithm and its parameters as well as the informativeness of the output,
offered the information most suitable for biological interpretation, while also reflecting as
closely as possible the microsatellite content of the test files.
From the analysis of microsatellite search results generated by the various programs
available, it became apparent that the program’s search parameters, which are specified by
the user in order to define the microsatellite characteristics to the program, dramatically
influence the resulting datasets. This is especially true for programs designed to allow
imperfections within tandem repeats, because imperfect repeats cannot be defined
as accurately as perfect ones, and because several different algorithms have
been proposed to address this problem. The detection of approximate microsatellites is,
however, essential for the study of microsatellite evolution and for comparative analyses
based on microsatellites. It is now well accepted that small deviations from perfect tandem
repeat structure are common within microsatellites and larger repeats, and a number of
different algorithms have been developed to confront the challenge of finding and
registering microsatellites with all expected kinds of imperfection. However, biologists
have yet to apply these tools to their full potential. In biological analyses, single tandem
repeat hits are consistently interpreted as isolated and independent repeats. This
interpretation also depends on the search strategy used to report the microsatellites in DNA
sequences and, therefore, I was particularly interested in the capacity of repeat finding
programs to report imperfect microsatellites allowing interpretations that are useful in a
biological sense. After analyzing a series of tandem repeat finding programs, I optimized my
microsatellite searches to yield the best possible datasets for assessing and comparing the
degree of imperfection of microsatellites among different genomes (Chapter III).
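
As an illustration of how strongly a single search parameter can shape a microsatellite dataset, the following minimal Python sketch scans a sequence for perfect tandem repeats only. It is not one of the programs reviewed in this dissertation, and it does not handle imperfect repeats; the function name find_perfect_microsatellites and its parameters motif_len and min_repeats are hypothetical and chosen purely for illustration.

```python
import re


def find_perfect_microsatellites(seq, motif_len, min_repeats):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Illustrative only: motif_len and min_repeats are hypothetical
    parameter names, not those of any reviewed program.
    """
    # ([ACGT]{k}) captures one motif of motif_len bases; \1{n,} then
    # requires at least min_repeats - 1 further copies directly after it.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        # Skip motifs that are themselves repeats of a shorter unit
        # (e.g. "AA" is a mononucleotide repeat, not a dinucleotide one).
        if motif in (motif + motif)[1:-1]:
            continue
        hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


# Toy sequence: one tract of six AC repeats and one tract of three.
toy = "GG" + "AC" * 6 + "TT" + "AC" * 3 + "G"

print(find_perfect_microsatellites(toy, 2, 3))  # [(2, 'AC', 6), (16, 'AC', 3)]
print(find_perfect_microsatellites(toy, 2, 5))  # [(2, 'AC', 6)]
```

On this toy sequence, raising the hypothetical min_repeats threshold from 3 to 5 halves the number of reported microsatellites, which is why counts obtained with different parameters cannot be compared directly.
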
In the program comparisons performed in Chapter II, I show that the most critical
search parameter influencing microsatellite search results is the minimum length threshold.
Biologically speaking, there is no consensus with respect to the minimum length beyond
which a short tandem repeat is expected to become prone to microsatellite-like mutations.
Usually, a single absolute value of ~12 nucleotides is assigned irrespective of motif length.
In other cases thresholds are assigned in terms of number of repeat units (e.g. 3 to 5 repeats
or more), which are better applied individually for each motif. The variation in these
thresholds is considerable and not always justifiable. In addition, any current minimum
length measures are likely naïve because it is clear that different microsatellite motifs
undergo replication slippage at different length thresholds. Therefore, in Chapter III, I apply
two probabilistic models to predict the minimum length at which microsatellites of varying
motif types become overrepresented in different genomes based on the individual
oligonucleotide frequency data of these genomes.
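
The idea of overrepresentation can be sketched with a deliberately simple null model. The Python example below assumes that bases occur independently with their observed frequencies and computes how many tracts of a given motif and repeat number would be expected by chance; comparing this with an observed count indicates where a motif starts to be overrepresented. This independent-base model is an assumption made for illustration and is not claimed to be either of the two probabilistic models applied in Chapter III; the function names are hypothetical.

```python
from collections import Counter
from math import prod


def base_frequencies(seq):
    """Relative frequency of each base observed in seq."""
    counts = Counter(seq)
    total = sum(counts.values())
    return {base: count / total for base, count in counts.items()}


def expected_tract_count(motif, n_repeats, seq_len, freqs):
    """Expected number of positions where motif * n_repeats starts,
    assuming independent bases (an illustrative null model only)."""
    tract_len = len(motif) * n_repeats
    # Number of start positions at which a tract of this length fits.
    positions = max(seq_len - tract_len + 1, 0)
    # Probability of one motif copy, raised to the number of copies.
    p_tract = prod(freqs.get(base, 0.0) for base in motif) ** n_repeats
    return positions * p_tract


# Usage with uniform base frequencies and a hypothetical 1 Mb sequence.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
for n in range(3, 7):
    print(n, expected_tract_count("AC", n, 1_000_000, uniform))
```

Under such a model the expected count falls geometrically with each added repeat unit, so the repeat number at which observed counts clearly exceed expectation gives one candidate for a motif-specific minimum length.
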
Finally, after a range of optimizations and critical analyses, I performed a preliminary
analysis of microsatellite abundance among 24 high-quality complete eukaryotic genomes,
also including 8 prokaryotic and 5 archaeal genomes for contrast. The availability of the
methodologies and the microsatellite datasets generated in this project will allow informed
formulation of questions for more specific genome research, either about microsatellites, or
about other genomic features microsatellites could influence. These datasets are what I
would have needed at the beginning of my PhD to support my experimental design, and are
essential for the adequate interpretation of microsatellite data in the context of the
major evolutionary units: chromosomes and genomes.

Identifier oai:union.ndltd.org:canterbury.ac.nz/oai:ir.canterbury.ac.nz:10092/4282
Date January 2009
Creators Jentzsch, Iris Miriam Vargas
Publisher University of Canterbury. Biological Sciences
Source Sets University of Canterbury
Language English
Detected Language English
Type Electronic thesis or dissertation, Text
Rights Copyright Iris Miriam Vargas Jentzsch, http://library.canterbury.ac.nz/thesis/etheses_copyright.shtml
Relation NZCU
