Global ETD Search

1	Comparative genomics of microsatellite abundance: a critical analysis of methods and definitions
Jentzsch, Iris Miriam Vargas
January 2009
This PhD dissertation focuses on microsatellites: short, tandemly repeated nucleotide patterns that occur extremely often across DNA sequences. The main characteristic of microsatellites, and probably the reason why they are so abundant across genomes, is the extremely high frequency of specific replication errors occurring within their sequences, which usually cause the addition or deletion of one or more complete tandem repeat units. Due to these errors, frequent fluctuations in the number of repeat units can be observed among cellular and organismal generations. The molecular mechanisms as well as the consequences of these microsatellite mutations, both on a generational and on an evolutionary scale, have sparked debate and controversy in the scientific community. Furthermore, the bioinformatic approaches used to study microsatellites and the ways microsatellites are referred to in the general literature are often not rigorous, leading to misinterpretations and inconsistencies among studies. As an introduction to this complex topic, in Chapter I, I present a review of the knowledge accumulated on microsatellites during the past two decades. A major part of this chapter has been published in the Encyclopedia of Life Sciences in a chapter about microsatellite evolution (see Publication 1 in Appendix II). The ongoing controversy about the rates and patterns of microsatellite mutation was evident to me before I started this PhD thesis. However, the subtler problems inherent to the computational analysis of microsatellites within genomes only became apparent when I retrieved information on microsatellite distribution and abundance for the design of comparative genomic analyses.
There are numerous publications analyzing the microsatellite content of genomes but, in most cases, the results presented can neither be reliably compared nor reproduced, mainly due to the lack of details on the microsatellite search process (particularly the program's algorithm and the search parameters used) and because the results are expressed in terms that are relative to the search process (i.e. measures based on the absolute number of microsatellites). Therefore, in Chapter II, I present a critical review of all available software tools designed to scan DNA sequences for microsatellites. My aim in undertaking this review was to assess the comparability of search results among microsatellite programs, and to identify the programs most suitable for generating microsatellite datasets for a thorough and reproducible comparative analysis of microsatellite content among genomic sequences. Using sequence data in which the number and types of microsatellites were empirically known, I compared the ability of 19 programs to accurately identify and report microsatellites. I then chose the two programs which, based on the algorithm and its parameters as well as the informativeness of the output, offered the information most suitable for biological interpretation, while also reflecting as closely as possible the microsatellite content of the test files. From the analysis of microsatellite search results generated by the various programs available, it became apparent that the program's search parameters, which the user specifies in order to define the microsatellite characteristics to the program, dramatically influence the resulting datasets. This is especially true for programs that allow imperfections within tandem repeats, because imperfect repetitions cannot be defined as accurately as perfect ones, and because several different algorithms have been proposed to address this problem.
The detection of approximate microsatellites is, however, essential for the study of microsatellite evolution and for comparative analyses based on microsatellites. It is now well accepted that small deviations from perfect tandem repeat structure are common within microsatellites and larger repeats, and a number of different algorithms have been developed to confront the challenge of finding and registering microsatellites with all expectable kinds of imperfection. However, biologists have yet to apply these tools to their full potential. In biological analyses, single tandem repeat hits are consistently interpreted as isolated and independent repeats. This interpretation also depends on the search strategy used to report the microsatellites in DNA sequences and, therefore, I was particularly interested in the capacity of repeat-finding programs to report imperfect microsatellites in a way that allows biologically useful interpretations. After analyzing a series of tandem repeat finding programs, I optimized my microsatellite searches to yield the best possible datasets for assessing and comparing the degree of imperfection of microsatellites among different genomes (Chapter III). During the program comparisons performed in Chapter II, I show that the most critical search parameter influencing microsatellite search results is the minimum length threshold. Biologically speaking, there is no consensus with respect to the minimum length beyond which a short tandem repeat is expected to become prone to microsatellite-like mutations. Usually, a single absolute value of ~12 nucleotides is assigned irrespective of motif length. In other cases, thresholds are assigned in terms of the number of repeat units (e.g. 3 to 5 repeats or more), which are better applied individually for each motif. The variation in these thresholds is considerable and not always justifiable.
In addition, any current minimum length measures are likely naïve because it is clear that different microsatellite motifs undergo replication slippage at different length thresholds. Therefore, in Chapter III, I apply two probabilistic models to predict the minimum length at which microsatellites of varying motif types become overrepresented in different genomes, based on the individual oligonucleotide frequency data of these genomes. Finally, after a range of optimizations and critical analyses, I performed a preliminary analysis of microsatellite abundance among 24 high quality complete eukaryotic genomes, including also 8 prokaryotic and 5 archaeal genomes for contrast. The availability of the methodologies and the microsatellite datasets generated in this project will allow the informed formulation of questions for more specific genome research, either about microsatellites or about other genomic features that microsatellites could influence. These datasets are what I would have needed at the beginning of my PhD to support my experimental design, and are essential for the adequate interpretation of microsatellite data in the context of the major evolutionary units: chromosomes and genomes.
microsatellites microsatellite evolution bioinformatics genomics
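The dependence of search results on the minimum length threshold, as described in the abstract above, can be illustrated with a minimal Python sketch. It is not the author's software or any of the 19 programs reviewed: the function name `find_perfect_microsatellites`, the `min_repeats` mapping, and the test sequence are all hypothetical, and the scan handles only perfect repeats.

```python
import re

def find_perfect_microsatellites(seq, min_repeats, max_motif_len=6):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats.

    min_repeats maps motif length -> minimum number of repeat units.
    These thresholds are user-chosen assumptions, not values from the thesis.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif_len + 1):
        n = min_repeats[k]
        # One motif of length k followed by at least n-1 further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT" = "AT" x 2),
            # so each repeat is reported once under its shortest motif.
            if any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Hypothetical test sequence: (AT)6, (CAG)4 and (GT)3 embedded in filler bases.
seq = "TTGC" + "AT" * 6 + "GGC" + "CAG" * 4 + "TCA" + "GT" * 3 + "CC"

# Threshold in repeat units: at least 3 copies for every motif length.
by_units = find_perfect_microsatellites(seq, {k: 3 for k in range(1, 7)})

# Threshold of roughly 12 nucleotides, converted to copies per motif length.
by_length = find_perfect_microsatellites(seq, {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 2})
```

On this sequence, the 3-unit threshold reports the short (GT)3 repeat while the 12-nucleotide threshold does not, so the same input yields different microsatellite counts depending only on how the threshold is expressed.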
2	Evolution and applications of pine microsatellites
Karhu, A. (Auli)
27 February 2001
Abstract The evolution of microsatellites was studied within and between pine species. Sequences showed that microsatellites do not necessarily mutate in a stepwise fashion and that size homoplasy is common due to flanking sequence and repeat area changes within and between the species. Thus, some assumptions of statistical methods based on changes in repeat numbers may not hold. Sequences from cross-species amplifications revealed evidence of duplications of microsatellite loci in pines. On two independent occasions, the repeat area of the microsatellite had undergone a rapid expansion during the last 10-25 million years. Microsatellite markers were used together with other molecular markers (allozymes, RFLPs, RAPDs, rDNA RFLPs) and an adaptive trait (date of bud set) to study patterns of genetic variation in Scots pine (Pinus sylvestris) in Finland. All molecular markers showed a high level of within-population variation, while differentiation among populations was low (FST = 0.02). Of the total variation in bud set, 36.4 % was found among the populations, which experience a steep climatic gradient. Thus, the markers applied were poor predictors of population differentiation in the quantitative trait studied. The distribution of genetic variation was studied in five natural populations of radiata pine (Pinus radiata), a species which has gone through bottlenecks in the past. Null allele frequencies were estimated and used in later analyses. Microsatellites showed a high level of variability within populations (He = 0.68-0.77). Allele length distributions and the average number of alleles per locus showed some traces of bottlenecks. In contrast, comparison of observed and expected genetic diversities suggested post-bottleneck expansion of populations.
Genetic differentiation (FST and RST) among populations was over 10 %, reflecting the situation in the isolated radiata pine populations. Using microsatellites and a newly developed Bayesian method, individual inbreeding coefficients were estimated in five populations of radiata pine. Most individuals were outbred, while some were selfed. Presumably, in ancestral radiata pine populations the recessive deleterious alleles were eliminated after bottlenecks and the mating system changed as a consequence.
Pinus inbreeding coefficient microsatellite evolution population structure
3	An Investigation of Links Between Simple Sequences and Meiotic Recombination Hotspots
Bagshaw, Andrew Tobias Matthew
January 2008
Previous evidence has shown that simple sequences, namely microsatellites and poly-purine/poly-pyrimidine tracts (PPTs), could be both a cause and an effect of meiotic recombination. The causal link between simple sequences and recombination has not been much explored, however, probably because other evidence has cast doubt on its generality, though this evidence has never been conclusive. Several questions have remained unanswered in the literature, and I have addressed aspects of three of them in my thesis. First, what is the scale and magnitude of the association between simple sequences and recombination? I found that microsatellites and PPTs are strongly associated with meiotic double-strand break (DSB) hotspots in yeast, and that PPTs are generally more common in human recombination hotspots, particularly in close proximity to hotspot central regions, in which recombination events are markedly more frequent. I also showed that these associations cannot be explained by coincidental mutual associations between simple sequences, recombination and other factors previously shown to correlate with both. A second question not conclusively answered in the literature is whether simple sequences, or their high levels of polymorphism, are an effect of recombination. I used three methods to address this question. Firstly, I investigated the distributions of two-copy tandem repeats and short PPTs in relation to yeast DSB hotspots in order to look for evidence of an involvement of recombination in simple sequence formation. I found no significant associations. Secondly, I compared the fraction of simple sequences containing polymorphic sites between human recombination hotspots and coldspots.
The third method I used was generalized linear model analysis, with which I investigated the correlation between simple sequence variation and recombination rate, and the influence on this correlation of additional, potentially relevant factors including GC-content and gene density. Both the direct comparison and correlation methods showed a very weak and inconsistent effect of recombination on simple sequence polymorphism in the human genome. Whether simple sequences are an important cause of recombination events is a third question that has received relatively little previous attention, and I have explored one aspect of it. Simple sequences of the types I studied have previously been shown to form non-B-DNA structures, which can be recombinagenic in model systems. Using a previously described sodium bisulphite modification assay, I tested for the presence of these structures in sequences amplified from the central regions of hotspots and cloned into supercoiled plasmids. I found significantly higher sensitivity to sodium bisulphite in humans than in chimpanzees in three out of six genomic regions in which there is a hotspot in humans but none in chimpanzees. In the DNA2 hotspot, this correlated with a clear difference in the numbers of molecules showing long contiguous strings of converted cytosines, which are present in previously described intramolecular quadruplex and triplex structures. Two out of the five other hotspots tested show evidence for secondary structure comparable to a known intramolecular triplex, though with similar patterns in humans and chimpanzees. In conclusion, my results clearly motivate further investigation of a functional link between simple sequences and meiotic recombination, including the putative role of non-B-DNA structures.
bioinformatics meiotic recombination hotspots microsatellite evolution poly-purine non-B-DNA structure
5	Statistical inference in population genetics using microsatellites
Csilléry, Katalin
January 2009
Statistical inference from molecular population genetic data is currently a very active area of research for two main reasons. First, in the past two decades an enormous amount of molecular genetic data has been produced, and the amount of data is expected to grow even more in the future. Second, drawing inferences about complex population genetics problems, for example understanding the demographic and genetic factors that shaped modern populations, poses a serious statistical challenge. Amongst the many different kinds of genetic data that have appeared in the past two decades, the highly polymorphic microsatellites have played an important role. Microsatellites revolutionized the population genetics of natural populations, and were the initial tool for linkage mapping in humans and other model organisms. Despite their important role and extensive use, the evolutionary dynamics of microsatellites are still not fully understood, and the statistical methods applied to them are often underdeveloped and do not adequately model microsatellite evolution. In this thesis, I address some aspects of this problem by assessing the performance of existing statistical tools and developing some new ones. My work encompasses a range of statistical methods, from simple hypothesis testing to more recent, complex computational statistical tools. This thesis consists of four main topics. First, I review the statistical methods that have been developed for microsatellites in population genetics applications. I review the different models of the microsatellite mutation process, and ask which models are best supported by data and how models have been incorporated into statistical methods. I also present estimates of mutation parameters for several species based on published data.
Second, I evaluate the performance of estimators of genetic relatedness using real data from five vertebrate populations. I demonstrate that the overall performance of marker-based pairwise relatedness estimators mainly depends on the population relatedness composition, and may only be improved by the marker data quality within the limits of that composition. Third, I investigate the different null hypotheses that may be used to test for independence between loci. Using simulations, I show that testing for statistical independence (i.e. zero linkage disequilibrium, LD) is difficult to interpret in most cases, and that a null hypothesis which accounts for the "background LD" due to finite population size should be tested instead. I investigate the utility of a novel approximate testing procedure to circumvent this problem, and illustrate its use on a real data set from red deer. Fourth, I explore the utility of Approximate Bayesian Computation, inference based on summary statistics, to estimate demographic parameters from admixed populations. Assuming a simple demographic model, I show that the choice of summary statistics greatly influences the quality of the estimation, and that different parameters are better estimated with different summary statistics. Most importantly, I show how the estimation of most admixture parameters can be considerably improved via the use of linkage disequilibrium statistics from microsatellite data.
