1 |
Species Tree Likelihood Computation Given SNP Data Using Ancestral Configurations. Fan, Hang, January 2013.
No description available.
|
2 |
Bayesian Model Selection for High-dimensional High-throughput Data. Joshi, Adarsh, May 2010.
Bayesian methods are often criticized on the grounds of subjectivity. Furthermore, misspecified
priors can have a deleterious effect on Bayesian inference. Noting that model
selection is effectively a test of many hypotheses, Dr. Valen E. Johnson sought to eliminate
the need for prior specification by computing Bayes' factors from frequentist test statistics.
In pioneering work published in 2005, Dr. Johnson proposed
using so-called local priors for computing Bayes' factors from test statistics. Dr. Johnson
and Dr. Jianhua Hu used Bayes' factors for model selection in a linear model setting. In
independent work, Dr. Johnson and another colleague, David Rossell, investigated two
families of non-local priors for testing the regression parameter in a linear model setting.
These non-local priors enable greater separation between the null and alternative
hypotheses.
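As a rough illustration of what computing a Bayes' factor from a test statistic means, the sketch below assumes a local-prior construction in which a statistic h that is chi-squared with df degrees of freedom under the null is, under the alternative, distributed as (1 + tau) times a chi-squared variable with the same df; tau is a scale parameter. Under that assumption the ratio of the two densities of h simplifies to BF = (1 + tau)^(-df/2) * exp(tau * h / (2 * (1 + tau))). The function name `chi2_bayes_factor` is hypothetical, and this is not presented as the exact (non-local) formulation used in the dissertation.

```python
import math


def chi2_bayes_factor(h: float, df: int, tau: float) -> float:
    """Bayes' factor (alternative vs. null) computed from a chi-squared statistic.

    Illustrative assumption: under H0, h ~ chi2(df); under H1, h / (1 + tau) ~ chi2(df).
    The density ratio then reduces to
        BF = (1 + tau)^(-df/2) * exp(tau * h / (2 * (1 + tau))).
    """
    if h < 0 or df <= 0 or tau < 0:
        raise ValueError("require h >= 0, df > 0 and tau >= 0")
    # Work on the log scale so large statistics do not overflow prematurely.
    log_bf = -0.5 * df * math.log1p(tau) + tau * h / (2.0 * (1.0 + tau))
    return math.exp(log_bf)


if __name__ == "__main__":
    # A likelihood-ratio statistic of 10 on 1 degree of freedom, with tau = 4.
    print(chi2_bayes_factor(10.0, 1, 4.0))
```

With these inputs the sketch prints a value of about 24.4. Setting tau = 0 collapses the alternative onto the null and gives BF = 1, which is a quick sanity check on the formula.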
In this dissertation, I extend model selection based on Bayes' factors and use non-local
priors to define Bayes' factors based on test statistics. With these priors, I have been
able to reduce the problem of prior specification to setting just one scaling parameter.
That scaling parameter can be set easily, for example, on the basis of the frequentist operating
characteristics of the corresponding Bayes' factors. Furthermore, the loss of information from basing a Bayes' factor on a test statistic is minimal.
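Setting a scaling parameter from frequentist operating characteristics can be made concrete under the same illustrative scaled chi-squared assumption used for local priors (not necessarily the non-local construction of the dissertation). Requiring BF = gamma gives a critical value h* = 2(1 + tau)/tau * (ln gamma + (df/2) ln(1 + tau)), and the null probability of exceeding h* is the Type I error implied by a given tau. The function names below are hypothetical. The chi-squared tail is computed with a plain series for the regularized incomplete gamma function, which is adequate for moderate arguments but loses relative accuracy far in the tail.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X > x) for X ~ chi2(df), via the series for the lower regularized gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series: P(a, z) = exp(-z) * z^a / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n)).
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16 and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def critical_value(df: int, tau: float, gamma: float = 1.0) -> float:
    """Statistic h* at which the scaled-chi2 Bayes' factor equals gamma (requires tau > 0)."""
    return 2.0 * (1.0 + tau) / tau * (math.log(gamma) + 0.5 * df * math.log1p(tau))


def type_one_error(df: int, tau: float, gamma: float = 1.0) -> float:
    """Null probability that the Bayes' factor exceeds gamma for a given tau."""
    return chi2_sf(max(critical_value(df, tau, gamma), 0.0), df)


if __name__ == "__main__":
    # Larger tau pushes h* upward, so the implied Type I error shrinks.
    for tau in (1.0, 4.0, 16.0, 64.0):
        print(tau, round(type_one_error(df=1, tau=tau, gamma=1.0), 4))
```

Scanning tau in this way and picking the value whose implied error rate is acceptable is one simple way to turn a single scaling parameter into a calibrated choice.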
Along with Dr. Johnson and Dr. Hu, I used the Bayes' factors based on the likelihood
ratio statistic to develop a method for clustering gene expression data. This method has
performed well in both simulated examples and real datasets. An outline of that work is
also included in this dissertation. Further, I extend the clustering model to a subclass of
the decomposable graphical model class, which is more appropriate for genotype data sets,
such as single-nucleotide polymorphism (SNP) data. Efficient FORTRAN programming has
enabled me to apply the methodology to hundreds of nodes.
For problems that produce computationally harder probability landscapes, I propose a
modification of the Markov chain Monte Carlo algorithm to extract information regarding
the important network structures in the data. This modified algorithm performs well in
inferring complex network structures. I use this method to develop a prediction model for
disease based on SNP data. My method performs well in cross-validation studies.
|
3 |
Impact of pre-imputation SNP-filtering on genotype imputation results. Roshyara, Nab Raj; Kirsten, Holger; Horn, Katrin; Ahnert, Peter; Scholz, Markus, 10 September 2014.
Background: Imputation of partially missing or unobserved genotypes is an indispensable tool for SNP data analyses. However, research and understanding of the impact of initial SNP-data quality control on imputation results is still limited. In this paper, we aim to evaluate the effect of different strategies of pre-imputation quality filtering on the performance of the widely used imputation algorithms MaCH and IMPUTE. Results: We considered three scenarios: imputation of partially missing genotypes with an external reference panel, imputation of partially missing genotypes without an external reference panel, and imputation of completely untyped SNPs using an external reference panel. We first created various datasets by applying different SNP quality filters and masking certain percentages of randomly selected high-quality SNPs. We imputed these SNPs and compared the results between the different filtering scenarios using established and newly proposed measures of imputation quality. While the established measures assess the certainty of imputation results, our newly proposed measures focus on agreement with the true genotypes. These measures showed that pre-imputation SNP-filtering might be detrimental to imputation quality. Moreover, the strongest drivers of imputation quality were, in general, the burden of missingness and the number of SNPs used for imputation. We also found that using a reference panel always improves the imputation quality of partially missing genotypes. MaCH performed slightly better than IMPUTE2 in most of our scenarios. Again, these results were more pronounced when using our newly defined measures of imputation quality. Conclusion: Even moderate filtering has a detrimental effect on imputation quality. Therefore, little or no SNP filtering prior to imputation appears to be the best strategy for imputing small to moderately sized datasets. 
Our results also showed that for these datasets, MaCH performs slightly better than IMPUTE2 in most scenarios at the cost of increased computing time.
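The evaluation strategy described above (hide a fraction of genotypes known to be correct, impute them, then measure agreement with the true values) can be sketched as follows. Genotypes are coded as allele counts 0, 1, 2 and imputed values as dosages in [0, 2]. The two agreement measures shown, concordance of best-guess genotypes and the squared Pearson correlation between true genotypes and dosages, are common generic choices and are not claimed to be the measures newly proposed in the paper. All function names are hypothetical, and the imputation step itself (MaCH or IMPUTE) is outside the sketch.

```python
import random
from typing import List, Sequence


def mask_indices(n_genotypes: int, fraction: float, seed: int = 0) -> List[int]:
    """Pick a random subset of genotype positions to hide before imputation."""
    rng = random.Random(seed)
    k = round(n_genotypes * fraction)
    return sorted(rng.sample(range(n_genotypes), k))


def concordance(true: Sequence[int], dosage: Sequence[float], idx: Sequence[int]) -> float:
    """Share of masked positions where the rounded dosage equals the true genotype.

    Note: Python's round() breaks ties to even, so a dosage of 0.5 rounds to 0.
    """
    hits = sum(1 for i in idx if round(dosage[i]) == true[i])
    return hits / len(idx)


def dosage_r2(true: Sequence[int], dosage: Sequence[float], idx: Sequence[int]) -> float:
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    x = [true[i] for i in idx]
    y = [dosage[i] for i in idx]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    true = [0, 1, 2, 1, 0, 2, 1, 1]
    dosage = [0.1, 0.9, 1.8, 1.2, 0.4, 1.7, 0.2, 1.0]  # stand-in for imputer output
    idx = mask_indices(len(true), 0.5, seed=1)
    print(idx, concordance(true, dosage, idx), dosage_r2(true, dosage, idx))
```

Concordance rewards getting the hard call right, while the dosage correlation also credits uncertain but well-calibrated dosages, which is why studies of this kind often report more than one measure.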
|
4 |
Tracing the genetic origin of African descendants from South America. Fortes Lima, César Augusto, 17 December 2015.
Background: The transatlantic slave trade, from the 15th to the 19th century, dramatically changed the demography of the Americas. Thousands of enslaved Africans managed to escape from the plantations of European colonizers and formed independent African settlements of free people (or 'Marron'). Here, we study four Noir Marron communities from French Guiana and Surinam, as well as other populations with noteworthy African heritage in Brazil and Colombia, and West African populations in Benin, Ivory Coast, and Mali. To uncover their different population histories, these populations were characterized using several genetic markers: 17 Y-STRs, 96 Y-SNPs, whole mitochondrial genomes, and genome-wide SNP data (4.5 million autosomal SNPs). Results: Paternally and maternally inherited DNA highlighted different patterns of sex-biased gene flow in the Afro-Brazilian and Afro-Colombian populations, suggesting different preferential marriage behaviours. In sharp contrast, the Noir Marron communities presented the highest African ancestry in all genetic systems analysed (above 98%). These communities show apparently no gene flow with non-African groups and also present elevated inbreeding coefficients. In good agreement with linguistic studies, the Noir Marron communities showed a biogeographical ancestry associated with historical West African kingdoms that existed in modern Benin during the slave trade. In agreement with historical research, Afro-Colombian genetic ancestry is linked with the Gold Coast region, while Afro-Brazilian genetic ancestry is linked with the West Central African region. Conclusions: This study provides specific genetic information on African Americans and thereby helps us to reconstruct broken links with their African past. The Noir Marron communities revealed a remarkably high African identity, which is still linked to the Bight of Benin region. The Afro-Brazilian and Afro-Colombian populations present different demographic histories because of their different colonial pasts. Within an appropriate historical framework, genetic ancestry can add further understanding of ethnicity in African populations throughout the Atlantic world.
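The abstract reports elevated inbreeding coefficients in the Noir Marron communities but does not say how they were estimated. As one common way such a coefficient is obtained from SNP genotypes, the sketch below computes the per-individual method-of-moments estimate F = (O - E) / (N - E), where O is the observed number of homozygous calls, E the number expected under Hardy-Weinberg proportions given population allele frequencies, and N the number of non-missing calls. This is an assumed, generic estimator, not necessarily the one used in the thesis, and `inbreeding_f` is a hypothetical name.

```python
from typing import Optional, Sequence


def inbreeding_f(genotypes: Sequence[Optional[int]], allele_freqs: Sequence[float]) -> float:
    """Method-of-moments inbreeding coefficient F for one individual.

    genotypes: allele counts (0, 1 or 2) at biallelic SNPs; None marks a missing call.
    allele_freqs: frequency of either allele at each SNP in the comparison population
    (expected heterozygosity 2p(1 - p) is symmetric in p and 1 - p).
    """
    observed_hom = 0
    expected_hom = 0.0
    n_called = 0
    for g, p in zip(genotypes, allele_freqs):
        if g is None:
            continue  # skip missing calls entirely
        n_called += 1
        if g != 1:  # 0 and 2 are the homozygous states
            observed_hom += 1
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)  # 1 minus expected heterozygosity
    if n_called - expected_hom <= 1e-12:
        raise ValueError("F is undefined: no informative genotypes")
    return (observed_hom - expected_hom) / (n_called - expected_hom)


if __name__ == "__main__":
    freqs = [0.5, 0.3, 0.2, 0.4, 0.5]
    print(inbreeding_f([0, 2, 0, 1, 2], freqs))  # mostly homozygous: F > 0
```

Positive F means more homozygosity than Hardy-Weinberg proportions predict, which is the pattern expected in small, isolated populations with little outside gene flow.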
|