101 |
Méthodes de factorisation matricielle pour la génomique des populations et les tests d'association / Matrix factorization methods for population genomics and association mapping. Caye, Kévin. 11 December 2017
We present statistical methods based on matrix factorization problems. A first method allows fast inference of population structure from genetic data while incorporating geographic proximity information. A second method corrects association studies for confounding factors. In this manuscript we present the models as well as the theoretical aspects of the inference algorithms. Moreover, using numerical simulations, we compare the performance of our methods with that of existing methods. Finally, we apply our methods to real biological data. Our methods have been implemented and distributed as R packages: tess3r and lfmm.
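The second method rests on estimating locus effects jointly with hidden confounders. A minimal Python/NumPy sketch of that idea follows, assuming an objective of the form ||Y - x b^T - W||^2 + lambda * ||b||^2, where Y is a centered genotype matrix, x a centered phenotype or environmental variable, and W a rank-K confounder matrix. It alternates a truncated SVD for W with a closed-form ridge step for b. This naive solver, the name `lfmm_ridge_als`, and the toy data are hypothetical; it is not the estimator implemented in the lfmm package.

```python
import numpy as np


def lfmm_ridge_als(Y, x, K, lam=1.0, n_iter=20):
    """Toy alternating solver for a ridge-penalized latent factor model.

    Minimizes ||Y - x b^T - W||_F^2 + lam * ||b||^2 over per-locus effects b
    and a rank-K confounder matrix W (hypothetical simplification; this is
    not the algorithm implemented in the lfmm R package).

    Y: (n, p) genotype matrix, columns centered.
    x: (n,) centered phenotype or environmental variable.
    Returns b, the (n, K) latent-factor scores, and the objective per sweep.
    """
    p = Y.shape[1]
    b = np.zeros(p)
    trace = []
    for _ in range(n_iter):
        # W-step: the best rank-K fit to the residual is its truncated SVD.
        U, s, Vt = np.linalg.svd(Y - np.outer(x, b), full_matrices=False)
        W = (U[:, :K] * s[:K]) @ Vt[:K, :]
        # b-step: per-locus ridge regression of Y - W on x, in closed form.
        b = (Y - W).T @ x / (x @ x + lam)
        resid = Y - np.outer(x, b) - W
        trace.append(float(np.sum(resid**2) + lam * (b @ b)))
    return b, U[:, :K], trace


# Toy data: a hidden factor u drives both the genotypes and the variable x.
rng = np.random.default_rng(0)
n, p = 100, 30
u = rng.normal(size=n)
x = u + rng.normal(size=n)
Y = np.outer(u, rng.normal(size=p)) + rng.normal(size=(n, p))
Y[:, :3] += np.outer(x, np.ones(3))  # only the first 3 loci truly depend on x
Y -= Y.mean(axis=0)
x -= x.mean()

b, scores, trace = lfmm_ridge_als(Y, x, K=1)
```

Each half-step minimizes the objective exactly over one block of unknowns, so the recorded objective can only decrease from sweep to sweep. A real analysis would also need test statistics and calibrated p-values for each locus, which this sketch omits.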
|
102 |
Identify SNPs associated with type 2 diabetes using self-organizing maps and random forests. January 2009
Zhang, Ji.
Thesis (M.Phil.), Chinese University of Hong Kong, 2009.
Includes bibliographical references (leaves 100-104).
Abstracts in English and Chinese.
Chapter 1. Introduction
1.1. Introduction of genetic association studies
1.1.1. Application of genetic association studies in complex diseases
1.1.2. Application of genetic association studies in type-2 diabetes
1.2. Study design of genetic association studies
1.3. Overview of statistical approaches in association studies
1.3.1. Preliminary analyses
1.3.1.1. Hardy–Weinberg equilibrium testing
1.3.1.2. Inference of missing genotype data
1.3.1.3. SNP tagging
1.3.2. Single-point and multipoint tests for association
1.4. Other relevant methods employed in this study
1.4.1. Self-Organizing Maps (SOM) with further classification by K-means clustering
1.4.2. Random forests
1.5. Main objectives of this study
Chapter 2. Materials and methods
2.1. Study cohort
2.2. Study design
2.2.1. Construction of sample sets for each stage using SOM and K-means clustering
2.2.2. Stage 1 analysis by random forests
2.2.3. Stage 2 analysis by chi-square test
2.2.4. Two-stage genetic association study by chi-square test
2.2.5. Comparison of results: random forests plus chi-square test versus chi-square test
2.2.6. Validation of results in the whole sample set by allelic chi-square test
2.2.7. Extensions of the study: cumulative effects of candidate SNPs on risk of type-2 diabetes
Chapter 3. Results
3.1. Effects of sample classification by SOM and K-means clustering
3.2. Genetic associations in stage 1
3.3. Genetic associations in stage 2 and validation of results
3.4. Cumulative effects of candidate SNPs on risk of type-2 diabetes
Chapter 4. Discussion
4.1. Overall strategy
4.1.1. Effects of SOM and K-means clustering
4.1.2. Effects of random forests in the first stage of association study
4.1.3. Comparison of our method with traditional chi-square test
4.1.4. Joint effects of candidate SNPs selected by the hybrid method
4.2. Biological significance of candidate SNPs
4.2.1. Gene CDKAL1
4.2.2. Gene KIAA1305
4.2.3. Gene DACH1
4.2.4. Gene FUCA1
4.2.5. Gene KCNQ1
4.2.6. Gene SLC27A1
4.3. Limits and improvement of this study
4.4. Conclusion
References
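Sections 2.2.3 to 2.2.6 rely on chi-square tests of association. A minimal sketch of the standard allelic version, a Pearson chi-square on a 2x2 table of case and control allele counts, follows in Python. The outline does not say whether a continuity correction was used, so none is applied here; the function names and genotype counts are hypothetical.

```python
from math import erfc, sqrt


def allele_counts(n_AA, n_Aa, n_aa):
    """Convert diploid genotype counts into (A, a) allele counts."""
    return 2 * n_AA + n_Aa, 2 * n_aa + n_Aa


def allelic_chi_square(case_alleles, control_alleles):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 table
    whose rows are cases/controls and whose columns are alleles A/a."""
    table = [list(case_alleles), list(control_alleles)]
    total = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


cases = allele_counts(20, 20, 10)     # hypothetical counts -> (60, 40)
controls = allele_counts(10, 20, 20)  # hypothetical counts -> (40, 60)
stat, p = allelic_chi_square(cases, controls)
```

Counting alleles rather than genotypes doubles the sample size and assumes that the two alleles an individual carries contribute independently, which is one reason Hardy–Weinberg equilibrium is usually checked first.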
|
103 |
Caractérisation des déterminants génétiques et moléculaires liés à la résistance au dépérissement bactérien chez l'abricotier et analyse des risques associés / Characterization of genetic and molecular determinants of resistance to bacterial canker in apricot and analysis of the associated risks. Omrani, Mariem. 06 November 2018
Within the genus Prunus, which contains economically valuable species, apricot (Prunus armeniaca L.) is an emblematic Mediterranean crop. Apricot cultivation is, however, constrained by many biotic stresses, among which bacterial canker, caused by Pseudomonas syringae (Psy), is particularly severe and can kill trees in regions with cold, humid winters. Differences in susceptibility observed between cultivars in orchards create opportunities for disease management through genetic improvement. This thesis aimed (i) to identify genetic determinants linked to partial resistance to the bacterium and (ii) to study a factorial design crossing the diversity of the plant with that of the pathogen (GxG interaction), in order to assess the genericity and durability of resistance. For the first objective, two complementary approaches were used: QRL (Quantitative Resistance Loci) mapping in four biparental progenies, three of which were obtained from crosses with a common parent, and a genome-wide association study on a core collection. The phenotypic data comprised symptoms recorded after controlled inoculations and mortality scores following natural infection in the orchard. Together, these linkage and association approaches detected 22 QRLs, of which only 2, on chromosomes 6 and 7, co-localized between the two methods. Two major regions detected in the association study, on chromosomes 5 and 6, were in linkage disequilibrium and controlled nearly 26% and 43% of the symptom variation. Two complementary mechanisms, one blocking Psy infection and the other limiting the local progression of the bacterium in the tissues, were revealed through the detection of QRLs on chromosomes 3, 6 and 8 for the first mechanism and on chromosomes 1, 4 and 6 for the second. The second objective was addressed with a factorial interaction design between 20 apricot accessions and 9 Psy strains, sampled according to prior knowledge of the disease's epidemiology in orchards. Statistical analyses of phenotypic data obtained both in the orchard and in a laboratory test showed a clear predominance of the strain effect on symptom variability and a very weak GxG interaction effect. This result points to a potential genericity of the resistance factors and favorable prospects for their durability in the orchard. The results of this thesis contribute to a better understanding of the mechanisms underlying partial resistance of apricot to bacterial canker, and provide markers and haplotypes of interest that could be used in breeding programs.
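The GxG conclusion rests on partitioning symptom variation among accession (plant), strain and interaction effects. As an illustration only, here is a textbook two-way ANOVA sum-of-squares decomposition for a balanced, replicated accession-by-strain design in Python. The abstract does not describe the thesis's actual statistical models, and `two_way_ss` and the toy scores are hypothetical.

```python
from statistics import mean


def two_way_ss(data):
    """Sums of squares for a balanced two-way design with replication.

    data[i][j] holds the replicate symptom scores of accession i inoculated
    with strain j; every cell must contain the same number of replicates.
    """
    a, b, r = len(data), len(data[0]), len(data[0][0])
    cell = [[mean(c) for c in row] for row in data]
    grand = mean(m for row in cell for m in row)
    acc = [mean(row) for row in cell]                                # plant means
    strain = [mean(cell[i][j] for i in range(a)) for j in range(b)]  # strain means
    return {
        "accession": b * r * sum((m - grand) ** 2 for m in acc),
        "strain": a * r * sum((m - grand) ** 2 for m in strain),
        # Interaction: what the cell means show beyond the two additive effects.
        "GxG": r * sum((cell[i][j] - acc[i] - strain[j] + grand) ** 2
                       for i in range(a) for j in range(b)),
        "error": sum((v - cell[i][j]) ** 2
                     for i in range(a) for j in range(b) for v in data[i][j]),
    }


# Hypothetical scores: additive accession and strain effects, no interaction.
data = [[[i + 3 * j - 0.5, i + 3 * j + 0.5] for j in range(3)] for i in range(2)]
ss = two_way_ss(data)
total = sum(ss.values())
shares = {k: v / total for k, v in ss.items()}
```

With purely additive toy scores the GxG sum of squares is exactly zero and the strain term dominates, the same qualitative pattern the abstract reports. Real data would add F tests, or a mixed model if accessions and strains are treated as random samples.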
|
104 |
Signals and Noise in Complex Biological Systems. Rung, Johan. January 2007
In every living cell, millions of different types of molecules constantly interact and react chemically in a complex system that can adapt to fluctuating environments and extreme conditions, surviving and reproducing itself. The information required to produce these components is stored in the genome, which is copied at each cell division and passed on, mixed with another genome, from parent to child. The regulatory mechanisms that control biological systems, for instance the regulation of each gene's expression level, have evolved so that the system as a whole is robust and able to survive harsh conditions, while biological tasks at the detailed molecular level are still carried out precisely and without failure. The result is a system that can be described as a hierarchy of levels of complexity: from the lowest level, where molecular mechanisms control other components at the same level, through pathways of coordinated interactions between components that carry out particular biological tasks, up to large-scale systems consisting of all components, connected in a network whose topology makes the system robust and flexible. This thesis reports on work that models and analyzes complex biological systems, and the signals and noise that regulate them, at all levels of complexity. It also shows how signals are transduced vertically from one level to another, as when a single mutation causes errors in low-level mechanisms, disrupting pathways and creating system-wide imbalances, as in type 2 diabetes. Advancing our knowledge of biological systems requires both going deeper, toward the detail of single molecules in single cells, and stepping back to understand the organisation and dynamics of the large networks of all components, uniting the different levels of complexity.
|
105 |
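Etudes génétiques et immunomodulatoires de la ghréline sur les traits de production et de conformation en races bovines ainsi que sur la croissance chez le rat / Genetic and immunomodulatory studies of ghrelin on production and conformation traits in cattle breeds and on growth in the rat. Colinet, Frédéric. 23 October 2008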
In animal production, particularly in the cattle sector, increasing the amount of growth hormone in the bloodstream is of economic interest. Ghrelin is a peptide produced mainly in the stomach wall. This endogenous ligand of the GHSR stimulates pituitary secretion of growth hormone. As an orexigenic peptide, ghrelin is involved in the mechanisms that maintain energy homeostasis. With a view to improving animal performance, ghrelin was studied through two approaches. The first approach is the study of the bovine genes encoding ghrelin (bGHRL) and its receptor (bGHSR). These two genes were localized on BTA 22 and BTA 1, respectively. Fourteen polymorphisms were detected in these two genes, three of which affect the primary structure of the bovine GHSR. Associations, at various levels of significance, between some of these 14 polymorphic sites and production and conformation traits were found in a group of 127 Holstein bulls, based on their direct progeny present in the Walloon Region. The second approach addresses the effects of passive immunization against ghrelin in growing male rats, compared with passive immunization against leptin and cholecystokinin. Under a balanced diet, the anti-ghrelin treatment did not affect growth or feed intake in these rats relative to control animals. Differing effects of the various immunomodulations were observed on growth, intake and endocrine parameters. These results call for further investigation of the bGHRL and bGHSR genes using data from other cattle populations/breeds, and of ghrelin immunomodulation under different experimental conditions (unbalanced diet, physiological stage, species, etc.). These investigations could be applied in animal breeding and production, as well as in human and veterinary medicine.
|
106 |
Novel Statistical Methods in Quantitative Genetics: Modeling Genetic Variance for Quantitative Trait Loci Mapping and Genomic Evaluation. Shen, Xia. January 2012
This thesis develops and evaluates statistical methods for several types of genetic analysis, including quantitative trait loci (QTL) analysis, genome-wide association studies (GWAS), and genomic evaluation. Its main contribution is to provide novel insights into modeling genetic variance, especially via random effects models. For variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was shown to correct the bias in genetic variance component estimation and to gain power in QTL mapping in terms of precision. Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit data from an entire genome. These whole-genome models performed well in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance rather than the mean, validating the idea of variance-controlling genes. The work in this thesis is accompanied by R packages available online: a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).
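A variance-controlling locus is one where the spread of the phenotype, not its mean, differs between genotype classes. One standard test for this is the Brown-Forsythe test, a one-way ANOVA on absolute deviations from each group's median. The abstract does not state which test vGWAS implements, so the standard-library Python sketch below is an assumption; `brown_forsythe_f` and the phenotype values are hypothetical, and it returns only the F statistic.

```python
from statistics import mean, median


def brown_forsythe_f(groups):
    """Brown-Forsythe F statistic for equal variances across genotype groups.

    groups: one list of phenotype values per genotype class at a locus.
    The statistic is a one-way ANOVA F computed on absolute deviations from
    each group's median; compare it with an F(k - 1, n - k) distribution.
    """
    z = []
    for g in groups:
        med = median(g)
        z.append([abs(v - med) for v in g])
    k, n = len(z), sum(len(g) for g in z)
    z_means = [mean(g) for g in z]
    grand = mean(v for g in z for v in g)
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(z, z_means)) / (k - 1)
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g) / (n - k)
    return between / within


# Hypothetical phenotypes for two genotype classes at one locus.
same_spread = brown_forsythe_f([[1, 2, 3], [11, 12, 13]])     # mean shift only
wider = brown_forsythe_f([[1, 2, 3, 4, 5], [0, 3, 6, 9, 12]])  # spread differs
```

The first call gives F = 0 because the two classes differ only in mean, which a standard mean-based GWAS would detect instead; the second gives a positive F because the second genotype class is more dispersed.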
|
108 |
Tracing selection and adaptation along an environmental gradient in Populus tremula. Hall, David. January 2009
Over time, evolution shifts the distribution of the expressed genotype, that is, the phenotype, within a population. Natural selection is one of the forces that act on the phenotype and thereby change the patterns of nucleotide variation underlying those distributions. How a phenotype changes across a heterogeneous environment indicates the type of evolutionary force acting on the trait, and this should be reflected in the variation at the loci underlying it. However, even when phenotypic and nucleotide variation in a population point to the same evolutionary force, they are not necessarily connected. In natural populations, recombination continuously reshuffles genetic material and breaks down associations between loci, which makes it possible to narrow the search for causal loci down to single base-pair differences in DNA sequences. Connecting genotype and phenotype is thus an important step toward understanding the genetic architecture of complex traits and the forces that shape the observed patterns.

This thesis examines the European aspen, Populus tremula, sampled from subpopulations along an extensive latitudinal gradient covering most of Sweden. Results show clear genetic differentiation in the timing of bud set, a measure of the autumnal cessation of growth, between different parts of Sweden, pointing to local adaptation. In the search for candidate genes underlying this local adaptation, most genes (25) in the photoperiodic gene network were examined for signals of selection. Genes in the photoperiodic network show increased heterogeneity of differentiation between the sampled Swedish subpopulations. Almost half (12) of the examined genes are under some form of selection. Eight of these show positive directional selection on protein evolution, and the gene coding for a photoreceptor that relays changing light conditions to downstream targets in the network has the hallmarks of a selective sweep. The negative correlation between positive directional selection and synonymous diversity indicates that most of the photoperiod gene network has undergone recurrent selective sweeps, which likely occurred as P. tremula readapted to northern light regimes during population expansion following the retreat of the ice between glaciations. Two of the genes under selection also carry single nucleotide polymorphisms (SNPs) associated with bud set: two in PHYB2 and one in LHY2. An additional SNP in LHY1 explains part of the variation in the timing of bud set, despite the lack of a selection signal at LHY1. Together these SNPs explain 10-15% of the variation in the timing of bud set, and 20-30% more when the positive covariances between SNPs are taken into account. There is thus fairly extensive evidence that genes in the photoperiod gene network control the timing of bud set and reflect local adaptation in this trait.
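The claim that the SNPs explain more variation once their positive covariances are counted follows from the variance of a sum: Var(g_1 + ... + g_m) = sum_i Var(g_i) + 2 * sum_{i<j} Cov(g_i, g_j). A minimal standard-library Python sketch with made-up genetic values (not data from the thesis) shows the decomposition; `variance_partition` and `pcov` are hypothetical helper names.

```python
from itertools import combinations
from statistics import mean, pvariance


def pcov(x, y):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def variance_partition(effects):
    """Split the variance of summed SNP effects into its two parts.

    effects: one vector of genetic values per SNP (one entry per tree).
    Returns (sum of per-SNP variances, 2 * sum of pairwise covariances,
    variance of the summed genetic value); the first two add up to the third.
    """
    marginal = sum(pvariance(e) for e in effects)
    cross = 2 * sum(pcov(a, b) for a, b in combinations(effects, 2))
    total = pvariance([sum(vals) for vals in zip(*effects)])
    return marginal, cross, total


# Hypothetical effects of two SNPs whose alleles co-vary along a cline.
snp1 = [0.0, 1.0, 2.0, 1.0]
snp2 = [0.0, 0.5, 1.0, 1.0]
marginal, cross, total = variance_partition([snp1, snp2])
```

When alleles that delay bud set tend to occur together in the same trees, the covariance term is positive and the joint contribution exceeds the sum of the single-SNP contributions.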
|
109 |
Genetic Analyses of Multiple Sclerosis and Systemic Lupus Erythematosus: From Single Markers to Genome-Wide Data. Sandling, Johanna K. January 2010
In autoimmune diseases, an individual's immune system targets the body's own healthy cells. The aim of this thesis was to identify genetic risk factors for two autoimmune diseases, multiple sclerosis (MS) and systemic lupus erythematosus (SLE). In Study I, we found that genetic variation in the interferon regulatory factor 5 gene (IRF5), previously shown to be associated with SLE, rheumatoid arthritis and inflammatory bowel diseases, was also associated with MS. An insertion/deletion polymorphism in the first intron of IRF5 is a good functional candidate for this association. Outside the HLA region, IRF5 and the signal transducer and activator of transcription 4 gene (STAT4) are the most important genetic risk factors for SLE. In Study II, using a family-based study design, we showed that genetic variation in STAT4 is associated with SLE in the Finnish population as well. In Study III, we tested whether a STAT4 risk allele for SLE is associated with cardiovascular disease in SLE patients. The risk allele proved to be strongly associated with ischemic cerebrovascular disease and anti-phospholipid antibodies in these patients. A possible mechanism is that the risk allele increases production of pro-thrombotic anti-phospholipid antibodies, which in turn raises the risk of stroke. Both IRF5 and STAT4 are involved in type I interferon signalling. In Study IV, we investigated 78 additional genes in this system for association with SLE in a Swedish cohort. The most promising results were followed up in additional patients and controls from Sweden and the US, and two novel SLE genes were identified. In Study V, a large follow-up of a genome-wide association study was performed, identifying five new SLE loci: TNIP1, PRDM1, JAZF1, UHRF1BP1 and IL10. A number of genes previously associated with other autoimmune diseases were also tested for association with SLE; this analysis identified the type I interferon system gene IFIH1 as a novel SLE risk locus. These studies confirm the central role of the type I interferon system in SLE and further suggest that autoimmune diseases share genetic risk factors.
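In a family-based design, association is measured within families, which guards against confounding by population stratification. The transmission disequilibrium test (TDT) is one standard example: it compares how often heterozygous parents transmit each allele to affected offspring. The abstract does not name the test used in Study II, so this Python sketch, the name `tdt`, and its counts are illustrative assumptions.

```python
from math import erfc, sqrt


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test for one biallelic marker.

    transmitted: times a heterozygous parent passed the risk allele to an
    affected child; untransmitted: times it passed the other allele.
    Returns the 1-df chi-square statistic (b - c)^2 / (b + c) and its p-value.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


# Hypothetical counts from heterozygous parents of affected offspring.
stat, p = tdt(60, 40)
```

Only heterozygous parents are informative, because a homozygous parent transmits the same allele either way; under no association, each allele is expected to be transmitted half of the time.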
|