Global ETD Search

11	Rare and low-frequency variants and predisposition to complex disease Albers, Patrick K. January 2017 Advances in high-throughput genomic technologies have facilitated the collection of DNA information for thousands of individuals, providing unprecedented opportunities to explore the genetic architecture of complex disease. One important finding has been that the majority of variants in the human genome are low in frequency or rare. It has been hypothesised that recent explosive growth of the human population produced unexpectedly large numbers of rare variants with potentially deleterious effects, suggesting that rare variants may play a role in disease predisposition. But, importantly, rare variants embody a source of information through which we may learn more about our recent evolutionary history. In this thesis, I developed several statistical and computational methods to address problems associated with the analysis of rare variants and, foremost, to leverage the genealogical information they encode. First, one constraint in genome-wide association studies is that lower-frequency variants are not well captured by genotyping methods, but instead are predicted through imputation from a reference dataset. I developed the meta-imputation method to improve imputation accuracy by integrating genotype data from multiple, independent reference panels, which outperformed imputations from separate references in almost all comparisons (mean correlation with masked genotypes r² > 0.9). I further demonstrated in simulated case-control studies that meta-imputation increased the statistical power to identify low-frequency variants of intermediate or high penetrance by 2.2-3.6%. Second, rare variants are likely to have originated recently through mutation and thereby sit on relatively long haplotype regions identical by descent (IBD). 
I developed a method that exploits rare variants as identifiers for shared haplotype segments around which the breakpoints of recombination are detected using non-probabilistic approaches. In coalescent simulations, I show that such breakpoints can be inferred with high accuracy (r² > 0.99) around rare variants at frequencies < 0.05%, using either haplotype or genotype data. Third, I show that technical error poses a major problem for the analysis of whole-genome sequencing or genotyping data, particularly for alleles below 0.05% frequency (false positive rate, FPR = 0.1). I therefore propose a novel approach to infer IBD segments using a Hidden Markov Model (HMM) which operates on genotype data alone. I incorporated an empirical error model constructed from error rates I estimated in publicly available sequencing and genotyping datasets. The HMM was robust in the presence of error in simulated data (r² > 0.98) while non-probabilistic methods failed (r² < 0.02). Lastly, the age of an allele (the time since its creation through mutation) may provide clues about demographic processes that resulted in its observed frequency. I present a novel method to estimate (rare) allele age based on the inferred shared haplotype structure of the sample. The method operates in a Bayesian framework to infer pairwise coalescent times from which the age is estimated using a composite posterior approach. I show in simulated data that coalescent time can be inferred with high accuracy (rank correlation > 0.91), which resulted in a likewise high accuracy for estimated age (> 0.94). When applied to data from the 1000 Genomes Project, I show that estimated age distributions were broadly consistent with frequency-dependent expectations under neutrality, although patterns of low frequency and old age may hint at signatures of selection at certain sites. 
Thus, this method may prove useful in the analysis of large cohorts when linked to biomedical phenotype data.
12	EXOME SEQUENCING FOR RARE MUTATIONS IN YOUNG STROKE / EXOME SEQUENCING TO CHARACTERIZE THE ROLES OF MENDELIAN STROKE GENES AND NOVEL GENES IN YOUNG STROKE Chong, Michael 11 1900 Background: Rare genetic mutations cause familial early-onset stroke disorders, known as “Mendelian strokes”. The broader relevance of rare mutations in unrelated young stroke patients is uncertain. We hypothesize that rare mutations in known and novel genes are important risk factors for stroke. Methods: Exome sequencing was used to characterize rare disruptive protein-altering mutations in 185 young cases and 185 matched controls from INTERSTROKE, a large and globally representative stroke study. The major objectives were: 1) to precisely define the role of known Mendelian stroke genes and 2) to discover novel gene and pathway associations. Results: A focused assessment of known Mendelian stroke genes revealed a significant contribution from NOTCH3, the causal gene for Cerebral Autosomal Dominant Arteriopathy with Subcortical Infarcts and Leukoencephalopathy (CADASIL). CADASIL mutations were identified in six cases and no controls (P=0.03). The clinical presentation of CADASIL mutation carriers deviated from known symptomatology, consisting of small-vessel ischemic strokes (SVIS) accompanied by secondary features including migraine and depression. A novel role for non-CADASIL NOTCH3 mutations in intracerebral hemorrhage (ICH) was also elucidated (OR=2.86; 95% CI, 1.13 to 7.93; P=0.02). Such mutations were present in 22% of ICH cases and 8% of matched controls. An agnostic evaluation of all genes did not reveal any genome-wide significant associations. However, NOTCH3 was among the top ICH genes out of 13,706 tested, and many others were also biologically relevant, notably AARS2 and NBEAL2. A protective association was identified for the renin-angiotensin system (P=8.1×10⁻⁴), whereas type II diabetes mellitus was associated with increased risk (P=1.9×10⁻²). 
Conclusion: Rare mutations influence risk of early-onset stroke. CADASIL mutations play an important role in unrelated stroke patients. Beyond CADASIL, a novel role was uncovered for other NOTCH3 mutations as common and significant risk factors for ICH. Novel biologically relevant genes and pathways may also affect stroke susceptibility. / Thesis / Master of Science in Medical Sciences (MSMS) Exome sequencing Cardiovascular disease Stroke Genetics Cerebrovascular disease hemorrhagic stroke small vessel stroke ischemic stroke CADASIL Mendelian stroke Stroke genetics Rare variants INTERSTROKE Next-generation sequencing
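The CADASIL count reported above (six carriers among 185 cases, none among 185 controls, P=0.03) can be sanity-checked with a small calculation. The abstract does not say which test produced that P value; the sketch below assumes a two-sided Fisher's exact test on the 2x2 carrier table, and the function name `fisher_exact_two_sided` is a hypothetical helper written for this illustration, not taken from the thesis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are cases/controls, columns are
    carriers/non-carriers. The p-value sums the probabilities of all
    tables (with the same margins) that are no more likely than the
    observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x carriers falling in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Counts from the abstract: 6 of 185 cases, 0 of 185 controls.
p = fisher_exact_two_sided(6, 179, 0, 185)
print(f"two-sided P = {p:.3f}")
```

Under this assumption the result is close to the reported P=0.03, which suggests (but does not prove) that an exact test of this kind was used.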
13	Detecting Rare Haplotype-Environment Interaction and Dynamic Effects of Rare Haplotypes using Logistic Bayesian LASSO Xia, Shuang 30 December 2014 No description available. Statistics age-related macular degeneration hypertension combination of common SNPs GXE cohort study GWAS missing heritability rare variants B-Spline retrospective likelihood dynamic effect prospective likelihood
14	Étude des déterminants génétiques et moléculaires de la scoliose idiopathique [Study of the genetic and molecular determinants of idiopathic scoliosis] Nada, Dina 04 1900 No description available. Scoliose idiopathique Séquençage d’exome (WES) Variants rares GWAS FAT3 LBX1 CHI3L1 YKL-40 Endophenotypes Population québécoise Idiopathic scoliosis Whole exome sequencing Rare variants French-Canadian population
15	Genetic architecture of complex disease in humans: a cross-population exploration Martínez Marigorta, Urko, 1983- 12 November 2012 The aetiology of common diseases is shaped by the effects of genetic and environmental factors. Considerable effort has been devoted to unravelling the genetic basis of disease, in the hope that it will help to develop new therapeutic treatments and to achieve personalized medicine. With the development of high-throughput genotyping technologies, hundreds of association studies have described many loci associated with disease. However, the depiction of disease architecture remains incomplete. The aim of this work is to perform exhaustive comparisons across human populations to evaluate pressing questions. Our results provide new insights into the allele frequency of risk variants, their sharing across populations and the likely architecture of disease. / La etiología de las enfermedades comunes está formada por factores genéticos y ambientales. Se ha puesto mucho empeño en describir sus bases genéticas. Este conocimiento será útil para desarrollar nuevas terapias y la medicina personalizada. Gracias a las técnicas de genotipado masivo, centenares de estudios de asociación han descrito una infinidad de genes asociados a enfermedad. Pese a ello, la arquitectura genética de las enfermedades no ha sido totalmente descrita. Esta tesis pretende llevar a cabo exhaustivas comparaciones entre poblaciones para responder diversas preguntas candentes. Nuestros resultados dan pistas sobre la frecuencia de los alelos de riesgo, su presencia entre poblaciones y la probable arquitectura de las enfermedades. 
Complex disease Genome-wide association studies Human populations Replicability Genetic architecture of disease Rare variants Infinitesimal model Genomic and personalized medicine Enfermedades complejas Estudio de asociación genómica Poblaciones humanas Replicabilidad Arquitectura genética Variantes raras Modelo infinitesimal Medicina personalizada y genómica 575
16	Moving beyond Genome-Wide Association Studies / Comment aller au delà des études d'association à l'échelle du génome entier Delahaye-Sourdeix, Manon 14 November 2014 Les études d'association à grande échelle consistent à étudier la corrélation de plusieurs millions de polymorphismes nucléotidiques avec un risque de cancer chez des milliers d'individus, sans avoir besoin de connaissances préalables sur la fonction biologique de ces variants. Ces études ont été utiles pour établir des hypothèses étiologiques et comprendre l'architecture génétique sous-jacente de plusieurs maladies humaines. Cependant, la plupart des facteurs héréditaires de ces maladies restent inexpliqués. Une partie de cette variation pourrait venir de variants rares qui ne sont pas ciblés par les puces de génotypage actuelles ou encore de variants avec un effet plus modéré voire faible pour lesquels une détection par les études d'association actuelles n'est pas envisageable. Dans ce contexte et comme illustré dans cette thèse, les récentes études d'association peuvent maintenant servir de point de départ pour de nouvelles découvertes, en mettant en place des stratégies innovantes pour étudier à la fois les variants rares et les maladies rares. Nous avons plus particulièrement exploré ces techniques dans le cadre du cancer du poumon, des voies aérodigestives et du lymphome de Hodgkins. 
L'utilisation de la bioinformatique pour combiner les résultats des études avec d'autres sources d'information, l'intégration de différents types de données génomiques ainsi que l'investigation de la relation entre altérations germinales et somatiques représentent les principales opportunités poursuivies dans ce travail de thèse. / Genome-wide association (GWA) studies consist of testing up to one million (or more) single nucleotide polymorphisms (SNPs) for their association with cancer risk in thousands of individuals, without requiring any prior knowledge of the functional significance of these variants. These studies have been valuable for establishing etiological hypotheses and understanding the underlying genetic architecture of human diseases. However, most of the heritable factors of these traits remain unexplained. Part of this variation may come from rarer variants that are not targeted by current genotyping arrays, or from variants with moderate to low effects for which detection by current GWA studies is impractical. In this context and as illustrated in this thesis, GWA studies can now serve as starting points towards further discoveries, through new strategies to study both rarer variants and rarer diseases. We have specifically explored these approaches in the context of lung cancer, head and neck cancer and Hodgkin's lymphoma. The use of bioinformatics to combine recent GWA study results with other sources of information, the integration of different types of genomic data, and the investigation of the interrelationship between germline and somatic alterations represent the main opportunities pursued in this thesis work. Étude d'association Susceptibilité génétique Sequençage haut débit Imputation Cancer du poumon Cancer des voies aérodigestives Lymphome de Hodgkins Variants rares GWAS Genetic susceptibility Next generation sequencing Imputation Lung cancer Head and neck cancer Hodgkin's lymphoma Rare variants 570.15
17	Caractérisation de l'étiologie génétique de patients atteints de différentes maladies neuromusculaires par l’intégration de données omiques [Characterization of the genetic etiology of patients with various neuromuscular diseases through the integration of omics data] Triassi, Valérie 12 1900 Les progrès des technologies de séquençage ont joué un rôle important dans le diagnostic moléculaire des maladies rares, telles que les myopathies et les dystrophies musculaires. Cependant, plusieurs patients atteints de maladies neuromusculaires restent sans diagnostic. Ceci est dû à la grande hétérogénéité clinique et génétique ainsi qu'au caractère hautement polymorphique des gènes associés à ces troubles. L'interprétation des données génétiques est un grand défi et les tests génétiques aboutissent souvent à l'identification de variants de signification inconnue (VUSs). Plusieurs de ces variants peuvent perturber l'épissage normal de l'ARN ou affecter l'expression des gènes. À cet égard, nous proposons une approche bio-informatique. Notre objectif est de mettre en place un pipeline identifiant et caractérisant des variants d'intérêt dans un contexte pathologique. Afin de déterminer si les variants ont un impact fonctionnel, notre pipeline se concentre sur l'épissage alternatif ainsi que sur l'intégration des données génomiques et transcriptomiques. Nous émettons l'hypothèse qu'une partie des patients sans diagnostic pour leur maladie neuromusculaire s'explique par des variants introniques jouant un rôle régulateur ou affectant l'épissage et l'abondance de l'ARNm. Cette approche multi-omique permet de déterminer si les variants ont un impact fonctionnel. Pour ce faire, nous avons réalisé un séquençage de l'ARN et de l'ADN à partir de biopsies musculaires de quatre patients. Les données ont été alignées et annotées avec différents scores de pathogénicité. Les événements d'épissage sont analysés par SpliceAI et rMATS. L'analyse des gènes différentiellement exprimés a été réalisée par l'outil LPEseq. Les CNVs et les expansions de répétitions ont été analysés avec CNVkit et ExpansionHunter. 
Plusieurs scripts maison filtrent et intègrent les données ARN et ADN. Pour l'instant, un total de huit variants pathogéniques sont proposés pour nos patients, mais des investigations supplémentaires sont nécessaires. Les variants recherchés sont rares et nécessitent donc un pipeline cohérent et efficace. Ce projet apportera un bénéfice significatif pour les patients en leur permettant d'obtenir un diagnostic et ainsi d'avoir accès à un meilleur suivi clinique. / Advances in sequencing technologies have played an important role in the molecular diagnosis of rare diseases, such as myopathies and muscular dystrophies. However, several patients with these neuromuscular diseases remain undiagnosed. This is due to the great clinical and genetic heterogeneity as well as the highly polymorphic nature of the genes associated with myopathies and muscular dystrophies. The interpretation of genetic data is a great challenge, and genetic testing often results in the identification of variants of uncertain significance (VUS). Many of these variants can disrupt normal RNA splicing or affect gene expression. In this regard, we propose a bioinformatics approach. Our aim is to put in place a pipeline that identifies and characterizes variants of interest in a pathological context. To determine whether the variants have a functional impact, our pipeline focuses on alternative splicing as well as the integration of genomic and transcriptomic data. We hypothesize that a portion of patients without a diagnosis for their neuromuscular disorder is explained by intronic variants having a regulatory role or affecting mRNA splicing and abundance. This multi-omics approach makes it possible to determine whether the variants have a functional impact. To do so, we performed RNA and DNA sequencing using muscle biopsies from four patients. Data were aligned and annotated with different pathogenicity scores. Splicing events are analyzed by SpliceAI and rMATS. 
The analysis of the differentially expressed genes was carried out by the LPEseq tool. CNVs and repeat expansion were analyzed with CNVkit and ExpansionHunter. Several in-house scripts filter and integrate RNA and DNA data. For now, a total of eight pathogenic variants are proposed for our patients, but further investigations are needed. The variants sought are rare and therefore require a coherent and efficient pipeline to facilitate their characterization. This project will have a significant benefit for patients by allowing them to obtain a diagnosis and thus have access to better clinical follow-up. Neuromusculaire Traits complexes Variants rares Bio-informatique Épissage alternatif RNA-Seq Exome-Seq WGS Maladies Mendélienne Neuromuscular Complex traits Rare variants Bioinformatics Alternative splicing Mendelian diseases
18	Recherche des facteurs génétiques contrôlant la réponse à l’infection par Mycobacterium tuberculosis et le développement d’une tuberculose maladie / Search for genetic factors controlling the response to infection by Mycobacterium tuberculosis and the development of clinical tuberculosis Jabot-Hanin, Fabienne 12 October 2017 La tuberculose, causée par Mycobacterium tuberculosis, connaît actuellement une résurgence inquiétante, et l’OMS estime à plus de 10 millions le nombre de nouveaux cas cliniques en 2015 avec environ 1,8 millions de décès dus à la maladie. Environ un tiers de la population mondiale est exposée à M.tuberculosis, et après exposition, la plupart des individus sont infectés par la mycobactérie. La grande majorité (~90%) des individus infectés ne présentera jamais de symptomatologie clinique. Parmi les 10% qui développent la maladie, environ la moitié le fera dans les deux années suivant l’infection, ce qui est en général considéré comme une forme primaire de tuberculose. Les autres patients présenteront leur maladie à distance de l’infection primaire (parfois plusieurs dizaines d’années plus tard) ; il s’agit des formes pulmonaires classiques de l’adulte. Chez l’homme, le rôle de certains facteurs génétiques a été maintenant démontré dans le développement d’une tuberculose active, à la fois la tuberculose pulmonaire de l’adulte et les formes plus disséminées de l’enfant, et aussi dans le contrôle de l’infection tuberculeuse. Cependant, la plus grande part de ces facteurs génétiques reste à identifier. Le premier objectif de ma thèse était d'identifier les facteurs génétiques de l'hôte modulant les phénotypes immunologiques de production d'Interféron gamma in vitro (IGRA) après exposition à M. tuberculosis dans un échantillon de 590 individus ayant été en contact avec un cas avéré de tuberculose dans le Val de Marne, en région parisienne. 
Puis, dans un second temps, de voir si les facteurs trouvés pouvaient être répliquées dans un échantillon familial d'Afrique du Sud, zone de très forte endémie tuberculeuse. Pour cela, j'ai tout d'abord réalisé des analyses de liaison génétique à l'échelle du génome entier sur plusieurs phénotypes quantitatifs d'IGRA. Celles-ci ont permis de mettre en évidence 2 loci majeurs (p < 10⁻⁴) répliqués en Afrique du Sud et liés à la production d'interféron gamma induite pour l’un par le bacille du BCG, et pour l’autre, par la part spécifique de l'antigène ESAT6 de M. tuberculosis (absent de la plupart des mycobactéries environnementales et du BCG), indépendamment de la capacité intrinsèque de réponse aux mycobactéries. La seconde étape a consisté en la réalisation d'une étude d'association sur les régions de liaison ainsi identifiées. Un variant associé au phénotype spécifique de l’ESAT6 (p < 10⁻⁵) a ainsi été trouvé, variant contribuant de manière significative au pic de liaison précédemment découvert (p < 0.001) et ayant été rapporté comme modulant l’expression du gène ZXDC. Le second objectif de la thèse concernait l’identification de variants génétiques rares sous-jacents à la déclaration d’une tuberculose pulmonaire chez les individus infectés par le bacille. A cette fin, j’ai comparé les exomes de 120 patients tuberculeux à ceux de 136 individus infectés par le bacille mais non malades, tous originaires du Maroc. Cette étude m’a permis d’identifier le gène BTNL2, en bordure de la région HLA, dans lequel près de 10% des patients comportaient un variant rare perte de fonction contrairement aux contrôles qui n’en présentaient aucun. / Tuberculosis remains a major public health concern, with approximately 10.4 million new cases and 1.8 million deaths due to the disease in 2015 according to WHO. 
While an estimated one third of the world's population is infected with Mycobacterium tuberculosis, only about 10% of infected individuals go on to develop clinical disease. Among them, half will develop the disease within the 2 years following infection, which is generally considered primary tuberculosis. The other patients will develop the disease long after primary infection, sometimes several decades later; these are the classical pulmonary forms in adults. In humans, the role of genetic factors has been demonstrated in the development of active tuberculosis, both in adult pulmonary forms and in disseminated childhood forms, and also in the control of M. tuberculosis infection. Nevertheless, most of these genetic factors remain to be identified. The first aim of my PhD was to identify genetic factors controlling in vitro interferon-gamma production phenotypes (IGRA) after exposure to M. tuberculosis in a sample of 590 subjects who had been in contact with a proven tuberculosis patient in Val-de-Marne, in the Paris suburbs, and then to try to replicate the findings in a South African familial sample, where tuberculosis is highly endemic. For this purpose, I first performed genome-wide genetic linkage analysis for several quantitative IGRA phenotypes. These analyses identified 2 major loci (p < 10⁻⁴), replicated in South Africa, linked to interferon-gamma production induced by live BCG for the first one and, for the second one, by the specific part of the ESAT6 antigen of M. tuberculosis (absent from most environmental mycobacteria and from BCG), independently of the intrinsic ability to respond to mycobacteria. The second step was an association study in the identified linkage regions. A variant associated with the specific ESAT6 phenotype was found (p < 10⁻⁵), which contributed significantly to the linkage peak (p < 0.001) and was previously reported as an eQTL of the ZXDC gene. 
The second objective of my PhD was the identification of rare genetic variants underlying the development of pulmonary tuberculosis in infected individuals. To this end, I compared exome data from 120 tuberculosis patients and 136 infected individuals without any clinical symptoms, all of them from Morocco. This study highlighted the BTNL2 gene, located very close to the HLA region, in which around 10% of patients carried a rare loss-of-function variant whereas none of the controls did. Maladie infectieuse Tuberculose Infection tuberculeuse Génétique humaine Épidémiologie génétique IGRA Interféron gamma Quantiféron Liaison génétique Analyse de liaison Étude d’association génétique Analyse d’exomes NGS Variants rares Burden Tests CAST ZXDC BTNL2 Tuberculosis LTBI Pulmonary tuberculosis Human genetics Genetic epidemiology IGRA Gamma interferon Quantiferon Genetic linkage Association analysis Exome analysis NGS Rare variants Burden Test CAST ZXDC BTNL2 579.374
19	Genetic Determinants of Rare Coding Variants on the Development of Early-Onset Coronary Artery Disease Lali, Ricky 11 1900 Background: Coronary Artery Disease (CAD) represents the leading cause of mortality and morbidity worldwide despite declines in the prevalence of environmental risk factors. This trend has drawn attention to the risk conferred by genetic variation. Twin and linkage studies demonstrate a profound hereditary risk for CAD, especially in young individuals. Rare genetic variants conferring high risk for extreme disease phenotypes can provide invaluable insight into novel mechanisms underlying CAD development. Methods: Whole exome sequencing was performed to characterize rare protein-altering variants in the 52 early-onset CAD (EOCAD) patients comprising the DECODE study. The enrichment of Mendelian dyslipidemias in EOCAD was assessed through interrogation of pathogenic mutations among known lipid genes. The identification of novel genetic CAD associations was conducted through case-only and case-control approaches across all protein-coding genes using rare variant burden and variance component tests. Lastly, beta coefficients for significant risk genes from the European population in the Early-onset Myocardial Infarction (EOMI) cohort (N=552) were used to construct calibrated, single-sample rare variant gene scores (RVGS) in DECODE Europeans (N=39) and a local European CAD-free cohort (N=77). Results: A 20-fold enrichment of familial hypercholesterolemia mutation carriers was detected in EOCAD cases compared to CAD-free controls (P=0.005). Association analysis using EOMI Europeans revealed exome-wide and nominal significance for two known CAD/MI genes: CELSR2 (P=1.1×10⁻¹⁷) and APOA5 (P=0.001). DECODE association revealed exome-wide and nominal significance for genes involved in endothelial integrity and immune cell activity. 
RVGS based upon beta coefficients of significant CAD/MI risk genes were significantly increased in DECODE (z-score=1.84; p=0.03) and non-significantly decreased among CAD-free individuals (z-score=-1.61; p=0.053). Conclusion: Rare variants play a pivotal role in the development of early CAD through Mendelian and polygenic mechanisms. Construction of RVGS that are calibrated against population and technical biases can facilitate discovery of single-sample and cohort-based associations beyond what is detectable using standard methods. / Thesis / Master of Science (MSc)
20	Medical relevance and functional consequences of protein truncating variants Rivas Cruz, Manuel A. January 2015 Genome-wide association studies have greatly improved our understanding of the contribution of common variants to the genetic architecture of complex traits. However, two major limitations have been highlighted. First, common variant associations typically do not identify the causal variant or the gene through which it influences a trait. Second, common variant associations usually consist of variants with small effects. As a consequence, it is more challenging to harness their translational impact. Association studies of rare variants and complex traits may be able to help address these limitations. Empirical population genetic data show that deleterious variants are rare. More specifically, there is a very strong depletion of common protein truncating variants (PTVs, commonly referred to as loss-of-function variants) in the genome, a group of variants that have been shown to have large effects on gene function and are enriched for severe disease-causing mutations, but in other instances may actually be protective against disease. This thesis is divided into three parts dedicated to the study of protein truncating variants, their medical relevance, and their functional consequences. First, I present statistical, bioinformatic, and computational methods developed for the study of protein truncating variants, their association with complex traits, and their functional consequences. Second, I present the application of these methods to a number of case-control and quantitative trait studies, discovering new variants and genes associated with breast and ovarian cancer, type 1 diabetes, lipids, and metabolic traits measured with NMR spectroscopy. Third, I present work on improving annotation of protein truncating variants by studying their functional consequences. 
Taken together, these results highlight the utility of interrogating protein truncating variants in medical and functional genomic studies. 572

Search results