71 |
Ewastools: Infinium Human Methylation BeadChip pipeline for population epigenetics integrated into Galaxy
Murat, Katarzyna, Grüning, B., Poterlowicz, P.W., Westgate, Gillian E., Tobin, Desmond J., Poterlowicz, Krzysztof, 26 June 2020
Background
Infinium Human Methylation BeadChip is an array platform for comprehensive evaluation of DNA methylation at individual CpG loci in the human genome, based on Illumina’s bead technology, and is one of the most common techniques used in epigenome-wide association studies. Finding associations between epigenetic variation and phenotype is a significant challenge in biomedical research. The newest version, HumanMethylationEPIC, quantifies the DNA methylation level of 850,000 CpG sites, while the previous versions, HumanMethylation450 and HumanMethylation27, measured >450,000 and 27,000 loci, respectively. Although a number of bioinformatics tools have been developed to analyse this assay, they require some programming skills and experience to use.
Results
We have developed a Galaxy pipeline for DNA methylation analysis with the Infinium Human Methylation BeadChip, aimed at users without programming experience. Our tool is integrated into Galaxy (http://galaxyproject.org), a web-based platform, which allows users to analyse data from the Infinium Human Methylation BeadChip with minimal effort.
Conclusions
The pipeline provides a set of integrated analytical methods wrapped in an easy-to-use interface. Our tool is available from the Galaxy ToolShed, from a GitHub repository, and as a Docker image. The aim of this project is to make Infinium Human Methylation BeadChip analysis more flexible and accessible to everyone. Research Development Fund Publication Prize Award winner, May 2020.
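The abstract notes that the array quantifies the DNA methylation level at each CpG site. As a minimal sketch, assuming the conventional Illumina definitions (a beta value with an offset of 100, and an M-value with an offset of 1), per-probe summaries can be computed from the methylated (M) and unmethylated (U) signal intensities as below. The function names and intensities are hypothetical, and this is not necessarily how the Ewastools pipeline computes them.

```python
import math

def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Fraction methylated, M / (M + U + offset), in [0, 1).

    The offset (assumed 100 here) stabilises the ratio when both signals are low.
    """
    return meth / (meth + unmeth + offset)

def m_value(meth: float, unmeth: float, alpha: float = 1.0) -> float:
    """Log2 ratio of methylated to unmethylated signal; alpha (assumed 1) avoids log(0)."""
    return math.log2((meth + alpha) / (unmeth + alpha))

# Hypothetical intensities for three probes: (methylated, unmethylated)
probes = [(900.0, 0.0), (300.0, 600.0), (5000.0, 5000.0)]
for m, u in probes:
    print(f"beta={beta_value(m, u):.3f}  M={m_value(m, u):.3f}")
```

Beta values are easier to read as a methylation proportion, whereas M-values are often preferred for statistical testing because their variance is more homogeneous across the methylation range.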
|
72 |
Enrichment of inflammatory bowel disease and colorectal cancer risk variants in colon expression quantitative trait loci
Hulur, Imge, Gamazon, Eric R., Skol, Andrew D., Xicola, Rosa M., Llor, Xavier, Onel, Kenan, Ellis, Nathan A., Kupfer, Sonia S., January 2015
BACKGROUND: Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) associated with diseases of the colon including inflammatory bowel diseases (IBD) and colorectal cancer (CRC). However, the functional role of many of these SNPs is largely unknown and tissue-specific resources are lacking. Expression quantitative trait loci (eQTL) mapping identifies target genes of disease-associated SNPs. This study provides a comprehensive eQTL map of distal colonic samples obtained from 40 healthy African Americans and demonstrates their relevance for GWAS of colonic diseases. RESULTS: 8.4 million imputed SNPs were tested for their associations with 16,252 expression probes representing 12,363 unique genes. 1,941 significant cis-eQTL, corresponding to 122 independent signals, were identified at a false discovery rate (FDR) of 0.01. Overall, among colon cis-eQTL, there was significant enrichment for GWAS variants for IBD (Crohn's disease [CD] and ulcerative colitis [UC]) and CRC as well as type 2 diabetes and body mass index. ERAP2, ADCY3, INPP5E, UBA7, SFMBT1, NXPE1 and REXO2 were identified as target genes for IBD-associated variants. The CRC-associated eQTL rs3802842 was associated with the expression of C11orf93 (COLCA2). Enrichment of colon eQTL near transcription start sites and for active histone marks was demonstrated, and eQTL with high population differentiation were identified. CONCLUSIONS: Through the comprehensive study of eQTL in the human colon, this study identified novel target genes for IBD- and CRC-associated genetic variants. Moreover, bioinformatic characterization of colon eQTL provides a tissue-specific tool to improve understanding of biological differences in diseases between different ethnic groups.
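The results report cis-eQTL identified at a false discovery rate of 0.01 but do not say which FDR procedure was used. As a minimal sketch, assuming the Benjamini-Hochberg step-up procedure (eQTL studies also commonly use permutation-based FDR or q-values), each tested SNP-probe pair contributes one p-value and significance is decided jointly. The function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.01):
    """Step-up Benjamini-Hochberg procedure.

    Returns a list of booleans, True where the test is declared significant
    while controlling the false discovery rate at `alpha`.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * alpha
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is rejected, even if its own
    # p-value exceeded its individual threshold (the "step-up" property).
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant

# Hypothetical p-values for four SNP-probe pairs
print(benjamini_hochberg([0.0001, 0.004, 0.5, 0.9]))  # [True, True, False, False]
```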
|
73 |
On genetic variants underlying common disease
Hechter, Eliana, January 2011
Genome-wide association studies (GWAS) exploit the correlation in genetic diversity along chromosomes in order to detect effects on disease risk without having to type causal loci directly. The inevitable downside of this approach is that, when the correlation between the marker and the causal variant is imperfect, the risk associated with carrying the predisposing allele is diluted and its effect is underestimated. This thesis explores four different facets of this risk dilution: (1) estimating true effect sizes from those observed in GWAS; (2) asking how the context of a GWAS, including the population studied, the genotyping chip employed, and the use of imputation, affects risk estimates; (3) assessing how often the best-associated SNP in a GWAS coincides with the causal variant; and (4) quantifying how departures from the simplest disease risk model at a causal variant distort the observed disease risk model. Using simulations, in which the true risk at the causal locus is known, we show that the correlation between the marker and the causal variant is the primary driver of effect size underestimation. The extent of the underestimation depends on a number of factors, including the population in which the study is conducted, the genotyping chip employed, whether imputation is used, and the strength, frequency, and disease model of the risk allele. Suppose that a GWAS is conducted in a European population with an Affymetrix 6.0 genotyping chip, without imputation, and that the causal loci have a modest effect on disease risk, are common in the population, and follow an additive disease risk model.
In such a study, we show that the risk estimated from the most associated SNP is very close to the truth approximately two-thirds of the time (although we predict that fine mapping of GWAS loci will infrequently identify causal variants with considerably higher risk), and that the best-associated variant is very often perfectly or nearly perfectly correlated with, and almost always within 0.1 cM of, the causal variant. However, because of the strong correlations among nearby loci, the causal and best-associated variants coincide infrequently, less than one-fifth of the time, even if the causal variant is genotyped. We explore how these results change quantitatively depending on the parameters of the GWAS. Additionally, we demonstrate that we expect to identify substantial deviations from the additive disease risk model among loci where association is detected, even though power to detect departures from the model drops off very quickly as the correlation between the marker and causal loci decreases. Finally, we discuss the implications of our results for the design and interpretation of future GWAS.
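The dilution described in this abstract can be illustrated with a simplified worked derivation. It assumes a quantitative trait and a linear model with standardized genotypes; this is an assumption, since the thesis studies disease risk, where the attenuation of odds ratios does not reduce to exactly this formula but is likewise toward the null. Let $x_c$ be the causal genotype, $x_m$ the marker, $r = \operatorname{Corr}(x_m, x_c)$, and $\varepsilon$ noise independent of both genotypes:

```latex
\begin{aligned}
y &= \beta x_c + \varepsilon, \qquad \operatorname{Var}(x_c) = \operatorname{Var}(x_m) = 1,\\
\beta_m &= \frac{\operatorname{Cov}(y, x_m)}{\operatorname{Var}(x_m)}
         = \beta\,\operatorname{Cov}(x_c, x_m) = r\beta,\\
R^2_m &= \frac{\beta_m^2}{\operatorname{Var}(y)}
       = r^2\,\frac{\beta^2}{\operatorname{Var}(y)} = r^2 R^2_c .
\end{aligned}
```

The marker's estimated effect is shrunk by a factor $r$ and the variance it explains by $r^2$, so roughly $1/r^2$ times as many samples are needed to reach the same power as typing the causal variant directly.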
|
74 |
Imputation aided analysis of the association between autoimmune diseases and the MHC
Moutsianas, Loukas, January 2011
The Major Histocompatibility Complex (MHC) is a genomic region on chromosome 6 that has been consistently found to be associated with the risk of developing virtually all common autoimmune diseases. Although its importance in disease pathogenesis has been known for decades, efforts to disentangle the roles of the classical human leukocyte antigens (HLA) and other variants responsible for disease susceptibility have often met with limited success, owing to the complex structure and extreme heterogeneity of the region. In this thesis, I interrogate the MHC for association with three common autoimmune diseases, ankylosing spondylitis, psoriasis and multiple sclerosis, with the aim of confirming previously reported associations and identifying novel ones. To do so, I employ a systematic joint analysis of single nucleotide polymorphism (SNP) and HLA allele data in a logistic regression framework, using a recently developed algorithm to predict HLA alleles for samples where such information is unavailable. To ensure the reliability of the analysis, I apply stringent quality control procedures and integrate over the uncertainty of the HLA allele predictions. Moreover, I resolve the haplotype phase of individuals from the HapMap project to create reliable reference panels, used both in HLA prediction and in quality control. Directly testing HLA subtypes for association with disease increases the power to detect such associations. I present the results of the analysis for the three disease phenotypes and discuss the evidence for important novel findings among both SNPs and HLA alleles in two of the diseases. In the final part of this thesis, I introduce a novel model-based approach to detect inconsistencies in the data and show how it can be used to flag problematic SNPs that conventional quality control procedures may fail to identify.
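The abstract describes testing HLA alleles in a logistic regression while integrating over the uncertainty of predicted HLA alleles. One common way to do this, assumed here and not necessarily the thesis's exact approach, is to replace the unknown allele count with its posterior expected dosage and regress case-control status on it. The sketch below fits a single-predictor logistic regression by Newton-Raphson. The names `expected_dosage` and `fit_logistic` and the data are hypothetical, and a real analysis would typically add covariates (for example, ancestry principal components).

```python
import math

def expected_dosage(posterior):
    """Expected copies of an HLA allele given (P(0), P(1), P(2)) from a prediction model."""
    _p0, p1, p2 = posterior
    return p1 + 2.0 * p2

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1 * x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 Newton system by Cramer's rule
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical posteriors (P0, P1, P2) for ten individuals and their status (1 = case)
posteriors = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
              (0.1, 0.8, 0.1), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
              (0.0, 0.1, 0.9), (0.0, 0.0, 1.0)]
status = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
dosages = [expected_dosage(p) for p in posteriors]
b0, b1 = fit_logistic(dosages, status)
print(f"log odds ratio per allele copy: {b1:.3f}")
```

Using the expected dosage keeps uncertain predictions in the analysis instead of discarding them or forcing a hard allele call.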
|
75 |
Troost: Search for interactions among trios of SNPs in genome-wide association studies
Azevedo Neto, José Osório de Oliveira, 7 November 2013
Genome-wide association studies have found some markers associated with diseases with complex inheritance. However, these markers often explain only a small fraction of the previously estimated heritability of the trait. This relative failure has been attributed, among other causes, to epistasis, i.e. the interaction among genotypes at different loci. The search for epistasis is complex and computationally intensive. Many methods have been proposed to approach this problem, including traditional statistics, stochastic search, and heuristic methods. Few of them can process the large amount of data produced in genome-wide case-control studies, and even fewer search for sets of three or more markers. Exhaustive search for sets of three or more interacting markers is currently infeasible, but the BOOST algorithm (WAN et al., 2010) showed that the search is relatively easy for pairs of SNPs, in particular when graphics cards are used for general-purpose processing (GPGPU). Building on this recent success, we propose a multi-phase algorithm for finding trios of interacting loci that uses the pairwise search as its initial step, an approach that, to our knowledge, has not been tried before. Another key idea of our algorithm is the extension of the concept of a trio of markers to a trio of haplotype blocks, where each block is formed by neighboring markers. Using data from the WTCCC, the Troost (from TRio+bOOST) algorithm suggested potentially epistatic trios in all seven diseases. When submitted to confirmation in an independent sample, the trios could not be confirmed, except for type 1 diabetes (T1D). Two hundred and eight trios were confirmed for T1D, with low p-values and combined risk genotypes with high odds ratios. The SNPs that form these trios are all in the MHC region, which is known to be strongly associated with T1D, except for one SNP on chromosome five that had not previously been associated with T1D.
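The phased idea (an exhaustive pair scan first, then extension of only the best pairs to trios) can be sketched as below. This is a simplified illustration under stated assumptions, not the Troost implementation: it scores each SNP combination with a plain Pearson chi-square on case/control counts of combined genotypes, which measures joint association rather than pure interaction (BOOST itself uses a log-linear interaction test), and it works on individual SNPs rather than haplotype blocks. The function names and the `top_pairs` and `top_trios` parameters are hypothetical.

```python
from itertools import combinations

def chi2_stat(table):
    """Pearson chi-square for a contingency table given as a list of rows."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / n
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    return stat

def combined_genotype_table(genotypes, status, snps):
    """Rows = observed combined genotypes at `snps`; columns = [controls, cases]."""
    counts = {}
    for g, s in zip(genotypes, status):
        key = tuple(g[k] for k in snps)
        counts.setdefault(key, [0, 0])[s] += 1
    return list(counts.values())

def phased_trio_search(genotypes, status, top_pairs=10, top_trios=5):
    n_snps = len(genotypes[0])
    # Phase 1: exhaustive scan of all SNP pairs
    pair_scores = [(chi2_stat(combined_genotype_table(genotypes, status, pair)), pair)
                   for pair in combinations(range(n_snps), 2)]
    pair_scores.sort(key=lambda t: t[0], reverse=True)
    seeds = [pair for _, pair in pair_scores[:top_pairs]]
    # Phase 2: extend each retained pair with every other SNP
    trio_scores = {}
    for a, b in seeds:
        for c in range(n_snps):
            if c in (a, b):
                continue
            trio = tuple(sorted((a, b, c)))
            if trio not in trio_scores:
                trio_scores[trio] = chi2_stat(
                    combined_genotype_table(genotypes, status, trio))
    ranked = sorted(trio_scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_trios]

# Hypothetical toy data: 27 individuals, 5 SNPs; status depends jointly on SNPs 0, 1, 2
genotypes = [[a, b, c, 0, 0] for a in range(3) for b in range(3) for c in range(3)]
status = [1 if g[0] == g[1] == g[2] else 0 for g in genotypes]
print(phased_trio_search(genotypes, status, top_pairs=3, top_trios=1))
```

The pair scan costs on the order of $n^2$ tests, and phase 2 adds only about `top_pairs` times $n$ trio tests instead of the roughly $n^3/6$ an exhaustive trio search would need.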
|
76 |
Architecture of human complex trait variation
Xin, Xiachi, January 2018
A complex trait is a trait or disease that is controlled by both genetic and environmental factors, along with their interactions. Trait architecture encompasses the genetic variants and environmental causes of variation in the trait or disease, their effects on the trait or disease, and the mechanisms by which these factors interact at the molecular and organism levels. Understanding trait architecture is important from both a biological and a health perspective. In this thesis, I focused on the influence of familial environmental factors on complex trait architecture alongside the genetic components. I performed a variety of studies to explore the architecture of anthropometric and cardio-metabolic traits, such as height, body mass index, blood high-density lipoprotein content and blood pressure, using a cohort of 20,000 individuals of recent Scottish descent with phenotype measurements, Single Nucleotide Polymorphism (SNP) data and genealogical information. I extended a variance component method to simultaneously estimate SNP-associated heritability and total heritability while accounting for familial environmental effects shared among siblings, couples and nuclear family members. I found that most of the missing heritability could be explained by including closely related individuals in the analysis and accounting for these close relationships, and that, on top of genetics, couple and sibling environmental effects are additional significant contributors to the variation of the complex traits investigated. Subsequently, I accounted for couple and sibling environmental effects in Genome-Wide Association Study (GWAS) and prediction models. Adding couple and sibling information improved both GWAS performance and prediction accuracy for most traits investigated, especially for obesity-related traits.
Since couple environmental effects as modelled in my study might in fact reflect the combined effect of assortative mating and shared couple environment, I further explored how to dissect couple effects according to their origin. I extended assortative mating theory by deriving the expected resemblance between an individual and the in-laws of their first-degree relatives. Using this expected resemblance, I developed a novel pedigree study that jointly estimates heritability and the degree of assortative mating. I have shown in this thesis that, for anthropometric and cardio-metabolic traits, environmental factors shared by siblings and couples seem to have important effects on trait variation, and that appropriate modelling of such effects may improve the outcome of genetic analyses and our understanding of the causes of trait variation. My thesis also points out that future studies exploring trait architecture should not be limited to genetics, because environment, as well as mate choice, might be a major contributor to trait variation, although trait architecture varies from trait to trait.
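A variance component model of the kind described here, separating SNP-associated and pedigree-associated genetic effects from environments shared by couples, siblings and nuclear families, might take the following general form. This is a sketch under assumptions: the symbols are illustrative and the exact matrix definitions used in the thesis may differ. $\mathbf{G}$ is a genomic relationship matrix computed from SNPs, $\mathbf{K}$ a kinship matrix restricted to close relatives (capturing genetic effects not tagged by the SNPs), and $\mathbf{C}$, $\mathbf{S}$, $\mathbf{F}$ are indicator matrices whose entry is 1 when two individuals are a couple, siblings, or members of the same nuclear family, respectively (and 1 on the diagonal):

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{g} + \mathbf{k} + \mathbf{c} + \mathbf{s} + \mathbf{f} + \mathbf{e},\\
\operatorname{Var}(\mathbf{y}) &= \sigma_g^2\mathbf{G} + \sigma_k^2\mathbf{K} + \sigma_c^2\mathbf{C}
  + \sigma_s^2\mathbf{S} + \sigma_f^2\mathbf{F} + \sigma_e^2\mathbf{I},\\
\sigma_P^2 &= \sigma_g^2 + \sigma_k^2 + \sigma_c^2 + \sigma_s^2 + \sigma_f^2 + \sigma_e^2,\qquad
h^2_{\mathrm{SNP}} = \frac{\sigma_g^2}{\sigma_P^2},\qquad
h^2 = \frac{\sigma_g^2 + \sigma_k^2}{\sigma_P^2}.
\end{aligned}
```

Estimating all variance components jointly (for example, by REML) lets the couple and sibling terms absorb resemblance that a genetics-only model would otherwise attribute to heritability.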
|
78 |
Association study of pigmentation genes with skin, hair and eye color for forensic phenotyping purposes in a Brazilian sample
Lima, Felícia de Araujo, 4 May 2017
Human pigmentation is a variable and complex trait determined by genetic and hormonal factors, exposure to ultraviolet radiation, age, and disease, among others. Some polymorphisms in pigmentation genes have been associated with the phenotypic diversity of skin, hair and eye color in homogeneous populations. Forensic DNA Phenotyping (FDP) is benefiting forensic science in several countries and assisting criminal investigations, owing to its ability to suggest, with good accuracy, the likely phenotypes for externally visible characteristics (EVCs) of samples of unknown origin. Herein, we evaluated the associations of SNPs in the genes SLC24A5 (rs1426654; rs16960620; rs2555364), TYR (rs1126809) and ASIP (rs6058017) with skin, hair and eye color in individuals from the Brazilian population, in order to assess the possible use of these markers in forensic practice in admixed populations. The volunteers answered a questionnaire in which they self-reported these characteristics, and these data were used to compare genotypes and phenotypes. The results showed that for SNPs rs2555364 and rs1426654, the ancestral allele was associated with black skin color, brown or black hair and brown eyes. In addition, the ancestral allele of SNP rs6058017 was significantly associated with black skin color and brown eyes. Conversely, the variant alleles of these SNPs were correlated with light pigmentation for the evaluated EVCs, corroborating previous studies in different populations. These results show that molecular information may be useful for inferring EVCs, and that FDP is an important tool for forensic studies in a Brazilian sample.
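Genotype-phenotype comparisons of this kind can be illustrated with a standard allelic 2x2 test. As a minimal sketch, assuming individuals are split into two self-reported phenotype groups (for example, black versus light skin) and ancestral and variant alleles are counted in each group, the odds ratio and Pearson chi-square are computed below. The function name and counts are hypothetical, and the abstract does not specify which test the study used.

```python
def allelic_association(a: int, b: int, c: int, d: int):
    """2x2 allelic test.

    a, b: ancestral / variant allele counts in phenotype group 1
    c, d: ancestral / variant allele counts in phenotype group 2
    Returns (odds_ratio, chi_square); chi-square has 1 degree of freedom and
    no continuity correction. Assumes no zero cells (b * c != 0).
    """
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return odds_ratio, chi_square

# Hypothetical allele counts: group 1 = self-reported black skin, group 2 = light skin
or_, chi2 = allelic_association(30, 10, 20, 40)
print(f"OR = {or_:.2f}, chi-square = {chi2:.2f}")  # OR = 6.00, chi-square = 16.67
```

An odds ratio above 1 here means the ancestral allele is relatively more frequent in group 1, which is the direction the abstract reports for the ancestral alleles and darker pigmentation.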
|
79 |
Genetic and ecophysiological analyses of tolerance to drought and high temperature in bread wheat (Triticum aestivum L.)
Touzy, Gaëtan, 7 May 2019
In a context of climate change, characterizing wheat varieties in response to drought and heat stress events is one of the major challenges of agriculture. This PhD thesis, resulting from a public-private partnership between Arvalis ‘Institut du Végétal’, Biogemma and INRA (“Institut National de la Recherche Agronomique”), aimed to provide the knowledge and tools needed to identify drought- or heat-tolerant varieties and to breed varieties that meet these requirements. Analyses were conducted on a panel of 220 commercial varieties, genotyped with 280K SNPs and tested in 35 environments (combinations of year, location and water regime), plus an experiment under controlled conditions in which heat stress was applied during grain filling. The complexity of studying drought and heat tolerance led us to structure this thesis by first treating water stress and heat stress separately, and then exploring a multi-stress analysis method. We showed that, although breeding has improved the performance of varieties under optimal water conditions, genetic progress must be accelerated and better distributed across different stress scenarios. We propose several genetic determinants that could enable genetic gain in stressful environments. Our results and methods are discussed in view of the needs for varietal recommendation and improvement. Additional research directions and methodological improvements are also suggested.
|
80 |
Characterization of Gene Interaction and Assessment of LD Matrix Measures for the Analysis of Biological Pathway Association
Crosslin, David Russell, January 2009
Leukotrienes are arachidonic acid derivatives long known for their inflammatory properties and their involvement in a number of human diseases, most notably asthma. Recently, leukotriene-based inflammation has also been implicated in atherosclerosis: ALOX5AP and LTA4H, two genes in the leukotriene biosynthesis pathway, have been associated with various cardiovascular disease (CVD) phenotypes. To assess the role of the leukotriene pathway in CVD pathogenesis, we performed genetic association studies of ALOX5AP and LTA4H in a non-familial data set of early-onset coronary artery disease. Our results support a modest role for the leukotriene pathway in atherosclerosis pathogenesis, reveal important genomic interactions within the pathway, and suggest the importance of pathway-based modeling for evaluating the genomics of atherosclerosis susceptibility. Motivated by this need, we investigated the statistical properties of a class of matrix-based statistics for assessing epistasis. We simulated multiple two-variant disease models with haplotypes to understand pathway interactions in terms of correlation patterns. Our goal was to detect an interaction between multiple disease-causing variants by means of their linkage disequilibrium (LD) patterns with other haplotype markers. The simulated models fall into three categories: (1) no epistasis in the presence of marginal effects and LD; (2) epistasis in the presence of LD and no marginal effects; and (3) epistasis in the presence of marginal effects and LD. We then assessed previously introduced single-gene methods that compare whole matrices of single nucleotide polymorphism (SNP) LD between two samples. These methods include comparing two sets of principal components, a sum of squared differences comparing pairwise LD, and a contrast test that controls for background LD. We also considered a partial least squares (PLS) approach for modeling gene-gene interactions.
Our results indicate that these measures can be used to assess epistasis as well as marginal effects under certain disease models. Understanding and quantifying whole-gene variation and its association with disease using multiple SNPs remains a difficult task. Providing a single statistical measure per gene will facilitate combining multiple types of genomic data at the gene level and will serve as an alternative approach to assess epistasis in genome-wide association studies. The matrix-based measures can also be used in pathway ascertainment tools that require gene-level scores. / Dissertation
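The sum-of-squared-differences statistic mentioned in this abstract compares pairwise LD between two samples, such as cases and controls. As a minimal sketch, it uses genotype-based (composite) $r^2$ computed from 0/1/2 allele counts as a stand-in for haplotype $r^2$, which would require phased data; that substitution is an assumption, as are the function names. Significance could be assessed, for example, by permuting case/control labels and recomputing the statistic.

```python
import math

def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def ld_matrix(genotypes):
    """Pairwise r^2 between SNP columns of an individuals x SNPs matrix of 0/1/2 counts."""
    columns = list(zip(*genotypes))
    m = len(columns)
    return [[pearson_r(columns[i], columns[j]) ** 2 for j in range(m)] for i in range(m)]

def ld_ssd(cases, controls):
    """Sum of squared differences between case and control r^2 over distinct SNP pairs."""
    r2_cases, r2_controls = ld_matrix(cases), ld_matrix(controls)
    m = len(r2_cases)
    return sum((r2_cases[i][j] - r2_controls[i][j]) ** 2
               for i in range(m) for j in range(i + 1, m))

cases = [[0, 0], [1, 1], [2, 2]]      # hypothetical: SNPs perfectly correlated in cases
controls = [[0, 1], [1, 1], [2, 1]]   # hypothetical: second SNP monomorphic in controls
print(ld_ssd(cases, controls))        # 1.0
```

Summarizing a whole gene's LD structure in one number is what makes such a statistic usable as the gene-level score the abstract describes.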
|