31 |
An exploration of BMSF algorithm in genome-wide association mapping
Jiang, Dayou. January 1900
Master of Science / Department of Statistics / Haiyan Wang / Motivation: Genome-wide association studies (GWAS) provide an important avenue for investigating many common genetic variants in different individuals to see whether any variant is associated with a trait. GWAS is a valuable tool for identifying genetic factors that influence health and disease. However, the high dimensionality of the gene expression dataset makes GWAS challenging. Although many promising machine learning methods, such as the Support Vector Machine (SVM), have been investigated in GWAS, the question of how to improve the accuracy of the results has drawn increasing attention from researchers. Many studies did not apply feature selection to select a parsimonious set of relevant genes, and those that did perform gene selection often failed to consider possible interactions among genes. Here we modify BMSF, a gene selection algorithm originally developed by Zhang et al. (2012) to improve the accuracy of cancer classification with binary responses. This report provides a continuous-response version of the BMSF algorithm so that it can be applied to gene selection for continuous gene expression datasets. The algorithm dramatically reduces the dimension of the gene markers under consideration, thus increasing the efficiency and accuracy of GWAS.
Results: We applied the continuous-response version of BMSF to the wheat phenotypes dataset to predict two quantitative traits from the genotype marker data. This wheat dataset was previously studied for the same purpose in Long et al. (2009), who applied SVM regression methods directly. By applying our gene selection method, we filtered out a large portion of less relevant genes and, by building an SVM regression model on the training data using only the selected genes, achieved a better prediction result on the test data. We also applied our algorithm to simulated datasets that were generated following the setting of an example in Fan et al. (2011). The continuous-response version of BMSF showed good ability to identify active variables hidden among high-dimensional irrelevant variables. Compared with the smoothing-based methods in Fan et al. (2011), our method has the advantage of avoiding the ambiguity that arises from different choices of the smoothing parameter.
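The idea of recovering a few active markers hidden among many irrelevant ones can be illustrated with a minimal sketch. This is not the BMSF algorithm, whose details the abstract does not give; it swaps in plain marginal correlation ranking as a stand-in. The function names, the simulated genotype coding (0/1/2), the effect size, and the noise level are all assumptions made for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return sxy / (sxx * syy) ** 0.5


def screen_markers(columns, response, keep):
    """Return the indices of the `keep` markers most correlated with the response.

    Stand-in for a real selection method: ranks markers one at a time and
    therefore ignores interactions among markers.
    """
    scores = [abs(pearson(col, response)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


def simulate(n_samples=200, n_markers=50, active=(3, 17), seed=0):
    """Hypothetical data: genotypes coded 0/1/2, two active markers, Gaussian noise."""
    rng = random.Random(seed)
    columns = [[rng.choice((0, 1, 2)) for _ in range(n_samples)]
               for _ in range(n_markers)]
    response = [sum(2.0 * columns[j][i] for j in active) + rng.gauss(0, 0.5)
                for i in range(n_samples)]
    return columns, response


columns, response = simulate()
selected = screen_markers(columns, response, keep=2)
```

In a workflow like the one described, a regression model would then be fitted on the training data using only the `selected` columns.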
|
32 |
Strategies to improve results from genomic analyzes in small dairy cattle populations / Estratégias para aprimorar os resultados de análises genômicas em pequenas populações de gado de leite
Perez, Bruno da Costa. 12 February 2019
The main objective of this thesis was to propose a procedure to optimize the value of genotypic information in small dairy cattle populations and to investigate how including genotypes and phenotypes of cows chosen by different strategies affects the performance of genome-wide association studies and genomic selection. The first study was designed to propose innovative methods, based on graph theory, that could support alternative inference about population structure in livestock populations. It reviews general aspects of graphs and how each element relates to theoretical and practical concepts of traditional pedigree structure studies. This chapter also presents a computational application (PedWorks) built in the Python 2.7 programming language. It demonstrates that graph theory is a suitable framework for modeling pedigree data. The second study aimed to assess how graph community detection algorithms could help unravel population partitions. This new concept was used to develop a method for establishing new (community-based) cow genotyping strategies. The results showed that accounting for population structure, by using community detection to choose cows for inclusion in the reference population, may improve results from genomic selection. The methods presented are easily applied to animal breeding programs. The third study aimed to observe the impact of different genotyping strategies (including the proposed community-based one) on the ability to detect quantitative trait loci in genome-wide association studies. Distinct models for genomic analysis were also tested. The results showed that including cows with extreme phenotypic observations, sampled proportionally from communities, can improve the ability to detect quantitative trait loci in genomic evaluations.
The last chapter was designed to study possible deleterious effects of preferential treatment (at different levels) on the accuracy and bias of genomic selection in a small dairy cattle population. Different proportions of cows with artificially increased phenotypic observations were included in the reference population. The results suggest that both accuracy and bias are affected by the presence of preferential treatment of cows in the evaluated population. Preferential treatment is expected to have a much greater effect on the performance of genomic selection in small than in large dairy cattle populations, because information from cows carries proportionally more weight in such reduced-size breeds.
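The use of a graph to model pedigree data, followed by sampling cows proportionally from groups of related animals, can be sketched as follows. This is not PedWorks and not the thesis's community detection method: connected components are used here as a crude stand-in for communities, and the record layout `(animal, sire, dam)`, the function names, and the example animals are all hypothetical.

```python
def build_pedigree_graph(records):
    """Build an undirected graph linking each animal to its known parents.

    records: iterable of (animal, sire, dam); use None for an unknown parent.
    """
    graph = {}
    for animal, sire, dam in records:
        graph.setdefault(animal, set())
        for parent in (sire, dam):
            if parent is not None:
                graph.setdefault(parent, set()).add(animal)
                graph[animal].add(parent)
    return graph


def connected_groups(graph):
    """Split the graph into connected components (a simple proxy for communities)."""
    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


def proportional_quota(groups, total):
    """Number of animals to genotype per group, proportional to group size.

    Rounding means the quotas may not sum exactly to `total` in general.
    """
    n_animals = sum(len(g) for g in groups)
    return [round(total * len(g) / n_animals) for g in groups]


# Hypothetical pedigree: two unrelated families.
records = [("c1", "s1", "d1"), ("c2", "s1", "d2"), ("c3", "s2", "d3")]
groups = connected_groups(build_pedigree_graph(records))
quota = proportional_quota(groups, total=8)
```

A real community detection algorithm would also split densely connected pedigrees that form a single component, which this sketch cannot do.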
|
33 |
Application of genomic technologies to the horse
Corbin, Laura Jayne. January 2013
The publication of a draft equine genome sequence and the release by Illumina of a 50,000 marker single-nucleotide polymorphism (SNP) genotyping chip has provided equine researchers with the opportunity to use new approaches to study the relationships between genotype and phenotype. In particular, it is hoped that the use of high-density markers applied to population samples will enable progress to be made with regard to more complex diseases. The first objective of this thesis is to explore the potential for the equine SNP chip to enable such studies to be performed in the horse. The second objective is to investigate the genetic background of osteochondrosis (OC) in the horse. These objectives have been tackled using 348 Thoroughbreds from the US, divided into cases and controls, and a further 836 UK Thoroughbreds, the majority with no phenotype data. All horses had been genotyped with the Illumina Equine SNP50 BeadChip. Linkage disequilibrium (LD) is the non-random association of alleles at neighbouring loci. The reliance of many genomic methodologies on LD between neutral markers and causal variants makes it an important characteristic of genome structure. In this thesis, the genomic data has been used to study the extent of LD in the Thoroughbred and the results considered in terms of genome coverage. Results suggest that the SNP chip offers good coverage of the genome. Published theoretical relationships between LD and historical effective population size (Ne) were exploited to enable accuracy predictions for genome-wide evaluation (GWE) to be made. A subsequent in-depth exploration of this theory cast some doubt on the reliability of this approach in the estimation of Ne, but the general conclusion that the Thoroughbred population has a small Ne, which should enable GWE to be carried out efficiently in this population, remains valid.
In the course of these studies, possible errors embedded within the current sequence assembly were identified using empirical approaches. Osteochondrosis is a developmental orthopaedic disease which affects the joints of young horses. Osteochondrosis is considered multifactorial in origin with a variety of environmental factors and heredity having been implicated. In this thesis, a genome-wide association study was carried out to identify quantitative trait loci (QTL) associated with OC. A single SNP was found to be significantly associated with OC. The low heritability of OC combined with the apparent lack of major QTL suggests GWE as an alternative approach to tackle this disease. A GWE analysis was carried out on the same dataset but the resulting genomic breeding values had no predictive ability for OC status. This, combined with the small number of significant QTL, indicates a lack of power which could be addressed in the future by increasing sample size. An alternative to genotyping more horses for the 50K SNP chip would be to use a low-density SNP panel and impute remaining genotypes. The final chapter of this thesis examines the feasibility of this approach in the Thoroughbred. Results suggest that genotyping only a subset of samples at high density and the remainder at lower density could be an effective strategy to enable greater progress to be made in the arena of equine genomics. Finally, this thesis provides an outlook on the future for genomics in the horse.
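The link between LD and effective population size mentioned in this abstract can be illustrated with a small sketch. It assumes the classical approximation E[r^2] = 1 / (1 + 4 * Ne * c), where c is the recombination distance in Morgans; the abstract does not say which published relationship the thesis used, so this choice, the function names, and the toy haplotypes are assumptions. No correction for finite sample size is applied.

```python
def r_squared(haplotypes):
    """LD statistic r^2 between two biallelic loci.

    haplotypes: list of (a, b) pairs with alleles coded 0 or 1 at each locus.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


def expected_r2(ne, c):
    """Expected r^2 under the assumed approximation 1 / (1 + 4 * Ne * c)."""
    return 1.0 / (1.0 + 4.0 * ne * c)


def ne_from_r2(r2, c):
    """Invert the same approximation to estimate Ne from an observed mean r^2."""
    return (1.0 / r2 - 1.0) / (4.0 * c)


# Toy example: two loci in complete LD.
complete_ld = [(1, 1)] * 5 + [(0, 0)] * 5
ld = r_squared(complete_ld)
```

Because the estimate of Ne depends strongly on the assumed recombination distance and on sampling noise in r^2, results from a sketch like this should be read with the same caution the abstract expresses.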
|
34 |
A Novel Locus for Body Mass Index on 5p15.2: A Meta-Analysis of Two Genome-Wide Association Studies
Wang, Ke-Sheng; Liu, Xuefeng; Zheng, Shimin; Zeng, Min; Pan, Yue; Callahan, Katie. 25 May 2012
Objective
Genetic factors play an important role in modulating the vulnerability to body mass index (BMI). The purpose of this study is to identify novel genetic variants for BMI using genome-wide association (GWA) meta-analysis.
Methods
PLINK software was used to perform a meta-analysis of two GWA studies (the FUSION and Marshfield samples) comprising 5218 Caucasian individuals with BMI data. A replication study was conducted using the SAGE sample of 762 individuals.
Results
Through meta-analysis we identified 33 SNPs associated with BMI at p < 10^-4. The most significant association was observed with rs2967951 (p = 1.19 × 10^-6) at 5p15.2 within the ROPN1L gene. Two additional SNPs within ROPN1L and 5 SNPs within MARCH6 (the top SNP was rs2607292, with p = 4.27 × 10^-6) further supported the association with BMI on 5p15.2 (p < 1.8 × 10^-5). Conditional analysis on 5p15.2 could not distinguish the effects of ROPN1L and MARCH6. Several SNPs within MARCH6 and ROPN1L were replicated in the SAGE sample (p < 0.05).
Conclusion
We identified a novel locus for BMI. These findings offer the potential for new insights into the pathogenesis of BMI and obesity and will serve as a resource for replication in other populations to elucidate the potential role of these genetic variants in BMI and obesity.
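The meta-analysis step described in the Methods can be sketched for a single SNP. The sketch assumes a fixed-effect, inverse-variance weighted combination of per-study effect estimates, which is a standard approach; the abstract does not state which model or settings were used in PLINK, and the effect sizes and standard errors below are made up for illustration.

```python
import math


def fixed_effect_meta(estimates):
    """Combine per-study effects for one SNP by inverse-variance weighting.

    estimates: list of (beta, standard_error), one pair per study.
    Returns the pooled beta, its standard error, the z statistic, and the
    two-sided p-value from the normal distribution.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return beta, se, z, p_value


# Hypothetical effect estimates for one SNP in two studies.
pooled_beta, pooled_se, z_score, p_value = fixed_effect_meta(
    [(0.10, 0.02), (0.20, 0.02)]
)
```

With equal standard errors the pooled effect is the simple average of the two study effects, and the pooled standard error shrinks by a factor of the square root of two.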
|
35 |
Functional Analysis of the Ovarian Cancer Susceptibility Locus at 9p22.2 Reveals a Transcription Regulatory Network Mediated by BNC2 in Ovarian Cells
Buckley, Melissa. 01 January 2015
GWAS have identified several chromosomal loci associated with ovarian cancer risk. However, the mechanism underlying these associations remains elusive. We identify candidate functional Single Nucleotide Polymorphisms (SNPs) at the 9p22.2 ovarian cancer susceptibility locus, several of which map to transcriptional regulatory elements active in ovarian cells identified by FAIRE-seq (Formaldehyde assisted isolation of regulatory elements followed by sequencing) and ChIP-seq (Chromatin Immunoprecipitation followed by sequencing) in relevant cell types. Reporter and electrophoretic mobility shift assays (EMSA) determined the extent to which candidate SNPs had allele specific effects. Chromosome conformation capture (3C) reveals a physical association between Basonuclin 2 (BNC2) and SNPs with functional properties. This establishes BNC2 as a major target of four candidate functional SNPs in at least two distinct elements.
BNC2 codes for a putative transcription regulator containing three pairs of zinc finger (ZF) domains. Furthermore, bnc2 mutation in zebrafish leads to developmental defects including dysmorphic ovaries and sterility, clearly implicating this protein in cellular processes associated with ovarian development. We show that BNC2 is a transcriptional regulator with a specific DNA recognition sequence of targets enriched in genes involved in cell communication through DNA binding assays, ChIP-seq, and expression analysis.
This study reveals a comprehensive regulatory landscape at the 9p22.2 locus and indicates that a likely mechanism of susceptibility to ovarian cancer may include multiple allele-specific changes in DNA regulatory elements some of which alter BNC2 expression. This study begins to identify the underlying mechanisms of the 9p22.2 locus association with ovarian cancer and aims to provide data to support advances in care based on one’s genetic composition.
|
36 |
SNP-set Tests for Sequencing and Genome-Wide Association Studies
Barnett, Ian. 06 June 2014
In this dissertation we propose methodology for testing SNP-sets for genetic associations, both for sequencing and genome-wide association studies. Due to the large scale of this kind of data, there is an emphasis on producing methodology that is not only accurate and powerful, but also computationally efficient.
|
37 |
Imputing Genotypes Using Regularized Generalized Linear Regression Models
Griesman, Joshua. 14 June 2012
As genomic sequencing technologies continue to advance, researchers are furthering their understanding of the relationships between genetic variants and expressed traits (Hirschhorn and Daly, 2005). However, missing data can significantly limit the power of a genetic study. Here, the use of a regularized generalized linear model, denoted GLMNET, is proposed to impute missing genotypes. The method aims to address certain limitations of earlier regression approaches to genotype imputation, particularly multicollinearity among predictors. The performance of the GLMNET-based method is compared to that of the phase-based method fastPHASE. Two simulation settings were evaluated: a sparse-missing model and a small-panel expansion model. The sparse-missing model simulated a scenario where SNPs were missing at random across the genome. In the small-panel expansion model, a set of test individuals was genotyped at only a small subset of the SNPs of the large panel. Each imputation method was tested on two datasets: Canadian Holstein cattle data and human HapMap CEU data. Although the proposed method performed with high accuracy (>90% in all simulations), fastPHASE performed with higher accuracy (>94%). However, the new method, which was coded in R, imputed genotypes with better time efficiency than fastPHASE, and this could be further improved by optimizing in a compiled language.
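The sparse-missing evaluation described above (hide known genotypes at random, impute them, and score the fraction recovered) can be sketched as follows. The imputer here is a naive per-SNP mode fill, used only as a placeholder; it is neither the GLMNET-based method nor fastPHASE. The function names, the 0/1/2 genotype coding, the masking fraction, and the toy matrix are assumptions.

```python
import random
from collections import Counter


def mask_genotypes(matrix, fraction, seed=0):
    """Hide a random fraction of cells; return the masked copy and hidden positions."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    hidden = rng.sample(cells, int(len(cells) * fraction))
    masked = [list(row) for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_by_mode(masked):
    """Placeholder imputer: fill each missing cell with the SNP's most common genotype."""
    filled = [list(row) for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        mode = Counter(observed).most_common(1)[0][0] if observed else 0
        for row in filled:
            if row[j] is None:
                row[j] = mode
    return filled


def imputation_accuracy(truth, imputed, hidden):
    """Fraction of hidden genotypes that were recovered exactly."""
    hits = sum(1 for i, j in hidden if truth[i][j] == imputed[i][j])
    return hits / len(hidden)


# Toy panel: 10 individuals by 4 SNPs, genotypes coded 0/1/2.
truth = [[0, 1, 2, 0] for _ in range(10)]
masked, hidden = mask_genotypes(truth, fraction=0.2, seed=1)
imputed = impute_by_mode(masked)
score = imputation_accuracy(truth, imputed, hidden)
```

On real data the per-SNP mode is a weak baseline; the point of the sketch is the evaluation loop, into which any imputation method could be substituted.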
|
38 |
Exploiting Historical Data and Diverse Germplasm to Increase Maize Grain Yield in Texas
Barrero Farfan, Ivan D. 16 December 2013
The U.S. is the largest maize producer in the world, with a production of 300 million tons in 2012. Approximately 86% of maize production is concentrated in the Midwestern states. The rest is concentrated in the Southern states, where Texas is the largest maize producer. Grain yield in Texas ranges from 18 tons/ha in the irrigated production zones to 3 tons/ha in the dryland production zones. As a result, grain yield has increased slowly because of poor production on the non-irrigated acres. One way to improve grain yield in Texas is to breed maize varieties adapted to Texas growing conditions, including mapping genes that can be incorporated into germplasm through marker-assisted selection. This dissertation includes two separate projects that exploit historical data and maize diversity to increase grain yield in Texas.
For the first project, a large dataset collected by the Texas AgriLife program was analyzed to elucidate past trends and to suggest how maize yield within Texas might be improved. This study confirmed previous reports that the rate of increase in grain yield in Texas is lower than the rate observed in the Midwestern US.
For the second project, a candidate gene and whole-genome association mapping analysis was performed for drought and aflatoxin resistance in maize. To do so, maize inbred lines from a diversity panel were testcrossed to isogenic versions of Tx714. The hybrids were evaluated under irrigated and non-irrigated conditions. The irrigated trials were inoculated with Aspergillus flavus and the aflatoxin level was quantified. This study found that the gene ZmLOX4 was associated with days to silk, and the gene ZmLOX5 was associated with plant and ear height. In addition, this study identified 13 QTL variants for grain yield, plant height, days to anthesis and days to silk. Furthermore, this study shows that diverse maize inbred lines can make hybrids that out-yield commercial hybrids under heat and drought stress. Therefore, there are useful genes present in these diverse lines that can be exploited in maize breeding programs.
|
39 |
Estimating the Overlap of Top Instances in Lists Ranked by Correlation to Label
Damavandi, Babak. Unknown Date
No description available.
|
40 |
EQUINE PROTOZOAL MYELOENCEPHALITIS: INVESTIGATION OF GENETIC SUSCEPTIBILITY AND ASSESSMENT OF AN EQUINE INFECTION METHOD
Gaubatz, Breanna M. 01 January 2013
Equine protozoal myeloencephalitis (EPM) is a progressive neurological disease of horses caused by Sarcocystis neurona. Two projects were conducted to identify factors involved in the development of EPM. The first study explored a possible genetic susceptibility to EPM by attempting a genome-wide association study (GWAS) on formalin-fixed, paraffin-embedded (FFPE) tissue from 24 definitively positive EPM horses. DNA extracted from tissues older than 14 months was inadequate for SNP analysis on the Illumina Equine SNP50 BeadChip, probably because of degradation and formalin cross-linking. Results were inconclusive because analysis was not possible with the small sample set. The second study evaluated an artificial infection method for creating a reliable equine EPM model. Five horses were injected intravenously at 4 time points with autologous blood incubated with 1,000,000 S. neurona merozoites. Challenged horses progressively developed mild to moderate clinical signs and had detectable S. neurona serum antibodies on day 42 post challenge. Horses appeared to have produced a Th1 immune response and cleared the infection by the conclusion of the study on day 89. No histopathological evidence of S. neurona infection was found within central nervous system tissue. This artificial infection method was not effective in replicating the severe clinical EPM seen in natural infections.
|