1 |
Shrinkage of dispersion parameters in the double exponential family of distributions, with applications to genomic sequencing
Ruddy, Sean Matthew 27 March 2015
<p> The prevalence of sequencing experiments in genomics has led to increased use of count-data methods for analyzing high-throughput genomic data. Shrinkage methods remain important for improving the performance of these statistical methods. A common example is gene expression data, where the counts per gene are often modeled as some form of overdispersed Poisson. In this case, shrinkage estimates of the per-gene dispersion parameter have led to improved estimation of dispersion when the number of samples is small. We address a different count setting introduced by the use of sequencing data: comparing differential proportional usage via an overdispersed binomial model. Such a model can be useful for testing differential exon inclusion in mRNA-Seq experiments in addition to the typical differential gene expression analysis. In this setting, there are fewer such shrinkage methods for the dispersion parameter. We introduce a novel method that is developed by modeling the dispersion based on the double exponential family of distributions proposed by Efron (1986), also known as the exponential dispersion model (Jorgensen, 1987). Our methods (WEB-Seq and DEB-Seq) are empirical Bayes strategies for producing a shrunken estimate of dispersion that can be applied to any double exponential dispersion family, though we focus on the binomial and Poisson. These methods effectively detect differential proportional usage, and have close ties to the weighted likelihood strategy of edgeR developed for gene expression data (Robinson and Smyth, 2007; Robinson <i>et al.,</i> 2010). We analyze their behavior on simulated data sets as well as real data for both differential exon usage and differential gene expression. In the exon usage case, we demonstrate our methods' superior ability to control the FDR and detect truly different features compared to existing methods. 
In the gene expression setting, our methods fail to control the FDR; however, the ranking of the genes by p-value is among the top performers and proves to be robust both to changes in the probability distribution used to generate the counts and to low sample sizes. We provide an implementation of our methods in the R package DoubleExpSeq, available from the Comprehensive R Archive Network (CRAN).</p>
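<p> As a rough illustration of what shrinking a per-feature dispersion estimate means, the sketch below pulls each raw estimate toward the average across features using a weighted mean. It is not WEB-Seq or DEB-Seq, and it is not the DoubleExpSeq interface; the function name, the <code>df</code> and <code>prior_weight</code> parameters, and the numbers are all hypothetical. </p>

```python
def shrink_dispersions(raw, df, prior_weight):
    """Shrink raw per-feature dispersions toward their common mean.

    Hypothetical illustration only (not the WEB-Seq/DEB-Seq estimators).
    df: weight given to each feature's own estimate (e.g. its degrees of freedom).
    prior_weight: weight given to the shared target; larger means more shrinkage.
    """
    if not raw:
        raise ValueError("need at least one dispersion estimate")
    target = sum(raw) / len(raw)  # common value all features are pulled toward
    total = df + prior_weight
    return [(df * d + prior_weight * target) / total for d in raw]


# Made-up raw estimates from a small-sample experiment.
raw = [0.02, 0.50, 0.11, 0.09]
shrunk = shrink_dispersions(raw, df=2.0, prior_weight=4.0)
```

<p> With few samples, <code>df</code> is small relative to <code>prior_weight</code>, so extreme raw estimates move most of the way toward the shared value, which is the stabilizing effect the abstract attributes to shrinkage. </p>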
|
2 |
Polygenic analysis of genome-wide SNP data
Simonson, Matthew A. 28 June 2013
<p> One of the central motivators behind genetic research is to understand how genetic variation relates to human health and disease. Recently, there has been a large-scale effort to find common genetic variants associated with many forms of disease and disorder using single nucleotide polymorphisms (SNPs). Several genome-wide association studies (GWAS) have successfully identified SNPs associated with phenotypes. However, the effect sizes attributed to individual variants are generally small, explaining only a very small amount of the genetic risk and heritability expected based on the estimates of family and twin studies. Several explanations exist for the inability of GWAS to find the "missing heritability." </p><p> The results of recent research appear to confirm the prediction made by population genetics theory that most complex phenotypes are highly polygenic, occasionally influenced by a few alleles of relatively large effect, and usually by several of small effect. Studies have also confirmed that common variants are only part of what contributes to the total genetic variance for most traits, indicating that rare variants may play a significant role. </p><p> This research addresses some of the most glaring weaknesses of the traditional GWAS approach through the application of methods of polygenic analysis. We apply several methods, including those that investigate the net effects of large sets of SNPs, more sophisticated approaches informed by biology rather than the purely statistical approach of GWAS, as well as methods that infer the effects of recessive rare variants. </p><p> Our results indicate that traditional GWAS is well complemented and improved upon by methods of polygenic analysis. 
We demonstrate that polygenic approaches can be used to significantly predict individual risk for disease, provide an unbiased estimate of a substantial proportion of the heritability for multiple phenotypes, identify sets of genes grouped into biological pathways that are enriched for associations, and finally, detect the significant influence of recessive rare variants.</p>
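<p> One common way to summarize the net effect of a large set of SNPs is an additive polygenic score: a weighted sum of an individual's allele counts. The sketch below shows that general idea only; it is not taken from the thesis, and the function name, weights, and genotypes are hypothetical. </p>

```python
def polygenic_score(allele_counts, weights):
    """Additive score: sum over SNPs of (allele count * per-SNP weight).

    Hypothetical illustration. allele_counts holds 0, 1 or 2 copies of the
    counted allele per SNP; weights holds one effect estimate per SNP.
    """
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is required per SNP")
    return sum(c * w for c, w in zip(allele_counts, weights))


# Made-up genotype for one individual at three SNPs, with made-up weights.
score = polygenic_score([0, 1, 2], [0.10, -0.20, 0.05])
```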
|
3 |
Module-Based Analysis for "Omics" Data
Wang, Zhi 24 March 2015
<p> This thesis focuses on methodologies and applications of module-based analysis (MBA) in omics studies to investigate the relationships between phenotypes and biomarkers, e.g., SNPs, genes, and metabolites. As an alternative to traditional single-biomarker approaches, MBA may increase the detectability and reproducibility of results because biomarkers tend to have moderate individual effects but a significant aggregate effect; it may also improve the interpretability of findings and facilitate the construction of follow-up biological hypotheses because MBA assesses biomarker effects in a functional context, e.g., pathways and biological processes. Finally, for exploratory “omics” studies, which usually begin with a full scan of a long list of candidate biomarkers, MBA provides a natural way to reduce the total number of tests, and hence relax the multiple-testing burden and improve power.</p><p> The first MBA project focuses on genetic association analysis that assesses the main and interaction effects for sets of genetic (G) and environmental (E) factors rather than for individual factors. We develop a kernel machine regression approach to evaluate the complete effect profile (i.e., the G, E, and G-by-E interaction effects separately or in combination) and construct a kernel function for the Gene-Environmental (GE) interaction directly from the genetic kernel and the environmental kernel. We use simulation studies and real data applications to show improved performance of the Kernel Machine (KM) regression method over the commonly adopted PC regression methods across a wide range of scenarios. 
The largest gain in power occurs when the underlying effect structure involves complex GE interactions, suggesting that the proposed method could be a useful and powerful tool for performing exploratory or confirmatory analyses in GxE-GWAS.</p><p> In the second MBA project, we extend the kernel machine framework developed in the first project to model biomarkers with network structure. A network summarizes the functional interplay among biological units; incorporating network information can more precisely model the biological effects, enhance the ability to detect true signals, and facilitate our understanding of the underlying biological mechanisms. In this work, we develop two kernel functions to capture different network structure information. Through simulations and a metabolomics study, we show that the proposed network-based methods can have markedly improved power over approaches that ignore network information.</p><p> Metabolites are the end products of cellular processes and reflect the ultimate responses of biological systems to genetic variations or environmental exposures. Because of the unique properties of metabolites, pharmacometabolomics aims to understand the underlying signatures that contribute to individual variations in drug responses and to identify biomarkers that can be helpful in predicting response. To facilitate mining pharmacometabolomic data, we establish an MBA pipeline that has great practical value in the detection and interpretation of signatures, which may potentially indicate a functional basis for the drug response. We illustrate the utility of the pipeline by investigating two scientific questions in an aspirin study: (1) which metabolite changes can be attributed to aspirin intake, and (2) which metabolic signatures can be helpful in predicting aspirin resistance. Results show that the MBA pipeline enables us to identify metabolic signatures that are not found in a preliminary single-metabolite analysis.</p>
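<p> The first project describes building the GE interaction kernel directly from the genetic and environmental kernels. The abstract does not say how, so the sketch below assumes one standard construction, the element-wise (Hadamard) product of the two kernel matrices, with linear kernels for G and E. All function names and data here are hypothetical. </p>

```python
def linear_kernel(rows):
    """Gram matrix of inner products between every pair of samples."""
    return [[sum(a * b for a, b in zip(x, y)) for y in rows] for x in rows]


def interaction_kernel(k_g, k_e):
    """Element-wise product of two kernel matrices of the same shape.

    Assumed construction for illustration; the thesis may define its GE
    kernel differently.
    """
    return [[g * e for g, e in zip(row_g, row_e)]
            for row_g, row_e in zip(k_g, k_e)]


# Two samples: made-up genotypes at two SNPs and one environmental covariate.
genotypes = [[0, 1], [2, 1]]
environment = [[1.0], [0.5]]

k_g = linear_kernel(genotypes)
k_e = linear_kernel(environment)
k_ge = interaction_kernel(k_g, k_e)
```

<p> With linear kernels, this product equals the linear kernel computed on all pairwise G-by-E product terms, so the interaction is modeled without listing those terms explicitly. </p>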
|