Power Analysis in Applied Linear Regression for Cell Type-Specific Differential Expression Detection

The goal of many human disease-oriented studies is to detect molecular mechanisms that differ between healthy controls and patients. Yet commonly used gene expression measurements from any tissue suffer from variability in cell composition. This variability hinders the detection of differentially expressed genes and is often ignored. However, it may actually be advantageous, as heterogeneous gene expression measurements coupled with cell counts may provide deeper insight into gene expression differences at the cell type-specific level. Published computational methods use linear regression to estimate cell type-specific differential expression, but they do not consider many artifacts hidden in high-dimensional gene expression data that may negatively affect the performance of linear regression. In this dissertation we specifically address the parameter space involved in the most rigorous use of linear regression to estimate cell type-specific differential expression, and we report the conditions under which significant detection is probable. We define the parameters affecting the sensitivity of cell type-specific differential expression estimation as follows: sample size, cell type-specific proportion variability, mean squared error (the spread of observations around the regression line), conditioning of the cell proportions predictor matrix, and the size of the actual cell type-specific differential expression (effect size). Each parameter, with the exception of effect size, affects the variability of cell type-specific differential expression estimates. We have developed a power-analysis approach to differential expression detection, applied cell type by cell type and genomic site by genomic site, which relies upon Welch’s two-sample t-test, factors in differences in the variability of cell type-specific expression estimates, and reduces false discovery. To this end we have published an R package, LRCDE, available on GitHub (http://www.github.com/ERGlass/lrcde.dev), which outputs observed statistics of cell type-specific differential expression on a genomic site-by-site basis, including the two-sample t-statistic, the t-statistic p-value, and power calculated from the two-sample t-statistic.
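
The regression and test described in the abstract can be illustrated with a minimal Python sketch. It is not the LRCDE package or its API: the function names cell_type_fit and welch_contrast, the simulated data, the no-intercept regression on two cell proportions, and the large-sample normal approximation used for power are all assumptions made here for illustration.

```python
import numpy as np
from statistics import NormalDist


def cell_type_fit(proportions, expression):
    """Hypothetical helper: OLS of mixed-tissue expression on cell proportions.

    Fits y = P @ beta + error without an intercept, so each coefficient is
    read as the expression of one cell type at a single genomic site.
    Returns the estimates, their standard errors, and the residual df.
    """
    P = np.asarray(proportions, dtype=float)
    y = np.asarray(expression, dtype=float)
    n, k = P.shape
    xtx_inv = np.linalg.inv(P.T @ P)
    beta = xtx_inv @ P.T @ y
    resid = y - P @ beta
    df = n - k
    mse = resid @ resid / df              # spread around the regression fit
    se = np.sqrt(mse * np.diag(xtx_inv))  # depends on n, MSE and conditioning
    return beta, se, df


def welch_contrast(beta_a, se_a, df_a, beta_b, se_b, df_b, alpha=0.05):
    """Hypothetical helper: Welch-style comparison of two groups' estimates.

    Power is a normal approximation evaluated at the observed t-statistic,
    an assumption of this sketch rather than the dissertation's formula.
    """
    var_sum = se_a**2 + se_b**2
    t_stat = (beta_b - beta_a) / np.sqrt(var_sum)
    # Welch-Satterthwaite approximate degrees of freedom
    df = var_sum**2 / (se_a**4 / df_a + se_b**4 / df_b)
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    power = np.array(
        [1 - nd.cdf(z - abs(float(t))) + nd.cdf(-z - abs(float(t))) for t in t_stat]
    )
    return t_stat, df, power


# Simulated example: two cell types, 40 samples per group (all values made up).
rng = np.random.default_rng(0)
n = 40
noise_sd = 0.5
beta_control = np.array([10.0, 5.0])
beta_case = np.array([10.0, 8.0])  # only the second cell type differs


def simulate(beta_true):
    p1 = rng.uniform(0.2, 0.8, n)
    P = np.column_stack([p1, 1.0 - p1])  # proportions sum to 1 per sample
    y = P @ beta_true + rng.normal(0.0, noise_sd, n)
    return P, y


P_ctrl, y_ctrl = simulate(beta_control)
P_case, y_case = simulate(beta_case)

est_ctrl, se_ctrl, df_ctrl = cell_type_fit(P_ctrl, y_ctrl)
est_case, se_case, df_case = cell_type_fit(P_case, y_case)
t_stat, welch_df, power = welch_contrast(
    est_ctrl, se_ctrl, df_ctrl, est_case, se_case, df_case
)

print("condition number (controls):", np.linalg.cond(P_ctrl))
print("t-statistics per cell type:", t_stat)
print("Welch df per cell type:", welch_df)
print("approximate power per cell type:", power)
```

In this sketch, narrowing the range of the simulated proportions, raising noise_sd, or lowering n each inflates the standard errors and so lowers the t-statistic and the approximate power, which mirrors the parameters listed in the abstract.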

Identifier: oai:union.ndltd.org:vcu.edu/oai:scholarscompass.vcu.edu:etd-5571
Date: 01 January 2016
Creators: Glass, Edmund
Publisher: VCU Scholars Compass
Source Sets: Virginia Commonwealth University
Detected Language: English
Type: text
Format: application/pdf
Source: Theses and Dissertations
Rights: © The Author
