Bayesian Semiparametric Models for Heterogeneous Cross-platform Differential Gene Expression

We are concerned with testing for differential expression and consider three different
aspects of such testing procedures. First, we develop an exact ANOVA-type
model for discrete gene expression data, produced by technologies such as Massively
Parallel Signature Sequencing (MPSS), Serial Analysis of Gene Expression (SAGE)
or other next-generation sequencing technologies. We adopt two Bayesian hierarchical
models, one parametric and the other semiparametric with a Dirichlet process
prior that has the ability to borrow strength across related signatures, where a signature
is a specific arrangement of the nucleotides. We utilize the discreteness of the
Dirichlet process prior to cluster signatures that exhibit similar differential expression
profiles. Tests for differential expression are carried out using non-parametric
approaches, while controlling the false discovery rate. Next, we consider ways to
combine expression data from different studies, possibly produced by different technologies,
such as microarrays and MPSS, and therefore yielding mixed-type responses. Depending
on the technology, the expression data can be continuous or discrete and can have different
technology-dependent noise characteristics. Adding to the difficulty, genes can
have an arbitrary correlation structure both within and across studies. Performing
several hypothesis tests for differential expression could also lead to false discoveries.
We propose to address all of the above challenges using a hierarchical Dirichlet process
with a spike-and-slab base prior on the random effects, while smoothing splines model
the unknown link functions that map the different technology-dependent manifestations
to latent processes upon which inference is based. Finally, we propose an algorithm
for controlling different error measures in Bayesian multiple testing under generic
loss functions, including the widely used uniform loss function. We do not make
any specific assumptions about the underlying probability model but require that
indicator variables for the individual hypotheses are available as a component of the
inference. Given this information, we recast multiple hypothesis testing as a combinatorial
optimization problem, in particular the 0-1 knapsack problem, which
can be solved efficiently using a variety of algorithms, both approximate and
exact.
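
The knapsack view of multiple testing described above can be illustrated with a minimal sketch. It is not the algorithm of the thesis; it assumes that the posterior mean of each hypothesis indicator has been converted into a posterior null probability, that the error measure to control is the expected number of false discoveries (a budget chosen for illustration), and that each rejection carries a user-supplied value. The function name `knapsack_rejections` and its parameters `null_probs`, `values`, `budget` and `scale` are hypothetical.

```python
import math


def knapsack_rejections(null_probs, values, budget, scale=1000):
    """Pick hypotheses to reject by exact 0-1 knapsack dynamic programming.

    null_probs: posterior probability that each hypothesis is null (assumed input).
    values:     gain from rejecting each hypothesis (all ones = uniform gain).
    budget:     allowed expected number of false discoveries (assumed error measure).
    scale:      discretization factor used to turn probabilities into integer weights.
    """
    # Integer weights; rounding up keeps the error budget conservative.
    weights = [math.ceil(round(q * scale, 9)) for q in null_probs]
    capacity = math.floor(round(budget * scale, 9))
    n = len(weights)

    # best[i][c] = maximum total value using the first i hypotheses within capacity c.
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v

    # Backtrack to recover which hypotheses were selected.
    rejected, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            rejected.append(i - 1)
            c -= weights[i - 1]
    return sorted(rejected), best[n][capacity]


if __name__ == "__main__":
    probs = [0.01, 0.02, 0.30, 0.05, 0.60]
    print(knapsack_rejections(probs, [1, 1, 1, 1, 1], budget=0.10))
    print(knapsack_rejections(probs, [1, 1, 5, 1, 1], budget=0.35))
```

With equal values the optimum simply takes the hypotheses with the smallest null probabilities, so a greedy sort would suffice; the dynamic program matters once the values differ across hypotheses. The discretization makes the procedure exact only up to the chosen `scale`.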

Identifier: oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/ETD-TAMU-2010-12-8659
Date: December 2010
Creators: Dhavala, Soma Sekhar
Contributors: Mallick, Bani K.
Source Sets: Texas A and M University
Language: en_US
Detected Language: English
Type: thesis, text
Format: application/pdf