Bayesian Semiparametric Models for Heterogeneous Cross-platform Differential Gene Expression

We are concerned with testing for differential expression and consider three different
aspects of such testing procedures. First, we develop an exact ANOVA-type
model for discrete gene expression data produced by technologies such as Massively
Parallel Signature Sequencing (MPSS), Serial Analysis of Gene Expression (SAGE),
or other next-generation sequencing technologies. We adopt two Bayesian hierarchical
models, one parametric and the other semiparametric with a Dirichlet process
prior that can borrow strength across related signatures, where a signature
is a specific arrangement of nucleotides. We utilize the discreteness of the
Dirichlet process prior to cluster signatures that exhibit similar differential expression
profiles. Tests for differential expression are carried out using non-parametric
approaches, while controlling the false discovery rate. Next, we consider ways to
combine expression data from different studies, possibly produced by different technologies
such as microarrays and MPSS, resulting in mixed-type responses. Depending
on the technology, the expression data can be continuous or discrete and can have different
technology-dependent noise characteristics. Adding to the difficulty, genes can
have an arbitrary correlation structure both within and across studies. Performing
several hypothesis tests for differential expression could also lead to false discoveries.
We propose to address all the above challenges using a hierarchical Dirichlet process
with a spike-and-slab base prior on the random effects, while smoothing splines model the unknown link functions that map different technology-dependent manifestations
to latent processes upon which inference is based. Finally, we propose an algorithm
for controlling different error measures in Bayesian multiple testing under generic
loss functions, including the widely used uniform loss function. We do not make
any specific assumptions about the underlying probability model but require that
indicator variables for the individual hypotheses are available as a component of the
inference. Given this information, we recast multiple hypothesis testing as a combinatorial
optimization problem, in particular the 0-1 knapsack problem, which
can be solved efficiently using a variety of algorithms, both approximate and
exact.
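
To illustrate the knapsack view of multiple testing mentioned in the abstract, here is a minimal sketch in Python. It is not the thesis's algorithm. It assumes that each hypothesis has a posterior probability of being non-null, that the gain from rejecting a hypothesis equals that probability, and that the error constraint is a cap on the posterior expected number of false discoveries. The function names, the gain, the constraint, and the integer scaling of the weights are all hypothetical choices made for this example.

```python
from typing import List, Tuple


def integer_weights(post_alt: List[float], scale: int) -> List[int]:
    """Assumed cost of rejecting hypothesis i: its posterior null probability,
    (1 - post_alt[i]), scaled and rounded to an integer for the DP table."""
    return [round((1.0 - p) * scale) for p in post_alt]


def knapsack_01(values: List[float], weights: List[int],
                capacity: int) -> Tuple[float, List[int]]:
    """Exact 0-1 knapsack by dynamic programming.

    values[i]  : gain from selecting (rejecting) item i
    weights[i] : non-negative integer cost of selecting item i
    capacity   : integer budget on the total cost
    Returns the best total value and the sorted indices of selected items.
    """
    n = len(values)
    # best[i][b] = best value using only the first i items with budget b
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for b in range(capacity + 1):
            best[i][b] = best[i - 1][b]          # skip item i-1
            if w <= b and best[i - 1][b - w] + v > best[i][b]:
                best[i][b] = best[i - 1][b - w] + v  # take item i-1

    # Backtrack: an item was taken exactly when the table entry changed.
    chosen: List[int] = []
    b = capacity
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= weights[i - 1]
    chosen.reverse()
    return best[n][capacity], chosen


if __name__ == "__main__":
    # Hypothetical posterior probabilities of differential expression.
    post_alt = [0.9, 0.6, 0.8]
    scale = 10                      # resolution of the integer weights
    weights = integer_weights(post_alt, scale)
    # Allow at most 0.3 expected false discoveries (3 units at scale 10).
    total, rejected = knapsack_01(post_alt, weights, capacity=3)
    print(total, rejected)
```

The dynamic program is exact but takes time proportional to the number of hypotheses times the capacity, so a finer weight scale costs more. For large gene sets, an approximate knapsack solver would be the natural substitute.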

http://hdl.handle.net/1969.1/ETD-TAMU-2010-12-8659

Bayesian Models

Generalized linear models

Semiparametric models

Dirichlet process

Meta-analysis

Multiple hypothesis testing

Bioinformatics

Identifier	oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/ETD-TAMU-2010-12-8659
Date	2010 December
Creators	Dhavala, Soma Sekhar
Contributors	Mallick, Bani K.
Source Sets	Texas A&M University
Language	en_US
Detected Language	English
Type	thesis, text
Format	application/pdf
