Bayesian Model Selection for High-dimensional High-throughput Data

Bayesian methods are often criticized on the grounds of subjectivity. Furthermore, misspecified
priors can have a deleterious effect on Bayesian inference. Noting that model
selection is effectively a test of many hypotheses, Dr. Valen E. Johnson sought to eliminate
the need for prior specification by computing Bayes' factors from frequentist test statistics.
In his pioneering work published in 2005, Dr. Johnson proposed
using so-called local priors for computing Bayes' factors from test statistics. Dr. Johnson
and Dr. Jianhua Hu used Bayes' factors for model selection in a linear model setting. In
an independent work, Dr. Johnson and another colleague, David Rossell, investigated two
families of non-local priors for testing the regression parameter in a linear model setting.
These non-local priors enable greater separation between the theories of null and alternative
hypotheses.
In this dissertation, I extend model selection based on Bayes' factors and use non-local
priors to define Bayes' factors based on test statistics. With these priors, I have been
able to reduce the problem of prior specification to setting just one scaling parameter.
That scaling parameter can be easily set, for example, on the basis of frequentist operating
characteristics of the corresponding Bayes' factors. Furthermore, the loss of information from basing a Bayes' factor on a test statistic is minimal.
Along with Dr. Johnson and Dr. Hu, I used the Bayes' factors based on the likelihood
ratio statistic to develop a method for clustering gene expression data. This method has
performed well in both simulated examples and real datasets. An outline of that work is
also included in this dissertation. Further, I extend the clustering model to a subclass of
the decomposable graphical model class, which is more appropriate for genotype data sets,
such as single-nucleotide polymorphism (SNP) data. Efficient FORTRAN programming has
enabled me to apply the methodology to hundreds of nodes.
For problems that produce computationally harder probability landscapes, I propose a
modification of the Markov chain Monte Carlo algorithm to extract information regarding
the important network structures in the data. This modified algorithm performs well in
inferring complex network structures. I use this method to develop a prediction model for
disease based on SNP data. My method performs well in cross-validation studies.

Identifier: oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/ETD-TAMU-2010-05-7740
Date: 2010 May 1900
Creators: Joshi, Adarsh
Contributors: Johnson, Valen E., Dahl, David B.
Source Sets: Texas A and M University
Language: en_US
Detected Language: English
Type: thesis, text
Format: application/pdf
