1 |
Bayesian analysis of binary and count data in two-arm trials. Kpekpena, Cynthia. 23 May 2014
Binary and count data arise naturally in clinical trials in the health sciences. We consider a Bayesian analysis of binary and count data arising from two-arm clinical trials for testing hypotheses of equivalence.
For each type of data, we discuss the development of the likelihood and of the prior and posterior distributions of the parameters of interest. For binary data, we also examine the suitability of a normal approximation to the posterior distribution obtained via a Taylor series expansion.
When the posterior distribution is complex and high-dimensional, Bayesian inference is carried out using Markov chain Monte Carlo (MCMC) methods. We also discuss a meta-analysis approach for data arising from two-arm trials with multiple studies, assigning a Dirichlet process prior to the study-effect parameters to account for heterogeneity among the studies. We illustrate the methods using actual data from several health studies.
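A minimal sketch of the conjugate analysis described above, for the binary case; the counts and the equivalence margin are hypothetical, not taken from the thesis data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-arm binary outcomes (successes / trials); not the thesis's data.
y1, n1 = 38, 50   # treatment arm
y2, n2 = 35, 50   # control arm
delta = 0.10      # assumed equivalence margin on the risk-difference scale

# Conjugate Beta(1, 1) priors give Beta posteriors for each arm's success probability.
p1 = rng.beta(1 + y1, 1 + n1 - y1, size=100_000)
p2 = rng.beta(1 + y2, 1 + n2 - y2, size=100_000)

# Posterior probability of equivalence: |p1 - p2| < delta.
prob_equiv = np.mean(np.abs(p1 - p2) < delta)
print(f"P(|p1 - p2| < {delta}) = {prob_equiv:.3f}")
```

Because the beta prior is conjugate to the binomial likelihood, no MCMC is needed here; sampling from the exact posteriors suffices.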
|
2 |
Scalable Bayesian regression utilising marginal information. Gray-Davies, Tristan Daniel. January 2017
This thesis explores approaches to regression that utilise the treatment of covariates as random variables. The distribution of covariates, along with the conditional regression model Y | X, defines the joint model over (Y,X), and in particular, the marginal distribution of the response Y. This marginal distribution provides a vehicle for the incorporation of prior information, as well as external, marginal data. The marginal distribution of the response provides a means of parameterisation that can yield scalable inference, simple prior elicitation, and, in the case of survival analysis, the complete treatment of truncated data. In many cases, this information can be utilised without the need to specify a model for X. Chapter 2 considers the application of Bayesian linear regression where large marginal datasets are available, but the collection of response and covariate data together is limited to a small dataset. These marginal datasets can be used to estimate the marginal means and variances of Y and X, which impose two constraints on the parameters of the linear regression model. We define a joint prior over covariate effects and the conditional variance σ² via a parameter transformation, which allows us to guarantee these marginal constraints are met. This provides a computationally efficient means of incorporating marginal information, useful when incorporation via the imputation of missing values may be implausible. The resulting prior and posterior have rich dependence structures that have a natural 'analysis of variance' interpretation, due to the constraint on the total marginal variance of Y. The concept of 'marginal coherence' is introduced, whereby competing models place the same prior on the marginal mean and variance of the response. Our marginally constrained prior can be extended by placing priors on the marginal variances, in order to perform variable selection in a marginally coherent fashion.
Chapter 3 constructs a Bayesian nonparametric regression model parameterised in terms of F_Y, the marginal distribution of the response. This naturally allows the incorporation of marginal data, and provides a natural means of specifying a prior distribution for a regression model. The construction is such that the distribution of the ordering of the response, given covariates, takes the form of the Plackett-Luce model for ranks. This facilitates a natural composite likelihood approximation that decomposes the likelihood into a term for the marginal response data, and a term for the probability of the observed ranking. This can be viewed as an extension of the partial likelihood for proportional hazards models. This convenient form leads to simple approximate posterior inference, which circumvents the need to perform MCMC, allowing scalability to large datasets. We apply the model to a US Census dataset with over 1,300,000 data points and more than 100 covariates, where the nonparametric prior is able to capture the highly non-standard distribution of incomes. Chapter 4 explores the analysis of randomised clinical trial (RCT) data for subgroup analysis, where interest lies in the optimal allocation of treatment D(X), based on covariates. Standard analyses build a conditional model Y | X,T for the response, given treatment and covariates, which can be used to deduce the optimal treatment rule. We show that the treatment of covariates as random facilitates direct testing of a treatment rule, without the need to specify a conditional model. This provides a robust, efficient, and easy-to-use methodology for testing treatment rules. This nonparametric testing approach is used as a splitting criterion in a random-forest methodology for the exploratory analysis of subgroups.
The model introduced in Chapter 3 is applied in the context of subgroup analysis, providing a Bayesian nonparametric analogue to this approach, in which inference is based only on the order of the data, circumventing the requirement to specify a full data-generating model. Both approaches to subgroup analysis are applied to data from an AIDS clinical trial.
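The Plackett-Luce model for ranks, on which the Chapter 3 composite likelihood is built, can be sketched as follows; the item worths and the ranking below are hypothetical:

```python
import numpy as np

def plackett_luce_log_prob(scores, ranking):
    """Log-probability of a ranking under the Plackett-Luce model.

    scores[i] > 0 is the 'worth' of item i; ranking lists item indices
    from first place to last. Each place is filled by choosing among the
    remaining items with probability proportional to their scores.
    """
    scores = np.asarray(scores, dtype=float)
    logp = 0.0
    remaining = list(ranking)
    for item in ranking:
        logp += np.log(scores[item]) - np.log(scores[remaining].sum())
        remaining.remove(item)
    return logp

# Three items with hypothetical worths; ordering items by worth is most probable.
scores = [5.0, 3.0, 1.0]
print(plackett_luce_log_prob(scores, [0, 1, 2]))
```

Summing the probabilities of all orderings of the items gives 1, which is a quick sanity check that the sequential-choice construction defines a proper distribution over rankings.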
|
3 |
From genomes to post-processing of Bayesian inference of phylogeny. Ali, Raja Hashim. January 2016
Life is extremely complex and amazingly diverse; it has taken billions of years of evolution to attain the level of complexity we observe in nature today, ranging from single-celled prokaryotes to multi-cellular human beings. With the availability of molecular sequence data, algorithms inferring homology and gene families have emerged, with similarity in gene content between two genes being the major signal used for homology inference. Recently there has been a significant rise in the number of species with fully sequenced genomes, which provides an opportunity to infer homologs with greater accuracy and in a more informed way. Phylogenetic analysis explains the relationships among the member genes of a gene family in a simple, graphical, and plausible way using a tree representation. Bayesian phylogenetic inference is a probabilistic method used to infer gene phylogenies and posteriors of other evolutionary parameters. The Markov chain Monte Carlo (MCMC) algorithm, in particular the Metropolis-Hastings sampling scheme, is the most commonly employed method for determining the evolutionary history of genes. Many software packages are available that process the results of an MCMC run and explore the parameter posterior, but there is a need for interactive software that can analyse both discrete and real-valued parameters, and that offers convergence assessment and burn-in estimation diagnostics designed specifically for Bayesian phylogenetic inference. In this thesis, a synteny-aware approach for gene homology inference, called GenFamClust (GFC), is proposed that uses gene content and gene order conservation to infer homology. The feature that distinguishes GFC from earlier homology inference methods is that local synteny is combined with gene similarity to infer homologs, without inferring homologous regions. GFC was validated for accuracy on a simulated dataset.
Gene families were computed by applying clustering algorithms to the homologs inferred by GFC, and compared for accuracy, dependence, and similarity with gene families inferred by other popular gene family inference methods on a eukaryotic dataset. Gene families in fungi obtained from GFC were evaluated against pillars from the Yeast Gene Order Browser. Genome-wide gene families for some eukaryotic species are computed using this approach. Another focus of this thesis is the processing of MCMC traces for Bayesian phylogenetic inference. We introduce new software, VMCMC, which simplifies the post-processing of MCMC traces. VMCMC can be used both as a GUI-based application and as a convenient command-line tool. VMCMC supports interactive exploration, is suitable for automated pipelines, and can handle both real-valued and discrete parameters observed in an MCMC trace. We propose and implement joint burn-in estimators that are specifically applicable to Bayesian phylogenetic inference. These methods have been compared for similarity with some other popular convergence diagnostics. We show that Bayesian phylogenetic inference and VMCMC can be applied to infer valuable evolutionary information for a biological case: the evolutionary history of the FERM domain.
|
4 |
Bayesian statistics and production reliability assessments for mining operations. Sharma, Gaurav Kumar. 05 1900
This thesis presents a novel application of structural reliability concepts to assess the
reliability of mining operations. “Limit-states” are defined to obtain the probability that the
total productivity — measured in production time or economic gain — exceeds user-selected
thresholds. Focus is on the impact of equipment downtime and other non-operating instances
on the productivity and the economic costs of the operation. A comprehensive set of data
gathered at a real-world mining facility is utilized to calibrate the probabilistic models. In
particular, the utilization of Bayesian inference facilitates the inclusion of data — and
updating of the production probabilities — as they become available. The thesis includes a
detailed description of the Bayesian approach, as well as the limit-state-based reliability
methodology. A comprehensive numerical example demonstrates the methodology and the
usefulness of the probabilistic results.
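The limit-state idea can be illustrated with a simple Monte Carlo sketch of the probability that annual production time falls below a user-selected threshold; the downtime model and every number here are hypothetical, not calibrated to the thesis data:

```python
import numpy as np

rng = np.random.default_rng(1)
n_sim = 20_000

# Hypothetical model: scheduled hours minus random downtime from equipment failures.
scheduled_hours = 8000.0
n_failures = rng.poisson(lam=25.0, size=n_sim)   # failures per simulated year
# Each failure incurs a lognormal repair time; sum the repairs in each scenario.
downtime = np.array([rng.lognormal(mean=2.0, sigma=0.5, size=k).sum()
                     for k in n_failures])
production = scheduled_hours - downtime

# Limit-state function g = production - threshold; the "failure" event is g < 0.
threshold = 7700.0
p_fail = np.mean(production < threshold)
print(f"P(production < {threshold} h) = {p_fail:.4f}")
```

In a Bayesian treatment such as the one in the thesis, the parameters of the failure-rate and repair-time distributions would themselves carry posterior uncertainty, updated as operational data accumulate, rather than being fixed as above.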
|
5 |
Decision Support Systems: Diagnostics and Explanation Methods in the Context of Telecommunication Networks. Lindberg, Martin. January 2013
This thesis work, conducted at Ericsson Software Research, aims to recommend a system setup for a tool to help troubleshooting personnel at network operation centres (NOCs) who monitor the telecom network. The thesis examines several artificial intelligence algorithms and concludes that Bayesian networks are suitable for the intended system. Since the system will act as a decision support system, it needs to be able to explain how its recommendations were derived; hence a number of explanation methods were examined. No satisfactory method was found, so a new method was defined, the modified explanation tree (MET), which visually illustrates the variables of most interest in a tree structure. The method was implemented, and after some initial testing it received positive first feedback from stakeholders. The final recommendation is thus a system based on a Bayesian model whose training data are collected beforehand from the domain. Users obtain recommendations for the top-ranked cases and can then request further explanation of a specific cause. The explanation aims to give the user situation awareness and to help him or her take the final action to solve the problem.
|
6 |
Bayesian-LOPA methodology for risk assessment of an LNG importation terminal. Yun, Geun-Woong. 15 May 2009
LNG (Liquefied Natural Gas) is one of the fastest growing energy sources in the
U.S. to fulfill the increasing energy demands. In order to meet the LNG demand, many
LNG facilities including LNG importation terminals are operating currently. Therefore,
it is important to estimate the potential risks in LNG terminals to ensure their safety.
One of the best ways to estimate the risk is LOPA (Layer of Protection Analysis), because it can provide quantified risk results with less time and effort than other methods. For LOPA application, failure data are essential for computing risk frequencies. However, failure data from the LNG industry are very sparse. Bayesian estimation is identified as one method to compensate for this sparseness: it can update generic data with plant-specific data.
Based on Bayesian estimation, the frequencies of initiating events were obtained using a conjugate gamma prior distribution, constructed from the OREDA (Offshore Reliability Data) database, together with a Poisson likelihood. If no prior information is available, the Jeffreys noninformative prior may be used. The LNG plant failure database was used as plant-specific likelihood information. The PFDs (Probabilities of Failure on Demand) of IPLs (Independent Protection Layers) were estimated with a conjugate beta prior, taken from the EIReDA (European Industry Reliability Data Bank) database, and a binomial likelihood. In some cases EIReDA did not provide failure data, so a newly developed Frequency-PFD conversion method was used instead. By combining Bayesian estimation with the LOPA procedure, the Bayesian-LOPA methodology was developed and applied to an LNG importation terminal. The resulting risk values were compared with tolerable risk criteria to make risk decisions. Finally, the risk values of seven incident scenarios were compared with each other to produce a risk ranking.
In conclusion, the newly developed Bayesian-LOPA methodology works well for an LNG importation terminal, and it can be applied in other industries, including refineries and petrochemicals. Moreover, it can be used alongside other frequency analysis methods such as Fault Tree Analysis (FTA).
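The two conjugate updates at the core of the methodology can be sketched directly; the prior parameters and plant-specific counts below are hypothetical stand-ins, not values from OREDA, EIReDA, or the thesis:

```python
# Conjugate Bayesian updates of the kind used in the Bayesian-LOPA methodology.
# All numbers are hypothetical, not taken from OREDA, EIReDA, or the thesis.

# Initiating-event frequency: Gamma(a, b) prior (rate parameterisation) with a
# Poisson likelihood. Observing x events over t years gives Gamma(a + x, b + t).
a_prior, b_prior = 2.0, 40.0        # generic-data prior: mean 0.05 events/yr
x_events, t_years = 1, 12           # plant-specific operating record
a_post, b_post = a_prior + x_events, b_prior + t_years
freq_mean = a_post / b_post
print(f"posterior mean frequency: {freq_mean:.4f} /yr")

# IPL probability of failure on demand: Beta(alpha, beta) prior with a binomial
# likelihood. Observing f failures in n demands gives Beta(alpha + f, beta + n - f).
alpha_prior, beta_prior = 1.0, 99.0  # generic prior: mean PFD 0.01
f_fail, n_demands = 0, 150           # plant-specific demand record
alpha_post, beta_post = alpha_prior + f_fail, beta_prior + n_demands - f_fail
pfd_mean = alpha_post / (alpha_post + beta_post)
print(f"posterior mean PFD: {pfd_mean:.5f}")
```

Conjugacy is what makes this practical for sparse LNG data: each update is closed-form, so generic industry priors can be refined as plant records accumulate, with no simulation required.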
|
7 |
Bayesian prediction of modulus of elasticity of self consolidated concrete. Bhattacharjee, Chandan. 15 May 2009
Current models of the modulus of elasticity, E, of concrete recommended by the
American Concrete Institute (ACI) and the American Association of State Highway and
Transportation Officials (AASHTO) are derived only for normally vibrated concrete
(NVC). Because self consolidated concrete (SCC) mixtures used today differ from NVC
in the quantities and types of constituent materials, mineral additives, and chemical
admixtures, the current models may not take into consideration the complexity of SCC,
and thus they may predict the E of SCC inaccurately. Although some authors
recommend specific models to predict the E of SCC, they include only a single variable
of assumed importance, namely the compressive strength of concrete, f'c. However, there are other parameters that may need to be accounted for when developing a
prediction model for the E of SCC. In this research, a Bayesian variable selection
method is implemented to identify the significant parameters in predicting the E of SCC
and more accurate models for the E are generated using these variables. The models
have a parsimonious parameterization for ease of use in practice and properly account
for the prevailing uncertainties.
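The abstract does not specify the exact selection method used, so the following is only a generic sketch of Bayesian variable selection by model enumeration under a Zellner g-prior, on made-up data with three candidate predictors (only the first two matter):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(7)

# Hypothetical data: the response depends on x0 and x1 but not x2.
n = 100
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

def log_bayes_factor(X_m, y, g):
    """Null-based log Bayes factor for a linear model under Zellner's g-prior."""
    n, p = X_m.shape
    Xc = X_m - X_m.mean(axis=0)          # centre covariates; intercept is implicit
    yc = y - y.mean()
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    rss = np.sum((yc - Xc @ beta) ** 2)
    r2 = 1.0 - rss / np.sum(yc ** 2)
    return 0.5 * (n - p - 1) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

# Enumerate all non-empty predictor subsets; uniform prior over models.
g = float(n)
models = [m for k in range(1, 4) for m in combinations(range(3), k)]
log_bf = np.array([log_bayes_factor(X[:, list(m)], y, g) for m in models])
post = np.exp(log_bf - log_bf.max())
post /= post.sum()
best = models[int(np.argmax(post))]
for m, p in zip(models, post):
    print(f"  {m}: {p:.3f}")
print("selected:", best)
```

The g-prior penalty shrinks the Bayes factor of models carrying spurious predictors, so the posterior concentrates on the parsimonious true subset, which mirrors the parsimony goal stated in the abstract.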
|
8 |
Bayesian learning in bioinformatics. Gold, David L. 15 May 2009
Life sciences research is advancing in breadth and scope, affecting many areas of life
including medical care and government policy. The field of Bioinformatics, in particular,
is growing very rapidly with the help of computer science, statistics, applied
mathematics, and engineering. New high-throughput technologies are making it possible
to measure genomic variation across phenotypes in organisms at costs that were
once inconceivable. In conjunction, and partly as a consequence, massive amounts
of information about the genomes of many organisms are becoming accessible in the
public domain. Some of the important and exciting questions in the post-genomics
era are how to integrate all of the information available from diverse sources.
Learning in complex systems biology requires that information be shared in a natural
and interpretable way, to integrate knowledge and data. The statistical sciences can
support the advancement of learning in Bioinformatics in many ways, not the least
of which is by developing methodologies that can support the synchronization of efforts
across sciences, offering real-time learning tools that can be shared across many
fields, from basic science to clinical applications. This research is an introduction to several current research problems in Bioinformatics that address the integration of information, and it discusses statistical methodologies from the Bayesian school of
thought that may be applied. Bayesian statistical methodologies are proposed to integrate biological knowledge and
improve statistical inference for three relevant Bioinformatics applications: gene expression
arrays, BAC and aCGH arrays, and real-time gene expression experiments.
A unified Bayesian model is proposed to perform detection of genes and gene classes,
defined from historical pathways, with gene expression arrays. A novel Bayesian statistical
method is proposed to infer chromosomal copy number aberrations in clinical
populations with BAC or aCGH experiments. A theoretical model is proposed, motivated
from historical work in mathematical biology, for inference with real-time gene
expression experiments, and fit with Bayesian methods. Simulation and case studies
show that Bayesian methodologies show great promise to improve the way we learn
with high-throughput Bioinformatics experiments.
|
9 |
Bayesian methods in bioinformatics. Baladandayuthapani, Veerabhadran. 25 April 2007
This work is directed towards developing flexible Bayesian statistical methods in the semi- and nonparametric regression modeling framework, with special focus on
analyzing data from biological and genetic experiments. This dissertation attempts to
solve two such problems in this area. In the first part, we study penalized regression
splines (P-splines), which are low-order basis splines with a penalty to avoid undersmoothing. Such P-splines are typically not spatially adaptive, and hence can have
trouble when functions are varying rapidly. We model the penalty parameter inherent
in the P-spline method as a heteroscedastic regression function. We develop a full
Bayesian hierarchical structure to do this and use Markov chain Monte Carlo techniques for drawing random samples from the posterior for inference. We show that
the approach achieves very competitive performance as compared to other methods.
The second part focuses on modeling DNA microarray data. Microarray technology
enables us to monitor the expression levels of thousands of genes simultaneously and
hence to obtain a better picture of the interactions between the genes. In order to
understand the biological structure underlying these gene interactions, we present a
hierarchical nonparametric Bayesian model based on Multivariate Adaptive Regression Splines (MARS) to capture the functional relationship between genes and also
between genes and disease status. The novelty of the approach lies in the attempt to
capture the complex nonlinear dependencies between the genes which could otherwise
be missed by linear approaches. The Bayesian model is flexible enough to identify
significant genes of interest as well as model the functional relationships between the
genes. The effectiveness of the proposed methodology is illustrated on leukemia and
breast cancer datasets.
|