111

Understanding the genetic basis of complex polygenic traits through Bayesian model selection of multiple genetic models and network modeling of family-based genetic data

Bae, Harold Taehyun 12 March 2016 (has links)
The global aim of this dissertation is to develop advanced statistical modeling to understand the genetic basis of complex polygenic traits. To achieve this goal, the dissertation focuses on the development of (i) a novel methodology to detect genetic variants with different inheritance patterns, formulated as a Bayesian model selection problem, (ii) integration of genetic and non-genetic data to dissect genotype-phenotype associations using Bayesian networks with family-based data, and (iii) an efficient technique to model family-based data in the Bayesian framework. In the first part of my dissertation, I present a coherent Bayesian framework for selecting the most likely of the five genetic models (genotypic, additive, dominant, co-dominant, and recessive) used in genetic association studies. The approach uses a polynomial parameterization of genetic data to fit the five models simultaneously and reduce computation. I provide a closed-form expression of the marginal likelihood for normally distributed data, and evaluate the performance of the proposed and existing methods on simulated and real genome-wide data sets. The second part presents an integrative analytic approach that uses Bayesian networks to represent the complex probabilistic dependency structure among many variables in family-based data. I propose a parameterization that extends mixed effects regression models to Bayesian networks by using random effects as additional nodes of the networks to model between-subject correlations. I also present simulation studies comparing different model selection metrics for mixed models that can be used to learn Bayesian networks from correlated data, and an application of this methodology to real data from a large family-based study. In the third part of the dissertation, I describe an efficient way to account for family structure in Bayesian inference Using Gibbs Sampling (BUGS). In linear mixed models, the random effects vector has a variance-covariance matrix whose dimension is as large as the sample size, and directly handling this multivariate normal distribution is not computationally feasible in BUGS. I therefore propose a decomposition of the multivariate normal distribution into univariate normal distributions using singular value decomposition, and present its implementation in BUGS.
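To illustrate the idea behind the third contribution, the sketch below (Python/NumPy, with an invented kinship-style covariance matrix; not code from the dissertation) shows how the SVD of a random-effects covariance lets a correlated multivariate normal be generated from independent univariate normals, which is the reparameterization that makes the model tractable in BUGS.

```python
import numpy as np

# Hypothetical kinship-style covariance for a four-member family (any symmetric
# positive semi-definite matrix works the same way).
K = np.array([[1.00, 0.50, 0.50, 0.25],
              [0.50, 1.00, 0.25, 0.50],
              [0.50, 0.25, 1.00, 0.50],
              [0.25, 0.50, 0.50, 1.00]])

# For a symmetric PSD matrix the SVD coincides with the eigendecomposition,
# K = U diag(d) U^T, so a random effect u ~ N(0, sigma^2 * K) can be written as
# u = U diag(sqrt(d)) z with independent univariate z_i ~ N(0, sigma^2).
U, d, _ = np.linalg.svd(K)
A = U @ np.diag(np.sqrt(d))

rng = np.random.default_rng(0)
sigma = 1.3
z = rng.normal(0.0, sigma, size=K.shape[0])   # only univariate normals are needed
u = A @ z                                     # has covariance sigma^2 * K

# Monte Carlo check that the implied covariance matches sigma^2 * K.
Z = rng.normal(0.0, sigma, size=(200_000, K.shape[0]))
print(np.round(np.cov(Z @ A.T, rowvar=False), 2))
print(np.round(sigma**2 * K, 2))
```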
112

Non-parametric Bayesian models for structured output prediction

Bratières, Sébastien January 2018 (has links)
Structured output prediction is a machine learning task in which an input object is assigned not just a single class, as in classification, but multiple, interdependent labels. This means that the presence or value of a given label affects the other labels, for instance in text labelling problems, where output labels are applied to each word and their interdependencies must be modelled. Non-parametric Bayesian (NPB) techniques are probabilistic modelling methods with the interesting property of allowing model capacity to grow, in a controllable way, with data complexity, while maintaining the advantages of Bayesian modelling. In this thesis, we develop NPB algorithms to solve structured output problems. We first study a map-reduce implementation of a stochastic inference method designed for the infinite hidden Markov model, applied to a computational linguistics task, part-of-speech tagging. We show that mainstream map-reduce frameworks do not easily support highly iterative algorithms. The main contribution of this thesis is a conceptually novel discriminative model, GPstruct. It is motivated by labelling tasks, and combines attractive properties of conditional random fields (CRF), structured support vector machines, and Gaussian process (GP) classifiers. In probabilistic terms, GPstruct combines a CRF likelihood with a GP prior on factors; it can also be described as a Bayesian kernelized CRF. To train this model, we develop a Markov chain Monte Carlo algorithm based on elliptical slice sampling and investigate its properties. We then validate it in experiments on real data, exploring two topologies: sequence output with text labelling tasks, and grid output with semantic segmentation of images. The latter case poses scalability issues, which are addressed using likelihood approximations and an ensemble method that allows distributed inference and prediction. The experimental validation demonstrates that (a) the model is flexible and its constituent parts are modular and easy to engineer; (b) predictive performance and, most crucially, the probabilistic calibration of predictions are better than or equal to those of competitor models; and (c) model hyperparameters can be learnt from data.
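The elliptical slice sampler used to train GPstruct is a generic MCMC update for latent variables with a Gaussian prior. A minimal sketch of the update, following Murray, Adams and MacKay's construction, is shown below on a toy Gaussian-likelihood problem; the CRF factor likelihood of GPstruct would simply replace the `log_lik` function, and everything else here is an illustrative assumption.

```python
import numpy as np

def elliptical_slice(f, chol_sigma, log_lik, rng):
    """One elliptical slice sampling update for f with prior N(0, Sigma).

    chol_sigma: lower-triangular Cholesky factor of the prior covariance.
    log_lik:    function returning the log-likelihood of a state vector.
    """
    nu = chol_sigma @ rng.standard_normal(len(f))       # prior draw defining the ellipse
    log_y = log_lik(f) + np.log(rng.uniform())          # slice threshold
    theta = rng.uniform(0.0, 2.0 * np.pi)
    theta_min, theta_max = theta - 2.0 * np.pi, theta
    while True:
        f_new = f * np.cos(theta) + nu * np.sin(theta)  # point on the ellipse
        if log_lik(f_new) > log_y:
            return f_new
        # shrink the angle bracket towards the current state and retry
        if theta < 0.0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)

# Toy usage: squared-exponential GP prior, Gaussian likelihood on noisy observations.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
Sigma = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.2) ** 2) + 1e-8 * np.eye(30)
L = np.linalg.cholesky(Sigma)
y = np.sin(6 * x) + 0.3 * rng.standard_normal(30)
log_lik = lambda f: -0.5 * np.sum((y - f) ** 2) / 0.3 ** 2

f = np.zeros(30)
for _ in range(2000):
    f = elliptical_slice(f, L, log_lik, rng)
```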
113

Robust variational Bayesian clustering for underdetermined speech separation

Zohny, Zeinab Y. January 2016 (has links)
The main focus of this thesis is the enhancement of the statistical framework employed for underdetermined time-frequency (T-F) masking blind separation of speech. While humans are capable of extracting a speech signal of interest in the presence of interference and noise, actual speech recognition systems and hearing aids cannot match this psychoacoustic ability: they perform well in noise-free, reverberation-free environments but suffer in realistic environments. Time-frequency masking algorithms based on computational auditory scene analysis attempt to separate multiple sound sources from only two reverberant stereo mixtures. They rely on the sparsity that binaural cues exhibit in the time-frequency domain to generate masks that extract individual sources from their corresponding spectrogram points, thereby addressing the problem of underdetermined convolutive speech separation. Statistically, this can be interpreted as a classical clustering problem. For analytical simplicity, a finite mixture of Gaussian distributions is commonly used in T-F masking algorithms to model interaural cues. Such a model is, however, sensitive to outliers; a robust probabilistic model based on the Student's t-distribution is therefore first proposed to improve the robustness of the statistical framework. Compared with the Gaussian distribution, this heavy-tailed distribution can better capture outlier values and thereby lead to more accurate probabilistic masks for source separation. This non-Gaussian approach is applied to the state-of-the-art MESSL algorithm, and comparative studies confirm the improved separation quality. A Bayesian clustering framework that can better model uncertainties in reverberant environments is then exploited to replace the conventional expectation-maximization (EM) algorithm within a maximum likelihood estimation (MLE) framework. A variational Bayesian (VB) approach is applied to the MESSL algorithm to cluster interaural phase differences, thereby avoiding the drawbacks of MLE, in particular the possible presence of singularities; experimental results confirm an improvement in separation performance. Finally, the joint modelling of interaural phase and level differences, and the integration of their non-Gaussian modelling within a variational Bayesian framework, is proposed. This approach combines the advantages of the robust estimation provided by the Student's t-distribution and the robust clustering inherent in the Bayesian approach. In other words, this general framework avoids the difficulties associated with MLE and uses the heavy-tailed Student's t-distribution to improve the estimation of the soft probabilistic masks at various reverberation times, particularly for sources in close proximity. An extensive set of simulation studies comparing the proposed approach with other T-F masking algorithms under different scenarios shows a significant improvement in terms of objective and subjective performance measures.
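The case for the Student's t-distribution is its robustness to outliers. The sketch below (synthetic one-dimensional data standing in for interaural cues, not the MESSL pipeline) contrasts Gaussian and Student's t maximum-likelihood fits to contaminated data: the Gaussian scale is inflated by the outliers, while the heavy-tailed fit stays close to the clean cluster.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy stand-in for interaural phase differences from one source: mostly
# concentrated values plus a fraction of outliers from reverberation and
# competing sources (assumption: a 1-D illustration, not real T-F cues).
clean = rng.normal(loc=0.4, scale=0.1, size=950)
outliers = rng.uniform(-np.pi, np.pi, size=50)
data = np.concatenate([clean, outliers])

# Gaussian fit: the scale estimate is inflated by the outliers.
mu_g, sd_g = stats.norm.fit(data)

# Student's t fit: heavy tails absorb the outliers, so location and scale
# remain close to the clean cluster.
df_t, loc_t, scale_t = stats.t.fit(data)

print(f"Gaussian    : mean={mu_g:.3f}  std={sd_g:.3f}")
print(f"Student's t : nu={df_t:.1f}  loc={loc_t:.3f}  scale={scale_t:.3f}")
```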
114

Optimisation and Bayesian optimality

Joyce, Thomas January 2016 (has links)
This doctoral thesis presents the results of work on optimisation algorithms. We first give a detailed exploration of the problems involved in comparing optimisation algorithms. In particular, we provide extensions and refinements to no free lunch results, exploring algorithms with arbitrary stopping conditions, optimisation under restricted metrics, parallel computing and free lunches, and head-to-head minimax behaviour. We also characterise no free lunch results in terms of order statistics. We then ask what really constitutes understanding of an optimisation algorithm, and argue that one central part of understanding an optimiser is knowing its Bayesian prior and cost function. We then pursue a general Bayesian framing of optimisation, and prove that this Bayesian perspective is applicable to all optimisers, and that even seemingly non-Bayesian optimisers can be understood in this way. Specifically, we prove that arbitrary optimisation algorithms can be represented as a prior and a cost function. We examine the relationship between the Kolmogorov complexity of the optimiser and the Kolmogorov complexity of its corresponding prior. We also extend our results from deterministic optimisers to stochastic and forgetful optimisers, and we show that selecting a prior uniformly at random is not equivalent to selecting an optimisation behaviour uniformly at random. Lastly, we consider how best to gain a Bayesian understanding of real optimisation algorithms. We use the developed Bayesian framework to explore the effects of some common approaches to constructing meta-heuristic optimisation algorithms, such as on-line parameter adaptation, and conclude by exploring an approach to uncovering the probabilistic beliefs of optimisers with a "shattering" method.
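The no free lunch statement that all optimisers perform identically when averaged over every objective function can be checked by brute force on a tiny domain. The enumeration below is an illustrative toy with two fixed, non-adaptive search orders, not a reproduction of the thesis's formal results.

```python
from itertools import product
from collections import Counter

# Tiny search space and value set so every objective function can be enumerated.
X = [0, 1, 2, 3]          # candidate solutions
Y = [0, 1, 2]             # possible objective values (lower is better)

def best_after_k(order, f, k):
    """Best value found after k evaluations when visiting points in 'order'."""
    return min(f[x] for x in order[:k])

# Two "optimisers" represented as fixed visiting orders.
order_a = [0, 1, 2, 3]
order_b = [3, 1, 0, 2]

for k in range(1, len(X) + 1):
    hist_a, hist_b = Counter(), Counter()
    # Average over all functions f: X -> Y.
    for values in product(Y, repeat=len(X)):
        f = dict(zip(X, values))
        hist_a[best_after_k(order_a, f, k)] += 1
        hist_b[best_after_k(order_b, f, k)] += 1
    # The performance histograms are identical for both optimisers at every k.
    print(k, hist_a == hist_b, dict(hist_a))
```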
115

Model Selection and Parameter Estimation in the Chemotherapeutic Treatment of Tumors via Bayesian Inference

MATA, A. M. M. 21 July 2017 (has links)
Cancer is a disease caused by the disordered growth of cells. Antineoplastic chemotherapy is commonly used to treat the most prevalent cancers. In this context, research has turned to mathematical models that describe the growth of tumor cells under the action of a chemotherapeutic drug. Given the variety of models available in the literature for this purpose, a method for selecting the most adequate model is needed. This dissertation studies mathematical models of tumor treatment and applies Approximate Bayesian Computation (ABC) to select the model that best represents the observed data. The ABC algorithm used was deterministic, prioritizing model selection. The SIR particle filter was then applied to the selected model, allowing the parameter estimates to be refined. Tumor growth models based on ordinary differential equations were studied, with the parameters assumed to be constant. The models were built on two-compartment pharmacokinetics, which allows the study of antineoplastic drugs administered orally. In addition, well-known tumor growth formulations were used, with the addition of a term for the influence of a single dose of the chemotherapeutic drug.
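A minimal sketch of ABC rejection for choosing between two standard tumor-growth ODEs (logistic and Gompertz) is given below; the drug effect and the two-compartment pharmacokinetics studied in the dissertation are omitted, and the data, priors, and tolerance are synthetic assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
t_obs = np.linspace(0.0, 30.0, 16)

def logistic(t, N, r, K): return [r * N[0] * (1.0 - N[0] / K)]
def gompertz(t, N, r, K): return [r * N[0] * np.log(K / N[0])]

def simulate(model, r, K, N0=10.0):
    sol = solve_ivp(model, (0.0, 30.0), [N0], t_eval=t_obs, args=(r, K))
    return sol.y[0]

# Synthetic "observed" tumor sizes generated from the Gompertz model plus noise.
y_obs = simulate(gompertz, r=0.15, K=500.0) + rng.normal(0.0, 10.0, t_obs.size)

# ABC rejection: draw (model, parameters) from the prior, keep draws whose
# simulated trajectory is close to the data; the kept model labels approximate
# the posterior model probabilities.
models = {0: logistic, 1: gompertz}
kept = []
for _ in range(5000):
    m = rng.integers(0, 2)            # uniform prior over the two models
    r = rng.uniform(0.01, 0.5)        # vague priors over the growth parameters
    K = rng.uniform(100.0, 1000.0)
    dist = np.sqrt(np.mean((simulate(models[m], r, K) - y_obs) ** 2))
    if dist < 15.0:                   # tolerance (tuning assumption)
        kept.append(m)

kept = np.array(kept)
for m, name in [(0, "logistic"), (1, "gompertz")]:
    print(name, np.mean(kept == m) if kept.size else float("nan"))
```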
116

Bayesian M/EEG source localization with possible joint skull conductivity estimation

Costa, Facundo Hernan 02 March 2017 (has links) (PDF)
M/EEG techniques make it possible to determine changes in brain activity, which is useful in diagnosing brain disorders such as epilepsy. They consist of measuring the electric potential at the scalp and the magnetic field around the head. The measurements are related to the underlying brain activity by a linear model that depends on the lead-field matrix. Localizing the sources, or dipoles, of M/EEG measurements consists of inverting this linear model. However, the non-uniqueness of the solution (due to the fundamental law of physics) and the low number of dipoles make the inverse problem ill-posed. Solving such a problem requires some form of regularization to reduce the search space. The literature abounds with methods and techniques to solve this problem, especially variational approaches. This thesis develops Bayesian methods to solve ill-posed inverse problems, with application to M/EEG. The main idea underlying this work is to constrain the sources to be sparse, a hypothesis that is valid in many applications such as certain types of epilepsy. We develop different hierarchical models to account for the sparsity of the sources. Theoretically, enforcing sparsity is equivalent to minimizing a cost function penalized by an l0 pseudo-norm of the solution. However, since l0 regularization leads to NP-hard problems, the l1 approximation is usually preferred. Our first contribution consists of combining the two norms in a Bayesian framework, using a Bernoulli-Laplace prior. A Markov chain Monte Carlo (MCMC) algorithm is used to estimate the parameters of the model jointly with the source locations and intensities. In several scenarios, comparing the results with those obtained with sLORETA and weighted l1-norm regularization shows interesting performance, at the price of higher computational complexity. Our Bernoulli-Laplace model solves the source localization problem at a single time instant. However, it is well known biophysically that brain activity follows spatiotemporal patterns, so exploiting the temporal dimension is an interesting way to further constrain the problem. Our second contribution consists of formulating a structured sparsity model to exploit this biophysical phenomenon. Precisely, a multivariate Bernoulli-Laplacian distribution is proposed as an a priori distribution for the dipole locations. A latent variable is introduced to handle the resulting complex posterior, and an original Metropolis-Hastings sampling algorithm is developed. The results show that the proposed sampling technique improves convergence significantly. A comparative analysis is performed between the proposed model, an l21 mixed-norm regularization, and the Multiple Sparse Priors (MSP) algorithm. Various experiments are conducted with synthetic and real data, and the results show that our model has several advantages, including better recovery of the dipole locations. The previous two algorithms consider a fully known lead-field matrix. However, this is seldom the case in practical applications; instead, this matrix results from approximation methods that introduce significant uncertainties. Our third contribution consists of handling the uncertainty of the lead-field matrix. The proposed method expresses this matrix as a function of the skull conductivity using a polynomial matrix interpolation technique, the conductivity being considered the main source of uncertainty in the lead-field matrix. Our multivariate Bernoulli-Laplacian model is then extended to estimate the skull conductivity jointly with the brain activity. The resulting model is compared to other methods, including the techniques of Vallaghé et al. and Guttierez et al. Our method provides results of better quality without requiring knowledge of the active dipole positions and is not limited to a single dipole activation.
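As a point of reference for the weighted l1 comparator mentioned above, the sketch below runs plain iterative soft-thresholding (ISTA) on a random toy lead-field matrix; it is not the proposed Bernoulli-Laplace sampler, and every dimension and parameter is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_dipoles = 32, 200

# Toy lead-field matrix and a sparse ground-truth source configuration.
L = rng.standard_normal((n_sensors, n_dipoles))
x_true = np.zeros(n_dipoles)
x_true[[20, 75, 150]] = [4.0, -3.0, 2.5]
y = L @ x_true + 0.05 * rng.standard_normal(n_sensors)

def ista(L, y, lam, n_iter=500):
    """Iterative soft-thresholding for min_x 0.5*||y - Lx||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(L, 2) ** 2       # inverse Lipschitz constant of the gradient
    x = np.zeros(L.shape[1])
    for _ in range(n_iter):
        g = L.T @ (L @ x - y)                    # gradient of the data-fit term
        z = x - step * g
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft threshold
    return x

x_hat = ista(L, y, lam=0.5)
print("recovered active dipoles:", np.flatnonzero(np.abs(x_hat) > 0.5))
```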
117

Statistical methods & algorithms for autonomous immunoglobulin repertoire analysis

Norwood, Katherine Frances 13 January 2021 (has links)
Investigating the immunoglobulin repertoire is a means of understanding the adaptive immune response to infectious disease or vaccine challenge. The data examined are typically generated using high-throughput sequencing on samples of immunoglobulin variable-region genes present in blood or tissue collected from human or animal subjects. The analysis of these large, diverse collections provides a means of gaining insight into the specific molecular mechanisms involved in generating and maintaining a protective immune response. It involves the characterization of distinct clonal populations, specifically through the inference of founding alleles for germline gene segment recombination, as well as the lineage of accumulated mutations acquired during the development of each clone. Germline gene segment inference is currently performed by aligning immunoglobulin sequencing reads against an external reference database and assigning each read to the entry that provides the best score according to the metric used. The problem with this approach is that allelic diversity is greater than can be usefully accommodated in a static database. The absence of the alleles used from the database often leads to the misclassification of single-nucleotide polymorphisms as somatic mutations acquired during affinity maturation. This trend is especially evident with the rhesus macaque, but also affects the comparatively well-catalogued human databases, whose collections are biased towards samples from individuals of European descent. Our project presents novel statistical methods for immunoglobulin repertoire analysis which allow for the de novo inference of germline gene segment libraries directly from next-generation sequencing data, without the need for external reference databases. These methods follow a Bayesian paradigm, which uses an information-theoretic modelling approach to iteratively improve upon internal candidate gene segment libraries. Both candidate libraries and trial analyses given those libraries are incorporated as components of the machine learning evaluation procedure, allowing for the simultaneous optimization of model accuracy and simplicity. Finally, the proposed methods are evaluated using synthetic data designed to mimic known mechanisms for repertoire generation, with pre-designated parameters. We also apply these methods to known biological sources with unknown repertoire generation parameters, and conclude with a discussion on how this method can be used to identify potential novel alleles.
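The misclassification problem described above can be made concrete with a toy best-match assignment: when the subject's germline allele is missing from the reference database, its polymorphisms inflate the apparent somatic mutation count. All sequences and allele names below are invented for illustration.

```python
# Toy illustration: the subject's true germline allele is absent from the
# reference database, so germline polymorphisms are counted as somatic mutations.
reference_db = {
    "IGHV-A*01": "CAGGTGCAGCTGGTGCAG",
    "IGHV-A*02": "CAGGTCCAGCTGGTGCAG",
}
true_germline = "CAGGTGCAGTTGGTGCAA"          # novel allele, absent from the database

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def assign(read, db):
    """Assign a read to the reference allele with the smallest Hamming distance."""
    best = min(db, key=lambda name: hamming(read, db[name]))
    return best, hamming(read, db[best])

# A read carrying exactly one real somatic mutation on top of the novel allele.
somatic_read = true_germline[:9] + "A" + true_germline[10:]

allele, apparent = assign(somatic_read, reference_db)
print(f"assigned to {allele}, apparent somatic mutations: {apparent}")    # inflated count
print(f"true somatic mutations: {hamming(somatic_read, true_germline)}")  # 1
```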
118

Adaptive methods for Bayesian time-to-event point-of-care clinical trials

Leatherman, Sarah Michelle 22 January 2016 (has links)
Point-of-care clinical trials are randomized clinical trials designed to maximize pragmatic design features. The goal is to integrate research into standard care so that the burden of research, including recruitment, randomization, and study visits, is minimized for patient and physician. When possible, these studies employ Bayesian adaptive methods and data collection through the medical record. Due to the passive and adaptive nature of these trials, a number of unique challenges may arise over the course of a study. In this dissertation, adaptive methodology for Bayesian time-to-event clinical trials is developed and evaluated for studies with limited censoring. Use of a normal approximation to the study parameter likelihood is proposed for trials in which the likelihood is not normally distributed, and is assessed with respect to frequentist type I and type II errors. A previously developed method for choosing a normal prior distribution for analysis is applied, with modifications to allow for adaptive randomization. This method of prior selection, in conjunction with the normal parameter likelihood, is used to estimate future data for the purpose of predicting study success. A previously published method for future event estimation is modified to allow for adaptive randomization and inclusion of prior information. The accuracy of this method is evaluated against final study numbers under a range of study designs and parameter likelihood assumptions. With these future estimates, we predict study conclusions by calculating predicted probabilities of the study outcome and compare them to actual study conclusions. The reliability of this method is evaluated with respect to prior distribution choice, study design, and use of an incorrect likelihood for analysis. The normal approximation to non-normally distributed data performs well here and is reliable when the underlying likelihood is known. The choice of analytic prior distribution agrees with previously published results when equal allocation is forced, but changes depending on the severity of adaptive allocation. The performance of event estimation and prediction varies, but can provide reliable estimates after only 25 subjects have been observed. Analysis and prediction can reliably be carried out in point-of-care studies when care is taken to ensure assumptions are reasonable.
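A minimal sketch of the normal approximation is shown below: a normal likelihood for a log hazard ratio combined with a normal analysis prior gives a closed-form posterior, and a crude simulation of future data gives a predicted probability of study success. All numbers are invented, and the prediction step is a simplification of the methods developed in the dissertation.

```python
import numpy as np
from scipy import stats

# Normal approximation to the likelihood of a log hazard ratio:
# theta_hat ~ N(theta, se^2), with a hypothetical interim estimate and SE.
theta_hat, se = -0.25, 0.15          # interim log-HR estimate and standard error
prior_mean, prior_sd = 0.0, 1.0      # weakly informative analysis prior

# Conjugate normal-normal update.
post_prec = 1.0 / prior_sd**2 + 1.0 / se**2
post_var = 1.0 / post_prec
post_mean = post_var * (prior_mean / prior_sd**2 + theta_hat / se**2)

# Posterior probability that treatment reduces the hazard (theta < 0).
p_benefit = stats.norm.cdf(0.0, loc=post_mean, scale=np.sqrt(post_var))
print(f"posterior mean {post_mean:.3f}, sd {np.sqrt(post_var):.3f}, P(theta<0) = {p_benefit:.3f}")

# Crude predictive check of "study success": draw theta from the posterior,
# simulate a final estimate at an assumed end-of-study precision, and ask how
# often the end-of-study criterion P(theta < 0) > 0.975 would be met.
rng = np.random.default_rng(0)
final_se = 0.10                                   # assumed precision at study end
theta_draw = rng.normal(post_mean, np.sqrt(post_var), 100_000)
final_hat = rng.normal(theta_draw, final_se)
final_post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / final_se**2)
final_post_mean = final_post_var * (prior_mean / prior_sd**2 + final_hat / final_se**2)
success = stats.norm.cdf(0.0, final_post_mean, np.sqrt(final_post_var)) > 0.975
print("predicted probability of success:", success.mean())
```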
119

Bayesian Network Analysis for Diagnostics and Prognostics of Engineering Systems

Banghart, Marc D 11 August 2017 (has links)
Bayesian networks have been applied to many different domains to perform prognostics, reduce risk, and ultimately improve decision making. However, these methods have not been applied to military field and human performance data sets in an industrial environment. Existing methods frequently rely on a clear understanding of the causal connections leading to an undesirable event and a detailed understanding of system behavior. They may also require large teams of analysts and domain experts, coupled with manual data cleansing and classification. The research performed utilized machine learning algorithms (such as Bayesian networks) and two existing data sets. The primary objective was to develop a diagnostic and prognostic tool utilizing Bayesian networks that does not require a detailed causal understanding of the underlying system. The research yielded a predictive method with substantial benefits over reactive methods. It indicated that Bayesian networks can be trained and utilized to predict failure of several important components, including potential malfunction codes and downtime, on a real-world Navy data set. The research also considered potential error within the training data set, and the results lent credence to the utilization of Bayesian networks on real field data, which will always contain error that is not easily quantified. The research should be replicated with additional field data sets from other aircraft. Future research should solicit and incorporate domain expertise into subsequent models, and should also consider incorporation of text-based analytics for text fields, which was considered out of scope for this research project.
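As a minimal illustration of the approach (the Navy data set is not public, so the records below are invented), a small discrete Bayesian network with two parents of a malfunction node can be learned simply by estimating its conditional probability table from counts.

```python
import pandas as pd

# Toy stand-in for maintenance records: each row is a sortie with a usage level,
# operating environment, and whether a malfunction code was logged afterwards.
df = pd.DataFrame({
    "usage":       ["high", "high", "high", "high", "high", "high",
                    "low",  "low",  "low",  "low",  "low",  "low"],
    "environment": ["sea",  "sea",  "sea",  "land", "land", "land",
                    "sea",  "sea",  "sea",  "land", "land", "land"],
    "malfunction": [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
})

# A two-parent discrete Bayesian network: usage -> malfunction <- environment.
# "Training" here amounts to estimating P(malfunction | usage, environment)
# from the observed frequencies.
cpt = df.groupby(["usage", "environment"])["malfunction"].mean()
print(cpt)

# Inference: predicted malfunction probability for a planned high-usage sea sortie.
print("P(malfunction | high, sea) =", cpt.loc[("high", "sea")])
```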
120

Using Box-Scores to Determine a Position's Contribution to Winning Basketball Games

Page, Garritt L. 16 August 2005 (has links) (PDF)
Basketball is a sport that has become increasingly popular worldwide. At the professional level, it is a game in which each of the five positions has a specific responsibility that requires unique skills. It would likely be valuable for coaches to know which skills for each position are most conducive to winning. Knowing which skills to develop for each position could help coaches optimize each player's ability by customizing practice to contain drills that develop the most important skills for that position, in turn improving the team's overall ability. Through the use of Bayesian hierarchical modeling and NBA box-score performance categories, this project determines how each position needs to perform in order for its team to be successful.
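A minimal sketch of the kind of Bayesian hierarchical model involved is given below: a Gibbs sampler for position-level means with partial pooling, run on synthetic data. The actual box-score categories, model structure, and priors used in the project are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: y[p] holds a per-game "contribution to winning"
# measure for position p across an 82-game season (invented values).
true_means = {"PG": 2.0, "SG": 1.5, "SF": 1.0, "PF": 2.5, "C": 3.0}
sigma = 4.0                                    # within-position game-to-game noise (treated as known)
y = {p: rng.normal(m, sigma, size=82) for p, m in true_means.items()}

# Hierarchical model: y_pg ~ N(theta_p, sigma^2), theta_p ~ N(mu, tau^2),
# flat prior on mu; tau fixed for simplicity. Gibbs sampling with conjugate updates.
tau = 2.0
positions = list(y)
theta = np.zeros(len(positions))
mu = 0.0
draws = []
for it in range(5000):
    for j, p in enumerate(positions):
        prec = y[p].size / sigma**2 + 1.0 / tau**2
        mean = (y[p].sum() / sigma**2 + mu / tau**2) / prec
        theta[j] = rng.normal(mean, np.sqrt(1.0 / prec))
    mu = rng.normal(theta.mean(), tau / np.sqrt(len(positions)))
    if it >= 1000:                              # discard burn-in draws
        draws.append(theta.copy())

draws = np.array(draws)
for j, p in enumerate(positions):
    lo, hi = np.percentile(draws[:, j], [2.5, 97.5])
    print(f"{p}: posterior mean {draws[:, j].mean():.2f}, 95% credible interval [{lo:.2f}, {hi:.2f}]")
```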
