201

Probabilistic inference for phrase-based machine translation : a sampling approach

Arun, Abhishek January 2011 (has links)
Recent advances in statistical machine translation (SMT) have used dynamic programming (DP) based beam search methods for approximate inference within probabilistic translation models. Despite their success, these methods compromise the probabilistic interpretation of the underlying model, thus limiting the application of probabilistically defined decision rules during training and decoding. As an alternative, in this thesis, we propose a novel Monte Carlo sampling approach for theoretically sound approximate probabilistic inference within these models. The distribution we are interested in is the conditional distribution of a log-linear translation model; however, there is often no tractable way of computing the normalisation term of the model. Instead, a Gibbs sampling approach for phrase-based machine translation models is developed which obviates the need to compute this term yet produces samples from the required distribution. We establish that the sampler effectively explores the distribution defined by a phrase-based model by showing that it converges in a reasonable amount of time to the desired distribution, irrespective of initialisation. Empirical evidence is provided to confirm that the sampler can provide accurate estimates of expectations of functions of interest. The mix of high-probability and low-probability derivations obtained through sampling is shown to provide a more accurate estimate of expectations than merely using the n most probable derivations. Subsequently, we show that the sampler provides a tractable solution for finding the maximum probability translation in the model. We also present a unified approach to approximating two additional intractable problems: minimum risk training and minimum Bayes risk decoding. Key to our approach is the use of the sampler, which allows us to explore the entire probability distribution and maintain a strict probabilistic formulation throughout the translation pipeline. For these tasks, sampling combines the simplicity of n-best list approaches with the extended view of the distribution that lattice-based approaches benefit from, while avoiding the biases associated with beam search. Our approach is theoretically well motivated and can give better and more stable results than current state-of-the-art methods.
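The key point of such a Gibbs sampler is that each update only requires local, unnormalised scores, so the intractable global normalisation term never needs to be computed. The following is a minimal sketch of that idea on a toy log-linear model over a sequence of discrete choices; the weights, sizes, and "phrase options" are hypothetical and this is not the thesis's phrase-based sampler.

```python
# Gibbs sampling from an unnormalised log-linear distribution: each position is
# resampled from its local conditional, so the global constant Z cancels out.
import math
import random

random.seed(0)

K = 4          # hypothetical number of candidate "phrase options" per position
N = 6          # hypothetical sentence length (positions)
# Hypothetical log-linear weights: score of option k at position i, plus a
# simple agreement bonus between neighbouring positions.
unary = [[random.gauss(0, 1) for _ in range(K)] for _ in range(N)]

def log_score(state):
    """Unnormalised log-probability of a full configuration."""
    s = sum(unary[i][state[i]] for i in range(N))
    s += sum(0.5 for i in range(N - 1) if state[i] == state[i + 1])
    return s

def gibbs_step(state):
    """Resample each position from its local conditional distribution."""
    for i in range(N):
        logps = []
        for k in range(K):
            state[i] = k
            logps.append(log_score(state))
        m = max(logps)
        ws = [math.exp(lp - m) for lp in logps]        # local normalisation only
        total = sum(ws)
        r, acc = random.random() * total, 0.0
        for k, w in enumerate(ws):
            acc += w
            if r <= acc:
                state[i] = k
                break
    return state

# Collect samples and estimate an expectation (e.g. how often option 0 is used).
state = [0] * N
samples = []
for it in range(2000):
    state = gibbs_step(state)
    if it > 200:                                       # discard burn-in
        samples.append(list(state))
usage = sum(s.count(0) for s in samples) / (len(samples) * N)
print("estimated frequency of option 0:", round(usage, 3))
```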
202

Transiting exoplanets : characterisation in the presence of stellar activity

Alapini Odunlade, Aude Ekundayo Pauline January 2010 (has links)
The combined observations of a planet’s transits and the radial velocity variations of its host star allow the determination of the planet’s orbital parameters, and most interestingly of its radius and mass, and hence its mean density. Observed densities provide important constraints to planet structure and evolution models. The uncertainties on the parameters of large exoplanets mainly arise from those on stellar masses and radii. For small exoplanets, the treatment of stellar variability limits the accuracy of the derived parameters. The goal of this PhD thesis was to reduce these sources of uncertainty by developing new techniques for stellar variability filtering and for the determination of stellar temperatures, and by robustly fitting the transits taking into account external constraints on the planet’s host star. To this end, I developed the Iterative Reconstruction Filter (IRF), a new post-detection stellar variability filter. By exploiting the prior knowledge of the planet’s orbital period, it simultaneously estimates the transit signal and the stellar variability signal, using a combination of moving average and median filters. The IRF was tested on simulated CoRoT light curves, where it significantly improved the estimate of the transit signal, particularly in the case of light curves with strong stellar variability. It was then applied to the light curves of the first seven planets discovered by CoRoT, a space mission designed to search for planetary transits, to obtain refined estimates of their parameters. As the IRF preserves all signal at the planet’s orbital period, it can also be used to search for secondary eclipses and orbital phase variations in the most promising cases. This enabled the detection of the secondary eclipses of CoRoT-1b and CoRoT-2b in the white (300–1000 nm) CoRoT bandpass, as well as a marginal detection of CoRoT-1b’s orbital phase variations. The wide optical bandpass of CoRoT limits the distinction between thermal emission and reflected light contributions to the secondary eclipse. I developed a method to derive precise stellar relative temperatures using equivalent width ratios and applied it to the host stars of the first eight CoRoT planets. For stars with temperatures within the calibrated range, the derived temperatures are consistent with the literature, but have smaller formal uncertainties. I then used a Markov chain Monte Carlo technique to explore the correlations between planet parameters derived from transits, and the impact of external constraints (e.g. the spectroscopically derived stellar temperature, which is linked to the stellar density). Globally, this PhD thesis highlights, and in part addresses, the complexity of performing detailed characterisation of transit light curves. Many low-amplitude effects must be taken into account: residual stellar activity and systematics, stellar limb darkening, and the interplay of all available constraints on transit fitting. Several promising areas for further improvements and applications were identified. Current and future high-precision photometry missions will discover increasing numbers of small planets around relatively active stars, and the IRF is expected to be useful in characterising them.
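The core of such a post-detection filter is the alternation between a phase-folded estimate of the transit and a smoothed estimate of the stellar variability. The sketch below illustrates that iteration on a simulated light curve; the bin sizes, window lengths, iteration count, and injected signals are illustrative assumptions and this is not the thesis's IRF implementation.

```python
# Iteratively separate a phase-locked transit signal from stellar variability,
# given a known orbital period (toy light curve, illustrative parameters).
import numpy as np

rng = np.random.default_rng(1)
period, n = 3.0, 4000
t = np.linspace(0.0, 30.0, n)
transit = np.where((t % period) < 0.1, -0.01, 0.0)          # toy box transit
variability = 0.005 * np.sin(2 * np.pi * t / 7.3)           # toy starspot signal
flux = 1.0 + transit + variability + rng.normal(0, 0.001, n)

def moving_median(y, w):
    half = w // 2
    return np.array([np.median(y[max(0, i - half):i + half + 1]) for i in range(len(y))])

phase_bins = 200
phase_idx = ((t % period) / period * phase_bins).astype(int)

transit_est = np.zeros(n)
for _ in range(5):                                           # a few iterations suffice here
    # 1) stellar variability estimated from the transit-removed light curve
    var_est = moving_median(flux - transit_est, 101)
    # 2) transit signal estimated by phase-folding the variability-removed curve
    resid = flux - var_est
    folded = np.array([np.median(resid[phase_idx == b]) for b in range(phase_bins)])
    transit_est = folded[phase_idx] - np.median(folded)

print("recovered transit depth ~", round(-transit_est.min(), 4))
```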
203

Probabilistic Modelling of Domain and Gene Evolution

Muhammad, Sayyed Auwn January 2016 (has links)
Phylogenetic inference relies heavily on statistical models that have been extended and refined over the past years into complex hierarchical models to capture the intricacies of evolutionary processes. The wealth of information in the form of fully sequenced genomes has led to the development of methods that reconstruct gene and species evolutionary histories in greater and more accurate detail. However, genes are composed of evolutionarily conserved sequence segments called domains, and domains can also be affected by duplications, losses, and the bifurcations implied by gene or species evolution. This thesis extends evolutionary models that have previously been used to model gene evolution, such as duplication-loss, rate, and substitution models, to the modelling of domain evolution. I propose DomainDLRS: a comprehensive, hierarchical Bayesian method, based on the DLRS model of Åkerborg et al. (2009), that models domain evolution as occurring inside the gene and species trees. The method incorporates a birth-death process to model domain duplications and losses, along with a domain sequence evolution model under a relaxed molecular clock assumption. It employs a variant of Markov chain Monte Carlo called Grouped Independence Metropolis-Hastings to estimate the posterior distribution over domain and gene trees. Using this method, we performed analyses of the zinc-finger and PRDM9 gene families, which provide interesting insights into domain evolution. Finally, a synteny-aware approach for gene homology inference, called GenFamClust, is proposed that uses similarity and gene neighbourhood conservation to improve homology inference. We evaluated the accuracy of our method on synthetic data and on two biological datasets consisting of eukaryotic and fungal species. Our results show that using synteny together with similarity provides a significant improvement in homology inference.
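The birth-death component mentioned above treats each domain copy as duplicating and being lost at constant rates along a branch. A minimal sketch of that process is given below; the rates, branch length, and starting copy number are assumed for illustration and this is not the DomainDLRS implementation.

```python
# A linear birth-death process for domain copies along a single branch:
# each copy duplicates at rate `lam` and is lost at rate `mu` (Gillespie-style).
import math
import random

random.seed(2)

def simulate_branch(n0, lam, mu, branch_length):
    """Return the number of domain copies surviving at the end of the branch."""
    n, t = n0, 0.0
    while n > 0:
        total_rate = n * (lam + mu)
        t += random.expovariate(total_rate)
        if t > branch_length:
            break
        # a duplication with probability lam/(lam+mu), otherwise a loss
        n += 1 if random.random() < lam / (lam + mu) else -1
    return n

lam, mu, T = 0.3, 0.2, 2.0
counts = [simulate_branch(1, lam, mu, T) for _ in range(20000)]
print("simulated mean copy number:", round(sum(counts) / len(counts), 3))
print("theoretical mean exp((lam-mu)*T):", round(math.exp((lam - mu) * T), 3))
```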
204

Segmentation and analysis of vascular networks

Allen, K. E. January 2010 (has links)
From a clinical perspective, retinal vascular segmentation and analysis are important tasks in aiding quantification of vascular disease progression for such prevalent pathologies as diabetic retinopathy, arteriolosclerosis and hypertension. Combined with the emergence of inexpensive digital imaging, retinal fundus images are becoming increasingly available through public databases, fuelling interest in retinal vessel research. Vessel segmentation is a challenging task which needs to fulfil many requirements: the accurate segmentation of both normal and pathological vessels; the extraction of vessels of different sizes, from large high-contrast to small low-contrast; minimal user interaction; low computational requirements; and the potential for application across different imaging modalities. We demonstrate a novel and significant improvement to an emerging stochastic vessel segmentation technique, particle filtering, in terms of performance at vascular bifurcations and extensibility. An alternative deterministic approach is also presented, in the form of a framework utilising morphological Tramline filtering and non-parametric windows PDF estimation. Results of the deterministic algorithm on retinal images match those of state-of-the-art unsupervised methods in terms of pixel accuracy. In analysing retinal vascular networks, an important initial step is to distinguish between arteries and veins in order to compute pathological metrics such as branching angle, diameter, length and arteriole-to-venule diameter ratio. Practical difficulties include the lack of intensity and textural differences between arteries and veins in all but the largest vessels, and the obstruction of vessels and connectivity by low contrast or other vessels. To this end, an innovative Markov chain Monte Carlo Metropolis-Hastings framework is formulated for the separation of vessel trees. It is subsequently applied to both synthetic and retinal image data with promising results.
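As a rough illustration of the Metropolis-Hastings idea behind such a labelling problem (and not the thesis's framework), the toy sketch below assigns one of two labels to a chain of vessel segments based on their orientations, with a Potts-style prior encouraging neighbouring segments to share a label; the data, class means, and coupling strength are all hypothetical.

```python
# Metropolis-Hastings over binary labels for a chain of segments:
# a symmetric single-flip proposal, accepted with the usual MH ratio.
import math
import random

random.seed(3)

# Hypothetical segment orientations (radians): first half near 0, second near 1.
angles = [random.gauss(0.0, 0.15) for _ in range(15)] + \
         [random.gauss(1.0, 0.15) for _ in range(15)]
means, sigma, coupling = (0.0, 1.0), 0.15, 1.0

def log_post(labels):
    ll = sum(-(angles[i] - means[labels[i]]) ** 2 / (2 * sigma ** 2)
             for i in range(len(angles)))
    ll += sum(coupling for i in range(len(labels) - 1)
              if labels[i] == labels[i + 1])
    return ll

labels = [random.randint(0, 1) for _ in angles]
current = log_post(labels)
for _ in range(20000):
    i = random.randrange(len(labels))          # propose flipping one label
    labels[i] ^= 1
    proposed = log_post(labels)
    if math.log(random.random()) < proposed - current:
        current = proposed                     # accept
    else:
        labels[i] ^= 1                         # reject: undo the flip
print("recovered labels:", labels)
```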
205

A Hierarchical Bayesian Model for the Unmixing Analysis of Compositional Data subject to Unit-sum Constraints

Yu, Shiyong 15 May 2015 (has links)
Modeling of compositional data is emerging as an active area in statistics. It is assumed that compositional data represent the convex linear mixing of a definite number of independent sources, usually referred to as end members. A generic problem in practice is to appropriately separate the end members and quantify their fractions from compositional data subject to non-negativity and unit-sum constraints. A number of methods, essentially related to polytope expansion, have been proposed; however, these deterministic methods have some potential problems. In this study, a hierarchical Bayesian model was formulated, and the algorithms were coded in MATLAB. A test run using both a synthetic and a real-world dataset yields scientifically sound and mathematically optimal outputs broadly consistent with those of other, non-Bayesian methods. The sensitivity of the model to the choice of priors and to the structure of the error covariance matrix is also discussed.
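For reference, the linear unmixing model described above can be written as follows; the notation is generic and illustrative rather than taken from the thesis.

```latex
% x_i  : observed composition of sample i (D parts)
% s_k  : composition of end member k
% a_ik : fraction of end member k in sample i
\[
  \mathbf{x}_i \;=\; \sum_{k=1}^{K} a_{ik}\,\mathbf{s}_k + \boldsymbol{\varepsilon}_i,
  \qquad
  a_{ik} \ge 0,\qquad \sum_{k=1}^{K} a_{ik} = 1,
\]
\[
  s_{kj} \ge 0,\qquad \sum_{j=1}^{D} s_{kj} = 1 .
\]
```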
206

Generating Evidence for COPD Clinical Guidelines Using EHRs

Amber M Johnson (7023350) 14 August 2019 (has links)
The Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines are used to guide clinical practices for treating Chronic Obstructive Pulmonary Disease (COPD). GOLD focuses heavily on stable COPD patients, limiting its use for non-stable COPD patients such as those with severe, acute exacerbations of COPD (AECOPD) that require hospitalization. Although AECOPD can be heterogeneous, it can lead to deterioration of health and early death. Electronic health records (EHRs) can be used to analyze patient data for understanding disease progression and generating guideline evidence for AECOPD patients. However, because of its structure and representation, retrieving, analyzing, and properly interpreting EHR data can be challenging, and existing tools do not provide granular analytic capabilities for this data.

This dissertation presents, develops, and implements a novel approach that systematically captures the effect of interventions during patient medical encounters, and hence may support evidence generation for clinical guidelines in a systematic and principled way. A conceptual framework is introduced that structures components, such as data storage, aggregation, extraction, and visualization, to support EHR data analytics for granular analysis. We develop a software framework in Python based on these components to create longitudinal representations of raw medical data extracted from the Medical Information Mart for Intensive Care (MIMIC-III) clinical database. The software framework consists of two tools: Patient Aggregated Care Events (PACE), a novel tool for constructing and visualizing entire medical histories of both individual patients and patient cohorts, and Mark SIM, a Markov chain Monte Carlo modeling and simulation tool for predicting clinical outcomes through probabilistic analysis that captures granular temporal aspects of aggregated clinical data.

As an application of probabilistic modeling, we assess the efficacy of antibiotic treatment and the optimal time of initiation for hospitalized AECOPD patients. We identify 697 AECOPD patients, of which 26.0% were administered antibiotics. Our model simulations show a 50% decrease in mortality rate as the number of patients administered antibiotics increases, and an estimated 5.5% mortality rate when antibiotics are first administered after 48 hours versus 1.8% when antibiotics are first administered between 24 and 48 hours. Our findings suggest that there may be a mortality benefit in early initiation of antibiotics in ICU patients with severe AECOPD and acute respiratory failure.

Thus, we show that it is feasible to enhance the representation of EHRs to aggregate patients’ entire medical histories with temporal trends and to support complex clinical questions to drive clinical guidelines for COPD.
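To illustrate the kind of Markov-chain simulation of clinical outcomes described above, the sketch below runs a toy patient-trajectory model; the states and transition probabilities are entirely hypothetical placeholders and are not the values estimated from MIMIC-III or by Mark SIM.

```python
# Monte Carlo simulation of a discrete-time Markov chain over hypothetical
# inpatient states, estimating an in-hospital mortality rate.
import random

random.seed(4)

# Hypothetical daily transition probabilities; each row sums to 1.
P = {
    "ward":       {"ward": 0.80, "icu": 0.05, "discharged": 0.14, "deceased": 0.01},
    "icu":        {"ward": 0.15, "icu": 0.75, "discharged": 0.05, "deceased": 0.05},
    "discharged": {"discharged": 1.0},
    "deceased":   {"deceased": 1.0},
}

def simulate_patient(start="ward", max_days=60):
    state = start
    for _ in range(max_days):
        if state in ("discharged", "deceased"):
            break
        r, acc = random.random(), 0.0
        for nxt, p in P[state].items():
            acc += p
            if r <= acc:
                state = nxt
                break
    return state

outcomes = [simulate_patient() for _ in range(10000)]
print("simulated in-hospital mortality:",
      round(outcomes.count("deceased") / len(outcomes), 3))
```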
207

Cosmological parameter estimation with the Planck satellite data : from the construction of a likelihood to neutrino properties

Spinelli, Marta 28 September 2015 (has links)
The cosmic microwave background (CMB), relic of the hot Big Bang, carries the traces of both the rich structure formation of the late-time epochs and the energetic early phases of the universe. The Planck satellite provided, from 2009 to 2013, high-quality measurements of the anisotropies of the CMB. These are used in this thesis to determine the parameters of the standard cosmological model and of its extensions concerning the neutrino sector. The construction of a high-l Planck likelihood is detailed. This involves a masking strategy that deals in particular with the contamination from the thermal emission of the Galaxy and from point sources. The residual foregrounds are treated directly at the power spectrum level, relying on physically motivated templates based on Planck studies. The statistical methods needed to extract the cosmological parameters in the comparison between models and data are described: both Bayesian Markov chain Monte Carlo techniques and the frequentist profile likelihood. Results on cosmological parameters are presented using Planck data alone and in combination with the small-scale data from the ground-based CMB experiments ACT and SPT, baryon acoustic oscillation (BAO) measurements, and supernovae. Constraints on the absolute scale of neutrino masses and on the effective number of neutrinos are also discussed.
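The frequentist profile-likelihood technique mentioned above fixes the parameter of interest and maximises the likelihood over all nuisance parameters at each fixed value. The toy sketch below applies it to a straight-line model where the nuisance maximisation is analytic; the model, data, and grid are assumptions for illustration and have nothing to do with the Planck likelihood itself.

```python
# Profile likelihood for the slope of a straight line: for each fixed slope,
# the intercept (nuisance) is maximised out; the 68% interval is read off
# where the profile drops by 0.5 from its maximum.
import random

random.seed(5)
sigma, a_true, b_true = 0.5, 1.3, 0.7
xs = [i / 10 for i in range(50)]
ys = [a_true * x + b_true + random.gauss(0, sigma) for x in xs]

def profile_loglike(a):
    # For fixed slope a, the best-fit intercept has a closed form.
    b_hat = sum(y - a * x for x, y in zip(xs, ys)) / len(xs)
    return -0.5 * sum((y - a * x - b_hat) ** 2 for x, y in zip(xs, ys)) / sigma ** 2

grid = [0.8 + 0.001 * i for i in range(1000)]           # scan slopes 0.8 .. 1.8
prof = [profile_loglike(a) for a in grid]
best = max(range(len(grid)), key=lambda i: prof[i])
inside = [grid[i] for i in range(len(grid)) if prof[i] >= prof[best] - 0.5]
print("best-fit slope:", round(grid[best], 3),
      "68% interval:", round(min(inside), 3), "-", round(max(inside), 3))
```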
208

Bayesian approach for fractional polynomials

Carvalho, Dennison Célio de Oliveira January 2019 (has links)
Advisor: Miriam Harumi Tsunemi / Abstract: In many practical situations the relationship between a response variable and one or more covariates is curved. Among the various ways of representing this curvature, Royston and Altman (1994) proposed an extensive family of functions called fractional polynomials (FP). Bové and Held (2011) implemented the Bayesian paradigm for FP under the assumption of normally distributed errors. Their methodology is based on a hyper-g prior distribution (Liang et al., 2008), which, in addition to many interesting asymptotic properties, guarantees consistent Bayesian model averaging. In this thesis, the classical and Bayesian approaches to FP are compared on real data available in the literature as well as through simulations. In addition, a Bayesian approach is proposed for FP models in which the power, unlike in the usual methods, can take any value in a given real interval and is estimated via HMC (Hamiltonian Monte Carlo) and MCMC (Markov chain Monte Carlo) simulation methods. In this model, for a second-order FP, unlike the currently available models, only one power is estimated. The model is evaluated on simulated data and on real data, one of the datasets involving a Box-Cox transformation. / Doctorate
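For reference, the fractional-polynomial family of Royston and Altman (1994) of degree m can be written as below; the notation is the standard one from that literature, not copied from the thesis, which relaxes the restriction on the powers.

```latex
\[
  \phi_m(x;\boldsymbol\beta,\mathbf p)
    \;=\; \beta_0 + \sum_{j=1}^{m} \beta_j\, x^{(p_j)},
  \qquad
  x^{(p)} \;=\;
  \begin{cases}
    x^{p}, & p \neq 0,\\[2pt]
    \ln x, & p = 0,
  \end{cases}
\]
% Powers are conventionally restricted to p_j \in \{-2,-1,-0.5,0,0.5,1,2,3\},
% and a repeated power p_j = p_{j-1} contributes \beta_j\, x^{(p_j)} \ln x.
% In the thesis the powers may instead take any value in a real interval and
% are estimated via HMC/MCMC.
```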
209

Global exploration in Markov chain Monte Carlo methods for light transport simulation

Šik, Martin January 2019 (has links)
Monte Carlo light transport simulation has become a de facto standard tool for photorealistic rendering. However, the algorithms used by current rendering systems are often ineffective, especially in scenes featuring light transport due to multiple highly glossy or specular interactions and complex visibility between the camera and the light sources. It is therefore desirable to adopt more robust algorithms in practice. Light transport algorithms based on Markov chain Monte Carlo (MCMC) are known to be effective at sampling many different kinds of light transport paths even in the presence of complex visibility. However, current MCMC algorithms often over-sample some of the paths while under-sampling or completely missing other paths. We attribute this behavior to insufficient global exploration of the path space, which leads to unpredictable convergence and causes image artifacts. This in turn prevents the adoption of MCMC algorithms in practice. In this thesis we therefore focus on improving global exploration in MCMC algorithms for light transport simulation. First, we present a new MCMC algorithm that utilizes replica exchange to improve global exploration. To maximize the efficiency of replica exchange we introduce tempering of the path space, which allows easier discovery of important...
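Replica exchange (parallel tempering) runs several chains at different temperatures and occasionally swaps their states, so that the hot chains' freer exploration propagates to the cold chain that targets the actual distribution. The sketch below demonstrates the mechanism on a bimodal 1-D target, which is only a toy stand-in for path space, not a light-transport integrand.

```python
# Parallel tempering: Metropolis updates within each tempered replica plus
# occasional swap proposals between adjacent temperatures.
import math
import random

random.seed(6)

def log_target(x):
    # Two well-separated modes at -4 and +4.
    return math.log(math.exp(-0.5 * (x - 4) ** 2) + math.exp(-0.5 * (x + 4) ** 2))

betas = [1.0, 0.3, 0.1]                    # inverse temperatures; 1.0 is the target
xs = [0.0 for _ in betas]
cold_samples = []

for it in range(50000):
    # Random-walk Metropolis update within each replica (tempered target pi^beta).
    for i, beta in enumerate(betas):
        prop = xs[i] + random.gauss(0, 1.0)
        if math.log(random.random()) < beta * (log_target(prop) - log_target(xs[i])):
            xs[i] = prop
    # Propose swapping a random adjacent pair of replicas.
    i = random.randrange(len(betas) - 1)
    delta = (betas[i] - betas[i + 1]) * (log_target(xs[i + 1]) - log_target(xs[i]))
    if math.log(random.random()) < delta:
        xs[i], xs[i + 1] = xs[i + 1], xs[i]
    cold_samples.append(xs[0])

frac_right = sum(1 for x in cold_samples if x > 0) / len(cold_samples)
print("fraction of cold-chain samples in the right mode:", round(frac_right, 3))
```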
210

Modelling Long-Term Persistence in Hydrological Time Series

Thyer, Mark Andrew January 2001 (has links)
The hidden state Markov (HSM) model is introduced as a new conceptual framework for modelling long-term persistence in hydrological time series. Unlike the stochastic models currently used, the conceptual basis of the HSM model can be related to the physical processes that influence long-term hydrological time series in the Australian climatic regime. A Bayesian approach was used for model calibration. This enabled rigorous evaluation of parameter uncertainty, which proved crucial for the interpretation of the results. Applying the single-site HSM model to rainfall data from selected Australian capital cities provided some revealing insights. In eastern Australia, where there is a significant influence from the tropical Pacific weather systems, the results showed that a weak wet and medium dry state persistence was likely to exist. In southern Australia the results were inconclusive. However, they suggested a weak wet and strong dry persistence structure may exist, possibly due to the infrequent incursion of tropical weather systems into southern Australia. This led to the postulate that the tropical weather systems are the primary cause of two-state long-term persistence. The single- and multi-site HSM model results for the Warragamba catchment rainfall data supported this hypothesis: a strong two-state persistence structure was likely to exist in the rainfall regime of this important water supply catchment. In contrast, the single- and multi-site results for the Williams River catchment rainfall data were inconsistent, illustrating that further work is required to understand the application of the HSM model. Comparisons with the lag-one autoregressive [AR(1)] model showed that it was not able to reproduce the same long-term persistence as the HSM model. However, with record lengths typical of real data, the difference between the two approaches was not statistically significant. Nevertheless, it was concluded that the HSM model provides a conceptually richer framework than the AR(1) model. / PhD Doctorate
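A minimal sketch of a two-state hidden-state model of the kind described above is given below: a hidden wet/dry climate state persists from year to year and annual rainfall is drawn from a state-dependent distribution. The persistence probabilities, means, and standard deviation are illustrative assumptions, not values calibrated to any Australian rainfall record.

```python
# Two-state hidden Markov simulation of annual rainfall; the hidden state
# induces long-term persistence, visible as lag-1 autocorrelation.
import random

random.seed(7)

P_STAY = {"wet": 0.9, "dry": 0.9}          # hypothetical state persistence
MEAN = {"wet": 900.0, "dry": 650.0}        # mm/year, illustrative
SD = 120.0

def simulate(n_years):
    state, series = "wet", []
    for _ in range(n_years):
        if random.random() > P_STAY[state]:
            state = "dry" if state == "wet" else "wet"
        series.append(random.gauss(MEAN[state], SD))
    return series

x = simulate(100000)
mean = sum(x) / len(x)
num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(len(x) - 1))
den = sum((v - mean) ** 2 for v in x)
print("lag-1 autocorrelation induced by the hidden state:", round(num / den, 3))
```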
