61

Statistical Models for Next Generation Sequencing Data

Wang, Yiyi 03 October 2013 (has links)
Three statistical models are developed to address problems in Next-Generation Sequencing data. The first two models are designed for RNA-Seq data and the third for ChIP-Seq data. The first RNA-Seq model uses a Bayesian nonparametric model to detect genes that are differentially expressed across treatments. A negative binomial sampling distribution is used for each gene's read count, so that each gene may have its own parameters. Despite the consequent large number of parameters, parsimony is imposed by the clustering inherent in the Bayesian nonparametric framework. A Bayesian discovery procedure is adopted to calculate the probability that each gene is differentially expressed. A simulation study and a real data analysis show that this method performs at least as well as leading existing methods in some cases. The second RNA-Seq model shares the framework of the first but replaces the usual random partition prior from the Dirichlet process with a random partition prior indexed by distances from Gene Ontology (GO). The use of this external biological information yields improvements in statistical power over the original Bayesian discovery procedure. The third model addresses the problem of identifying protein binding sites in ChIP-Seq data. An exact test via a stochastic approximation is used to test the hypothesis that the treatment effect is independent of the sequence count intensity effect. The standard sliding-window procedure for ChIP-Seq data is followed, and the p-value and adjusted false discovery rate are calculated for each window. For the sites identified as peak regions, three candidate models are proposed for characterizing the bimodality of the ChIP-Seq data, and the stochastic approximation Monte Carlo (SAMC) method is used to select the best of the three. Real data analysis shows that this method produces results comparable to other existing methods and is advantageous in identifying bimodality in the data.
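For illustration, a minimal sketch of the kind of gene-level sampling model described above can be written as follows; the notation (Y_{gj}, mu_g, phi_g, alpha, G_0) is assumed for exposition and is not taken from the dissertation itself.

```latex
% Illustrative sketch only; Y_{gj}, \mu_g, \phi_g, \alpha and G_0 are assumed notation.
\begin{align*}
  Y_{gj} \mid \mu_g, \phi_g &\sim \mathrm{NegBin}(\mu_g, \phi_g)
      && \text{(read count for gene } g \text{ in sample } j\text{)}\\
  (\mu_g, \phi_g) \mid G &\sim G
      && \text{(gene-specific parameters drawn from a random measure)}\\
  G &\sim \mathrm{DP}(\alpha, G_0)
      && \text{(Dirichlet process prior, inducing clustering of genes)}
\end{align*}
```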
62

Markov Chain Monte Carlo Modeling of High-Redshift Quasar Host Galaxies in Hubble Space Telescope Imaging

January 2014 (has links)
abstract: Quasars, the visible phenomena associated with the active accretion phase of supermassive black holes found in the centers of galaxies, represent one of the most energetic processes in the Universe. As matter falls into the central black hole, it is accelerated and collisionally heated, and the radiation emitted can outshine the combined light of all the stars in the host galaxy. Studies of quasar host galaxies at ultraviolet to near-infrared wavelengths are fundamentally limited by the precision with which the light from the central quasar accretion can be disentangled from the light of stars in the surrounding host galaxy. In this Dissertation, I discuss direct imaging of quasar host galaxies at redshifts z ≃ 2 and z ≃ 6 using new data obtained with the Hubble Space Telescope. I describe a new method for removing the point source flux using Markov Chain Monte Carlo parameter estimation and simultaneous modeling of the point source and host galaxy. I then discuss applications of this method to understanding the physical properties of high-redshift quasar host galaxies, including their structures, luminosities, sizes, and colors, and inferred stellar population properties such as age, mass, and dust content. / Dissertation/Thesis / Ph.D. Astrophysics 2014
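As a rough, self-contained sketch of the general idea of simultaneously sampling point-source and host-galaxy parameters with MCMC, the toy example below fits two amplitudes to a simulated 1-D profile with a random-walk Metropolis sampler; the profiles, priors and noise level are assumptions for illustration, not the dissertation's actual imaging pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "image": a narrow PSF-like point source on top of a broad host profile.
x = np.linspace(-5, 5, 101)
psf = np.exp(-0.5 * (x / 0.3) ** 2)            # assumed point-spread function
host = np.exp(-np.abs(x) / 1.5)                # assumed host-galaxy profile
truth = 10.0 * psf + 3.0 * host
sigma = 0.5
data = truth + rng.normal(0.0, sigma, size=x.size)

def log_posterior(theta):
    """Gaussian likelihood with flat positivity priors on the two amplitudes."""
    a_ps, a_host = theta
    if a_ps < 0 or a_host < 0:
        return -np.inf
    model = a_ps * psf + a_host * host
    return -0.5 * np.sum((data - model) ** 2) / sigma ** 2

# Simple random-walk Metropolis sampler over (point-source, host) amplitudes.
theta = np.array([5.0, 5.0])
logp = log_posterior(theta)
chain = []
for _ in range(20000):
    proposal = theta + rng.normal(0.0, 0.2, size=2)
    logp_new = log_posterior(proposal)
    if np.log(rng.uniform()) < logp_new - logp:
        theta, logp = proposal, logp_new
    chain.append(theta.copy())

chain = np.array(chain[5000:])                 # discard burn-in
print("posterior mean amplitudes:", chain.mean(axis=0))
```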
63

Inferring the photometric and size evolution of galaxies from image simulations

Carassou, Sébastien 20 October 2017 (has links)
Current constraints on the luminosity and size evolution of galaxies rely on catalogs extracted from multi-band surveys. However, the resulting catalogs are altered by selection effects that are difficult to model and that can lead to conflicting predictions if not taken into account properly. In this thesis we have developed a new approach to infer robust constraints on model parameters. We use an empirical model to generate a set of mock galaxies from physical parameters. These galaxies are passed through an image simulator emulating the instrumental characteristics of any survey and are extracted in the same way as the observed data for direct comparison. The difference between mock and observed data is minimized via a sampling process based on adaptive Markov chain Monte Carlo methods. Using mock data matching most of the properties of a Canada-France-Hawaii Telescope Legacy Survey Deep (CFHTLS Deep) field, we demonstrate the robustness and internal consistency of our approach by inferring the size and luminosity functions and their evolution parameters for realistic populations of galaxies. We compare our results with those obtained from the classical spectral energy distribution (SED) fitting method and find that our pipeline infers the model parameters using only 3 filters, and more accurately than SED fitting based on the same observables. We then apply our pipeline to a fraction of a real CFHTLS Deep field to constrain the same set of parameters in a way that is free from systematic biases. Finally, we highlight the potential of this technique in the context of future surveys and discuss its drawbacks.
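The likelihood-free flavor of this pipeline (simulate mock observables from model parameters, extract them as for the data, and compare) can be sketched as below; the toy magnitude distribution, detection cut and acceptance rule are illustrative assumptions, not the thesis's actual empirical model or adaptive MCMC scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_counts(m_star, slope, n_gal=2000):
    """Toy 'survey': draw galaxy magnitudes from an assumed exponential tail
    fainter than m_star, apply a hard detection cut, and return binned
    number counts (purely illustrative)."""
    mags = m_star + rng.exponential(1.0 / max(slope, 0.1), size=n_gal)
    detected = mags[mags < 24.0]                      # assumed survey depth
    counts, _ = np.histogram(detected, bins=np.linspace(18, 24, 13))
    return counts

observed = simulate_counts(20.0, 0.8)                 # stand-in for observed counts

def distance(theta):
    """Chi-square-like distance between mock and observed binned counts."""
    mock = simulate_counts(*theta)
    return np.sum((mock - observed) ** 2 / (observed + 1.0))

# Likelihood-free random-walk search: accept moves that keep the mock counts
# close to the observed counts (a crude stand-in for adaptive MCMC sampling).
theta = np.array([21.0, 0.5])
d = distance(theta)
samples = []
for _ in range(5000):
    prop = theta + rng.normal(0.0, [0.05, 0.02])
    d_new = distance(prop)
    if np.log(rng.uniform()) < (d - d_new) / 2.0:     # soft acceptance on distance
        theta, d = prop, d_new
    samples.append(theta.copy())

print("approximate posterior mean:", np.array(samples[1000:]).mean(axis=0))
```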
64

Robustness of the Within- and Between-Series Estimators to Non-Normal Multiple-Baseline Studies: A Monte Carlo Study

Joo, Seang-Hwane 06 April 2017 (has links)
In single-case research, the multiple-baseline (MB) design is the most widely used design in practical settings. It provides the opportunity to estimate the treatment effect based not only on within-series comparisons of treatment-phase to baseline-phase observations, but also on time-specific between-series comparisons of observations from participants who have started treatment to those still in the baseline phase. In MB studies, the average treatment effect and the variation of these effects across multiple participants can be estimated using various statistical modeling methods. Recently, two types of statistical modeling methods were proposed for analyzing MB studies: a) the within-series model and b) the between-series model. The within-series model is a typical two-level multilevel modeling approach analyzing the measurement occasions within a participant, whereas the between-series model is an alternative modeling approach analyzing participants' measurement occasions at certain time points, where some participants are in the baseline phase and others are in the treatment phase. Parameters of both within- and between-series models are generally estimated with restricted maximum likelihood (ReML) estimation, and ReML is based on the assumption of normality (Hox et al., 2010; Raudenbush & Bryk, 2002). However, in practical educational and psychological settings, observed data may not be easily assumed to be normal. Therefore, the purpose of this study is to investigate the robustness of analyzing MB studies with the within- and between-series models when level-1 errors are non-normal. A Monte Carlo study was conducted under conditions where level-1 errors were generated from non-normal distributions whose skewness and kurtosis were manipulated. Four statistical approaches were considered for comparison based on theoretical and/or empirical rationales. The approaches were defined by crossing two analytic decisions: a) whether to use a within- or between-series estimate of effect and b) whether to use ReML estimation with the Kenward-Roger adjustment for inferences or Bayesian estimation and inference. The accuracy of parameter estimation, statistical power, and Type I error were systematically analyzed. The results of the study showed that the within- and between-series models are robust to non-normality of the level-1 errors. Both models estimated the treatment effect accurately, and statistical inferences were acceptable. ReML and Bayesian estimation also showed similar results in the current study. Applications and implications for applied and methodology researchers are discussed based on the findings of the study. (A minimal sketch of the data-generating side of such a simulation appears below.)
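The sketch below illustrates only the data-generating side of such a Monte Carlo study, assuming standardized gamma draws for the non-normal level-1 errors and a simple phase-mean difference as a stand-in for the within-series effect estimate (the study itself fits multilevel models with ReML or Bayesian estimation).

```python
import numpy as np

rng = np.random.default_rng(42)

def skewed_errors(n, shape=1.0):
    """Standardized gamma draws: mean 0, variance 1, skewness 2/sqrt(shape)."""
    x = rng.gamma(shape, 1.0, size=n)
    return (x - shape) / np.sqrt(shape)

def simulate_mb_participant(n_time=20, switch=10, effect=1.0, u_sd=0.5, shape=1.0):
    """One participant's series: random intercept + treatment shift after the
    phase switch + (possibly non-normal) level-1 errors."""
    u0 = rng.normal(0.0, u_sd)                      # level-2 random intercept
    phase = (np.arange(n_time) >= switch).astype(float)
    return u0 + effect * phase + skewed_errors(n_time, shape), phase

# Crude Monte Carlo check: average phase-mean-difference estimate over replications.
estimates = []
for _ in range(500):
    diffs = []
    for switch in (8, 11, 14):                      # staggered baselines across participants
        y, phase = simulate_mb_participant(switch=switch, shape=1.0)
        diffs.append(y[phase == 1].mean() - y[phase == 0].mean())
    estimates.append(np.mean(diffs))
print("mean estimated effect (true = 1.0):", np.mean(estimates))
```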
65

Generalised Bayesian matrix factorisation models

Mohamed, Shakir January 2011 (has links)
Factor analysis and related models for probabilistic matrix factorisation are of central importance to the unsupervised analysis of data, with a colourful history more than a century long. Probabilistic models for matrix factorisation allow us to explore the underlying structure in data, and have relevance in a vast number of application areas including collaborative filtering, source separation, missing data imputation, gene expression analysis, information retrieval, computational finance and computer vision, amongst others. This thesis develops generalisations of matrix factorisation models that advance our understanding and enhance the applicability of this important class of models. The generalisation of models for matrix factorisation focuses on three concerns: widening the applicability of latent variable models to the diverse types of data that are currently available; considering alternative structural forms in the underlying representations that are inferred; and including higher-order data structures in the matrix factorisation framework. These three issues reflect the reality of modern data analysis and we develop new models that allow for a principled exploration and use of data in these settings. We place emphasis on Bayesian approaches to learning and the advantages that come with the Bayesian methodology. Our point of departure is a generalisation of latent variable models to members of the exponential family of distributions. This generalisation allows for the analysis of data that may be real-valued, binary, counts, non-negative or a heterogeneous set of these data types. The model unifies various existing models and constructs and provides, for unsupervised settings, the framework complementary to generalised linear models in regression. Moving to structural considerations, we develop Bayesian methods for learning sparse latent representations. We define ideas of weakly and strongly sparse vectors and investigate the classes of prior distributions that give rise to these forms of sparsity, namely the scale-mixture of Gaussians and the spike-and-slab distribution. Based on these sparsity-favouring priors, we develop and compare methods for sparse matrix factorisation and present the first comparison of these sparse learning approaches. As a second structural consideration, we develop models with the ability to generate correlated binary vectors. Moment-matching is used to allow binary data with specified correlation to be generated, based on dichotomisation of the Gaussian distribution. We then develop a novel and simple method for binary PCA based on Gaussian dichotomisation. The third generalisation considers the extension of matrix factorisation models to the multi-dimensional arrays of data that are increasingly prevalent. We develop the first Bayesian model for non-negative tensor factorisation and explore the relationship between this model and the previously described models for matrix factorisation.
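The Gaussian-dichotomisation idea for generating correlated binary vectors can be sketched as follows; here the latent Gaussian correlation is simply supplied, whereas the thesis uses moment matching to choose it so that a target binary correlation is achieved.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

def dichotomised_binary(p, latent_corr, n_samples=10000):
    """Draw correlated binary vectors by thresholding a latent multivariate Gaussian.
    p: desired marginal P(x_i = 1); latent_corr: correlation of the latent Gaussian
    (supplied here directly rather than moment-matched to a target binary correlation)."""
    d = len(p)
    thresholds = norm.ppf(1.0 - np.asarray(p))      # exceeding the threshold maps to 1
    cov = np.full((d, d), latent_corr)
    np.fill_diagonal(cov, 1.0)
    z = rng.multivariate_normal(np.zeros(d), cov, size=n_samples)
    return (z > thresholds).astype(int)

x = dichotomised_binary(p=[0.3, 0.5, 0.7], latent_corr=0.6)
print("empirical marginals:", x.mean(axis=0))
print("empirical correlation:\n", np.corrcoef(x, rowvar=False).round(2))
```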
66

Probabilistic models for protein conformational changes

Nguyen, Chuong Thach 22 May 2020 (has links)
No description available.
67

Joint Bayesian inference of hydrological models and generalized error models for the evaluation of predictive and parameter uncertainties

Hernández López, Mario Ramón 07 November 2017 (has links)
Over the years, the method of least squares (SLS) has been the inference method commonly applied in hydrological modeling, even though its assumptions are not satisfied by the modeling errors. Awareness that the hydrological modeling process is affected by more, and more important, sources of uncertainty than the purely observational one (the only source of error considered by SLS) has contributed to the appearance of publications suggesting the need for more appropriate inference methods for hydrological models and, in general, for environmental models. Adequate inference methods must consider all sources of error, or their effects, that influence the modeling process. Only in this way is it possible to obtain reliable parameters, unbiased predictions and a correct estimate of the uncertainty of both, these being the main objectives of this Doctoral Thesis. To this end, this thesis proposes the joint inference, following the Bayesian paradigm, of the hydrological parameters and the parameters of a generalized error model, which provides the flexibility needed to relax all of the assumptions (Gaussian errors with zero mean, independent and identically distributed) that make the SLS error model unsuitable for inferring hydrological models. The main contribution of the thesis is a methodology for the correct application of direct modeling (without prior transformation of the variables) of the error variance. This methodology is based on the need to account for the coupling, during the joint inference, between variations of the marginal distribution of the errors and variations of their conditional distributions, which are described by the error model. This coupling is guaranteed by fulfilling the Total Laws (TLs) of expectation and variance. To verify the feasibility of the theoretical results derived in the Thesis, a series of inference experiments is performed in which two lumped and one distributed hydrological model are combined with two classic error models (SLS and WLS) and two generalized error models proposed in this Thesis (GL++ and GL++Bias). The results show, once again, that inferences with SLS and WLS are not applicable to hydrological models, since the resulting errors do not satisfy their assumptions. Based on the results obtained, the thesis confirms its central hypothesis, namely: not applying the TLs in the direct modeling of the error variance and bias produces incorrect estimates of the hydrological parameters and their uncertainty, as well as an erroneous estimate of the predictive distribution. / Hernández López, MR. (2017). Inferencia Bayesiana conjunta de modelos hidrológicos y modelos de error generalizados, para la evaluación de las incertidumbres predictiva y de los parámetros [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90652 / TESIS
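For reference, the Total Laws mentioned above are the standard laws of total expectation and total variance; written for the errors conditioned on a generic variable X (notation assumed here, not the thesis's own symbols), they are:

```latex
% Laws of total expectation and total variance, written for the modeling errors;
% X is generic conditioning notation, not the thesis's own symbols.
\begin{align*}
  \mathbb{E}[\varepsilon]   &= \mathbb{E}_{X}\big[\mathbb{E}[\varepsilon \mid X]\big] \\
  \mathrm{Var}[\varepsilon] &= \mathbb{E}_{X}\big[\mathrm{Var}[\varepsilon \mid X]\big]
                              + \mathrm{Var}_{X}\big[\mathbb{E}[\varepsilon \mid X]\big]
\end{align*}
```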
68

Stochastic Computer Model Calibration and Uncertainty Quantification

Fadikar, Arindam 24 July 2019 (has links)
This dissertation presents novel methodologies in the field of stochastic computer model calibration and uncertainty quantification. Simulation models are widely used in studying physical systems, which are often represented by a set of mathematical equations. Inference on the true physical system (unobserved or partially observed) is drawn based on observations from the corresponding computer simulation model. These computer models are calibrated based on limited ground-truth observations in order to produce realistic predictions and associated uncertainties. A stochastic computer model differs from a traditional computer model in that repeated execution results in different outcomes from the simulation. This additional uncertainty in the simulation model must be handled accordingly in any calibration setup. A Gaussian process (GP) emulator replaces the actual computer simulation when it is expensive to run and the budget is limited. However, a traditional GP interpolator models the mean and/or variance of the simulation output as a function of the input. For a simulation where the marginal Gaussianity assumption is not appropriate, it does not suffice to emulate only the mean and/or variance. We present two different approaches addressing the non-Gaussian behavior of an emulator, by (1) incorporating quantile regression in GP for multivariate output, and (2) approximating the output distribution using a finite mixture of Gaussians. These emulators are also used to calibrate and make forward predictions in the context of an agent-based disease model of the 2014 Ebola epidemic outbreak in West Africa. The third approach employs a sequential scheme that periodically updates the uncertainty in the computer model input as data become available in an online fashion. Unlike the other two methods, which use an emulator in place of the actual simulation, the sequential approach relies on repeated runs of the actual, potentially expensive simulation. / Doctor of Philosophy / Mathematical models are versatile and often provide accurate descriptions of physical events. Scientific models are used to study such events in order to gain understanding of the true underlying system. These models are often complex in nature and require advanced algorithms to solve their governing equations. Outputs from these models depend on external information (also called model input) supplied by the user. Model inputs may or may not have a physical meaning, and can sometimes be specific only to the scientific model. More often than not, optimal values of these inputs are unknown and need to be estimated from a few actual observations. This process is known as the inverse problem, i.e., inferring the input from the output. The inverse problem becomes challenging when the mathematical model is stochastic in nature, i.e., multiple executions of the model result in different outcomes. In this dissertation, three methodologies are proposed that address the calibration and prediction of a stochastic disease simulation model, which simulates the spread of an infectious disease through human-to-human contact. The motivating examples are taken from the Ebola epidemic in West Africa in 2014 and seasonal flu in New York City in the USA.
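For context, the kind of traditional GP emulator that the dissertation argues is insufficient for non-Gaussian stochastic simulators can be sketched in a few lines; the simulator, kernel and hyperparameters below are toy assumptions, and the quantile-regression and mixture-of-Gaussians extensions developed in the dissertation are not shown.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.3, variance=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-4):
    """Posterior mean and variance of a zero-mean GP emulator at test inputs."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_test)
    Kss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(v ** 2, axis=0)
    return mean, var

# Emulate a toy (deterministic) simulator from a handful of "expensive" runs.
def simulator(x):
    return np.sin(3 * x) + 0.5 * x                     # stand-in for the real model

x_train = np.linspace(0, 1, 8)
y_train = simulator(x_train)
x_test = np.linspace(0, 1, 50)
mean, var = gp_posterior(x_train, y_train, x_test)
print("max |emulator - simulator| on test grid:", np.abs(mean - simulator(x_test)).max())
```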
69

Exact Markov Chain Monte Carlo with Likelihood Approximations for Functional Linear Models

Smith, Corey James 28 September 2018 (has links)
No description available.
70

A Comparison of Two MCMC Algorithms for Estimating the 2PL IRT Models

Chang, Meng-I 01 August 2017 (has links) (PDF)
Fully Bayesian estimation via Markov chain Monte Carlo (MCMC) techniques has become popular for fitting item response theory (IRT) models. Current developments in MCMC include two major algorithms: Gibbs sampling and the No-U-Turn Sampler (NUTS). While the former has been used to fit various IRT models, the latter is relatively new, calling for research comparing it with other algorithms. The purpose of the present study is to evaluate the performance of these two MCMC algorithms in estimating two two-parameter logistic (2PL) IRT models, namely the 2PL unidimensional model and the 2PL multi-unidimensional model, under various test situations. By investigating the accuracy and bias in estimating the model parameters given different test lengths, sample sizes, prior specifications, and/or correlations for these models, the key motivation is to provide researchers and practitioners with general guidelines for estimating a unidimensional IRT (UIRT) model and a multi-unidimensional IRT model. The results of the present study suggest that NUTS is as effective as Gibbs sampling at parameter estimation under most conditions for the 2PL IRT models. Findings also shed light on the use of the two MCMC algorithms with more complex IRT models.
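For reference, the 2PL model estimated in this study gives the probability of a correct response as a logistic function of the latent trait, using the usual person-ability and item-discrimination/difficulty parameters:

```latex
% Standard 2PL response function; \theta_i, a_j, b_j are the usual person-ability,
% item-discrimination and item-difficulty parameters.
\[
  P\big(y_{ij} = 1 \mid \theta_i, a_j, b_j\big)
    = \frac{1}{1 + \exp\!\big[-a_j(\theta_i - b_j)\big]}
\]
```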
