21

Chromosome 3D Structure Modeling and New Approaches For General Statistical Inference

Rongrong Zhang (5930474) 03 January 2019 (has links)
This thesis consists of two separate topics: the use of piecewise helical models for the inference of 3D spatial organizations of chromosomes, and new approaches for general statistical inference. The recently developed Hi-C technology enables a genome-wide view of chromosome spatial organizations, and has shed deep insights into genome structure and genome function. However, multiple sources of uncertainty make downstream data analysis and interpretation challenging. Specifically, statistical models for inferring three-dimensional (3D) chromosomal structure from Hi-C data are far from mature. Most existing methods are highly over-parameterized, lack clear interpretations, and are sensitive to outliers. We propose a parsimonious, easy-to-interpret, and robust piecewise helical curve model for the inference of 3D chromosomal structures from Hi-C data, for both individual topologically associated domains and whole chromosomes. When applied to a real Hi-C dataset, the piecewise helical model not only achieves much better model fitting than existing models, but also reveals that geometric properties of chromatin spatial organization are closely related to genome function.

For potential applications in big data analytics and machine learning, we propose to use deep neural networks to automate the Bayesian model selection and parameter estimation procedures. Two such frameworks are developed under different scenarios. First, we construct a deep neural network-based Bayes estimator for the parameters of a given model. The neural Bayes estimator mitigates the computational challenges faced by traditional approaches for computing Bayes estimators. When applied to generalized linear mixed models, the neural Bayes estimator outperforms existing methods implemented in R packages and SAS procedures. Second, we construct a deep convolutional neural network-based framework to perform simultaneous Bayesian model selection and parameter estimation. We refer to the neural networks for model selection and parameter estimation in this framework as the neural model selector and parameter estimator, respectively; both can be properly trained using labeled data systematically generated from candidate models. A simulation study shows that both the neural selector and the estimator demonstrate excellent performance.

The theory of Conditional Inferential Models (CIMs) has been introduced to combine information for efficient inference in the Inferential Models framework for prior-free and yet valid probabilistic inference. While the general theory is subject to further development, the so-called regular CIMs are simple. We establish and prove a necessary and sufficient condition for the existence and identification of regular CIMs. More specifically, it is shown that for inference based on a sample from continuous distributions with unknown parameters, the corresponding CIM is regular if and only if the unknown parameters are generalized location and scale parameters indexing the transformations of an affine group.
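As a toy illustration of the neural Bayes estimator idea (not the thesis's implementation), the sketch below trains a small network on (data, parameter) pairs simulated from a conjugate normal-normal model; minimizing squared error drives the network output toward the posterior mean, which is the Bayes estimator under that loss. The architecture and the toy model are assumptions made here for illustration.

```python
# Hypothetical sketch: a neural Bayes estimator for a conjugate normal model.
# Training pairs are simulated from the prior and the sampling model; the MSE
# objective makes the network approximate the posterior mean.
import torch
import torch.nn as nn

def simulate(n_sets, n_obs=50):
    # Prior: theta ~ N(0, 1); data: x_i ~ N(theta, 1), i = 1..n_obs.
    theta = torch.randn(n_sets, 1)
    x = theta + torch.randn(n_sets, n_obs)
    return x, theta

net = nn.Sequential(nn.Linear(50, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    x, theta = simulate(256)
    loss = nn.functional.mse_loss(net(x), theta)  # MSE -> posterior mean
    opt.zero_grad()
    loss.backward()
    opt.step()

# For this conjugate toy model the exact posterior mean is n * xbar / (n + 1),
# so the network output can be checked against it directly.
with torch.no_grad():
    x_test, _ = simulate(5)
    print(net(x_test).squeeze())
    print(x_test.mean(dim=1) * 50 / 51)
```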
22

A meta-analysis Bayesian model for ChIP-Seq data

Andrade, Pablo de Morais 17 April 2017 (has links)
With the development of high-throughput sequencing, new technologies emerged for the study of nucleic acid sequences (DNA and cDNA), and as a consequence tools to analyze the resulting large volumes of data became necessary. Among these new technologies, one in particular, Chromatin Immunoprecipitation followed by massive parallel DNA sequencing, or ChIP-Seq, has received much attention in recent years. It has become a widely used method to map the binding sites of a protein of interest in the genome. The analysis of data resulting from ChIP-Seq experiments is challenging, since different sources of bias arise during the sequencing and mapping of reads to the genome. Current methods used to find peaks in ChIP-Seq data have limitations regarding the number of treatment and control samples used and how these samples are combined. In this thesis we show that, because most of these methods are based on traditional statistical hypothesis tests, the number of peaks considered significant grows considerably as the sample size increases, which makes them unreliable for large volumes of data. This study describes a Bayesian statistical method that uses meta-analysis to discover binding sites of a protein of interest from peaks of reads found in ChIP-Seq data. We call it the Meta-Analysis Bayesian Approach, or MABayApp. We show that our method is robust and can be used with different numbers of control and treatment samples, as well as when comparing samples under different treatments.
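The thesis's MABayApp model is not reproduced here; as a hedged illustration of combining replicate read counts in a Bayesian way, the sketch below uses a conjugate Poisson-Gamma model at a single hypothetical genomic window and reports the Monte Carlo posterior probability that the treatment rate exceeds the control rate.

```python
# Hypothetical sketch (not MABayApp): Bayesian evidence for enrichment at one
# genomic window. Read counts are Poisson with Gamma(a, b) priors on the rates,
# so the posteriors are Gamma, and P(rate_t > rate_c | data) is estimated by
# Monte Carlo.
import numpy as np

rng = np.random.default_rng(0)

def prob_enriched(treat_counts, control_counts, a=1.0, b=1.0, n_draws=10_000):
    # Gamma(a, b) prior + Poisson counts -> Gamma(a + sum, b + n) posterior;
    # numpy's gamma takes (shape, scale), so scale = 1 / (b + n).
    post_t = rng.gamma(a + sum(treat_counts), 1.0 / (b + len(treat_counts)), n_draws)
    post_c = rng.gamma(a + sum(control_counts), 1.0 / (b + len(control_counts)), n_draws)
    return np.mean(post_t > post_c)

# Three treatment and two control replicates at one window (made-up counts):
print(prob_enriched([18, 22, 25], [7, 9]))
```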
23

Forecasting the Equity Premium and Optimal Portfolios

Bjurgert, Johan, Edstrand, Marcus January 2008 (has links)
The expected equity premium is an important parameter in many financial models, especially within portfolio optimization. A good forecast of the future equity premium is therefore of great interest. In this thesis we seek to forecast the equity premium, use it in portfolio optimization, and then give evidence on how sensitive the results are to estimation errors and how their impact can be minimized. Linear prediction models are commonly used by practitioners to forecast the expected equity premium, with mixed results. Choosing only the model that performs best in-sample does not take model uncertainty into account. Our approach is to still use linear prediction models, but to take model uncertainty into consideration by applying Bayesian model averaging. The predictions are used in the optimization of a portfolio with risky assets to investigate how sensitive portfolio optimization is to estimation errors in the mean vector and covariance matrix. This is performed using a Monte Carlo based heuristic called portfolio resampling. The results show that the predictive ability of linear models is not substantially improved by taking model uncertainty into consideration. This could mean that the main problem with linear models is not model uncertainty, but rather low predictive ability. However, we find that our approach gives better forecasts than using the historical average as an estimate. Furthermore, we find some predictive ability in the GDP, the short-term spread, and volatility for the five years to come. Portfolio resampling proves useful when the input parameters in a portfolio optimization problem suffer from vast uncertainty.
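As a rough sketch of the Bayesian model averaging step (not the thesis's exact predictor set or priors), the code below weights every subset of a few candidate predictors by a BIC approximation to its marginal likelihood, assuming equal prior model probabilities, and averages the subset forecasts. The predictor names in the comments are hypothetical.

```python
# Hypothetical sketch: BMA forecast over all linear submodels, with posterior
# model weights approximated by exp(-BIC/2) under equal prior probabilities.
import numpy as np
from itertools import combinations

def bma_forecast(X, y, x_new):
    n, p = X.shape
    bics, preds = [], []
    for k in range(1, p + 1):
        for idx in combinations(range(p), k):
            Xs = np.column_stack([np.ones(n), X[:, idx]])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            bics.append(n * np.log(rss / n) + Xs.shape[1] * np.log(n))
            preds.append(np.concatenate(([1.0], x_new[list(idx)])) @ beta)
    bics = np.array(bics)
    w = np.exp(-0.5 * (bics - bics.min()))  # stabilized model weights
    w /= w.sum()
    return float(w @ np.array(preds))

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 3))   # stand-ins for e.g. GDP, term spread, volatility
y = 0.4 * X[:, 1] + rng.normal(scale=0.5, size=120)
print(bma_forecast(X, y, np.array([0.1, 0.2, -0.3])))
```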
24

Characterizing User Search Intent and Behavior for Click Analysis in Sponsored Search

Ashkan, Azin January 2013 (has links)
Interpreting user actions to better understand their needs provides an important tool for improving information access services. In the context of organic Web search, considerable effort has been made to model user behavior and infer query intent, with the goal of improving the overall user experience. Much less work has been done in the area of sponsored search, i.e., with respect to the advertisement links (ads) displayed on search result pages by many commercial search engines. This thesis develops and evaluates new models and methods required to interpret user browsing and click behavior and understand query intent in this very different context. The initial part of the thesis concerns extending the query categories for commercial search and inferring query intent, with a focus on two major tasks: i) enriching queries with contextual information obtained from search result pages returned for these queries, and ii) developing relatively simple methods for the reliable labeling of training data via crowdsourcing. A central idea of this thesis work is to study the impact of contextual factors (including query intent, ad placement, and page structure) on user behavior. This information is then incorporated into probabilistic models to evaluate the quality of advertisement links within the context in which they are displayed, over their history of appearance. To account for these factors, a number of query and location biases are proposed and formulated into a group of browsing and click models. To explore user intent and behavior and to evaluate the performance of the proposed models and methods, logs of query and click information provided for research purposes are used. Overall, query intent is found to have substantial impact on predictions of user click behavior in sponsored search. Predictions are further improved by considering ads in the context of the other ads displayed on a result page. The parameters of the browsing and click models are learned using an expectation maximization technique applied to click signals recorded in the logs. The initial motivation of the user to browse the ad list and their browsing persistence are found to be related to query intent and browsing/click behavior. Accommodating these biases along with the location bias in user models provides effective contextual signals, improving the performance of the existing models.
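The thesis's richer models with query-intent and location biases are not reproduced here; as a hedged illustration of the expectation maximization approach mentioned above, the sketch below fits the simpler position-based click model that such work commonly builds on, where P(click) = P(position examined) x P(ad attractive), on synthetic logs. The synthetic parameters are assumptions for illustration, and the two factors are identifiable only up to a common rescaling.

```python
# Hypothetical sketch: EM for a position-based click model on (ad, position,
# clicked) impressions. alpha[ad] is attractiveness, gamma[pos] is the
# probability the position is examined.
import numpy as np

def fit_pbm(logs, n_ads, n_pos, n_iter=50):
    alpha = np.full(n_ads, 0.5)
    gamma = np.full(n_pos, 0.5)
    for _ in range(n_iter):
        num_a = np.zeros(n_ads); den_a = np.zeros(n_ads)
        num_g = np.zeros(n_pos); den_g = np.zeros(n_pos)
        for ad, pos, c in logs:
            if c:                       # a click implies examined and attractive
                p_attr = p_exam = 1.0
            else:                       # E-step posteriors given no click
                denom = 1.0 - gamma[pos] * alpha[ad]
                p_attr = alpha[ad] * (1.0 - gamma[pos]) / denom
                p_exam = gamma[pos] * (1.0 - alpha[ad]) / denom
            num_a[ad] += p_attr; den_a[ad] += 1.0
            num_g[pos] += p_exam; den_g[pos] += 1.0
        alpha = num_a / np.maximum(den_a, 1.0)   # M-step
        gamma = num_g / np.maximum(den_g, 1.0)
    return alpha, gamma

rng = np.random.default_rng(2)
true_alpha, true_gamma = np.array([0.8, 0.4, 0.1]), np.array([0.9, 0.5])
logs = []
for _ in range(20_000):
    ad, pos = rng.integers(3), rng.integers(2)
    logs.append((ad, pos, rng.random() < true_alpha[ad] * true_gamma[pos]))
print(fit_pbm(logs, 3, 2))
```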
25

Bayesian multivariate spatial models and their applications

Song, Joon Jin 15 November 2004 (has links)
Univariate hierarchical Bayes models are being vigorously researched for use in disease mapping, engineering, geology, and ecology. This dissertation shows how the models can also be used to build model-based risk maps for area-based roadway traffic crashes. County-level vehicle crash records and roadway data from Texas are used to illustrate the method. A potential extension that uses univariate hierarchical models to develop network-based risk maps is also discussed. Several Bayesian multivariate spatial models for simultaneously estimating the crash rates of different types of crashes are then developed. The specific class of spatial models considered is the conditional autoregressive (CAR) model. The univariate CAR model is generalized for several multivariate cases. A general theorem for each case is provided to ensure that the posterior distribution is proper under an improper flat prior. The performance of the various multivariate spatial models is compared using a Bayesian information criterion. Markov chain Monte Carlo (MCMC) computational techniques are used for model parameter estimation and statistical inference. These models are illustrated and compared again with the Texas crash data. There are many directions in which this study can be extended. The dissertation concludes with a short summary of this research and recommends several promising extensions.
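A minimal sketch of the proper CAR prior underlying such spatial models, with a hypothetical four-county adjacency standing in for the Texas county network: the spatial effects get precision matrix tau * (D - rho * W), and one draw is generated through its Cholesky factor.

```python
# Hypothetical sketch: proper conditional autoregressive (CAR) prior. W is a
# made-up 4-county adjacency matrix, D its degree matrix; the prior on county
# effects phi is N(0, [tau * (D - rho * W)]^{-1}), positive definite for |rho| < 1.
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
tau, rho = 2.0, 0.9                 # precision scale and spatial dependence
Q = tau * (D - rho * W)             # CAR precision matrix

# Draw phi ~ N(0, Q^{-1}): if Q = L L^T and z ~ N(0, I), then phi = L^{-T} z
# has covariance (L L^T)^{-1} = Q^{-1}.
rng = np.random.default_rng(3)
L = np.linalg.cholesky(Q)
phi = np.linalg.solve(L.T, rng.standard_normal(4))
print(phi)
```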
26

Bayesian Hierarchical Model for Combining Two-resolution Metrology Data

Xia, Haifeng 14 January 2010 (has links)
This dissertation presents a Bayesian hierarchical model to combine two-resolution metrology data for inspecting the geometric quality of manufactured parts. The high-resolution data points are scarce and thus sparsely scattered over the surface being measured, while the low-resolution data are pervasive but less accurate or less precise. Combining the two datasets should yield a better prediction of the geometric surface of a manufactured part than using either dataset alone. One challenge in combining the metrology datasets is the misalignment between the low- and high-resolution data points. This dissertation provides a Bayesian hierarchical model that can handle such misaligned datasets, with the following components: (a) a Gaussian process for modeling metrology data at the low-resolution level; (b) a heuristic matching and alignment method that produces a pool of candidate matches and transformations between the two datasets; (c) a linkage model, conditioned on a given match and its associated transformation, that connects a high-resolution data point to a set of low-resolution data points in its neighborhood and makes a combined prediction; and finally (d) Bayesian model averaging of the predictive models in (c) over the pool of candidate matches found in (b). This Bayesian model averaging procedure assigns weights to different matches according to how much they support the observed data, and then produces the final combined prediction of the surface based on the data of both resolutions. The proposed method improves upon methods that use a single dataset, as well as upon a combined prediction that does not address the misalignment problem. The improvements over alternative methods are demonstrated using both simulated data and datasets from a milled sine-wave part, measured by two coordinate measuring machines of different resolutions.
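The dissertation's Gaussian process and linkage models are not reproduced here; as a hedged illustration of component (d) only, the sketch below averages predictions over candidate alignments weighted by how well each aligns the high-resolution points with a stand-in surface. The analytic surface and the shift-only transformations are assumptions made for illustration.

```python
# Hypothetical sketch: Bayesian model averaging over candidate alignments.
# A known sine surface stands in for the GP fitted to low-resolution data;
# horizontal shifts stand in for the pool of candidate transformations.
import numpy as np

rng = np.random.default_rng(4)

def surface(x):                       # stand-in for the low-resolution model
    return np.sin(2 * np.pi * x)

x_hi = np.linspace(0.1, 0.9, 9)
y_hi = surface(x_hi + 0.02) + rng.normal(scale=0.01, size=9)  # misaligned data

shifts = np.linspace(-0.05, 0.05, 21)         # candidate transformations
log_w = np.array([-0.5 * np.sum((y_hi - surface(x_hi + s)) ** 2) / 0.01 ** 2
                  for s in shifts])           # Gaussian log-likelihood per shift
w = np.exp(log_w - log_w.max())
w /= w.sum()

x_new = 0.5                                   # combined prediction at a new point
pred = np.sum(w * np.array([surface(x_new + s) for s in shifts]))
print(pred, shifts[np.argmax(w)])             # BMA prediction, most-supported shift
```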
28

Model Likelihoods and Bayes Factors for Switching and Mixture Models

Frühwirth-Schnatter, Sylvia January 2000 (has links) (PDF)
In the present paper we explore various approaches for computing model likelihoods from the MCMC output for mixture and switching models, among them the candidate's formula, importance sampling, reciprocal importance sampling, and bridge sampling. We demonstrate that the candidate's formula is sensitive to label switching. It turns out that the best method to estimate the model likelihood is the bridge sampling technique, where the MCMC sample is combined with an iid sample from an importance density. The importance density is constructed in an unsupervised manner from the MCMC output using a mixture of complete data posteriors. Whereas the importance sampling estimator as well as the reciprocal importance sampling estimator are sensitive to the tail behaviour of the importance density, we demonstrate that the bridge sampling estimator is far more robust in this respect. Our case studies range from selecting the number of classes in a mixture of multivariate normal distributions, to testing for the inhomogeneity of a discrete time Poisson process, to testing for the presence of Markov switching and order selection in the MSAR model. (author's abstract) / Series: Forschungsberichte / Institut für Statistik
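A minimal sketch of the iterative bridge sampling estimator of Meng and Wong (1996) on a toy target whose normalizing constant is known; the paper's unsupervised mixture-of-complete-data-posteriors importance density is replaced here by a simple normal, which is an assumption for illustration.

```python
# Hypothetical sketch: bridge sampling estimate of a normalizing constant.
# Target: unnormalized N(0, 1), whose true constant is sqrt(2*pi) ~ 2.5066.
import numpy as np

rng = np.random.default_rng(5)

def log_p_unnorm(x):                  # unnormalized posterior density
    return -0.5 * x ** 2

def log_q(x):                         # importance density: N(0, 1.5^2)
    return -0.5 * (x / 1.5) ** 2 - np.log(1.5 * np.sqrt(2 * np.pi))

n1 = n2 = 20_000
x_p = rng.standard_normal(n1)             # stands in for the MCMC sample
x_q = rng.normal(scale=1.5, size=n2)      # iid sample from q
l_p = np.exp(log_p_unnorm(x_p) - log_q(x_p))
l_q = np.exp(log_p_unnorm(x_q) - log_q(x_q))
s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)

m = 1.0                                   # fixed-point iteration for the constant
for _ in range(100):
    m = (np.mean(l_q / (s1 * l_q + s2 * m)) /
         np.mean(1.0 / (s1 * l_p + s2 * m)))
print(m, np.sqrt(2 * np.pi))              # estimate vs. the true constant
```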
29

Bayesian Hierarchical Models for Model Choice

Li, Yingbo January 2013 (has links)
With the development of modern data collection approaches, researchers may collect hundreds to millions of variables, yet may not need to utilize all explanatory variables available in predictive models. Hence, choosing models that consist of a subset of variables often becomes a crucial step. In linear regression, variable selection not only reduces model complexity, but also prevents over-fitting. From a Bayesian perspective, prior specification of model parameters plays an important role in model selection as well as parameter estimation, and often prevents over-fitting through shrinkage and model averaging.

We develop two novel hierarchical priors for selection and model averaging, for Generalized Linear Models (GLMs) and normal linear regression, respectively. They can be considered "spike-and-slab" prior distributions or, more appropriately, "spike-and-bell" distributions. Under these priors we achieve dimension reduction, since their point masses at zero allow predictors to be excluded with positive posterior probability. In addition, these hierarchical priors have heavy tails to provide robustness when MLEs are far from zero.

Zellner's g-prior is widely used in linear models. It preserves the correlation structure among predictors in its prior covariance, and yields closed-form marginal likelihoods, which leads to huge computational savings by avoiding sampling in the parameter space. Mixtures of g-priors avoid fixing g in advance, and can resolve consistency problems that arise with fixed g. For GLMs, we show that the mixture of g-priors using a Compound Confluent Hypergeometric distribution unifies existing choices in the literature and maintains their good properties, such as tractable (approximate) marginal likelihoods and asymptotic consistency for model selection and parameter estimation under specific values of the hyperparameters.

While the g-prior is invariant under rotation within a model, a potential problem is that it inherits the instability of ordinary least squares (OLS) estimates when predictors are highly correlated. We build a hierarchical prior based on scale mixtures of independent normals, which incorporates invariance under rotations within models, like ridge regression and the g-prior, but has heavy tails like the Zellner-Siow Cauchy prior. We find this method outperforms the gold-standard mixture of g-priors and other methods in the case of highly correlated predictors in Gaussian linear models. We incorporate a non-parametric structure, the Dirichlet Process (DP), as a hyperprior to allow more flexibility and adaptivity to the data. / Dissertation
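As a hedged illustration of why g-priors make model selection cheap, the sketch below evaluates the closed-form marginal likelihood under Zellner's g-prior with a fixed g (in the form given by Liang et al., 2008, for models with an intercept) over all subsets of three synthetic predictors; the thesis's hierarchical mixtures of g-priors generalize this by placing a prior on g rather than fixing it.

```python
# Hypothetical sketch: posterior model probabilities under a fixed-g Zellner
# g-prior, using the closed-form marginal likelihood (relative to the null
# model): ((n-1-p)/2) log(1+g) - ((n-1)/2) log(1 + g(1-R^2)).
import numpy as np
from itertools import combinations

def log_marglik_g(X, y, idx, g):
    n = len(y)
    yc = y - y.mean()
    if not idx:
        return 0.0                       # null model is the baseline
    Xs = X[:, idx] - X[:, idx].mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xs, yc, rcond=None)
    r2 = 1.0 - np.sum((yc - Xs @ beta) ** 2) / np.sum(yc ** 2)
    p = len(idx)
    return (0.5 * (n - 1 - p) * np.log(1 + g)
            - 0.5 * (n - 1) * np.log(1 + g * (1 - r2)))

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 3))
y = 1.5 * X[:, 0] + rng.normal(size=100)
models = [idx for k in range(4) for idx in combinations(range(3), k)]
lml = np.array([log_marglik_g(X, y, list(idx), g=100.0) for idx in models])
post = np.exp(lml - lml.max())
post /= post.sum()                       # equal prior model probabilities
for idx, pr in zip(models, post):
    print(idx, round(pr, 3))
```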
30

Model Likelihoods and Bayes Factors for Switching and Mixture Models

Frühwirth-Schnatter, Sylvia January 2002 (has links) (PDF)
In the present paper we discuss the problem of estimating model likelihoods from the MCMC output for a general mixture and switching model. Estimation is based on the method of bridge sampling (Meng and Wong, 1996), where the MCMC sample is combined with an iid sample from an importance density. The importance density is constructed in an unsupervised manner from the MCMC output using a mixture of complete data posteriors. Whereas the importance sampling estimator as well as the reciprocal importance sampling estimator are sensitive to the tail behaviour of the importance density, we demonstrate that the bridge sampling estimator is far more robust in this respect. Our case studies range from computing marginal likelihoods for a mixture of multivariate normal distributions, to testing for the inhomogeneity of a discrete time Poisson process, to testing for the presence of Markov switching and order selection in the MSAR model. (author's abstract) / Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
