71

Bayesian Methods and Computation for Large Observational Datasets

Watts, Krista Leigh 30 September 2013 (has links)
Much health-related research depends heavily on the analysis of a rapidly expanding universe of observational data. A challenge in the analysis of such data is the lack of sound statistical methods and tools that can address multiple facets of estimating treatment or exposure effects in observational studies with a large number of covariates. We sought to advance methods to improve the analysis of large observational datasets, with the end goal of understanding the effect of treatments or exposures on health. First, we compared existing methods for propensity score (PS) adjustment, specifically Bayesian propensity scores. This concept had previously been introduced (McCandless et al., 2009), but the impact of feedback when fitting the joint likelihood for both the PS and outcome models had not been rigorously evaluated. We determined that unless specific steps are taken to mitigate it, feedback has the potential to distort estimates of the treatment effect. Next, we developed a method for accounting for uncertainty in confounding adjustment in the context of multiple exposures. Our method allows us to select confounders based on their association with the joint exposure and the outcome while also accounting for the uncertainty in the confounding adjustment. Finally, we developed two methods to combine heterogeneous sources of data for effect estimation, specifically information coming from a primary data source that provides treatments, outcomes, and a limited set of measured confounders on a large number of people, and from smaller supplementary data sources containing a much richer set of covariates. Our methods avoid the need to specify the full joint distribution of all covariates.
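
A minimal sketch of the two-stage ("cut") approach that blocks feedback between the two models: the PS model is fit first, with no outcome information, and the outcome model then conditions on the fitted score. All data and model choices below are synthetic stand-ins, not the dissertation's actual specification.

```python
# Sketch: two-stage ("cut") propensity-score adjustment on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 5000, 10
X = rng.normal(size=(n, p))                       # measured confounders
confounder = X[:, :3].sum(axis=1)                 # true PS uses 3 of them
T = rng.binomial(1, 1 / (1 + np.exp(-confounder)))  # treatment assignment
tau = 2.0                                         # true treatment effect
Y = tau * T + confounder + rng.normal(size=n)

# Stage 1: fit the PS model alone -- the outcome cannot feed back into it.
ps = LogisticRegression(max_iter=1000).fit(X, T).predict_proba(X)[:, 1]

# Stage 2: outcome model adjusting for logit(PS) as a covariate.
logit_ps = np.log(ps / (1 - ps))
design = np.column_stack([np.ones(n), T, logit_ps])
beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(f"estimated treatment effect: {beta[1]:.2f} (truth {tau})")
```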
72

Rich Linguistic Structure from Large-Scale Web Data

Yamangil, Elif 18 October 2013 (has links)
The past two decades have shown an unexpected effectiveness of Web-scale data in natural language processing. Even the simplest models, when paired with unprecedented amounts of unstructured and unlabeled Web data, have been shown to outperform sophisticated ones. It has been argued that the effectiveness of Web-scale data has undermined the necessity of sophisticated modeling or laborious data set curation. In this thesis, we argue for and illustrate an alternative view, that Web-scale data not only serves to improve the performance of simple models, but also can allow the use of qualitatively more sophisticated models that would not be deployable otherwise, leading to even further performance gains. / Engineering and Applied Sciences
73

Capture-recapture Estimation for Conflict Data and Hierarchical Models for Program Impact Evaluation

Mitchell, Shira Arkin 07 June 2014 (has links)
A relatively recent increase in the popularity of evidence-based activism has created a higher demand for statisticians to work on human rights and economic development projects. The statistical challenges of revealing patterns of violence in armed conflict require efficient use of the data and careful consideration of the implications of modeling decisions on estimates. Impact evaluation of a complex economic development project requires careful consideration of causality and transparency to donors and beneficiaries. In this dissertation, I compare marginal and conditional models for capture-recapture, and develop new hierarchical models that accommodate challenges in data from the armed conflict in Colombia and, more generally, in many other capture-recapture settings. Additionally, I propose a study design for a non-randomized impact evaluation of the Millennium Villages Project (MVP), to be carried out during my postdoctoral fellowship. The design includes small area estimation of baseline variables, propensity score matching, and hierarchical models for causal inference.
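
For orientation, the simplest version of the capture-recapture problem — two overlapping lists of documented victims — has a closed-form estimator. A sketch with made-up numbers (the dissertation's hierarchical models generalize far beyond this two-list case):

```python
# Sketch: the classic two-list (Chapman) capture-recapture estimator.
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Estimate total population size from two overlapping lists.

    n1, n2 -- cases documented by list 1 and list 2
    m      -- cases appearing on both lists
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# e.g. two NGOs document 400 and 300 killings, 120 appear in both:
print(f"estimated total killings: {chapman_estimate(400, 300, 120):.0f}")
```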
74

Bayesian analysis of the complex Bingham distribution

Leu, Richard Hsueh-Yee 21 February 2011 (has links)
While most statistical applications involve real numbers, some demand complex numbers. Statistical shape analysis is one such area. The complex Bingham distribution is utilized in the shape analysis of landmark data in two dimensions. Previous analysis of data arising from this distribution involved classical statistical techniques. In this report, a full Bayesian inference was carried out on the posterior distribution of the parameter matrix when data arise from a complex Bingham distribution. We utilized a Markov chain Monte Carlo algorithm to sample the posterior distribution of the parameters. A Metropolis-Hastings algorithm sampled the posterior conditional distribution of the eigenvalues, while a successive conditional Monte Carlo sampler was used to sample the eigenvectors. The method was successfully verified on simulated data, using both flat and informative priors. / text
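
A generic random-walk Metropolis-Hastings sampler of the kind described is sketched below, targeting a toy log-posterior rather than the actual complex Bingham eigenvalue conditional:

```python
# Sketch: random-walk Metropolis-Hastings; the toy target stands in for
# the (transformed) eigenvalue conditional, which is not reproduced here.
import numpy as np

def metropolis_hastings(log_post, x0, n_iter=10_000, step=0.3, seed=0):
    """Random-walk MH targeting the density exp(log_post)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    samples = np.empty((n_iter, x.size))
    for i in range(n_iter):
        proposal = x + step * rng.normal(size=x.size)  # symmetric proposal
        lp_prop = log_post(proposal)
        if np.log(rng.uniform()) < lp_prop - lp:       # accept/reject
            x, lp = proposal, lp_prop
        samples[i] = x
    return samples

# Toy target: independent standard normals.
draws = metropolis_hastings(lambda v: -0.5 * np.sum(v ** 2), np.zeros(3))
print(draws[2000:].mean(axis=0))   # ~0 after burn-in
print(draws[2000:].std(axis=0))    # ~1
```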
75

Statistical Study of Magnetic Field Reversals in Geodynamo Models and Paleomagnetic Data

Meduri, Domenico Giovanni 29 October 2014 (has links)
No description available.
76

Bayesian Hierarchical Models for Model Choice

Li, Yingbo January 2013 (has links)
With the development of modern data collection approaches, researchers may collect hundreds to millions of variables, yet may not need to utilize all of the explanatory variables available in predictive models. Hence, choosing models that consist of a subset of variables often becomes a crucial step. In linear regression, variable selection not only reduces model complexity, but also prevents over-fitting. From a Bayesian perspective, prior specification of model parameters plays an important role in model selection as well as parameter estimation, and often prevents over-fitting through shrinkage and model averaging.

We develop two novel hierarchical priors for selection and model averaging, for Generalized Linear Models (GLMs) and normal linear regression, respectively. They can be considered "spike-and-slab" prior distributions or, more appropriately, "spike-and-bell" distributions. Under these priors we achieve dimension reduction, since their point masses at zero allow predictors to be excluded with positive posterior probability. In addition, these hierarchical priors have heavy tails to provide robustness when MLEs are far from zero.

Zellner's g-prior is widely used in linear models. It preserves the correlation structure among predictors in its prior covariance, and yields closed-form marginal likelihoods, which leads to huge computational savings by avoiding sampling in the parameter space. Mixtures of g-priors avoid fixing g in advance, and can resolve consistency problems that arise with fixed g. For GLMs, we show that the mixture of g-priors using a Compound Confluent Hypergeometric distribution unifies existing choices in the literature and maintains their good properties, such as tractable (approximate) marginal likelihoods and asymptotic consistency for model selection and parameter estimation under specific values of the hyperparameters.

While the g-prior is invariant under rotation within a model, a potential problem is that it inherits the instability of ordinary least squares (OLS) estimates when predictors are highly correlated. We build a hierarchical prior based on scale mixtures of independent normals, which incorporates invariance under rotations within models like ridge regression and the g-prior, but has heavy tails like the Zellner-Siow Cauchy prior. We find this method outperforms the gold-standard mixture of g-priors and other methods in the case of highly correlated predictors in Gaussian linear models. We incorporate a non-parametric structure, the Dirichlet Process (DP), as a hyperprior to allow more flexibility and adaptivity to the data. / Dissertation
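
The computational appeal of the g-prior can be seen in a short sketch: for fixed g, the Bayes factor of a linear model against the intercept-only null is available in closed form from the model's R² (as in the mixtures-of-g-priors literature), so model comparison needs no sampling. Illustrative only; the thesis's hierarchical priors go well beyond fixed g.

```python
# Sketch: closed-form Bayes factor under Zellner's g-prior with fixed g.
import numpy as np

def log_bf_gprior(y, X, g):
    """log Bayes factor of (X + intercept) vs. the intercept-only null."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)            # center out the common intercept
    yc = y - y.mean()
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    r2 = 1.0 - np.sum((yc - Xc @ beta) ** 2) / np.sum(yc ** 2)
    return (0.5 * (n - 1 - p) * np.log1p(g)
            - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2)))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=100)  # third column is noise
print(log_bf_gprior(y, X[:, :2], g=100.0))  # true predictors only
print(log_bf_gprior(y, X, g=100.0))         # extra noise column penalized
```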
77

On Bayesian Analyses of Functional Regression, Correlated Functional Data and Non-homogeneous Computer Models

Montagna, Silvia January 2013 (has links)
Current frontiers in complex stochastic modeling of high-dimensional processes include major emphases on so-called functional data: problems in which the data are snapshots of curves and surfaces representing fundamentally important scientific quantities. This thesis explores new Bayesian methodologies for functional data analysis.

The first part of the thesis places emphasis on the role of factor models in functional data analysis. Data reduction becomes mandatory when dealing with such high-dimensional data, all the more so when data are available on a large number of individuals. In Chapter 2 we present a novel Bayesian framework which employs a latent factor construction to represent each variable by a low-dimensional summary. Further, we explore the important issue of modeling and analyzing the relationship of functional data with other covariate and outcome variables simultaneously measured on the same subjects.

The second part of the thesis is concerned with the analysis of circadian data. The focus is on the identification of circadian genes, that is, genes whose expression levels appear to be rhythmic through time with a period of approximately 24 hours. While addressing this goal, most of the current literature does not account for the potential dependence across genes. In Chapter 4, we propose a Bayesian approach which employs latent factors to accommodate dependence and verify patterns and relationships between genes, while representation of the true gene expression trajectories in the Fourier domain allows for inference on the period, phase, and amplitude of the signal.

The third part of the thesis is concerned with the statistical analysis of computer models (simulators). The heavy computational demand of these input-output maps calls for statistical techniques that quickly estimate the output surface at untried inputs given a few preliminary runs of the simulator at a set of design points. In this regard, we propose a Bayesian methodology based on a non-stationary Gaussian process. Relying on a model-based assessment of uncertainty, we envision a sequential design technique which helps choose the input points at which the simulator should be run to minimize the uncertainty in posterior surface estimation in an optimal way. The proposed non-stationary approach adapts well to output surfaces of unconstrained shape. / Dissertation
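
A sketch of the emulation-plus-sequential-design idea, with a stationary RBF Gaussian process standing in for the thesis's non-stationary one: fit the GP to a few simulator runs, then repeatedly run the simulator where the emulator's posterior standard deviation is largest.

```python
# Sketch: GP emulation with variance-driven sequential design.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def simulator(x):
    """Stand-in for an expensive computer model."""
    return np.sin(3 * x) + 0.5 * x

X_design = np.array([[0.0], [0.5], [2.0]])        # a few preliminary runs
y = simulator(X_design).ravel()
grid = np.linspace(0.0, 2.0, 201).reshape(-1, 1)  # candidate inputs

for _ in range(5):                                # sequential design loop
    # Fixed stationary kernel for simplicity (the thesis's GP is not).
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                                  optimizer=None).fit(X_design, y)
    _, sd = gp.predict(grid, return_std=True)
    x_next = grid[np.argmax(sd)]                  # most uncertain input
    X_design = np.vstack([X_design, x_next])
    y = np.append(y, simulator(x_next[0]))

print("design points chosen:", np.round(X_design.ravel(), 2))
```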
78

Monitoring and Improving Markov Chain Monte Carlo Convergence by Partitioning

VanDerwerken, Douglas January 2015 (has links)
Since Bayes' Theorem was first published in 1763, many have argued for the Bayesian paradigm on purely philosophical grounds. For much of this time, however, practical implementation of Bayesian methods was limited to a relatively small class of "conjugate" or otherwise computationally tractable problems. With the development of Markov chain Monte Carlo (MCMC) and improvements in computers over the last few decades, the number of problems amenable to Bayesian analysis has increased dramatically. The ensuing spread of Bayesian modeling has led to new computational challenges as models become more complex and higher-dimensional, and both parameter sets and data sets become orders of magnitude larger. This dissertation introduces methodological improvements to deal with these challenges. These include methods for enhanced convergence assessment, for parallelization of MCMC, for estimation of the convergence rate, and for estimation of normalizing constants. A recurring theme across these methods is the utilization of one or more chain-dependent partitions of the state space. / Dissertation
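
A loose illustration of the partition-based theme, not the dissertation's actual diagnostics: partition the (here one-dimensional) state space into cells and compare the cell-occupancy frequencies of independent chains; a large discrepancy flags non-convergence.

```python
# Sketch: partition-based convergence check across parallel chains.
import numpy as np

def partition_discrepancy(chains, n_cells=20):
    """Max total-variation distance between each chain's cell-occupancy
    frequencies and the pooled frequencies."""
    edges = np.linspace(np.min(chains), np.max(chains), n_cells + 1)
    freqs = np.array([np.histogram(c, bins=edges)[0] / len(c)
                      for c in chains])
    pooled = freqs.mean(axis=0)
    return max(0.5 * np.abs(f - pooled).sum() for f in freqs)

rng = np.random.default_rng(0)
mixed = [rng.normal(size=5000) for _ in range(4)]      # same target
stuck = mixed[:3] + [rng.normal(loc=3.0, size=5000)]   # one chain elsewhere
print(partition_discrepancy(mixed))   # near 0: consistent occupancy
print(partition_discrepancy(stuck))   # large: flags non-convergence
```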
79

OBSCURATION IN ACTIVE GALACTIC NUCLEI

Nikutta, Robert 01 January 2012 (has links)
All classes of Active Galactic Nuclei (AGN) are fundamentally powered by accretion of gas onto a supermassive black hole. The process converts the potential energy of the infalling matter to X-ray and ultraviolet (UV) radiation, releasing up to several 10^12 solar luminosities. Observations show that the accreting "central engines" in AGN are surrounded by dusty matter. The dust occupies a "torus" around the AGN which is composed of discrete clumps. If the AGN radiation propagates through the torus on its way to an observer, it is heavily re-processed by the dust, i.e. converted from UV to infrared (IR) wavelengths. Much of the information about the input radiation is lost in this conversion process, while an imprint of the dusty torus is left in the released IR photons. Our group was the first to formulate a consistent treatment of radiative transfer in a clumpy medium, an important improvement over the simpler models with smooth dust distributions previously used by researchers. Our code CLUMPY computes spectral energy distributions (SEDs) for any set of model parameter values. Fitting these models to observed AGN SEDs allows us to determine important quantities, such as the torus size, the spatial distribution of clumps, the torus covering factor, or the intrinsic AGN luminosity. Detailed modeling also permits us to study the complex behavior of certain spectral features. IR radiative transfer introduces degeneracies into the solution space: different parameter values can yield similar SEDs. The geometry of the torus further exacerbates the problem. Knowing the amount of parameter degeneracy present in our models is important for quantifying confidence in data fits. When matching the models to observed SEDs we must employ modern statistical methods. In my research I use Bayesian statistics to determine the likely ranges of parameter values. I have developed all the tools required for fitting observed SEDs with our large model database: the latest implementation of CLUMPY, the fit algorithms, the Markov chain Monte Carlo sampler, and the Bayesian estimator. In collaboration with observing groups we have applied our methods to a multitude of real-life AGN.
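
The fitting pattern — a Bayesian estimator applied over a large pre-computed model database — can be sketched with a toy SED family standing in for real CLUMPY output:

```python
# Sketch: grid-based Bayesian parameter estimation over a model database.
# The "models" are toy power-law SEDs, not CLUMPY radiative-transfer runs.
import numpy as np

wavelengths = np.logspace(0, 2, 30)               # toy wavelength grid
slopes = np.linspace(-2.0, 2.0, 401)              # single model parameter
models = wavelengths[None, :] ** slopes[:, None]  # pre-computed database

rng = np.random.default_rng(0)
true_sed = wavelengths ** 0.7
sigma = 0.1 * true_sed                            # 10% flux uncertainties
observed = true_sed + sigma * rng.normal(size=wavelengths.size)

# Gaussian likelihood of every database model; flat prior over the grid.
chi2 = np.sum(((observed - models) / sigma) ** 2, axis=1)
posterior = np.exp(-0.5 * (chi2 - chi2.min()))
posterior /= posterior.sum()

mean = np.sum(slopes * posterior)
sd = np.sqrt(np.sum((slopes - mean) ** 2 * posterior))
print(f"slope posterior: {mean:.3f} +/- {sd:.3f} (truth 0.7)")
```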
80

Computational Systems Biology of Saccharomyces cerevisiae Cell Growth and Division

Mayhew, Michael Benjamin January 2014 (has links)
Cell division and growth are complex processes fundamental to all living organisms. In the budding yeast, Saccharomyces cerevisiae, these two processes are known to be coordinated with one another, as a cell's mass must roughly double before division. Moreover, cell-cycle progression is dependent on cell size, with smaller cells at birth generally taking more time in the cell cycle. This dependence is a signature of size control. Systems biology is an emerging field that emphasizes connections or dependencies between biological entities and processes over the characteristics of individual entities. Statistical models provide a quantitative framework for describing and analyzing these dependencies. In this dissertation, I take a statistical systems biology approach to study cell division and growth and the dependencies within and between these two processes, drawing on observations from richly informative microscope images and time-lapse movies. I review the current state of knowledge on these processes, highlighting key results and open questions from the biological literature. I then discuss my development of machine learning and statistical approaches to extract cell-cycle information from microscope images and to better characterize the cell-cycle progression of populations of cells. In addition, I analyze single cells to uncover correlation in cell-cycle progression, evaluate potential models of dependence between growth and division, and revisit classical assertions about budding yeast size control. This dissertation presents a unique perspective and approach towards comprehensive characterization of the coordination between growth and division. / Dissertation
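
The size-control signature mentioned above can be read off single-cell data with a simple regression: if smaller-born cells spend longer in the cycle, duration regressed on (log) birth size has a negative slope. Synthetic numbers, purely illustrative:

```python
# Sketch: detecting the size-control signature in single-cell data.
import numpy as np

rng = np.random.default_rng(0)
birth_size = rng.lognormal(mean=0.0, sigma=0.25, size=500)
# Simulated G1 durations: smaller-born cells take longer (size control).
g1_hours = 3.0 - 1.5 * np.log(birth_size) + rng.normal(scale=0.4, size=500)

slope, intercept = np.polyfit(np.log(birth_size), g1_hours, 1)
print(f"slope: {slope:.2f} h per unit log-size (negative => size control)")
```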
