About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

rstream: Streams of Random Numbers for Stochastic Simulation

L'Ecuyer, Pierre, Leydold, Josef January 2005 (has links) (PDF)
The package rstream provides a unified interface to streams of random numbers for the R statistical computing language. Its features are: independent streams of random numbers, substreams, easy handling of streams (initialize, reset), and antithetic random variates. The paper describes this package and demonstrates the usefulness of the approach with a simple example. / Series: Preprint Series / Department of Applied Statistics and Data Processing
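rstream itself is an R package, so the snippet below is not its API. As a rough Python/NumPy analogue of the features listed above (independent streams, resettable streams, antithetic variates), with an arbitrary seed chosen for illustration:

```python
# Illustration only: rstream is an R package, and this is not its API. The NumPy
# sketch below mimics the same ideas -- independent streams, resettable streams,
# and antithetic variates. The seed is arbitrary.
import numpy as np

root = np.random.SeedSequence(20240101)            # reproducible master seed
stream_seeds = root.spawn(2)                       # two statistically independent streams
streams = [np.random.default_rng(s) for s in stream_seeds]

u = streams[0].random(5)                           # uniforms from stream 0
antithetic = 1.0 - u                               # antithetic variates for variance reduction

# "Resetting" a stream: rebuild the generator from its stored seed sequence.
streams[0] = np.random.default_rng(stream_seeds[0])
assert np.allclose(u, streams[0].random(5))        # same draws after the reset

print(u, antithetic, streams[1].random(3))
```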
2

Estimating Freeway Travel Time Reliability for Traffic Operations and Planning

Yang, Shu January 2016 (has links)
Travel time reliability (TTR) has attracted increasing attention in recent years and is often listed as one of the major roadway performance and service quality measures for both traffic engineers and travelers. Measuring travel time reliability is the first step towards improving it, ensuring on-time arrivals, and reducing travel costs. Four components may be primarily considered: travel time estimation/collection, selection of the travel time data quantity, probability distribution selection, and TTR measure selection. Travel time is a key transportation performance measure because of its diverse applications, and it also serves as the foundation for estimating travel time reliability. Various modelling approaches to estimating freeway travel time have been well developed owing to the widespread installation of intelligent transportation system sensors. However, estimating accurate travel times with existing freeway travel time models is still challenging under congested conditions. Therefore, this study aimed to develop an innovative freeway travel time estimation model based on the General Motors (GM) car-following model. Since the GM model is usually used in a micro-simulation environment, the concepts of virtual leading and virtual following vehicles are proposed to allow the GM model to be used in macro-scale environments with aggregated traffic sensor data. Travel time data collected from three study corridors on I-270 in St. Louis, Missouri were used to verify the estimated travel times produced by the proposed General Motors Travel Time Estimation (GMTTE) model and two existing models, the instantaneous model and the time-slice model. The results showed that the GMTTE model outperformed the two existing models owing to its lower mean absolute percentage errors of 1.62% in free-flow conditions and 6.66% in two congested conditions. Overall, the GMTTE model demonstrated its robustness and accuracy for estimating freeway travel times.

Most travel time reliability measures are derived from continuous probability distributions and applied directly to the traffic data. However, little previous research shows a consensus on the selection of a probability distribution family for travel time reliability. Different probability distribution families could yield different values for the same travel time reliability measure (e.g., standard deviation). It is believed that the specific selection of the probability distribution family has little effect on measuring travel time reliability. Therefore, two hypotheses are proposed in the hope of accurately measuring travel time reliability, and an experiment is designed to prove them. The first hypothesis is proven by conducting the Kolmogorov–Smirnov test and checking the convergence of log-likelihoods, the Akaike information criterion with a correction for finite sample sizes (AICc), and the Bayesian information criterion (BIC); the second hypothesis is proven by examining both moment-based and percentile-based travel time reliability measures. The results of the two hypothesis tests suggest that 1) underfitting may cause disagreement in distribution selection, 2) travel time can be precisely fitted using mixture models with a larger number of mixture components (K), regardless of the distribution family, and 3) the travel time reliability measures are insensitive to the selection of distribution family.
The findings of this research allow researchers and practitioners to avoid the work of testing various distributions, and travel time reliability can be measured more accurately using mixture models owing to their higher log-likelihoods. As with travel time collection, the accuracy of the observed travel times and the optimal travel time data quantity should be determined before using the TTR data. The statistical accuracy of TTR measures should be evaluated so that their statistical behavior can be fully understood. More specifically, this issue can be formulated as a question: using a certain amount of travel time data, how accurate is the travel time reliability for a specific freeway corridor, time of day (TOD), and day of week (DOW)? A framework for answering this question has not been proposed in the past. Our study proposes a framework based on bootstrapping to evaluate the accuracy of TTR measures and answer the question. Bootstrapping is a computer-based method for assigning measures of accuracy to multiple types of statistical estimators without requiring a specific probability distribution. Three scenarios representing three traffic flow conditions (free-flow, congestion, and transition) were used to fully understand the accuracy of TTR measures under different traffic conditions. The results of the accuracy measurements primarily showed that 1) the proposed framework can facilitate assessment of the accuracy of TTR, and 2) stabilization of the TTR measures did not necessarily correspond to statistical accuracy. The findings of our study also suggested that moment-based TTR measures may not be statistically sufficient for measuring freeway TTR. Additionally, our study suggested that 4 or 5 weeks of travel time data are enough for measuring freeway TTR under free-flow conditions, 40 weeks for congested conditions, and 35 weeks for transition conditions.

A considerable number of studies have contributed to measuring travel time reliability. Travel time distribution estimation is considered an important starting input for measuring travel time reliability. Kernel density estimation (KDE) is used to estimate the travel time distribution, instead of parametric probability distributions such as the lognormal distribution or two-state models. The Hasofer–Lind–Rackwitz–Fiessler (HL-RF) algorithm, widely used in the field of reliability engineering, is applied in this work; it is used to compute the reliability index of a system based on its previous performance. The computing procedure for the travel time reliability of corridors on a freeway is first introduced, and network travel time reliability is developed afterwards. Given probability distributions estimated by the KDE technique and an anticipated travel time from travelers, the two equations for corridor and network travel time reliability can be used to address the question, "How reliable is my perceived travel time?" Travel time reliability is defined here in the sense of "on-time performance", and it is formulated inherently from the perspective of travelers. Further, the major advantages of the proposed method are: 1) it demonstrates an alternative way to estimate travel time distributions when the choice of probability distribution family is still uncertain; and 2) it is flexible enough to be applied at different levels of the roadway system (e.g., an individual roadway segment or a network).
A user-defined anticipated travel time can be input, and travelers can utilize the computed travel time reliability information to plan their trips in advance, in order to better manage trip time, reduce cost, and avoid frustration.
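The bootstrap-based accuracy assessment described above lends itself to a short illustration. The following Python sketch (synthetic travel times, illustrative sample size and measure choices, not the dissertation's exact framework) attaches bootstrap confidence intervals to one moment-based and one percentile-based TTR measure:

```python
# Hedged sketch (not the dissertation's exact framework): bootstrap confidence
# intervals for two travel-time-reliability measures on synthetic corridor data.
import numpy as np

rng = np.random.default_rng(7)
# Synthetic travel times (minutes) for one corridor / time-of-day, 5 weeks of weekdays.
travel_times = rng.lognormal(mean=np.log(12.0), sigma=0.25, size=5 * 5)

def ttr_measures(x):
    """Moment-based and percentile-based reliability measures."""
    return {
        "std_dev": x.std(ddof=1),                             # moment-based
        "planning_time_index": np.percentile(x, 95) / x.mean(),  # percentile-based
    }

# Nonparametric bootstrap: resample days with replacement, recompute measures.
B = 2000
boot = {k: np.empty(B) for k in ("std_dev", "planning_time_index")}
for b in range(B):
    sample = rng.choice(travel_times, size=travel_times.size, replace=True)
    for k, v in ttr_measures(sample).items():
        boot[k][b] = v

for k, draws in boot.items():
    lo, hi = np.percentile(draws, [2.5, 97.5])
    print(f"{k}: point={ttr_measures(travel_times)[k]:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

How the width of such intervals changes as more weeks of data are added is the kind of accuracy signal the proposed framework is designed to track.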
3

Model-based Learning: t-Families, Variable Selection, and Parameter Estimation

Andrews, Jeffrey Lambert 27 August 2012 (has links)
The phrase model-based learning describes the use of mixture models in machine learning problems. This thesis focuses on a number of issues surrounding the use of mixture models in statistical learning tasks, including clustering, classification, discriminant analysis, variable selection, and parameter estimation. After motivating the importance of statistical learning via mixture models, five papers are presented. For ease of consumption, the papers are organized into three parts: mixtures of multivariate t-families, variable selection, and parameter estimation. / Natural Sciences and Engineering Research Council of Canada through a doctoral postgraduate scholarship.
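As a minimal, hedged illustration of the model-based clustering idea (a Gaussian mixture stands in here for the multivariate t-families treated in the thesis; data, seed, and component count are synthetic):

```python
# Hedged illustration of model-based clustering: each cluster is one component of a
# finite mixture, and cluster labels come from posterior component membership. A
# Gaussian mixture stands in for the multivariate t-families studied in the thesis.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(100, 2)),
    rng.normal(loc=(3.0, 3.0), scale=0.8, size=(100, 2)),
])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
labels = gmm.predict(X)          # hard clustering from the fitted mixture
resp = gmm.predict_proba(X)      # soft memberships, as used inside the EM algorithm
print(gmm.means_)
print(labels[:5], resp[:2])
```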
4

Joint Posterior Inference for Latent Gaussian Models and extended strategies using INLA

Chiuchiolo, Cristian 06 June 2022 (has links)
Bayesian inference is particularly challenging for hierarchical statistical models, as computational complexity becomes a significant issue. Sampling-based methods like the popular Markov Chain Monte Carlo (MCMC) can provide accurate solutions, but they can carry a high computational burden. An attractive alternative is the Integrated Nested Laplace Approximations (INLA) approach, which is faster when applied to the broad class of Latent Gaussian Models (LGMs). The method computes fast and empirically accurate deterministic posterior marginal approximations of the model's unknown parameters. In the first part of this thesis, we discuss how to extend the software's applicability to joint posterior inference by constructing a new class of joint posterior approximations, which also add marginal corrections for location and skewness. As these approximations result from a combination of a Gaussian Copula and internally pre-computed accurate Gaussian Approximations, we name this class Skew Gaussian Copula (SGC). By computing moments and the correlation structure of a mixture representation of these distributions, we achieve new fast and accurate deterministic approximations for linear combinations in a subset of the model's latent field. The same mixture approximates a full joint posterior density through Monte Carlo sampling on the hyperparameter set. We set up highly skewed examples based on Poisson and Binomial hierarchical models and verify these new approximations using INLA and MCMC. The new skewness correction from the Skew Gaussian Copula is more consistent with the outcomes provided by the default INLA strategies. In the last part, we propose an extension of the parametric fit employed by the Simplified Laplace Approximation strategy in INLA when approximating posterior marginals. By default, the strategy matches log derivatives from a third-order Taylor expansion of each Laplace Approximation marginal with those derived from Skew Normal distributions. We consider a fourth-order term and adapt an Extended Skew Normal distribution to produce a more accurate approximation fit when skewness is large. We set up similarly skewed data simulations with Poisson and Binomial likelihoods and show that the posterior marginal results from the new extended strategy are more accurate and coherent with the MCMC ones than its original version.
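As a much-simplified sketch of the basic building block behind INLA, a Laplace (Gaussian) approximation of a posterior for a single latent quantity under a Poisson likelihood can be computed as follows; the counts, prior, and finite-difference Hessian are illustrative, and this is neither the INLA software nor the Skew Gaussian Copula construction:

```python
# Much-simplified, hedged sketch of the building block behind INLA: a Laplace
# (Gaussian) approximation of a posterior, here for a single log-rate eta with a
# Gaussian prior and a Poisson likelihood. Not the INLA software and not the Skew
# Gaussian Copula construction; the counts and prior are illustrative.
import numpy as np
from scipy import optimize
from scipy.stats import norm, poisson

y = np.array([3, 0, 2, 1, 4])            # illustrative Poisson counts
prior_mean, prior_sd = 0.0, 1.0          # prior on eta = log(rate)

def neg_log_post(eta):
    lam = np.exp(eta)
    return -(np.sum(poisson.logpmf(y, lam)) + norm.logpdf(eta, prior_mean, prior_sd))

# The posterior mode and the curvature at the mode define the Gaussian approximation.
mode = optimize.minimize_scalar(neg_log_post).x
h = 1e-4
hess = (neg_log_post(mode + h) - 2 * neg_log_post(mode) + neg_log_post(mode - h)) / h**2
laplace_sd = 1.0 / np.sqrt(hess)

print(f"Laplace approximation: eta | y ~ N({mode:.3f}, {laplace_sd:.3f}^2)")
```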
5

Sparse Latent-Space Learning for High-Dimensional Data: Extensions and Applications

White, Alexander James 05 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / The successful treatment and potential eradication of many complex diseases, such as cancer, begins with elucidating the convoluted mapping of molecular profiles to phenotypical manifestation. Our observed molecular profiles (e.g., genomics, transcriptomics, epigenomics) are often high-dimensional and are collected from patient samples falling into heterogeneous disease subtypes. Interpretable learning from such data calls for sparsity-driven models. This dissertation addresses the high dimensionality, sparsity, and heterogeneity issues when analyzing multiple-omics data, where each method is implemented with a concomitant R package. First, we examine challenges in submatrix identification, which aims to find subgroups of samples that behave similarly across a subset of features. We resolve issues such as two-way sparsity, non-orthogonality, and parameter tuning with an adaptive thresholding procedure on the singular vectors computed via orthogonal iteration. We validate the method with simulation analysis and apply it to an Alzheimer’s disease dataset. The second project focuses on modeling relationships between large, matched datasets. Exploring regressional structures between large data sets can provide insights such as the effect of long-range epigenetic influences on gene expression. We present a high-dimensional version of mixture multivariate regression to detect patient clusters, each with different correlation structures of matched-omics datasets. Results are validated via simulation and applied to matched-omics data sets. In the third project, we introduce a novel approach to modeling spatial transcriptomics (ST) data with a spatially penalized multinomial model of the expression counts. This method recovers the low-rank structure of zero-inflated ST data under spatial smoothness constraints. We validate the model using manual cell structure annotations of human brain samples. We then apply this technique to additional ST datasets. / 2025-05-22
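As a hedged sketch of the submatrix-identification step described above (leading singular vectors via power iteration, then thresholding small entries), with a planted signal block and an illustrative thresholding rule rather than the dissertation's adaptive procedure:

```python
# Hedged sketch of the submatrix-identification idea: estimate the leading singular
# vectors by power iteration, then threshold small entries so that only a subset of
# samples (rows) and features (columns) stays active. The planted block and the
# 3-times-median thresholding rule are illustrative.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 200))      # background noise
X[:20, :30] += 2.0                   # planted submatrix signal

# Leading left/right singular vectors via power iteration.
u = rng.normal(size=100)
for _ in range(50):
    v = X.T @ u
    v /= np.linalg.norm(v)
    u = X @ v
    u /= np.linalg.norm(u)

# Simple data-driven threshold: keep entries well above the median magnitude.
row_mask = np.abs(u) > 3 * np.median(np.abs(u))
col_mask = np.abs(v) > 3 * np.median(np.abs(v))
print("rows kept:", np.where(row_mask)[0])
print("cols kept:", np.where(col_mask)[0])
```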
6

Exact Markov Chain Monte Carlo for a Class of Diffusions

Qi Wang (14157183) 05 December 2022 (has links)
This dissertation focuses on the simulation efficiency of the Markov process for two scenarios: stochastic differential equations (SDEs) and simulated weather data.

For SDEs, we propose a novel Gibbs sampling algorithm that allows sampling from a particular class of SDEs without any discretization error and show that the proposed algorithm improves the sampling efficiency by orders of magnitude over existing popular algorithms.

In the weather data simulation study, we investigate how representative the simulated data are for three popular stochastic weather generators. Our results suggest the need for more than a single realization when generating weather data to obtain suitable representations of climate.
7

An Efficient Implementation of a Robust Clustering Algorithm

Blostein, Martin January 2016 (has links)
Clustering and classification are fundamental problems in statistical and machine learning, with a broad range of applications. A common approach is the Gaussian mixture model, which assumes that each cluster or class arises from a distinct Gaussian distribution. This thesis studies a robust, high-dimensional extension of the Gaussian mixture model that automatically detects outliers and noise, and a computationally efficient implementation thereof. The contaminated Gaussian distribution is a robust elliptical distribution that allows for automatic detection of "bad points", and is used to make the usual factor analysis model robust. In turn, the mixtures of contaminated Gaussian factor analyzers (MCGFA) algorithm allows high-dimensional, robust clustering, classification and detection of bad points. A family of MCGFA models is created through the introduction of different constraints on the covariance structure. A new, efficient implementation of the algorithm is presented, along with an account of its development. The fast implementation permits thorough testing of the MCGFA algorithm, and its performance is compared to two natural competitors: parsimonious Gaussian mixture models (PGMM) and mixtures of modified t factor analyzers (MMtFA). The algorithms are tested systematically on simulated and real data. / Thesis / Master of Science (MSc)
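A rough stand-in for the behavior described above (not the MCGFA algorithm): fit an ordinary Gaussian mixture and flag the lowest-density points as candidate "bad points", which mimics the automatic outlier detection that contaminated mixtures perform internally. The data, seed, and 5% flagging rule are illustrative:

```python
# Hedged stand-in (not the MCGFA algorithm): fit an ordinary Gaussian mixture and
# flag points with unusually low density as candidate "bad points", illustrating
# the automatic outlier detection that contaminated mixtures perform internally.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
clean = np.vstack([rng.normal((0, 0), 0.5, (200, 2)),
                   rng.normal((4, 4), 0.5, (200, 2))])
noise = rng.uniform(-4, 8, (20, 2))              # scattered contamination
X = np.vstack([clean, noise])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
log_dens = gmm.score_samples(X)                  # per-point log-likelihood
threshold = np.quantile(log_dens, 0.05)          # flag the lowest-density 5%
bad = log_dens < threshold
print(f"{bad.sum()} points flagged as potential outliers")
```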
8

Sparse Deep Learning and Stochastic Neural Network

Yan Sun (12425889) 13 May 2022 (has links)
Deep learning has achieved state-of-the-art performance on many machine learning tasks, but the deep neural network (DNN) model still suffers from a few issues. An over-parametrized neural network generally has a better optimization landscape, but it is computationally expensive, hard to interpret, and usually cannot correctly quantify prediction uncertainty. On the other hand, a small DNN model can suffer from local traps and be hard to optimize. In this dissertation, we tackle these issues from two directions: sparse deep learning and stochastic neural networks.

For sparse deep learning, we propose a Bayesian neural network (BNN) model with a mixture-of-normals prior. Theoretically, we establish posterior consistency and structure selection consistency, which ensure that the sparse DNN model can be consistently identified. We also demonstrate the asymptotic normality of the prediction, which ensures that the prediction uncertainty is correctly quantified. Computationally, we propose a prior annealing approach to optimize the posterior of the BNN. The proposed methods have computational complexity similar to the standard stochastic gradient descent method for training DNNs. Experimental results show that our model performs well on high-dimensional variable selection as well as neural network pruning.

For stochastic neural networks, we propose a Kernel-Expanded Stochastic Neural Network model, or K-StoNet model for short. We reformulate the DNN as a latent variable model and incorporate support vector regression (SVR) as the first hidden layer. The latent variable formulation breaks the training into a series of convex optimization problems, and the model can be easily trained using the imputation-regularized optimization (IRO) algorithm. We provide theoretical guarantees for the convergence of the algorithm and the prediction uncertainty quantification. Experimental results show that the proposed model can achieve good prediction performance and provide correct confidence regions for prediction.
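As a small, hedged sketch of the mixture-of-normals ("spike-and-slab"-style) prior mentioned above, written as a standalone log-prior in NumPy rather than the dissertation's BNN training procedure; the mixing weight and component scales are illustrative:

```python
# Hedged sketch of a mixture-of-normals prior on network weights: a narrow component
# pushes most weights toward zero (sparsity), a wide component keeps the important
# ones. Shown as a standalone log-prior penalty; not the dissertation's BNN trainer.
import numpy as np

def mixture_normal_log_prior(w, pi=0.9, sigma_spike=0.01, sigma_slab=1.0):
    """Log density of w under pi*N(0, sigma_spike^2) + (1-pi)*N(0, sigma_slab^2)."""
    def log_norm(x, s):
        return -0.5 * (x / s) ** 2 - np.log(s * np.sqrt(2 * np.pi))
    a = np.log(pi) + log_norm(w, sigma_spike)
    b = np.log(1 - pi) + log_norm(w, sigma_slab)
    return np.logaddexp(a, b).sum()          # stable log-sum-exp over the two components

weights = np.array([0.002, -0.004, 0.8, 0.001, -1.2])
# In MAP-style training, -mixture_normal_log_prior(weights) would be added to the loss.
print(mixture_normal_log_prior(weights))
```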
9

Particle-based Parameter Inference in Stochastic Volatility Models: Batch vs. Online / Partikelbaserad parameterskattning i stokastiska volatilitetsmodeller: batch vs. online

Toft, Albin January 2019 (has links)
This thesis focuses on comparing an online parameter estimator to an offline estimator, both based on the PaRIS algorithm, when estimating parameter values for a stochastic volatility model. By modeling the stochastic volatility model as a hidden Markov model, estimators based on particle filters can be implemented in order to estimate the unknown parameters of the model. The results from this thesis imply that the proposed online estimator could be considered superior to the offline counterpart. The results are, however, somewhat inconclusive, and further research regarding the subject is recommended. / This thesis focuses on comparing an online and an offline parameter estimator in stochastic volatility models. The two parameter estimators compared are both based on the PaRIS algorithm. By modeling a stochastic volatility model as a hidden Markov chain, particle-based parameter estimators could be used to estimate the unknown parameters of the model. The results presented in this thesis indicate that the online implementation of the PaRIS algorithm can be regarded as the better alternative compared with the offline implementation. The results are, however, not entirely conclusive, and further research in the area is recommended.
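As a hedged illustration of the particle-filtering machinery referred to above (a plain bootstrap particle filter, not the PaRIS algorithm), for a standard stochastic volatility model with illustrative parameter values:

```python
# Hedged sketch (not the PaRIS algorithm): a plain bootstrap particle filter for the
# standard stochastic volatility model
#   x_t = phi * x_{t-1} + sigma * v_t,   y_t = beta * exp(x_t / 2) * e_t,
# used here to estimate the log-likelihood for a fixed parameter value.
import numpy as np

rng = np.random.default_rng(42)
phi, sigma, beta = 0.95, 0.3, 0.7            # illustrative "true" parameters
T = 200

# Simulate data from the model.
x = np.zeros(T)
x[0] = rng.normal(0.0, sigma / np.sqrt(1 - phi**2))
for t in range(1, T):
    x[t] = phi * x[t - 1] + sigma * rng.normal()
y = beta * np.exp(x / 2) * rng.normal(size=T)

def bootstrap_pf_loglik(y, phi, sigma, beta, N=1000):
    """Particle-filter estimate of log p(y_{1:T} | theta)."""
    particles = rng.normal(0.0, sigma / np.sqrt(1 - phi**2), N)   # stationary init
    loglik = 0.0
    for t in range(len(y)):
        if t > 0:                                # propagate through the state equation
            particles = phi * particles + sigma * rng.normal(size=N)
        sd = beta * np.exp(particles / 2)        # observation standard deviation
        logw = -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * (y[t] / sd) ** 2
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())           # incremental likelihood contribution
        particles = rng.choice(particles, size=N, p=w / w.sum())  # multinomial resampling
    return loglik

print("estimated log-likelihood:", bootstrap_pf_loglik(y, phi, sigma, beta))
```

Particle-based parameter estimation broadly builds on filters like this one; PaRIS additionally approximates smoothed additive functionals online, which is what the estimators compared in the thesis rely on.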
10

Adaptive Sampling Methods for Stochastic Optimization

Daniel Andres Vasquez Carvajal (10631270) 08 December 2022 (has links)
This dissertation investigates the use of sampling methods for solving stochastic optimization problems using iterative algorithms. Two sampling paradigms are considered: (i) adaptive sampling, where, before each iterate update, the sample size for estimating the objective function and the gradient is adaptively chosen; and (ii) retrospective approximation (RA), where iterate updates are performed using a chosen fixed sample size for as long as progress is deemed statistically significant, at which time the sample size is increased. We investigate adaptive sampling within the context of a trust-region framework for solving stochastic optimization problems in $\mathbb{R}^d$, and retrospective approximation within the broader context of solving stochastic optimization problems on a Hilbert space.

In the first part of the dissertation, we propose Adaptive Sampling Trust-Region Optimization (ASTRO), a class of derivative-based stochastic trust-region (TR) algorithms developed to solve smooth stochastic unconstrained optimization problems in $\mathbb{R}^{d}$ where the objective function and its gradient are observable only through a noisy oracle or using a large dataset. Efficiency in ASTRO stems from two key aspects: (i) adaptive sampling, which ensures that the objective function and its gradient are sampled only to the extent needed, so that small sample sizes are chosen when the iterates are far from a critical point and large sample sizes are chosen when the iterates are near a critical point; and (ii) quasi-Newton Hessian updates using BFGS. We prove three main results for ASTRO and for general stochastic trust-region methods that estimate function and gradient values adaptively, using sample sizes that are stopping times with respect to the sigma algebra of the generated observations. The first asserts strong consistency when the adaptive sample sizes have a mild logarithmic lower bound, assuming that the oracle errors are light-tailed. The second and third results characterize the iteration and oracle complexities in terms of certain risk functions. Specifically, the second result asserts that the best achievable $\mathcal{O}(\epsilon^{-1})$ iteration complexity (of squared gradient norm) is attained when the total relative risk associated with the adaptive sample size sequence is finite; and the third result characterizes the corresponding oracle complexity in terms of the total generalized risk associated with the adaptive sample size sequence. We report encouraging numerical results in certain settings.

In the second part of this dissertation, we consider the use of RA as an alternate adaptive sampling paradigm to solve smooth stochastic constrained optimization problems in infinite-dimensional Hilbert spaces. RA generates a sequence of subsampled deterministic infinite-dimensional problems that are approximately solved within a dynamic error tolerance. The bottleneck in RA becomes solving this sequence of problems efficiently. To this end, we propose a progressive subspace expansion (PSE) framework to solve smooth deterministic optimization problems in infinite-dimensional Hilbert spaces with a trust-region sequential quadratic programming (TR-SQP) solver. The infinite-dimensional optimization problem is discretized, and a sequence of finite-dimensional problems is solved in which the problem dimension is progressively increased. Additionally, (i) we solve this sequence of finite-dimensional problems only to the extent necessary, i.e., we spend just enough computational work to solve each problem within a dynamic error tolerance, and (ii) we use the solution of the current optimization problem as the initial guess for the subsequent problem. We prove two main results for PSE. The first establishes convergence to a first-order critical point for a subsequence of iterates generated by the PSE TR-SQP algorithm. The second characterizes the relationship between the error tolerance and the problem dimension, and provides an oracle complexity result for the total amount of computational work incurred by PSE. This amount of computational work is closely connected to three quantities: the convergence rate of the finite-dimensional spaces to the infinite-dimensional space, the rate of increase of the cost of making oracle calls in finite-dimensional spaces, and the convergence rate of the solution method used. We also show encouraging numerical results on an optimal control problem supporting our theoretical findings.
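As a hedged sketch of the adaptive-sampling principle described above, grow the Monte Carlo sample size until the gradient estimate's standard error is small relative to its norm, so that sample sizes stay small far from a critical point and grow near one. The sketch uses a plain gradient step rather than ASTRO's trust region, and the test problem, constants, and stopping rule are illustrative:

```python
# Hedged sketch of the adaptive-sampling idea (not the ASTRO algorithm): estimate the
# gradient of E[F(x, xi)] by Monte Carlo, and grow the per-iteration sample size until
# the estimated standard error is small relative to the estimated gradient norm.
import numpy as np

rng = np.random.default_rng(0)

def stochastic_grad(x, n):
    """n noisy gradient observations of f(x) = 0.5*||x||^2 with additive noise xi."""
    noise = rng.normal(scale=1.0, size=(n, x.size))
    return x + noise                      # each row is one gradient observation

x = np.array([5.0, -3.0])
step, kappa = 0.2, 1.0                    # kappa controls the required relative accuracy
n = 10                                    # initial sample size

for k in range(50):
    while True:
        g = stochastic_grad(x, n)
        g_bar = g.mean(axis=0)
        se = np.linalg.norm(g.std(axis=0, ddof=1)) / np.sqrt(n)
        if se <= kappa * np.linalg.norm(g_bar) or n >= 100_000:
            break                         # sample is adequate (or budget cap reached)
        n *= 2                            # otherwise, adaptively increase the sample size
    x = x - step * g_bar                  # plain gradient step (ASTRO uses a trust region)

print("final iterate:", x, "final sample size:", n)
```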
