Global ETD Search

51	Queueing Models for Large Scale Call Centers Reed, Joshua E. 18 May 2007 (has links) In the first half of this thesis, we extend the results of Halfin and Whitt to generally distributed service times. This is accomplished by first writing the system equations for the G/GI/N queue in a manner similar to the system equations for G/GI/Infinity queue. We next identify a key relationship between these two queues. This relationship allows us to leverage several existing results for the G/GI/Infinity queue in order to prove our main result. Our main result in the first part of this thesis is to show that the diffusion scaled queue length process for the G/GI/N queue in the Halfin-Whitt regime converges to a limiting stochastic process which is driven by a Gaussian process and satisfies a stochastic convolution equation. We also show that a similar result holds true for the fluid scaled queue length process under general initial conditions. Customer abandonment is also a common feature of many call centers. Some researchers have even gone so far as to suggest that the level of customer abandonment is the single most important metric with regards to a call center's performance. In the second half of this thesis, we improve upon a result of Ward and Glynn's concerning the GI/GI/1+GI queue in heavy traffic. Whereas Ward and Glynn obtain a diffusion limit result for the GI/GI/1+GI queue in heavy traffic which incorporates only the density the abandonment distribution at the origin, our result incorporate the entire abandonment distribution. This is accomplished by a scaling the hazard rate function of the abandonment distribution as the system moves into heavy traffic. Our main results are to obtain diffusion limits for the properly scaled workload and queue length processes in the GI/GI/1+GI queue. The limiting diffusions we obtain are reflected at the origin with a negative drift which is dependent upon the hazard rate of the abandonment distribution. Because these diffusions have an analytically tractable steady-state distribution, they can be used to provide a closed-form approximation for the steady-state distribution of the queue length and workload processes in a GI/GI/1+GI queue. We demonstrate the accuracy of these approximations through simulation. Diffusion approximation Queueing theory Halfin and Whitt Stochastics Applied probability Gaussian process Queuing theory Call centers
52	Bayesian Hierarchical Model for Combining Two-resolution Metrology Data Xia, Haifeng 14 January 2010 (has links) This dissertation presents a Bayesian hierarchical model to combine two-resolution metrology data for inspecting the geometric quality of manufactured parts. The high- resolution data points are scarce, and thus scatter over the surface being measured, while the low-resolution data are pervasive, but less accurate or less precise. Combining the two datasets could supposedly make a better prediction of the geometric surface of a manufactured part than using a single dataset. One challenge in combining the metrology datasets is the misalignment which exists between the low- and high-resolution data points. This dissertation attempts to provide a Bayesian hierarchical model that can handle such misaligned datasets, and includes the following components: (a) a Gaussian process for modeling metrology data at the low-resolution level; (b) a heuristic matching and alignment method that produces a pool of candidate matches and transformations between the two datasets; (c) a linkage model, conditioned on a given match and its associated transformation, that connects a high-resolution data point to a set of low-resolution data points in its neighborhood and makes a combined prediction; and finally (d) Bayesian model averaging of the predictive models in (c) over the pool of candidate matches found in (b). This Bayesian model averaging procedure assigns weights to different matches according to how much they support the observed data, and then produces the final combined prediction of the surface based on the data of both resolutions. The proposed method improves upon the methods of using a single dataset as well as a combined prediction without addressing the misalignment problem. This dissertation demonstrates the improvements over alternative methods using both simulated data and the datasets from a milled sine-wave part, measured by two coordinate measuring machines of different resolutions, respectively. Bayesian hierarchical model Bayesian model averaging Data combining Gaussian process Misalignment Two-resolution data
53	A metamodeling approach for approximation of multivariate, stochastic and dynamic simulations Hernandez Moreno, Andres Felipe 04 April 2012 (has links) This thesis describes the implementation of metamodeling approaches as a solution to approximate multivariate, stochastic and dynamic simulations. In the area of statistics, metamodeling (or ``model of a model") refers to the scenario where an empirical model is build based on simulated data. In this thesis, this idea is exploited by using pre-recorded dynamic simulations as a source of simulated dynamic data. Based on this simulated dynamic data, an empirical model is trained to map the dynamic evolution of the system from the current discrete time step, to the next discrete time step. Therefore, it is possible to approximate the dynamics of the complex dynamic simulation, by iteratively applying the trained empirical model. The rationale in creating such approximate dynamic representation is that the empirical models / metamodels are much more affordable to compute than the original dynamic simulation, while having an acceptable prediction error. The successful implementation of metamodeling approaches, as approximations of complex dynamic simulations, requires understanding of the propagation of error during the iterative process. Prediction errors made by the empirical model at earlier times of the iterative process propagate into future predictions of the model. The propagation of error means that the trained empirical model will deviate from the expensive dynamic simulation because of its own errors. Based on this idea, Gaussian process model is chosen as the metamodeling approach for the approximation of expensive dynamic simulations in this thesis. This empirical model was selected not only for its flexibility and error estimation properties, but also because it can illustrate relevant issues to be considered if other metamodeling approaches were used for this purpose. Metamodeling Gaussian process model System dynamics System identification Error estimation Simulation methods Gaussian processes
54	Statistical Learning of Some Complex Systems: From Dynamic Systems to Market Microstructure Tong, Xiao Thomas 27 September 2013 (has links) A complex system is one with many parts, whose behaviors are strongly dependent on each other. There are two interesting questions about complex systems. One is to understand how to recover the true structure of a complex system from noisy data. The other is to understand how the system interacts with its environment. In this thesis, we address these two questions by studying two distinct complex systems: dynamic systems and market microstructure. To address the first question, we focus on some nonlinear dynamic systems. We develop a novel Bayesian statistical method, Gaussian Emulator, to estimate the parameters of dynamic systems from noisy data, when the data are either fully or partially observed. Our method shows that estimation accuracy is substantially improved and computation is faster, compared to the numerical solvers. To address the second question, we focus on the market microstructure of hidden liquidity. We propose some statistical models to explain the hidden liquidity under different market conditions. Our statistical results suggest that hidden liquidity can be reliably predicted given the visible state of the market. / Statistics Statistics Bayesian inference dynamic systems Gaussian process hidden liquidity market microstructure nonparametric regression
55	Transfer learning for classification of spatially varying data Jun, Goo 13 December 2010 (has links) Many real-world datasets have spatial components that provide valuable information about characteristics of the data. In this dissertation, a novel framework for adaptive models that exploit spatial information in data is proposed. The proposed framework is mainly based on development and applications of Gaussian processes. First, a supervised learning method is proposed for the classification of hyperspectral data with spatially adaptive model parameters. The proposed algorithm models spatially varying means of each spectral band of a given class using a Gaussian process regression model. For a given location, the predictive distribution of a given class is modeled by a multivariate Gaussian distribution with spatially adjusted parameters obtained from the proposed algorithm. The Gaussian process model is generally regarded as a good tool for interpolation, but not for extrapolation. Moreover, the uncertainty of the predictive distribution increases as the distance from the training instances increases. To overcome this problem, a semi-supervised learning algorithm is presented for the classification of hyperspectral data with spatially adaptive model parameters. This algorithm fits the test data with a spatially adaptive mixture-of-Gaussians model, where the spatially varying parameters of each component are obtained by Gaussian process regressions with soft memberships using the mixture-of-Gaussian-processes model. The proposed semi-supervised algorithm assumes a transductive setting, where the unlabeled data is considered to be similar to the training data. This is not true in general, however, since one may not know how many classes may existin the unexplored regions. A spatially adaptive nonparametric Bayesian framework is therefore proposed by applying spatially adaptive mechanisms to the mixture model with infinitely many components. In this method, each component in the mixture has spatially adapted parameters estimated by Gaussian process regressions, and spatial correlations between indicator variables are also considered. In addition to land cover and land use classification applications based on hyperspectral imagery, the Gaussian process-based spatio-temporal model is also applied to predict ground-based aerosol optical depth measurements from satellite multispectral images, and to select the most informative ground-based sites by active learning. In this application, heterogeneous features with spatial and temporal information are incorporated together by employing a set of covariance functions, and it is shown that the spatio-temporal information exploited in this manner substantially improves the regression model. The conventional meaning of spatial information usually refers to actual spatio-temporal locations in the physical world. In the final chapter of this dissertation, the meaning of spatial information is generalized to the parametrized low-dimensional representation of data in feature space, and a corresponding spatial modeling technique is exploited to develop a nearest-manifold classification algorithm. / text Machine learning Gaussian processes Gaussian process regressions Spatial statistics
56	Scalable Nonparametric Bayes Learning Banerjee, Anjishnu January 2013 (has links) <p>Capturing high dimensional complex ensembles of data is becoming commonplace in a variety of application areas. Some examples include</p><p>biological studies exploring relationships between genetic mutations and diseases, atmospheric and spatial data, and internet usage and online behavioral data. These large complex data present many challenges in their modeling and statistical analysis. Motivated by high dimensional data applications, in this thesis, we focus on building scalable Bayesian nonparametric regression algorithms and on developing models for joint distributions of complex object ensembles.</p><p>We begin with a scalable method for Gaussian process regression, a commonly used tool for nonparametric regression, prediction and spatial modeling. A very common bottleneck for large data sets is the need for repeated inversions of a big covariance matrix, which is required for likelihood evaluation and inference. Such inversion can be practically infeasible and even if implemented, highly numerically unstable. We propose an algorithm utilizing random projection ideas to construct flexible, computationally efficient and easy to implement approaches for generic scenarios. We then further improve the algorithm incorporating some structure and blocking ideas in our random projections and demonstrate their applicability in other contexts requiring inversion of large covariance matrices. We show theoretical guarantees for performance as well as substantial improvements over existing methods with simulated and real data. A by product of the work is that we discover hitherto unknown equivalences between approaches in machine learning, random linear algebra and Bayesian statistics. We finally connect random projection methods for large dimensional predictors and large sample size under a unifying theoretical framework.</p><p>The other focus of this thesis is joint modeling of complex ensembles of data from different domains. This goes beyond traditional relational modeling of ensembles of one type of data and relies on probability mixing measures over tensors. These models have added flexibility over some existing product mixture model approaches in letting each component of the ensemble have its own dependent cluster structure. We further investigate the question of measuring dependence between variables of different types and propose a very general novel scaled measure based on divergences between the joint and marginal distributions of the objects. Once again, we show excellent performance in both simulated and real data scenarios.</p> / Dissertation Statistics Bayes Gaussian process high-dimensional nonparametric random projections tensor factorization
57	On Bayesian Analyses of Functional Regression, Correlated Functional Data and Non-homogeneous Computer Models Montagna, Silvia January 2013 (has links) <p>Current frontiers in complex stochastic modeling of high-dimensional processes include major emphases on so-called functional data: problems in which the data are snapshots of curves and surfaces representing fundamentally important scientific quantities. This thesis explores new Bayesian methodologies for functional data analysis. </p><p>The first part of the thesis places emphasis on the role of factor models in functional data analysis. Data reduction becomes mandatory when dealing with such high-dimensional data, more so when data are available on a large number of individuals. In Chapter 2 we present a novel Bayesian framework which employs a latent factor construction to represent each variable by a low dimensional summary. Further, we explore the important issue of modeling and analyzing the relationship of functional data with other covariate and outcome variables simultaneously measured on the same subjects.</p><p>The second part of the thesis is concerned with the analysis of circadian data. The focus is on the identification of circadian genes that is, genes whose expression levels appear to be rhythmic through time with a period of approximately 24 hours. While addressing this goal, most of the current literature does not account for the potential dependence across genes. In Chapter 4, we propose a Bayesian approach which employs latent factors to accommodate dependence and verify patterns and relationships between genes, while representing the true gene expression trajectories in the Fourier domain allows for inference on period, phase, and amplitude of the signal.</p><p>The third part of the thesis is concerned with the statistical analysis of computer models (simulators). The heavy computational demand of these input-output maps calls for statistical techniques that quickly estimate the surface output at untried inputs given a few preliminary runs of the simulator at a set design points. In this regard, we propose a Bayesian methodology based on a non-stationary Gaussian process. Relying on a model-based assessment of uncertainty, we envision a sequential design technique which helps choosing input points where the simulator should be run to minimize the uncertainty in posterior surface estimation in an optimal way. The proposed non-stationary approach adapts well to output surfaces of unconstrained shape.</p> / Dissertation Statistics Bayesian statistics Computer emulation Functional data Latent factor models Latent modeling Non-stationary Gaussian process
58	Adjusting for Selection Bias Using Gaussian Process Models Du, Meng 18 July 2014 (has links) This thesis develops techniques for adjusting for selection bias using Gaussian process models. Selection bias is a key issue both in sample surveys and in observational studies for causal inference. Despite recently emerged techniques for dealing with selection bias in high-dimensional or complex situations, use of Gaussian process models and Bayesian hierarchical models in general has not been explored. Three approaches are developed for using Gaussian process models to estimate the population mean of a response variable with binary selection mechanism. The first approach models only the response with the selection probability being ignored. The second approach incorporates the selection probability when modeling the response using dependent Gaussian process priors. The third approach uses the selection probability as an additional covariate when modeling the response. The third approach requires knowledge of the selection probability, while the second approach can be used even when the selection probability is not available. In addition to these Gaussian process approaches, a new version of the Horvitz-Thompson estimator is also developed, which follows the conditionality principle and relates to importance sampling for Monte Carlo simulations. Simulation studies and the analysis of an example due to Kang and Schafer show that the Gaussian process approaches that consider the selection probability are able to not only correct selection bias effectively, but also control the sampling errors well, and therefore can often provide more efficient estimates than the methods tested that are not based on Gaussian process models, in both simple and complex situations. Even the Gaussian process approach that ignores the selection probability often, though not always, performs well when some selection bias is present. These results demonstrate the strength of Gaussian process models in dealing with selection bias, especially in high-dimensional or complex situations. These results also demonstrate that Gaussian process models can be implemented rather effectively so that the benefits of using Gaussian process models can be realized in practice, contrary to the common belief that highly flexible models are too complex to use practically for dealing with selection bias. selection bias confounding bias Gaussian process models Bayesian analysis sample survey causal inference observational studies 0463
59	Statistical Models and Algorithms for Studying Hand and Finger Kinematics and their Neural Mechanisms Castellanos, Lucia 01 August 2013 (has links) The primate hand, a biomechanical structure with over twenty kinematic degrees of freedom, has an elaborate anatomical architecture. Although the hand requires complex, coordinated neural control, it endows its owner with an astonishing range of dexterous finger movements. Despite a century of research, however, the neural mechanisms that enable finger and grasping movements in primates are largely unknown. In this thesis, we investigate statistical models of finger movement that can provide insights into the mechanics of the hand, and that can have applications in neural-motor prostheses, enabling people with limb loss to regain natural function of the hands. There are many challenges associated with (1) the understanding and modeling of the kinematics of fingers, and (2) the mapping of intracortical neural recordings into motor commands that can be used to control a Brain-Machine Interface. These challenges include: potential nonlinearities; confounded sources of variation in experimental datasets; and dealing with high degrees of kinematic freedom. In this work we analyze kinematic and neural datasets from repeated-trial experiments of hand motion, with the following contributions: We identified static, nonlinear, low-dimensional representations of grasping finger motion, with accompanying evidence that these nonlinear representations are better than linear representations at predicting the type of object being grasped over the course of a reach-to-grasp movement. In addition, we show evidence of better encoding of these nonlinear (versus linear) representations in the firing of some neurons collected from the primary motor cortex of rhesus monkeys. A functional alignment of grasping trajectories, based on total kinetic energy, as a strategy to account for temporal variation and to exploit a repeated-trial experiment structure. An interpretable model for extracting dynamic synergies of finger motion, based on Gaussian Processes, that decomposes and reduces the dimensionality of variance in the dataset. We derive efficient algorithms for parameter estimation, show accurate reconstruction of grasping trajectories, and illustrate the interpretation of the model parameters. Sound evidence of single-neuron decoding of interpretable grasping events, plus insights about the amount of grasping information extractable from just a single neuron. The Laplace Gaussian Filter (LGF), a deterministic approximation to the posterior mean that is more accurate than Monte Carlo approximations for the same computational cost, and that in an off-line decoding task is more accurate than the standard Population Vector Algorithm. Variance Decomposition Laplace Gaussian Filter Functional Data Alignment Encoding Decoding
60	Multivariate Spatial Process Gradients with Environmental Applications Terres, Maria Antonia January 2014 (has links) <p>Previous papers have elaborated formal gradient analysis for spatial processes, focusing on the distribution theory for directional derivatives associated with a response variable assumed to follow a Gaussian process model. In the current work, these ideas are extended to additionally accommodate one or more continuous covariate(s) whose directional derivatives are of interest and to relate the behavior of the directional derivatives of the response surface to those of the covariate surface(s). It is of interest to assess whether, in some sense, the gradients of the response follow those of the explanatory variable(s), thereby gaining insight into the local relationships between the variables. The joint Gaussian structure of the spatial random effects and associated directional derivatives allows for explicit distribution theory and, hence, kriging across the spatial region using multivariate normal theory. The gradient analysis is illustrated for bivariate and multivariate spatial models, non-Gaussian responses such as presence-absence and point patterns, and outlined for several additional spatial modeling frameworks that commonly arise in the literature. Working within a hierarchical modeling framework, posterior samples enable all gradient analyses to occur as post model fitting procedures.</p> / Dissertation Statistics Directional derivative Gaussian process Geostatistics Gradient Matern covariance Sensitivity analysis

Search results