241 |
Space and Space-Time Modeling of Directional Data
Wang, Fangpo January 2013
Directional data, i.e., data collected in the form of angles or natural directions, arise in many scientific fields, such as oceanography, climatology, geology, meteorology and biology, to name a few. The non-Euclidean nature of such data poses difficulties in applying ordinary statistical methods developed for inline data, motivating the need for a specialized modeling framework for directional data. Motivated in particular by a marine application of modeling the spatial association of wave directions, and additionally the association between spatial wave directions and spatial wave heights, this dissertation focuses on providing general frameworks for modeling spatial and spatio-temporal directional data, while also studying the theoretical properties of the proposed methods. In particular, the projected normal family of circular distributions is proposed as a default parametric family of distributions for directional data. Operating in a Bayesian framework and exploiting standard data augmentation techniques, the projected normal family is shown to have straightforward extensions to the regression and process settings.
A fully model-based approach is developed to capture structured spatial dependence for modeling directional data at different spatial locations. A stochastic process taking values on the circle, a projected Gaussian spatial process, is introduced. This spatial angular process is induced from an inline bivariate Gaussian process. The properties of the projected Gaussian process are discussed with special emphasis on the "covariance" structure. We show how to fit this process as a model for data, using suitable latent variables with Markov chain Monte Carlo methods. We also show how to implement spatial interpolation and conduct model comparison in this setting. Simulated examples are provided as proof of concept. A real data application arises for modeling the aforementioned wave direction data in the Adriatic Sea, off the coast of Italy. Because these directional data are available dynamically, they naturally motivate an extension to a space-time setting.
As the basis of the projected Gaussian process, the properties of the general projected normal distribution are first clarified. The general projected normal distribution on a circle is defined to be the distribution of a bivariate normal random variable with arbitrary mean and covariance, projected onto the unit circle. The projected normal distribution is an under-utilized model for explaining directional data. In particular, the general version with non-identity covariance provides flexibility, e.g., bimodality, asymmetry, and convenient regression specification.
For analyzing non-spatial circular data, fully Bayesian hierarchical models using the general projected normal distribution are developed, and fitting using Markov chain Monte Carlo methods with suitable latent variables is illustrated. We show how posterior inference for distributional features such as the angular mean direction and concentration can be implemented, as well as how prediction within the regression setting can be handled. For analyzing spatial directional data, latent variables are also introduced to facilitate model fitting with MCMC methods. The implementation of spatial interpolation and the conduct of model comparison are demonstrated. With regard to model comparison, an out-of-sample approach using both a predictive likelihood scoring loss criterion and a cumulative rank probability score criterion is utilized.
This dissertation later focuses on building model extensions based on the framework of the projected Gaussian process. The wave direction data studied in the previous chapters also include wave height information at the same space and time resolution. Motivated by joint modeling of these important attributes of waves (wave directions and wave heights), a hierarchical framework is developed for jointly modeling spatial directional and ordinary linear observations. We show that Bayesian model fitting under our model specification is straightforward using suitable latent variable augmentation via Markov chain Monte Carlo (MCMC). This joint model framework can easily incorporate space-time covariate information, enabling both spatial interpolation and temporal forecasting.
The spatial projected Gaussian process also provides a natural application in geosciences as aspect processes for elevation maps. Compared to conventional calculations, a full process model for aspects is provided, allowing full inference and arbitrary interpolation. The aspect processes can be inferred directly from a sample from the surface of elevations, providing an estimate of the aspect, together with its uncertainty, at any new location over the region. / Dissertation
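The definition of the general projected normal distribution given above (a bivariate normal vector projected onto the unit circle) can be illustrated with a short simulation. This is only a minimal sketch, not the dissertation's code: the function names `sample_projected_normal` and `mean_direction`, the seed, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np


def sample_projected_normal(mean, cov, size, seed=0):
    """Draw angles in [0, 2*pi) by projecting bivariate normal draws onto the unit circle.

    Hypothetical helper: the name and signature are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(mean, cov, size=size)  # latent bivariate normal draws
    theta = np.arctan2(z[:, 1], z[:, 0])               # angle of each projected point
    return np.mod(theta, 2 * np.pi)


def mean_direction(theta):
    """Circular mean direction: the angle of the average unit vector."""
    return np.mod(np.arctan2(np.sin(theta).mean(), np.cos(theta).mean()), 2 * np.pi)


# Identity covariance gives a symmetric, unimodal density around the mean direction;
# a non-identity covariance can produce asymmetric or bimodal angular densities.
angles = sample_projected_normal(mean=[5.0, 0.0], cov=np.eye(2), size=2000)
```

The latent vector `z` plays the role of the augmented variables mentioned in the abstract: only the angle is observed, while the radial length is discarded by the projection.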
|
242 |
Statistics in Ella Mathematics
Teng, Yunlong, Zhao, Yingrui January 2012
"Ella Mathematics" is a web-based e-learning system which aims to improve elementary school students’ mathematics learning in Sweden. Such an e-learning tool has been partially completed in May 2012, except descriptive statistics module summarizing students’ performance in the learning process. This project report presents and describes the design and implementation of such descriptive statistics module, which intends to allow students to check their own grades and learning progress; teachers to check and compare students’ grades and progress, as well as parents to compare their children’s grades and learning progress with the average grade and progress of other students. To better understand and design such functionalities, different mathematical e-learning systems were investigated. Another contribution of this project relates to the evaluation and redesign of the existing database model of the “Ella Mathematics” system. The redesign improved performance and reduced data redundancy.
|
243 |
Quadratic Hedging with Margin Requirements and Portfolio Constraints
Tazhitdinova, Alisa January 2010
We consider a mean-variance portfolio optimization problem, namely, a problem of minimizing the variance of the final wealth that results from trading over a fixed finite horizon in a continuous-time complete market in the presence of convex portfolio constraints, taking into account the cost imposed by margin requirements on trades and subject to the further constraint that the expected final wealth equal a specified target value. Market parameters are chosen to be random processes adapted to the information filtration available to the investor and asset prices are modeled by Itô processes. To solve this problem we use an approach based on conjugate duality: we start by synthesizing a dual optimization problem, establish a set of optimality relations that describe an optimal solution in terms of solutions of the dual problem, thus giving necessary and sufficient conditions for the given optimization problem and its dual to each have a solution. Finally, we prove existence of a solution of the dual problem, and for a particular class of dual solutions, establish existence of an optimal portfolio and also describe it explicitly. The method elegantly and rather straightforwardly constructs a dual problem and its solution, as well as provides intuition for construction of the actual optimal portfolio.
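The constrained problem described above can be written compactly as follows. This is a schematic statement only; the symbols $\pi$ (portfolio process), $X^{\pi}(T)$ (final wealth under $\pi$), $K$ (convex constraint set), and $d$ (target expected wealth) are notation assumed here for illustration, and the margin-cost term enters through the wealth dynamics, which are not reproduced.

```latex
\[
\begin{aligned}
\text{minimize}\quad & \operatorname{Var}\bigl(X^{\pi}(T)\bigr)
  = E\bigl[\bigl(X^{\pi}(T) - d\bigr)^{2}\bigr] \\
\text{subject to}\quad & E\bigl[X^{\pi}(T)\bigr] = d,
  \qquad \pi(t) \in K \ \text{ for all } t \in [0, T].
\end{aligned}
\]
```

The equality between the variance and the quadratic loss holds because the expected final wealth is pinned to $d$ by the constraint.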
|
244 |
A comparison of unsupervised learning techniques for detection of medical abuse in automobile claims
Yang, Li 10 January 2013
A comparison of unsupervised learning techniques for detection of medical abuse in automobile claims
|
245 |
Branching processes with biological applications
January 2010
Branching processes play an important role in models of genetics, molecular biology, microbiology, ecology and evolutionary theory. This thesis explores three aspects of branching processes with biological applications. The first part of the thesis focuses on fluctuation analysis, with the main purpose of estimating mutation rates in microbial populations. We propose a novel estimator of mutation rates and apply it to a number of Luria-Delbrück type fluctuation experiments in Saccharomyces cerevisiae. Second, we study the extinction of Markov branching processes and derive theorems for the path to extinction in the critical case, as an extension of Jagers' theory. The third part of the thesis introduces infinite-allele Markov branching processes. As an important non-trivial example, the limiting frequency spectrum for the birth-death process is derived. A potential application to modeling the proliferation and mutation of human Alu sequences is also discussed.
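As background for the extinction results mentioned above, the extinction probability of the simplest Markov branching process (a linear birth-death process in which each individual either splits in two or dies) can be computed as the smallest fixed point of the offspring generating function. The sketch below is an illustration under that binary-splitting assumption, not the thesis's theorems; the function name and rate parameters are hypothetical.

```python
def extinction_probability(birth_rate, death_rate, tol=1e-12, max_iter=100_000):
    """Smallest root of q = f(q) for a binary-splitting Markov branching process.

    At each event an individual dies with probability d / (b + d) or splits
    into two with probability b / (b + d), so the offspring generating
    function is f(s) = d / (b + d) + b / (b + d) * s**2.
    Iterating f from 0 converges to the extinction probability, which equals
    min(1, d / b) when starting from one individual.
    """
    total = birth_rate + death_rate
    p_die = death_rate / total
    p_split = birth_rate / total
    q = 0.0
    for _ in range(max_iter):
        q_next = p_die + p_split * q * q
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q  # not converged within max_iter (e.g. the slowly converging critical case)


q_super = extinction_probability(birth_rate=2.0, death_rate=1.0)  # supercritical
q_sub = extinction_probability(birth_rate=1.0, death_rate=2.0)    # subcritical
```

In the critical case (equal birth and death rates) the iteration still approaches 1, but only at a polynomial rate, which is one reason that case needs separate treatment.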
|
246 |
Model-based clustering for multivariate time series of counts
January 2010
This dissertation develops a modeling framework for univariate and multivariate zero-inflated time series of counts and applies the models in a clustering scheme to identify groups of count series with similar behavior. The basic modeling framework used is observation-driven Poisson regression with generalized linear model (GLM) structure. The zero-inflated Poisson (ZIP) model is employed to characterize the possibility of extra observed zeros relative to the Poisson, a common feature of count data. These two methods are combined to characterize time series of counts where the counts and the probability of extra zeros may depend on past data observations and on exogenous covariates.
A key contribution of this work is a novel modeling paradigm for multivariate zero-inflated counts. The three related models considered are the jointly-inflated, the marginally-inflated, and the doubly-inflated multivariate Poisson. The doubly-inflated model encompasses both marginal inflation, which allows for additional zeros at each time epoch for each individual count series, and joint inflation, which allows for zero-inflation across all multivariate series. These models improve upon previously proposed models, which are either too rigid or too simplistic to be applicable in a wide variety of applications. To estimate the model parameters, a new Monte Carlo Expectation Maximization (MCEM) algorithm is developed. The Monte Carlo sampling eliminates the complex recursion formulas needed for calculating the probability function of the multivariate Poisson. The algorithm is easily adapted for different multivariate zero-inflation schemes.
The new models, new estimation methods, and applications in clustering are demonstrated on simulated and real datasets. For an application in finance, the number of trades and the number of price changes for bonds are modeled as a bivariate doubly zero-inflated Poisson time series, where observations of zero trades or zero price changes represent the liquidity risk for that bond. In an environmental science application, the new models are used in a model-based clustering scheme to study counts of high pollution events at air quality monitoring stations around Houston, Texas. Clustering reveals regions of the air monitoring network which behave similarly in terms of time dependence and response to covariates representing atmospheric conditions and physical sources of air pollution.
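The univariate zero-inflated Poisson (ZIP) building block referred to above mixes a point mass at zero with an ordinary Poisson distribution. A minimal sketch of its probability mass function follows; the function name `zip_pmf` and the parameter names `lam` (Poisson mean) and `pi_zero` (extra-zero probability) are assumptions for illustration, and the time-series and multivariate structure of the dissertation is not represented.

```python
import math


def zip_pmf(k, lam, pi_zero):
    """P(Y = k) for a zero-inflated Poisson with mean parameter lam.

    With probability pi_zero the count is a structural zero; otherwise it is
    drawn from Poisson(lam), which can itself produce a zero.
    """
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


# The probability of observing zero exceeds the plain Poisson value exp(-lam).
p_zero = zip_pmf(0, lam=2.0, pi_zero=0.3)
```

In a regression setting, both `lam` and `pi_zero` would be linked to covariates and past observations rather than held fixed.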
|
247 |
On the separation of T Tauri star spectra using non-negative matrix factorization and Bayesian positive source separation
January 2010
The objective of this study is to compare and evaluate Bayesian and deterministic methods of positive source separation of young star spectra. In the Bayesian approach, the proposed Bayesian Positive Source Separation (BPSS) method uses Gamma priors to enforce non-negativity in the source signals and mixing coefficients, together with a Markov Chain Monte Carlo (MCMC) algorithm that is modified by using simpler proposal distributions and random initialization in order to separate the spectra correctly. In the deterministic approach, two Non-negative Matrix Factorization (NNMF) algorithms, the multiplicative update rule algorithm and an alternating least squares algorithm, are used to separate the star spectra into sources. The BPSS and NNMF algorithms are applied to T Tauri star spectra, resulting in a successful decomposition of the spectra into their sources. These methods are being applied and evaluated in optical spectroscopy for the first time. The results show that, while both methods perform well, BPSS outperforms NNMF. The NNMF and BPSS algorithms improve upon the current methodology used in astrophysics in two important ways. First, they permit the identification of additional components of the spectra beyond the photosphere and boundary layer, which can be modeled with current methods. Second, by applying a statistical algorithm, the modeling of T Tauri stars becomes less subjective. These methods may be further extended to model spectra from other types of stars or astrophysical phenomena.
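The multiplicative update rule named above can be sketched in a few lines. This is a generic illustration assuming the standard Frobenius-norm objective, not the study's implementation; the function name, the small constant `eps` guarding against division by zero, and the synthetic test matrix are all assumptions.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (n x m) as W @ H with W, H >= 0.

    Uses multiplicative updates for the squared Frobenius error ||V - W H||^2.
    Because each update multiplies by a non-negative ratio, entries that start
    non-negative stay non-negative.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps  # mixing coefficients
    H = rng.random((rank, m)) + eps  # source signals
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic example: a non-negative matrix of exact rank 2 (illustrative data only).
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 5))
W, H = nmf_multiplicative(V, rank=2)
```

In the spectral setting each row of `V` would be an observed spectrum, the rows of `H` the recovered source spectra, and `W` the non-negative mixing weights.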
|
248 |
Generalized Gaussian process models with Bayesian variable selection
January 2010
This research proposes a unified Gaussian process modeling approach that extends to data from the exponential dispersion family and survival data. Our specific interest is in the analysis of datasets with predictors possessing an a priori unknown form of possibly non-linear associations to the response. We incorporate Gaussian processes in a generalized linear model framework to allow a flexible non-parametric response surface function of the predictors. We term these novel classes "generalized Gaussian process models". We consider continuous, categorical and count responses and extend to survival outcomes. Next, we focus on the problem of selecting variables from a set of possible predictors and construct a general framework that employs mixture priors and a Metropolis-Hastings sampling scheme for the selection of the predictors with joint posterior exploration of the model and associated parameter spaces.
We build upon this framework by first developing a scheme to improve the efficiency of posterior sampling. In particular, we compare the computational performance of the Metropolis-Hastings sampling scheme with a newer Metropolis-within-Gibbs algorithm. The new construction achieves a substantial improvement in computational efficiency while simultaneously reducing false positives. Next, we leverage this efficient scheme to investigate selection methods for addressing more complex response surfaces, particularly under a high-dimensional covariate space.
Finally, we employ spiked Dirichlet process (DP) prior constructions over set partitions containing covariates. Our approach results in a nonparametric treatment of the distribution of the covariance parameters of the GP covariance matrix that in turn induces a clustering of the covariates. We evaluate two prior constructions. The first employs a mixture of a point-mass and a continuous distribution as the centering distribution for the DP prior, therefore clustering all covariates. The second employs a mixture of a spike and a DP prior with a continuous distribution as the centering distribution, which induces clustering of the selected covariates only. DP models borrow information across covariates through model-based clustering, achieving sharper variable selection and prediction than that obtained using mixture priors alone. We demonstrate that the former prior construction favors "sparsity", while the latter is computationally more efficient.
|
249 |
Contributions to the study of special purpose sampling plans
Murthy, Narasimha M N 09 1900
Special purpose sampling plans
|
250 |
Contributions to extreme value theory.
Ravi, S 05 1900
Extreme value theory
|