561 |
Small Area Estimation with Random Effects Selection
Unknown Date (has links)
In this study, we propose a robust method with selective shrinkage for small area estimation with automatic random effects selection, referred to as SARS. In our proposed model, fixed effects and random effects are treated as joint targets; in this case, maximizing the joint likelihood of fixed and random effects makes more sense than maximizing the marginal likelihood. In practice, the variance of the sampling error and the variance of the modeling error (random effects) are unknown. SARS requires no prior information about either variance component or about the dimensionality of the data. Furthermore, area-specific random effects, which account for additional area variation, are not always necessary in a small area estimation model. From this observation, we can impose sparsity on the random effects by assigning zero effects to large areas. This sparsity brings heavy tails, so the normality assumption on the random effects no longer holds. SARS, which has both selective and predictive power, employs penalized regression with a non-convex penalty. To solve the resulting non-convex problem, we employ iterative algorithms built on a quantile thresholding procedure. The algorithms follow an iterative selection-estimation paradigm and use techniques such as progressive screening when tuning parameters, as well as a multi-start strategy with subsampling and feature-subset methods to generate better initial points, enhancing computational efficiency and efficacy. To achieve optimal prediction error under dimensional relaxation, we propose a new theoretical predictive information criterion for SARS (SARS-PIC), derived from non-asymptotic oracle inequalities using the minimax rate of the ideal predictive risk. Experiments with simulations and real poverty data on school-age (5-17) children demonstrate the efficiency of SARS.
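The quantile thresholding step at the heart of the iterative algorithm can be sketched as follows; this is a hypothetical illustration (the function name and tie-breaking rule are assumptions), not the dissertation's exact operator:

```python
def quantile_threshold(coefs, q):
    """Hard quantile thresholding: keep the q largest-magnitude
    entries of `coefs` and set the rest to zero. This induces the
    sparsity on random effects described in the abstract.
    Illustrative sketch only; not the dissertation's exact operator.
    """
    if q <= 0:
        return [0.0] * len(coefs)
    # Indices ranked by absolute value, largest first
    ranked = sorted(range(len(coefs)), key=lambda i: abs(coefs[i]), reverse=True)
    keep = set(ranked[:q])
    return [c if i in keep else 0.0 for i, c in enumerate(coefs)]
```

Within a selection-estimation loop, this operator would alternate with a least-squares refit on the surviving effects.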
/ A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2017. / February 6, 2017. / Includes bibliographical references. / Yiyuan She, Professor Directing Dissertation; Giray Okten, University Representative; Daniel McGee, Committee Member; Debajyoti Sinha, Committee Member.
|
562 |
High Level Image Analysis on Manifolds via Projective Shapes and 3D Reflection Shapes
Unknown Date (has links)
Shape analysis is a widely studied topic in modern statistics, with important applications in areas such as medical imaging. Here we focus on two-sample hypothesis testing for both finite- and infinite-dimensional extrinsic mean shapes of configurations. First, we present a test for equality of mean projective shapes of 2D contours based on rotations. Second, we present a test for mean 3D reflection shapes based on the Schoenberg mean. We apply these tests to footprint data (contours), clamshells (3D reflection shapes), and human facial configurations extracted from digital camera images. We also present the method of MANOVA on manifolds and apply it to face data extracted from digital camera images. Finally, we present a new statistical tool called anti-regression. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2017. / April 14, 2017. / Includes bibliographical references. / Vic Patrangenaru, Professor Directing Dissertation; Xiuwen Liu, University Representative; Adrian Barbu, Committee Member; Minjing Tao, Committee Member.
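The extrinsic approach computes a mean in the ambient Euclidean space and projects it back onto the manifold. A minimal sketch for the unit sphere (the dissertation's projective and reflection shape spaces are more involved, so this is only an assumed toy setting):

```python
import math

def extrinsic_mean_sphere(points):
    """Extrinsic mean of unit vectors on the sphere S^2: average the
    points in the ambient R^3, then project the Euclidean mean back
    onto the sphere. Assumes the Euclidean mean is nonzero, in which
    case the extrinsic mean is unique."""
    n = len(points)
    m = [sum(p[k] for p in points) / n for k in range(3)]
    norm = math.sqrt(sum(x * x for x in m))
    if norm == 0.0:
        raise ValueError("Euclidean mean is zero; extrinsic mean undefined")
    return [x / norm for x in m]
```

For example, two points placed symmetrically about the north pole average to the north pole itself.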
|
563 |
Bayesian Models for Capturing Heterogeneity in Discrete Data
Unknown Date (has links)
Population heterogeneity exists frequently in discrete data. Many Bayesian models perform reasonably well in capturing this subpopulation structure. Typically, the Dirichlet process mixture model (DPMM) and a variable-dimensional alternative that we refer to as the mixture of finite mixtures (MFM) model are used, as they both have natural byproducts of clustering derived from Polya urn schemes. The first part of this dissertation focuses on a model for the association between a binary response and binary predictors. The model incorporates Boolean combinations of predictors, called logic trees, as parameters arising from a DPMM or MFM. Joint modeling is proposed to solve the identifiability issue that arises when using a mixture model for a binary response. Different MCMC algorithms are introduced and compared for fitting these models. The second part of this dissertation is the application of the mixture of finite mixtures model to community detection problems. Here, the communities are analogous to the clusters in the earlier work. A probabilistic framework that allows simultaneous estimation of the number of clusters and the cluster configuration is proposed. We prove clustering consistency in this setting. We also illustrate the performance of these methods with simulation studies and discuss applications. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2017. / April 5, 2017. / Community Detection, Discrete data, Joint Modeling, MCMC, Mixture Model, Population heterogeneity / Includes bibliographical references. / Elizabeth H. Slate, Professor Co-Directing Dissertation; Debdeep Pati, Professor Co-Directing Dissertation; Carl P. Schmertmann, University Representative; Xin Zhang, Committee Member.
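The Polya-urn clustering byproduct of the DPMM can be illustrated by sampling a partition from the Chinese restaurant process; a hedged sketch (function name and interface are assumptions, and the MFM uses a modified urn scheme not shown here):

```python
import random

def crp_partition(n, alpha, seed=None):
    """Sample a partition of n items from the Chinese restaurant
    process with concentration alpha, i.e. the Polya-urn clustering
    byproduct of a Dirichlet process mixture. Returns a list of
    cluster labels (0, 1, 2, ... in order of appearance)."""
    rng = random.Random(seed)
    labels, sizes = [], []
    for i in range(n):
        # Item i starts a new cluster with prob alpha/(i+alpha),
        # or joins existing cluster j with prob sizes[j]/(i+alpha).
        u = rng.uniform(0, i + alpha)
        if u < alpha or not sizes:
            labels.append(len(sizes))
            sizes.append(1)
        else:
            acc = alpha
            for j, s in enumerate(sizes):
                acc += s
                if u < acc:
                    labels.append(j)
                    sizes[j] += 1
                    break
    return labels
```

Larger clusters attract new items ("rich get richer"), which is what yields a random number of occupied clusters a priori.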
|
564 |
Robust Function Registration Using Depth on the Phase Variability
Unknown Date (has links)
In the field of functional data analysis, registration remains a fundamental problem. Registration must take into consideration which underlying template is chosen as the "center" for alignment purposes, and the hurdles that come with each choice. In this dissertation we cover the registration of temporal observations in the presence of both compositional and additive noise under a mean template, and the registration of temporal observations under a median template. Our first project covers an adaptation of the Fisher-Rao framework that yields a consistent estimator when both types of noise are present. Our second project covers a similar registration but uses data depth. The adapted Fisher-Rao method gives a mean template, while the data depth method gives a median. As in standard statistics, we wish to explore outlier removal, appropriate template usage, and pattern classification. Various frameworks have been developed over the past two decades in which registration is conducted via optimal time warping between functions. Comparison of functions based solely on time warping, however, may have limited application, in particular when certain constraints are desired in the registration. In the first project, we study registration with a norm-preserving constraint. A closely related problem is signal estimation, where the goal is to estimate the ground-truth template given random observations with both compositional and additive noise. We propose to adopt the Fisher-Rao framework to compute the underlying template, and mathematically prove that this framework leads to a consistent estimator. We then illustrate the constrained Fisher-Rao registration using simulations as well as two real data sets. We find that the constrained method is robust to additive noise and has superior alignment and classification performance compared with conventional, unconstrained registration methods.
Recently, statistical methodologies have been developed and extended to deal with functional observations. The notion of depth has advanced the ways in which we can view such observations, providing an intuitive center-outward ordering scheme and median signal template estimation. However, functional observations are often noisy. We therefore propose a semi-parametric model and an algorithm that yields a consistent estimator of the underlying median template. We also propose a new band-based boxplot methodology for outlier detection and removal. We illustrate the robustness of this depth-based registration using simulations as well as two real data sets. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2017. / March 24, 2017. / alignment with constraint, data depth, functional data analysis, registration, signal estimation, time warping / Includes bibliographical references. / Wei Wu, Professor Directing Dissertation; Eric Klassen, University Representative; Anuj Srivastava, Committee Member; Minjing Tao, Committee Member.
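A standard computational device in the Fisher-Rao framework is the square-root velocity function (SRVF), under which the Fisher-Rao metric becomes the ordinary L2 metric; a minimal discrete sketch (uniform grid assumed, and the dissertation's constrained and depth-based variants build on top of this):

```python
import math

def srvf(f, dt):
    """Discrete square-root velocity function of samples f on a
    uniform grid with spacing dt: q(t) = sign(f') * sqrt(|f'|).
    Under this transform, warping-invariant comparison of functions
    reduces to L2 geometry."""
    q = []
    for i in range(len(f) - 1):
        df = (f[i + 1] - f[i]) / dt  # forward-difference derivative
        q.append(math.copysign(math.sqrt(abs(df)), df))
    return q

def l2_dist(q1, q2, dt):
    """L2 distance between two SRVFs sampled on the same grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)) * dt)
```

For the identity function, f'(t) = 1 everywhere, so its SRVF is constant at 1.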
|
565 |
A Study of A Copula-Based Approach For The Endogeneity Problem And Its Application
Lava, Negin 24 May 2022 (has links)
No description available.
|
566 |
On the use of inconsistent normalizers for statistical inference on dependent data
Pan, Qiao 02 March 2022 (has links)
Statistical inference, such as confidence interval construction, change point detection and nonparametric regression estimation, has been widely explored in many fields, including climate science, economics, finance and industrial engineering. Such inference is well developed in the literature under independence assumptions, yet dependent data, especially time series, are commonly observed in these areas. Self-normalization has been proposed to handle statistical inference for time series data. This thesis first explores the asymptotic behavior of optimal weighting in generalized self-normalization, then proposes self-normalized simultaneous confidence regions for high-dimensional time series, and finally develops an unsupervised self-normalized break test for the correlation matrix.
The basic idea of self-normalization is to use an inconsistent variance estimator as the studentizer. The original self-normalizer considered only forward estimators; it has recently been generalized to involve both forward and backward estimators with deterministic weights. In the first project, we propose a data-driven weight that corresponds to confidence intervals of minimal length and study the asymptotic behavior of this data-driven choice. An interesting dichotomy is found between linear and nonlinear quantities.
In the second project, we overcome the dimension limitation of self-normalization and propose a different perspective for statistical inference on general quantities of high-dimensional time series. Taking advantage of data with sparse signals, we develop an asymptotic theory for the maximal modulus of self-normalized statistics. We further establish a thresholded self-normalization method to produce simultaneous confidence regions. The method is able to detect uncommon signals in the NASDAQ 100 over 2016-2019 in terms of mean and median log returns.
In the last project, we move on to unsupervised testing for correlation matrix breaks. We develop a self-normalized test tailored to detecting such breaks; the method is unsupervised and directly compares the estimated correlations before and after the hypothesized change point. We apply the test to the stock log returns of 10 companies and to the volatility indexes of 5 options on individual equities to demonstrate its power to detect correlation matrix breaks.
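The inconsistent-studentizer idea can be sketched for the simplest case, inference on the mean of a possibly dependent series; a minimal forward-version illustration (the thesis's generalized, weighted, and high-dimensional versions go well beyond this):

```python
import math

def self_normalized_stat(x, mu0):
    """Self-normalized statistic for testing the mean of a possibly
    dependent series: T = n * (xbar - mu0)^2 / V_n^2, where the
    studentizer V_n^2 = n^{-2} * sum_k (S_k - k*xbar)^2 is built
    from recursive partial sums. V_n^2 does NOT converge to the
    long-run variance; it is deliberately inconsistent, so T has a
    nonstandard (but pivotal) limit distribution."""
    n = len(x)
    xbar = sum(x) / n
    s, vn2 = 0.0, 0.0
    for k, xi in enumerate(x, start=1):
        s += xi  # partial sum S_k
        vn2 += (s - k * xbar) ** 2
    vn2 /= n ** 2
    return n * (xbar - mu0) ** 2 / vn2
```

The payoff of the inconsistent normalizer is that no bandwidth or tuning parameter for long-run variance estimation is needed.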
|
567 |
Applications of Bayesian latent network models to causal inference and transaction set mining
Reynolds, David 03 March 2022 (has links)
Although networks are widely used in statistical models as a convenient representation of the relationships between elements of a system, incorporating them within an inferential procedure poses challenges. This dissertation consists of three projects that are unified in their use of a network to represent relationships among the variables being studied and incorporation of the network into a Bayesian framework for inference.
Chapter 1 addresses causal inference for time-varying treatments using observational data. This problem is discussed from frequentist and Bayesian perspectives, using potential outcomes and graphical model frameworks. We focus on the Bayesian perspective and develop a method for causal inference within this paradigm that accounts for uncertainty in the causal structure of the measured variables. This structure is encoded by a directed acyclic graph (DAG). Our proposed method involves an MCMC sampling procedure in which this DAG is sampled, allowing model averaging over causal structures. Properties of the method are illustrated with simulated data as well as an analysis of data from the Women’s Interagency HIV Study (WIHS), the largest ongoing prospective cohort study of HIV among women in the U.S.
Chapter 2 considers the problem of statistical inference for multivariate binary transaction data. We develop a hierarchical model and an MCMC algorithm that features a latent graph to represent associations between products. Properties of this method are illustrated with simulated data as well as data from Instacart, a U.S. company that operates a grocery delivery and pick-up service.
Chapter 3 examines longitudinal data from Electronic Health Records (EHR) associated with a pediatric asthma study. This project, a collaboration with the BU School of Public Health and Boston Medical Center, focuses on gaining insight into pediatric lung function, as measured by forced expiratory volume (FEV1%). A longitudinal hidden Markov model is developed in which the parameters of the Markov process may be inferred from high-dimensional and correlated covariates.
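The hidden Markov machinery underlying Chapter 3 can be illustrated with the forward algorithm for a toy discrete HMM; this is a generic sketch only, as the dissertation's longitudinal model is far richer (covariate-dependent parameters, continuous outcomes):

```python
def hmm_forward(init, trans, emit, obs):
    """Forward algorithm: marginal likelihood of an observation
    sequence under a discrete hidden Markov model. `init[i]` is the
    initial probability of state i, `trans[i][j]` the i->j transition
    probability, and `emit[i][o]` the probability that state i emits
    symbol o. Runs in O(T * S^2) by summing over hidden paths."""
    n_states = len(init)
    # alpha[j] = P(obs[0..t], state_t = j)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][o]
            for j in range(n_states)
        ]
    return sum(alpha)
```

In a Bayesian fit, this likelihood would sit inside an MCMC sampler over the transition and emission parameters.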
|
568 |
An exploration of alternative features in micro-finance loan default prediction models
Stone, Devon 11 November 2020 (has links)
Despite recent developments, financial inclusion remains a major issue for the world's unbanked population. Financial institutions, both large corporations and micro-finance companies, have begun to provide solutions for financial inclusion, delivered using a combination of machine learning and alternative data. This minor dissertation investigates whether alternative features generated from Short Messaging Service (SMS) data and Android application data on borrowers' devices can improve the performance of loan default prediction models. The improvement gained from alternative features is measured by comparing loan default prediction models trained using only traditional credit scoring data with models developed using a combination of traditional and alternative features. The paper also investigates which of four machine learning techniques is best suited to loan default prediction: logistic regression, random forests, extreme gradient boosting, and neural networks. Finally, the paper identifies whether accurate loan default prediction models can be trained using only the alternative features developed throughout this minor dissertation. The results show that alternative features improve loan default prediction across five performance indicators, namely overall prediction accuracy, repaid prediction accuracy, default prediction accuracy, F1 score, and AUC. Furthermore, extreme gradient boosting is identified as the most appropriate technique for loan default prediction. Finally, while models trained using only the alternative features developed throughout this project can accurately predict loans that have been repaid, they do not accurately predict loans that have not been repaid.
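One of the five performance indicators above, AUC, can be computed directly from the rank (Mann-Whitney) formulation: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half. A minimal sketch, independent of any particular model:

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparisons: the
    fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count 0.5. O(n_pos *
    n_neg), fine for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A model that scores every defaulted loan above every repaid loan attains AUC 1.0; a constant score gives 0.5.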
|
569 |
Modelling Multivariate Nonlinear Vaccine Induced Immune Responses
Lapham, Brendon M 11 November 2020 (has links)
Interpretable statistical models for multivariate vaccine-induced immune response data are important, as they provide a rigorous means of deciding which vaccine candidates should be advanced through the clinical trials process. We consider applications of several different statistical models to a vaccine data set containing multivariate immune responses for several novel tuberculosis vaccines and the current BCG vaccine. The immune responses in the data set have several features that the models need to account for: the multivariate repeated measures on subjects, the nonlinear profiles of the immune responses, and the zero-inflated, skewed distributions of the immune responses. We find that Tweedie multivariate generalised linear mixed effect and latent variable models with cubic B-splines perform well for this data set relative to linear, nonlinear, and univariate Tweedie generalised linear mixed effect models. In addition, the Tweedie multivariate generalised linear mixed effect and latent variable models have several advantages over the other models we consider and are readily interpretable; importantly, we are able to draw clinical conclusions about which novel TB vaccine candidates appear most promising.
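The cubic B-splines used to capture nonlinear immune-response profiles are built from the Cox-de Boor recursion; a generic sketch of that building block (knot layout here is an arbitrary example, not the dissertation's):

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: the i-th B-spline basis function of
    order k (degree k-1) evaluated at t over the given knot vector.
    Cubic splines correspond to k=4. Repeated (clamped) boundary
    knots are handled by skipping zero-width terms."""
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left, right = 0.0, 0.0
    d1 = knots[i + k - 1] - knots[i]
    if d1 > 0:
        left = (t - knots[i]) / d1 * bspline_basis(i, k - 1, t, knots)
    d2 = knots[i + k] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + k] - t) / d2 * bspline_basis(i + 1, k - 1, t, knots)
    return left + right
```

In a mixed model, fitted coefficients on these basis functions trace out a smooth nonlinear profile over time; on the interior of a clamped knot vector the basis functions are nonnegative and sum to one.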
|
570 |
Nelson Siegel parameterisation of the South African Sovereign Yield Curve: an exploration of its predictors, a link to the main asset classes and implementation of systematic trading strategies
Petousis, Thalia January 2014 (links)
Includes bibliographical references. / The aims of this research are firstly to model the South African local government bond yield curve according to the Nelson Siegel parameterisation framework, as implemented in the pivotal work of Diebold and Li (2006) on forecasting the US Treasury curve.
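The Nelson-Siegel curve expresses the yield at maturity tau through level, slope and curvature factors; a minimal sketch in the Diebold-Li (2006) factor-loading form (parameter values in the example are illustrative, not estimates from the South African data):

```python
import math

def nelson_siegel(tau, beta0, beta1, beta2, lam):
    """Nelson-Siegel yield at maturity tau (Diebold-Li form):
    beta0 is the level (long-run yield), beta1 the slope, and
    beta2 the curvature factor; lam governs the decay rate of the
    slope and curvature loadings."""
    x = lam * tau
    loading = (1.0 - math.exp(-x)) / x  # slope loading, -> 1 as tau -> 0
    return beta0 + beta1 * loading + beta2 * (loading - math.exp(-x))
```

As tau -> 0 the yield approaches beta0 + beta1 (the short rate), and as tau grows it decays toward the level beta0.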
|