501

Estimation and Sequential Monitoring of Nonlinear Functional Responses Using Wavelet Shrinkage

Unknown Date (has links)
Statistical process control (SPC) is widely used in industrial settings to monitor processes for shifts in their distributions. SPC is generally thought of in two distinct phases: Phase I, in which historical data are analyzed in order to establish an in-control process, and Phase II, in which new data are monitored for deviations from the in-control form. Traditionally, SPC has been used to monitor univariate (multivariate) processes for changes in a particular parameter (parameter vector). Recently, however, technological advances have resulted in processes in which each observation is actually an n-dimensional functional response (referred to as a profile), where n can be quite large. Additionally, these profiles often cannot be adequately represented parametrically, making traditional SPC techniques inapplicable. This dissertation begins by addressing the problem of nonparametric function estimation, which would be used to analyze process data in a Phase-I setting. The translation invariant wavelet estimator (TI) is often used to estimate irregular functions, despite the drawback that it tends to oversmooth jumps. A trimmed translation invariant estimator (TTI) is proposed, of which the TI estimator is a special case. By reducing the point-by-point variability of the TI estimator, TTI is shown to retain the desirable qualities of TI while improving reconstructions of functions with jumps. Attention is then turned to the Phase-II problem of monitoring sequences of profiles for deviations from in-control. Two profile monitoring schemes are proposed: the first monitors for changes in the noise variance using a likelihood ratio test based on the highest detail level of wavelet coefficients of the observed profile; the second offers a semiparametric test to monitor for changes in both the functional form and the noise variance. Both methods make use of wavelet shrinkage in order to distinguish relevant functional information from noise contamination. Different forms of each of these test statistics are proposed and results are compared via Monte Carlo simulation. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester, 2012. / March 30, 2012. / ARL, Nonparametric, Profiles, Statistical Process Control, Translation Invariant, Wavelets / Includes bibliographical references. / Eric Chicken, Professor Directing Thesis; John Sobanjo, University Representative; Xufeng Niu, Committee Member; Wei Wu, Committee Member.
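Purely as a hedged illustration of the baseline this entry builds on, the sketch below shows generic translation-invariant (cycle-spinning) wavelet shrinkage with soft thresholding. It is not the dissertation's TTI estimator; the pywt library, the db4 wavelet, the number of shifts, and the universal threshold are illustrative assumptions.

```python
# Hedged sketch only: generic translation-invariant (cycle-spinning) wavelet
# denoising with soft thresholding. It is not the TTI estimator; wavelet choice,
# number of shifts, and the universal threshold are illustrative assumptions.
import numpy as np
import pywt

def ti_denoise(y, wavelet="db4", n_shifts=16):
    n = len(y)
    # rough noise-level estimate from first differences
    sigma = np.median(np.abs(np.diff(y))) / (np.sqrt(2) * 0.6745)
    thresh = sigma * np.sqrt(2 * np.log(n))          # universal threshold
    recons = []
    for s in range(n_shifts):
        shifted = np.roll(y, s)
        coeffs = pywt.wavedec(shifted, wavelet)
        # keep the coarse approximation, soft-threshold the detail levels
        coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft")
                                for c in coeffs[1:]]
        recons.append(np.roll(pywt.waverec(coeffs, wavelet)[:n], -s))
    # averaging over shifts gives the translation-invariant estimate
    return np.mean(recons, axis=0)

# Example: a noisy step function, the kind of jump that plain shrinkage oversmooths
x = np.linspace(0, 1, 512)
step = np.where(x < 0.5, 0.0, 2.0)
estimate = ti_denoise(step + 0.3 * np.random.randn(512))
```

Averaging the shifted reconstructions is what makes the estimate translation invariant; it is this estimator whose point-by-point variability the proposed TTI method reduces.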
502

Geometric Approaches for Analysis of Images, Densities and Trajectories on Manifolds

Unknown Date (has links)
In this dissertation, we focus on the problem of analyzing high-dimensional functional data using geometric approaches. The term functional data refers to images, densities and trajectories on manifolds. The nature of these data imposes difficulties on statistical analysis. First, the objects are functional data, which are infinite-dimensional. One needs to explore possible representations of each type such that the representations facilitate subsequent statistical analysis. Second, the representation spaces are often nonlinear manifolds. Thus, proper Riemannian structures are necessary to compare objects. Third, the analysis and comparison of objects need to be invariant to certain nuisance variables. For example, comparison between two images should be invariant to their blur levels, and comparison between time-indexed trajectories on manifolds should be invariant to their temporal evolution rates. We start by introducing frameworks for representing, comparing and analyzing functions in Euclidean space, including signals, images and densities, where the comparisons are invariant to the Gaussian blur present in these objects. Applications in blur-level matching, blurred image recognition, image classification and two-sample hypothesis testing are discussed. Next, we present frameworks for analyzing longitudinal trajectories on a manifold M, where the analysis is invariant to the reparameterization action (temporal variation). In particular, we are interested in analyzing trajectories in two manifolds: the two-sphere and the set of symmetric positive-definite matrices. Applications such as bird migration and hurricane track analysis, visual speech recognition and hand gesture recognition are used to demonstrate the advantages of the proposed frameworks. Finally, a Bayesian framework for clustering shapes of curves is presented, and examples of clustering cell shapes and protein structures are discussed. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester, 2015. / March 18, 2015. / Includes bibliographical references. / Anuj Srivastava, Professor Directing Dissertation; Eric Klassen, University Representative; Wei Wu, Committee Member; Debdeep Pati, Committee Member.
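As a minimal, hedged sketch of one ingredient mentioned above (trajectories on the two-sphere), the snippet below compares two discretized trajectories by pointwise geodesic (arc-length) distance. It deliberately omits the rate-invariant registration that is central to the dissertation; the function names and sampling scheme are illustrative assumptions.

```python
# Minimal, hedged sketch: pointwise geodesic (arc) distance between two sampled
# trajectories on the unit two-sphere. No temporal registration is performed,
# so this is NOT rate-invariant; it only shows that comparisons on the sphere
# use arc length rather than Euclidean distance.
import numpy as np

def sphere_distance(p, q):
    """Geodesic distance between unit vectors p and q on S^2."""
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def naive_trajectory_distance(traj1, traj2):
    """Mean pointwise geodesic distance, assuming both trajectories are sampled
    at the same parameter values (no alignment of evolution rates)."""
    return np.mean([sphere_distance(p, q) for p, q in zip(traj1, traj2)])

# Example: two great-circle arcs traced at the same rate
t = np.linspace(0.0, np.pi / 2, 50)
traj_a = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
traj_b = np.stack([np.cos(t), np.zeros_like(t), np.sin(t)], axis=1)
print(naive_trajectory_distance(traj_a, traj_b))
```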
503

Methods of Block Thresholding Across Multiple Resolution Levels in Adaptive Wavelet Estimation

Unknown Date (has links)
Blocking methods of thresholding have demonstrated many advantages over term-by-term methods in adaptive wavelet estimation. These blocking methods are resolution-level specific, meaning the coefficients are grouped together only within the same resolution level. Existing techniques do not block across multiple resolution levels, nor do they consider varying shapes of blocks of wavelet coefficients. Here, several methods of block thresholding across multiple resolution levels are described. Various simulation studies analyze the use of these methods on nonparametric functions, including comparisons to other blocking and non-blocking wavelet thresholding methods. The introduction of this new technique raises the question of when it will be advantageous over resolution-level-specific methods. Another simulation study demonstrates a method for statistically selecting when blocking across resolution levels is preferable to traditional techniques. Additional analysis assesses how effective the automated selection method is, both in simulation and in practice. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2015. / July 14, 2015. / Includes bibliographical references. / Eric Chicken, Professor Directing Dissertation; Kathleen Clark, University Representative; Debdeep Pati, Committee Member; Debajyoti Sinha, Committee Member.
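For orientation, the following hedged sketch shows conventional within-level block thresholding, the resolution-level-specific baseline that the methods above extend across levels. The block length, the James-Stein-style shrinkage rule, and the sym8 wavelet are generic choices, not the dissertation's proposals.

```python
# Hedged sketch of conventional within-level block thresholding of wavelet
# coefficients, the resolution-level-specific baseline discussed above. Block
# length L and the James-Stein-style shrinkage rule are generic choices.
import numpy as np
import pywt

def block_threshold_level(detail, sigma, L=8, lam=4.505):
    """Shrink non-overlapping blocks of length L within one resolution level."""
    out = detail.copy()
    for start in range(0, len(detail), L):
        block = detail[start:start + L]
        energy = np.sum(block ** 2)
        # keep the block only to the extent its energy exceeds lam * |block| * sigma^2
        shrink = max(0.0, 1.0 - lam * len(block) * sigma ** 2 / energy) if energy > 0 else 0.0
        out[start:start + L] = shrink * block
    return out

def block_denoise(y, wavelet="sym8"):
    coeffs = pywt.wavedec(y, wavelet)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745    # MAD estimate from finest level
    new_coeffs = [coeffs[0]] + [block_threshold_level(c, sigma) for c in coeffs[1:]]
    return pywt.waverec(new_coeffs, wavelet)[:len(y)]
```

Blocking across resolution levels, as studied above, would instead form blocks that draw coefficients from more than one entry of `coeffs` before applying a shrinkage rule.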
504

Nonparametric Estimation of Three Dimensional Projective Shapes with Applications in Medical Imaging and in Pattern Recognition

Unknown Date (has links)
This dissertation is on the analysis of invariants of a 3D configuration from its 2D images, without requiring any restriction on the camera positioning relative to the scene pictured. We briefly review some of the main results found in the literature. The methodology used is nonparametric and manifold-based, combined with standard computer vision reconstruction techniques. More specifically, we use asymptotic results for the extrinsic sample mean and the extrinsic sample covariance to construct bootstrap confidence regions for mean projective shapes of 3D configurations. Chapters 4, 5 and 6 contain new results. In Chapter 4, we develop tests for coplanarity. Chapter 5 is on reconstruction of 3D polyhedral scenes, including texture, from arbitrary partial views. In Chapter 6, we develop a nonparametric methodology for estimating the mean change for matched samples on a Lie group. We then notice that for k > 4, the manifold of projective shapes of k-ads in general position in 3D has the structure of a (3k − 15)-dimensional Lie group (P-quaternions) that is equivariantly embedded in a Euclidean space; therefore, testing for mean 3D projective shape change amounts to a one-sample test for an extrinsic mean of P-quaternion objects. The Lie group technique leads to a large-sample and nonparametric bootstrap test for one population extrinsic mean on a projective shape space, as recently developed by Patrangenaru, Liu and Sughatadasa [1]. On the other hand, in the absence of occlusions, the 3D projective shape of a spatial configuration can be recovered from a stereo pair of images, thus allowing tests for mean glaucomatous 3D projective shape change from standard stereo pairs of eye images. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester, 2010. / May 3, 2010. / Includes bibliographical references. / Victor Patrangenaru, Professor Directing Dissertation; Xiuwen Liu, University Representative; Fred W. Huffer, Committee Member; Debajyoti Sinha, Committee Member.
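The extrinsic-mean-plus-bootstrap idea is easiest to see on a simple manifold. The hedged sketch below computes an extrinsic sample mean and its nonparametric bootstrap distribution on the unit sphere; it is a toy stand-in for the projective shape space and P-quaternion machinery used in the dissertation, and all function names are illustrative.

```python
# Toy, hedged sketch: extrinsic sample mean and its nonparametric bootstrap on the
# unit sphere (average in the ambient Euclidean space, project back to the manifold).
# The dissertation works on projective shape spaces with a P-quaternion Lie group
# structure; this only illustrates the extrinsic-mean / bootstrap idea.
import numpy as np

def extrinsic_mean(points):
    """Euclidean average of the embedded points, projected back onto the sphere."""
    m = points.mean(axis=0)
    return m / np.linalg.norm(m)

def bootstrap_extrinsic_means(points, n_boot=2000, seed=None):
    rng = np.random.default_rng(seed)
    n, d = points.shape
    means = np.empty((n_boot, d))
    for b in range(n_boot):
        sample = points[rng.integers(0, n, size=n)]   # resample with replacement
        means[b] = extrinsic_mean(sample)
    return means   # empirical distribution from which confidence regions are built

# Example: noisy directions around the north pole
rng = np.random.default_rng(0)
obs = np.array([0.0, 0.0, 1.0]) + 0.1 * rng.standard_normal((100, 3))
obs /= np.linalg.norm(obs, axis=1, keepdims=True)
boot_means = bootstrap_extrinsic_means(obs, seed=1)
```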
505

A Statistical Approach for Information Extraction of Biological Relationships

Unknown Date (has links)
Vast amounts of biomedical information are stored in scientific literature, easily accessed through publicly available databases. Relationships among biomedical terms constitute a major part of our biological knowledge. Acquiring such structured information from unstructured literature can be done through human annotation, but this is time and resource consuming. As this content continues to grow rapidly, the popularity and importance of text mining for obtaining information from unstructured text become increasingly evident. Text mining has four major components. First, relevant articles are identified through information retrieval (IR); next, important concepts and terms are flagged using entity recognition (ER); and then relationships between these entities are extracted from the literature in a process called information extraction (IE). Finally, text mining takes these elements and seeks to synthesize new information from the literature. Our goal is information extraction from unstructured literature concerning biological entities. To do this, we use the structure of triplets, where each triplet contains two biological entities and one interaction word. The biological entities may include terms such as protein names, disease names, genes, and small molecules. Interaction words describe the relationship between the biological terms. Under this framework we aim to combine the strengths of three classifiers in an ensemble approach. The three classifiers we consider are Bayesian Networks, Support Vector Machines, and a mixture of logistic models defined by interaction word. The three classifiers and the ensemble approach are evaluated on three benchmark corpora and one corpus that is introduced in this study. The evaluation includes cross validation and cross-corpus validation to replicate an application scenario. The three classifiers are distinct, and we find that the performance of individual classifiers varies depending on the corpus. Therefore, an ensemble of classifiers removes the need to choose one classifier and provides optimal performance. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester, 2011. / June 9, 2011. / protein, protein interaction, information extraction / Includes bibliographical references. / Jinfeng Zhang, Professor Co-Directing Dissertation; Xufeng Niu, Professor Co-Directing Dissertation; Gary Tyson, University Representative; Fred Huffer, Committee Member.
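A hedged sketch of the ensemble idea follows: combine heterogeneous classifiers by soft voting. scikit-learn's SVC and LogisticRegression stand in for two of the three classifiers; the Bayesian network and the interaction-word-specific logistic mixture are not reproduced, and the triplet feature matrix X is assumed to be precomputed.

```python
# Hedged sketch of the ensemble idea: combine heterogeneous classifiers by soft
# voting. SVC and LogisticRegression stand in for two of the three classifiers;
# the Bayesian network and the interaction-word-specific logistic mixture are not
# reproduced, and the feature matrix X is assumed to be precomputed from triplets.
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def build_ensemble():
    return VotingClassifier(
        estimators=[
            ("svm", SVC(kernel="rbf", probability=True)),
            ("logit", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",   # average the classifiers' predicted probabilities
    )

# Usage (X: triplet features, y: 1 if the triplet expresses a true relationship):
# clf = build_ensemble().fit(X_train, y_train)
# y_pred = clf.predict(X_test)
```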
506

Variable Selection of Correlated Predictors in Logistic Regression: Investigating the Diet-Heart Hypothesis

Unknown Date (has links)
Variable selection is an important aspect of modeling. Its aim is to distinguish between the authentic variables, which are important in predicting the outcome, and the noise variables, which possess little to no predictive value. In other words, the goal is to find the variables that (collectively) best explain and predict changes in the outcome variable. The variable selection problem is exacerbated when correlated variables are included in the covariate set. This dissertation examines the variable selection problem in the context of logistic regression. Specifically, we investigated the merits of the bootstrap, ridge regression, the lasso and Bayesian model averaging (BMA) as variable selection techniques when highly correlated predictors and a dichotomous outcome are considered. This dissertation also contributes to the literature on the diet-heart hypothesis. The diet-heart hypothesis has been around since the early twentieth century. Since then, researchers have attempted to isolate the nutrients in diet that promote coronary heart disease (CHD). After a century of research, there is still no consensus. In our current research, we used some of the more recent statistical methodologies (mentioned above) to investigate the effect of twenty dietary variables on the incidence of coronary heart disease. Logistic regression models were generated for the data from the Honolulu Heart Program, a study of CHD incidence in men of Japanese descent. Our results were largely method-specific. However, regardless of the method considered, there was strong evidence to suggest that alcohol consumption has a strong protective effect on the risk of coronary heart disease. Of the variables considered, dietary cholesterol and caffeine were the only variables that, at best, exhibited a moderately strong harmful association with CHD incidence. Further investigation that includes a broader array of food groups is recommended. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester, 2009. / August 10, 2009. / Logistic Regression, Bootstrap, Lasso, Ridge Regression, Bayesian Model Averaging, Diet-Heart Hypothesis / Includes bibliographical references. / Daniel McGee, Professor Directing Dissertation; Isaac Eberstein, University Representative; Fred Huffer, Committee Member; Debajyoti Sinha, Committee Member; Yiyuan She, Committee Member.
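As a hedged sketch of one of the techniques investigated, the snippet below runs lasso-type (L1-penalized) logistic regression for variable selection with a dichotomous outcome. Standardizing the correlated predictors and the cross-validated choice of penalty strength are assumed preprocessing decisions, not details taken from the study.

```python
# Hedged sketch of lasso-type variable selection for a dichotomous outcome:
# L1-penalized logistic regression with the penalty chosen by cross-validation.
# Standardizing the correlated dietary predictors is an assumed preprocessing step.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def lasso_logistic_selection(X, y):
    model = make_pipeline(
        StandardScaler(),
        LogisticRegressionCV(penalty="l1", solver="saga", Cs=20, cv=5, max_iter=5000),
    )
    model.fit(X, y)
    coefs = model.named_steps["logisticregressioncv"].coef_.ravel()
    selected = np.flatnonzero(coefs != 0.0)   # predictors surviving the L1 penalty
    return selected, coefs

# Usage (X: n x 20 matrix of dietary variables, y: CHD incidence indicator):
# selected_idx, coefs = lasso_logistic_selection(X, y)
```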
507

Semiparametric Bayesian Regression Models for Skewed Responses

Unknown Date (has links)
It is common to encounter skewed response data in medicine, epidemiology and health care studies. Methodology needs to be devised to overcome the natural difficulties that occur in analyzing such data, particularly when the data are multivariate. Existing Bayesian statistical methods to deal with skewed data are mostly fully parametric. We propose novel semiparametric Bayesian methods to model and analyze such data. These methods make minimal assumptions about the true form of the distribution and structure of the observed data. Through examples from real-life studies, we demonstrate practical advantages of our semiparametric Bayesian methods over the existing methods. For many real-life studies with skewed multivariate responses, the level of skewness and the association structure assumptions are essential for evaluating the covariate effects on the response and its predictive distribution. First, we present a novel semiparametric multivariate model class leading to a theoretically justifiable semiparametric Bayesian analysis of multivariate skewed responses. Like the multivariate Gaussian densities, this multivariate model is closed under marginalization, allows a wide class of multivariate associations, and has meaningful physical interpretations of skewness levels and covariate effects on the marginal density. Compared to existing models, our model enjoys several desirable practical properties, including Bayesian computing via available software and assurance of consistent Bayesian estimates of parameters and the nonparametric error density under a set of plausible prior assumptions. We introduce a particular parametric version of the model as an alternative to various parametric skew-symmetric models available in the literature. We illustrate the practical advantages of our methods over existing parametric alternatives via application to a clinical study to assess periodontal disease and through a simulation study. Unlike most of the models existing in the literature, this class of models advocates a latent-variable approach, which makes implementation under the Bayesian paradigm straightforward via standard MCMC software such as WinBUGS/JAGS. Although JAGS and WinBUGS are flexible MCMC engines, for complex model structures they tend to be rather slow. We offer an alternative tool to implement the aforementioned parametric version of the models using PROC MCMC in SAS. Our goal is to facilitate and encourage more extensive implementation of these models. To achieve this goal, we illustrate the implementation using PROC MCMC in SAS via examples from real life and provide fully annotated SAS code. In large-scale national surveys, we often come across skewed data as well as semicontinuous data, that is, data characterized by a point mass at zero (degenerate) and a right-skewed continuous distribution on positive support. For example, in the Medical Expenditure Panel Survey (MEPS), the variable total health care expenditure (i.e., the response) for non-users of health care services is zero, whereas for users it has a continuous distribution typically skewed to the right. We provide an overview of the existing models and methods to analyze such data. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 20, 2018. / Dirichlet Process, Kernel Density, Markov Chain Monte Carlo, Semicontinuous, Skewed Response / Includes bibliographical references. 
/ Debajyoti Sinha, Professor Directing Dissertation; Sachin Shanbhag, University Representative; Antonio Linero, Committee Member; Jonathan Bradley, Committee Member; Debdeep Pati, Committee Member; Stuart Lipsitz, Committee Member.
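The remark above about a latent-variable approach making MCMC implementation straightforward can be illustrated with the standard stochastic representation of a skew-normal error: delta*|Z0| + sqrt(1 - delta^2)*Z1 with independent standard normals Z0 and Z1. The hedged sketch below simulates from such a regression model; it is a parametric toy, not the semiparametric model class proposed in the dissertation, and all names are illustrative.

```python
# Hedged, parametric toy illustrating the latent-variable trick: a skew-normal
# error can be written as delta*|Z0| + sqrt(1 - delta^2)*Z1 with Z0, Z1 independent
# standard normals, so conditional on the latent |Z0| the response is Gaussian and
# easy to handle in generic MCMC engines. This is not the dissertation's
# semiparametric model class.
import numpy as np

def simulate_skew_normal_regression(X, beta, alpha, sigma, seed=None):
    """Simulate y = X beta + sigma * e with e skew-normal(shape = alpha)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    delta = alpha / np.sqrt(1.0 + alpha ** 2)
    z0 = np.abs(rng.standard_normal(n))        # latent half-normal component
    z1 = rng.standard_normal(n)
    e = delta * z0 + np.sqrt(1.0 - delta ** 2) * z1
    return X @ beta + sigma * e, z0            # given z0, y is conditionally normal

rng = np.random.default_rng(2024)
X = np.column_stack([np.ones(200), rng.standard_normal(200)])
y, z0 = simulate_skew_normal_regression(X, beta=np.array([1.0, 2.0]), alpha=4.0, sigma=1.5)
```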
508

Spatial Mortality Modeling in Actuarial Science

January 2020 (has links)
abstract: Modeling human survivorship is a core area of research within the actuarial community. With life insurance policies and annuity products as dominant financial instruments that depend on future mortality rates, there is a risk that observed human mortality experience will differ from what was projected when those products were sold. From an insurer’s portfolio perspective, to curb this risk it is imperative that models of human survivorship are constantly updated and equipped to accurately gauge and forecast mortality rates. At present, the majority of actuarial research in mortality modeling involves factor-based approaches which operate at a global scale, placing little attention on the determinants and interpretable risk factors of mortality, specifically from a spatial perspective. With an abundance of research being performed in the field of spatial statistics and greater accessibility to localized mortality data, there is a clear opportunity to extend the existing body of mortality literature towards the spatial domain. The objective of this dissertation is to introduce these new statistical approaches so that the field of actuarial science can incorporate geographic space into the mortality modeling context. First, this dissertation evaluates the underlying spatial patterns of mortality across the United States, and introduces a spatial filtering methodology to generate latent spatial patterns which capture the essence of these mortality rates in space. Second, local modeling techniques are illustrated, and a multiscale geographically weighted regression (MGWR) model is generated to describe the variation of mortality rates across space in an interpretable manner which allows for the investigation of the presence of spatial variability in the determinants of mortality. Third, techniques for updating traditional mortality models are introduced, culminating in the development of a model which addresses the relationship between space, economic growth, and mortality. It is through these applications that this dissertation demonstrates the utility of updating actuarial mortality models from a spatial perspective. / Dissertation/Thesis / Doctoral Dissertation Statistics 2020
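As a hedged, simplified relative of the MGWR model mentioned above, the sketch below implements plain geographically weighted regression with a single Gaussian-kernel bandwidth: a weighted least-squares fit at every location. The bandwidth, coordinates, and covariates are illustrative assumptions.

```python
# Hedged sketch of plain geographically weighted regression (GWR), the single-
# bandwidth special case of MGWR: a weighted least-squares fit at each location
# with Gaussian kernel weights in distance. Bandwidth and covariates are assumed.
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Return an (n_locations x n_covariates) matrix of local coefficients."""
    n, p = X.shape
    betas = np.empty((n, p))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)           # Gaussian kernel weights
        Xw = X * w[:, None]
        betas[i] = np.linalg.solve(Xw.T @ X, Xw.T @ y)    # local weighted least squares
    return betas

# Usage (coords: area centroids; X: intercept plus mortality determinants; y: rates):
# local_betas = gwr_coefficients(coords, X, y, bandwidth=100.0)
# Mapping each column of local_betas shows how a determinant's effect varies in space.
```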
509

Sparse Factor Auto-Regression for Forecasting Macroeconomic Time Series with Very Many Predictors

Unknown Date (has links)
Forecasting a univariate target time series in high dimensions with very many predictors poses challenges in statistical learning and modeling. First, many nuisance time series exist and need to be removed. Second, from economic theory, a macroeconomic target series is typically driven by a few latent factors constructed from macroeconomic indices. Consequently, a high-dimensional problem arises in which deleting junk time series and constructing predictive factors simultaneously is meaningful and advantageous for the accuracy of the forecasting task. In macroeconomics, multiple categories of series are available, with the target series belonging to one of them. With all series available, we advocate constructing category-level factors to enhance the performance of the forecasting task. We introduce a novel methodology, Sparse Factor Auto-Regression (SFAR), to construct predictive factors from a reduced set of relevant time series. SFAR attains dimension reduction via joint variable selection and rank reduction in high-dimensional time series data. A multivariate setting is used to achieve simultaneous low-rank and cardinality control on the matrix of coefficients, where an $\ell_0$-constraint regulates the number of useful series and the rank constraint sets an upper bound on the number of constructed factors. The doubly-constrained estimation problem is nonconvex and is optimized via an efficient iterative algorithm with a theoretical guarantee of convergence. SFAR fits factors using a sparse low-rank matrix in response to a target category series. Forecasting is then performed using lagged observations and shrinkage methods. We generate finite-sample data to verify our theoretical findings via a comparative study of SFAR. We also analyze real-world macroeconomic time series data to demonstrate the usage of SFAR in practice. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester, 2014. / July 7, 2014. / $\ell_0$-Constraint, Factor Model, Forecasting, Group Sparsity, Progressive Screening, Reduced-Rank Regression / Includes bibliographical references. / Yiyuan She, Professor Directing Dissertation; Giray Okten, University Representative; Paul Beaumont, Committee Member; Fred Huffer, Committee Member; Minjing Tao, Committee Member.
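To make the rank-reduction half of the idea concrete, the hedged sketch below computes classical reduced-rank regression by projecting the OLS coefficient matrix onto the span of the leading right singular vectors of the fitted values. The joint $\ell_0$ row-sparsity constraint and the SFAR iterative algorithm are not reproduced here.

```python
# Hedged sketch of the rank-reduction half of the idea: classical reduced-rank
# regression obtained by projecting the OLS coefficient matrix onto the leading
# right singular vectors of the fitted values. The l0 row-sparsity constraint and
# the SFAR iterative algorithm are not reproduced.
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Coefficient matrix of rank <= `rank` for the multivariate model Y ~ X B."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)      # ordinary least squares
    U, s, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V_r = Vt[:rank].T                                  # leading right singular vectors
    B_rr = B_ols @ V_r @ V_r.T                         # project onto a rank-r subspace
    factors = X @ B_ols @ V_r                          # constructed predictive factors
    return B_rr, factors

# Usage (X: lagged predictor series, Y: block of response series):
# B_hat, factors = reduced_rank_regression(X, Y, rank=3)
```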
510

Some methods for robust inference in econometric factor models and in machine learning

Nikolaev, Nikolay Ivanov 22 January 2016 (has links)
Traditional multivariate statistical theory and applications are often based on specific parametric assumptions. For example, it is often assumed that data follow a (nearly) normal distribution. In practice such an assumption is rarely true, and in fact the underlying data distribution is often unknown. Violations of the normality assumption can be detrimental to inference. In particular, two areas affected by violations of assumptions are quadratic discriminant analysis (QDA), used in classification, and principal component analysis (PCA), commonly employed in dimension reduction. Both PCA and QDA involve the computation of empirical covariance matrices of the data. In econometric and financial data, non-normality is often associated with heavy-tailed distributions, and such distributions can create significant problems in computing the sample covariance matrix. Furthermore, in PCA, non-normality may lead to erroneous decisions about the number of components to be retained due to unexpected behavior of the empirical covariance matrix eigenvalues. In the first part of the dissertation, we consider the so-called number-of-factors problem in econometric and financial data, which is related to the number of sources of variation (latent factors) that are common to a set of variables observed multiple times (as in time series). The approach commonly used in the literature is PCA and examination of the pattern of the related eigenvalues. We employ an existing technique for robust principal component analysis, which produces properly estimated eigenvalues that are then used in an automatic inferential procedure for the identification of the number of latent factors. In a series of simulation experiments we demonstrate the superiority of our approach compared to other well-established methods. In the second part of the dissertation, we discuss a method to normalize the data empirically so that classical QDA for binary classification can be used. In addition, we successfully overcome the usual issue of a large dimension-to-sample-size ratio through regularized estimation of precision matrices. Extensive simulation experiments demonstrate the advantages of our approach in terms of accuracy over other classification techniques. We illustrate the efficiency of our methods in both situations by applying them to real datasets from economics and bioinformatics.
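A hedged illustration of one ingredient discussed above, namely replacing the raw sample covariance with a regularized estimate before inverting it in a QDA-style discriminant, is sketched below. Ledoit-Wolf shrinkage is a stand-in; the dissertation's own robust PCA and regularized precision estimators are not reproduced.

```python
# Hedged illustration: a QDA-style discriminant score that uses a shrinkage
# (Ledoit-Wolf) covariance estimate instead of the raw sample covariance when the
# dimension-to-sample-size ratio is large. This is a stand-in, not the
# dissertation's robust PCA or regularized precision estimators.
import numpy as np
from sklearn.covariance import LedoitWolf

def qda_discriminant(x, X_class, prior):
    """Quadratic discriminant score for one class with a shrunk covariance."""
    mu = X_class.mean(axis=0)
    lw = LedoitWolf().fit(X_class)
    _, logdet = np.linalg.slogdet(lw.covariance_)
    diff = x - mu
    return -0.5 * logdet - 0.5 * diff @ lw.precision_ @ diff + np.log(prior)

# Classify a new observation x by the larger class score:
# label = int(qda_discriminant(x, X_pos, 0.5) > qda_discriminant(x, X_neg, 0.5))
```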
