551

Functional Component Analysis and Regression Using Elastic Methods

Unknown Date (has links)
Constructing generative models for functional observations is an important task in functional data analysis. In general, functional data contain both phase (x-axis, or horizontal) and amplitude (y-axis, or vertical) variability. Traditional methods often ignore the phase variability and focus solely on the amplitude variation, using cross-sectional techniques such as functional principal component analysis for dimension reduction and regression for data modeling. Ignoring phase variability leads to a loss of structure in the data and to inefficient data models. Moreover, most methods use a "pre-processing" alignment step to remove the phase variability rather than considering a more natural joint solution. This dissertation presents three approaches to this problem. The first relies on separating the phase and amplitude components and then modeling them using joint distributions. This separation, in turn, is performed using a technique called elastic alignment of functions, which involves a new mathematical representation of functional data. Individual principal component analyses, one each for the phase and amplitude components, then yield principal coefficients on which joint probability models are imposed while respecting the nonlinear geometry of the phase representation space. The second approach builds the phase variability into the objective function of two component-analysis methods, functional principal component analysis and functional partial least squares. This creates a more complete solution, as the phase variability is removed while the components are simultaneously extracted. The third approach builds the phase variability into the functional linear regression model and then extends the model to logistic and multinomial logistic regression. Incorporating the phase variability yields a more parsimonious regression model and therefore more accurate prediction of observations. These models extend naturally from functional data to curves (essentially functions in R^2), allowing regression with curves as predictors. The ideas are demonstrated using random sampling from models estimated on simulated and real datasets, showing their superiority over models that ignore phase-amplitude separation. Furthermore, the models are applied to classification of functional data and achieve high performance in applications involving SONAR signals of underwater objects, handwritten signatures, periodic body movements recorded by smartphones, and physiological data. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester, 2014. / May 20, 2014. / Amplitude Variability, Functional Data Analysis, Function Alignment, Functional Regression, Function Principal Component Analysis, Phase Variability / Includes bibliographical references. / Anuj Srivastava, Professor Co-Directing Dissertation; Wei Wu, Professor Co-Directing Dissertation; Eric Klassen, University Representative; Fred Huffer, Committee Member.
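The elastic representation referred to in this abstract is commonly built on the square-root slope function (SRSF), under which the elastic (Fisher-Rao) distance between functions becomes an ordinary L2 distance. The sketch below computes SRSFs with NumPy only; it is a minimal illustration of the representation, not the dissertation's alignment or joint-modeling procedure.

```python
import numpy as np

def srsf(f, t):
    """Square-root slope function q(t) = sign(f'(t)) * sqrt(|f'(t)|).

    f : array of function values sampled on the grid t.
    Under this transform, comparing functions elastically reduces to
    comparing their SRSFs with the usual L2 metric.
    """
    df = np.gradient(f, t)                     # numerical derivative f'(t)
    return np.sign(df) * np.sqrt(np.abs(df))   # SRSF transform

# toy example: two bumps that differ mainly in phase
t = np.linspace(0, 1, 200)
f1 = np.exp(-0.5 * ((t - 0.4) / 0.05) ** 2)
f2 = np.exp(-0.5 * ((t - 0.6) / 0.05) ** 2)
q1, q2 = srsf(f1, t), srsf(f2, t)
print(np.trapz((q1 - q2) ** 2, t) ** 0.5)      # L2 distance between SRSFs
```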
552

Sparse Generalized PCA and Dependency Learning for Large-Scale Applications Beyond Gaussianity

Unknown Date (has links)
The age of big data has renewed interest in dimension reduction. How to cope with high-dimensional data remains a difficult problem in statistical learning. In this study, we consider the task of dimension reduction---projecting data into a lower-rank subspace while preserving maximal information. We investigate the pitfalls of classical PCA and propose a set of algorithms that function in high dimensions, extend to all exponential-family distributions, perform feature selection at the same time, and take missing values into consideration. Based upon the best-performing one, we develop the SG-PCA algorithm. With acceleration techniques and a progressive screening scheme, it demonstrates superior scalability and accuracy compared to existing methods. Concerned with the independence assumption underlying many dimension reduction techniques, we propose a novel framework, Generalized Indirect Dependency Learning (GIDL), to learn and incorporate association structure in multivariate statistical analysis. Without constraints on the particular distribution of the data, GIDL takes any pre-specified smooth loss function and is able to both extract the association structure and infuse it into regression, classification, or dimension reduction problems. Experiments demonstrate its efficacy. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2016. / March 29, 2016. / Includes bibliographical references. / Yiyuan She, Professor Directing Dissertation; Teng Ma, University Representative; Xufeng Niu, Committee Member; Debajyoti Sinha, Committee Member; Elizabeth Slate, Committee Member.
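As a rough illustration of the "sparse projection" idea (not the SG-PCA algorithm itself), the sketch below extracts one sparse loading vector by alternating a power-iteration update with soft-thresholding; the threshold `lam` is a hypothetical tuning parameter.

```python
import numpy as np

def sparse_pc1(X, lam=0.2, n_iter=50):
    """One sparse principal loading via power iteration plus soft-thresholding.

    X   : centered data matrix (n samples x p features).
    lam : soft-threshold level applied to the unit-norm loading (tuning knob).
    """
    # start from the ordinary first principal loading
    v = np.linalg.svd(X, full_matrices=False)[2][0]
    for _ in range(n_iter):
        w = X.T @ (X @ v)                  # power-iteration update
        w /= np.linalg.norm(w)
        w_thr = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)   # sparsify
        if np.linalg.norm(w_thr) > 0:      # guard against zeroing everything
            w = w_thr
        v = w / np.linalg.norm(w)
    return v

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
X[:, :5] += 3 * rng.standard_normal((200, 1))   # shared signal in 5 features
X -= X.mean(axis=0)
print(np.nonzero(sparse_pc1(X))[0])             # indices of selected features
```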
553

Predictive Accuracy Measures for Binary Outcomes: Impact of Incidence Rate and Optimization Techniques

Unknown Date (has links)
Evaluating the performance of models predicting a binary outcome can be done using a variety of measures. While some measures intend to describe the model's overall fit, others more accurately describe the model's ability to discriminate between the two outcomes. If a model fits well but doesn't discriminate well, what does that tell us? Given two models, if one discriminates well but has poor fit while the other fits well but discriminates poorly, which of the two should we choose? The measures of interest for our research include the area under the ROC curve, Brier Score, discrimination slope, Log-Loss, R-squared and F-score. To examine the underlying relationships among all of the measures, real data and simulation studies are used. The real data comes from multiple cardiovascular research studies and the simulation studies are run under general conditions and also for incidence rates ranging from 2% to 50%. The results of these analyses provide insight into the relationships among the measures and raise concern for scenarios when the measures may yield different conclusions. The impact of incidence rate on the relationships provides a basis for exploring alternative maximization routines to logistic regression. While most of the measures are easily optimized using the Newton-Raphson algorithm, the maximization of the area under the ROC curve requires optimization of a non-linear, non-differentiable function. Usage of the Nelder-Mead simplex algorithm and close connections to economics research yield unique parameter estimates and general asymptotic conditions. Using real and simulated data to compare optimizing the area under the ROC curve to logistic regression further reveals the impact of incidence rate on the relationships, significant increases in achievable areas under the ROC curve, and differences in conclusions about including a variable in a model. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2016. / April 8, 2016. / auc, brier score, incidence rate, logistic regression, optimization / Includes bibliographical references. / Daniel McGee, Professor Co-Directing Thesis; Elizabeth Slate, Professor Co-Directing Thesis; Isaac Eberstein, University Representative; Fred Huffer, Committee Member.
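All of the measures discussed above are computable from predicted scores or probabilities, and the empirical AUC, being non-smooth in the parameters, can be attacked directly with a derivative-free routine. The sketch below evaluates several measures and fits a linear score by maximizing the empirical AUC with scipy's Nelder-Mead simplex method; it is a generic illustration under simulated data, not the authors' estimator, and the crude logistic calibration is an assumption for the probability-based measures.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import roc_auc_score, brier_score_loss, log_loss

rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal((n, 2))
true_beta = np.array([1.5, -1.0])
y = (rng.random(n) < 1 / (1 + np.exp(-(X @ true_beta)))).astype(int)

def neg_auc(beta):
    # empirical AUC of the linear score X @ beta (non-differentiable in beta)
    return -roc_auc_score(y, X @ beta)

# Nelder-Mead handles the non-smooth objective without gradients
res = minimize(neg_auc, x0=np.array([1.0, 0.0]), method="Nelder-Mead")
score = X @ res.x
prob = 1 / (1 + np.exp(-score))        # crude calibration for prob-based measures
print("AUC:     ", roc_auc_score(y, score))
print("Brier:   ", brier_score_loss(y, prob))
print("Log-loss:", log_loss(y, prob))
```

Note that the AUC identifies the coefficients only up to a positive scale, which is one reason such estimators require extra conditions for uniqueness.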
554

Intensity Estimation in Poisson Processes with Phase Variability

Unknown Date (has links)
Intensity estimation for Poisson processes is a classical problem and has been extensively studied over the past few decades. However, current methods of intensity estimation assume phase variability or compositional noise, i.e. a nonlinear shift along the time axis, is nonexistent in the data which is an unreasonable assumption for practical observations. The key challenge is that these observations are not "aligned'', and registration procedures are required for successful estimation. As a result, these estimation methods can yield estimators that are inefficient or that under-perform in simulations and applications. This dissertation summarizes two key projects which examine estimation of the intensity of a Poisson process in the presence of phase variability. The first project proposes an alignment-based framework for intensity estimation. First, it is shown that the intensity function is area-preserved with respect to compositional noise. Such a property implies that the time warping is only encoded in the density, or normalized intensity, function. Then, the intensity function can be decomposed into the product of the estimated total intensity (a scalar value) and the estimated density function. The estimation of the density relies on a metric which measures the phase difference between two density functions. An asymptotic study shows that the proposed estimation algorithm provides a consistent estimator for the normalized intensity. The success of the proposed estimation algorithm is illustrated using two simulations and the new framework is applied in a real data set of neural spike trains, showing that the proposed estimation method yields improved classification accuracy over previous methods. The second project utilizes 2014 Florida data from the Healthcare Cost and Utilization Project's State Inpatient Database and State Emergency Department Database (provided to the U.S. Department of Health and Human Services, Agency for Healthcare Research and Quality by the Florida Agency for Health Care Administration) to examine heart failure emergency department arrival times. Current estimation methods for examining emergency department arrival data ignore the functional nature of the data and implement naive analysis methods. In this dissertation, the arrivals are treated as a Poisson process and the intensity of the process is estimated using existing density estimation and function registration methods. The results of these analyses show the importance of considering the functional nature of emergency department arrival data and the critical role that function registration plays in the intensity estimation of the arrival process. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester 2016. / October 7, 2016. / emergency department utilization, functional data analysis, function registration, intensity estimation, Poisson process, spike train / Includes bibliographical references. / Wei Wu, Professor Directing Dissertation; James Whyte, IV, University Representative; Anuj Srivastava, Committee Member; Eric Chicken, Committee Member.
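The decomposition described above, total intensity (a scalar) times a density over time, can be sketched as follows, with a plain kernel density estimate standing in for the dissertation's registration-based density estimator; the simulated arrival-time setup is assumed for illustration only.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

def simulate_arrivals(rate_fn, t_max=24.0, rate_max=10.0):
    """One realization of an inhomogeneous Poisson process on [0, t_max]
    via thinning: propose homogeneous arrivals, accept with prob rate(t)/rate_max."""
    n = rng.poisson(rate_max * t_max)
    t = np.sort(rng.uniform(0, t_max, n))
    return t[rng.random(n) < rate_fn(t) / rate_max]

rate = lambda t: 5 + 4 * np.sin(2 * np.pi * t / 24)          # true intensity
realizations = [simulate_arrivals(rate) for _ in range(50)]

# total intensity: average number of events per realization
total = np.mean([len(r) for r in realizations])

# normalized intensity (density over time): KDE on the pooled arrival times
pooled = np.concatenate(realizations)
density = gaussian_kde(pooled)

grid = np.linspace(0, 24, 100)
intensity_hat = total * density(grid)       # estimated intensity function
print(total, intensity_hat[:5])
```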
555

Sparse Feature and Element Selection in High-Dimensional Vector Autoregressive Models

Unknown Date (has links)
This thesis identifies the underlying structures of multivariate time series and proposes a methodology for constructing predictive VAR models. Due to the complexity of high dimensions in multivariate time series, forecasting a target series with many predictors in VAR models poses a challenge in statistical learning and modeling. The quadratically increasing dimension of the parameter space, known as the "curse of dimensionality," poses considerable challenges to multivariate time series models. Meanwhile, two facts are relevant to reducing dimensions in multivariate time series: first, some nuisance time series exist and are better removed; second, a target time series is typically driven by a few dependent elements constructed from some indices. To address these challenges, our approach is to reduce both the number of series and the features involved in each series simultaneously. As a result, the original high-dimensional structure can be modeled using a lower-dimensional time series, and the forecasting performance is subsequently improved. The methodology introduced in this work is called Sparse Feature and Element Selection (SFES). It employs an "L1 + group L1" penalty to conduct group selection and variable selection within each group simultaneously. Our contributions in this thesis are twofold. First, the doubly-constrained regularization in SFES is a convex mathematical problem, and we optimize it using a fast but simple-to-implement algorithm. We evaluate this algorithm with a large-scale dataset and theoretically prove that it has guaranteed strict iterative convergence and global optimality. Second, we present non-asymptotic results based on a combined statistical and computational analysis. A sharp oracle inequality is proved to reveal its power in predictive learning. We compare SFES with the related work of Sparse Group Lasso (SGL) to show that the proposed method is both computationally efficient and theoretically justified. Experiments using simulated data and real-world macroeconomic time series data demonstrate the efficiency and efficacy of the proposed SFES in practice. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester 2016. / October 28, 2016. / consistency, Feature Selection, Sparse, VAR / Includes bibliographical references. / Xufeng Niu, Professor Co-Directing Dissertation; Yiyuan She, Professor Co-Directing Dissertation; Yingmei Cheng, University Representative; Fred Huffer, Committee Member; Wei Wu, Committee Member.
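The "L1 + group L1" penalty above is the sparse-group-lasso penalty; its proximal operator has the closed form sketched below (element-wise soft-thresholding followed by group-wise shrinkage), which is the basic building block of typical proximal-gradient solvers. This is a generic sketch of that operator, not the SFES algorithm itself, and the toy grouping into lagged series is assumed.

```python
import numpy as np

def prox_sparse_group(beta, groups, lam1, lam2):
    """Proximal operator of lam1 * ||b||_1 + lam2 * sum_g ||b_g||_2.

    beta   : coefficient vector.
    groups : list of index arrays, one per group (e.g., one per lagged series).
    lam1, lam2 : element-wise and group-wise penalty levels.
    """
    # step 1: element-wise soft-thresholding (L1 part)
    b = np.sign(beta) * np.maximum(np.abs(beta) - lam1, 0.0)
    # step 2: group-wise shrinkage (group-L1 part)
    for g in groups:
        norm_g = np.linalg.norm(b[g])
        b[g] = 0.0 if norm_g <= lam2 else b[g] * (1 - lam2 / norm_g)
    return b

beta = np.array([0.1, 2.0, -1.5, 0.05, 0.02, -0.03])
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]   # two hypothetical groups
print(prox_sparse_group(beta, groups, lam1=0.1, lam2=0.5))
```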
556

First Steps towards Image Denoising under Low-Light Conditions

Unknown Date (has links)
The application of noise reduction, or denoising, to an image is a very important topic in computer vision and computational photography. Many popular state-of-the-art denoising algorithms are trained and evaluated using images with artificial noise. These trained algorithms and their evaluations on synthetic data may lead to incorrect conclusions about their performance. In this work we first introduce a benchmark dataset of uncompressed color images corrupted by natural noise due to low-light conditions, together with spatially and intensity-aligned low-noise images of the same scenes. The dataset contains over 100 scenes and more than 500 images, including both RAW-format images and 8-bit BMP pixel- and intensity-aligned images. We also introduce a method for estimating the true noise level in each of our images, since even the low-noise images contain a small amount of noise. Building on this noise estimation method, we construct a convolutional neural network model for automatic noise estimation in single noisy images. Finally, we improve upon a state-of-the-art denoising algorithm, block-matching and 3D filtering (BM3D), by learning a specialized denoising parameter with another convolutional neural network. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester 2016. / November 18, 2016. / Convolutional Neural Networks, Image Denoising, Machine Learning, Mobile Phone RAW data, RAW Uncompressed Images / Includes bibliographical references. / Anke Meyer-Baese, University Representative; Antonio Linero, Committee Member; Jinfeng Zhang, Committee Member.
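As a point of comparison for the learned noise estimator described above, here is a common hand-crafted baseline: estimating the noise standard deviation from the median absolute deviation of a high-pass residual. The CNN-based estimator and the BM3D parameter-tuning network from the dissertation are not reproduced; the synthetic test image and blur width are assumptions for the sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_noise_sigma(img, blur_sigma=1.0):
    """Rough noise-level estimate from the high-frequency residual.

    img : 2-D grayscale image as a float array.
    The residual (image minus a Gaussian-blurred copy) is dominated by noise
    in smooth regions; the MAD makes the estimate robust to edges.
    """
    residual = img - gaussian_filter(img, blur_sigma)
    mad = np.median(np.abs(residual - np.median(residual)))
    return 1.4826 * mad          # MAD -> std under Gaussian noise

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 1, 256), (256, 1))      # smooth synthetic scene
noisy = clean + rng.normal(0, 0.05, clean.shape)       # additive Gaussian noise
print(estimate_noise_sigma(noisy))                     # rough estimate of 0.05
```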
557

Modeling Multivariate Data with Parameter-Based Subspaces

Unknown Date (has links)
When modeling multivariate data such as vectorized images, one might have an extra parameter of contextual information that could be used to treat some observations as more similar to others. For example, images of faces can vary by yaw rotation, and one would expect a face rotated 65 degrees to the left to have characteristics more similar to a face rotated 55 degrees to the left than to a face rotated 65 degrees to the right. We introduce a novel method, parameterized principal component analysis (PPCA), that can model data with linear variation like principal component analysis (PCA), but can also take advantage of this parameter of contextual information like yaw rotation. Like PCA, PPCA models an observation using a mean vector and the product of observation-specific coefficients and basis vectors. Unlike PCA, PPCA treats the elements of the mean vector and basis vectors as smooth, piecewise linear functions of the contextual parameter. PPCA is fit by a penalized optimization that penalizes potential models which have overly large differences between corresponding mean or basis vector elements for similar parameter values. The penalty ensures that each observation's projection will share information with observations that have similar parameter values, but not with observations that have dissimilar parameter values. We tested PPCA on artificial data based on known, smooth functions of an added parameter, as well as on three real datasets with different types of parameters. We compared PPCA to independent principal component analysis (IPCA), which groups observations by their parameter values and projects each group using principal component analysis with no sharing of information for different groups. PPCA recovers the known functions with less error and projects the datasets' test set observations with consistently less reconstruction error than IPCA does. PPCA's performance is particularly strong, relative to IPCA, when training data are limited. We also tested the use of spectral clustering to form the groups in an IPCA model. In our experiment, the clustered IPCA model had very similar error to the parameter-based IPCA model, suggesting that spectral clustering might be a viable alternative if one did not know the parameter values for an application. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2016. / May 18, 2016. / Includes bibliographical references. / Adrian Barbu, Professor Directing Dissertation; Anke Meyer-Baese, University Representative; Yiyuan She, Committee Member; Jinfeng Zhang, Committee Member.
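The IPCA baseline described above (a separate PCA for each parameter group, with no sharing of information) can be sketched directly with scikit-learn; the PPCA penalty that couples neighboring parameter values is not reproduced here, and the toy data and yaw-angle grouping are assumed for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_ipca(X, param, n_components=5):
    """Independent PCA (IPCA): one PCA per distinct parameter value.

    X     : data matrix (n samples x p features), e.g. vectorized images.
    param : contextual parameter per sample, e.g. yaw rotation in degrees.
    """
    models = {}
    for value in np.unique(param):
        models[value] = PCA(n_components=n_components).fit(X[param == value])
    return models

def reconstruct(models, x, value):
    """Project one observation with the PCA of its own parameter group."""
    pca = models[value]
    return pca.inverse_transform(pca.transform(x.reshape(1, -1)))[0]

rng = np.random.default_rng(0)
param = rng.choice([-60, -30, 0, 30, 60], size=300)       # e.g. yaw angles
X = rng.standard_normal((300, 40)) + param[:, None] / 60  # group-dependent mean
models = fit_ipca(X, param, n_components=5)
print(np.linalg.norm(X[0] - reconstruct(models, X[0], param[0])))
```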
558

Bayesian Inference and Novel Models for Survival Data with Cured Fraction

Unknown Date (has links)
Existing cure-rate survival models are generally not convenient for modeling and estimating the survival quantiles of a patient with specified covariate values. They also do not allow inference on the change in the number of clonogens over time. This dissertation proposes two novel classes of cure-rate model: the transform-both-sides cure-rate model (TBSCRM) and the clonogen proliferation cure-rate model (CPCRM). Both can be used to make inference about the cure rate and the survival probabilities over time. The TBSCRM can also produce estimates of a patient's quantiles of survival time, and the CPCRM can produce estimates of a patient's expected number of clonogens at each time. We develop Bayesian methods, based on Markov chain Monte Carlo (MCMC) tools, for inference about covariate effects on relevant quantities such as the cure rate. We also show that the TBSCRM-based and CPCRM-based Bayesian methods perform well in simulation studies and outperform existing cure-rate models in application to breast cancer survival data from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) database. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2016. / July 14, 2016. / Includes bibliographical references. / Debajyoti Sinha, Professor Directing Dissertation; Robert Glueckauf, University Representative; Elizabeth Slate, Committee Member; Debdeep Pati, Committee Member.
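For background, the classical mixture cure-rate model (not the TBSCRM or CPCRM proposed here) writes the population survival function as S_pop(t) = pi + (1 - pi) S(t), where pi is the cured fraction. A minimal sketch of its log-likelihood under right censoring, assuming a Weibull latency distribution and simulated data, is given below; in a Bayesian treatment such a likelihood would be combined with priors and explored by MCMC.

```python
import numpy as np
from scipy.stats import weibull_min

def mixture_cure_loglik(params, t, event):
    """Log-likelihood of the classical mixture cure model with Weibull latency.

    params : (logit of cure fraction pi, log shape, log scale).
    t      : observed times; event = 1 if failure observed, 0 if censored.
    """
    pi = 1 / (1 + np.exp(-params[0]))           # cured fraction
    shape, scale = np.exp(params[1]), np.exp(params[2])
    f = weibull_min.pdf(t, shape, scale=scale)  # latency density
    S = weibull_min.sf(t, shape, scale=scale)   # latency survival
    # failures come from the uncured; censored may be cured or uncured survivors
    loglik = np.where(event == 1,
                      np.log((1 - pi) * f),
                      np.log(pi + (1 - pi) * S))
    return loglik.sum()

rng = np.random.default_rng(0)
t = rng.weibull(1.5, 200) * 2.0
event = rng.integers(0, 2, 200)
print(mixture_cure_loglik(np.array([0.0, 0.0, 0.5]), t, event))
```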
559

Investigating the Chi-Square-Based Model-Fit Indexes for WLSMV and ULSMV Estimators

Unknown Date (has links)
In structural equation modeling (SEM), researchers use the model chi-square statistic and model-fit indexes to evaluate model-data fit. Root mean square error of approximation (RMSEA), comparative fit index (CFI), and Tucker-Lewis index (TLI) are widely applied model-fit indexes. When data are ordered and categorical, the most popular estimator is the diagonally weighted least squares (DWLS) estimator. Robust corrections have been proposed to adjust the uncorrected chi-square statistic from DWLS so that its first- and second-order moments are in alignment with the target central chi-square distribution under correctly specified models. DWLS with such a correction is called the mean- and variance-adjusted weighted least squares (WLSMV) estimator. An alternative to WLSMV is the mean- and variance-adjusted unweighted least squares (ULSMV) estimator, which has been shown to perform as well as, or slightly better than, WLSMV. Because the chi-square statistic is corrected, the chi-square-based RMSEA, CFI, and TLI are also corrected by replacing the uncorrected chi-square statistic with the robust chi-square statistic. The robust model fit indexes calculated in this way are called the population-corrected robust (PR) model fit indexes, following Brosseau-Liard, Savalei, and Li (2012). The PR model fit indexes are currently reported in almost every application when WLSMV or ULSMV is used. Nevertheless, previous studies have found that the PR model fit indexes from WLSMV are sensitive to several factors such as sample sizes, model sizes, and thresholds for categorization. The first focus of this dissertation is on the dependency of model fit indexes on the thresholds for ordered categorical data. Because the weight matrix in the WLSMV fit function and the correction factors for both WLSMV and ULSMV include the asymptotic variances of thresholds and polychoric correlations, the model fit indexes are very likely to depend on the thresholds. The dependency of model fit indexes on the thresholds is not a desirable property, because when the misspecification lies in the factor structures (e.g., cross loadings are ignored or two factors are considered as a single factor), model fit indexes should reflect such misspecification rather than the threshold values. As alternatives to the PR model fit indexes, Brosseau-Liard et al. (2012), Brosseau-Liard and Savalei (2014), and Li and Bentler (2006) proposed the sample-corrected robust (SR) model fit indexes. The PR fit indexes are found to converge to distorted asymptotic values, but the SR fit indexes converge to their definitions asymptotically. However, the SR model fit indexes were proposed for continuous data, and have been neither investigated nor implemented in SEM software when WLSMV and ULSMV are applied. This dissertation thus investigates the PR and SR model fit indexes for WLSMV and ULSMV. The first part of the simulation study examines the dependency of the model fit indexes on the thresholds when the model misspecification results from omitting cross-loadings or collapsing factors in confirmatory factor analysis. The study is conducted on extremely large computer-generated datasets in order to approximate the asymptotic values of model fit indexes. The results show that only the SR fit indexes from ULSMV are independent of the population threshold values, given the other design factors.
The PR fit indexes from ULSMV, and the PR and SR fit indexes from WLSMV, are influenced by thresholds, especially when data are binary and the hypothesized model is greatly misspecified. The second part of the simulation varies the sample sizes from 100 to 1000 to investigate whether the SR fit indexes under finite samples are more accurate estimates of the defined values of RMSEA, CFI, and TLI, compared with the uncorrected model fit indexes without robust correction and the PR fit indexes. Results show that the SR fit indexes are more accurate in general. However, when the thresholds differ across items, data are binary, and sample size is less than 500, all versions of these indexes can be very inaccurate. In such situations, larger sample sizes are needed. In addition, the conventional cutoffs developed from continuous data with maximum likelihood (e.g., RMSEA < .06, CFI > .95, and TLI > .95; Hu & Bentler, 1999) have been applied to WLSMV and ULSMV despite arguments against such a practice (e.g., Marsh, Hau, & Wen, 2004). For comparison purposes, this dissertation reports the RMSEA, CFI, and TLI based on continuous data using maximum likelihood before the variables are categorized to create ordered categorical data. Results show that the model fit indexes from maximum likelihood are very different from those from WLSMV and ULSMV, suggesting that the conventional rules should not be applied to WLSMV and ULSMV. / A Dissertation submitted to the Department of Educational Psychology and Learning Systems in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2016. / July 5, 2016. / Model Fit Indexes, Ordered Categorical Data, Structural Equation Modeling, ULSMV, WLSMV / Includes bibliographical references. / Yanyun Yang, Professor Directing Dissertation; Fred W. Huffer, University Representative; Russell G. Almond, Committee Member; Betsy J. Becker, Committee Member; Insu Paek, Committee Member.
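The chi-square-based indexes discussed throughout are simple functions of the model and baseline chi-square statistics. The sketch below computes RMSEA, CFI, and TLI from those quantities using their standard definitions; the WLSMV/ULSMV robust corrections themselves are not reproduced, and the numeric inputs are hypothetical.

```python
import math

def fit_indexes(chi2, df, chi2_base, df_base, n):
    """RMSEA, CFI, and TLI from model and baseline chi-square statistics.

    chi2, df           : model chi-square and degrees of freedom.
    chi2_base, df_base : baseline (independence) model chi-square and df.
    n                  : sample size.
    """
    rmsea = math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
    cfi = 1.0 - max(chi2 - df, 0.0) / max(chi2 - df, chi2_base - df_base, 0.0)
    tli = ((chi2_base / df_base) - (chi2 / df)) / ((chi2_base / df_base) - 1.0)
    return rmsea, cfi, tli

# hypothetical values for illustration only
print(fit_indexes(chi2=85.3, df=50, chi2_base=900.0, df_base=66, n=400))
```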
560

Nonparametric Detection of Arbitrary Changes to Distributions and Methods of Regularization of Piecewise Constant Functional Data

Unknown Date (has links)
Nonparametric statistical methods can refer to a wide variety of techniques. In this dissertation, we focus on two problems in statistics which are common applications of nonparametric statistics. The main body of the dissertation focuses on distribution-free process control for detection of arbitrary changes to the distribution of an underlying random variable. A secondary problem, also under the broad umbrella of nonparametric statistics, is the proper approximation of a function. Statistical process control minimizes disruptions to a properly controlled process and quickly terminates out-of-control processes. Although rarely satisfied in practice, strict distributional assumptions are often needed to monitor these processes. Previous models have often focused exclusively on monitoring changes in the mean or variance of the underlying process. The proposed model establishes a monitoring method requiring few distributional assumptions while monitoring all changes in the underlying distribution generating the data. No assumptions on the form of the in-control distribution are made other than independence within and between observed samples. Windowing is employed to reduce the computational complexity of the algorithm as well as to ensure fast detection of changes. Results indicate quicker detection of large jumps than in many previously established methods. It is now common to analyze large quantities of data generated by sensors over time. Traditional analysis techniques do not incorporate the inherent functional structure often present in this type of data. The second focus of this dissertation is the development of an analysis method for functional data where the range of the function has a discrete, ordinal structure. Spline-based methods with a piecewise-constant function approximation are used. After a large amount of data reduction is achieved, generalized linear mixed model methodology is employed to model the data. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2017. / April 6, 2017. / functional data, nonparametric, process control, regularization / Includes bibliographical references. / Eric Chicken, Professor Directing Dissertation; Guosheng Liu, University Representative; Debdeep Pati, Committee Member; Minjing Tao, Committee Member.
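A minimal distribution-free monitoring scheme in the spirit described above can be sketched with a sliding-window two-sample Kolmogorov-Smirnov test: each new sample is compared against a window of recent history. The fixed p-value control limit and the simulated change are assumptions for illustration; the dissertation's method would instead calibrate the limit to a target in-control run length.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# in-control samples, then a clear change in distribution at sample 60
stream = [rng.normal(0, 1, 30) for _ in range(60)] + \
         [rng.normal(1.5, 2.0, 30) for _ in range(40)]

window = 20          # number of recent samples pooled as the reference window
alpha = 0.001        # hypothetical control limit on the KS p-value

for i in range(window, len(stream)):
    reference = np.concatenate(stream[i - window:i])   # recent history
    res = ks_2samp(reference, stream[i])               # compare new sample
    if res.pvalue < alpha:
        print(f"signal at sample {i} (KS statistic {res.statistic:.3f})")
        break
```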
