  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Elastic Functional Regression Model

Unknown Date (has links)
Functional variables serve important roles as predictors in a variety of pattern recognition and vision applications. Focusing on a specific subproblem, termed scalar-on-function regression, most current approaches adopt the standard L2 inner product to form a link between functional predictors and scalar responses. These methods may perform poorly when predictor functions contain nuisance phase variability, i.e., predictors are temporally misaligned due to noise. While a simple solution could be to pre-align predictors as a pre-processing step before applying a regression model, this alignment is seldom optimal from the perspective of regression. In this dissertation, we propose a new approach, termed elastic functional regression, where alignment is included in the regression model itself, and is performed in conjunction with the estimation of other model parameters. This model is based on a norm-preserving warping of predictors, not the standard time warping of functions, and provides better prediction in situations where the shape or the amplitude of the predictor is more useful than its phase. We demonstrate the effectiveness of this framework using simulated and real data. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 17, 2018. / Functional Data Analysis, Functional Regression Model, Phase Variation, Scalar-on-Function Regression / Includes bibliographical references. / Anuj Srivastava, Professor Directing Thesis; Eric Klassen, University Representative; Wei Wu, Committee Member; Fred Huffer, Committee Member.
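As a point of reference for the abstract above, the following is a minimal sketch of the standard L2 scalar-on-function link that elastic functional regression improves upon. The function names and the polynomial basis are illustrative choices, not the dissertation's implementation.

```python
# A sketch (not the author's code) of the L2 scalar-on-function baseline:
# y_i = alpha + <f_i, beta> + eps_i, with the inner product approximated on a
# discrete grid and beta(t) expanded in a simple polynomial basis.
import numpy as np

def fit_scalar_on_function(F, y, n_basis=5):
    """F: (n, T) matrix of predictor functions observed on a common grid in [0, 1]."""
    n, T = F.shape
    t = np.linspace(0.0, 1.0, T)
    B = np.vander(t, n_basis, increasing=True)      # (T, n_basis) basis for beta(t)
    X = F @ B / T                                    # discretized <f_i, b_k>
    X = np.column_stack([np.ones(n), X])             # add intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta_t = B @ coef[1:]                            # beta evaluated on the grid
    return coef[0], beta_t

# Toy usage: misaligned bumps as predictors, responses driven by amplitude.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
shifts = rng.uniform(-0.1, 0.1, size=50)
F = np.array([np.exp(-((t - 0.5 - s) ** 2) / 0.01) for s in shifts])
y = F.max(axis=1) + rng.normal(scale=0.05, size=50)
alpha, beta_t = fit_scalar_on_function(F, y)
```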
52

Elastic Functional Principal Component Analysis for Modeling and Testing of Functional Data

Unknown Date (has links)
Statistical analysis of functional data requires tools for comparing, summarizing and modeling observed functions as elements of a function space. A key issue in Functional Data Analysis (FDA) is the presence of phase variability in the observed data. A successful statistical model of functional data has to account for the presence of phase variability. Otherwise, the ensuing inferences can be inferior. Recent methods for FDA include steps for phase separation or functional alignment. For example, Elastic Functional Principal Component Analysis (Elastic FPCA) uses the strengths of Functional Principal Component Analysis (FPCA), along with the tools from Elastic FDA, to perform joint phase-amplitude separation and modeling. A related problem in FDA is to quantify and test for the amount of phase in a given dataset. We develop two types of hypothesis tests for testing the significance of phase variability: a metric-based approach and a model-based approach. The metric-based approach treats phase and amplitude as independent components and uses their respective metrics to apply the Friedman-Rafsky Test, Schilling's Nearest Neighbors, and Energy Test to test the differences between functions and their amplitudes. In the model-based test, we use Concordance Correlation Coefficients as a tool to quantify the agreement between functions and their reconstructions using FPCA and Elastic FPCA. We demonstrate this framework using a number of simulated and real datasets, including weather, Tecator, and growth data. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 19, 2018. / Includes bibliographical references. / Anuj Srivastava, Professor Directing Thesis; Eric Klassen, University Representative; Fred Huffer, Committee Member; Wei Wu, Committee Member.
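The model-based test above relies on the Concordance Correlation Coefficient. The sketch below computes Lin's CCC between a curve and its reconstruction on a common grid; it is a generic illustration, not the authors' code.

```python
# A minimal sketch of Lin's Concordance Correlation Coefficient, used as a
# measure of agreement between a function and its (Elastic) FPCA reconstruction.
import numpy as np

def concordance_correlation(x, y):
    """CCC between two curves sampled on the same grid."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                 # population (1/n) variances
    cov = ((x - mx) * (y - my)).mean()
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)

# Toy usage: a curve and a slightly shifted, noisy reconstruction.
t = np.linspace(0, 1, 200)
f = np.sin(2 * np.pi * t)
f_hat = 0.95 * np.sin(2 * np.pi * (t - 0.01)) + np.random.default_rng(1).normal(0, 0.02, t.size)
print(concordance_correlation(f, f_hat))
```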
53

Building a Model Performance Measure for Examining Clinical Relevance Using Net Benefit Curves

Unknown Date (has links)
ROC curves are often used to evaluate the predictive accuracy of statistical prediction models. This thesis studies other measures which incorporate not only the statistical but also the clinical consequences of using a particular prediction model. Depending on the disease and population under study, the misclassification costs of false positives and false negatives vary. The concept of Decision Curve Analysis (DCA) takes this cost into account by using the threshold probability (the probability above which a patient opts for treatment). Using the DCA technique, a Net Benefit Curve is built by plotting "Net Benefit", a function of the expected benefit and expected harm of using a model, against the threshold probability. Only the threshold probability range that is relevant to the disease and the population under study is used to plot the net benefit curve, so that a particular statistical model is evaluated where it matters clinically. This thesis concentrates on constructing a summary measure to determine which predictive model yields the highest net benefit. The most intuitive approach is to calculate the area under the net benefit curve. We examined whether using weights, such as the estimated empirical distribution of the threshold probability, to compute a weighted area under the curve creates a better summary measure. Real data from multiple cardiovascular research studies, the Diverse Population Collaboration (DPC) datasets, are used to compute the summary measures: area under the ROC curve (AUROC), area under the net benefit curve (ANBC) and weighted area under the net benefit curve (WANBC). The results of the analysis are used to compare these measures, to examine whether they agree with each other, and to determine which would be best to use in specified clinical scenarios. For different models, the summary measures and their standard errors (SE) were calculated to study the variability in the measures. The method of meta-analysis is used to summarize these estimated summary measures and to reveal whether there is significant variability among the studies. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 11, 2018. / Area under ROC Curve, Meta analysis, Net Benefit Curve, Predictive Accuracy, Summary Measure, Threshold Probability / Includes bibliographical references. / Daniel L. McGee, Professor Directing Dissertation; Myra Hurt, University Representative; Elizabeth Slate, Committee Member; Debajyoti Sinha, Committee Member.
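For readers unfamiliar with decision curve analysis, the sketch below evaluates the standard net benefit formula NB(p_t) = TP/n - FP/n * p_t/(1 - p_t) over a threshold range and computes an unweighted or weighted area under the resulting curve. It is an illustration under stated assumptions, not the thesis code, and the particular weighting shown is only one possibility.

```python
# A sketch of a net benefit curve and its (optionally weighted) area.
import numpy as np

def net_benefit(y, risk, thresholds):
    """y: 0/1 outcomes; risk: predicted probabilities; thresholds: grid of p_t."""
    y = np.asarray(y, int)
    risk = np.asarray(risk, float)
    n = y.size
    nb = []
    for pt in thresholds:
        treat = risk >= pt
        tp = np.sum(treat & (y == 1))
        fp = np.sum(treat & (y == 0))
        nb.append(tp / n - fp / n * pt / (1 - pt))
    return np.array(nb)

def area_under_nb(thresholds, nb, weights=None):
    """Unweighted (trapezoid) or weighted area under the net benefit curve."""
    if weights is None:
        return np.trapz(nb, thresholds)
    w = np.asarray(weights, float)
    return np.trapz(nb * w, thresholds) / np.trapz(w, thresholds)

# Toy usage over a clinically relevant threshold range.
rng = np.random.default_rng(2)
risk = rng.uniform(0, 1, 500)
y = rng.binomial(1, risk)                     # well-calibrated toy model
pt = np.linspace(0.05, 0.50, 46)
print(area_under_nb(pt, net_benefit(y, risk, pt)))
```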
54

Non-Parametric and Semi-Parametric Estimation and Inference with Applications to Finance and Bioinformatics

Unknown Date (has links)
In this dissertation, we develop tools from non-parametric and semi-parametric statistics to perform estimation and inference. In the first chapter, we propose a new method called Non-Parametric Outlier Identification and Smoothing (NOIS), which robustly smooths stock prices, automatically detects outliers and constructs pointwise confidence bands around the resulting curves. In real-world examples of high-frequency data, NOIS successfully detects erroneous prices as outliers and uncovers borderline cases for further study. NOIS can also highlight notable features and reveal new insights in inter-day chart patterns. In the second chapter, we focus on a method for non-parametric inference called empirical likelihood (EL). Computation of EL in the case of a fixed parameter vector is a convex optimization problem easily solved by Lagrange multipliers. In the case of a composite empirical likelihood (CEL) test where certain components of the parameter vector are free to vary, the optimization problem becomes non-convex and much more difficult. We propose a new algorithm for the CEL problem named the BI-Linear Algorithm for Composite EmPirical Likelihood (BICEP). We extend the BICEP framework by introducing a new method called Robust Empirical Likelihood (REL) that detects outliers and greatly improves the inference in comparison to the non-robust EL. The REL method is combined with CEL by the TRI-Linear Algorithm for Composite EmPirical Likelihood (TRICEP). We demonstrate the efficacy of the proposed methods on simulated and real world datasets. We present a novel semi-parametric method for variable selection with interesting biological applications in the final chapter. In bioinformatics datasets the experimental units often have structured relationships that are non-linear and hierarchical. For example, in microbiome data the individual taxonomic units are connected to each other through a phylogenetic tree. Conventional techniques for selecting relevant taxa either do not account for the pairwise dependencies between taxa, or assume linear relationships. In this work we propose a new framework for variable selection called Semi-Parametric Affinity Based Selection (SPAS), which has the flexibility to utilize structured and non-parametric relationships between variables. In synthetic data experiments SPAS outperforms existing methods and on real world microbiome datasets it selects taxa according to their phylogenetic similarities. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 19, 2018. / Bioinformatics, Empirical likelihood, Finance, Non-parametric, Outlier detection, Variable selection / Includes bibliographical references. / Yiyuan She, Professor Directing Dissertation; Giray Okten, University Representative; Eric Chicken, Committee Member; Xufeng Niu, Committee Member; Minjing Tao, Committee Member.
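The convex, fixed-parameter empirical likelihood problem that the BICEP and TRICEP algorithms build on can be illustrated compactly. The sketch below profiles the Lagrange multiplier for a scalar mean with damped Newton steps; it is a textbook-style illustration under standard EL assumptions, not the dissertation's algorithm.

```python
# A sketch of fixed-parameter empirical likelihood for H0: E[X] = mu.
# The weights are w_i = 1 / (n * (1 + lam * (x_i - mu))), with lam chosen so
# that the weighted mean constraint holds; lam is found by Newton iterations.
import numpy as np

def log_el_ratio(x, mu, n_iter=50):
    """Return -2 log empirical likelihood ratio for a scalar mean."""
    z = np.asarray(x, float) - mu
    lam = 0.0
    for _ in range(n_iter):
        d = 1.0 + lam * z
        grad = np.sum(z / d)                  # d/dlam of sum(log(1 + lam*z))
        hess = -np.sum((z / d) ** 2)
        step = -grad / hess
        # Damp the step so all weights stay positive (1 + lam*z > 0).
        while np.any(1.0 + (lam + step) * z <= 1e-10):
            step *= 0.5
        lam += step
    return 2.0 * np.sum(np.log(1.0 + lam * z))

# Toy usage: the statistic is approximately chi-square(1) under H0.
rng = np.random.default_rng(3)
x = rng.normal(loc=0.0, scale=1.0, size=200)
print(log_el_ratio(x, mu=0.0))
```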
55

Generalized Mahalanobis Depth in Point Process and Its Application in Neural Coding and Semi-Supervised Learning in Bioinformatics

Unknown Date (has links)
In the first project, we propose to generalize the notion of depth to temporal point process observations. The new depth is defined as a weighted product of two probability terms: 1) the number of events in each process, and 2) the center-outward ranking of the event times conditioned on the number of events. In this study, we adopt the Poisson distribution for the first term and the Mahalanobis depth for the second term. We propose an efficient bootstrapping approach to estimate parameters in the defined depth. In the case of a Poisson process, the observed events are order statistics, and the parameters can be estimated robustly with respect to sample size. We demonstrate the use of the new depth by ranking realizations from a Poisson process. We also test the new method in classification problems using simulations as well as real neural spike train data. It is found that the new framework provides more accurate and robust classifications as compared to commonly used likelihood methods. In the second project, we demonstrate the value of semi-supervised dimension reduction in a clinical setting. The advantage of semi-supervised dimension reduction is easy to understand: it uses unlabeled data when performing dimension reduction, and it can help build a more precise prediction model than common supervised dimension reduction techniques. After thorough comparisons with dimension embedding methods that use labeled data only, we show the improvement from incorporating unlabeled data in a breast cancer chemotherapy application. In our semi-supervised dimension reduction method, we not only explore adding unlabeled data to linear dimension reduction such as PCA, but also explore semi-supervised non-linear dimension reduction, such as semi-supervised LLE and semi-supervised Isomap. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / March 21, 2018. / depth, point process, semi-supervised learning / Includes bibliographical references. / Wei Wu, Professor Directing Dissertation; Xiaoqiang Wang, University Representative; Jinfeng Zhang, Committee Member; Qing Mai, Committee Member.
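A plain reading of the depth definition above can be sketched as follows, with a Poisson term for the event count and a Mahalanobis depth for the event-time vector. The multiplicative weighting by exponents and the empirical estimation of the conditional mean and covariance are assumptions made purely for illustration; they are not taken from the dissertation.

```python
# A sketch of a depth for one point-process realization on [0, T]:
# a weighted product of (1) the Poisson probability of the event count and
# (2) the Mahalanobis depth of the event-time vector among realizations
# that have the same count.
import numpy as np
from scipy.stats import poisson

def mahalanobis_depth(x, mean, cov):
    d = np.atleast_1d(x) - mean
    return 1.0 / (1.0 + d @ np.linalg.solve(cov, d))

def point_process_depth(times, rate, T, same_count_sample, weight=0.5):
    """same_count_sample: realizations (arrays) with the same number of events,
    used to estimate the conditional mean/covariance of the event-time vector."""
    k = len(times)
    p_count = poisson.pmf(k, mu=rate * T)
    S = np.vstack(same_count_sample)
    mean, cov = S.mean(axis=0), np.cov(S, rowvar=False) + 1e-8 * np.eye(k)
    d_times = mahalanobis_depth(np.sort(times), mean, cov)
    return (p_count ** weight) * (d_times ** (1.0 - weight))

# Toy usage: rank one realization of a rate-5 Poisson process on [0, 1].
rng = np.random.default_rng(4)
T, rate = 1.0, 5.0
sample = [np.sort(rng.uniform(0, T, rng.poisson(rate * T))) for _ in range(2000)]
obs = np.array([0.18, 0.35, 0.52, 0.71, 0.90])
peers = [s for s in sample if len(s) == len(obs)]
print(point_process_depth(obs, rate, T, peers))
```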
56

Wavelet-Based Bayesian Approaches to Sequential Profile Monitoring

Unknown Date (has links)
We consider change-point detection and estimation in sequences of functional observations. This setting often arises when the quality of a process is characterized by such observations, termed profiles, and monitoring profiles for changes in structure can be used to ensure the stability of the process over time. While interest in profile monitoring has grown, few methods approach the problem from a Bayesian perspective. In this dissertation, we propose three wavelet-based Bayesian approaches to profile monitoring -- the last of which can be extended to a general process monitoring setting. First, we develop a general framework for the problem of interest in which we base inference on the posterior distribution of the change point without placing restrictive assumptions on the form of profiles. The proposed method uses an analytic form of the posterior distribution in order to run online without relying on Markov chain Monte Carlo (MCMC) simulation. Wavelets, an effective tool for estimating nonlinear signals from noise-contaminated observations, enable the method to flexibly distinguish between sustained changes in profiles and the inherent variability of the process. Second, we modify the initial framework in a posterior approximation algorithm designed to utilize past information in a computationally efficient manner. We show that the approximation can detect changes of smaller magnitude better than traditional alternatives for curbing computational cost. Third, we introduce a monitoring scheme that allows an unchanged process to run infinitely long without a false alarm; the scheme maintains the ability to detect a change with probability one. We include theoretical results regarding these properties and illustrate the implementation of the scheme in the previously established framework. We demonstrate the efficacy of the proposed methods on simulated data, where they significantly outperform a relevant frequentist competitor. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 20, 2018. / Includes bibliographical references. / Eric Chicken, Professor Co-Directing Dissertation; Antonio Linero, Professor Co-Directing Dissertation; Kevin Huffenberger, University Representative; Yun Yang, Committee Member.
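The wavelet denoising step that underlies the monitoring framework can be illustrated with a short sketch, assuming the PyWavelets (pywt) package. This shows only the generic building block (soft-thresholding with the universal threshold), not the Bayesian change-point machinery itself.

```python
# A sketch of denoising one noisy profile by soft-thresholding its wavelet
# coefficients; the threshold is the standard "universal" choice.
import numpy as np
import pywt

def wavelet_denoise(y, wavelet="db4", level=4):
    coeffs = pywt.wavedec(y, wavelet, level=level)
    # Estimate the noise level from the finest-scale detail coefficients.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2.0 * np.log(len(y)))
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(y)]

# Toy usage: a bump profile observed with noise.
rng = np.random.default_rng(5)
t = np.linspace(0, 1, 256)
profile = np.exp(-((t - 0.4) ** 2) / 0.005)
denoised = wavelet_denoise(profile + rng.normal(0, 0.2, t.size))
print(np.mean((denoised - profile) ** 2))
```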
57

Volatility Matrix Estimation for High-Frequency Financial Data

Unknown Date (has links)
Volatility is usually employed to measure the dispersion of asset returns, and it is widely used in risk analysis and asset management. The first chapter studies a kernel-based spot volatility matrix estimator with a pre-averaging approach for high-frequency data contaminated by market microstructure noise. When the sample size goes to infinity and the bandwidth vanishes, we show that our estimator is consistent and establish its asymptotic normality, achieving an optimal convergence rate. We also construct a consistent pairwise spot co-volatility estimator with the Hayashi-Yoshida method for non-synchronous high-frequency data with noise contamination. The simulation studies demonstrate that the proposed estimators work well under different noise levels, and their estimation performance improves as the sampling frequency increases. In empirical applications, we implement the estimators on the intraday prices of four component stocks of the Dow Jones Industrial Average. The second chapter presents a factor-based vast volatility matrix estimation method for high-frequency financial data with market microstructure noise, finite large jumps and infinite activity small jumps. We construct the sample volatility matrix estimator based on the approximate factor model, and use the pre-averaging and thresholding estimation method (PATH) to handle the noise and jumps. After using principal component analysis (PCA) to decompose the sample volatility matrix estimator, our proposed volatility matrix estimator is finally obtained by imposing a block-diagonal regularization on the residual covariance matrix through sorting the assets with the Global Industry Classification Standard (GICS) codes. The Monte Carlo simulation shows that our proposed volatility matrix estimator can remove the majority of the effects of noise and jumps, and its estimation performance improves quickly as the sampling frequency increases. Finally, the PCA-based estimators are employed to perform volatility matrix estimation and asset allocation for S&P 500 stocks. To compare with the PCA-based estimators, we also include exchange-traded fund (ETF) data to construct observable factors such as the Fama-French factors for volatility estimation. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 17, 2018. / Factor Model, High-frequency data, Jumps, Market microstructure noise, PCA, Volatility matrix / Includes bibliographical references. / Minjing Tao, Professor Directing Dissertation; Yingmei Cheng, University Representative; Fred Huffer, Committee Member; Xu-Feng Niu, Committee Member.
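As a baseline for the estimators described above, the sketch below computes the plain realized covariance matrix from synchronous, noise-free log prices. It is the benchmark that pre-averaging and PATH-type estimators improve upon, not the proposed method, and all names and simulation settings are illustrative.

```python
# A sketch of the realized covariance (integrated volatility matrix) estimate:
# the sum of outer products of intraday log returns.
import numpy as np

def realized_covariance(log_prices):
    """log_prices: (n_obs, n_assets) array of intraday log prices."""
    returns = np.diff(log_prices, axis=0)     # (n_obs - 1, n_assets)
    return returns.T @ returns                # sum_t r_t r_t^T

# Toy usage: two correlated assets simulated as a discretized diffusion.
rng = np.random.default_rng(6)
n, dt = 23400, 1.0 / 23400                    # one trading day sampled each second
true_cov = np.array([[0.04, 0.018], [0.018, 0.09]])
chol = np.linalg.cholesky(true_cov * dt)
log_p = np.cumsum(rng.normal(size=(n, 2)) @ chol.T, axis=0)
print(realized_covariance(log_p))             # should be close to true_cov
```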
58

Tests and Classifications in Adaptive Designs with Applications

Unknown Date (has links)
Statistical tests for biomarker identification and classification methods for patient grouping are two important topics in adaptive designs of clinical trials. In this article, we evaluate four test methods for biomarker identification: a model-based identification method, the popular t-test, the nonparametric Wilcoxon Rank Sum test, and the Least Absolute Shrinkage and Selection Operator (Lasso) method. For selecting the best classification methods in Stage 2 of an adaptive design, we examine classification methods including the recently developed machine learning approaches such as Random Forest, Lasso and Elastic-Net Regularized Generalized Linear Models (Glmnet), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost). Statistical simulations are carried out in our study to assess the performance of biomarker identification methods and the classification methods. The best identification method and the classification technique will be selected based on the True Positive Rate (TPR, also called Sensitivity) and the True Negative Rate (TNR, also called Specificity). The optimal test method for gene identification and classification method for patient grouping will be applied to the Adaptive Signature Design (ASD) for the purpose of evaluating the performance of ASD in different situations, including simulated data and a real data set for breast cancer patients. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / February 20, 2018. / Includes bibliographical references. / XuFeng Niu, Professor Directing Dissertation; Richard S. Nowakowski, University Representative; Dan McGee, Committee Member; Elizabeth Slate, Committee Member; Jinfeng Zhang, Committee Member.
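A toy version of the biomarker screening step can be assembled from standard libraries. The sketch below runs the t-test, the Wilcoxon rank-sum test, and a Lasso fit on a simulated two-group expression matrix; the data-generating choices and tuning values are illustrative assumptions, not the dissertation's simulation design.

```python
# A sketch comparing three of the biomarker identification methods named above.
import numpy as np
from scipy.stats import ttest_ind, ranksums
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
n, p, n_true = 200, 50, 5
X = rng.normal(size=(n, p))
group = rng.binomial(1, 0.5, size=n)          # treatment vs. control labels
X[group == 1, :n_true] += 1.0                 # first 5 genes are informative

t_p = np.array([ttest_ind(X[group == 1, j], X[group == 0, j]).pvalue for j in range(p)])
w_p = np.array([ranksums(X[group == 1, j], X[group == 0, j]).pvalue for j in range(p)])
lasso = Lasso(alpha=0.05).fit(X, group)       # sparse linear screen on the labels

print("t-test hits:    ", np.where(t_p < 0.05 / p)[0])   # Bonferroni-adjusted
print("Wilcoxon hits:  ", np.where(w_p < 0.05 / p)[0])
print("Lasso nonzeros: ", np.where(lasso.coef_ != 0)[0])
```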
59

Fused Lasso and Tensor Covariance Learning with Robust Estimation

Unknown Date (has links)
With the increase in computation and data storage, a vast amount of information has been collected with scientific measurement devices. However, with this increase in data and variety of domain applications, statistical methodology must be tailored to specific problems. This dissertation is focused on analyzing chemical information with an underlying structure. Robust fused lasso leverages information about the neighboring regression coefficient structure to create blocks of coefficients. Robust modifications are made to the mean to account for gross outliers in the data. This method is applied to near infrared spectral measurements in the prediction of an aqueous analyte concentration and is shown to improve prediction accuracy. Expansion on the robust estimation and structure analysis is performed by examining graph structures within a clustered tensor. The tensor is subjected to wavelet smoothing and robust sparse precision matrix estimation for a detailed look into the covariance structure. This methodology is applied to catalytic kinetics data where the graph structure estimates the elementary steps within the reaction mechanism. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester 2018. / October 18, 2018. / Includes bibliographical references. / Yiyuan She, Professor Directing Dissertation; Albert Stiegman, University Representative; Qing Mai, Committee Member; Eric Chicken, Committee Member.
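The plain fused lasso objective that the robust variant builds on can be written down directly. The sketch below solves it with the cvxpy package (an assumed dependency), penalizing both the coefficients and the differences between neighboring coefficients so that blocks of equal coefficients emerge; it is not the robust method developed in the dissertation.

```python
# A sketch of the plain fused lasso:
#   minimize 0.5 * ||y - X beta||^2 + lam1 * ||beta||_1 + lam2 * ||D beta||_1,
# where D takes differences of neighboring coefficients.
import numpy as np
import cvxpy as cp

def fused_lasso(X, y, lam1=0.1, lam2=0.5):
    beta = cp.Variable(X.shape[1])
    obj = (0.5 * cp.sum_squares(y - X @ beta)
           + lam1 * cp.norm1(beta)
           + lam2 * cp.norm1(cp.diff(beta)))
    cp.Problem(cp.Minimize(obj)).solve()
    return beta.value

# Toy usage: a blocky coefficient vector, as with neighboring spectral channels.
rng = np.random.default_rng(8)
p, n = 60, 120
beta_true = np.zeros(p)
beta_true[20:35] = 2.0
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(scale=0.5, size=n)
print(np.round(fused_lasso(X, y), 2))
```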
60

Estimation under censoring with missing failure indicators

Unknown Date (has links)
The Kaplan-Meier estimator of a survival function is well-known to be asymptotically efficient when cause of failure (censored or non-censored) is always observed. We consider the problem of finding an estimator when the failure indicators are missing completely at random. Under this assumption, it is known that the method of nonparametric maximum likelihood fails to work in this problem. We introduce a new estimator that is a smooth functional of the Nelson-Aalen estimators of certain cumulative transition intensities. The asymptotic distribution of the estimator is derived using the functional delta method. Simulation studies reveal that this estimator competes well with the existing estimators. The idea is extended to the Cox model, and estimators are introduced for the regression parameter and the cumulative baseline hazard function. / Source: Dissertation Abstracts International, Volume: 57-01, Section: B, page: 0441. / Major Professor: Ian W. McKeague. / Thesis (Ph.D.)--The Florida State University, 1995.
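For context, the sketch below implements the fully observed benchmark, the ordinary Kaplan-Meier estimator for right-censored data; the proposed estimator for missing failure indicators is not reproduced here, and the function name is illustrative.

```python
# A sketch of the Kaplan-Meier survival curve when the failure indicator is
# always observed (the benchmark the dissertation generalizes).
import numpy as np

def kaplan_meier(times, observed):
    """times: event/censoring times; observed: 1 if failure, 0 if censored."""
    times = np.asarray(times, float)
    observed = np.asarray(observed, int)
    order = np.argsort(times)
    times, observed = times[order], observed[order]
    curve, s = [], 1.0
    for t in np.unique(times[observed == 1]):
        at_risk = np.sum(times >= t)
        deaths = np.sum((times == t) & (observed == 1))
        s *= 1.0 - deaths / at_risk
        curve.append((t, s))
    return np.array(curve)

# Toy usage: exponential failure times with independent censoring.
rng = np.random.default_rng(9)
fail = rng.exponential(1.0, 300)
cens = rng.exponential(1.5, 300)
print(kaplan_meier(np.minimum(fail, cens), (fail <= cens).astype(int))[:5])
```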
