61

Statistically and Computationally Efficient Resampling and Distributionally Robust Optimization with Applications

Liu, Zhenyuan January 2024 (has links)
Uncertainty quantification via construction of confidence regions has long been studied in statistics. While these existing methods are powerful and commonly used, some modern problems that require expensive model fitting, or that elicit convoluted interactions between statistical and computational noise, can challenge their effectiveness. To remedy some of these challenges, this thesis proposes novel approaches that not only guarantee statistical validity but are also computationally efficient. We study two main methodological directions: resampling-based methods in the first half (Chapters 2 and 3) and optimization-based methods, in particular so-called distributionally robust optimization, in the second half (Chapters 4 to 6) of the thesis.

The first half focuses on the bootstrap, a common approach to statistical inference. This approach resamples data and hinges on the principle of using the resampling distribution as an approximation to the sampling distribution. However, implementing the bootstrap often demands extensive resampling and model-refitting effort to wash away the Monte Carlo error, which can be computationally expensive for modern problems. Chapters 2 and 3 study bootstrap approaches that use fewer resamples while maintaining coverage validity, as well as the quantification of uncertainty for models subject to both statistical and Monte Carlo computation errors.

In Chapter 2, we investigate bootstrap-based construction of confidence intervals using minimal resampling. We use a “cheap” bootstrap perspective based on sample-resample independence that yields valid coverage with as few as one resample, even when the problem dimension grows closely with the data size. We validate our theoretical findings and assess our approach against other benchmarks through various large-scale or high-dimensional problems.

In Chapter 3, we focus on the so-called input uncertainty problem in stochastic simulation, which refers to the propagation of the statistical noise in calibrating input models into the output accuracy. Unlike most existing literature, which focuses on real-valued output quantities, we aim to construct confidence bands for the entire output distribution function, which carries more holistic information. We develop a new test statistic that generalizes the Kolmogorov-Smirnov statistic to construct confidence bands that account for input uncertainty on top of Monte Carlo errors via an additional asymptotic component formed by a mean-zero Gaussian process. We also demonstrate how subsampling can be used to estimate the covariance function of this Gaussian process in a computationally cheap fashion.

The second part of the thesis is devoted to optimization-based methods, in particular distributionally robust optimization (DRO). Originally built to tackle uncertainty about the underlying distribution in stochastic optimization, DRO adopts a worst-case perspective and seeks decisions that optimize under the worst-case scenario over the so-called ambiguity set that represents the distributional uncertainty. In this thesis, we turn DRO broadly into a statistical tool (still referred to as DRO) by optimizing targets of interest over the ambiguity set and transforming the coverage guarantee of the ambiguity set into confidence bounds for the targets. The flexibility of ambiguity sets advantageously allows the injection of prior distributional knowledge, which entails a smaller data requirement than existing methods.
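The “cheap” bootstrap perspective described for Chapter 2 above can be illustrated with a minimal, hedged sketch. The code below is generic (i.i.d. data, a plug-in estimator, and a t critical value with B degrees of freedom); the function name and the choice of the median as the statistic are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np
from scipy import stats

def cheap_bootstrap_ci(data, estimator, B=1, alpha=0.05, rng=None):
    """Confidence interval from very few bootstrap resamples.

    Combines the t_{B} critical value with the spread of the resampled
    estimates around the full-sample estimate, so even B = 1 yields an
    asymptotically valid (if wide) interval under sample-resample
    independence.
    """
    rng = np.random.default_rng(rng)
    n = len(data)
    psi_hat = estimator(data)
    resampled = np.array([
        estimator(rng.choice(data, size=n, replace=True)) for _ in range(B)
    ])
    s = np.sqrt(np.mean((resampled - psi_hat) ** 2))
    t_crit = stats.t.ppf(1 - alpha / 2, df=B)
    return psi_hat - t_crit * s, psi_hat + t_crit * s

# Example: interval for the median with a single resample.
x = np.random.default_rng(0).standard_normal(500)
print(cheap_bootstrap_ci(x, np.median, B=1))
```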
In Chapter 4, motivated by the bias-variance tradeoff and other technical complications in conventional multivariate extreme value theory, we propose a shape-constrained DRO called orthounimodality DRO (OU-DRO) as a vehicle for incorporating natural and verifiable information into the tail. We study its statistical guarantees and tractability, especially in the bivariate setting, via a new Choquet representation in convex analysis. Chapter 5 further studies a general approach that applies to higher dimensions via sample average approximation (SAA) and importance sampling. We establish a convergence guarantee for the SAA optimal value of OU-DRO in any dimension under regularity conditions. We also argue that the resulting SAA problem is a linear program that can be solved by off-the-shelf algorithms. In Chapter 6, we study the connection between the out-of-sample errors of data-driven stochastic optimization and DRO via large deviations theory. We propose a special type of DRO formulation whose ambiguity set is based on a Kullback-Leibler divergence smoothed by the Wasserstein or Lévy-Prokhorov distance. We relate large deviations theory to the performance of the proposed DRO and show that it achieves nearly optimal out-of-sample performance in terms of the exponential decay rate of the generalization error. Furthermore, the computation of the proposed DRO is no harder than that of DRO problems based on f-divergences or Wasserstein distances, which leads to a statistically optimal and computationally tractable DRO formulation.
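As a hedged illustration of how a divergence-based worst-case value of the kind discussed above can be computed in practice, the sketch below evaluates the worst-case mean over a Kullback-Leibler ball around the empirical distribution via its standard scalar dual. This is a generic KL-ball example, not the smoothed formulation proposed in Chapter 6; the radius rho and the use of SciPy's bounded scalar minimizer are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def kl_dro_upper_bound(values, rho):
    """Worst-case mean of `values` over a KL ball of radius `rho` around
    the empirical distribution, via the scalar dual
        inf_{lam > 0} lam * log E[exp(values / lam)] + lam * rho.
    """
    values = np.asarray(values, dtype=float)

    def dual(lam):
        # Numerically stable evaluation of lam * log E[exp(values / lam)].
        v = values / lam
        m = v.max()
        return lam * (m + np.log(np.mean(np.exp(v - m)))) + lam * rho

    res = minimize_scalar(dual, bounds=(1e-6, 1e6), method="bounded")
    return res.fun

# Example: a worst-case (upper) bound on the mean of a sample.
x = np.random.default_rng(1).exponential(size=200)
print(x.mean(), kl_dro_upper_bound(x, rho=0.05))
```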
62

Simulating Statistical Power Curves with the Bootstrap and Robust Estimation

Herrington, Richard S. 08 1900 (has links)
Power and effect size analysis are important methods in the psychological sciences. It is well known that classical statistical tests are not robust with respect to power and type II error. However, relatively little attention has been paid in the psychological literature to the effect that non-normality and outliers have on the power of a given statistical test (Wilcox, 1998). Robust measures of location exist that provide much more powerful tests of statistical hypotheses, but their usefulness in power estimation for sample size selection, with real data, is largely unknown. Furthermore, practical approaches to power planning (Cohen, 1988) usually focus on normal theory settings and in general do not make available nonparametric approaches to power and effect size estimation. Beran (1986) proved that it is possible to nonparametrically estimate power for a given statistical test using bootstrap methods (Efron, 1993). However, this method is not widely known or utilized in data analysis settings. This research study examined the practical importance of combining robust measures of location with nonparametric power analysis, using simulation and analysis of real-world data sets. The present study found that: 1) bootstrap confidence intervals using M-estimators were shorter than their normal theory counterparts whenever the data had heavy-tailed distributions; 2) bootstrap empirical power is higher for M-estimators than for the normal theory counterpart when the data had heavy-tailed distributions; 3) the smoothed bootstrap controls the type I error rate (below 6%) under the null hypothesis for small sample sizes; and 4) robust effect sizes can be used in conjunction with Cohen's (1988) power tables to obtain more realistic sample sizes when the data distribution has heavy tails.
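To make the combination of robust location estimates and nonparametric power estimation concrete, here is a minimal sketch in the spirit of Beran-style bootstrap power estimation. It is a generic illustration, not the study's simulation code: the 20% trimmed mean, the percentile-bootstrap rejection rule, and the nested resample counts are assumptions chosen for brevity.

```python
import numpy as np
from scipy import stats

def bootstrap_power(data, effect, n_outer=500, n_inner=599, alpha=0.05, rng=None):
    """Estimate the power of a percentile-bootstrap test that the 20%
    trimmed mean equals 0, when the true location is shifted by `effect`.

    Outer loop: draw a pseudo-sample from the centred, shifted data.
    Inner loop: percentile bootstrap interval for the trimmed mean;
    reject if the interval excludes 0.
    """
    rng = np.random.default_rng(rng)
    centred = np.asarray(data) - stats.trim_mean(data, 0.2) + effect
    n = len(centred)
    rejections = 0
    for _ in range(n_outer):
        sample = rng.choice(centred, size=n, replace=True)
        boot = np.array([
            stats.trim_mean(rng.choice(sample, size=n, replace=True), 0.2)
            for _ in range(n_inner)
        ])
        lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
        rejections += (lo > 0) or (hi < 0)
    return rejections / n_outer

# Example: power to detect a shift of 0.5 with heavy-tailed data.
x = np.random.default_rng(2).standard_t(df=3, size=40)
print(bootstrap_power(x, effect=0.5, n_outer=200, n_inner=299))
```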
63

Bootstrap distribution for testing a change in the Cox proportional hazard model.

January 2000 (has links)
Lam Yuk Fai. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. / Includes bibliographical references (leaves 41-43). / Abstracts in English and Chinese.

Contents:
Chapter 1: Basic Concepts (p.9)
  1.1 Survival data (p.9)
    1.1.1 An example (p.9)
  1.2 Some important functions (p.11)
    1.2.1 Survival function (p.12)
    1.2.2 Hazard function (p.12)
  1.3 Cox Proportional Hazards Model (p.13)
    1.3.1 A special case (p.14)
    1.3.2 An example (continued) (p.15)
  1.4 Extension of the Cox Proportional Hazards Model (p.16)
  1.5 Bootstrap (p.17)
Chapter 2: A New Method (p.19)
  2.1 Introduction (p.19)
  2.2 Definition of the test (p.20)
    2.2.1 Our test statistic (p.20)
    2.2.2 The alternative test statistic I (p.22)
    2.2.3 The alternative test statistic II (p.23)
  2.3 Variations of the test (p.24)
    2.3.1 Restricted test (p.24)
    2.3.2 Adjusting for other covariates (p.26)
  2.4 Apply with bootstrap (p.28)
  2.5 Examples (p.29)
    2.5.1 Male mice data (p.34)
    2.5.2 Stanford heart transplant data (p.34)
    2.5.3 CGD data (p.34)
Chapter 3: Large Sample Properties and Discussions (p.35)
  3.1 Large sample properties and relationship to goodness of fit test (p.35)
    3.1.1 Large sample properties of A and Ap (p.35)
    3.1.2 Large sample properties of Ac and A (p.36)
  3.2 Discussions (p.37)
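The contents above name the test statistics without describing them, so the following sketch only illustrates the general workflow of bootstrapping a null distribution for a change in a Cox regression coefficient; the pooled-resampling null, the lifelines package, and all column names are assumptions for illustration rather than the thesis's actual construction.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def coef(df):
    """Cox regression coefficient of the covariate `x` (assumed column names)."""
    cols = df[["time", "event", "x"]]
    return CoxPHFitter().fit(cols, duration_col="time", event_col="event").params_["x"]

def bootstrap_change_test(df, split_col="z", threshold=0.0, B=500, rng=None):
    """Bootstrap p-value for a change in the effect of `x` across a split.

    Observed statistic: absolute difference of Cox coefficients fitted on
    the two sides of `threshold`. Null distribution: the same statistic on
    bootstrap resamples of the pooled data, split into groups of the same
    sizes, so that any apparent change is due to sampling variability.
    """
    rng = np.random.default_rng(rng)
    low = df[df[split_col] <= threshold]
    high = df[df[split_col] > threshold]
    observed = abs(coef(high) - coef(low))
    null_stats = []
    for _ in range(B):
        idx = rng.integers(0, len(df), size=len(df))
        resampled = df.iloc[idx].reset_index(drop=True)
        a = resampled.iloc[: len(low)]
        b = resampled.iloc[len(low):]
        null_stats.append(abs(coef(a) - coef(b)))
    return float(np.mean(np.asarray(null_stats) >= observed))
```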
64

Empirical likelihood and extremes

Gong, Yun 17 January 2012 (has links)
In 1988, Owen introduced empirical likelihood as a nonparametric method for constructing confidence intervals and regions. Since then, empirical likelihood has been studied extensively in the literature due to its generality and effectiveness. It is well known that empirical likelihood has several attractive advantages compared with competitors such as the bootstrap: it determines the shape of confidence regions automatically using only the data; it straightforwardly incorporates side information expressed through constraints; and it is Bartlett correctable. The main part of this thesis extends the empirical likelihood method to several interesting and important statistical inference situations. This thesis has four components. The first component (Chapter II) proposes a smoothed jackknife empirical likelihood method to construct confidence intervals for the receiver operating characteristic (ROC) curve, in order to overcome the computational difficulty caused by nonlinear constraints in the maximization problem. The second component (Chapters III and IV) proposes smoothed empirical likelihood methods to obtain interval estimates for the conditional Value-at-Risk when the volatility model is an ARCH/GARCH model or a nonparametric regression, respectively, with applications in financial risk management. The third component (Chapter V) derives the empirical likelihood for intermediate quantiles, which play an important role in the statistics of extremes. Finally, the fourth component (Chapters VI and VII) presents two additional results: in Chapter VI, we show that, when the third moment is infinite, the Student's t-statistic may be preferable to the sample mean standardized by the true standard deviation; in Chapter VII, we present a method for testing a subset of parameters for a given parametric model of stationary processes.
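As a minimal sketch of the empirical likelihood machinery referred to above (the plain Owen construction for the mean of i.i.d. data, not the smoothed or jackknife variants developed in the thesis), the code below computes the empirical log-likelihood ratio at a candidate mean and compares it with a chi-squared quantile.

```python
import numpy as np
from scipy import optimize, stats

def el_log_ratio(data, mu):
    """-2 log empirical likelihood ratio for the mean (Owen, 1988).

    Solves for the Lagrange multiplier lam in
        sum_i (x_i - mu) / (1 + lam * (x_i - mu)) = 0,
    which gives weights p_i = 1 / (n * (1 + lam * (x_i - mu))).
    """
    d = np.asarray(data, dtype=float) - mu
    if d.min() >= 0 or d.max() <= 0:
        return np.inf  # mu lies outside the convex hull of the data
    eps = 1e-8
    lo = -1.0 / d.max() + eps
    hi = -1.0 / d.min() - eps
    g = lambda lam: np.sum(d / (1.0 + lam * d))
    lam = optimize.brentq(g, lo, hi)
    return 2.0 * np.sum(np.log1p(lam * d))

# 95% confidence check for a candidate mean value.
x = np.random.default_rng(3).gamma(2.0, size=100)
print(el_log_ratio(x, 2.0) <= stats.chi2.ppf(0.95, df=1))
```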
65

Influences of climate variability and change on precipitation characteristics and extremes

Unknown Date (has links)
This study focuses on two broad areas of active climate research, climate variability and climate change, and their implications for regional precipitation characteristics. All of the analysis is carried out for a climate change-sensitive region, the state of Florida, USA. The climate variability analysis evaluates the influence of individual and coupled phases (cool and warm) of the Atlantic Multidecadal Oscillation (AMO) and the El Niño-Southern Oscillation (ENSO) on regional precipitation characteristics. In their cool and warm phases, the two oscillations modulate each other, which has implications for flood control and water supply in the region. Extreme precipitation indices, the temporal distribution of rainfall within extreme storm events, dry and wet spell transitions, and antecedent conditions preceding extremes are evaluated. Kernel density estimates using a Gaussian kernel, for distribution-free comparative analysis, and bootstrap sampling-based confidence intervals are used to compare warm and cool phases of different lengths. Depth-duration-frequency (DDF) curves are also developed using generalized extreme value (GEV) distributions to characterize the extremes. ... This study also introduces new approaches to optimally select the predictor variables that help in modeling regional precipitation, and further provides a mechanism to select an optimum spatial resolution for downscaling the precipitation projections. New methods for correcting the biases in monthly downscaled precipitation projections are proposed, developed, and evaluated in this study. The methods include bias corrections in an optimization framework using various objective functions, hybrid methods based on universal function approximation, and new variants. / by Aneesh Goly. / Thesis (Ph.D.)--Florida Atlantic University, 2013. / Includes bibliography. / Mode of access: World Wide Web. / System requirements: Adobe Reader.
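A hedged sketch of the depth-duration-frequency step mentioned above: fit a generalized extreme value distribution to annual maximum depths for a single storm duration and read off return levels. The synthetic data, duration, and return periods are assumptions for illustration; the study's actual DDF analysis is not reproduced here.

```python
import numpy as np
from scipy.stats import genextreme

def gev_return_levels(annual_maxima, return_periods=(2, 5, 10, 25, 50, 100)):
    """Fit a GEV distribution to annual maxima (one storm duration) and
    return the depth exceeded on average once every T years, i.e. the
    (1 - 1/T) quantile of the fitted distribution."""
    shape, loc, scale = genextreme.fit(annual_maxima)
    return {T: genextreme.ppf(1.0 - 1.0 / T, shape, loc=loc, scale=scale)
            for T in return_periods}

# Example with synthetic 24-hour annual maximum depths (mm).
rng = np.random.default_rng(4)
depths = genextreme.rvs(-0.1, loc=120, scale=35, size=60, random_state=rng)
print(gev_return_levels(depths))
```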
66

Multiscale and meta-analytic approaches to inference in clinical healthcare data

Hamilton, Erin Kinzel 29 March 2013 (has links)
The field of medicine is regularly faced with the challenge of utilizing information that is complicated or difficult to characterize. Physicians often must use their best judgment in reaching decisions or recommendations for treatment in the clinical setting. The goal of this thesis is to use innovative statistical tools in tackling three specific challenges of this nature from current healthcare applications. The first aim focuses on developing a novel approach to meta-analysis when combining binary data from multiple studies of paired design, particularly in cases of high heterogeneity between studies. The challenge is in properly accounting for heterogeneity when dealing with a low or moderate number of studies, and with a rarely occurring outcome. The proposed approach uses a Rasch model for translating data from multiple paired studies into a unified structure that allows for properly handling variability associated with both pair effects and study effects. Analysis is then performed using a Bayesian hierarchical structure, which accounts for heterogeneity in a direct way within the variances of the separate generating distributions for each model parameter. This approach is applied to the debated topic within the dental community of the comparative effectiveness of materials used for pit-and-fissure sealants. The second and third aims of this research both have applications in early detection of breast cancer. The interpretation of a mammogram is often difficult since signs of early disease are often minuscule, and the appearance of even normal tissue can be highly variable and complex. Physicians often have to consider many important pieces of the whole picture when trying to assess next steps. The final two aims focus on improving the interpretation of findings in mammograms to aid in early cancer detection. When dealing with high-frequency and irregular data, as is seen in most medical images, the behaviors of these complex structures are often difficult or impossible to quantify by standard modeling techniques. But a commonly occurring phenomenon in high-frequency data is that of regular scaling. The second aim in this thesis is to develop and evaluate a wavelet-based scaling estimator that reduces the information in a mammogram down to an informative and low-dimensional quantification of the innate scaling behavior, optimized for use in classifying the tissue as cancerous or non-cancerous. The specific demands for this estimator are that it be robust with respect to distributional assumptions on the data, and with respect to outlier levels in the frequency domain representation of the data. The final aim in this research focuses on enhancing the visualization of microcalcifications that are too small to capture well on screening mammograms. Using scale-mixing discrete wavelet transform methods, the existing detail information contained in a very small and coarse image will be used to impute scaled details at finer levels. These "informed" finer details will then be used to produce an image of much higher resolution than the original, improving the visualization of the object. The goal is to also produce a confidence area for the true location of the shape's borders, allowing for more accurate feature assessment. Through the more accurate assessment of these very small shapes, physicians may be more confident in deciding next steps.
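A simplified, hedged sketch of the wavelet-based scaling idea in the second aim: estimate a scaling slope from the decay of detail-coefficient energies across decomposition levels, using a median-based (outlier-resistant) energy summary. This is a generic one-dimensional version; the thesis's estimator is built for mammogram images and tuned for classification, and the wavelet choice and level count below are assumptions.

```python
import numpy as np
import pywt

def wavelet_scaling_slope(signal, wavelet="db4", levels=6):
    """Estimate a scaling exponent from the wavelet spectrum.

    For each decomposition level, summarise the detail coefficients by a
    robust (median-based) log2 energy, then regress that summary on the
    level index: the slope characterises the signal's regular scaling and
    is resistant to heavy tails/outliers among the coefficients.
    """
    coeffs = pywt.wavedec(signal, wavelet, level=levels)
    details = coeffs[1:]  # coarsest details first, finest last
    js, log_energy = [], []
    for j, d in enumerate(details, start=1):
        js.append(j)
        log_energy.append(np.log2(np.median(np.asarray(d) ** 2)))
    slope, _ = np.polyfit(js, log_energy, 1)
    return slope

# Example on a random-walk signal; self-similar data would give a slope
# tied to its Hurst exponent.
x = np.cumsum(np.random.default_rng(5).standard_normal(4096))
print(wavelet_scaling_slope(x))
```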
