Global ETD Search

1	Statistical comparisons for nonlinear curves and surfaces Zhao, Shi 31 May 2018 Indiana University-Purdue University Indianapolis (IUPUI) / Estimation of nonlinear curves and surfaces has long been the focus of semiparametric and nonparametric regression. The advances in related model fitting methodology have greatly enhanced the analyst’s modeling flexibility and have led to scientific discoveries that would otherwise be missed by traditional linear model analysis. What has been less forthcoming are testing methods for nonlinear functions, particularly for comparisons of curves and surfaces. Few of the existing methods are carefully disseminated, and most are subject to important limitations. In terms of implementation, few off-the-shelf computational tools have been developed with syntax similar to that of the commonly used model fitting packages, so these methods are less accessible to practical data analysts. In this dissertation, I reviewed and tested the existing methods for nonlinear function comparison and examined their operational characteristics. Some theoretical justifications were provided for the new testing procedures. Real data examples were included to illustrate the use of the newly developed software. A new R package and a more user-friendly interface were created for enhanced accessibility. / 2020-08-22 Comparison of nonlinear functions Resampling method Software development
2	A New Method Of Resampling Testing Nonparametric Hypotheses: Balanced Randomization Tests January 2014 Background: Resampling methods such as the Monte Carlo (MC) and Bootstrap Approach (BA) are very flexible tools for statistical inference. They are generally used in experiments with small sample sizes or where the parametric test assumptions are not met. They are also used in situations where expressions for properties of complex estimators are statistically intractable. However, the MC and BA methods require relatively large random samples to estimate the parameters of the full permutation (FP) or exact distribution. Objective: The objective of this research study was to develop an efficient statistical computational resampling method that compares two population parameters, using a balanced and controlled sampling design. The application of the new method, the balanced randomization (BR) method, is discussed using microarray data where sample sizes are generally small. Methods: Multiple datasets were simulated from real data to compare the accuracy and efficiency of the methods (BR, MC, and BA). Datasets, probability distributions, parameters, and sample sizes were varied in the simulation. The correlation between the exact p-value and the p-values generated by simulation provides a measure of accuracy/consistency to compare methods. Sensitivity, specificity, power function, and false negative and false positive rates were compared across methods using graphical and multivariate analyses. Results and Discussions: The correlations between the exact p-value and those estimated from simulation are higher for BR and MC (increasing somewhat with increasing sample size), much lower for BA, and most pronounced for skewed distributions (lognormal, exponential). Furthermore, the relative proportion of 95%/99% CIs containing the true p-value was 3%/1.3% for BR vs. MC (p<0.0001) and 20%/15% for BR vs. BA (p<0.0001).
The sensitivity, specificity and power function of the BR method were shown to have a slight advantage over those of MC and BA in most situations. As an example, the BR method was applied to a microarray study to discuss significantly differentially expressed genes. / acase@tulane.edu Resampling Method Balanced Sampling Incomplete Blocks Designs Biostatistics Ph.D
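The distinction drawn in the abstract above between the full permutation (FP) or exact distribution and its Monte Carlo (MC) approximation can be illustrated with a small sketch. This is not the balanced randomization (BR) method, whose sampling design the abstract does not specify. The test statistic (absolute difference in means), the function names, and the resample count are all assumptions chosen for illustration.

```python
import itertools
import random
from statistics import mean


def exact_permutation_pvalue(x, y):
    """Two-sided p-value from the full permutation distribution.

    Enumerates every way of splitting the pooled data into groups of the
    original sizes; feasible only for small samples.
    """
    pooled = list(x) + list(y)
    n, n_x = len(pooled), len(x)
    total = sum(pooled)
    observed = abs(mean(x) - mean(y))
    hits = count = 0
    for idx in itertools.combinations(range(n), n_x):
        sum_x = sum(pooled[i] for i in idx)
        diff = abs(sum_x / n_x - (total - sum_x) / (n - n_x))
        hits += diff >= observed - 1e-12  # tolerance for float rounding
        count += 1
    return hits / count


def monte_carlo_pvalue(x, y, n_resamples=2000, seed=0):
    """Estimate the same p-value from random relabelings of the pooled data."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = abs(mean(x) - mean(y))
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        hits += diff >= observed - 1e-12
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    x, y = [1, 2, 3], [4, 5, 6]
    print("exact:", exact_permutation_pvalue(x, y))
    print("monte carlo:", monte_carlo_pvalue(x, y))
```

With three observations per group there are only 20 possible splits, so the exact p-value is cheap to compute; the MC estimate approaches it as the number of resamples grows, which is the cost the abstract says the BR method aims to reduce.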
3	Expeditious Causal Inference for Big Observational Data Yumin Zhang (13163253) 28 July 2022 This dissertation addresses two significant challenges in the causal inference workflow for Big Observational Data. The first is designing Big Observational Data with high-dimensional and heterogeneous covariates. The second is performing uncertainty quantification for estimates of causal estimands that are obtained from the application of black box machine learning algorithms on the designed Big Observational Data. The methodologies developed by addressing these challenges are applied for the design and analysis of Big Observational Data from a large public university in the United States.
Distributed Design: A fundamental issue in causal inference for Big Observational Data is confounding due to covariate imbalances between treatment groups. This can be addressed by designing the study prior to analysis. The design ensures that subjects in the different treatment groups that have comparable covariates are subclassified or matched together. Analyzing such a designed study helps to reduce biases arising from the confounding of covariates with treatment. Existing design methods, developed for traditional observational studies with a single designer, can yield unsatisfactory designs with sub-optimum covariate balance for Big Observational Data because they cannot accommodate the massive dimensionality, heterogeneity, and volume of the Big Data. We propose a new framework for the distributed design of Big Observational Data amongst collaborative designers. Our framework first assigns subsets of the high-dimensional and heterogeneous covariates to multiple designers. The designers then summarize their covariates into lower-dimensional quantities, share their summaries with the others, and design the study in parallel based on their assigned covariates and the summaries they receive. The final design is selected by comparing balance measures for all covariates across the candidate designs and identifying the best candidate. We perform simulation studies and analyze datasets from the 2016 Atlantic Causal Inference Conference Data Challenge to demonstrate the flexibility and power of our framework for constructing designs with good covariate balance from Big Observational Data.
Designed Bootstrap: The combination of modern machine learning algorithms with the nonparametric bootstrap can enable effective predictions and inferences on Big Observational Data. An increasingly prominent and critical objective in such analyses is to draw causal inferences from the Big Observational Data. A fundamental step in addressing this objective is to design the observational study prior to the application of machine learning algorithms. However, the application of the traditional nonparametric bootstrap on Big Observational Data requires excessive computational effort. This is because every bootstrap sample would need to be re-designed under the traditional approach, which can be prohibitive in practice. We propose a design-based bootstrap for deriving causal inferences with reduced bias from the application of machine learning algorithms on Big Observational Data. Our bootstrap procedure operates by resampling from the original designed observational study. It eliminates the additional, costly design steps that the standard nonparametric bootstrap performs on each bootstrap sample. We demonstrate the computational efficiency of this procedure compared to the traditional nonparametric bootstrap, and its equivalency in terms of confidence interval coverage rates for the average treatment effects, by means of simulation studies and a real-life case study.
Case Study: We apply the distributed design and designed bootstrap methodologies in a case study involving institutional data from a large public university in the United States. The institutional data contain comprehensive information about the undergraduate students in the university, ranging from their academic records to on-campus activities. We study the causal effects of undergraduate students’ attempted course load on their academic performance based on a selection of covariates from these data. Ultimately, our real-life case study demonstrates how our methodologies enable researchers to effectively use straightforward design procedures to obtain valid causal inferences with reduced computational effort from the application of machine learning algorithms on Big Observational Data. Econometric and statistical methods Applied statistics Statistical data science Statistics not elsewhere classified Causal inference Design of observational studies Propensity score method Bootstrap resampling method Causal machine learning Institutional research Big data
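The idea of resampling from an already designed study, rather than re-designing each bootstrap sample, can be sketched as follows. This is a minimal illustration under strong assumptions that the abstract does not confirm: the design is taken to be a set of matched (treated, control) outcome pairs, and the effect estimate is a simple mean of pair differences standing in for the machine learning estimator. The function name and parameters are hypothetical.

```python
import random
from statistics import mean


def designed_bootstrap_ci(pairs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a mean treatment effect.

    `pairs` holds (treated_outcome, control_outcome) tuples produced by a
    design step that ran once, before this function is called. Matched
    pairs are resampled whole, so no matching is repeated per replicate.
    """
    rng = random.Random(seed)
    diffs = [treated - control for treated, control in pairs]
    estimates = []
    for _ in range(n_boot):
        # Resample the designed units with replacement.
        sample = rng.choices(diffs, k=len(diffs))
        estimates.append(mean(sample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return mean(diffs), lower, upper


if __name__ == "__main__":
    # Hypothetical matched outcomes, invented for illustration only.
    matched = [(5.0, 3.0), (6.0, 4.5), (4.0, 3.5), (7.0, 5.0), (5.5, 4.0)]
    print(designed_bootstrap_ci(matched))
```

Under the traditional approach described in the abstract, the matching step would sit inside the loop and run once per replicate; here it sits outside, which is where the computational saving would come from.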
4	Statistical Design of Sequential Decision Making Algorithms Chi-hua Wang (12469251) 27 April 2022 Sequential decision-making is a fundamental class of problems that motivates algorithm designs in online machine learning and reinforcement learning. Arguably, the resulting online algorithms have supported modern online service industries in their data-driven, real-time automated decision making. The applications span different industries, including dynamic pricing (Marketing), recommendation (Advertising), and dosage finding (Clinical Trial). In this dissertation, we contribute fundamental statistical design advances for sequential decision-making algorithms, advancing the theory and application of online learning and sequential decision making under uncertainty, including online sparse learning, finite-armed bandits, and high-dimensional online decision making. Our work lies at the intersection of decision-making algorithm design, online statistical machine learning, and operations research, contributing new algorithms, theory, and insights to diverse fields including optimization, statistics, and machine learning.
In part I, we contribute a theoretical framework of continuous risk monitoring for regularized online statistical learning. Such a theoretical framework is desirable for modern online service industries that monitor the performance of deployed models on online machine learning tasks. In the first project (Chapter 1), we develop continuous risk monitoring for the online Lasso procedure and provide an always-valid algorithm for high-dimensional dynamic pricing problems. In the second project (Chapter 2), we develop continuous risk monitoring for online matrix regression and provide new algorithms for rank-constrained online matrix completion problems. These theoretical advances stem from the interplay between non-asymptotic martingale concentration theory and regularized online statistical machine learning.
In part II, we contribute a bootstrap-based methodology for finite-armed bandit problems, termed Residual Bootstrap exploration. Such a method opens the possibility of designing model-agnostic bandit algorithms without problem-adaptive optimism-engineering and instance-specific prior-tuning. In the first project (Chapter 3), we develop residual bootstrap exploration for multi-armed bandit algorithms and show its easy generalizability to bandit problems with complex or ambiguous reward structures. In the second project (Chapter 4), we develop a theoretical framework for residual bootstrap exploration in linear bandits with a fixed action set. These methodological advances stem from our development of non-asymptotic theory for the bootstrap procedure.
In part III, we contribute application-driven insights on the exploration-exploitation dilemma for high-dimensional online decision-making problems. Such insights help practitioners implement effective high-dimensional statistical methods to solve online decision-making problems. In the first project (Chapter 5), we develop a bandit sampling scheme for online batch high-dimensional decision making, a practical scenario in interactive marketing and sequential clinical trials. In the second project (Chapter 6), we develop a bandit sampling scheme for federated online high-dimensional decision-making to maintain data decentralization and make collaborative decisions. These new insights stem from our new bandit sampling design, which addresses application-driven exploration-exploitation trade-offs effectively. Statistics Decision Making Sequential decision making LASSO regression models Bandit Algorithms high-dimensional statistics bootstrap resampling method exploration-exploitation trade-off Dynamic pricing multi armed bandit Contextual bandits regularization method Online Machine Learning statistical learning methods martingale
