Global ETD Search

1	Multiple significance tests and their relation to P-values Li, Xiao Bo (Alice) 10 September 2008 This thesis is about multiple hypothesis testing and its relation to the P-value. In Chapter 1, the testing methodologies of the three schools of inference are reviewed. Jeffreys, Fisher, and Neyman advocated three different approaches to testing, based respectively on posterior probabilities, P-values, and Type I and Type II error probabilities. In Berger's words, "Each was quite critical of the other approaches." Berger proposed a conditional frequentist approach as a potential unified methodology for testing. His idea is to follow Fisher in using the P-value to define the strength of evidence in the data and to follow Fisher's method of conditioning on the strength of evidence; then to follow Neyman by computing Type I and Type II error probabilities conditional on the strength of evidence in the data. These conditional error probabilities equal the objective posterior probabilities of the hypotheses advocated by Jeffreys. Bickis proposed another approach to calibrating the null and alternative components of the distribution, modeling the set of P-values as a sample from a mixed population composed of a uniform distribution for the null cases and an unknown distribution for the alternatives. To tackle multiplicity, the empirical distribution of the P-values is exploited. A variety of density estimators for calibrating posterior probabilities of the null hypothesis given P-values are implemented. Finally, a noninterpolatory and shape-preserving estimator based on B-splines as smoothing functions is proposed and implemented. Keywords: multiple hypothesis testing
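The calibration idea in this abstract can be illustrated with a minimal sketch. Assuming the mixture model described above (uniform null P-values, unknown alternative density g) and a plain histogram in place of the thesis's B-spline estimator, the posterior probability of the null given a P-value is the null share of the mixture density at that point, P(H0 | p) = pi0 / f(p). The function name `null_posterior`, the bin count, and the synthetic data are illustrative choices, not taken from the thesis.

```python
import numpy as np

def null_posterior(pvals, n_bins=20):
    """Estimate P(H0 | p) under the two-component mixture
    f(p) = pi0 * 1 + (1 - pi0) * g(p), using a histogram estimate of f."""
    pvals = np.asarray(pvals, dtype=float)
    counts, edges = np.histogram(pvals, bins=n_bins, range=(0.0, 1.0))
    # Histogram density on [0, 1]: counts / (n * bin width).
    density = counts / (pvals.size * np.diff(edges))
    # Since g >= 0, f(p) >= pi0 everywhere, so the smallest bin density
    # is a conservative (upward-biased) estimate of pi0.
    pi0 = min(1.0, float(density.min()))
    # Estimated mixture density at each P-value (p = 1 falls in the last bin).
    idx = np.clip(np.searchsorted(edges, pvals, side="right") - 1, 0, n_bins - 1)
    f_hat = np.maximum(density[idx], 1e-12)
    # Bayes' rule with a Uniform(0, 1) null density f0(p) = 1.
    return pi0, np.minimum(1.0, pi0 / f_hat)

# Synthetic example: 900 null P-values and 100 alternatives concentrated near 0.
rng = np.random.default_rng(0)
p = np.concatenate([rng.uniform(size=900), rng.beta(0.2, 5.0, size=100)])
pi0, post = null_posterior(p)
print(f"estimated pi0 = {pi0:.3f}")
```

A histogram is the crudest possible density estimator here; a smoother, shape-preserving estimator would replace the `density` lookup while leaving the Bayes'-rule step unchanged.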
3	Sequential Procedures for the "Selection" Problems in Discrete Simulation Optimization Wenyu Wang (7491243) 17 October 2019 Simulation optimization problems are nonlinear optimization problems whose objective function can be evaluated through stochastic simulation. We study two significant discrete simulation optimization problems in this thesis: Ranking and Selection (R&S) and Factor Screening (FS). Both R&S and FS are "selection" problems defined on a finite set of candidate systems or factors. They differ mainly in their objectives: R&S aims to find the "best" system(s) among all alternatives, whereas FS aims to select the important factors that are critical to the stochastic systems. / In this thesis, we develop efficient sequential procedures for these two problems. For the R&S problem, we propose fully sequential procedures for selecting the "best" systems with a guaranteed probability of correct selection (PCS). The main features of these methods are: (1) a Bonferroni-free model: the procedures overcome the conservativeness of the Bonferroni correction and deliver the exact probabilistic guarantee without overshooting; (2) asymptotic optimality: the procedures asymptotically achieve the lower bound on the average sample size; (3) an indifference-zone-flexible formulation: the procedures bridge the gap between the indifference-zone formulation and the indifference-zone-free formulation, so that the indifference-zone parameter is not required but can help if provided. We establish the validity and asymptotic efficiency of the proposed procedures and conduct numerical studies to investigate their performance under multiple configurations. / We also consider the multi-objective R&S (MOR&S) problem. To the best of our knowledge, the proposed procedure is the first frequentist approach for MOR&S.
These procedures identify the Pareto front with a guaranteed probability of correct selection (PCS). In particular, they are fully sequential, with test statistics built on the Generalized Sequential Probability Ratio Test (GSPRT). The main features are: (1) an objective-dimension-free model: the performance of these procedures does not deteriorate as the number of objectives increases, and they achieve the same efficiency as the KN family of procedures for single-objective ranking and selection; (2) an indifference-zone-flexible formulation: the new methods eliminate the need for an indifference-zone parameter while making use of indifference-zone information if it is provided. A numerical evaluation demonstrates the validity and efficiency of the new procedure. / For the FS problem, our objective is to identify important factors for simulation experiments with a controlled family-wise error rate (FWER). We assume a multi-objective first-order linear model in which the responses follow a multivariate normal distribution. We offer three fully sequential procedures: the Sum Intersection Procedure (SUMIP), the Sort Intersection Procedure (SORTIP), and the Mixed Intersection Procedure (MIP). SUMIP uses the Bonferroni correction to adjust for multiple comparisons; SORTIP uses the Holm procedure to overcome the conservativeness of the Bonferroni method; and MIP combines SUMIP and SORTIP to work efficiently in a parallel computing environment. Numerical studies demonstrate the validity and efficiency of the procedures, and a case study is presented. Keywords: Operations Research, ranking and selection, multiple hypothesis testing, sequential testing
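SUMIP and SORTIP differ in the multiple-comparison adjustment they rely on. The fixed-sample sketch below contrasts the two adjustments on a hypothetical set of five P-values; it is not the sequential SUMIP or SORTIP procedure itself, only the Bonferroni and Holm rules those procedures build on.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m; controls the FWER at level alpha."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size

def holm(pvals, alpha=0.05):
    """Holm's step-down procedure; controls the FWER at level alpha and
    rejects every hypothesis that Bonferroni rejects (and possibly more)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    for step, i in enumerate(np.argsort(p)):
        # The (step + 1)-th smallest P-value is compared with alpha / (m - step).
        if p[i] > alpha / (m - step):
            break  # stop at the first hypothesis that cannot be rejected
        reject[i] = True
    return reject

p = [0.009, 0.011, 0.015, 0.040, 0.300]
print(bonferroni(p))
print(holm(p))
```

On these values Holm rejects three of the five hypotheses while Bonferroni rejects only one, which is the conservativeness that SORTIP is designed to avoid.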
4	Bayesian Semiparametric Models for Heterogeneous Cross-platform Differential Gene Expression Dhavala, Soma Sekhar December 2010 We are concerned with testing for differential expression and consider three different aspects of such testing procedures. First, we develop an exact ANOVA-type model for discrete gene expression data produced by technologies such as Massively Parallel Signature Sequencing (MPSS), Serial Analysis of Gene Expression (SAGE) or other next-generation sequencing technologies. We adopt two Bayesian hierarchical models: one parametric, and the other semiparametric with a Dirichlet process prior that can borrow strength across related signatures, where a signature is a specific arrangement of nucleotides. We use the discreteness of the Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using nonparametric approaches while controlling the false discovery rate. Next, we consider ways to combine expression data from different studies, possibly produced by different technologies such as microarrays and MPSS, resulting in mixed-type responses. Depending on the technology, the expression data can be continuous or discrete and can have different technology-dependent noise characteristics. Adding to the difficulty, genes can have an arbitrary correlation structure both within and across studies. Performing many hypothesis tests for differential expression can also lead to false discoveries. We propose to address all of these challenges using a hierarchical Dirichlet process with a spike-and-slab base prior on the random effects, while smoothing splines model the unknown link functions that map the different technology-dependent manifestations to the latent processes on which inference is based.
Finally, we propose an algorithm for controlling different error measures in Bayesian multiple testing under generic loss functions, including the widely used uniform loss function. We make no specific assumptions about the underlying probability model, but require that indicator variables for the individual hypotheses be available as a component of the inference. Given this information, we recast multiple hypothesis testing as a combinatorial optimization problem, in particular the 0-1 knapsack problem, which can be solved efficiently using a variety of algorithms, both approximate and exact. Keywords: Bayesian Models, Generalized linear models, Semiparametric models, Dirichlet process, Meta-analysis, Multiple hypothesis testing, Bioinformatics
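The abstract recasts Bayesian multiple testing as a 0-1 knapsack problem but does not state the objective. One natural formulation, assumed for this sketch only, treats each hypothesis as an item whose value is its posterior probability of being non-null and whose weight is its posterior probability of being null, so the capacity bounds the expected number of false discoveries. The posterior null probabilities would come from the indicator variables the abstract mentions; the function `knapsack_rejections` and the discretization parameter `resolution` are hypothetical.

```python
import numpy as np

def knapsack_rejections(post_null, budget, resolution=1000):
    """Choose which hypotheses to reject by solving a 0-1 knapsack problem.

    Value of rejecting H_i: 1 - post_null[i] (expected true discovery).
    Weight of rejecting H_i: post_null[i] (expected false discovery).
    Constraint: total expected false discoveries <= budget.
    Weights are rounded up to multiples of 1 / resolution so the classic
    integer dynamic program applies and the original constraint still holds.
    """
    v = np.asarray(post_null, dtype=float)
    weights = np.ceil(v * resolution).astype(int)
    values = 1.0 - v
    cap = int(np.floor(budget * resolution))
    n = len(v)
    # best[i, w]: max total value using the first i hypotheses with weight <= w.
    best = np.zeros((n + 1, cap + 1))
    for i in range(1, n + 1):
        w_i, val_i = weights[i - 1], values[i - 1]
        best[i] = best[i - 1]
        if w_i <= cap:
            best[i, w_i:] = np.maximum(best[i - 1, w_i:],
                                       best[i - 1, :cap + 1 - w_i] + val_i)
    # Backtrack from the full capacity to recover the rejection set.
    reject = np.zeros(n, dtype=bool)
    w = cap
    for i in range(n, 0, -1):
        if best[i, w] != best[i - 1, w]:
            reject[i - 1] = True
            w -= weights[i - 1]
    return reject

post_null = [0.01, 0.02, 0.30, 0.50, 0.90]
print(knapsack_rejections(post_null, budget=0.1))
print(knapsack_rejections(post_null, budget=0.4))
```

The dynamic program is exact for the discretized problem; for very large numbers of hypotheses, the approximate solvers the abstract mentions would replace it.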
5	The performance of multiple hypothesis testing procedures in the presence of dependence Clarke, Sandra Jane January 2010 Hypothesis testing is foundational to the discipline of statistics. Procedures exist that control individual Type I error rates as well as more global, family-wise error rates for a series of hypothesis tests. However, the ease with which scientists can now produce very large data sets has led to a rapid rise in the number of statistical tests performed, often with small sample sizes. This is seen particularly in biotechnology and the analysis of microarray data. This thesis considers this high-dimensional context, with particular focus on the effects of dependence on existing multiple hypothesis testing procedures. / While dependence is often ignored, many techniques currently exist to deal with it, but these are typically highly conservative or require difficult estimation of large correlation matrices. This thesis demonstrates that, in this high-dimensional context, when the distribution of the test statistics is light-tailed, dependence is less of a concern than in classical contexts. This is shown using a moving average model. One important implication is that, when this condition is satisfied, procedures designed for independent test statistics can be used confidently on dependent test statistics. / This is not the case, however, for heavy-tailed distributions, where we expect an asymptotic Poisson cluster process of false discoveries. In these cases, we estimate the parameters of this process, along with the tail weight, from the observed exceedances and attempt to adjust procedures accordingly. We consider both conservative error rates, such as the family-wise error rate, and more popular methods, such as the false discovery rate.
We are able to demonstrate that, in the context of DNA microarrays, it is rare to find heavy-tailed distributions because most test statistics are averages.
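The abstract's point about light-tailed statistics can be explored with a small simulation. The sketch below makes assumptions of its own (Gaussian statistics with MA(2) dependence, and arbitrarily chosen coefficients, effect size, and FDR level): it generates dependent test statistics and applies the Benjamini-Hochberg procedure, a standard FDR method designed for independent statistics. The helpers `ma_statistics` and `empirical_fdr` are hypothetical, not code from the thesis.

```python
import numpy as np
from math import erfc, sqrt

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure; controls the FDR at level q
    for independent (and some positively dependent) test statistics."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest i with p_(i) <= q * i / m
        reject[order[: k + 1]] = True
    return reject

def ma_statistics(m, coeffs, rng):
    """m standard-normal statistics with moving-average dependence:
    Z_t proportional to e_t + coeffs[0] * e_{t-1} + ..., e iid N(0, 1)."""
    kernel = np.concatenate(([1.0], coeffs))
    e = rng.standard_normal(m + len(coeffs))
    z = np.convolve(e, kernel, mode="valid")  # length m
    return z / np.sqrt(np.sum(kernel ** 2))   # rescale to unit variance

def empirical_fdr(m=2000, m1=200, shift=3.0, coeffs=(0.6, 0.3), q=0.1, reps=50, seed=1):
    """Average false discovery proportion of BH when the first m1 of the m
    hypotheses are false nulls (mean shifted by `shift`)."""
    rng = np.random.default_rng(seed)
    fdps = []
    for _ in range(reps):
        z = ma_statistics(m, coeffs, rng)
        z[:m1] += shift
        p = np.array([erfc(abs(v) / sqrt(2.0)) for v in z])  # two-sided P-values
        rej = benjamini_hochberg(p, q)
        fdps.append(rej[m1:].sum() / max(rej.sum(), 1))
    return float(np.mean(fdps))

print(f"empirical FDR with MA(2) dependence: {empirical_fdr():.3f}")
```

If the abstract's conclusion carries over to this setting, the empirical FDR should stay near the nominal level despite the dependence; swapping the Gaussian noise for a heavy-tailed distribution gives the setting in which the thesis expects clustered false discoveries.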
6	Detection and Tracking of Human Targets using Ultra-Wideband Radar Östman, Andreas January 2016 The purpose of this thesis was to assess the feasibility of using two ultra-wideband radars to detect and track human targets. Detection was performed with two types of methods: constant false-alarm rate (CFAR) methods and a variant of the CLEAN algorithm. For tracking the targets, multiple hypothesis tracking was studied. Particle filtering was used for state prediction, to account for the considerable uncertainty in the motion model used in this thesis project. The detection and tracking methods were implemented in MATLAB. Tracking of a single target and of multiple targets was investigated in simulation and experiment. The simulation results in these cases were compared with accurate ground truth data obtained using a VICON optical tracking system. The detection methods showed poor performance when using data that had been collected by the two radars and post-processed to enhance target features. For single targets, the detections were accurate enough to continuously track a target moving randomly in a controlled area. In the multiple-target cases, the tracker was not able to distinguish the multiple moving subjects. Keywords: Ultra-Wideband Radar, Constant false-alarm rate, Multiple Hypothesis Tracking, Tracking, Detection
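The abstract names constant false-alarm rate methods without specifying a variant. As an illustration only, the sketch below implements cell-averaging CFAR (CA-CFAR), the textbook form, on a synthetic one-dimensional power profile; the window sizes, the false-alarm probability, and the exponential noise model are assumptions, and nothing specific to ultra-wideband radar is modeled.

```python
import numpy as np

def ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3):
    """Cell-averaging CFAR detector for a 1-D power profile.

    For each cell under test, the noise level is the mean of `num_train`
    training cells on each side, skipping `num_guard` guard cells next to it.
    The scale factor assumes exponentially distributed noise power, for which
    alpha = N * (pfa**(-1/N) - 1) gives false-alarm probability pfa.
    """
    x = np.asarray(power, dtype=float)
    n_cells = 2 * num_train
    alpha = n_cells * (pfa ** (-1.0 / n_cells) - 1.0)
    half = num_train + num_guard
    detections = np.zeros(x.size, dtype=bool)  # edge cells are never declared
    for i in range(half, x.size - half):
        lead = x[i - half : i - num_guard]          # training cells before i
        lag = x[i + num_guard + 1 : i + half + 1]   # training cells after i
        noise = (lead.sum() + lag.sum()) / n_cells
        detections[i] = x[i] > alpha * noise
    return detections

# Synthetic range profile: unit-mean exponential noise with one strong target.
rng = np.random.default_rng(0)
profile = rng.exponential(1.0, size=500)
profile[250] += 100.0
hits = ca_cfar(profile)
print(np.flatnonzero(hits))
```

Because the threshold scales with the local noise estimate, the false-alarm rate stays near `pfa` in homogeneous noise even when the overall noise level changes, which is the defining property of CFAR detectors.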
7	Sdílení investičních nápadu: Rola štěstí a dovednosti / Sharing investment ideas: Role of luck and skill Turlík, Tomáš January 2021 In an environment where a large group of analysts are willing to share their investment ideas publicly, it is challenging to find those who have great skill and whose recommendations generate abnormal returns. We explore one such well-known group, Value Investors Club, consisting of 1223 analysts between 2000 and 2019. We separate the analysts into multiple groups, each representing a level of inherent ability. The commonly used method of single hypothesis testing cannot be used because we test many analysts at once, so multiple hypothesis testing methods must be employed. Using these methods, we are able to detect the subgroup of analysts who have abnormal returns relative to the Fama-French four-factor model. However, different methods lead to different groups of analysts being deemed skilled. An overall portfolio consisting of all analysts generates large abnormal returns, which diminish as the holding period increases. Furthermore, analyses from analysts estimated to be skilled are used to form portfolios. We find that some methods produce significantly larger abnormal returns than the overall portfolio; however, the methods are not consistent in producing such portfolios. Keywords: multiple hypothesis testing, luck and skill, investment ideas
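Testing each analyst for skill starts from an abnormal-return estimate. A minimal sketch, assuming the usual time-series regression of an analyst's portfolio excess returns on four factor return series, estimates the intercept (alpha) and its t-statistic; the data are synthetic and the helper `factor_alpha` is hypothetical.

```python
import numpy as np

def factor_alpha(excess_returns, factors):
    """OLS fit of r_t = alpha + f_t . beta + e_t.

    Returns the intercept alpha (the abnormal return) and its t-statistic.
    """
    y = np.asarray(excess_returns, dtype=float)
    F = np.asarray(factors, dtype=float)
    X = np.column_stack([np.ones(len(y)), F])       # intercept + factor columns
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)           # classical OLS covariance
    return coef[0], coef[0] / np.sqrt(cov[0, 0])

# Synthetic data: four hypothetical factor return series and a true alpha of 0.002.
rng = np.random.default_rng(0)
n = 5000
factors = rng.normal(0.0, 0.02, size=(n, 4))
beta = np.array([1.0, 0.3, -0.2, 0.1])
returns = 0.002 + factors @ beta + rng.normal(0.0, 0.01, size=n)
alpha_hat, t_stat = factor_alpha(returns, factors)
print(f"alpha = {alpha_hat:.4f}, t = {t_stat:.1f}")
```

With many analysts, each t-statistic would be converted to a P-value and passed to a multiple hypothesis testing procedure instead of being judged against a single-test cutoff; otherwise some analysts would appear skilled purely by luck.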
8	Sensitivity to Distributional Assumptions in Estimation of the ODP Thresholding Function Bunn, Wendy Jill 06 July 2007 Recent technological advances in fields like medicine and genomics have produced high-dimensional data sets that make it challenging to interpret experimental results correctly. The Optimal Discovery Procedure (ODP) (Storey 2005) builds on the framework of Neyman-Pearson hypothesis testing to optimally test thousands of hypotheses simultaneously. The method relies on the assumption of normally distributed data; however, many applications of this method will violate this assumption. This thesis investigates the sensitivity of the method when detecting significant but nonnormal data. Overall, estimation of the ODP with the method described in this thesis is satisfactory, except when the nonnormal alternative distribution has high variance and an expectation only one standard deviation away from the null distribution. Keywords: estimation, gene expression, multiple hypothesis testing, multiple comparisons, nonnormal, optimal discovery procedure, statistics, Statistics and Probability
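The ODP statistic compares, for each test, the summed likelihoods under estimated alternative densities with the summed likelihoods under estimated null densities. The sketch below is a simplified plug-in version under the normality assumption the abstract discusses: a one-sample setting with null mean 0, with each test's own fitted mean and standard deviation plugged in. It is an illustration assumed for this listing, not the estimator studied in the thesis, and the function names are hypothetical.

```python
import numpy as np

def pairwise_normal_loglik(x, mu, sigma):
    """Entry [i, j]: log-likelihood of test i's data under N(mu[j], sigma[j]^2)."""
    diff = x[:, None, :] - mu[None, :, None]     # shape (m, m, n)
    s = sigma[None, :, None]
    ll = -0.5 * np.log(2.0 * np.pi * s ** 2) - diff ** 2 / (2.0 * s ** 2)
    return ll.sum(axis=2)

def logsumexp_rows(a):
    """Numerically stable log(sum(exp(a), axis=1))."""
    top = a.max(axis=1, keepdims=True)
    return top[:, 0] + np.log(np.exp(a - top).sum(axis=1))

def estimated_odp(x):
    """Log of a simplified estimated ODP statistic for each test (row of x):
    sum_j g_hat_j(x_i) / sum_j f_hat_j(x_i), where g_hat_j is a normal density
    with test j's fitted mean and SD, and f_hat_j fixes the mean at 0."""
    x = np.asarray(x, dtype=float)
    mu_alt = x.mean(axis=1)
    sd_alt = x.std(axis=1)                     # MLE, mean unrestricted
    sd_null = np.sqrt((x ** 2).mean(axis=1))   # MLE when the mean is 0
    num = logsumexp_rows(pairwise_normal_loglik(x, mu_alt, sd_alt))
    den = logsumexp_rows(pairwise_normal_loglik(x, np.zeros_like(mu_alt), sd_null))
    return num - den

# Synthetic data: 50 null tests (mean 0) and 10 alternatives (mean 2), 10 samples each.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, size=(50, 10)),
               rng.normal(2.0, 1.0, size=(10, 10))])
s = estimated_odp(x)
print(np.argsort(-s)[:10])   # indices of the ten largest statistics
```

Tests are ranked by this statistic and a threshold is then chosen to control an error rate. The sensitivity question in the abstract can be probed by replacing `rng.normal` for the alternatives with a skewed or heavy-tailed generator.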
9	Population SAMC, ChIP-chip Data Analysis and Beyond Wu, Mingqi December 2010 This dissertation research covers two topics: population stochastic approximation Monte Carlo (Pop-SAMC) for Bayesian model selection problems, and ChIP-chip data analysis. The following two paragraphs briefly introduce each topic. Although reversible jump MCMC (RJMCMC) can traverse the space of possible models in Bayesian model selection problems, it is prone to becoming trapped in local modes when the model space is complex. SAMC, proposed by Liang, Liu and Carroll, essentially overcomes the difficulty of dimension-jumping moves by introducing a self-adjusting mechanism. However, this learning mechanism has not yet reached its maximum efficiency. In this dissertation, we propose a Pop-SAMC algorithm; it works on a population of SAMC chains, which provides a more efficient self-adjusting mechanism, and uses the crossover operator from genetic algorithms to further increase efficiency. Under mild conditions, the convergence of this algorithm is proved. The effectiveness of Pop-SAMC in Bayesian model selection problems is examined through a change-point identification example and a large-p linear regression variable selection example. The numerical results indicate that Pop-SAMC significantly outperforms both single-chain SAMC and RJMCMC. In the ChIP-chip data analysis study, we developed two methodologies to identify transcription factor binding sites: a Bayesian latent model and a population-based test. The former models the neighboring dependence of probes by introducing a latent indicator vector; the latter provides a nonparametric method for evaluating test scores in a multiple hypothesis test by using population information from the samples. Both methods are applied to real and simulated datasets.
The numerical results indicate that the Bayesian latent model can outperform existing methods, especially when the data contain outliers, and that the use of population information can significantly improve the power of multiple hypothesis tests. Keywords: Markov Chain Monte Carlo, Stochastic Approximation, Metropolis-Hastings Algorithm, Bayesian Model Selection, ChIP-chip, Latent Variable, Multiple Hypothesis Test
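SAMC, which Pop-SAMC extends, can be summarized in a few lines: the sample space is partitioned into subregions, each with a log-weight that is updated by stochastic approximation so that the chain is pushed out of regions it has over-visited. The sketch below is a single-chain, one-dimensional version with an energy-based partition, a uniform desired sampling frequency, and the gain sequence t0 / max(t0, t); these are common choices assumed here, not details from the dissertation, and the population and crossover steps of Pop-SAMC are not shown.

```python
import numpy as np

def samc(log_psi, energy_edges, n_iter=20000, step=1.0, t0=1000, x0=0.0, seed=0):
    """Single-chain stochastic approximation Monte Carlo (SAMC) in one dimension.

    The sample space is partitioned by the energy U(x) = -log_psi(x): region k
    holds the points with energy_edges[k-1] < U(x) <= energy_edges[k], and the
    last region is open-ended. theta[k] is the learned log-weight of region k,
    and the desired visiting frequency pi is uniform over the regions.
    """
    rng = np.random.default_rng(seed)
    m = len(energy_edges) + 1
    pi = np.full(m, 1.0 / m)
    theta = np.zeros(m)
    visits = np.zeros(m, dtype=int)

    def region(x):
        return int(np.searchsorted(energy_edges, -log_psi(x)))

    x, k = x0, region(x0)
    for t in range(1, n_iter + 1):
        y = x + step * rng.standard_normal()  # symmetric random-walk proposal
        ky = region(y)
        # Metropolis-Hastings ratio for the working density psi(x) / exp(theta[region]).
        log_r = (log_psi(y) - theta[ky]) - (log_psi(x) - theta[k])
        if np.log(rng.uniform()) < log_r:
            x, k = y, ky
        # Self-adjusting step: raise the current region's log-weight relative to
        # the others, which makes that region less attractive on later iterations.
        indicator = np.zeros(m)
        indicator[k] = 1.0
        theta += (t0 / max(t0, t)) * (indicator - pi)
        visits[k] += 1
    return theta, visits

# Bimodal target with well-separated modes at -4 and 4.
def log_psi(x):
    return np.logaddexp(-0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2)

theta, visits = samc(log_psi, energy_edges=[1.0, 2.0, 3.0, 4.0, 6.0])
print(visits / visits.sum())
```

Because the weights penalize over-visited regions, the chain is repeatedly driven through the high-energy region between the modes, so it can move between the two modes instead of staying trapped in one.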
10	Multiple Hypothesis Tracking For Multiple Visual Targets Turker, Burcu 01 April 2010 The visual target tracking problem consists of two parts: obtaining targets from camera measurements, and tracking those targets. Even though it has been studied for more than 30 years, some problems have still not been completely solved. Especially with multiple targets, these include associating measurements with targets, creating new targets, and deleting old ones. Moreover, it is very important to handle occlusion and crossing targets properly. We believe that a slightly modified version of multiple hypothesis tracking can deal with most of the aforementioned problems successfully. Distance, track size, track color, gate size and track history are used as parameters to evaluate the hypotheses generated for the measurement-to-track association problem, whereas size and color are used as parameters for the occlusion problem. The overall tracker was fine-tuned on several scenarios, and it was observed to perform well on the testing scenarios as well. Furthermore, the performance of the tracker is analyzed with respect to those parameters in both association and occlusion handling situations.
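At the core of multiple hypothesis tracking is the enumeration and scoring of association hypotheses. The sketch below shows a single-scan version that uses only distance and a gate, two of the parameters the abstract lists; the Gaussian likelihood, the gate value (the 99% chi-square quantile for two degrees of freedom), and the new-target/clutter score `log_new` are assumptions, and the size, color, and track-history terms are omitted. The function `association_hypotheses` is hypothetical.

```python
import math
from itertools import product

def association_hypotheses(tracks, measurements, sigma=1.0, gate=9.21, log_new=-8.0):
    """Enumerate measurement-to-track association hypotheses, best first.

    tracks: predicted 2-D positions; measurements: observed 2-D positions.
    Each measurement is assigned to one gated track or declared new/clutter
    (None), and each track receives at most one measurement. The score is the
    sum of Gaussian log-likelihoods of assigned pairs plus `log_new` for each
    measurement that is not assigned to a track.
    """
    def dist2(t, z):  # squared Mahalanobis distance with isotropic covariance
        return ((t[0] - z[0]) ** 2 + (t[1] - z[1]) ** 2) / sigma ** 2

    # Gating: only tracks inside the gate are candidates for a measurement.
    options = [[None] + [j for j, t in enumerate(tracks) if dist2(t, z) <= gate]
               for z in measurements]
    hypotheses = []
    for assign in product(*options):
        used = [j for j in assign if j is not None]
        if len(used) != len(set(used)):
            continue  # infeasible: one track claimed by two measurements
        score = 0.0
        for z, j in zip(measurements, assign):
            if j is None:
                score += log_new
            else:
                score += -0.5 * dist2(tracks[j], z) - math.log(2 * math.pi * sigma ** 2)
        hypotheses.append((score, assign))
    hypotheses.sort(key=lambda h: h[0], reverse=True)
    return hypotheses

tracks = [(0.0, 0.0), (10.0, 0.0)]
measurements = [(0.5, 0.2), (9.7, -0.3)]
for score, assign in association_hypotheses(tracks, measurements):
    print(f"{score:7.2f}  {assign}")
```

A full MHT keeps the best-scoring hypotheses across scans and prunes the rest, because brute-force enumeration grows exponentially with the number of measurements.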
