1 |
Small sample multiple testing with application to cDNA microarray data. Hintze, Eric Poole. 30 October 2006 (has links)
Many tests have been developed for comparing means in a two-sample scenario.
Microarray experiments lead to thousands of such comparisons in a single study. Several
multiple testing procedures are available to control experiment-wise error or the false
discovery rate. In this dissertation, individual two-sample tests are compared based on
accuracy, correctness, and power. Four multiple testing procedures are compared via
simulation, based on data from the lab of Dr. Rajesh Miranda. The effect of sample size
on power is also carefully examined. The two-sample t-test followed by the Benjamini
and Hochberg (1995) false discovery rate controlling procedure results in the highest
power.
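The winning combination above, a per-gene two-sample t-test followed by the Benjamini-Hochberg step-up procedure, can be sketched as follows. This is a minimal illustration on simulated data, not the dissertation's actual analysis or the Miranda lab data:

```python
import numpy as np
from scipy import stats

def bh_adjust(pvals, q=0.05):
    """Benjamini-Hochberg (1995) step-up procedure.

    Returns a boolean rejection mask controlling the FDR at level q,
    assuming independent (or positively dependent) test statistics.
    """
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest i with p_(i) <= i*q/m
        reject[order[: k + 1]] = True
    return reject

# Simulated microarray-style data: 1000 genes, 5 samples per group,
# with the first 100 genes truly differentially expressed.
rng = np.random.default_rng(0)
m, n = 1000, 5
group1 = rng.normal(0.0, 1.0, size=(m, n))
group2 = rng.normal(0.0, 1.0, size=(m, n))
group2[:100] += 2.0  # true effects

_, pvals = stats.ttest_ind(group1, group2, axis=1)
rejected = bh_adjust(pvals, q=0.05)
print(rejected.sum(), "genes declared differentially expressed")
```

The step-up search for the largest index satisfying the threshold is what distinguishes BH from a single-step cutoff, and is where its power advantage over Bonferroni comes from.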
|
2 |
Familywise Robustness Criteria Revisited for Newer Multiple Testing Procedures. Miller, Charles W. January 2009 (has links)
As large datasets become more prevalent, so does the need to discover significant findings among a large collection of hypotheses. Multiple testing procedures (MTPs) are used to control the familywise error rate (FWER), the chance of committing at least one Type I error when testing multiple hypotheses. When controlling the FWER, the power of an MTP to detect significant differences decreases as the number of hypotheses increases. Ideally, the same false null hypotheses would be discovered regardless of which family of hypotheses is chosen for testing. Holland and Cheung (2002) developed measures called familywise robustness criteria (FWR) to study the effect of family size on the acceptance and rejection of a hypothesis. Their analysis focused on procedures that control the FWER and the false discovery rate (FDR). Newer MTPs have since been developed that control the generalized FWER (gFWER(k) or k-FWER) and the false discovery proportion (FDP), or tail probabilities for the proportion of false positives (TPPFP). This dissertation reviews these newer procedures and then discusses the effect of family size using the FWRs of Holland and Cheung. In the case where the test statistics are independent and the null hypotheses are all true, the Type R enlargement familywise robustness measure can be expressed as a ratio of expected numbers of Type I errors. In simulations where positive dependence among the test statistics was introduced, the expected number of Type I errors and the Type R enlargement FWR increased for step-up procedures at higher levels of correlation, but not for step-down or single-step procedures. / Statistics
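One of the newer error rates discussed above, the generalized FWER, admits a very simple single-step procedure due to Lehmann and Romano: raising the Bonferroni threshold by a factor of k controls the probability of k or more false rejections. The sketch below is a generic illustration of that procedure, not code from the dissertation:

```python
import numpy as np

def gfwer_single_step(pvals, k=1, alpha=0.05):
    """Single-step k-FWER procedure (Lehmann-Romano): reject H_i when
    p_i <= k * alpha / m. With k = 1 this reduces to ordinary Bonferroni
    and controls the usual FWER; a larger k tolerates up to k - 1 false
    rejections in exchange for more power."""
    p = np.asarray(pvals)
    return p <= k * alpha / len(p)

pvals = [0.001, 0.015, 0.03, 0.40]
print(gfwer_single_step(pvals, k=1).sum(), "rejections under FWER control")
print(gfwer_single_step(pvals, k=2).sum(), "rejections under 2-FWER control")
```

Relaxing from FWER to 2-FWER doubles the per-hypothesis threshold here, which is exactly the trade-off between error tolerance and power that the familywise robustness criteria are designed to study.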
|
3 |
A Lego System for Conditional Inference. Hothorn, Torsten; Hornik, Kurt; Wiel, Mark A. van de; Zeileis, Achim. January 2005 (has links) (PDF)
Conditioning on the observed data is an important and flexible design principle for statistical test procedures. Although generally applicable, permutation tests currently in use are limited to the treatment of special cases, such as contingency tables or K-sample problems. A new theoretical framework for permutation tests opens up the way to a unified and generalized view. We argue that the transfer of such a theory to practical data analysis has important implications in many applications and requires tools that enable the data analyst to compute on the theoretical concepts as closely as possible. We re-analyze four data sets by adapting the general conceptual framework to these non-standard inference procedures and utilizing the coin add-on package in the R system for statistical computing to show what one can gain from going beyond the `classical' test procedures. / Series: Research Report Series / Department of Statistics and Mathematics
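As a concrete instance of conditioning on the observed data, a basic two-sample permutation test can be written in a few lines. This is a generic Python illustration of the idea; the paper itself works within the coin add-on package in R:

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=10_000, rng=None):
    """Two-sided permutation test for a difference in means,
    conditioning on the observed pooled data: the null distribution
    is generated by re-randomizing group labels."""
    if rng is None:
        rng = np.random.default_rng(0)
    pooled = np.concatenate([x, y])
    n_x = len(x)
    observed = abs(x.mean() - y.mean())
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if abs(perm[:n_x].mean() - perm[n_x:].mean()) >= observed:
            count += 1
    # add-one correction keeps the p-value valid (never exactly zero)
    return (count + 1) / (n_perm + 1)
```

The unified framework described in the paper generalizes exactly this recipe, replacing the difference-in-means statistic with a general linear statistic so that contingency tables, K-sample problems, and other designs become special cases.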
|
4 |
Statistical methods to account for different sources of bias in Genome-Wide association studies / Méthodes statistiques pour la prise en compte de différentes sources de biais dans les études d'association à grande échelle. Bouaziz, Matthieu. 22 November 2012 (has links)
Genome-Wide association studies have become powerful tools to detect genetic variants associated with diseases. This PhD thesis focuses on several key aspects of the new computational and methodological problems that have arisen with such research. The results of Genome-Wide association studies have been questioned, in part because of the bias induced by population stratification. We propose a comparative study of the existing strategies to account for this problem; their advantages and limitations are discussed across various population structure scenarios in order to propose practical guidelines. We then focus on the inference of population structure, which has many applications in genetic research. We have developed, and present in this manuscript, a new clustering algorithm called SHIPS (Spectral Hierarchical clustering for the Inference of Population Structure). This algorithm was applied to a collection of simulated and real data sets, alongside many other algorithms used in the field, to compare their performances. Finally, the issue of multiple testing in Genome-Wide association studies is discussed on several levels. We propose a review of multiple-testing corrections and discuss their validity for different study settings. We then focus on deriving gene-wise interpretations of the findings, which corresponds to a multiple-testing problem with dependent tests; we discuss and analyze the different approaches dedicated to obtaining valid gene-disease association measures.
|
5 |
Robust Computational Tools for Multiple Testing with Genetic Association Studies. Welbourn, William L., Jr. 01 May 2012 (has links)
Resolving the interplay of the genetic components of a complex disease is a challenging endeavor. Over the past several years, genome-wide association studies (GWAS) have emerged as a popular approach to locating common genetic variation within the human genome associated with disease risk. By assessing genotype-phenotype associations across hundreds of thousands of genetic markers, the GWAS approach introduces a potentially high number of false positive signals and requires statistical correction for multiple hypothesis testing. Permutation tests are considered the gold standard for multiple testing correction in GWAS, because they simultaneously provide unbiased Type I error control and high power. However, they demand heavy computational effort, especially with the large-scale data sets of modern GWAS. In recent years, the computational problem has been circumvented by using approximations to permutation tests, but several studies have identified sampling conditions under which these approximations appear to be biased.
We have developed an optimized parallel algorithm for the permutation testing approach to multiple testing correction in GWAS, whose implementation essentially abates the computational problem. When applied to GWAS data, our algorithm yields rapid, precise, and powerful multiplicity adjustment, many orders of magnitude faster than existing GWAS statistical software.
Although GWAS have identified many potentially important genetic associations which will advance our understanding of human disease, the common variants with modest effects on disease risk discovered through this approach likely account for a small proportion of the heritability in complex disease. On the other hand, interactions between genetic and environmental factors could account for a substantial proportion of the heritability in a complex disease and are overlooked within the GWAS approach.
We have developed an efficient and easily implemented tool for genetic association studies, whose aim is identifying genes involved in a gene-environment interaction. Our approach is amenable to a wide range of association studies and assorted densities in sampled genetic marker panels, and incorporates resampling for multiple testing correction. Within the context of a case-control study design we demonstrate by way of simulation that our proposed method offers greater statistical power to detect gene-environment interaction, when compared to several competing approaches to assess this type of interaction.
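The permutation approach to multiplicity adjustment that the dissertation's algorithm optimizes can be illustrated with a single-threaded maxT-style sketch. The association statistic (absolute marker-phenotype correlation) and data layout here are hypothetical simplifications; the dissertation's implementation is a parallel, optimized version of this general idea:

```python
import numpy as np

def maxT_adjusted_pvalues(genotypes, phenotype, n_perm=1000, rng=None):
    """Westfall-Young maxT-style permutation adjustment.

    genotypes: (n_subjects, n_markers) array of genotype scores
    phenotype: (n_subjects,) array of case/control labels

    Permuting the phenotype preserves the correlation structure among
    markers, which is what makes this adjustment less conservative
    than Bonferroni under dependence.
    """
    if rng is None:
        rng = np.random.default_rng(0)

    def abs_corr_stats(y):
        # |correlation| between phenotype and each marker, used as a
        # simple per-marker association statistic
        yc = y - y.mean()
        gc = genotypes - genotypes.mean(axis=0)
        num = yc @ gc
        den = np.sqrt((yc @ yc) * (gc ** 2).sum(axis=0))
        return np.abs(num / den)

    observed = abs_corr_stats(phenotype)
    max_null = np.array([
        abs_corr_stats(rng.permutation(phenotype)).max()
        for _ in range(n_perm)
    ])
    # adjusted p-value: share of permutation maxima at least as extreme
    return (1 + (max_null[:, None] >= observed).sum(axis=0)) / (n_perm + 1)
```

Because every marker is compared against the same null distribution of permutation maxima, the adjusted p-values automatically account for the number of markers tested; the computational burden (n_perm full scans of the data) is exactly what motivates the optimized parallel implementation.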
|
6 |
On Bayesian Multiplicity Adjustment in Multiple Testing. Gecili, Emrah. January 2018 (has links)
No description available.
|
7 |
Incorporating Correlations to Improve Multiple Testing Procedures Controlling False Discoveries. He, Li. January 2011 (has links)
Multiple testing plays an important role in analyzing data from modern scientific investigations, yet some fundamentally important theoretical and methodological issues related to it remain to be fully investigated. Often the correlation structure among the test statistics involved in multiple testing is known a priori or can be estimated from the data, yet this structure is rarely taken properly into consideration when developing multiple testing procedures, even though ignoring it might result in a less powerful method than one would like, or lead to irrelevant or misleading conclusions. This dissertation focuses on research related to improving some commonly used multiple testing procedures by incorporating correlations into them. We propose several new results and present some ideas for further research. / Statistics
|
8 |
Multiple testing using the posterior probability of half-space: application to gene expression data. Labbe, Aurelie. January 2005 (has links)
We consider the problem of testing the equality of two sample means when the number of tests performed is large. In the context of gene expression data, our goal is to detect a set of genes differentially expressed under two treatments or two biological conditions. A null hypothesis of no difference in gene expression under the two conditions is constructed. Since such a hypothesis is tested for each gene, thousands of tests are performed simultaneously, and multiple testing issues arise. The aim of our research is to make a connection between Bayesian analysis and frequentist theory in the context of multiple comparisons by deriving some properties shared by both p-values and posterior probabilities. The ultimate goal of this work is to use the posterior probability of the one-sided alternative hypothesis (or equivalently, the posterior probability of the half-space) in the same spirit as a p-value. We show, for instance, that such a Bayesian probability can be used as an input to standard multiple testing procedures controlling the False Discovery Rate.
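The idea of using the posterior probability of the half-space in place of a p-value can be sketched for a simple conjugate normal model. The model below (known unit variance, vague normal prior on the mean difference) is purely illustrative and is not claimed to be the thesis's exact formulation:

```python
import numpy as np
from scipy import stats

def posterior_prob_halfspace(x, y, prior_var=100.0):
    """P(mu_x - mu_y <= 0 | data): the posterior probability of the
    half-space, under a normal likelihood with known unit variance
    and a N(0, prior_var) prior on the mean difference."""
    diff = x.mean() - y.mean()
    lik_var = 1.0 / len(x) + 1.0 / len(y)  # sampling variance of the difference
    post_var = 1.0 / (1.0 / prior_var + 1.0 / lik_var)
    post_mean = post_var * diff / lik_var
    return stats.norm.cdf(0.0, loc=post_mean, scale=np.sqrt(post_var))
```

Computed gene by gene, such probabilities can then be fed to a standard FDR-controlling procedure exactly where p-values would normally go; as prior_var grows, the probability approaches the one-sided frequentist p-value, which is the kind of p-value/posterior-probability correspondence the abstract alludes to.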
|
10 |
Controlling Type 1 Error Rate in Evaluating Differential Item Functioning for Four DIF Methods: Use of Three Procedures for Adjustment of Multiple Item Testing. Kim, Jihye. 25 October 2010 (has links)
In DIF studies, a Type I error refers to the mistake of identifying non-DIF items as DIF items, and the Type I error rate refers to the proportion of Type I errors in a simulation study. The possibility of making a Type I error in DIF studies is always present, and a high probability of making such an error can weaken the validity of the assessment. Therefore, the quality of a test assessment is related to the Type I error rate and to how that rate is controlled. Current DIF studies have found that the Type I error rate can be affected by several factors, such as test length, sample size, test group size, group mean difference, group standard deviation difference, and the underlying model. This study focused on another, largely unexamined factor that may affect the Type I error rate: the effect of multiple testing. DIF analysis conducts multiple significance tests of the items in a test, and such multiple testing may increase the possibility of making at least one Type I error. The main goal of this dissertation was to investigate how to control the Type I error rate using adjustment procedures for multiple testing that have been widely used in applied statistics but rarely in DIF studies. In the simulation study, four DIF methods were performed under a total of 36 testing conditions: the Mantel-Haenszel method, the logistic regression procedure, the Differential Functioning of Items and Tests (DFIT) framework, and Lord's chi-square test. The Bonferroni correction, Holm's procedure, and the Benjamini-Hochberg (BH) method were then applied as adjustments for multiple significance testing. The results of this study showed the effectiveness of the three adjustment procedures in controlling the Type I error rate.
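The three adjustment procedures applied in the simulation can be compared on a toy set of item-level p-values using standard library routines. The p-values below are hypothetical, chosen only to show how the procedures differ in stringency:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from item-level DIF significance tests
pvals = np.array([0.001, 0.008, 0.012, 0.03, 0.20, 0.74])

for method, label in [("bonferroni", "Bonferroni"),
                      ("holm", "Holm"),
                      ("fdr_bh", "Benjamini-Hochberg")]:
    reject, adjusted, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(f"{label}: {reject.sum()} of {len(pvals)} items flagged as DIF")
```

On this example the three procedures flag 2, 3, and 4 items respectively, reflecting their ordering in stringency: Bonferroni controls the FWER conservatively, Holm controls the same FWER uniformly more powerfully, and BH controls the less strict false discovery rate.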
|