Global ETD Search

21	Bayesian Adjustment for Multiplicity Scott, James Gordon January 2009 (has links) This thesis is about Bayesian approaches for handling multiplicity. It considers three main kinds of multiple-testing scenarios: tests of exchangeable experimental units, tests for variable inclusion in linear regression models, and tests for conditional independence in jointly normal vectors. Multiplicity adjustment in these three areas will be seen to have many common structural features. Though the modeling approach throughout is Bayesian, frequentist reasoning regarding error rates will often be employed. Chapter 1 frames the issues in the context of historical debates about Bayesian multiplicity adjustment. Chapter 2 confronts the problem of large-scale screening of functional data, where control over Type-I error rates is a crucial issue. Chapter 3 develops new theory for comparing Bayes and empirical-Bayes approaches for multiplicity correction in regression variable selection. Chapters 4 and 5 describe new theoretical and computational tools for Gaussian graphical-model selection, where multiplicity arises in performing many simultaneous tests of pairwise conditional independence. Chapter 6 introduces a new approach to sparse-signal modeling based upon local shrinkage rules. Here the focus is not on multiplicity per se, but rather on using ideas from Bayesian multiple-testing models to motivate a new class of multivariate scale-mixture priors. Finally, Chapter 7 describes some directions for future study, many of which are the subjects of my current research agenda. / Dissertation Statistics Bayesian statistics graphical models model selection multiple testing variable selection
22	Statistical Learning and Behrens Fisher Distribution Methods for Heteroscedastic Data in Microarray Analysis Manandhr-Shrestha, Nabin K. 29 March 2010 (has links) The aim of the present study is to identify the genes that are differentially expressed between two different conditions and to use them to predict the class of new samples from microarray data. Microarray data analysis poses many challenges to statisticians because of its high dimensionality and small sample size, dubbed the "small n, large p" problem. Microarray data have been studied extensively by many statisticians and geneticists. Such data are commonly assumed to follow a normal distribution with equal variances in the two conditions, but this is not true in general. Since the number of replications is very small, the sample estimates of the variances are not appropriate for testing. Therefore, we consider a Bayesian approach to approximate the variances in the two conditions. Because the number of genes to be tested is usually large and the test is repeated thousands of times, there is a multiplicity problem. To remove the defect arising from multiple comparisons, we use the False Discovery Rate (FDR) correction. When the hypothesis test is applied gene by gene to several thousand genes, there is a great chance of selecting false genes as differentially expressed, even when the significance level is set very small. For the test to be reliable, the probability of selecting true positives should be high. To control the false positive rate, we have applied the FDR correction, in which the p-value for each gene is compared with its corresponding threshold. A gene is then said to be differentially expressed if its p-value is less than the threshold. We have developed a new method of selecting informative genes based on the Bayesian version of the Behrens-Fisher distribution, which assumes unequal variances in the two conditions.
Since the assumption of equal variances fails in most situations, and equal variance is a special case of unequal variance, we have tried to solve the problem of finding differentially expressed genes in the unequal-variance case. We have found that the developed method selects the genes that are actually expressed in the simulated data, and we have compared this method with recent methods such as Fox and Dimmic’s t-test method and Tusher and Tibshirani’s SAM method, among others. The next step of this research is to check whether the genes selected by the proposed Behrens-Fisher method are useful for the classification of samples. Using the genes selected by the proposed approach, which combines the Behrens-Fisher gene selection method with some other statistical learning methods, we have found better classification results. The reason is the capability of selecting genes based on knowledge of both the prior and the data. In the case of microarray data, because of the small sample size and the large number of variables, the variance estimates obtained from the sample are not reliable, in the sense that they are not positive definite and not invertible. So, we have derived the Bayesian version of the Behrens-Fisher distribution to remove that insufficiency. The efficiency of this method has been demonstrated by applying it to three real microarray data sets and calculating the misclassification error rates on the corresponding test sets. Moreover, we have compared our results with some of the other popular methods found in the literature, such as the Nearest Shrunken Centroid and Support Vector Machine methods. We have studied the classification performance of different classifiers before and after taking the correlation between the genes into account. The classification performance of the classifiers improved significantly once the correlation was accounted for. The classification performance of the different classifiers has been measured by the misclassification rates and the confusion matrix.
Another problem in the multiple testing of a large number of hypotheses is the correlation among the test statistics. We have taken the correlation between the test statistics into account. If there were no correlation, the shape of the normalized histogram of the test statistics would not be affected. As shown by Efron, the degree of correlation among the test statistics either widens or shrinks the tails of the histogram of the test statistics. Thus the usual rejection region obtained from the significance level is not sufficient. The rejection region should be redefined accordingly, depending on the degree of correlation. The effect of the correlation on selecting the appropriate rejection region has also been studied. Genes False Discovery Rate Multiple Testing Correlation Classification American Studies Arts and Humanities Mathematics Statistics and Probability
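Illustrative note on entry 22: the abstract describes comparing each gene's p-value with its own threshold under an FDR correction, but it does not name the rule. The sketch below assumes the standard Benjamini-Hochberg step-up procedure; the function name `benjamini_hochberg` and the parameter `q` are hypothetical and are not taken from the thesis.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Assumption: the Benjamini-Hochberg step-up rule at FDR level q.
    The k-th smallest p-value is compared with the threshold k * q / m.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up: find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Reject every hypothesis whose p-value rank is at most k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Toy p-values for four genes (made-up numbers, for illustration only).
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5], q=0.05))
```

Because the rule is step-up, a gene can be declared differentially expressed even when its own p-value exceeds its own threshold, provided a larger p-value further down the sorted list meets its threshold.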
23	Mnohorozměrná statistika a aplikace na studium genů / Multidimensional statistics and applications to study genes Bubelíny, Peter January 2014 (has links) Title: Multidimensional statistics and applications to study genes Author: Mgr. Peter Bubelíny Department: Department of probability and mathematical statistics Supervisor: prof. Lev Klebanov, DrSc., KPMS MFF UK Abstract: Microarray data of gene expressions consist of thousands of genes and just some tens of observations. Moreover, genes are highly correlated among themselves and contain systematic errors. Hence the size of these data does not allow us to estimate their correlation structure. In many statistical problems with microarray data, we have to test some thousands of hypotheses simultaneously. Due to dependence between genes, the p-values of these hypotheses are dependent as well. In this work, we compared multiple testing procedures suitable for dependent hypotheses. The common way to make microarray data less correlated and partially eliminate systematic errors is to normalize them. We proposed some new normalizations and studied how different normalizations influence hypothesis testing. Moreover, we compared tests for finding differentially expressed genes or gene sets and identified some interesting properties of some tests, such as the bias of the two-sample Kolmogorov-Smirnov test and the interesting behavior of Hotelling's test for dependent components of observations. In the end of...
24	Improving the efficiency of clinical trial designs by using historical control data or adding a treatment arm to an ongoing trial Bennett, Maxine Sarah January 2018 (has links) The most common type of confirmatory trial is a randomised trial comparing the experimental treatment of interest to a control treatment. Confirmatory trials are expensive and take a lot of time in the planning, set up and recruitment of patients. Efficient methodology in clinical trial design is critical to save both time and money and allow treatments to become available to patients quickly. Often there are data available on the control treatment from a previous trial. These historical data are often used to design new trials, forming the basis of sample size calculations, but are not used in the analysis of the new trial. Incorporating historical control data into the design and analysis could potentially lead to more efficient trials. When the historical and current control data agree, incorporating historical control data could reduce the number of control patients required in the current trial and therefore the duration of the trial, or increase the precision of parameter estimates. However, when the historical and current data are inconsistent, there is a potential for biased treatment effect estimates, inflated type I error and reduced power. We propose two novel weights to assess agreement between the current and historical control data: a probability weight based on tail area probabilities; and a weight based on the equivalence of the historical and current control data parameters. For binary outcome data, agreement is assessed using the posterior distributions of the response probability in the historical and current control data. For normally distributed outcome data, agreement is assessed using the marginal posterior distributions of the difference in means and the ratio of the variances of the current and historical control data. 
We consider an adaptive design with an interim analysis. At the interim, the agreement between the historical and current control data is assessed using the probability weight or equivalence probability weight approach. The allocation ratio is adapted to randomise fewer patients to control when there is agreement and to revert to a standard trial design when there is disagreement. The final analysis is Bayesian, utilising the analysis approach of the power prior with a fixed weight. The operating characteristics of the proposed design are explored and we show how the equivalence bounds can be chosen at the design stage of the current study to control the maximum inflation in type I error. We then consider a design where a treatment arm is added to an ongoing clinical trial. For many disease areas, there are often treatments in different stages of the development process. We consider the design of a two-arm parallel group trial where it is planned to add a new treatment arm during the trial. This could potentially save money, patients, time and resources. The addition of a treatment arm creates a multiple comparison problem. Dunnett (1955) proposed a design that controls the family-wise error rate when comparing multiple experimental treatments to control and determined the optimal allocation ratio. We have calculated the correlation between test statistics for the method proposed by Dunnett when a treatment arm is added during the trial and only concurrent controls are used for each treatment comparison. We propose an adaptive design where the sample sizes of all treatment arms are increased to control the family-wise error rate. We explore adapting the allocation ratio once the new treatment arm is added to maximise the overall power of the trial.
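Illustrative note on entry 24: the abstract mentions a final Bayesian analysis using a power prior with a fixed weight. For a binary outcome this has a simple closed form if one assumes a conjugate Beta(a, b) initial prior, which the abstract does not specify. The sketch below is written under that assumption; the function names and the example counts are hypothetical.

```python
def power_prior_posterior(y, n, y0, n0, weight, a=1.0, b=1.0):
    """Posterior Beta parameters for a control response probability.

    Assumptions (not from the thesis): a Beta(a, b) initial prior and a
    binomial likelihood. The historical likelihood (y0 responders out of
    n0) is raised to the fixed power `weight` in [0, 1], so the historical
    data count as weight * n0 pseudo-observations.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    alpha_post = a + weight * y0 + y
    beta_post = b + weight * (n0 - y0) + (n - y)
    return alpha_post, beta_post


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


if __name__ == "__main__":
    # Made-up counts: 12/40 current responders, 30/100 historical responders.
    for w in (0.0, 0.5, 1.0):
        alpha, beta = power_prior_posterior(12, 40, 30, 100, weight=w)
        print(w, alpha, beta, round(beta_mean(alpha, beta), 4))
```

A weight of 0 discards the historical controls and a weight of 1 pools them fully with the current controls; values in between discount the historical data.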
25	Estimating the Local False Discovery Rate via a Bootstrap Solution to the Reference Class Problem: Application to Genetic Association Data Abbas Aghababazadeh, Farnoosh January 2015 (has links) Modern scientific technologies such as microarrays, imaging devices, genome-wide association studies or social science surveys provide statisticians with hundreds or even thousands of tests to consider simultaneously. Testing many thousands of null hypotheses may increase the number of Type I errors. In large-scale hypothesis testing, researchers can use different statistical techniques such as family-wise error rates, false discovery rates, permutation methods, and the local false discovery rate, where all available data usually should be analyzed together. In applications, the thousands of tests are related by a scientifically meaningful structure. Ignoring that structure can be misleading, as it may increase the number of false positives and false negatives. As an example, in genome-wide association studies each test corresponds to a specific genetic marker. In such a case, the scientific structure for each genetic marker can be its minor allele frequency. In this research, the local false discovery rate is considered as a relevant statistical approach for analyzing the thousands of tests together. We present a model for multiple hypothesis testing in which the scientific structure of each test is incorporated as a co-variate. The purpose of this model is to use the co-variate to improve the performance of testing procedures. The method we consider has different estimates depending on the tuning parameter. We would like to estimate the optimal value of that parameter by considering observed statistics. Thus, among those estimators, the one which minimizes the estimated errors due to bias and to variance is chosen by applying the bootstrap approach. Such an estimation method is called an adaptive reference class method.
Under the combined reference class method, the effect of the co-variates is ignored and all null hypotheses are analyzed together. In this research, under some assumptions for the co-variates and the prior probabilities, the proposed adaptive reference class method shows smaller error than the combined reference class method in estimating the local false discovery rate when the number of tests gets large. We apply the adaptive reference class method to the coronary artery disease data, and we use simulated data to evaluate the performance of the estimator associated with the adaptive reference class method. Multiple Testing Local False Discovery Rate Reference Class Bias-variance Trade Off Tuning Parameter Bootstrap Approach
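26	A Geometry-Based Multiple Testing Correction for Contingency Tables by Truncated Normal Distribution / 切断正規分布を用いた分割表の幾何学的マルチプルテスティング補正法 Basak, Tapati 24 May 2021 (has links) Kyoto University / Doctorate by coursework (new system) / Doctor of Medical Science / Degree No. 甲第23367号 / Medical Doctorate No. 医博第4736号 / 新制\|\|医\|\|1051 (University Library) / Department of Medicine, Graduate School of Medicine, Kyoto University / Chief examiner: Prof. 森田智視; Prof. 川上浩司; Prof. 佐藤俊哉 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Medical Science / Kyoto University / DFAM Contingency table Convex polytope MAX-test Multiple testing Type I error Truncated normal distribution 490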
27	Párová porovnání v analýze jednoduchého třídění / Paired comparisons in ANOVA Hrušková, Iveta January 2022 (has links) The problem of testing multiple hypotheses at once is called the problem of multiple testing. We focused on comparing more than two means in one-way analysis of variance, also known as ANOVA. We dealt with the Tukey method, the Hothorn-Bretz-Westfall method, bootstrap-based methods, and also the Bonferroni method and its modification, the Holm method, the last two being popular mainly for their simplicity. We focused in detail on the asymptotic behavior of these methods and then compared them using simulations in terms of compliance with the prescribed level and in terms of average power. Bonferroni's method, which is conservative, is known to lose power compared to other methods. However, its modification, Holm's method, which is also conservative, in some cases matches the power of other, more complex methods. 1
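Illustrative note on entry 27: the abstract contrasts the Bonferroni method with its Holm modification. The sketch below shows the two textbook rules side by side on raw p-values; it is not the thesis's own code, and the function names and toy p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (single-step rule)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm's step-down modification of the Bonferroni rule.

    The smallest p-value is compared with alpha / m, the next one with
    alpha / (m - 1), and so on; testing stops at the first non-rejection.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for step, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - step):
            rejected[idx] = True
        else:
            break  # all remaining (larger) p-values are retained
    return rejected


if __name__ == "__main__":
    # Made-up p-values for four pairwise comparisons.
    p = [0.01, 0.015, 0.03, 0.5]
    print("Bonferroni:", bonferroni(p))
    print("Holm:      ", holm(p))
```

Holm's thresholds are never smaller than Bonferroni's, so Holm rejects everything Bonferroni rejects and sometimes more, while both control the family-wise error rate at level alpha.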
28	Statistical Methods for Biological and Relational Data Anderson, Sarah G. 12 July 2013 (has links) No description available. Biostatistics gene expression T-cell receptors classification multiple testing relational data social networking
29	FURTHER CONTRIBUTIONS TO MULTIPLE TESTING METHODOLOGIES FOR CONTROLLING THE FALSE DISCOVERY RATE UNDER DEPENDENCE Zhang, Shiyu, 0000-0001-8921-2453 12 1900 (has links) This thesis presents innovative approaches for controlling the False Discovery Rate (FDR) in both high-dimensional statistical inference and finite-sample cases, addressing challenges arising from various dependency structures in the data. The first project introduces novel multiple testing methods for matrix-valued data, motivated by an electroencephalography (EEG) experiment, where we model the inherent complex row-column cross-dependency using a matrix normal distribution. We proposed two methods designed for structured matrix-valued data, to approximate the true FDP that captures the underlying cross-dependency with statistical accuracy. In the second project, we focus on simultaneous testing of multivariate normal means under diverse covariance matrix structures. By adjusting p-values using a BH-type step-up procedure tailored to the known correlation matrix, we achieve robust finite-sample FDR control. Both projects demonstrate superior performance through extensive numerical studies and real-data applications, significantly advancing the field of multiple testing under dependency. The third project presented exploratory simulation results to demonstrate the methods constructed based on the paired-p-values framework that controls the FDR within the multivariate normal means testing framework. / Statistics Statistics Dependence False discovery proportion False discovery rate Matrix-valued data Multiple testing
30	Statistical Analysis of Microarray Experiments in Pharmacogenomics Rao, Youlan 09 September 2009 (has links) No description available. Statistics Microarray Normalization Multiple Testing Generalized Familywise Error Rate Bootstrap Sample Size Sensitivity Specificity Pharmacogenomics
