About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

AN APPROACH FOR FINDING A GENERAL APPROXIMATION TO THE GROUP SEQUENTIAL BOOTSTRAP TEST

Ekstedt, Douglas January 2022 (has links)
Randomized experiments are regarded as the gold standard for estimating causal effects. Commonly, a single test is performed using a fixed sample size. However, observations may also arrive sequentially, and for economic and ethical reasons it may be desirable to terminate the trial early. The group sequential design allows for interim analyses and early stopping of a trial without the need for continuous monitoring of the accumulating data. Implementing a group sequential procedure requires that the test statistic observed at each wave of testing have a known or asymptotically known sampling distribution. This thesis investigates an approach for finding a general approximation to the group sequential bootstrap test for test statistics with unknown or analytically intractable sampling distributions; no bootstrap version of the group sequential test currently exists. The approach approximates the covariance structure of the test statistics over time, but not their marginal sampling distribution, with that of a normal test statistic. The evaluation is performed with a Monte Carlo simulation study in which the achieved significance level is compared to the nominal level. Evidence from the Monte Carlo simulations suggests that the approach performs well for test statistics whose sampling distributions are close to a normal distribution.
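The abstract above evaluates a group sequential procedure by comparing the achieved significance level to the nominal one via Monte Carlo simulation. As a minimal sketch of that evaluation idea in the baseline case of a normal test statistic (the case the bootstrap approximation is benchmarked against), the following simulates a two-look z-test under the null with a constant (Pocock-style) boundary; the function name, boundary value, and all defaults are illustrative assumptions, not the thesis's actual settings.

```python
import numpy as np

def group_sequential_type1(n_waves=2, n_per_wave=100, crit=2.178,
                           n_sim=20000, seed=0):
    """Monte Carlo estimate of the achieved significance level of a
    two-sided group sequential z-test under H0.  The constant boundary
    `crit` approximates the two-stage Pocock critical value; all
    defaults are illustrative, not taken from the thesis."""
    rng = np.random.default_rng(seed)
    # Per-wave sums of i.i.d. N(0, 1) observations for every simulated trial.
    wave_sums = rng.standard_normal((n_sim, n_waves, n_per_wave)).sum(axis=2)
    cum_sums = np.cumsum(wave_sums, axis=1)
    n_cum = n_per_wave * np.arange(1, n_waves + 1)
    z = cum_sums / np.sqrt(n_cum)          # z-statistic at each interim look
    # A trial rejects H0 if the boundary is crossed at any look.
    return (np.abs(z) > crit).any(axis=1).mean()
```

With a correctly calibrated boundary, the returned achieved level should sit near the nominal 0.05; the same comparison, applied to a bootstrapped statistic, is the evaluation the thesis carries out.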
2

Statistical methods to identify differentially methylated regions using illumina methylation arrays

Zheng, Yuanchao 08 February 2024 (has links)
DNA methylation is an epigenetic mechanism that usually occurs at CpG sites in the genome. Both sequencing and array-based techniques are available to detect methylation patterns. Whole-genome bisulfite sequencing is the most comprehensive but cost-prohibitive approach; array-based methods such as Illumina methylation arrays are a cheaper alternative but assay a fixed set of genomic loci. Differentially methylated regions (DMRs) are genomic regions whose methylation patterns across multiple CpG sites are associated with a phenotype. Because methylation at nearby sites tends to be correlated, testing sets of sites can be more powerful than testing individual sites and also reduces the multiple testing burden. Several statistical approaches exist for identifying DMRs, and a few prior publications have compared the performance of commonly used DMR methods. However, as far as we know, no comprehensive comparisons have been made based on genome-wide simulation studies. This dissertation provides comprehensive recommendations for DMR analysis based on genome-wide evaluations of existing DMR tools and presents a novel approach that increases the power to identify DMRs with clinical value in genomic research. The second chapter presents genome-wide null simulations comparing five commonly used array-based DMR methods (Bumphunter, comb-p, DMRcate, mCSEA and coMethDMR) and identifies coMethDMR as the only approach that consistently yields appropriate Type I error control. We suggest that a genome-wide evaluation of false positive (FP) rates is critical for DMR methods. The third chapter develops a novel Principal Component Analysis based DMR method (denoted DMRPC), which demonstrates the ability to identify DMRs from genome-wide methylation arrays with FP rates well controlled at the 0.05 level. Compared to coMethDMR, DMRPC is a robust and powerful new DMR tool that can examine more genomic regions and extract signals from low-correlation regions. The fourth chapter applies DMRPC to two “real-world” datasets and identifies novel DMRs associated with several inflammatory markers.
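The abstract describes collapsing correlated CpG sites in a region onto principal components before testing for association with a phenotype. The sketch below illustrates that general idea only: it summarizes a region by its first principal component and compares PC scores between two phenotype groups with a Welch t-statistic. The function name and all details are assumptions for illustration; the actual DMRPC procedure is defined in the dissertation itself.

```python
import numpy as np

def region_pc_tstat(methylation, phenotype):
    """Collapse a region's correlated CpG sites (columns of a samples-by-sites
    matrix) onto their first principal component, then compare PC scores
    between two phenotype groups with a Welch t-statistic.  A generic sketch
    of a PCA-based region test, not the DMRPC algorithm."""
    X = methylation - methylation.mean(axis=0)      # center each CpG site
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    pc1 = X @ vt[0]                                 # first-PC score per sample
    a, b = pc1[phenotype == 0], pc1[phenotype == 1]
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se               # sign is arbitrary (PC sign)
```

Because nearby sites load together on the first component, a shared methylation shift concentrates into a single score per sample, which is the source of the power gain over site-by-site testing that the abstract mentions.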
3

Controlling false positive rate in network analysis of transcriptomic data

Xu, Huan 01 October 2019 (has links)
No description available.
4

Robust estimation for spatial models and the skill test for disease diagnosis

Lin, Shu-Chuan 25 August 2008 (has links)
This thesis focuses on (1) statistical methodologies for the estimation of spatial data with outliers and (2) classification accuracy in disease diagnosis. Chapter I, Robust Estimation for Spatial Markov Random Field Models: Markov Random Field (MRF) models are useful in analyzing spatial lattice data collected from semiconductor device fabrication and printed circuit board manufacturing processes or agricultural field trials. When outliers are present in the data, classical parameter estimation techniques (e.g., least squares) can be inefficient and potentially mislead the analyst. This chapter extends the MRF model to accommodate outliers and proposes robust parameter estimation methods, such as the robust M- and RA-estimates. Asymptotic distributions of the estimates with differentiable and non-differentiable robustifying functions are derived. Extensive simulation studies explore the robustness properties of the proposed methods under various amounts of outliers in different patterns. Analyses of grid data with and without edge information are also provided. Three data sets taken from the literature illustrate the advantages of the methods. Chapter II, Extending the Skill Test for Disease Diagnosis: For diagnostic tests, we present an extension of the skill plot introduced by Mozer and Briggs (2003). The method is motivated by a study of diagnostic measures for osteoporosis. By restricting the area under the ROC curve (AUC) according to the skill statistic, we obtain a diagnostic test better suited to practical applications because it accounts for misclassification costs. We also construct relationships between the diseased and healthy groups, using the Koziol-Green model and the mean-shift model, to improve the skill statistic. Asymptotic properties of the skill statistic are provided. Simulation studies compare the theoretical results with the estimates under various disease rates and misclassification costs. We apply the proposed method to the classification of osteoporosis data.
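The abstract's Chapter I contrasts least squares with robust M-estimation when outliers contaminate the data. As a one-dimensional analogue of that idea (the thesis applies it to spatial MRF parameters, which is substantially more involved), the sketch below computes a Huber M-estimate of location by iteratively reweighted least squares; the tuning constant 1.345 is the conventional choice for 95% efficiency at the normal, and everything else is an illustrative assumption.

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted least squares.
    A 1-D illustration of robust M-estimation; the thesis develops the
    spatial-MRF version."""
    mu = np.median(x)
    scale = np.median(np.abs(x - mu)) / 0.6745   # MAD scale estimate
    if scale == 0:
        scale = 1.0
    for _ in range(max_iter):
        r = (x - mu) / scale
        # Huber weights: 1 inside [-c, c], downweighted outside
        w = np.where(np.abs(r) <= c, 1.0, c / np.abs(r))
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu
```

On data with a few gross outliers, this estimate stays near the bulk of the sample while the ordinary mean is dragged toward the outliers, which is exactly the inefficiency of least squares the chapter addresses.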
5

A performance measurement of a Speaker Verification system based on a variance in data collection for Gaussian Mixture Model and Universal Background Model

Bekli, Zeid, Ouda, William January 2018 (has links)
Voice recognition has become a more focused and researched field in the last century, and new techniques to identify speech have been introduced. One part of voice recognition is speaker verification, which is divided into a front-end and a back-end. The first component is the front-end, or feature extraction, where techniques such as Mel-Frequency Cepstrum Coefficients (MFCC) are used to extract the speaker-specific features of a speech signal. MFCC is widely used because it is based on the known variation of the human ear's critical frequency bandwidth. The second component is the back-end, which handles speaker modeling. The back-end is based on the Gaussian Mixture Model (GMM) and Gaussian Mixture Model-Universal Background Model (GMM-UBM) methods for enrollment and verification of a specific speaker. In addition, normalization techniques such as Cepstral Mean Subtraction (CMS) and feature warping are used for robustness against noise and distortion. In this paper, we build a speaker verification system, experiment with varying amounts of training data for the true speaker model, and evaluate the system's performance. We further investigate security in a speaker verification system by comparing two methods (GMM and GMM-UBM) to determine which is more secure depending on the amount of training data available. This research therefore contributes an answer to how much data is really necessary for a secure system in which the false positive rate is as close to zero as possible, how the amount of training data affects the false negative (FN) rate, and how this differs between GMM and GMM-UBM. The results show that an increase in speaker-specific training data increases the performance of the system. However, too much training data proved unnecessary, because the performance of the system eventually reaches its highest point, in this case at around 48 minutes of data; the results also show that the GMM-UBM models trained on 48 to 60 minutes of data outperformed the GMM models.
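The GMM-UBM back-end described above accepts or rejects a claimed identity by comparing the likelihood of the test features under the speaker model against the universal background model. The sketch below shows that scoring step only (not MFCC extraction or EM training), using diagonal-covariance GMMs with given parameters; the function names, single-component models, and test vectors are illustrative assumptions, not the paper's system.

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of feature vectors X (frames x dims)
    under a diagonal-covariance GMM with the given parameters."""
    # Squared Mahalanobis-style terms for every frame/component pair.
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]
    logpdf = -0.5 * (diff2 + np.log(2 * np.pi * variances)[None, :, :]).sum(-1)
    # log-sum-exp over mixture components, then mean over frames.
    return np.logaddexp.reduce(np.log(weights)[None, :] + logpdf, axis=1).mean()

def verification_score(X, speaker, ubm):
    """GMM-UBM log-likelihood-ratio score: positive values favour the
    claimed speaker, negative values favour the background model."""
    return gmm_loglik(X, *speaker) - gmm_loglik(X, *ubm)
```

The verification decision is then a threshold on this score; moving the threshold trades the false positive rate against the false negative rate, which is the trade-off the paper measures as training data varies.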
