Global ETD Search

71	Multiple hypothesis testing and multiple outlier identification methods. Yin, Yaling. 13 April 2010.
Traditional multiple hypothesis testing procedures, such as that of Benjamini and Hochberg, fix an error rate and determine the corresponding rejection region. In 2002 Storey proposed a fixed rejection region procedure and showed numerically that it can gain more power than the fixed error rate procedure of Benjamini and Hochberg while controlling the same false discovery rate (FDR). In this thesis it is proved that when the number of alternatives is small compared to the total number of hypotheses, Storey's method can be less powerful than that of Benjamini and Hochberg. Moreover, the two procedures are compared by setting them to produce the same FDR. The difference in power between Storey's procedure and that of Benjamini and Hochberg is near zero when the distance between the null and alternative distributions is large, but Benjamini and Hochberg's procedure becomes more powerful as the distance decreases. It is shown that modifying the Benjamini and Hochberg procedure to incorporate an estimate of the proportion of true null hypotheses as proposed by Black gives a procedure with superior power.
Multiple hypothesis testing can also be applied to regression diagnostics. In this thesis, a Bayesian method is proposed to test multiple hypotheses, of which the i-th null and alternative hypotheses are that the i-th observation is not an outlier versus that it is, for i=1,...,m. In the proposed Bayesian model, it is assumed that outliers have a mean shift, where the proportion of outliers and the mean shift follow a Beta prior distribution and a normal prior distribution, respectively. It is proved in the thesis that for the proposed model, when there exists more than one outlier, the marginal distributions of the deletion residual of the i-th observation under both null and alternative hypotheses are doubly noncentral t distributions.
The outlyingness of the i-th observation is measured by the marginal posterior probability that the i-th observation is an outlier given its deletion residual. An importance sampling method is proposed to calculate this probability. This method requires the computation of the density of the doubly noncentral F distribution, which is approximated using Patnaik's approximation. An algorithm is proposed in this thesis to examine the accuracy of Patnaik's approximation. The comparison of this algorithm's output with Patnaik's approximation shows that the latter can save massive computation time without losing much accuracy.
The proposed Bayesian multiple outlier identification procedure is applied to some simulated data sets. Various simulation and prior parameters are used to study the sensitivity of the posteriors to the priors. The area under the ROC curve (AUC) is calculated for each combination of parameters. A factorial design analysis on AUC is carried out by choosing various simulation and prior parameters as factors. The resulting AUC values are high for various selected parameters, indicating that the proposed method can identify the majority of outliers within tolerable errors. The results of the factorial design show that the priors do not have much effect on the marginal posterior probability as long as the sample size is not too small.
In this thesis, the proposed Bayesian procedure is also applied to a real data set obtained by Kanduc et al. in 2008. The proteomes of thirty viruses examined by Kanduc et al. are found to share a high number of pentapeptide overlaps with the human proteome. In a linear regression analysis of the level of viral overlaps with the human proteome and the length of the viral proteome, it is reported by Kanduc et al. that among the thirty viruses, human T-lymphotropic virus 1, Rubella virus, and hepatitis C virus present relatively higher levels of overlaps with the human proteome than the predicted level of overlaps.
The results obtained using the proposed procedure indicate that the four viruses with extremely large sizes (Human herpesvirus 4, Human herpesvirus 6, Variola virus, and Human herpesvirus 5) are more likely to be the outliers than the three reported viruses. The results with the four extreme viruses deleted confirm the claim of Kanduc et al.
mean shift; noncentrality parameter; area under ROC curve; receiver operating characteristic; false discovery rate; microarray; doubly noncentral t distribution; pentapeptide; amino acid sequence similarity
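The fixed error rate and fixed rejection region approaches contrasted in entry 71 can be sketched as follows. This is a minimal illustration, not code from the thesis: `benjamini_hochberg` implements the standard Benjamini and Hochberg step-up rule, and `storey_fdr_estimate` follows the usual form of Storey's (2002) point estimate of the FDR for a fixed rejection region [0, t]. The function names and the default tuning parameter `lam=0.5` are assumptions made for the example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Fixed error rate: return True for each hypothesis rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def storey_fdr_estimate(p_values, t, lam=0.5):
    """Fixed rejection region: estimate the FDR when rejecting every p-value <= t.

    lam is a tuning parameter (an assumed default) used to estimate the
    proportion of true null hypotheses from the p-values above lam.
    """
    m = len(p_values)
    pi0_hat = sum(1 for p in p_values if p > lam) / ((1.0 - lam) * m)
    n_rejected = sum(1 for p in p_values if p <= t)
    return min(1.0, pi0_hat * m * t / max(n_rejected, 1))
```

For example, `benjamini_hochberg([0.02, 0.03, 0.035, 0.04], alpha=0.05)` rejects all four hypotheses, because the largest p-value already falls below its own threshold and the step-up rule then rejects everything ranked below it.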
72	Empirical likelihood and extremes. Gong, Yun. 17 January 2012.
In 1988, Owen introduced empirical likelihood as a nonparametric method for constructing confidence intervals and regions. Since then, empirical likelihood has been studied extensively in the literature due to its generality and effectiveness. It is well known that empirical likelihood has several attractive advantages compared with competitors such as the bootstrap: determining the shape of confidence regions automatically using only the data; straightforwardly incorporating side information expressed through constraints; and being Bartlett correctable. The main part of this thesis extends the empirical likelihood method to several interesting and important statistical inference situations. This thesis has four components. The first component (Chapter II) proposes a smoothed jackknife empirical likelihood method to construct confidence intervals for the receiver operating characteristic (ROC) curve in order to overcome the computational difficulty that arises when the maximization problem has nonlinear constraints. The second component (Chapters III and IV) proposes smoothed empirical likelihood methods to obtain interval estimation for the conditional Value-at-Risk with the volatility model being an ARCH/GARCH model and a nonparametric regression, respectively, which have applications in financial risk management. The third component (Chapter V) derives the empirical likelihood for the intermediate quantiles, which play an important role in the statistics of extremes. Finally, the fourth component (Chapters VI and VII) presents two additional results: in Chapter VI, we show that, when the third moment is infinite, we may prefer the Student's t-statistic to the sample mean standardized by the true standard deviation; in Chapter VII, we present a method for testing a subset of parameters for a given parametric model of stationary processes.
Diffusion processes; Nonparametric likelihood; GARCH; ROC curve; Value at Risk; Empirical likelihood; Extremal problems (Mathematics); Golden section; Calculus of variations; Bootstrap (Statistics); Mathematical statistics
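As a rough illustration of Owen's empirical likelihood mentioned in entry 72, the sketch below computes the empirical log-likelihood ratio statistic for a mean. It covers only the basic i.i.d. case, not the smoothed or jackknife variants the thesis develops. The function name `el_statistic_for_mean` and the choice of a bisection solver are assumptions made for the example.

```python
import math


def el_statistic_for_mean(x, mu):
    """Return -2 log R(mu), the empirical likelihood ratio statistic for the mean.

    The Lagrange multiplier lam solves sum(z_i / (1 + lam * z_i)) = 0 with
    z_i = x_i - mu, and the statistic is 2 * sum(log(1 + lam * z_i)).
    """
    z = [xi - mu for xi in x]
    z_min, z_max = min(z), max(z)
    # mu must lie strictly inside the range of the data, otherwise R(mu) = 0.
    if not (z_min < 0.0 < z_max):
        return math.inf
    # lam must keep every weight positive: 1 + lam * z_i > 0 for all i.
    lo, hi = -1.0 / z_max, -1.0 / z_min
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps

    def score(lam):
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # score() is strictly decreasing in lam, so bisection finds its root.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)
```

Under standard conditions the statistic is asymptotically chi-square with one degree of freedom, so the values of `mu` whose statistic stays below about 3.84 would form an approximate 95% confidence interval.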
73	Nichtparametrische Analyse diagnostischer Gütemaße bei Clusterdaten / Nonparametric analysis of diagnostic accuracy measurements regarding clustered data. Lange, Katharina. 04 March 2011.
No description available.
Diagnostic trials; ROC curve; AUC; sensitivity; specificity; predictive values; clustered data; nonparametric Behrens-Fisher problem
74	Nichtparametrische Analyse von diagnostischen Tests / Nonparametric Analysis of diagnostic trials. Werner, Carola. 07 July 2006.
No description available.
310 Statistics; EGC070; Mathematics and Natural Science; nonparametric; diagnostic trials; ROC curve; AUC; clustered data; 44.32; 31.73
75	A Generalization of AUC to an Ordered Multi-Class Diagnosis and Application to Longitudinal Data Analysis on Intellectual Outcome in Pediatric Brain-Tumor Patients. Li, Yi. 10 April 2009.
Receiver operating characteristic (ROC) curves have been widely used to evaluate the goodness of diagnostic methods in many fields of study, such as disease diagnosis in medicine. The area under the ROC curve (AUC) naturally became one of the most used measures for gauging the goodness of a diagnosis (Mossman, Somoza 1991). Since medical diagnosis is often not dichotomous, the ROC curve and AUC need to be generalized to a multi-dimensional case. The generalization of AUC to the multi-class case has been studied by many researchers in the past decade. Most recently, Nakas & Yiannoutsos (2004) considered ROC analysis for d ordered classes using only the sensitivities of each class; hence, their dimension is only d. Cha (2005) considered more types of misclassification in the ordered multiple-class case, but reduced the dimension of Ferri et al. from d(d-1) to 2(d-1). In this dissertation we adjust and calculate the volume under the surface (VUS) for ordered multiple classes with Cha's 2(d-1)-dimension method. Our methodology for finding the VUS is introduced. We present the method of adjusting and calculating the VUS and its statistical inference for the 2(d-1)-dimension case. Some simulation results are included and a real example is presented.
Intellectual outcomes in pediatric brain-tumor patients were investigated in a prospective longitudinal study. The Stanford-Binet Intelligence Scale-Fourth Edition (SB-IV) Standard Age Score (SAS) and Composite intelligence quotient (IQ) score are examined as cognitive outcomes in pediatric brain-tumor patients. Treatment factors, patient factors and time since diagnosis are taken into account as the risk factors.
Hierarchical linear/quadratic models and Gompertz-based hierarchical nonlinear growth models were applied to build linear and nonlinear longitudinal curves. We use PRESS and Volume Under the Surface (VUS) as the criteria to compare these two methods. Some model interpretations are presented in this dissertation.
Pediatric brain tumors; PRESS; Gompertz growth model; Hierarchical linear model; Stanford-Binet Intelligence; Longitudinal data; ROC curve; Volume under the surface (VUS); Mathematics
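To make the idea of generalizing AUC to ordered classes in entry 75 concrete, the sketch below computes the simplest nonparametric VUS, namely the fraction of tuples (one score drawn from each class) that appear in the correct order. This corresponds to the sensitivity-only generalization attributed above to Nakas & Yiannoutsos, not to Cha's 2(d-1)-dimension method used in the dissertation. The function name `ordered_vus` is an assumption made for the example, and ties are simply counted as incorrect.

```python
from itertools import product


def ordered_vus(*classes):
    """Fraction of tuples, one score per class, that are strictly increasing.

    classes are lists of scores given in the assumed order of the classes
    (for example: healthy, mild, severe).
    """
    total = 1
    for scores in classes:
        total *= len(scores)
    # Count the tuples whose scores increase from each class to the next.
    hits = sum(
        1
        for tup in product(*classes)
        if all(a < b for a, b in zip(tup, tup[1:]))
    )
    return hits / total
```

With two classes and no ties this reduces to the usual empirical AUC. For three classes a marker with no discriminating ability gives a value near 1/6 rather than 1/2, since only one of the six possible orderings is correct.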
76	Some Topics in Roc Curves Analysis. Huang, Xin. 07 May 2011.
The receiver operating characteristic (ROC) curve is a popular tool for evaluating continuous diagnostic tests. The traditional definition of ROC curves implicitly incorporates the idea of "hard" thresholding, which also results in the empirical curves being step functions. The first topic is to introduce a novel definition of soft ROC curves, which incorporates the idea of "soft" thresholding. The softness of a soft ROC curve is controlled by a regularization parameter that can be selected suitably by a cross-validation procedure. A byproduct of the soft ROC curves is that the corresponding empirical curves are smooth. The second topic is the combination of several diagnostic tests to achieve better diagnostic accuracy. We consider the optimal linear combination that maximizes the area under the receiver operating characteristic curve (AUC); the estimates of the combination's coefficients can be obtained via a nonparametric procedure. However, for estimating the AUC associated with the estimated coefficients, the apparent estimate obtained by re-substitution is too optimistic. To adjust for the upward bias, several methods are proposed. Among them the cross-validation approach is especially advocated, and an approximate cross-validation is developed to reduce the computational cost. Furthermore, these proposed methods can be applied for variable selection to select important diagnostic tests. However, the above best-subset variable selection method is not practical when the number of diagnostic tests is large. The third topic is to further develop a LASSO-type procedure for variable selection. To solve the non-convex maximization problem in the proposed procedure, an efficient algorithm is developed based on soft ROC curves, difference convex programming, and a coordinate descent algorithm.
Area under curve; Coordinate descent; Cross-validation; Difference convex programming; Diagnostic test; Over-fitting; Regularization; ROC curve; Thresholding; Variable selection; Mathematics
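The contrast between "hard" and "soft" thresholding in entry 76 can be illustrated with the empirical AUC. In the sketch below, `empirical_auc` is the standard Mann-Whitney estimate built from hard indicator comparisons, while `soft_auc` replaces each indicator with a sigmoid. The sigmoid surrogate and the bandwidth parameter `h` are assumptions chosen for illustration and are not claimed to match the thesis's definition of soft ROC curves.

```python
import math


def empirical_auc(diseased, healthy):
    """Mann-Whitney estimate of P(diseased score > healthy score); ties count 1/2."""
    total = 0.0
    for x in diseased:
        for y in healthy:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(diseased) * len(healthy))


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_auc(diseased, healthy, h=1.0):
    """Smooth surrogate of the AUC: each indicator becomes sigmoid((x - y) / h).

    h is an assumed smoothing parameter; as h shrinks toward zero the value
    approaches empirical_auc.
    """
    total = sum(_sigmoid((x - y) / h) for x in diseased for y in healthy)
    return total / (len(diseased) * len(healthy))
```

Because `soft_auc` is differentiable in the scores, a surrogate of this kind can be optimized with gradient-based or coordinate-wise methods, whereas the hard version is piecewise constant.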
77	Infrared Spectroscopy in Combination with Advanced Statistical Methods for Distinguishing Viral Infected Biological Cells. Tang, Tian. 17 November 2008.
Fourier Transform Infrared (FTIR) microscopy is a sensitive method for detecting differences in the morphology of biological cells. In this study FTIR spectra were obtained for uninfected cells and for cells infected with two different viruses. The spectra obtained are difficult to discriminate visually. Here we apply advanced statistical methods to the analysis of the spectra, to test whether such spectra are useful for diagnosing viral infections in cells. Logistic Regression (LR) and Partial Least Squares Regression (PLSR) were used to build models that allow us to diagnose whether spectral differences are related to the infection state of the cells. A three-fold, balanced cross-validation method was applied to estimate the shrinkages of the area under the receiver operating characteristic curve (AUC), and of the specificities at sensitivities of 95%, 90% and 80%. AUC, sensitivity and specificity were used to gauge the goodness of the discrimination methods. Our statistical results show that the spectra associated with different cellular states are very effectively discriminated. We also find that the overall performance of PLSR is better than that of LR, especially for new data validation. Our analysis supports the idea that FTIR microscopy is a useful tool for detection of viral infections in biological cells.
Cross-validation; Wilcoxon Rank Sum Test; Logistic Regression; Partial Least Square Regression; Area under the ROC Curve; Sensitivity and specificity; Infrared spectroscopy; Mathematics
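The phrase "specificities at sensitivities of 95%, 90% and 80%" in entry 77 refers to fixing the sensitivity first and then reading off the specificity at the matching cutoff. A minimal sketch of that calculation follows. It assumes that higher scores indicate infection and computes only the apparent (resubstitution) value, without the cross-validated shrinkage estimated in the thesis. The function name `specificity_at_sensitivity` is hypothetical.

```python
import math


def specificity_at_sensitivity(diseased, healthy, target):
    """Return (threshold, specificity) for the rule 'positive if score >= threshold'.

    The threshold is the highest cutoff whose sensitivity on the diseased
    scores is at least target (a fraction such as 0.95).
    """
    n = len(diseased)
    # Number of diseased cases that must be flagged (small tolerance for rounding).
    k = max(1, math.ceil(target * n - 1e-9))
    threshold = sorted(diseased, reverse=True)[k - 1]
    # Specificity: share of healthy cases that fall below the cutoff.
    specificity = sum(1 for y in healthy if y < threshold) / len(healthy)
    return threshold, specificity
```

For instance, with diseased scores `[5, 6, 7, 8]`, healthy scores `[1, 2, 5.5, 9]` and a target sensitivity of 0.75, the cutoff is 6 and three of the four healthy scores fall below it.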
78	Advanced Statistical Methodologies in Determining the Observation Time to Discriminate Viruses Using FTIR. Luo, Shan. 13 July 2009.
Fourier transform infrared (FTIR) spectroscopy, a method that uses electromagnetic radiation to detect specific cellular molecular structure, can be used to discriminate different types of cells. The objective is to find the minimum time (chosen among 2, 4 and 6 hours) at which to record FTIR readings such that different viruses can be discriminated. A new method is adopted for the datasets. Briefly, inner differences are created as the control group, and the Wilcoxon Signed Rank Test is used as the first variable-selection step to prepare for the next stage of discrimination. In the second stage we propose using either the partial least squares (PLS) method or simply the significant differences as the discriminator. Finally, a k-fold cross-validation method is used to estimate the shrinkages of the goodness measures, such as sensitivity, specificity and area under the ROC curve (AUC). We have no doubt that 6 hours is enough to discriminate mock from HSV-1 and Coxsackie viruses. Adenovirus is an exception.
Bootstrap method; K-fold Cross-Validation; Shrinkage; Sensitivity; Specificity; Inner-difference; Intra-difference; Wilcoxon Signed-Rank Test; Partial Least Square Regression; Area Under the ROC Curve; Mathematics
79	L'évaluation du risque de récidive chez les agresseurs sexuels adultes [Risk assessment of recidivism among adult sexual offenders]. Parent, Geneviève. January 2008.
Thesis digitized by the Records Management and Archives Division of the Université de Montréal.
Sexual delinquency; Recidivism; Prediction; Risk assessment; ROC curve; Classification and regression tree
80	Detection of malignancy associated changes in cervical cells using statistical and evolutionary computation techniques. Hallinan, Jennifer Susan. Unknown date.
Malignancy Associated Changes are subtle alterations in the morphology and nuclear texture of cells in the vicinity of a malignant lesion. The phenomenon was first described in 1959, and has been the subject of considerable research in the four intervening decades, due to its potential utility to cancer screening programs. In this thesis the history of research into malignancy associated changes is reviewed, and the major findings of previous workers are summarized. Original work aimed at improving the accuracy of classification of Pap smear slides is described in detail. A novel algorithm, which incorporates a genetic algorithm for feature selection and training of a neural network, is described. The algorithm was tested upon a large artificial dataset consisting of points from nested spheres in multiple dimensions. It was able to select the most discriminatory features and classify data with 99% accuracy on 80% of runs for two-dimensional data, and on 90% of runs for three-dimensional data. The algorithm was also tested on two real data sets from the UCI Machine Learning Repository, the sonar data and the ionosphere data. On both of these datasets the algorithm produced a classifier using a subset of features which performed as well as previously reported classifiers using the full feature set. This algorithm was then tested on a large dataset of cell images, and its performance compared with that of the standard stepwise linear discriminant analysis approach. Both of these approaches produced similar results, which are comparable to those of previous workers in this field. Interestingly, runs of the genetic algorithm with different random number seeds tended to select different feature subsets, which produced approximately equivalent performance.
This finding indicates that amongst the features used, which were selected from those previously identified in the literature as useful for MACs detection, many subsets exist which are equally discriminatory.
malignancy associated changes; ROC curve; genetic algorithms; evolutionary computations; cervical cancer; neural networks; Pap Smear; computer image analysis; automated screening; pattern recognition
