1

Empirical Likelihood Confidence Intervals for ROC Curves with Missing Data

An, Yueheng 25 April 2011 (has links)
The receiver operating characteristic (ROC) curve is widely used to evaluate the diagnostic performance of a test, in other words, the accuracy of a test in discriminating normal cases from diseased cases. In biomedical studies we often encounter missing data, to which regular inference procedures cannot be applied directly. In this thesis, random hot deck imputation is used to obtain a 'complete' sample. Empirical likelihood (EL) confidence intervals are then constructed for ROC curves. The empirical log-likelihood ratio statistic is derived, and its asymptotic distribution is proved to be a weighted chi-square distribution. The results of a simulation study show that the EL confidence intervals perform well in terms of coverage probability and average length for various sample sizes and response rates.
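A minimal sketch (not from the thesis) of the two ingredients described above: random hot deck imputation to complete the sample, followed by an empirical estimate of a point on the ROC curve. The data and function names are illustrative, assuming NumPy:

```python
import numpy as np

def hot_deck_impute(x, rng):
    """Randomly fill missing values by drawing from the observed respondents."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    donors = x[~missing]
    x[missing] = rng.choice(donors, size=missing.sum(), replace=True)
    return x

def empirical_roc(scores_diseased, scores_normal, p):
    """ROC(p): sensitivity at false positive rate p, using the empirical quantile."""
    cutoff = np.quantile(scores_normal, 1 - p)   # (1-p)-quantile of the normal scores
    return np.mean(scores_diseased > cutoff)     # fraction of diseased above the cutoff

rng = np.random.default_rng(0)
normal = hot_deck_impute([0.2, 0.5, np.nan, 0.4, 0.3], rng)    # toy data with missing values
diseased = hot_deck_impute([0.9, np.nan, 0.7, 1.1, 0.8], rng)
print(empirical_roc(diseased, normal, p=0.1))
```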
2

Use of fecal and serologic biomarkers in the prediction of clinical outcomes in children presenting with abdominal pain and/or diarrhea

Rogerson, Sara M. 13 July 2017 (has links)
INTRODUCTION: Abdominal pain and diarrhea are two of the most common pediatric complaints. They are often associated with a diagnosis of Crohn Disease or Ulcerative Colitis, collectively known as inflammatory bowel disease (IBD). IBD is a set of diseases with ill-defined pathogenesis but similar clinical presentation. Clinicians rely on colonoscopic evaluation to distinguish between the two disorders, and the rate of colonoscopies has been increasing over the past several years. Given the risks and costs associated with colonoscopic evaluation, our study sought to identify physiologic variables with significant predictive value in order to better determine those patients most likely to have an abnormal colonoscopy. Those variables could then be incorporated into a predictive model to stratify the risk of a patient having an abnormal colonoscopy and be used as a decision-assist tool for physicians.

METHODS: We conducted a retrospective cohort study examining 443 patients who underwent a colonoscopy between 2012 and 2016 at Boston Children’s Hospital. Data on demographics, lab work, and stool studies were collected into an online database for three separate data sets. The data were analyzed using SAS 9.4, and logistic regression was performed to identify the four variables with the most predictive value for abnormal colonoscopy. Those variables were incorporated into a predictive model.

RESULTS: Several variables were determined to be statistically significant in the prediction of abnormal colonoscopy. The four variables with the most predictive value, based on calculated odds ratios, were family history of IBD in a first-degree relative, serum albumin, fecal lactoferrin, and platelet count. ROC curves were generated to validate the model using the four variables for each of the data sets, and the area under the ROC curve was used to assess the robustness of the predictive model. The area under the curve (AUC) was .81 for the training data set, .79 for the first validation set, and .6 for the second validation set.

DISCUSSION: ROC curves were generated for each of the data sets in order to assess the predictive ability of the model, and the AUCs were calculated. An AUC of 1.0 would indicate a model with perfect predictability. The AUCs of the model-building set (.81) and the first validation set (.79) are indicative of a model with strong predictive value. The second validation set, used to assess the success of the model on an external data set, had an AUC of .6, which is less robust in its predictive value but still of more predictive utility than a coin flip.

CONCLUSION: Logistic regression yielded a parsimonious model consisting of the four variables with the strongest predictive value for abnormal colonoscopy. The variables are metrics that are routinely collected as part of ambulatory and inpatient clinic visits. When the model was validated using an external data set, it did not perform as well as expected based on the results of the training and first validation sets. If the robustness of the model on external data can be improved, it could be of great clinical utility to physicians as a decision-assist tool and help to limit the number of less clinically indicated colonoscopies performed in the future.
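A minimal sketch (not the study's SAS code) of the modeling step described above: logistic regression on the four predictors with AUC computed on a held-out set, using scikit-learn. The data and variable encodings are hypothetical placeholders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 443
X = np.column_stack([
    rng.integers(0, 2, n),      # family history of IBD in a first-degree relative (0/1)
    rng.normal(4.0, 0.5, n),    # serum albumin (g/dL)
    rng.integers(0, 2, n),      # fecal lactoferrin (negative/positive)
    rng.normal(300, 80, n),     # platelet count (x10^3/uL)
])
y = rng.integers(0, 2, n)       # abnormal colonoscopy (no/yes), placeholder outcome

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])   # validation-set AUC
print(f"Validation AUC: {auc:.2f}")
```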
3

Negative Remembering

Kapucu, Aycan 01 January 2007 (has links) (PDF)
Directed by: Professor Caren M. Rotello

Three experiments investigated the use of recall-to-accept and recall-to-reject processes in recognition and remember-know decisions. In all three experiments, participants studied a mixed list of singular and plural words. During the recognition test, participants made old-new confidence ratings and remember-know judgments for studied items, lures that were similar to studied items, and new lures. Old-similar ROC curves were constructed from the confidence ratings and found to be linear, consistent with the use of a high-threshold recollective process. The ROC intercepts and remember response rates converged on the same estimates of the amount of recollection for both positive (recall-to-accept) and negative (recall-to-reject) decisions.
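A minimal sketch (not from the thesis) of how an old-similar ROC is built from confidence ratings: cumulate response proportions from the most to the least confident "old" rating. The rating counts below are made up:

```python
import numpy as np

# Counts of responses at each confidence level (6 = "sure old" ... 1 = "sure new")
old_counts = np.array([40, 25, 15, 10, 6, 4])        # responses to studied (old) items
similar_counts = np.array([12, 15, 18, 20, 18, 17])  # responses to similar lures

hits = np.cumsum(old_counts) / old_counts.sum()          # cumulative hit rates
fas = np.cumsum(similar_counts) / similar_counts.sum()   # cumulative false-alarm rates
for fa, hit in zip(fas, hits):
    print(f"FA = {fa:.2f}  Hit = {hit:.2f}")   # plot these points for the old-similar ROC
```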
4

Analysis of the Effects of JPEG2000 Compression on Texture Features Extracted from Digital Mammograms

Agatheeswaran, Anuradha 11 December 2004 (has links)
The aim of this thesis is to investigate the effects of JPEG2000 compression on texture feature extraction from digitized mammograms. A partially automated computer-aided diagnosis system is designed, implemented, and tested for this analysis. The system is tested on a database of 60 digital mammograms obtained from the Digital Database for Screening Mammography at the University of South Florida. Using JPEG2000, the mammograms are compressed at 20 different compression ratios ranging from 17:1 to 10,000:1. Two approaches to texture feature extraction are investigated: (i) region of interest (ROI), which is a bounding box around the segmented mass, and (ii) rubber band straightening transform (RBST), which is a band of pixels around the segmented mass transformed to a rectangular strip. The gray tone spatial dependence matrices are computed from the ROI and the RBST for the original uncompressed mammograms as well as each group of compressed images. Feature selection and optimization are achieved via stepwise linear discriminant analysis. The efficacy of the features is measured using receiver operating characteristic (ROC) curves. The efficacy of the texture features obtained from the original mammograms is compared to that of the compressed mammograms. Overall, texture feature efficacy was preserved even at relatively high compression ratios; for example, the area under the ROC curve was greater than 0.99 for compression ratios as high as 5000:1 when the RBST method was utilized. Overall, JPEG2000 compression distorted the RBST texture features less than the ROI texture features.
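A minimal sketch (not from the thesis) of the co-occurrence-matrix feature step described above, assuming a recent scikit-image (graycomatrix/graycoprops); the ROI here is a random placeholder:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
roi = rng.integers(0, 64, size=(128, 128), dtype=np.uint8)   # placeholder ROI around a mass

# Co-occurrence matrix at distance 1 pixel, four directions, 64 gray levels
glcm = graycomatrix(roi, distances=[1], angles=[0, np.pi/4, np.pi/2, 3*np.pi/4],
                    levels=64, symmetric=True, normed=True)

# Haralick-style texture features averaged over directions
features = {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "correlation", "energy", "homogeneity")}
print(features)   # repeat for the original and each compressed image, then compare via ROC
```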
5

Conception d’un outil simple d'utilisation pour réaliser des analyses statistiques ajustées valorisant les données de cohortes observationnelles de pathologies chroniques : application à la cohorte DIVAT / Conception of an easy to use application allowing to perform adjusted statistical analysis for the valorization of observational data from cohorts of chronic disease : application to the DIVAT cohort

Le Borgne, Florent 06 October 2016 (has links)
In medical research, cohorts help to better understand the evolution of a pathology and improve the care of patients. Causal associations between risk factors and outcomes are regularly studied through etiological studies. Cohort analyses also allow the identification of new markers for predicting patient evolution. However, confounding factors are often a source of bias in the interpretation of the results of etiologic or prognostic studies. In this manuscript, we present two research works in biostatistics, the common topic being propensity scores. In the first work, we compare the performances of different models for evaluating the causality of an exposure on an outcome in the presence of right-censored data. In the second work, we propose an estimator of standardized and weighted time-dependent ROC curves. This estimator provides a measure of the prognostic capacity of a marker while taking possible confounding factors into account. Consistent with our objective of providing adapted statistical tools, we also present in this manuscript an application called Plug-Stat®. Directly linked to the database, it allows statistical analyses adapted to the pathology to be performed, in order to facilitate epidemiological studies and improve the valorization of data from observational cohorts.
6

Comparing trend and gap statistics across tests: distributional change using ordinal methods and Bayesian inference

Denbleyker, John Nickolas 01 May 2012 (has links)
The shortcomings of the proportion above cut (PAC) statistic used so prominently in the educational landscape render it a very problematic measure for making correct inferences with student test data. The limitations of PAC-based statistics are more pronounced in cross-test comparisons due to their dependency on cut-score locations. A better alternative is mean-based statistics that can translate to parametric effect-size measures. However, these statistics can be problematic as well: when Gaussian assumptions are not met, reasonable transformations of a score scale produce non-monotonic outcomes. The present study develops a distribution-wide approach to summarize trend, gap, and gap trend (TGGT) measures. This approach counters the limitations of PAC-based measures and mean-based statistics, in addition to addressing TGGT-related statistics in a manner more closely tied to both the data and questions regarding student achievement. The distribution-wide approach encompasses visual graphics such as percentile trend displays and probability-probability plots fashioned after Receiver Operating Characteristic (ROC) curve methodology. The latter is framed as the P-P plot framework proposed by Ho (2008) as a way to examine trends and gaps with more consideration given to questions of scale and policy decisions. The extension in this study involves three main components: (1) incorporating Bayesian inference, (2) using a multivariate structure for longitudinal data, and (3) accounting for measurement error at the individual level. The analysis is based on mathematics assessment data spanning Grade 3 to Grade 7 from a large Midwestern school district. Findings suggest that P-P-based effect sizes provide a useful framework for measuring aggregate test score change and achievement gaps. The distribution-wide perspective adds insight by examining, both visually and numerically, how trends and gaps are affected throughout the score distribution. Two notable findings using the P-P-based effect sizes were that (1) achievement gaps were very similar between the Focal and Audit tests, and (2) trend measures were significantly larger for the Audit test. Additionally, measurement error corrections using the multivariate Bayesian CTT approach yielded effect sizes disattenuated from those based on observed scores. The ordinal-based effect size statistics were also generally larger than their parametric-based counterparts, and this disattenuation was practically equivalent to that achieved by accounting for measurement error. Finally, the rank-based estimator of P(X>Y) via estimated true scores had smaller standard errors than its parametric-based counterpart.
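As a minimal sketch (not from the dissertation) of the rank-based estimator of P(X>Y) mentioned in the last sentence: it is the Mann-Whitney statistic rescaled by the product of sample sizes, equivalently the area under the empirical ROC curve. The scores below are made up:

```python
import numpy as np

def prob_x_greater_y(x, y):
    """Estimate P(X > Y) + 0.5 * P(X = Y) from two samples of scores."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    greater = (x[:, None] > y[None, :]).sum()
    ties = (x[:, None] == y[None, :]).sum()
    return (greater + 0.5 * ties) / (len(x) * len(y))

focal = [510, 525, 540, 560, 580]   # made-up scale scores, later administration
audit = [505, 515, 530, 545, 550]   # made-up scale scores, earlier administration
print(prob_x_greater_y(focal, audit))   # ordinal trend effect size
```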
7

Empirical Likelihood Confidence Intervals for ROC Curves Under Right Censorship

Yang, Hanfang 16 September 2010 (has links)
In this thesis, we apply the smoothed empirical likelihood method to investigate confidence intervals for the receiver operating characteristic (ROC) curve with right censoring. As a particular application of the comparison of distributions from two populations, the ROC curve is constructed as a combination of a cumulative distribution function and a quantile function. Under mild conditions, the smoothed empirical likelihood ratio converges to a chi-square distribution, in line with the well-known Wilks's theorem. The performance of the empirical likelihood method is further illustrated by simulation studies in terms of coverage probability and average length of the confidence intervals. Finally, a primary biliary cirrhosis dataset is used to illustrate the proposed empirical likelihood procedure.
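The construction referred to above, combining one sample's cumulative distribution function with the other's quantile function, can be written in generic notation (not necessarily the thesis's) as:

```latex
% ROC curve of diseased scores (CDF F_D) against non-diseased scores (CDF F_{\bar D});
% p is the false positive rate and ROC(p) the corresponding true positive rate.
\[
  \mathrm{ROC}(p) \;=\; 1 - F_{D}\!\left( F_{\bar D}^{-1}(1-p) \right), \qquad p \in (0,1).
\]
```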
8

Psychometric and Machine Learning Approaches to Diagnostic Classification

January 2018 (has links)
The goal of diagnostic assessment is to discriminate between groups. In many cases, a binary decision is made conditional on a cut score from a continuous scale. Psychometric methods can improve assessment by modeling a latent variable using item response theory (IRT), and IRT scores can subsequently be used to determine a cut score using receiver operating characteristic (ROC) curves. Psychometric methods provide reliable and interpretable scores, but the prediction of the diagnosis is not the primary product of the measurement process. In contrast, machine learning methods, such as regularization or binary recursive partitioning, can build a model from the assessment items to predict the probability of diagnosis. Machine learning predicts the diagnosis directly, but does not provide an inferential framework to explain why item responses are related to the diagnosis. It remains unclear whether psychometric and machine learning methods have comparable accuracy or if one method is preferable in some situations. In this study, Monte Carlo simulation methods were used to compare psychometric and machine learning methods on diagnostic classification accuracy. Results suggest that classification accuracy of psychometric models depends on the diagnostic-test correlation and prevalence of diagnosis. Also, machine learning methods that reduce prediction error have inflated specificity and very low sensitivity compared to the data-generating model, especially when prevalence is low. Finally, machine learning methods that use ROC curves to determine probability thresholds have comparable classification accuracy to the psychometric models as sample size, number of items, and number of item categories increase. Therefore, results suggest that machine learning models could provide a viable alternative for classification in diagnostic assessments. Strengths and limitations for each of the methods are discussed, and future directions are considered. / Dissertation/Thesis / Doctoral Dissertation Psychology 2018
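A minimal sketch, not from the dissertation, of the ROC-based cut-score step described above: simulated continuous scores stand in for IRT estimates, and the threshold is chosen by Youden's J, assuming scikit-learn's roc_curve:

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
theta = np.concatenate([rng.normal(0, 1, 500), rng.normal(1, 1, 200)])  # latent-score estimates
diagnosis = np.concatenate([np.zeros(500), np.ones(200)])               # true diagnostic status

fpr, tpr, thresholds = roc_curve(diagnosis, theta)
cut = thresholds[np.argmax(tpr - fpr)]            # cut score maximizing Youden's J
predicted = (theta >= cut).astype(int)
sens = predicted[diagnosis == 1].mean()
spec = 1 - predicted[diagnosis == 0].mean()
print(f"cut = {cut:.2f}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```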
9

Record Linkage

Larsen, Stasha Ann Bown 11 December 2013 (has links) (PDF)
This document explains the use of different metrics involved with record linkage. There are two forms of record linkage: deterministic and probabilistic. We will focus on probabilistic record linkage used in merging and updating two databases. Record pairs will be compared using character-based and phonetic-based similarity metrics to determine at what level they match. Performance measures are then calculated and Receiver Operating Characteristic (ROC) curves are formed. Finally, an economic model is applied that returns the optimal tolerance level two databases should use to determine a record pair match in order to maximize profit.
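As an illustration of the character-based similarity metrics mentioned above, here is a minimal Python sketch (not from the thesis) using Jaccard similarity over character bigrams; the names and the 0.5 tolerance level are hypothetical:

```python
def bigrams(s):
    """Set of overlapping two-character substrings of a lowercased string."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def jaccard_similarity(a, b):
    """Jaccard similarity of the bigram sets of two strings, in [0, 1]."""
    ba, bb = bigrams(a), bigrams(b)
    return len(ba & bb) / len(ba | bb) if (ba | bb) else 1.0

score = jaccard_similarity("Stasha Larsen", "Stacha Larson")
print(score, "match" if score >= 0.5 else "non-match")   # tolerance level chosen via ROC analysis
```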
10

Multi-objective ROC learning for classification

Clark, Andrew Robert James January 2011 (has links)
Receiver operating characteristic (ROC) curves are widely used for evaluating classifier performance, having been applied to e.g. signal detection, medical diagnostics and safety critical systems. They allow examination of the trade-offs between true and false positive rates as misclassification costs are varied. Examination of the resulting graphs and calculation of the area under the ROC curve (AUC) allows assessment of how well a classifier is able to separate two classes and allows selection of an operating point with full knowledge of the available trade-offs.

In this thesis a multi-objective evolutionary algorithm (MOEA) is used to find classifiers whose ROC graph locations are Pareto optimal. The Relevance Vector Machine (RVM) is a state-of-the-art classifier that produces sparse Bayesian models, but is unfortunately prone to overfitting. Using the MOEA, hyper-parameters for RVM classifiers are set, optimising them not only in terms of true and false positive rates but also a novel measure of RVM complexity, thus encouraging sparseness, and producing approximations to the Pareto front. Several methods for regularising the RVM during the MOEA training process are examined and their performance evaluated on a number of benchmark datasets, demonstrating they possess the capability to avoid overfitting whilst producing performance equivalent to that of the maximum likelihood trained RVM.

A common task in bioinformatics is to identify genes associated with various genetic conditions by finding those genes useful for classifying a condition against a baseline. Typically, datasets contain large numbers of gene expressions measured in relatively few subjects. As a result of the high dimensionality and sparsity of examples, it can be very easy to find classifiers with near perfect training accuracies but which have poor generalisation capability. Additionally, depending on the condition and treatment involved, evaluation over a range of costs will often be desirable. An MOEA is used to identify genes for classification by simultaneously maximising the area under the ROC curve whilst minimising model complexity. This method is illustrated on a number of well-studied datasets and applied to a recent bioinformatics database resulting from the current InChianti population study.

Many classifiers produce “hard”, non-probabilistic classifications and are trained to find a single set of parameters, whose values are inevitably uncertain due to limited available training data. In a Bayesian framework it is possible to ameliorate the effects of this parameter uncertainty by averaging over classifiers weighted by their posterior probability. Unfortunately, the required posterior probability is not readily computed for hard classifiers. In this thesis an Approximate Bayesian Computation Markov Chain Monte Carlo algorithm is used to sample model parameters for a hard classifier using the AUC as a measure of performance. The ability to produce ROC curves close to the Bayes optimal ROC curve is demonstrated on a synthetic dataset. Due to the large numbers of sampled parametrisations, averaging over them when rapid classification is needed may be impractical, and thus methods for producing sparse weightings are investigated.
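As a minimal illustration of the Pareto-optimality notion used above (not the thesis's MOEA), the following sketch filters candidate classifier operating points to those whose (TPR, FPR) location is non-dominated; the points are made up:

```python
def pareto_front(points):
    """points: list of (tpr, fpr) pairs. A point dominates another if its TPR is >=
    and its FPR is <=, with at least one strict inequality."""
    front = []
    for tpr, fpr in points:
        dominated = any(t >= tpr and f <= fpr and (t > tpr or f < fpr)
                        for t, f in points)
        if not dominated:
            front.append((tpr, fpr))
    return sorted(front, key=lambda p: p[1])

candidates = [(0.60, 0.05), (0.70, 0.10), (0.65, 0.12), (0.85, 0.30), (0.80, 0.35)]
print(pareto_front(candidates))   # the non-dominated ROC graph locations
```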
