  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Prediction Performance of Survival Models

Yuan, Yan January 2008 (has links)
Statistical models are often used to predict future random variables. There are two types of prediction: point prediction and probabilistic prediction. Prediction accuracy is quantified by performance measures, which are typically based on loss functions. We study estimators of these performance measures, the prediction error and performance scores, for point and probabilistic predictors, respectively. The focus of this thesis is on assessing the prediction performance of survival models that analyze censored survival times. To accommodate censoring, we extend the inverse probability of censoring weighting (IPCW) method so that arbitrary loss functions can be handled. We also develop confidence interval procedures for these performance measures. We compare model-based, apparent-loss-based and cross-validation estimators of prediction error under model misspecification and variable selection, for absolute relative error loss (in chapter 3) and misclassification error loss (in chapter 4). Simulation results indicate that cross-validation procedures typically produce reliable point estimates and confidence intervals, whereas model-based estimates are often sensitive to model misspecification. The methods are illustrated in two medical contexts in chapter 5. The apparent-loss-based and cross-validation estimators of performance scores for probabilistic predictors are discussed and illustrated with an example in chapter 6. We also make connections for performance.
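The IPCW idea mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's estimator: each uncensored subject's loss is divided by a Kaplan-Meier estimate of the probability of remaining uncensored just before that subject's event time. The function names, the tie handling, and the exact form of the absolute relative error loss are assumptions made for this example.

```python
def censoring_survival_before(t, times, events):
    """Kaplan-Meier estimate of G(t-) = P(C >= t), treating censoring as the event.

    times: observed times; events: 1 = event observed, 0 = censored.
    Assumed convention: everyone with observed time >= u is at risk at u.
    """
    g = 1.0
    censor_times = sorted({ti for ti, d in zip(times, events) if d == 0 and ti < t})
    for u in censor_times:
        at_risk = sum(ti >= u for ti in times)
        censored = sum(ti == u and d == 0 for ti, d in zip(times, events))
        g *= 1.0 - censored / at_risk
    return g


def ipcw_prediction_error(times, events, predictions, loss):
    """Average loss over all n subjects, with uncensored losses up-weighted by 1 / G(T-)."""
    total = 0.0
    for t, d, p in zip(times, events, predictions):
        if d == 1:  # censored subjects contribute only through the weights
            total += loss(t, p) / censoring_survival_before(t, times, events)
    return total / len(times)


def absolute_relative_error(t, p):
    """Assumed form of the absolute relative error loss: |t - p| / t."""
    return abs(t - p) / t


# Hypothetical data: the first subject is censored at time 1.
err = ipcw_prediction_error([1, 2, 3], [0, 1, 1], [1.5, 2.0, 6.0],
                            absolute_relative_error)
```

Because the loss is passed in as a function, the same weighting can be reused with a misclassification loss or any other loss of the observed and predicted times.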
22

New Non-Parametric Confidence Interval for the Youden

Zhou, Haochuan 18 July 2008 (has links)
The Youden index, a main summary index for the Receiver Operating Characteristic (ROC) curve, is a comprehensive measure of the effectiveness of a diagnostic test. For a continuous-scale diagnostic test, the optimal cut-point for declaring the disease present is the one that maximizes the sum of sensitivity and specificity. Finding the Youden index of the test is equivalent to maximizing the sum of sensitivity and specificity over all possible values of the cut-point. In this thesis, we propose a new non-parametric confidence interval for the Youden index. Extensive simulation studies are conducted to compare the performance of the new interval with that of the existing intervals for the index. Our simulation results indicate that the newly developed non-parametric method performs as well as the existing parametric method and has better finite-sample performance than the existing non-parametric methods. The new method is flexible and easy to implement in practice. A real example is also used to illustrate the application of the proposed interval.
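The maximization described in the abstract can be sketched with an empirical estimate. This is only an illustration of the index itself, J = max over cut-points of (sensitivity + specificity - 1), not of the proposed confidence interval. The function name and the convention that higher scores indicate disease are assumptions made for this example.

```python
def youden_index(diseased, healthy):
    """Return (J, cut) where cut maximizes sensitivity + specificity - 1.

    Assumed convention: a subject tests positive when score >= cut.
    Ties in J are resolved in favour of the smallest cut-point.
    """
    best_j, best_cut = -1.0, None
    for cut in sorted(set(diseased) | set(healthy)):
        sensitivity = sum(x >= cut for x in diseased) / len(diseased)
        specificity = sum(y < cut for y in healthy) / len(healthy)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_j, best_cut = j, cut
    return best_j, best_cut


# Hypothetical test scores for four diseased and four healthy subjects.
j, cut = youden_index([3.1, 4.0, 4.7, 5.2], [1.0, 2.2, 3.5, 2.9])
```

Only the observed scores need to be tried as cut-points, because the empirical sensitivity and specificity change only at those values.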
23

Statistical Evaluation of Continuous-Scale Diagnostic Tests with Missing Data

Wang, Binhuan 12 June 2012 (has links)
The receiver operating characteristic (ROC) curve methodology is the standard statistical methodology for assessing the accuracy of diagnostic tests or biomarkers. Currently, the most widely used statistical methods for inference on ROC curves are complete-data-based parametric, semi-parametric or nonparametric methods. However, these methods cannot be used in diagnostic applications with missing data. In practice, missing diagnostic data commonly occur for various reasons, such as medical tests being too expensive, too time-consuming or too invasive. This dissertation aims to develop new nonparametric statistical methods for evaluating the accuracy of diagnostic tests or biomarkers in the presence of missing data. Specifically, novel nonparametric statistical methods will be developed for different types of missing data for (i) inference on the area under the ROC curve (AUC, a summary index of the diagnostic accuracy of the test) and (ii) joint inference on the sensitivity and the specificity of a continuous-scale diagnostic test. In this dissertation, we will provide a general framework that combines empirical likelihood and general estimation equations with nuisance parameters for joint inference on sensitivity and specificity with missing diagnostic data. The proposed methods will have sound theoretical properties. The theoretical development is challenging because the proposed profile log-empirical likelihood ratio statistics are not standard sums of independent random variables. The new methods combine the power of likelihood-based approaches and the jackknife method in ROC studies. Therefore, they are expected to be more robust, more accurate and less computationally intensive than existing methods in the evaluation of competing diagnostic tests.
25

Statistical Geocomputing: Spatial Outlier Detection in Precision Agriculture

Chu Su, Peter 29 September 2011 (has links)
The collection of crop yield data has become much easier with the introduction of technologies such as the Global Positioning System (GPS), ground-based yield sensors, and Geographic Information Systems (GIS). This explosive growth and widespread use of spatial data has challenged the ability to derive useful spatial knowledge. In addition, outlier detection, an important pre-processing step, remains a challenge because the choice of technique and the definition of spatial neighbourhood are non-trivial, and the quantitative assessment of false positives, false negatives, and the concept of region outlier remain unexplored. The overall aim of this study is to evaluate different spatial outlier detection techniques in terms of their accuracy and computational efficiency, and to examine the performance of these outlier removal techniques in a site-specific management context. In a simulation study, unconditional sequential Gaussian simulation is performed to generate crop yield as the response variable along with two explanatory variables. Point and region spatial outliers are added to the simulated datasets by randomly selecting observations and adding or subtracting a Gaussian error term. Because the spatial outliers in the simulated data are known in advance, the assessment of spatial outlier techniques can be conducted as a binary classification exercise, treating each spatial outlier detection technique as a classifier. Algorithm performance is evaluated with the area and partial area under the ROC curve up to different true positive and false positive rates. Outlier effects in on-farm research are assessed in terms of the influence of each spatial outlier technique on coefficient estimates from a spatial regression model that accounts for autocorrelation.
Results indicate that for point outliers, spatial outlier techniques that account for spatial autocorrelation tend to be better than standard spatial outlier techniques in terms of higher sensitivity, lower false positive detection rate, and consistency in performance. They are also more resistant to changes in the neighbourhood definition. For region outliers, standard techniques tend to be better than spatial autocorrelation techniques in all performance aspects because they are less affected by masking and swamping effects. In particular, one spatial autocorrelation technique, Averaged Difference, is superior to all other techniques in both the point and region outlier scenarios because of its ability to incorporate spatial autocorrelation while at the same time revealing the variation between nearest neighbours. In terms of decision-making, all algorithms led to slightly different coefficient estimates and may therefore result in distinct decisions for site-specific management. The results outlined here will allow improved removal of potentially problematic crop yield data points. The recommendation of this study is to use the Averaged Difference algorithm for cleaning spatial outliers in yield datasets. Identifying the optimal nearest neighbour parameter for the neighbourhood aggregation function is still non-trivial. The recommendation is to specify a large number of nearest neighbours, large enough to capture the region size. Lastly, the unbiased coefficient estimates obtained with Averaged Difference suggest it is the better method for pre-processing spatial outliers in crop yield data, which underlines its suitability for detecting spatial outliers in the context of on-farm research.
26

Statistical Methods In Credit Rating

Sezgin, Ozge 01 September 2006 (has links) (PDF)
Credit risk is one of the major risks that banks and financial institutions face. With the New Basel Capital Accord, banks and financial institutions have the opportunity to improve their risk management processes by using the Internal Rating Based (IRB) approach. In this thesis, we focused on the internal credit rating process. First, a short overview of credit scoring and validation techniques was given. Using a real data set on manufacturing firms obtained from a Turkish bank, default prediction models were built with logistic regression, probit regression, discriminant analysis, and classification and regression trees. To improve the performance of the models, the optimum sample for logistic regression was selected from the data set and used as the model construction sample. In addition, guidance was given on how to convert continuous variables to ordered scaled variables to avoid problems caused by differences in scale. After the models were built, their performance on the whole data set, including both in-sample and out-of-sample observations, was evaluated with the validation techniques suggested by the Basel Committee. In most cases, the classification and regression trees model dominated the other techniques. After the credit scoring models were constructed and evaluated, the cut-off values used to map the probabilities of default obtained from logistic regression to rating classes were determined with dual-objective optimization. The cut-off values that gave the maximum area under the ROC curve and the minimum mean square error of the regression tree were taken as the optimum thresholds after 1000 simulations. Keywords: Credit Rating, Classification and Regression Trees, ROC curve, Pietra Index
27

Empirical Likelihood-Based NonParametric Inference for the Difference between Two Partial AUCS

Yuan, Yan 02 August 2007 (has links)
Comparing the accuracy of two continuous-scale tests is increasingly important when a new test is developed. The traditional approach, which compares the entire areas under two Receiver Operating Characteristic (ROC) curves, is not sensitive when the two ROC curves cross each other. A better approach to comparing the accuracy of two diagnostic tests is to compare the areas under the two ROC curves (AUCs) over the specificity interval of interest. In this thesis, we propose bootstrap and empirical likelihood (EL) approaches for inference on the difference between two partial AUCs. The empirical likelihood ratio for the difference between two partial AUCs is defined, and its limiting distribution is shown to be a scaled chi-square distribution. EL-based confidence intervals for the difference between two partial AUCs are obtained. Additionally, we conducted simulation studies to compare the four proposed EL- and bootstrap-based intervals.
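For reference, the quantity under study can be written out as follows. The notation is an assumption made here rather than the thesis's own: ROC_k(u) denotes the ROC curve of test k as a function of the false positive rate u = 1 - specificity, and [s_1, s_2] is the specificity interval of interest.

```latex
\mathrm{pAUC}_k(s_1, s_2) = \int_{1 - s_2}^{1 - s_1} \mathrm{ROC}_k(u)\, du,
\qquad k = 1, 2,
\qquad
\Delta = \mathrm{pAUC}_1(s_1, s_2) - \mathrm{pAUC}_2(s_1, s_2).
```

With s_1 = 0 and s_2 = 1 the integral reduces to the full AUC, so the traditional comparison of entire areas is the special case in which the interval covers all specificities.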
28

Discrimination of High Risk and Low Risk Populations for the Treatment of STDs

Zhao, Hui 05 August 2011 (has links)
Discriminating truly diseased patients from healthy persons is an important step in clinical practice. Ideally, such discrimination could be based on commonly available information such as personal details, lifestyle, and contact with diseased patients. In this study, a score is calculated for each patient from survey responses using a generalized linear model, and disease status is then determined from previous sexually transmitted disease (STD) records. This study will help clinics group patients as diseased or healthy, which in turn will affect how clinics screen patients: complete screening for possibly diseased patients and a common screening for potentially healthy persons.
29

Estimation of the Optimal Threshold Using Kernel Estimate and ROC Curve Approaches

Zhu, Zi 23 May 2011 (has links)
Credit line analysis plays a very important role in the housing market, especially given the large number of frozen loans during the current financial crisis. In this thesis, we apply kernel estimation and the Receiver Operating Characteristic (ROC) curve to the credit loan application process in order to help banks select the optimal threshold for differentiating good customers from bad customers. A better choice of threshold is essential for banks to prevent losses and maximize profit from loans. One of the main advantages of our approach is that it does not require specifying the distribution of the latent risk score. We apply the bootstrap method to construct a confidence interval for the estimate.
30

A Comparison of Anthropometric Measures for Classification of Metabolic Syndrome and Cardiometabolic Risk Factors, NHANES 2007-2010

Heath, John 12 August 2014 (has links)
BACKGROUND: Type 2 diabetes and cardiovascular disease (CVD) are among the leading causes of death in the United States. The Metabolic Syndrome, which comprises a cluster of cardiometabolic risk factors, puts individuals at increased risk for these diseases. It is therefore important that people with Metabolic Syndrome, who are at high risk for CVD and type 2 diabetes, are identified and treated. Since it is often impractical to obtain the laboratory measures necessary for diagnosing the Metabolic Syndrome, simple anthropometric measures are a useful way of quickly identifying individuals at increased risk for the Metabolic Syndrome. OBJECTIVE: The purpose of this thesis is to evaluate the utility of three of the most commonly used anthropometric measures – Body Mass Index (BMI), Waist Circumference (WC), and Waist-to-Height Ratio (WHtR) – for classifying individuals with and without the Metabolic Syndrome and its component risk factors in the United States. Using Receiver Operating Characteristic (ROC) curve analysis and Area Under the Curve (AUC) statistics, this thesis will assess the utility of each body measurement and compare it to BMI. METHODS: A large, multi-ethnic, nationally representative sample from the National Health and Nutrition Examination Survey (NHANES) 2007-2010 was used for this analysis. The study sample was restricted to adults aged 20-65 with complete information on height, weight, waist circumference, blood pressure, HDL cholesterol, fasting glucose, and triglycerides (n=3,769). In order to compare the utility of different anthropometric measures for classification, weighted ROC curves were constructed for each anthropometric measure-outcome combination and AUC statistics were compared. AUC statistics were calculated by approximating the definite integral of the ROC curves with the trapezoidal rule. Variances for AUC statistics and differences in AUC statistics were estimated with jackknife repeated replication.
Analyses were completed for the entire sample and separately for non-Hispanic whites, non-Hispanic blacks, and Mexican Americans. RESULTS: For the entire sample, WC (AUC=0.752) did a better job than BMI (AUC=0.728) at classifying individuals with and without the Metabolic Syndrome (p CONCLUSION: Waist circumference should be considered, especially over BMI, for risk stratification in clinical settings and research. Further research should attempt to identify optimum waist circumference cut points for use in the US population.
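The trapezoidal-rule AUC calculation described in the METHODS can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's code: the function names are hypothetical, higher scores are assumed to indicate the outcome, and the optional weights stand in for survey weights without reproducing the NHANES design or the jackknife variance step.

```python
def roc_points(scores, labels, weights=None):
    """Weighted ROC points (false positive rate, true positive rate).

    labels: 1 = has the outcome, 0 = does not. The cutoff is lowered through
    the distinct scores, highest first; tied scores enter together.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    pos = sum(w for w, y in zip(weights, labels) if y == 1)
    neg = sum(w for w, y in zip(weights, labels) if y == 0)
    data = sorted(zip(scores, labels, weights), key=lambda row: -row[0])
    points = [(0.0, 0.0)]
    tp = fp = 0.0
    i = 0
    while i < len(data):
        cut = data[i][0]
        while i < len(data) and data[i][0] == cut:
            _, y, w = data[i]
            if y == 1:
                tp += w
            else:
                fp += w
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def trapezoid_auc(points):
    """Approximate the integral of the ROC curve with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical measurements: two subjects with the outcome, two without.
auc = trapezoid_auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

With unit weights the result matches the proportion of outcome/non-outcome pairs ranked correctly (ties counted as one half), which gives a quick sanity check on small inputs.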
