1.
Application Of Support Vector Machines And Neural Networks In Digital Mammography: A Comparative Study. Candade, Nivedita V. 28 October 2004.
Microcalcification (MC) detection is an important component of breast cancer diagnosis, but visual analysis of mammograms is a difficult task for radiologists. Computer-Aided Diagnosis (CAD) technology helps in identifying lesions and assists the radiologist in reaching a final decision.
This work is part of a CAD project carried out at the Imaging Science Research Division (ISRD), Digital Medical Imaging Program, Moffitt Cancer Research Center, Tampa, FL. A CAD system had previously been developed to perform the following tasks: (a) pre-processing, (b) segmentation, and (c) feature extraction of mammogram images. Ten features covering the spatial and morphological domains were extracted from the mammograms, and the samples were classified as microcalcification (MC) or false alarm (false-positive microcalcification, FP) based on a binary truth file obtained from a radiologist's initial investigation.
The main focus of this work was two-fold: (a) to analyze these features, select the most significant among them, and study their impact on classification accuracy; and (b) to implement and compare two machine-learning algorithms, Neural Networks (NNs) and Support Vector Machines (SVMs), and evaluate their performance with these features.
The NN was based on the Standard Back-Propagation (SBP) algorithm. The SVM was implemented using polynomial, linear, and Radial Basis Function (RBF) kernels. A detailed statistical analysis of the input features was performed, and feature selection was done using the Stepwise Forward Selection (SFS) method. Training and testing of the classifiers were carried out using various training methods. Classifier evaluation was first performed with all ten features in the model; subsequently, only the features chosen by SFS were used, to study their effect on classifier performance, which was evaluated through accuracy assessment.
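To make the pipeline concrete, here is a minimal sketch of stepwise forward selection followed by SVM and NN training, using scikit-learn as a stand-in for the original implementation; the synthetic dataset, number of selected features, kernel choice, and network size are illustrative assumptions, not the study's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the ten spatial/morphological features and MC/FP labels.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

# Stepwise forward selection: greedily add the feature that most improves CV accuracy.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
sfs = SequentialFeatureSelector(svm, n_features_to_select=5, direction="forward", cv=5)
X_sel = sfs.fit_transform(X, y)

# Leave-one-out cross-validation of the SVM, as used in the study.
svm_acc = cross_val_score(svm, X_sel, y, cv=LeaveOneOut()).mean()

# Back-propagation NN counterpart (an SGD/gradient-trained multilayer perceptron).
nn = make_pipeline(StandardScaler(),
                   MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0))
nn_acc = cross_val_score(nn, X_sel, y, cv=5).mean()

print(f"SVM LOO accuracy: {svm_acc:.3f}  NN 5-fold accuracy: {nn_acc:.3f}")
```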
Detailed statistical analysis showed that the dataset exhibited poor discrimination between classes and posed a very difficult pattern-recognition problem. The SVM performed better than the NN in most cases, especially on unseen data. No significant improvement in classifier performance was noted with feature selection, although with SFS the NN showed improved performance on unseen data. The training time taken by the SVM was several orders of magnitude less than that of the NN. Classifiers were compared on the basis of their accuracy and parameters such as sensitivity and specificity, and Free-response Receiver Operating Characteristic (FROC) curves were used to evaluate classifier performance.
The highest accuracy observed was about 93% on training data and 76% on testing data, with the SVM using Leave-One-Out (LOO) Cross-Validation (CV) training. Sensitivity was 81% and 46% on training and testing data, respectively, for a threshold of 0.7. The NN trained using the 'single test' method showed the highest accuracy of 86% on training data and 70% on testing data, with respective sensitivities of 84% and 50%; the threshold in this case was -0.2. FROC analyses, however, showed the overall superiority of the SVM, especially on unseen data.
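For reference, sensitivity and specificity at a given decision threshold can be computed as in the short sketch below; the scores and labels are illustrative placeholders, not the study's data.

```python
import numpy as np

def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity of thresholded classifier scores."""
    pred = scores >= threshold
    tp = np.sum(pred & (labels == 1))   # true positives
    fn = np.sum(~pred & (labels == 1))  # false negatives
    tn = np.sum(~pred & (labels == 0))  # true negatives
    fp = np.sum(pred & (labels == 0))   # false positives
    return tp / (tp + fn), tn / (tn + fp)

scores = np.array([0.9, 0.65, 0.72, 0.3, 0.8, 0.2])
labels = np.array([1, 1, 0, 0, 1, 0])
print(sens_spec(scores, labels, threshold=0.7))  # (sensitivity, specificity)
```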
Both spatial and morphological domain features were significant in our model, and features were selected based on their significance. However, when tested with the NN and SVM, this feature-selection procedure did not significantly improve classifier performance. Interestingly, the model with interactions between the selected variables showed excellent testing sensitivity with the NN classifier (about 81%).
Recent research has shown that SVMs can outperform NNs in classification tasks. SVMs offer distinct advantages such as better generalization, faster learning, the ability to find a global optimum, and the ability to deal with linearly non-separable data. Thus, though NNs are more widely known and used, SVMs are expected to gain popularity in practical applications. Our findings show that the SVM outperforms the NN; however, its performance depends largely on the nature of the data used.
2.
Bankruptcy prediction models in the Czech economy: New specification using Bayesian model averaging and logistic regression on the latest data. Kolísko, Jiří. January 2017.
The main objective of our research was to develop a new bankruptcy prediction model for the Czech economy. For that purpose we used logistic regression and 150,000 financial statements collected for the 2002-2016 period. We defined 41 explanatory variables (25 financial ratios and 16 dummy variables) and used Bayesian model averaging to select the best set of explanatory variables. The resulting model was estimated for three prediction horizons: one, two, and three years before bankruptcy, so that we could assess changes in the importance of the explanatory variables and in the models' prediction accuracy. To deal with the strong class imbalance in our dataset, caused by the small number of bankrupt firms, we applied over- and under-sampling methods to the training sample (80% of the data); these methods proved to enhance our classifier's accuracy for all specifications and periods. The accuracy of our models was evaluated with receiver operating characteristic (ROC) curves, sensitivity-specificity curves, and precision-recall curves. In comparison with models examined on similar data, our model performed very well. In addition, we have identified the most powerful predictors for short- and long-term horizons, which is potentially of high relevance for practice. JEL Classification: C11, C51, C53, G33, M21. Keywords: Bankruptcy...
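A minimal sketch of the core idea, logistic regression on a heavily imbalanced dataset with random over-sampling of the minority (bankrupt) class applied to the training split only, follows below; the synthetic data, class proportions, and feature count are illustrative assumptions, not the thesis's dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in: ~2% "bankrupt" class, 25 financial-ratio-like features.
X, y = make_classification(n_samples=20000, n_features=25, weights=[0.98], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# Over-sample the minority class in the training sample only, never in the test set.
n_major = int(np.sum(y_tr == 0))
X_up = resample(X_tr[y_tr == 1], replace=True, n_samples=n_major, random_state=0)
X_bal = np.vstack([X_tr[y_tr == 0], X_up])
y_bal = np.hstack([np.zeros(n_major), np.ones(n_major)])

clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
print("test ROC AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```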
3.
Computer-Aided Diagnosis for Mammographic Microcalcification Clusters. Tembey, Mugdha. 07 November 2003.
Breast cancer is the second leading cause of cancer deaths among women in the United States, and microcalcification clusters are one of the most important indicators of breast disease. Computer methodologies help in the detection and differentiation between benign and malignant lesions and have the potential to improve radiologists' performance and breast cancer diagnosis significantly.
A Computer-Aided Diagnosis (CAD-Dx) algorithm has been previously developed to assist radiologists in the diagnosis of mammographic clusters of calcifications with the modules: (a) detection of all calcification-like areas, (b) false-positive reduction and segmentation of the detected calcifications, (c) selection of morphological and distributional features and (d) classification of the clusters. Classification was based on an artificial neural network (ANN) with 14 input features and assigned a likelihood of malignancy to each cluster. The purpose of this work was threefold: (a) optimize the existing algorithm and test on a large database, (b) rank classification features and select the best feature set, and (c) determine the impact of single and two-view feature estimation on classification and feature ranking. Classification performance was evaluated with the NevProp4 artificial neural network trained with the leave-one-out resampling technique. Sequential forward selection was used for feature selection and ranking.
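A minimal sketch of leave-one-out evaluation of an ANN classifier follows; scikit-learn's MLP stands in for the NevProp4 network, and the data, hidden-layer size, and feature count are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for 260 ROIs described by 14 input features.
X, y = make_classification(n_samples=260, n_features=14, n_informative=6, random_state=0)

ann = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)

# Leave-one-out resampling: each sample is scored by a model
# trained on all remaining samples.
probs = cross_val_predict(ann, X, y, cv=LeaveOneOut(), method="predict_proba")[:, 1]
print("Az (area under the ROC curve):", roc_auc_score(y, probs))
```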
Mammograms from 136 patients, containing single or two views of a breast with a calcification cluster, were digitized at 60 microns and 16 bits per pixel. A total of 260 regions of interest (ROIs) centered on calcification clusters were defined to build the single-view dataset; 100 of the 136 patients had a two-view mammogram, which yielded the 202 ROIs that formed the two-view dataset. Classification and feature selection were evaluated with both datasets. To decide on the optimal features for two-view feature estimation, several combinations of CC- and MLO-view features were attempted.
On the single-view dataset the classifier achieved an Az = 0.8891, with 88% sensitivity and 77% specificity at an operating point of 0.4; 12 features were selected as the most important. With the two-view dataset, the classifier achieved a higher performance, with an Az = 0.9580 and sensitivity and specificity of 98% and 80%, respectively, at an operating point of 0.4; 10 features were selected as the most important.
4.
Computational Intelligence Based Classifier Fusion Models for Biomedical Classification Applications. Chen, Xiujuan. 27 November 2007.
The generalization abilities of machine learning algorithms often depend on the algorithms' initialization, parameter settings, training sets, or feature selections. For instance, SVM classifier performance relies largely on whether the selected kernel functions are suitable for the application data. To enhance the performance of individual classifiers, this dissertation proposes classifier fusion models that use computational intelligence techniques to combine different classifiers. The first fusion model, called T1FFSVM, combines multiple SVM classifiers by constructing a fuzzy logic system. T1FFSVM can be improved by tuning the fuzzy membership functions (MFs) of its linguistic variables with genetic algorithms; the improved model is called GFFSVM. To better handle the uncertainties in fuzzy MFs and in the classification data, T1FFSVM can also be extended with type-2 fuzzy logic to form a type-2 fuzzy classifier fusion model (T2FFSVM). T1FFSVM, GFFSVM, and T2FFSVM use accuracy as the classifier performance measure, while AUC (the area under the ROC curve) has been shown to be a better performance metric; as a comparison, AUC-based classifier fusion models are also proposed in the dissertation. Experiments on biomedical datasets demonstrate promising performance of the proposed fusion models compared with the individual component classifiers, and better performance than many existing classifier fusion methods. The dissertation also studies an interesting phenomenon in the biology domain using machine learning and classifier fusion methods: how protein structures and sequences are related to each other. The experiments show that protein segments with similar structures also share similar sequences, which adds a new insight to the existing knowledge on the relation between protein sequences and structures: similar sequences share high structure similarity, but similar structures may not share high sequence similarity.
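As an illustration of the fusion idea, the sketch below averages the probability outputs of SVMs with different kernels and scores the fused result by both accuracy and AUC; simple averaging here is a stand-in for the fuzzy-logic fusion actually proposed, and the dataset is a synthetic placeholder.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Component classifiers: SVMs with different kernels, whose suitability
# for the data varies, which is what fusion is meant to compensate for.
probas = []
for kernel in ("linear", "poly", "rbf"):
    svm = SVC(kernel=kernel, probability=True, random_state=0).fit(X_tr, y_tr)
    probas.append(svm.predict_proba(X_te)[:, 1])

fused = np.mean(probas, axis=0)  # average fusion of class-1 probabilities
print("Accuracy:", accuracy_score(y_te, fused >= 0.5))
print("AUC:     ", roc_auc_score(y_te, fused))
```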
5.
Three Stage Level Set Segmentation of Mass Core, Periphery, and Spiculations for Automated Image Analysis of Digital Mammograms. Ball, John E. 05 May 2007.
In this dissertation, level set methods are employed to segment masses in digital mammographic images and to classify land cover in hyperspectral data. For the mammography computer-aided diagnosis (CAD) application, level-set-based segmentation methods are designed and validated for mass periphery segmentation, spiculation segmentation, and core segmentation. The proposed periphery segmentation uses the narrowband level set method in conjunction with an adaptive speed function based on a measure of boundary complexity in the polar domain. The boundary-complexity term is shown to be beneficial for delineating challenging masses with ill-defined and irregularly shaped borders, and the method is shown to outperform periphery segmentation methods currently reported in the literature. The proposed mass spiculation segmentation uses a generalized form of the Dixon and Taylor Line Operator along with narrowband level sets and a customized speed function. The resulting spiculation features prove very beneficial for classifying a mass as benign or malignant: for example, when using patient age and texture features combined with a maximum likelihood (ML) classifier, the spiculation segmentation method increases the overall accuracy to 92% with 2 false negatives, compared with 87% and 4 false negatives when using periphery segmentation approaches. The proposed mass core segmentation uses the Chan-Vese level set method with a minimal variance criterion. The resulting core features are shown to be effective and comparable to periphery features, and to reduce the number of false negatives in some cases. Most mammographic CAD systems use only a periphery segmentation, so those systems could potentially benefit from core features.
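For the core-segmentation step, a minimal Chan-Vese sketch using scikit-image's morphological variant is shown below as a stand-in for the dissertation's custom implementation; the test image, iteration count, and smoothing level are illustrative assumptions.

```python
from skimage import data, img_as_float
from skimage.segmentation import morphological_chan_vese

image = img_as_float(data.camera())  # stand-in for a mammographic ROI

# Evolve a level set for 60 iterations from a checkerboard initialization;
# the minimal-variance criterion splits the image into two homogeneous regions.
mask = morphological_chan_vese(image, 60, init_level_set="checkerboard", smoothing=3)
print("segmented core pixels:", int(mask.sum()))
```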
6.
Atrial Fibrillation Detection Algorithm Evaluation and Implementation in Java. Dizon, Lucas; Johansson, Martin. January 2014.
Atrial fibrillation is a common heart arrhythmia characterized by a missing or irregular contraction of the atria. The disease is a risk factor for other, more serious diseases, and its total medical costs to society are extensive; it would therefore be beneficial to improve and optimize its prevention and detection. Pulse palpation and heart auscultation can facilitate the detection of atrial fibrillation clinically, but the diagnosis is generally confirmed by an ECG examination. Today there are several algorithms that detect atrial fibrillation by analysing an ECG. A common method is to study the heart rate variability (HRV) and, using different types of statistical calculations, find episodes of atrial fibrillation that deviate from normal sinus rhythm. Two algorithms for detection of atrial fibrillation were evaluated in Matlab: one based on the coefficient of variation (CV) and the other using a logistic regression model. Training and testing of the algorithms were done with data from the Physionet MIT database, and several steps of signal processing were applied to remove different types of noise and artefacts before the data could be used. In testing, the CV algorithm achieved a sensitivity of 91.38%, a specificity of 93.93%, and an accuracy of 92.92%, while the logistic regression algorithm achieved a sensitivity of 97.23%, a specificity of 93.79%, and an accuracy of 95.39%. The logistic regression algorithm performed better and was chosen for implementation in Java, where it achieved a sensitivity of 97.31%, a specificity of 93.47%, and an accuracy of 95.25%.
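A minimal sketch of the coefficient-of-variation idea on RR intervals appears below: a window of intervals is flagged as possible atrial fibrillation when its variability exceeds a threshold. The window length, threshold, and simulated RR series are illustrative assumptions, not the thesis's tuned values.

```python
import numpy as np

def af_windows(rr_intervals, window=30, threshold=0.10):
    """Return start indices of RR-interval windows whose CV exceeds the threshold."""
    flagged = []
    for start in range(0, len(rr_intervals) - window + 1, window):
        w = rr_intervals[start:start + window]
        cv = np.std(w) / np.mean(w)  # coefficient of variation of the window
        if cv > threshold:
            flagged.append(start)
    return flagged

# Illustrative RR series: regular sinus rhythm followed by irregular intervals.
rng = np.random.default_rng(0)
sinus = rng.normal(0.80, 0.02, 60)  # ~75 bpm, low beat-to-beat variability
af = rng.normal(0.70, 0.15, 60)     # irregular intervals, high variability
print(af_windows(np.concatenate([sinus, af])))  # flags only the irregular segment
```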