1.
Improving Multiclass Text Classification with the Support Vector Machine. Rennie, Jason D. M.; Rifkin, Ryan. 16 October 2001.
We compare Naive Bayes and Support Vector Machines on the task of multiclass text classification. Using a variety of approaches to combine the underlying binary classifiers, we find that SVMs substantially outperform Naive Bayes. We present full multiclass results on two well-known text data sets, including the lowest error to date on both data sets. We develop a new indicator of binary performance to show that the SVM's lower multiclass error is a result of its improved binary performance. Furthermore, we demonstrate and explore the surprising result that one-vs-all classification performs favorably compared to other approaches even though it has no error-correcting properties.
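As a hedged illustration of the one-vs-all scheme discussed above: train one binary scorer per class and predict the class whose scorer is most confident. The "binary classifier" here is a toy centroid-distance scorer standing in for the SVMs studied in the paper; all names and data are hypothetical.

```python
# One-vs-all multiclass classification sketch. The binary learner is a toy
# centroid-distance scorer (a stand-in, not the paper's SVM).
def train_binary(pos, neg):
    # Return a score function favoring points near the positive centroid.
    def centroid(xs):
        n = len(xs)
        return [sum(x[i] for x in xs) / n for i in range(len(xs[0]))]
    cp, cn = centroid(pos), centroid(neg)
    def score(x):
        dp = sum((a - b) ** 2 for a, b in zip(x, cp))
        dn = sum((a - b) ** 2 for a, b in zip(x, cn))
        return dn - dp  # higher = more confidently positive
    return score

def train_one_vs_all(X, y):
    classes = sorted(set(y))
    scorers = {}
    for c in classes:
        pos = [x for x, lab in zip(X, y) if lab == c]
        neg = [x for x, lab in zip(X, y) if lab != c]
        scorers[c] = train_binary(pos, neg)
    # Predict by the winner-take-all rule over the K binary scores.
    return lambda x: max(classes, key=lambda c: scorers[c](x))

X = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
y = ["a", "a", "b", "b", "c", "c"]
predict = train_one_vs_all(X, y)
```

Note that the combination rule has no error-correcting structure: each test point is simply assigned to the binary scorer that fires most strongly.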
2.
Application of Machine Learning and Statistical Learning Methods for Prediction in a Large-Scale Vegetation Map. Brookey, Carla M. 01 December 2017.
Original analyses of a large vegetation cover dataset from Roosevelt National Forest in northern Colorado were carried out by Blackard (1998) and Blackard and Dean (1998; 2000). They compared the classification accuracies of linear and quadratic discriminant analysis (LDA and QDA) with artificial neural networks (ANN) and obtained an overall classification accuracy of 70.58% for a tuned ANN, compared to 58.38% for LDA and 52.76% for QDA. Because there has been tremendous development of machine learning classification methods over the last 35 years in both computer science and statistics, as well as substantial improvements in the speed of computer hardware, I applied five modern machine learning algorithms to the data to determine whether significant improvements in classification accuracy were possible using one or more of these methods. I found that only a tuned gradient boosting machine had a higher accuracy (71.62%) than the ANN of Blackard and Dean (1998), and the difference in accuracies was only about 1%. Of the other four methods, random forests (RF), support vector machines (SVM), classification trees (CT), and AdaBoosted trees (ADA), a tuned SVM and RF had the best accuracies, 67.17% and 67.57%, respectively. The partition of the data by Blackard and Dean (1998) was unusual in that the training and validation datasets had equal representation of the seven vegetation classes, even though 85% of the data fell into classes 1 and 2. For the second part of my analyses I randomly selected 60% of the data for training and 20% each for validation and testing. On this partition a single classification tree achieved an accuracy of 92.63% on the test data, and RF achieved 83.98%. Unsurprisingly, most of the gains in accuracy were in classes 1 and 2, the largest classes, which also had the highest misclassification rates under the original partition of the data.
By decreasing the size of the training data while maintaining the same relative occurrences of the vegetation classes as in the full dataset, I found that even for a training dataset of the same size as that of Blackard and Dean (1998), a single classification tree was more accurate (73.80%) than their ANN (70.58%). The final part of my thesis explored whether combining the predictions of several machine learning classifiers could yield higher predictive accuracies. In the analyses I carried out, increased accuracies did not occur with a simple vote of the five machine learning classifiers.
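The "simple voting" combination tried in the final part can be sketched as a plurality vote over per-classifier predictions. The classifier names and labels below are hypothetical placeholders, not taken from the thesis.

```python
from collections import Counter

# Plurality voting over several classifiers' predictions, one vote per
# classifier per example; ties go to the earliest-seen label.
def majority_vote(predictions):
    """predictions: list of per-classifier label lists, aligned by example."""
    n_examples = len(predictions[0])
    combined = []
    for i in range(n_examples):
        votes = Counter(p[i] for p in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined

preds = [
    ["spruce", "pine", "pine"],   # e.g. gradient boosting machine
    ["spruce", "pine", "fir"],    # e.g. random forest
    ["pine",   "pine", "fir"],    # e.g. SVM
]
print(majority_vote(preds))  # -> ['spruce', 'pine', 'fir']
```

As the thesis found, such a vote can only help where the classifiers' errors are uncorrelated; when one strong classifier dominates, voting with weaker ones may not improve accuracy.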
3.
Type-1 and singleton fuzzy logic system trained by a fast scaled conjugate gradient methods for dealing with classification problems. Amaral, Renan Piazzaroli Finotti. 01 September 2017.
This thesis presents and discusses improvements to the type-1 and singleton fuzzy logic system for dealing with classification problems. Two training methods are addressed: the scaled conjugate gradient, which uses second-order information by approximating the product of the Hessian matrix H with a directional vector v (i.e. Hv), and the same method using the differential operator R{.} to compute the exact value of Hv. Also, in order to adapt the fuzzy model to handle multiclass classification problems, a novel fuzzy model with a vector output is developed. All proposals are tested through performance-metric analysis on data sets from the UCI Machine Learning Repository. The reported results show that the proposed training methods converge faster and achieve better classification rates than others presented in the literature. Additionally, the novel fuzzy model brings a significant reduction in computational and classifier complexity, especially as the number of classes in the classification problem increases.
4.
Comparing LSTM and GRU for Multiclass Sentiment Analysis of Movie Reviews. Sarika, Pawan Kumar. January 2020.
Today, we live in a data-driven world. Due to a surge in data generation, there is a need for efficient and accurate techniques to analyze data. One such kind of data is the text of movie reviews. Rather than classifying the reviews as positive or negative, we classify the sentiment of the reviews on a scale of one to ten. In doing so, we compare two recurrent neural network architectures, long short-term memory (LSTM) and the gated recurrent unit (GRU). The main objective of this study is to compare the accuracies of LSTM and GRU models. For training the models, we collected data from two different sources. For filtering the data, we used Porter stemming and stop-word removal. We coupled LSTM and GRU with convolutional neural networks to increase performance. After conducting experiments, we observed that LSTM performed better at predicting boundary values, whereas GRU predicted every class more evenly. Overall, GRU was able to predict the multiclass movie-review data slightly better than LSTM, though GRU was computationally more expensive than LSTM.
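The filtering step described above can be sketched as stop-word removal followed by stemming. The stemmer below is a crude suffix-stripper standing in for the Porter stemmer the study used; the stop-word list and suffix rules are illustrative assumptions, not the study's actual configuration.

```python
# Review preprocessing sketch: stop-word removal plus a crude suffix
# stemmer (a stand-in for Porter stemming; rules are hypothetical).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to"}

def crude_stem(word):
    # Strip one common suffix, keeping at least a 3-character stem.
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(review):
    tokens = review.lower().split()
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The acting was amazing and the endings surprised"))
# -> ['act', 'amaz', 'ending', 'surpris']
```

The resulting token lists would then be mapped to embeddings and fed to the LSTM or GRU model.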
5.
Analysis of Emergency Medical Transport Datasets using Machine Learning. Letzner, Josefine. January 2017.
The selection of hospital once an ambulance has picked up its patient is today decided by the ambulance staff. This report describes a supervised machine learning approach for predicting hospital selection, a multi-class classification problem. The performance of random forest, logistic regression, and a neural network were compared to each other and to a baseline, namely the one-rule algorithm. The algorithms were applied to real-world data from SOS Alarm, the company that operates Sweden's emergency call services. Performance was measured with accuracy and F1-score. Random forest got the best result, followed by the neural network. Logistic regression exhibited slightly inferior results but still performed far better than the baseline. The results point toward machine learning being a suitable method for learning the problem of hospital selection.
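The one-rule (OneR) baseline mentioned above picks the single feature whose value-to-majority-label rule gives the best training accuracy. A minimal sketch follows; the features and hospital labels are made-up placeholders, not the SOS Alarm data.

```python
from collections import Counter, defaultdict

# OneR baseline sketch: for each feature, build a value -> majority-label
# rule, then keep the single feature whose rule fits training data best.
def one_rule(X, y):
    best = None
    for f in range(len(X[0])):
        table = defaultdict(Counter)
        for row, label in zip(X, y):
            table[row[f]][label] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in table.items()}
        correct = sum(rule[row[f]] == label for row, label in zip(X, y))
        if best is None or correct > best[0]:
            best = (correct, f, rule)
    _, f, rule = best
    return lambda row: rule.get(row[f])

# Hypothetical dispatch records: (location type, condition severity).
X = [("urban", "severe"), ("urban", "mild"), ("rural", "severe"), ("rural", "mild")]
y = ["hospital_A", "hospital_A", "hospital_B", "hospital_A"]
predict = one_rule(X, y)
```

Because it conditions on only one feature, OneR gives a sensible floor that the random forest, logistic regression, and neural network all had to beat.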
6.
Explanation Methods for a Medical Image Classifier by Analysis of its Uncertainty. Gupta, Sanskar. January 2022.
Over the last decade, neural networks have reached almost every field of science and technology. They have become a crucial part of various real-world applications, such as medical imaging. Still, their deployment in safety-critical applications remains limited owing to their inability to provide reliable uncertainty estimates and their frequently overconfident predictions, which is typically the case in modern neural networks with a substantial number of layers. In this thesis, we leverage data mining algorithms such as density clustering to explain the behavior of a medical image classifier responsible for classifying white blood cells. Any clustering algorithm acts on the feature vectors of the input data and assigns the data to clusters according to those features. In this work, we state and support the hypothesis that the output discrete probability matrix of a multi-class classification problem can be used as a feature vector, where the confidence value for each class can be considered a degree of resemblance to that class. Before implementing clustering, one needs to make sure that these confidence values represent actual probabilities so that they can be used as features; hence, calibration techniques were first incorporated to improve the calibration of the network. With a better-calibrated medical classifier, density clustering was implemented, and the results provided solid arguments to justify the behavior of the network. Regarding the use case of this method, we observed that we could identify pathologies such as myelodysplastic syndromes, acute lymphocytic leukemia, and chronic myelomonocytic leukemia in a patient. This was possible due to the presence of the same class of white blood cells in multiple clusters, indicating subpopulations separated into healthy and pathological cells of the same class, depending upon the pathology to be detected.
This was shown visually by mapping cluster points to actual cell images, and quantitatively by using entropy to quantify uncertainty. The method showed that a great deal of information is embedded in the output probability matrix; one can therefore employ various data mining techniques to extract more information, rather than being limited to misclassifications and confusion matrices.
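The core idea above, treating a calibrated classifier's output probability vector as a feature vector and quantifying its uncertainty with entropy, can be sketched briefly. The softmax outputs below are invented values, not the thesis's white-blood-cell data.

```python
import math

# Entropy of a probability vector: low for confident predictions,
# maximal for a uniform (maximally uncertain) prediction.
def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

# Hypothetical calibrated softmax outputs for three cell images.
outputs = [
    [0.97, 0.02, 0.01],  # confident: one clear class
    [0.50, 0.45, 0.05],  # ambiguous between two classes
    [0.34, 0.33, 0.33],  # near-uniform: highly uncertain
]
for p in outputs:
    print(round(entropy(p), 3))
```

These probability vectors would then be fed to a density-clustering algorithm; cells of one nominal class splitting across several clusters is what signaled the healthy/pathological subpopulations described above.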
7.
Automatic Generation of Medical Reports for Epilepsy Diagnosis through Electroencephalogram Processing using Machine Learning. Oliva, Jefferson Tales. 05 December 2018.
Epilepsy, whose seizures result from temporary electrical disturbances in the brain, is the fourth most common neurological disorder, affecting approximately 50 million people. The disease can be diagnosed by electroencephalograms (EEG), which are of great importance for the diagnosis of brain diseases. The information considered relevant in these exams is described in textual reports, which are stored in order to maintain the patient's medical history and to assist medical experts in future procedures, such as identifying patterns of certain diseases. However, the growing volume of stored medical data makes manual analysis unfeasible. Another challenge for EEG analysis is the variability of expert opinions on the same observed pattern, which can increase the difficulty of diagnosing brain diseases. Moreover, EEG exams may contain relevant patterns that are difficult to notice, even for experienced professionals. Similarly, reports may be missing information and/or contain typographical errors because they are filled in hurriedly by experts. In this work, the automatic generation of medical reports (AutoGenMR) method was therefore developed to assist medical experts in the diagnosis of epilepsy and in decision making.
The method is applied in two phases: (1) classifier building by machine learning techniques and (2) automatic generation of textual reports. AutoGenMR was evaluated experimentally in two case studies, each using a publicly and freely available EEG database. In both studies, the same experimental settings for feature extraction and classifier building were used (setting aside that one of the classification problems is multiclass and the other binary). In the first case study, the predictive models correctly generated, on average, 89% of the report expressions. In the second, on average, 76% of the report sentences were generated correctly. The results of both studies are considered promising, indicating that AutoGenMR can assist medical experts in identifying patterns related to epileptic events, in generating standardized textual reports, and in decision-making processes.
8.
Comparing Weak and Strong Annotation Strategies for Multiple Instance Learning in Digital Pathology. Ciallella, Alice. January 2022.
Prostate cancer is the second most diagnosed cancer worldwide. It is diagnosed through visual inspection of biopsy tissue by a pathologist, who assigns a score used by doctors to decide on the treatment. However, the scoring system, the Gleason score, suffers from high inter- and intra-observer variability, lack of standardization, and overestimation. There is therefore a need for new solutions that can reduce these issues and provide a more accurate diagnosis. Nowadays, high-resolution digital images of biopsy tissue can be obtained and stored. The availability of such images, called whole slide images (WSI), allows machine and deep learning models to be implemented to assist pathologists in diagnosing prostate cancer. Multiple instance learning (MIL) has been shown to reach very promising results in digital pathology and in binary classification of prostate cancer slides. However, such models require large datasets to ensure good performance. This project investigates the use of small sets of strongly annotated images to create new large datasets for training a MIL model. To evaluate the performance of this approach, the standard dataset is used to obtain baselines for both binary and multiclass classification tasks. For multiclass classification, the International Society of Urological Pathology (ISUP) score is used, which is derived from the Gleason score. The dataset used is the publicly available PANDA; in this project, only the 5160 slides from Radboud University Medical Center are used. The MIL model chosen is the publicly available Clustering-constrained Attention Multiple Instance Learning (CLAM) model. The standard approach reaches a Cohen's kappa (κ) of 0.78 and 0.59 for binary and multiclass classification, respectively. To evaluate the new approach, large datasets are created starting from different set sizes. Using 500 images, the model reaches a κ of 0.72 and 0.38, respectively.
While the two approaches are comparable for binary classification, the new approach is not beneficial for multiclass classification tasks.
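Cohen's kappa, the agreement metric reported above, corrects raw accuracy for chance agreement. A minimal sketch, with invented labels rather than the PANDA slides:

```python
from collections import Counter

# Cohen's kappa: (observed agreement - chance agreement) / (1 - chance).
def cohens_kappa(y_true, y_pred):
    n = len(y_true)
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed
    ct, cp = Counter(y_true), Counter(y_pred)
    pe = sum(ct[c] * cp[c] for c in ct) / (n * n)          # chance
    return (po - pe) / (1 - pe)

# Hypothetical ISUP-style labels for six slides.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(round(cohens_kappa(y_true, y_pred), 3))  # -> 0.75
```

Unlike plain accuracy, κ stays low when a model merely predicts the majority class, which matters for imbalanced multiclass grading tasks like ISUP scoring.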
9.
Interpretable Binary and Multiclass Prediction Models for Insolvencies and Credit Ratings. Obermann, Lennart. 10 May 2016.
Insolvency prediction and rating are important tasks in the financial industry and serve to assess the creditworthiness of companies. One way to approach this field is machine learning, where predictive models are built from example data. Methods from this area are advantageous because they can be automated, which in most cases makes human expertise unnecessary and thereby offers a higher degree of objectivity. However, these approaches are not perfect either and therefore cannot fully replace human expertise. They do lend themselves as decision aids that experts can use, which is why interpretable models are desirable. Unfortunately, only few learning algorithms yield interpretable models. Moreover, some tasks, such as rating, are often multiclass problems. Multiclass classification is frequently achieved by meta-algorithms that train several binary classifiers, and most of the commonly used meta-algorithms destroy whatever interpretability the underlying models may have.
In this dissertation, we examine the predictive accuracy of interpretable models compared to non-interpretable models for insolvency prediction and ratings. We use disjunctive normal forms and decision trees with thresholds on financial ratios as interpretable models. Random forests, artificial neural networks, and support vector machines serve as non-interpretable models. In addition, we developed our own learning algorithm, Thresholder, which generates disjunctive normal forms and interpretable multiclass models.
For the task of insolvency prediction, we show that interpretable models are not inferior to non-interpretable models. In a first case study, a database used in practice containing annual financial statements of 5152 companies is used to measure the predictive accuracy of all the models mentioned above.
In a second case study on rating prediction, we demonstrate that interpretable models are even superior to non-interpretable models. The predictive accuracy of all models is determined on three data sets used in practice, each with three rating classes.
In the case studies we compare the various interpretable approaches with respect to their model sizes and the form of interpretability. We present exemplary models based on the respective data sets and offer approaches to their interpretation.
Our results show that interpretable, threshold-based models are well suited to classification problems in the financial industry. In this domain they are not inferior to more complex models such as support vector machines. Our algorithm Thresholder produces the smallest models, while its predictive accuracy remains comparable to the other interpretable models.
In our rating case study, the interpretable models deliver markedly better results than in insolvency prediction (see above). One possible explanation is that ratings, unlike insolvencies, are man-made: ratings rest on decisions by humans, who think in interpretable rules such as logical combinations of thresholds. We therefore assume that interpretable models fit these problems and can detect and represent such interpretable rules.
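An interpretable disjunctive-normal-form model over financial-ratio thresholds, of the kind described above, can be sketched as follows. The ratios and threshold values are invented for illustration, not taken from the dissertation's data or its Thresholder algorithm.

```python
# Disjunctive-normal-form classifier sketch: predict "insolvent" if ANY
# conjunction of threshold conditions fires (ratios/thresholds hypothetical).
RULES = [  # each inner list is a conjunction of (ratio, op, threshold)
    [("equity_ratio", "<", 0.10), ("return_on_assets", "<", 0.00)],
    [("liquidity", "<", 0.50)],
]

def fires(cond, firm):
    name, op, thr = cond
    value = firm[name]
    return value < thr if op == "<" else value > thr

def predict_insolvent(firm):
    # OR over clauses, AND within each clause: a readable DNF.
    return any(all(fires(c, firm) for c in clause) for clause in RULES)

firm = {"equity_ratio": 0.05, "return_on_assets": -0.02, "liquidity": 0.80}
print(predict_insolvent(firm))  # -> True (first clause fires)
```

The appeal is exactly what the dissertation argues: an analyst can read each clause as a plain credit rule, which black-box models like SVMs do not offer.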
10.
Learning with Complex Performance Measures: Theory, Algorithms and Applications. Narasimhan, Harikrishna. January 2016.
We consider supervised learning problems, where one is given objects with labels, and the goal is to learn a model that can make accurate predictions on new objects. These problems abound in applications, ranging from medical diagnosis to information retrieval to computer vision. Examples include binary or multiclass classification, where the goal is to learn a model that can classify objects into two or more categories (e.g. categorizing emails into spam or non-spam); bipartite ranking, where the goal is to learn a model that can rank relevant objects above the irrelevant ones (e.g. ranking documents by relevance to a query); class probability estimation (CPE), where the goal is to predict the probability of an object belonging to different categories (e.g. probability of an internet ad being clicked by a user). In each case, the accuracy of a model is evaluated in terms of a specified `performance measure'.
While there has been much work on designing and analyzing algorithms for different supervised learning tasks, we have complete understanding only for settings where the performance measure of interest is the standard 0-1 or a loss-based classification measure. These performance measures have a simple additive structure, and can be expressed as an expectation of errors on individual examples. However, in many real-world applications, the performance measure used to evaluate a model is often more complex, and does not decompose into a sum or expectation of point-wise errors. These include the binary or multiclass G-mean used in class-imbalanced classification problems; the F1-measure and its multiclass variants popular in text retrieval; and the (partial) area under the ROC curve (AUC) and precision@k employed in ranking applications. How does one design efficient learning algorithms for such complex performance measures, and can these algorithms be shown to be statistically consistent, i.e. shown to converge in the limit of infinite data to the optimal model for the given measure? How does one develop efficient learning algorithms for complex measures in online/streaming settings where the training examples need to be processed one at a time? These are questions that we seek to address in this thesis. Firstly, we consider the bipartite ranking problem with the AUC and partial AUC performance measures. We start by understanding how bipartite ranking with AUC is related to the
standard 0-1 binary classification and CPE tasks. It is known that a good binary CPE model can be used to obtain both a good binary classification model and a good bipartite ranking model (formally, in terms of regret transfer bounds), and that a binary classification model does not necessarily yield a CPE model. However, not much is known about the other directions. We show that in a weaker sense (where the mapping needed to transform a model from one problem to another depends on the underlying probability distribution), a good bipartite ranking model for AUC can indeed be used to construct a good binary classification model, and also a good binary CPE model. Next, motivated by the increasing number of applications (e.g. biometrics, medical diagnosis, etc.) where performance is measured not in terms of the full AUC but in terms of the partial AUC between two false positive rates (FPRs), we design batch algorithms for optimizing partial AUC in any given FPR range. Our algorithms optimize structural support vector machine based surrogates, which, unlike for the full AUC, do not admit a straightforward decomposition into simpler terms. We develop polynomial-time cutting-plane solvers for the optimization, and provide experiments to demonstrate the efficacy of our methods. We also present an application of our approach to predicting chemotherapy outcomes for cancer patients, with the aim of improving treatment decisions.
Secondly, we develop algorithms for optimizing (surrogates for) complex performance measures in the presence of streaming data. A well-known method for solving this problem for standard point-wise surrogates such as the hinge surrogate is the stochastic gradient descent (SGD) method, which performs point-wise updates using unbiased gradient estimates. However, this method cannot be applied to complex objectives, as here one can no longer obtain unbiased gradient estimates from a single point. We develop a general stochastic method for optimizing complex measures that avoids point-wise updates, and instead performs gradient updates on mini-batches of incoming points. The method is shown to provably converge for any performance measure that satisfies a uniform convergence requirement, such as the partial AUC, precision@k and F1-measure, and in experiments is often several orders of magnitude faster than the state-of-the-art batch methods, while achieving similar or better accuracies. Moreover, for specific complex binary classification measures, which are concave functions of the true positive rate (TPR) and true negative rate (TNR), we are able to develop stochastic (primal-dual) methods that can indeed be implemented with point-wise updates, by using an adaptive linearization scheme. These methods admit convergence rates that match the rate of the SGD method, and are again several times faster than the state-of-the-art methods.
Finally, we look at the design of consistent algorithms for complex binary and multiclass measures. For binary measures, we consider the practically popular plug-in algorithm that constructs a classifier by applying an empirical threshold to a suitable class probability estimate,
and provide a general methodology for proving consistency of these methods. We apply this technique to show consistency for the F1-measure, and, under a continuity assumption on the distribution, for any performance measure that is monotonic in the TPR and TNR. For the case of multiclass measures, a simple plug-in method is no longer tractable, as in place of a single threshold parameter, one needs to tune at least as many parameters as the number of classes. Using an optimization viewpoint, we provide a framework for designing learning algorithms for multiclass measures that are general functions of the confusion matrix, and as an instantiation, provide an efficient and provably consistent algorithm based on the bisection method for multiclass measures that are ratio-of-linear functions of the confusion matrix (e.g. micro F1). The algorithm outperforms the state-of-the-art SVMPerf method in terms of both accuracy and running time.
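The binary plug-in idea above, thresholding a class-probability estimate and tuning the threshold empirically, can be sketched for the F1-measure. The probability estimates and labels below are illustrative values, and the exhaustive threshold search is a simplification of the thesis's methodology.

```python
# Plug-in classifier sketch for F1: threshold a class-probability estimate
# and pick the empirical threshold maximizing F1 on held-out data.
def f1_at_threshold(probs, labels, t):
    tp = sum(p >= t and y == 1 for p, y in zip(probs, labels))
    fp = sum(p >= t and y == 0 for p, y in zip(probs, labels))
    fn = sum(p < t and y == 1 for p, y in zip(probs, labels))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(probs, labels):
    # Only observed probability values can change the confusion counts.
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: f1_at_threshold(probs, labels, t))

probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]   # hypothetical CPE outputs
labels = [1, 1, 1, 0, 1, 0]
t = best_threshold(probs, labels)
print(t)  # -> 0.3
```

This single-threshold search is exactly what stops scaling in the multiclass case, where the number of parameters to tune grows with the number of classes, motivating the bisection-based algorithm described above.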
Overall, in this thesis, we have looked at various aspects of complex performance measures used in supervised learning problems, leading to several new algorithms that are often significantly better than the state-of-the-art, to improved theoretical understanding of the performance measures studied, and to novel real-life applications of the algorithms developed.