1 |
Multiclass Classification of SRBCTs. Yeo, Gene; Poggio, Tomaso. 25 August 2001
A novel approach to multiclass tumor classification using Artificial Neural Networks (ANNs) was introduced in a recent paper [Khan2001]. The method successfully classified and diagnosed small, round blue cell tumors (SRBCTs) of childhood into four distinct categories: neuroblastoma (NB), rhabdomyosarcoma (RMS), non-Hodgkin lymphoma (NHL), and the Ewing family of tumors (EWS), using cDNA gene expression profiles of samples that included both tumor biopsy material and cell lines. We report that using an approach similar to the one reported by Yeang et al. [Yeang2001], i.e. multiclass classification by combining outputs of binary classifiers, we achieved equal accuracy with far fewer features. We report the performance of three binary classifiers (k-nearest neighbors (kNN), weighted voting (WV), and support vector machines (SVM)) with three feature selection techniques: Golub's signal-to-noise (SN) ratios [Golub99], Fisher scores (FSc), and Mukherjee's SVM feature selection (SVMFS) [Sayan98].
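The combination scheme described above can be sketched compactly. The following is a minimal illustration, not the authors' code: it builds one-vs-rest binary SVMs, each trained on its own top genes ranked by Golub's signal-to-noise ratio, and combines their margins for the final call. The data is a synthetic stand-in shaped like the SRBCT matrix (63 samples, 2308 genes); the feature count and all parameters are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def signal_to_noise(X, y_bin):
    """Golub's S2N ratio per feature: (mu+ - mu-) / (sigma+ + sigma-)."""
    pos, neg = X[y_bin == 1], X[y_bin == 0]
    return (pos.mean(0) - neg.mean(0)) / (pos.std(0) + neg.std(0) + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(63, 2308))   # synthetic stand-in for the SRBCT matrix
y = rng.integers(0, 4, size=63)   # placeholder EWS/NB/RMS/NHL labels

models = {}
for cls in np.unique(y):
    y_bin = (y == cls).astype(int)
    idx = np.argsort(np.abs(signal_to_noise(X, y_bin)))[-20:]  # top-20 genes per class
    models[cls] = (idx, SVC(kernel="linear").fit(X[:, idx], y_bin))

def predict(x):
    """Combine the binary classifiers: take the class with the largest margin."""
    margins = {c: clf.decision_function(x[idx].reshape(1, -1))[0]
               for c, (idx, clf) in models.items()}
    return max(margins, key=margins.get)
```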
|
2 |
Classifying human activities through machine learning. Lannge, Jakob; Majed, Ali. January 2018
Classifying activities of daily living (ADL) can be used in a system that monitors people's activities for different purposes, for example in emergency systems. Machine learning is a way to classify ADL with high accuracy, using wearable sensors as input. In this paper, a proof-of-concept system consisting of three different machine learning algorithms is evaluated and compared across three different datasets: one publicly available (Ugulino, et al., 2012), and two collected in this paper using an Android device's accelerometer and gyroscope sensors. The algorithms are Multiclass Decision Forest, Multiclass Decision Jungle, and Multiclass Neural Network. The two sensors used are an accelerometer and a gyroscope. The result shows how a system can be implemented using Azure Machine Learning Studio, and how the three algorithms perform when classifying the three datasets. One algorithm achieves a higher accuracy on the Ugulino dataset than the machine learning model originally used with that dataset.
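As a rough open-source analogue of the pipeline above (Azure ML Studio's components are not importable as a library), the sketch below extracts simple window statistics from accelerometer and gyroscope streams and trains a random forest standing in for the Multiclass Decision Forest; window length, features, and labels are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(acc, gyr, win=128):
    """Mean and std per axis over non-overlapping windows of both sensor streams."""
    feats = []
    for s in range(0, len(acc) - win + 1, win):
        a, g = acc[s:s + win], gyr[s:s + win]
        feats.append(np.concatenate([a.mean(0), a.std(0), g.mean(0), g.std(0)]))
    return np.array(feats)

rng = np.random.default_rng(1)
acc = rng.normal(size=(12800, 3))    # placeholder 3-axis accelerometer stream
gyr = rng.normal(size=(12800, 3))    # placeholder 3-axis gyroscope stream
X = window_features(acc, gyr)
y = rng.integers(0, 5, size=len(X))  # placeholder ADL labels (walk, sit, stand, ...)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```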
|
3 |
Segmentação de imagens coloridas baseada na mistura de cores e redes neurais / Segmentation of color images based on color mixture and neural networks. Moraes, Diego Rafael. 26 March 2018
The Color Mixture is a technique for color image segmentation that creates an "Artificial Retina" based on the color mixture and quantizes the image by projecting all colors onto 256 planes in the RGB cube. It then traverses all those planes with a Gaussian classifier to segment the image. However, the current approach has some limitations: the classifier solves exclusively binary problems. Inspired by the "Artificial Retina" of the Color Mixture, this thesis defines a new "Artificial Retina" and proposes replacing the current classifier with an artificial neural network for each of the 256 planes, with the goal of improving performance and extending the method to multiclass and multiscale problems. We call this new approach the Neural Color Mixture. To validate the proposal, we carried out statistical analyses in two application areas. First, for human skin segmentation, we compared its results with eight known methods on four datasets of different sizes; the segmentation accuracy of the approach proposed in this thesis surpassed all of the compared methods. The second practical evaluation of the proposed model was carried out with satellite images, given their wide applicability in urban and rural areas.
For this, we created and made available a database of satellite images, extracted from Google Earth, covering ten different regions of the planet at four zoom scales (500 m, 1000 m, 1500 m and 2000 m), each containing at least four classes of interest: tree, soil, street, and water. We compared our proposal with a multilayer perceptron network (ANN-MLP) and a Support Vector Machine (SVM) across four experiments, and again the proposal was superior. We conclude that the new approach can be used for multiclass and multiscale color image segmentation problems, and that it can possibly be extended to any application, since it involves a training phase in which it adapts itself to the problem.
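The per-plane classifier idea lends itself to a short sketch. How exactly colors are projected onto the 256 planes is not specified in this abstract, so the sketch below assumes the plane index is the rounded mean intensity floor((R+G+B)/3) and trains one small MLP per plane; pixels, labels, and network size are all synthetic placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def plane_index(rgb):
    # assumed projection: plane = mean intensity, giving indices 0..255
    return rgb.sum(axis=-1) // 3

rng = np.random.default_rng(2)
pixels = rng.integers(0, 256, size=(5000, 3))  # synthetic RGB pixels
labels = rng.integers(0, 4, size=5000)         # placeholder tree/soil/street/water

models = {}
planes = plane_index(pixels)
for p in np.unique(planes):
    m = planes == p
    if m.sum() < 10 or len(np.unique(labels[m])) < 2:
        continue                               # skip sparsely populated planes
    models[p] = MLPClassifier(hidden_layer_sizes=(8,), max_iter=300).fit(
        pixels[m], labels[m])

def classify(rgb):
    """Route a pixel to its plane's network; -1 if the plane had no model."""
    rgb = np.asarray(rgb)
    p = int(plane_index(rgb))
    return int(models[p].predict(rgb.reshape(1, -1))[0]) if p in models else -1
```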
|
4 |
Sharing visual features for multiclass and multiview object detection. Torralba, Antonio; Murphy, Kevin P.; Freeman, William T. 14 April 2004
We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (run-time) computational complexity and the (training-time) sample complexity scale linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects. We present a multi-class boosting procedure (joint boosting) that reduces the computational and sample complexity by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required, and therefore the computational cost, is observed to scale approximately logarithmically with the number of classes. Rather than specific object parts, the jointly selected features tend to be edges and other generic features typical of many natural structures. Such generic features generalize better and considerably reduce the computational cost of multi-class object detection.
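A much-simplified sketch of the sharing idea follows: each boosting round fits a single regression stump whose split is shared by every one-vs-rest problem, with per-class output values, in the spirit of gentle boosting. The actual procedure additionally searches over subsets of classes to share each feature with; that search is omitted here, and the data layout and parameters are illustrative.

```python
import numpy as np

def joint_boost(X, Z, rounds=30, n_thr=8):
    """X: (n, d) features; Z: (n, C) one-vs-rest labels in {-1, +1}."""
    n, d = X.shape
    W = np.full(Z.shape, 1.0 / n)   # per-example, per-class weights
    F = np.zeros(Z.shape)           # additive scores
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(d):
            for thr in np.quantile(X[:, j], np.linspace(0.1, 0.9, n_thr)):
                m = X[:, j] > thr
                # closed-form per-class outputs on each side of the shared split
                a = (W[m] * Z[m]).sum(0) / (W[m].sum(0) + 1e-12)
                b = (W[~m] * Z[~m]).sum(0) / (W[~m].sum(0) + 1e-12)
                h = np.where(m[:, None], a, b)
                err = (W * (Z - h) ** 2).sum()
                if best is None or err < best[0]:
                    best = (err, j, thr, a, b, h)
        _, j, thr, a, b, h = best
        ensemble.append((j, thr, a, b))   # one shared stump per round
        F += h
        W = np.exp(-Z * F); W /= W.sum()  # boosting-style reweighting
    return ensemble
```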
|
5 |
Improving Multiclass Text Classification with the Support Vector Machine. Rennie, Jason D. M.; Rifkin, Ryan. 16 October 2001
We compare Naive Bayes and Support Vector Machines on the task of multiclass text classification. Using a variety of approaches to combine the underlying binary classifiers, we find that SVMs substantially outperform Naive Bayes. We present full multiclass results on two well-known text data sets, including the lowest error to date on both data sets. We develop a new indicator of binary performance to show that the SVM's lower multiclass error is a result of its improved binary performance. Furthermore, we demonstrate and explore the surprising result that one-vs-all classification performs favorably compared to other approaches even though it has no error-correcting properties.
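The comparison is easy to reproduce in spirit with scikit-learn. The sketch below pits multinomial Naive Bayes against one-vs-all linear SVMs (LinearSVC trains one binary classifier per class by default) on 20 Newsgroups, used here as a stand-in since the paper's exact data preparation is not reproduced.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

for name, clf in [("Naive Bayes", MultinomialNB()),
                  ("one-vs-all linear SVM", LinearSVC())]:
    model = make_pipeline(TfidfVectorizer(), clf).fit(train.data, train.target)
    print(name, model.score(test.data, test.target))
```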
|
6 |
Design and Analysis of Consistent Algorithms for Multiclass Learning Problems. Harish, Guruprasad Ramaswami. January 2015
We consider the broad framework of supervised learning, where one gets examples of objects together with some labels (such as tissue samples labeled as cancerous or non-cancerous, or images of handwritten digits labeled with the correct digit in 0-9), and the goal is to learn a prediction model which, given a new object, makes an accurate prediction. The notion of accuracy depends on the learning problem under study and is measured by a performance measure of interest. A supervised learning algorithm is said to be 'statistically consistent' if it returns an 'optimal' prediction model with respect to the desired performance measure in the limit of infinite data. Statistical consistency is a fundamental notion in supervised machine learning, and therefore the design of consistent algorithms for various learning problems is an important question. While this has been well studied for simple binary classification problems and some other specific learning problems, the question of consistent algorithms for general multiclass learning problems remains open. We investigate several aspects of this question as detailed below.
First, we develop an understanding of consistency for multiclass performance measures defined by a general loss matrix, for which convex surrogate risk minimization algorithms are widely used. Consistency of such algorithms hinges on the notion of 'calibration' of the surrogate loss with respect to the target loss matrix; we start by developing a general understanding of this notion, and give both necessary conditions and sufficient conditions for a surrogate loss to be calibrated with respect to a target loss matrix. We then define a fundamental quantity associated with any loss matrix, which we term the 'convex calibration dimension' of the loss matrix; this gives one measure of the intrinsic difficulty of designing convex calibrated surrogates for a given loss matrix. We derive lower bounds on the convex calibration dimension which lead to several new results on the non-existence of convex calibrated surrogates for various losses. For example, our results improve on recent results on the non-existence of low-dimensional convex calibrated surrogates for various subset ranking losses like the pairwise disagreement (PD) and mean average precision (MAP) losses. We also upper bound the convex calibration dimension of a loss matrix by its rank, by constructing an explicit, generic, least-squares-type convex calibrated surrogate whose dimension is at most the (linear algebraic) rank of the loss matrix. This yields low-dimensional convex calibrated surrogates - and therefore consistent learning algorithms - for a variety of structured prediction problems for which the associated loss is of low rank, including for example the precision@k and expected rank utility (ERU) losses used in subset ranking problems. For settings where achieving exact consistency is computationally difficult, as is the case with the PD and MAP losses in subset ranking, we also show how to extend these surrogates to give algorithms satisfying weaker notions of consistency, including both consistency over restricted sets of probability distributions, and an approximate form of consistency over the full probability space.
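The least-squares construction in the paragraph above admits a direct sketch: regress, for each training example, the row of the loss matrix indexed by its true label (the vector of losses each possible prediction would incur), then predict the class minimizing the estimated expected loss. The loss matrix, model, and data below are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge

L = np.array([[0., 1., 2.],
              [1., 0., 1.],
              [2., 1., 0.]])           # example loss matrix (ordinal-style)

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 10))
y = rng.integers(0, 3, size=500)

T = L[y]                              # target for example i is the row L[y_i, :]
reg = Ridge(alpha=1.0).fit(X, T)      # vector-valued least-squares surrogate

def predict(X_new):
    # estimated expected loss of each prediction; choose the argmin
    return reg.predict(X_new).argmin(axis=1)
```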
Second, we consider the practically important problem of hierarchical classification, where the labels to be predicted are organized in a tree hierarchy. We design a new family of convex calibrated surrogate losses for the associated tree-distance loss; these surrogates are better than the generic least-squares surrogate in terms of easier optimization and representation of the solution, and some surrogates in the family also operate on a significantly lower-dimensional space than the rank of the tree-distance loss matrix. These surrogates, which we term the 'cascade' family of surrogates, rely crucially on a new understanding we develop for the problem of multiclass classification with an abstain option, for which we construct new convex calibrated surrogates that are of independent interest. The resulting hierarchical classification algorithms outperform the current state-of-the-art in terms of both accuracy and running time.
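The abstain option mentioned above has a classical plug-in counterpart (Chow's rule) on top of class-probability estimates: reject whenever the top class probability falls below 1 - c, where c is the cost of abstaining. This generic rule is sketched below for orientation; it is not the cascade surrogate itself.

```python
import numpy as np

def predict_with_abstain(probs, c=0.3):
    """probs: (n, C) class-probability estimates; returns -1 to abstain."""
    top = probs.argmax(axis=1)
    return np.where(probs.max(axis=1) >= 1 - c, top, -1)
```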
Finally, we go beyond loss-based multiclass performance measures, and consider multiclass learning problems with more complex performance measures that are nonlinear functions of the confusion matrix and that cannot be expressed using loss matrices; these include, for example, the multiclass G-mean measure used in class-imbalance settings and the micro F1 measure often used in information retrieval applications. We take an optimization viewpoint for such settings, and give a Frank-Wolfe type algorithm that is provably consistent for any complex performance measure that is a convex function of the entries of the confusion matrix (this includes the G-mean, but not the micro F1). The resulting algorithms outperform the state-of-the-art SVMPerf algorithm in terms of both accuracy and running time.
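A compact sketch of such a Frank-Wolfe scheme, assuming a fixed class-probability model: linearize the measure at the current expected confusion matrix, note that the linear minimization oracle is exactly a cost-sensitive plug-in rule, and average geometrically. The objective psi, its gradient, and the probability estimates below are illustrative stand-ins.

```python
import numpy as np

def grad_psi(M, eps=1e-9):
    """Gradient of psi(M) = -(1/C) * sum_i log M_ii, an illustrative
    G-mean-style objective with class priors treated as fixed."""
    G = np.zeros_like(M)
    np.fill_diagonal(G, -1.0 / (len(M) * (np.diag(M) + eps)))
    return G

def confusion(P, pred):
    """Expected confusion matrix under probability estimates P (n, C)."""
    C = P.shape[1]
    M = np.zeros((C, C))
    for i, j in enumerate(pred):
        M[:, j] += P[i] / len(pred)
    return M

def frank_wolfe(P, steps=50):
    pred = P.argmax(axis=1)              # start from the argmax classifier
    M = confusion(P, pred)
    schedule = []                        # (weight, classifier) pairs
    for t in range(1, steps + 1):
        G = grad_psi(M)
        pred = (P @ G).argmin(axis=1)    # linear oracle = cost-sensitive plug-in
        gamma = 2.0 / (t + 2)
        M = (1 - gamma) * M + gamma * confusion(P, pred)
        schedule.append((gamma, pred))   # the final model randomizes over these
    return M, schedule

P = np.random.default_rng(6).dirichlet(np.ones(3), size=500)  # stand-in CPE outputs
M, schedule = frank_wolfe(P)
```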
In conclusion, in this thesis, we have developed a deep understanding and fundamental results in the theory of supervised multiclass learning. These insights have allowed us to develop computationally efficient and statistically consistent algorithms for a variety of multiclass learning problems of practical interest, in many cases significantly outperforming the state-of-the-art algorithms for these problems.
|
7 |
Application of Machine Learning and Statistical Learning Methods for Prediction in a Large-Scale Vegetation Map. Brookey, Carla M. 01 December 2017
Original analyses of a large vegetation cover dataset from Roosevelt National Forest in northern Colorado were carried out by Blackard (1998) and Blackard and Dean (1998; 2000). They compared the classification accuracies of linear and quadratic discriminant analysis (LDA and QDA) with artificial neural networks (ANN) and obtained an overall classification accuracy of 70.58% for a tuned ANN, compared to 58.38% for LDA and 52.76% for QDA. Because there has been tremendous development of machine learning classification methods over the last 35 years in both computer science and statistics, as well as substantial improvements in the speed of computer hardware, I applied five modern machine learning algorithms to the data to determine whether significant improvements in the classification accuracy were possible using one or more of these methods. I found that only a tuned gradient boosting machine had a higher accuracy (71.62%) than the ANN of Blackard and Dean (1998), and the difference in accuracies was only about 1%. Of the other four methods, random forests (RF), support vector machines (SVM), classification trees (CT), and AdaBoosted trees (ADA), a tuned SVM and RF had accuracies of 67.17% and 67.57%, respectively. The partition of the data by Blackard and Dean (1998) was unusual in that the training and validation datasets had equal representation of the seven vegetation classes, even though 85% of the data fell into classes 1 and 2. For the second part of my analyses I randomly selected 60% of the data for the training data and 20% for each of the validation data and test data. On this partition of the data a single classification tree achieved an accuracy of 92.63% on the test data, and the accuracy of RF was 83.98%. Unsurprisingly, most of the gains in accuracy were in classes 1 and 2, the largest classes, which also had the highest misclassification rates under the original partition of the data. By decreasing the size of the training data but maintaining the same relative occurrences of the vegetation classes as in the full dataset, I found that even for a training dataset of the same size as that of Blackard and Dean (1998), a single classification tree was more accurate (73.80%) than the ANN of Blackard and Dean (1998) (70.58%). The final part of my thesis was to explore the possibility that combining the predictions of several machine learning classifiers could result in higher predictive accuracies. In the analyses I carried out, the answer seems to be that increased accuracies do not occur with a simple voting of five machine learning classifiers.
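The data described here appears to be the Roosevelt National Forest cover-type data released as the UCI Covertype dataset, which scikit-learn can fetch; assuming that correspondence, a minimal refit of two of the classifier families compared above looks like this (hyperparameters untuned, so accuracies will not match the tuned figures quoted).

```python
from sklearn.datasets import fetch_covtype
from sklearn.ensemble import HistGradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = fetch_covtype(return_X_y=True)          # 581,012 samples, 7 cover types
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for clf in (RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0),
            HistGradientBoostingClassifier(random_state=0)):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, clf.score(X_te, y_te))
```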
|
8 |
Learning with Complex Performance Measures: Theory, Algorithms and Applications. Narasimhan, Harikrishna. January 2016
We consider supervised learning problems, where one is given objects with labels, and the goal is to learn a model that can make accurate predictions on new objects. These problems abound in applications, ranging from medical diagnosis to information retrieval to computer vision. Examples include binary or multiclass classification, where the goal is to learn a model that can classify objects into two or more categories (e.g. categorizing emails into spam or non-spam); bipartite ranking, where the goal is to learn a model that can rank relevant objects above the irrelevant ones (e.g. ranking documents by relevance to a query); and class probability estimation (CPE), where the goal is to predict the probability of an object belonging to different categories (e.g. the probability of an internet ad being clicked by a user). In each case, the accuracy of a model is evaluated in terms of a specified 'performance measure'.
While there has been much work on designing and analyzing algorithms for different supervised learning tasks, we have a complete understanding only for settings where the performance measure of interest is the standard 0-1 or a loss-based classification measure. These performance measures have a simple additive structure, and can be expressed as an expectation of errors on individual examples. However, in many real-world applications, the performance measure used to evaluate a model is often more complex, and does not decompose into a sum or expectation of point-wise errors. These include the binary or multiclass G-mean used in class-imbalanced classification problems; the F1-measure and its multiclass variants popular in text retrieval; and the (partial) area under the ROC curve (AUC) and precision@k employed in ranking applications. How does one design efficient learning algorithms for such complex performance measures, and can these algorithms be shown to be statistically consistent, i.e. shown to converge in the limit of infinite data to the optimal model for the given measure? How does one develop efficient learning algorithms for complex measures in online/streaming settings where the training examples need to be processed one at a time? These are questions that we seek to address in this thesis.
Firstly, we consider the bipartite ranking problem with the AUC and partial AUC performance measures. We start by understanding how bipartite ranking with AUC is related to the standard 0-1 binary classification and CPE tasks. It is known that a good binary CPE model can be used to obtain both a good binary classification model and a good bipartite ranking model (formally, in terms of regret transfer bounds), and that a binary classification model does not necessarily yield a CPE model. However, not much is known about the other directions. We show that in a weaker sense (where the mapping needed to transform a model from one problem to another depends on the underlying probability distribution), a good bipartite ranking model for AUC can indeed be used to construct a good binary classification model, and also a good binary CPE model. Next, motivated by the increasing number of applications (e.g. biometrics, medical diagnosis, etc.) where performance is measured not in terms of the full AUC but in terms of the partial AUC between two false positive rates (FPRs), we design batch algorithms for optimizing the partial AUC in any given FPR range. Our algorithms optimize structural support vector machine based surrogates, which, unlike for the full AUC, do not admit a straightforward decomposition into simpler terms. We develop polynomial-time cutting plane solvers for solving the optimization, and provide experiments to demonstrate the efficacy of our methods. We also present an application of our approach to predicting chemotherapy outcomes for cancer patients, with the aim of improving treatment decisions.
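For orientation, the partial AUC discussed above can be evaluated directly; scikit-learn's roc_auc_score exposes the [0, max_fpr] special case (in McClish-standardized form). The scores below come from a synthetic stand-in model.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
y = rng.integers(0, 2, size=1000)
scores = y + rng.normal(scale=1.5, size=1000)   # placeholder model scores

print(roc_auc_score(y, scores))                 # full AUC
print(roc_auc_score(y, scores, max_fpr=0.1))    # standardized partial AUC, FPR in [0, 0.1]
```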
Secondly, we develop algorithms for optimizing (surrogates for) complex performance measures in the presence of streaming data. A well-known method for solving this problem for standard point-wise surrogates such as the hinge surrogate is the stochastic gradient descent (SGD) method, which performs point-wise updates using unbiased gradient estimates. However, this method cannot be applied to complex objectives, as here one can no longer obtain unbiased gradient estimates from a single point. We develop a general stochastic method for optimizing complex measures that avoids point-wise updates, and instead performs gradient updates on mini-batches of incoming points. The method is shown to provably converge for any performance measure that satisfies a uniform convergence requirement, such as the partial AUC, precision@k and F1-measure, and in experiments, is often several orders of magnitude faster than the state-of-the-art batch methods, while achieving similar or better accuracies. Moreover, for specific complex binary classification measures, which are concave functions of the true positive rate (TPR) and true negative rate (TNR), we are able to develop stochastic (primal-dual) methods that can indeed be implemented with point-wise updates, by using an adaptive linearization scheme. These methods admit convergence rates that match the rate of the SGD method, and are again several times faster than the state-of-the-art methods.
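The mini-batch idea can be illustrated on the pairwise hinge surrogate of the AUC, where a single point yields no unbiased gradient but a mini-batch's positive-negative pairs do give a usable estimate. This is a sketch of the general principle, not the thesis's algorithm; step size, batching, and data are illustrative.

```python
import numpy as np

def minibatch_auc_sgd(stream, d, lr=0.1):
    """stream yields (X_batch, y_batch) mini-batches; returns a linear scorer w."""
    w = np.zeros(d)
    for Xb, yb in stream:
        pos, neg = Xb[yb == 1], Xb[yb == 0]
        if len(pos) == 0 or len(neg) == 0:
            continue                                 # batch has no usable pairs
        diff = pos[:, None, :] - neg[None, :, :]     # all positive-negative pairs
        viol = (diff @ w) < 1.0                      # hinge-active pairs
        grad = -diff[viol].sum(axis=0) / max(int(viol.sum()), 1)
        w -= lr * grad
    return w

rng = np.random.default_rng(7)
def stream(batches=200, b=32, d=10):
    for _ in range(batches):
        X = rng.normal(size=(b, d))
        yield X, (X[:, 0] + rng.normal(size=b) > 0).astype(int)

w = minibatch_auc_sgd(stream(), d=10)
```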
Finally, we look at the design of consistent algorithms for complex binary and multiclass measures. For binary measures, we consider the practically popular plug-in algorithm that constructs a classifier by applying an empirical threshold to a suitable class probability estimate, and provide a general methodology for proving consistency of these methods. We apply this technique to show consistency for the F1-measure, and, under a continuity assumption on the distribution, for any performance measure that is monotonic in the TPR and TNR. For the case of multiclass measures, a simple plug-in method is no longer tractable, as in place of a single threshold parameter, one needs to tune at least as many parameters as the number of classes. Using an optimization viewpoint, we provide a framework for designing learning algorithms for multiclass measures that are general functions of the confusion matrix, and as an instantiation, provide an efficient and provably consistent algorithm based on the bisection method for multiclass measures that are ratio-of-linear functions of the confusion matrix (e.g. micro F1). The algorithm outperforms the state-of-the-art SVMPerf method in terms of both accuracy and running time.
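The binary plug-in scheme above reduces to a one-dimensional threshold search, sketched below for the F1-measure with a logistic model on synthetic imbalanced data. (For multiclass ratio-of-linear measures such as micro F1, the thesis replaces this sweep with a bisection search, which is not shown.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(2000, 8))
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 1.0).astype(int)  # imbalanced labels

X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
p = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_va)[:, 1]

thresholds = np.linspace(0.05, 0.95, 19)
t_best = max(thresholds, key=lambda t: f1_score(y_va, p >= t))
print("best threshold:", t_best, "validation F1:", f1_score(y_va, p >= t_best))
```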
Overall, in this thesis, we have looked at various aspects of complex performance measures used in supervised learning problems, leading to several new algorithms that are often significantly better than the state-of-the-art, to improved theoretical understanding of the performance measures studied, and to novel real-life applications of the algorithms developed.
|