1 |
Learning the Sub-Conceptual Layer: A Framework for One-Class Classification. Sharma, Shiven, January 2016.
In the realm of machine learning research and application, binary classification algorithms, i.e. algorithms that induce discriminant functions between two categories of data, reign supreme. Their fundamental requirement is the availability of data from both categories in order to induce functions that offer acceptable levels of accuracy. Unfortunately, data from so-called "real-world" domains sometimes do not satisfy this requirement. To tackle this, researchers turn to methods such as sampling and cost-sensitive classification to make the data more conducive to binary classifiers.
However, as this thesis shall argue, there are scenarios in which even such explicit methods to rectify distributions fail. In such cases, one-class classification algorithms become a practical alternative. Unfortunately, if the domain is inherently complex, the advantage that they offer over binary classifiers becomes diminished. The work in this thesis addresses this issue, and builds a framework that allows for one-class algorithms to build efficient classifiers.
In particular, this thesis introduces the notion of learning along the lines of sub-concepts in the domain; the complexity of a domain arises from the presence of sub-concepts, and by learning over them explicitly rather than over the entire domain as a whole, we can produce powerful one-class classification systems. The level of knowledge regarding these sub-concepts naturally varies by domain, and thus we develop three distinct frameworks that take the amount of available domain knowledge into account. We demonstrate these frameworks on three real-world domains.
The first domain we consider is that of biometric authentication via a user's swipe on a smartphone. We identify sub-concepts based on a user's motion; since modern smartphones employ sensors that can identify motion during learning as well as application, sub-concepts can be identified explicitly and novel instances can be processed by the appropriate one-class classifier. The second domain is that of invasive isotope detection via gamma-ray spectra. Here the sub-concepts arise from environmental factors; however, the hardware employed cannot detect such concepts, and the precise source that creates these sub-concepts is difficult to ascertain. To remedy this, we introduce a novel framework that employs a sub-concept detector in the form of a multi-class classifier, which pre-processes novel instances in order to send them to the correct one-class classifier. The third domain is that of compliance verification of the Comprehensive Test Ban Treaty (CTBT) through xenon isotope measurements. This domain presents the worst case, where the sub-concepts are not known. To this end, we employ a generic version of our framework in which we simply cluster the domain and build classifiers over each cluster. In all cases, we demonstrate that learning in the context of domain concepts greatly improves the performance of one-class classifiers.
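To make the generic variant concrete, the sketch below clusters the single known class and trains one one-class classifier per cluster, routing new instances to the classifier of their nearest cluster. This is a minimal illustration of the idea, not the thesis's implementation; the cluster count, kernel, and nu value are arbitrary assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

def fit_subconcept_occ(X_majority, n_subconcepts=3):
    """Cluster the single known class and fit a one-class SVM per cluster."""
    km = KMeans(n_clusters=n_subconcepts, n_init=10, random_state=0)
    labels = km.fit_predict(X_majority)
    classifiers = {
        c: OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_majority[labels == c])
        for c in range(n_subconcepts)
    }
    return km, classifiers

def predict_subconcept_occ(km, classifiers, X_new):
    """Route each new instance to the classifier of its nearest sub-concept."""
    clusters = km.predict(X_new)
    # +1 means accepted as the known class, -1 means flagged as anomalous
    return np.array([classifiers[int(c)].predict(x.reshape(1, -1))[0]
                     for c, x in zip(clusters, X_new)])

# Usage on synthetic data with three sub-concepts
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(loc=m, scale=0.3, size=(100, 2)) for m in (0, 3, 6)])
km, clfs = fit_subconcept_occ(X_train)
X_test = np.array([[0.1, 0.0], [10.0, 10.0]])
print(predict_subconcept_occ(km, clfs, X_test))  # roughly [ 1, -1] expected

Routing by nearest cluster is only one possible assignment rule; when sub-concepts can be sensed directly, as in the smartphone domain above, the routing comes from the sensor instead of a clustering model.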
|
2 |
Classificação One-Class para predição de adaptação de espécies em ambientes desconhecidos (One-class classification for predicting the adaptation of species to unknown environments). Salmazzo, Natália, January 2016.
Advisor: Profa. Dra. Debora Maria Rossi de Medeiros. Master's dissertation, Universidade Federal do ABC, Graduate Program in Computer Science, 2016.

The increasing exploitation of the environment and biodiversity makes it necessary to preserve natural resources in order to avoid scarcity and reduce environmental impacts. Using geographical species distribution data combined with environmental and ecological characteristics, geographical species distribution models can be generated. These models can be applied to various problems related to the maintenance of biodiversity and species conservation, such as aiding the definition of public policies and scenarios for the sustainable use of the environment, studying the potential for growth and proliferation of invasive species, and assessing the impacts of climate change on biodiversity.

This work proposes a method for generating geographical species distribution models by applying machine learning concepts adapted to solving one-class problems. The generated models enable the identification of areas with characteristics similar to the natural habitat of the species and therefore contribute to its preservation.

To evaluate its effectiveness, the proposed method was applied to a real database and some benchmark databases, and compared with a one-class version of the Support Vector Machines algorithm. The SVM is one of the algorithms most widely applied to species distribution modelling and is available in some of the solutions most used by researchers in this field, such as openModeller and BiodiversityR; it therefore provides a solid baseline for evaluation.

The results showed that the proposed method is viable and competitive. In many cases, such as when the data are linearly separable, the results obtained with the new method were better than those of the SVM. Although additional research is necessary to evaluate the method in different situations, such as on databases that include species absence data and on databases with a larger number of examples, the results are promising and indicate that further research in this area could have a relevant impact on species distribution modelling.
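For orientation, the sketch below shows the kind of one-class SVM baseline referred to above: a model fitted on presence-only records described by environmental features and then used to score candidate sites. It is a minimal illustration, not the method proposed in the dissertation; the feature values and parameters are invented.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Hypothetical presence-only records: rows are sites where the species was
# observed, columns are environmental variables (temperature, rainfall, elevation).
presence = np.array([[24.1, 1200.0, 350.0],
                     [23.5, 1100.0, 400.0],
                     [25.0, 1300.0, 300.0],
                     [22.8, 1250.0, 420.0]])

scaler = StandardScaler().fit(presence)
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.2).fit(scaler.transform(presence))

# Candidate sites to score: higher decision values indicate conditions more
# similar to the known habitat.
candidates = np.array([[24.0, 1180.0, 360.0],    # close to the observed conditions
                       [5.0, 200.0, 2500.0]])    # very different conditions
print(model.decision_function(scaler.transform(candidates)))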
|
3 |
Clustering to Improve One-Class Classifier Performance in Data Streams. Moulton, Richard Hugh, 27 August 2018.
The classification task requires learning a decision boundary between classes by making use of training examples from each. A potential challenge for this task is the class imbalance problem, which occurs when there are many training instances available for a single class, the majority class, and few training instances for the other, the minority class [58]. In this case, it is no longer clear how to separate the majority class from something for which we have little to no knowledge. More worryingly, the minority class is often the class of interest, e.g. when detecting abnormal conditions from streaming sensor data.
The one-class classification (OCC) paradigm addresses this scenario by casting the task as learning a decision boundary around the majority class with no need for minority class instances [110]. OCC has been thoroughly investigated, e.g. [20, 60, 90, 110], and many one-class classifiers have been proposed. One approach for improving one-class classifier performance on static data sets is learning in the context of concepts: the majority class is broken down into its constituent sub-concepts and a classifier is induced over each [100].
Modern machine learning research, however, is concerned with data streams: where potentially infinite amounts of data arrive quickly and need to be processed as they arrive. In these cases it is not possible to store all of the instances in memory, nor is it practical to wait until “the end of the data stream” before learning. An example is network intrusion detection: detecting an attack on the computer network should occur as soon as practicable. Many one-class classifiers for data streams have been described in the literature, e.g. [33, 108], and it is worth investigating whether the approach of learning in the context of concepts can be successfully applied to the OCC task for data streams as well.
This thesis identifies that the idea of breaking the majority class into sub-concepts to simplify the OCC problem has been demonstrated for static data sets [100], but has not been applied in data streams. The primary contribution to the literature made by this thesis is the identification of how the majority class's sub-concept structure can be used to improve the classification performance of streaming one-class classifiers while mitigating the challenges posed by the data stream environment. Three frameworks are developed, each using this knowledge to a different degree. They are applied, with a selection of streaming one-class classifiers, to both synthetic and benchmark data streams, and their performance is compared to that of the one-class classifier learning on its own. These results are analyzed, and it is shown that scenarios exist where knowledge of sub-concepts can be used to improve one-class classifier performance.
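As a rough illustration of how the static idea might carry over to a stream under bounded memory, the sketch below keeps a sliding window of recent majority-class instances, periodically re-clusters it, and refits one one-class classifier per sub-concept. It is a simplified, assumption-laden sketch, not one of the three frameworks developed in the thesis; window size, refit interval, and cluster count are arbitrary.

from collections import deque
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

class WindowedSubconceptOCC:
    """Sliding-window sketch: bounded memory, periodic re-clustering and refitting."""
    def __init__(self, window_size=500, n_subconcepts=3, refit_every=100):
        self.window = deque(maxlen=window_size)   # bounded memory for the stream
        self.n_subconcepts = n_subconcepts
        self.refit_every = refit_every
        self.seen = 0
        self.km, self.clfs = None, {}

    def learn_one(self, x):
        self.window.append(x)
        self.seen += 1
        if self.seen % self.refit_every == 0 and len(self.window) >= self.n_subconcepts:
            X = np.array(self.window)
            self.km = KMeans(n_clusters=self.n_subconcepts, n_init=10, random_state=0).fit(X)
            self.clfs = {c: OneClassSVM(gamma="scale", nu=0.1).fit(X[self.km.labels_ == c])
                         for c in range(self.n_subconcepts)}

    def predict_one(self, x):
        if self.km is None:
            return 1  # accept by default until the first model has been fitted
        c = int(self.km.predict(x.reshape(1, -1))[0])
        return int(self.clfs[c].predict(x.reshape(1, -1))[0])

# Usage: feed a stream of majority-class instances, then score new points
occ = WindowedSubconceptOCC()
for x in np.random.default_rng(0).normal(size=(1000, 2)):
    occ.learn_one(x)
print(occ.predict_one(np.array([0.0, 0.0])), occ.predict_one(np.array([8.0, 8.0])))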
|
4 |
Minimizing Dataset Size Requirements for Machine Learning. January 2017.
abstract: Machine learning methodologies are widely used in almost all aspects of software engineering. An effective machine learning model requires large amounts of data to achieve high accuracy. The data used for classification must mostly be labeled, and such labels are difficult to obtain: accurately labeling a dataset into different classes requires both high cost and effort. With an abundance of data, labeling all of it for proper utilization becomes necessary, and this work focuses on reducing the labeling effort for large datasets. The thesis presents a comparison of the performance of different classifiers to test whether a small set of labeled data can be used to build accurate models with a high prediction rate. The use of a small dataset for classification is then extended to an active machine learning methodology in which a one-class classifier first predicts the outliers in the data, and the outlier samples are then added to the training set of a support vector machine classifier that labels the remaining unlabeled data. Labeling of datasets can thus be scaled up, avoiding manual labeling and enabling more robust machine learning methodologies. / Dissertation/Thesis / Master's Thesis Engineering 2017
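A hedged sketch of the pipeline described above follows, under the assumption that the points flagged as outliers by the one-class classifier are the ones sent for labeling (a synthetic oracle stands in for the human), after which an SVM trained on that small labeled set labels the rest. The dataset, labeling budget, and parameters are invented for illustration.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import OneClassSVM, SVC

# Synthetic two-class data; y_true stands in for a human oracle.
X, y_true = make_blobs(n_samples=300, centers=2, random_state=0)

# Step 1: a one-class classifier scores every point; the lowest (most outlying)
# scores are selected for manual labeling. With this synthetic data the queried
# outliers include examples of both classes.
scores = OneClassSVM(gamma="scale", nu=0.1).fit(X).decision_function(X)
budget = 30
query_idx = np.argsort(scores)[:budget]

# Step 2: the queried points are labeled (here by the oracle) and used to train
# an SVM, which then labels the remaining data automatically.
svm = SVC(kernel="rbf", gamma="scale").fit(X[query_idx], y_true[query_idx])
print("agreement with oracle:", (svm.predict(X) == y_true).mean())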
|
5 |
Active Learning for One-class Classification. Barnabé-Lortie, Vincent, January 2015.
Active learning is a common solution for reducing labeling costs and maximizing the impact of human labeling effort in binary and multi-class classification settings. However, when faced with extreme levels of class imbalance, a situation in which it is not safe to assume that we have a representative sample of the minority class, it has been shown effective to replace the binary classifier with a one-class classifier. In such a setting, traditional active learning methods, and many methods previously proposed in the literature for one-class classifiers, prove inappropriate, as they rely on assumptions about the data that no longer hold.
In this thesis, we propose a novel approach to active learning designed for one-class classification. The proposed method does not rely on many of the inappropriate assumptions of its predecessors and leads to more robust classification performance. The gist of the method is to label, in priority, the instances that previous iterations of the one-class classification model consider to fit the learned class the least.
Throughout the thesis, we provide evidence for the merits of our method, then deepen our understanding of these merits by exploring the properties of the method that allow it to outperform the alternatives.
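A minimal sketch of the selection rule described above: at each round the current one-class model scores the unlabeled pool, and the instances it considers to fit the learned class the least are queried first. The pool, batch size, and model settings are assumptions for illustration, not the thesis's experimental setup.

import numpy as np
from sklearn.svm import OneClassSVM

def query_least_fitting(model, X_pool, batch_size=10):
    """Return indices of the pool instances the current model considers least typical."""
    scores = model.decision_function(X_pool)
    return np.argsort(scores)[:batch_size]

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(50, 2))        # instances already confirmed as the target class
X_pool = rng.normal(size=(200, 2)) * 2.0    # unlabeled pool with a wider spread

model = OneClassSVM(gamma="scale", nu=0.1).fit(X_labeled)
to_label = query_least_fitting(model, X_pool)
print(to_label)  # indices to hand to the human labeler in this round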
|
6 |
SV-Means: A Fast One-Class Support Vector Machine-Based Level Set Estimator. Pavy, Anne M., January 2017.
No description available.
|
7 |
OCLEP+: One-Class Intrusion Detection Using Length of Patterns. Pentukar, Sai Kiran, 06 June 2017.
No description available.
|
8 |
Deep Learning One-Class Classification With Support Vector Methods. Hampton, Hayden D, 01 January 2024.
Through the specialized lens of one-class classification, anomalies (irregular observations that uncharacteristically diverge from normative data patterns) are comprehensively studied. This dissertation focuses on advancing boundary-based methods in one-class classification, a critical approach to anomaly detection. These methodologies delineate optimal decision boundaries, thereby facilitating a distinct separation between normal and anomalous observations. Encompassing traditional approaches such as the One-Class Support Vector Machine and Support Vector Data Description, recent adaptations in deep learning offer rich ground for innovation in anomaly detection. This dissertation proposes three novel deep learning methods for one-class classification, aiming to enhance the efficacy and accuracy of anomaly detection in an era where data volume and complexity present unprecedented challenges. The first two methods are designed for tabular data from a least squares perspective. Formulating these optimization problems within a least squares framework offers notable advantages: it facilitates the derivation of closed-form solutions for critical gradients that largely influence the optimization procedure, and it circumvents the prevalent issue of degenerate or uninformative solutions, a challenge often associated with these types of deep learning algorithms. The third method is designed for second-order tensors. This proposed method has certain computational advantages and alleviates the need for vectorization, which can lead to structural information loss when spatial or contextual relationships exist in the data structure. The performance of the three proposed methods is demonstrated with simulation studies and real-world datasets. Compared to kernel-based one-class classification methods, the proposed deep learning methods achieve significantly better performance under the settings considered.
|
9 |
An approach to boosting from positive-only data. Mitchell, Andrew, Computer Science & Engineering, Faculty of Engineering, UNSW, January 2004.
Ensemble techniques have recently been used to enhance the performance of machine learning methods. However, current ensemble techniques for classification require both positive and negative data to produce a result that is both meaningful and useful. Negative data is, however, sometimes difficult, expensive or impossible to access. In this thesis a learning framework is described that has a very close relationship to boosting. Within this framework a method is described that bears remarkable similarities to boosting stumps and does not rely on negative examples. This is surprising, since learning from positive-only data has traditionally been difficult. An empirical methodology is described and deployed for testing positive-only learning systems using commonly available multiclass datasets, in order to compare these learning systems with each other and with multiclass learning systems. Empirical results show that our positive-only boosting-like method, using stumps as a base learner and learning from positive data only, learns successfully, and in the process does not pay too heavy a price in accuracy compared to learners that have access to both positive and negative data. We also describe methods of using positive-only learners on multiclass learning tasks and vice versa, and empirically demonstrate the superiority of our method of learning in a boosting-like fashion from positive-only data over a traditional multiclass learner converted to learn from positive-only data. Finally, we examine some alternative frameworks, such as when additional unlabelled training examples are given. Some theoretical justifications of the results and methods are also provided.
|
10 |
Applying Discriminant Functions with One-Class SVMs for Multi-Class Classification. Lee, Zhi-Ying, 09 August 2007.
AdaBoost.M1 has been successfully applied to improve the accuracy of a learning algorithm for multi-class classification problems. However, it assumes that the performance of each base classifier is better than 1/2, which may be hard to achieve in practice for a multi-class problem. A new algorithm called AdaBoost.MK, which requires only base classifiers better than random guessing (1/k), is thus designed.
Early SVM-based multi-class classification algorithms work by splitting the original problem into a set of two-class sub-problems. The time and space requirements of these algorithms are very demanding. In order to achieve low time and space complexities, we develop a base classifier that integrates one-class SVMs with discriminant functions.
In this study, a hybrid method that integrates AdaBoost.MK and one-class SVMs with improved discriminant functions as the base classifiers is proposed to solve a multi-class classification problem. Experimental results on data sets from UCI and Statlog show that the proposed approach outperforms many popular multi-class algorithms including support vector clustering and AdaBoost.M1 with one-class SVMs as the base classifiers.
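The sketch below illustrates only the base-classifier idea of combining one-class SVMs with a discriminant rule: one model per class, with a naive argmax over decision scores. The thesis's improved discriminant functions and the AdaBoost.MK wrapper are not reproduced; the dataset and parameters are assumptions for illustration.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import OneClassSVM

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Fit one one-class SVM per class, using only that class's examples.
models = {c: OneClassSVM(gamma="scale", nu=0.1).fit(X[y == c]) for c in classes}

# Discriminant rule: assign each instance to the class whose model scores it highest.
scores = np.column_stack([models[c].decision_function(X) for c in classes])
pred = classes[np.argmax(scores, axis=1)]
print("training accuracy:", (pred == y).mean())

Because decision scores from separately trained one-class SVMs are not on a common scale, a plain argmax is a weak rule; calibrating or improving the discriminant functions, as the thesis does, addresses exactly this weakness.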
|