Global ETD Search

1	Learning the Sub-Conceptual Layer: A Framework for One-Class Classification Sharma, Shiven January 2016 In the realm of machine learning research and application, binary classification algorithms, i.e. algorithms that attempt to induce discriminant functions between two categories of data, reign supreme. Their fundamental property is the reliance on the availability of data from all known categories in order to induce functions that can offer acceptable levels of accuracy. Unfortunately, data from so-called "real-world" domains sometimes do not satisfy this property. In order to tackle this, researchers focus on methods such as sampling and cost-sensitive classification to make the data more conducive to binary classifiers. However, as this thesis shall argue, there are scenarios in which even such explicit methods to rectify distributions fail. In such cases, one-class classification algorithms become a practical alternative. Unfortunately, if the domain is inherently complex, the advantage that they offer over binary classifiers is diminished. The work in this thesis addresses this issue, and builds a framework that allows one-class algorithms to build efficient classifiers. In particular, this thesis introduces the notion of learning along the lines of sub-concepts in the domain; the complexity in domains arises due to the presence of sub-concepts, and by learning over them explicitly rather than on the entire domain as a whole, we can produce powerful one-class classification systems. The level of knowledge regarding these sub-concepts will naturally vary by domain, and thus we develop three distinct frameworks that take the amount of domain knowledge available into account. We demonstrate these frameworks over three real-world domains. The first domain we consider is that of biometric authentication via a user's swipe on a smartphone. 
We identify sub-concepts based on a user's motion. Because modern smartphones employ sensors that can identify motion, sub-concepts can be identified explicitly during learning as well as application, and novel instances can be processed by the appropriate one-class classifier. The second domain is that of invasive isotope detection via gamma-ray spectra. The sub-concepts are based on environmental factors; however, the hardware employed cannot detect such concepts, and the precise source that creates these sub-concepts is difficult to ascertain. To remedy this, we introduce a novel framework in which we employ a sub-concept detector by means of a multi-class classifier, which pre-processes novel instances in order to send them to the correct one-class classifier. The third domain is that of compliance verification of the Comprehensive Test Ban Treaty (CTBT) through Xenon isotope measurements. This domain presents the worst case, in which sub-concepts are not known. To this end, we employ a generic version of our framework in which we simply cluster the domain and build classifiers over each cluster. In all cases, we demonstrate that learning in the context of domain concepts greatly improves the performance of one-class classifiers. machine learning one-class classification artificial intelligence
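The generic version of the framework described above (cluster the domain, then build a one-class classifier over each cluster) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the small k-means routine, the `SphereOCC` centroid-and-radius classifier, the `slack` factor, and the toy data are all assumptions chosen for brevity.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def kmeans(points, k, iters=10):
    """Tiny k-means with deterministic, evenly spaced initial centroids (assumption)."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
    return [g for g in groups if g]

class SphereOCC:
    """Hypothetical one-class classifier: accept points inside a sphere around the mean."""
    def __init__(self, points, slack=1.5):
        self.center = mean(points)
        self.radius = slack * max(math.dist(p, self.center) for p in points)

    def accepts(self, x):
        return math.dist(x, self.center) <= self.radius

def fit_per_cluster(points, k):
    """Cluster the target class, then train one classifier per cluster."""
    return [SphereOCC(group) for group in kmeans(points, k)]

def accepts_any(models, x):
    """A novel instance is accepted if any sub-concept classifier accepts it."""
    return any(m.accepts(x) for m in models)

# Toy target class made of two well-separated sub-concepts.
train = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
models = fit_per_cluster(train, k=2)
single = SphereOCC(train)  # one classifier over the whole domain, for comparison
```

On this toy data the single classifier accepts the point (5, 5), which lies in the empty region between the two sub-concepts, while the per-cluster classifiers reject it.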
2	Clustering to Improve One-Class Classifier Performance in Data Streams Moulton, Richard Hugh 27 August 2018 The classification task requires learning a decision boundary between classes by making use of training examples from each. A potential challenge for this task is the class imbalance problem, which occurs when there are many training instances available for a single class, the majority class, and few training instances for the other, the minority class [58]. In this case, it is no longer clear how to separate the majority class from something for which we have little to no knowledge. More worrying, often the minority class is the class of interest, e.g. for detecting abnormal conditions from streaming sensor data. The one-class classification (OCC) paradigm addresses this scenario by casting the task as learning a decision boundary around the majority class with no need for minority class instances [110]. OCC has been thoroughly investigated, e.g. [20, 60, 90, 110], and many one-class classifiers have been proposed. One approach for improving one-class classifier performance on static data sets is learning in the context of concepts: the majority class is broken down into its constituent sub-concepts and a classifier is induced over each [100]. Modern machine learning research, however, is concerned with data streams, where potentially infinite amounts of data arrive quickly and need to be processed as they arrive. In these cases it is not possible to store all of the instances in memory, nor is it practical to wait until "the end of the data stream" before learning. An example is network intrusion detection: detecting an attack on the computer network should occur as soon as practicable. Many one-class classifiers for data streams have been described in the literature, e.g. [33, 108], and it is worth investigating whether the approach of learning in the context of concepts can be successfully applied to the OCC task for data streams as well. 
This thesis identifies that the idea of breaking the majority class into sub-concepts to simplify the OCC problem has been demonstrated for static data sets [100] but has not been applied to data streams. The primary contribution to the literature made by this thesis is the identification of how the majority class's sub-concept structure can be used to improve the classification performance of streaming one-class classifiers while mitigating the challenges posed by the data stream environment. Three frameworks are developed, each using this knowledge to a different degree. These are applied with a selection of streaming one-class classifiers to both synthetic and benchmark data streams, with performance compared to that of the one-class classifier learning independently. These results are analyzed and it is shown that scenarios exist where knowledge of sub-concepts can be used to improve one-class classifier performance. machine learning one-class classification data streams sub-concepts
3	Minimizing Dataset Size Requirements for Machine Learning January 2017 Machine learning methodologies are widely used in almost all aspects of software engineering. An effective machine learning model requires large amounts of data to achieve high accuracy. The data used for classification is mostly labeled, and labeled data is difficult to obtain. Accurately labeling the data into different classes requires both high cost and effort. With an abundance of data, it becomes necessary that all the data be labeled for its proper utilization, and this work focuses on reducing the labeling effort for large datasets. The thesis presents a comparison of different classifiers' performance to test whether a small set of labeled data can be used to build accurate models with a high prediction rate. The use of a small dataset for classification is then extended to an active machine learning methodology in which a one-class classifier first predicts the outliers in the data, and the outlier samples are then added to a training set for a support vector machine classifier that labels the unlabeled data. The labeling of the dataset can be scaled up to avoid manual labeling and to build more robust machine learning methodologies. / Dissertation/Thesis / Masters Thesis Engineering 2017 Computer science Active Learning Machine Learning One Class Classification
4	Active Learning for One-class Classification Barnabé-Lortie, Vincent January 2015 Active learning is a common solution for reducing labeling costs and maximizing the impact of human labeling efforts in binary and multi-class classification settings. However, when we are faced with extreme levels of class imbalance, a situation in which it is not safe to assume that we have a representative sample of the minority class, it has been shown to be effective to replace the binary classifiers with one-class classifiers. In such a setting, traditional active learning methods, and many previously proposed in the literature for one-class classifiers, prove to be inappropriate, as they rely on assumptions about the data that no longer hold. In this thesis, we propose a novel approach to active learning designed for one-class classification. The proposed method does not rely on many of the inappropriate assumptions of its predecessors and leads to more robust classification performance. The gist of this method consists of labeling, in priority, the instances considered to fit the learned class the least by previous iterations of a one-class classification model. Throughout the thesis, we provide evidence for the merits of our method, then deepen our understanding of these merits by exploring the properties of the method that allow it to outperform the alternatives. active learning one-class classification class imbalance problem machine learning
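The query strategy summarized above (label first the instances that the current one-class model considers the worst fit) can be sketched as a simple loop. This is only an illustration under stated assumptions: the centroid-distance score standing in for the one-class model, the `oracle` callback standing in for the human labeler, and the toy data are hypothetical and are not taken from the thesis.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def active_query(labeled_targets, unlabeled, oracle, budget):
    """Repeatedly query the unlabeled instance that fits the learned class least.

    The 'model' here is just the centroid of the known target instances, and
    'fit' is the distance to that centroid (an assumption made for brevity).
    """
    targets = list(labeled_targets)
    pool = list(unlabeled)
    queried = []
    for _ in range(min(budget, len(pool))):
        center = centroid(targets)  # retrain on everything labeled so far
        pick = max(pool, key=lambda x: math.dist(x, center))  # least-fitting instance
        pool.remove(pick)
        is_target = oracle(pick)  # ask the (hypothetical) human labeler
        queried.append((pick, is_target))
        if is_target:
            targets.append(pick)  # only target-class instances extend the training set
    return queried, targets

# Toy run: the oracle treats anything within distance 5 of the origin as the target class.
oracle = lambda x: math.dist(x, (0, 0)) < 5
queried, targets = active_query([(0, 0), (1, 0)], [(0.5, 0.2), (3, 0), (9, 9)], oracle, budget=2)
```

In this run the far-away point (9, 9) is queried first and labeled as not belonging to the target class, and (3, 0) is queried second and added to the training set.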
5	A one-class NIDS for SDN-based SCADA systems / Um NIDS baseado em OCC para sistemas SCADA baseados em SDN Silva, Eduardo Germano da January 2007 Power grids have great influence on the development of the world economy. Given the importance of electrical energy to our society, power grids are often the target of network intrusions motivated by several causes. To minimize or even mitigate the aftereffects of network intrusions, more secure protocols and standardization norms to enhance the security of power grids have been proposed. In addition, power grids are undergoing an intense process of modernization, and becoming highly dependent on networked systems used to monitor and manage power components. These so-called Smart Grids comprise energy generation, transmission, and distribution subsystems, which are monitored and managed by Supervisory Control and Data Acquisition (SCADA) systems. In this Master's dissertation, we investigate and discuss the applicability and benefits of using Software-Defined Networking (SDN) to assist in the deployment of next generation SCADA systems. We also propose an Intrusion Detection System (IDS) that relies on specific techniques of traffic classification and takes advantage of the characteristics of SCADA networks and of the adoption of SDN/OpenFlow. Our proposal relies on SDN to periodically gather statistics from network devices, which are then processed by One-Class Classification (OCC) algorithms. Given that attack traces in SCADA networks are scarce and not publicly disclosed by utility companies, the main advantage of using OCC algorithms is that they do not depend on known attack signatures to detect possible malicious traffic. As a proof-of-concept, we developed a prototype of our proposal. Finally, in our experimental evaluation, we observed the performance and accuracy of our prototype using two OCC-based Machine Learning (ML) algorithms, and considering anomalous events in the SCADA network, such as a Denial-of-Service (DoS) attack and the failure of several SCADA field devices. Networks : Computers Security : Networks : Computers Supervisory control and data acquisition Software-defined networking Smart grids Network-based intrusion detection system One-class classification
6	Machine Learning Methods For Using Network Based Information In Microrna Target Prediction Sualp, Merter 01 February 2013 Computational microRNA (miRNA) target identification in animal genomes is a challenging problem due to the imperfect pairing of the miRNA with the target site. Techniques based on sequence alone are prone to produce many false positive interactions. Therefore, integrative techniques have been developed to utilize additional genomic and structural features and evolutionary conservation information for reducing the high false positive rate. We propose that the context of a putative miRNA target in a protein-protein interaction (PPI) network can be used as an additional filter in a computational miRNA target prediction algorithm. We compute several graph theoretic measures on the human PPI network as indicators of network context. We assess the performance of individual and combined contextual measures in increasing the precision of a popular miRNA target prediction tool, TargetScan, using low throughput and high throughput datasets of experimentally verified human miRNA targets. We used classification algorithms for that assessment. Since only miRNA targets exist as training samples, this problem becomes a One Class Classification (OCC) problem. We devised a novel OCC method, DiVo, based on simple distance metrics and voting. Comparative analysis with state-of-the-art methods shows that DiVo attains better classification performance. Our eventual results indicate that topological properties of target gene products in PPI networks are valuable sources of information for filtering out false positive miRNA target genes. We show that, for targets of a number of miRNAs, network context correlates better with being a target than a sequence-based score provided by the prediction tool. QA Computer Software 76.75-76.765
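The abstract above does not say how DiVo combines its distance metrics and votes, so the following is not DiVo. It is only a generic sketch of what a one-class decision built from simple distance metrics and voting could look like: the three metrics, the nearest-neighbour thresholds, the majority rule, and the toy data are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    return math.dist(a, b)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

METRICS = (euclidean, manhattan, chebyshev)

def fit_thresholds(train):
    """Per metric, the largest leave-one-out nearest-neighbour distance in the training set."""
    thresholds = []
    for metric in METRICS:
        nearest = [
            min(metric(p, q) for j, q in enumerate(train) if j != i)
            for i, p in enumerate(train)
        ]
        thresholds.append(max(nearest))
    return thresholds

def count_votes(x, train, thresholds):
    """Each metric votes 'target' if x is at least as close to the training set as that."""
    return sum(
        min(metric(x, p) for p in train) <= t
        for metric, t in zip(METRICS, thresholds)
    )

def is_target(x, train, thresholds):
    """Majority vote across the metrics."""
    return count_votes(x, train, thresholds) > len(METRICS) / 2

# Toy positive-only training set (only target-class samples are available).
train = [(0, 0), (1, 0), (0, 1), (1, 1)]
thresholds = fit_thresholds(train)
```

With this toy training set, the point (1.7, 1.7) is accepted on a split vote (the Euclidean and Chebyshev metrics vote for it, the Manhattan metric against), while (3, 3) receives no votes and is rejected.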
10	Deep Learning One-Class Classification With Support Vector Methods Hampton, Hayden D 01 January 2024 Through the specialized lens of one-class classification, anomalies (irregular observations that uncharacteristically diverge from normative data patterns) are comprehensively studied. This dissertation focuses on advancing boundary-based methods in one-class classification, a critical approach to anomaly detection. These methodologies delineate optimal decision boundaries, thereby facilitating a distinct separation between normal and anomalous observations. Encompassing traditional approaches such as One-Class Support Vector Machine and Support Vector Data Description, recent adaptations in deep learning offer a rich ground for innovation in anomaly detection. This dissertation proposes three novel deep learning methods for one-class classification, aiming to enhance the efficacy and accuracy of anomaly detection in an era where data volume and complexity present unprecedented challenges. The first two methods are designed for tabular data from a least squares perspective. Formulating these optimization problems within a least squares framework offers notable advantages. It facilitates the derivation of closed-form solutions for critical gradients that largely influence the optimization procedure. Moreover, this approach circumvents the prevalent issue of degenerate or uninformative solutions, a challenge often associated with these types of deep learning algorithms. The third method is designed for second-order tensors. This proposed method has certain computational advantages and alleviates the need for vectorization, which can lead to structural information loss when spatial or contextual relationships exist in the data structure. The performance of the three proposed methods is demonstrated with simulation studies and real-world datasets. 
Compared to kernel-based one-class classification methods, the proposed deep learning methods achieve significantly better performance under the settings considered. one-class classification deep learning neural network support vector data description anomaly detection one-class support vector machine Categorical Data Analysis Data Science
