Global ETD Search

11	An online and adaptive signature-based approach for intrusion detection using learning classifier systems
Shafi, Kamran, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW, January 2008
This thesis presents the case for dynamically and adaptively learning signatures for network intrusion detection using genetic-based machine learning techniques. The two major criticisms of signature-based intrusion detection systems are (i) their reliance on domain experts to handcraft intrusion signatures and (ii) their inability to detect previously unknown attacks, or attacks for which no signatures are available at the time. In this thesis, we present a biologically inspired computational approach to address these two issues. This is done by adaptively learning maximally general rules, which are referred to as signatures, from network traffic through a supervised learning classifier system, UCS. The rules are learnt dynamically (i.e., using machine intelligence and without the requirement of a domain expert) and adaptively (i.e., as the data arrives, without the need to relearn the complete model after presenting each data instance to the current model). Our approach is hybrid in that signatures for both intrusive and normal behaviours are learnt. The rule-based profiling of normal behaviour allows for anomaly detection, in that events not matching any of the rules are considered potentially harmful and can be escalated for action.
We study the effect of key UCS parameters and operators on its performance and identify areas of improvement through this analysis. Several new heuristics are proposed that improve the effectiveness of UCS for the prediction of unseen and extremely rare intrusive activities. A signature extraction system is developed that adaptively retrieves signatures as they are discovered by UCS. The signature extraction algorithm is augmented with novel subsumption operators that minimise overlap between signatures. Mechanisms are provided to adapt the main algorithm parameters to deal with online, noisy and class-imbalanced data.
The performance of UCS, its variants and the signature extraction system is measured through standard evaluation metrics on a publicly available intrusion detection dataset provided during the 1999 KDD Cup intrusion detection competition. We show that, compared with the standard UCS, the extended UCS significantly improves test accuracy and hit rate while significantly reducing the false alarm rate and the cost per example score. The results are competitive with the best systems that participated in the competition, with the added advantage that our systems are online and incremental rule learners. The signature extraction system built on top of the extended UCS retrieves a rule set an order of magnitude smaller than that of the base UCS learner without any significant performance loss.
We extend the evaluation of our systems to real-time network traffic captured from a university departmental server. A methodology is developed to build a fully labelled intrusion detection dataset by mixing real background traffic with attacks simulated in a controlled environment. Tools are developed to pre-process the raw network data into a feature vector format suitable for UCS and other related machine learning systems. We show the effectiveness of our feature set in detecting payload-based attacks.
Classification; Intrusion detection; Evolutionary computation; Data mining; Genetic based machine learning; Supervised learning; Learning classifier system; Knowledge extraction
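The hybrid idea in the abstract above (signatures for both normal and intrusive behaviour, with unmatched events treated as potential anomalies) can be illustrated with a minimal sketch. This is not the UCS algorithm from the thesis: there is no genetic search or rule fitness here, and the `Signature` class, the interval conditions and the feature names (`duration`, `bytes`, `failed_logins`) are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Signature:
    """A hypothetical rule: every listed feature must fall in its interval."""
    condition: Dict[str, Tuple[float, float]]  # feature -> inclusive (low, high)
    label: str                                 # "normal" or "intrusion"

    def matches(self, event: Dict[str, float]) -> bool:
        return all(
            name in event and low <= event[name] <= high
            for name, (low, high) in self.condition.items()
        )


def classify(event: Dict[str, float], signatures: List[Signature]) -> str:
    """Vote among matching signatures; flag events that match none."""
    votes: Dict[str, int] = {}
    for signature in signatures:
        if signature.matches(event):
            votes[signature.label] = votes.get(signature.label, 0) + 1
    if not votes:
        # No normal or intrusive signature covers the event.
        return "anomaly"
    # Ties break alphabetically, which favours "intrusion" over "normal".
    return max(sorted(votes), key=lambda label: votes[label])
```

An event covered only by a "normal" signature is passed, one covered by an "intrusion" signature is reported, and one covered by neither is escalated as an anomaly. A real system would learn the intervals from traffic rather than hard-code them.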
12	Towards a Versatile System for the Visual Recognition of Surface Defects
Koprnicky, Miroslav, January 2005
Automated visual inspection is an emerging multi-disciplinary field with many challenges; it combines different aspects of computer vision, pattern recognition, automation, and control systems. There does not exist a large body of work dedicated to the design of generalized visual inspection systems; that is, those that might easily be made applicable to different product types. This is an important oversight, in that many improvements in design and implementation times, as well as costs, might be realized with a system that could easily be made to function in different production environments.
This thesis proposes a framework for generalizing and automating the design of the defect classification stage of an automated visual inspection system. It involves using an expandable set of features which are optimized along with the classifier operating on them in order to adapt to the application at hand. The particular implementation explored involves optimizing the feature set in disjoint sets logically grouped by feature type to keep search spaces reasonable. Operator input is kept to a minimum throughout this customization process, since it is limited to those cases in which the existing feature library cannot adequately delineate the classes at hand, at which time new features (or pools) may have to be introduced by an engineer with experience in the domain.
Two novel methods are put forward which fit well within this framework: cluster-space and hybrid-space classifiers. They are compared in a series of tests against standard benchmark classifiers, as well as mean and majority vote multi-classifiers, on feature sets comprising just the logical feature subsets, as well as the entire feature sets formed by their union. The proposed classifiers as well as the benchmarks are optimized with both a progressive combinatorial approach and a genetic algorithm. Experimentation was performed on true colour industrial lumber defect images, as well as binary hand-written digits.
Based on the experiments conducted in this work, it was found that the sequentially optimized multi hybrid-space methods are capable of matching the performance of the benchmark classifiers on the lumber data, with the exception of the mean-rule multi-classifiers, which dominated most experiments by approximately 3% in classification accuracy. The genetic algorithm optimized hybrid-space multi-classifier achieved the best performance, however: an accuracy of 79.2%.
The numeral dataset results were less promising; the proposed methods could not equal benchmark performance. This is probably because the numeral feature sets were much more conducive to good class separation, with standard benchmark accuracies approaching 95% not uncommon. This indicates that the cluster-space transform inherent to the proposed methods appears to be most useful in highly dependent or confusing feature spaces, a hypothesis supported by the outstanding performance of the single hybrid-space classifier in the difficult texture feature subspace: 42.6% accuracy, a 6% increase over the best benchmark performance.
The generalized framework proposed appears promising, because classifier performance over feature sets formed by the union of independently optimized feature subsets regularly met and exceeded that of classifiers operating on feature sets formed by optimizing the feature set in its entirety. This finding corroborates earlier work with similar results [3, 9], and is an aspect of pattern recognition that should be examined further.
Systems Design; Automated Visual Inspection; Generalized Defect Recognition Framework; Cluster-Space Transform; Multi-Classifier System; Fuzzy C-Means Clustering; Pattern Recognition
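The abstract above compares against mean-rule and majority-vote multi-classifiers. A minimal sketch of these two standard combination rules follows, assuming each base classifier outputs one score per class; the function names and the example scores are illustrative and are not taken from the thesis.

```python
from collections import Counter
from typing import Sequence


def mean_rule(outputs: Sequence[Sequence[float]]) -> int:
    """Average the per-class scores over all classifiers, pick the top class."""
    n_classes = len(outputs[0])
    means = [
        sum(scores[c] for scores in outputs) / len(outputs)
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: means[c])


def majority_vote(outputs: Sequence[Sequence[float]]) -> int:
    """Each classifier votes for its own top class; the most voted class wins."""
    votes = [max(range(len(scores)), key=lambda c: scores[c]) for scores in outputs]
    counts = Counter(votes)
    best = max(counts.values())
    # Ties are resolved in favour of the lowest class index.
    return min(c for c, n in counts.items() if n == best)
```

The two rules can disagree. With scores `[[0.55, 0.45], [0.55, 0.45], [0.05, 0.95]]`, two of the three classifiers weakly prefer class 0, so the majority vote picks class 0, while the mean rule picks class 1 because the third classifier is far more confident.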
14	Sobre cognição, adaptação e homeostase : uma analise de ferramentas computacionais bioinspiradas aplicadas a navegação autonoma de robos / On cognition, adaptation and homeostasis : analysis and synthesis of bio-inspired computational tools applied to robot autonomous navigation
Moioli, Renan Cipriano, 09 October 2008
Advisors: Fernando Jose Von Zuben, Patricia Amancio Vargas / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação / Previous issue date: 2008
Abstract: The objectives of this work are to study, develop and apply two bio-inspired computational tools to autonomous robot navigation. The first tool is Learning Classifier Systems, using both the strength-based model (a version of the original proposal) and the accuracy-based model. Additionally, the rule evolution process and the final evolved populations are analyzed. The second tool is a model called the evolutionary artificial homeostatic system, composed of two NSGasNet recurrent artificial neural networks and an artificial endocrine system. The system parameters are adjusted by means of evolution, reducing the need for a priori coding and parametrization. The system's peculiarities and its capacity for adaptation are analyzed. The motivation for both proposals lies in the joint use of evolution and learning, steps considered fundamental for the synthesis of complex adaptive systems and the computational modeling of cognitive processes. The experiments, which aim to validate both proposals, involve computational simulation in virtual environments and implementations on a real Khepera II robot.
Master's / Computer Engineering / Master in Electrical Engineering
Artificial intelligence; Homeostasis; Neural networks (Computing); Evolutionary computation; Mobile robots; Intelligent control systems; Learning classifier system; Artificial homeostatic system; Artificial Neural Networks; Evolutionary robotics; Reactive and non reactive behaviour
15	Automating rule creation in a Smart Home prototype with Learning Classifier System
Anderzén, Anton, Winroth, Markus, January 2018
The name "smart homes" promises intelligent behavior. Today, automation of the home environment is a manual task in which the user creates the rules that control the devices. For smart homes this tedious manual task can be automated. The purpose of this bachelor's thesis is to develop a prototype that helps users in smart homes create rules. The rules should be created automatically by a machine learning solution. A learning classifier system is judged to be a suitable machine learning solution and is used to find and create rules from sensor data. In the prototype, a Raspberry Pi is used to collect the data. This data is processed by the learning classifier system, generating a set of rules. These rules predict actions for controlling a smart lighting system. The rules are continuously updated with new sensory information from the environment, constantly re-evaluating the previously found rules. The learning classifier system prototype solves the problem of how rules can be generated automatically by the use of machine learning.
Computer and Information Sciences
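The loop described in the abstract above, in which rules predict a lighting action and are re-evaluated as new sensor readings arrive, can be sketched as follows. This is a simplified stand-in, not the prototype's learning classifier system: rules are given by hand rather than discovered, and the sensor names (`motion`, `dark`) and actions (`light_on`, `light_off`) are hypothetical.

```python
from typing import Dict, List, Optional


class Rule:
    """A hypothetical condition -> action rule with a running accuracy."""

    def __init__(self, condition: Dict[str, int], action: str) -> None:
        self.condition = condition  # sensor name -> required value
        self.action = action
        self.matched = 0            # times the condition held
        self.correct = 0            # times the user's action agreed

    def matches(self, reading: Dict[str, int]) -> bool:
        return all(reading.get(k) == v for k, v in self.condition.items())

    def update(self, reading: Dict[str, int], observed_action: str) -> None:
        """Re-evaluate the rule against one new sensor reading."""
        if self.matches(reading):
            self.matched += 1
            self.correct += int(observed_action == self.action)

    @property
    def accuracy(self) -> float:
        return self.correct / self.matched if self.matched else 0.0


def predict(rules: List[Rule], reading: Dict[str, int]) -> Optional[str]:
    """Return the action of the most accurate matching, already-tested rule."""
    candidates = [r for r in rules if r.matches(reading) and r.matched > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.accuracy).action
```

Calling `update` on every rule for each new reading keeps the accuracy estimates current, so a rule that stops agreeing with the user's behaviour gradually loses out to a better one.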
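16	遺傳規畫在人工智慧經濟學中的發展與評估 / The development and evaluation of genetic programming on artificial intelligence economics
葉佳炫, Yeh, Chia Hsuan, Unknown Date
This thesis continues a line of research within the recent development of "bounded-rationality macroeconomics". The definition of bounded rationality used in this study follows Sargent (1993) and Leijonhufvud (1993). Sargent (1993) asks how economists, when building a model, should shape the expectations and learning of the decision makers in it. For the sake of consistency, the decision makers in the model should not be imagined as either smarter or more ignorant than the economist. The requirement that these two roles be consistent appears to have become a cornerstone of bounded-rationality macroeconomics.
In econometrics, the formation of expectations and learning can be divided roughly into two stages. In the first stage, expectation and learning processes are built mainly on statistical decision theory; expectations of this type rest on learning processes based on probabilistic models. This kind of learning process has been the main framework for rational-expectations learning since the 1980s. Using it requires fairly strong restrictions on the information held by the decision makers. The second stage of learning models aims to lighten the informational burden placed on the decision makers in the model, that is, to extend the probabilistic learning of the first stage to non-probabilistic learning.
Almost every learning problem can be viewed as a search problem: model selection is a search for models, and parameter estimation is a search for parameters. In model specification, the usual procedure has been: suppose the model is ...., then we can .... There is no strict criterion to follow in choosing the model. Genetic programming, however, not only provides a good paradigm for setting up a model but also a general procedure for searching for one. The choice of model should be the result of a search process rather than something taken out of thin air from the hand of God. How to set up the search is therefore more basic and more important than the choice of model itself. At the start of a genetic programming run no prior knowledge is included; the initial models are created at random. In the course of evolution, systematic structure (patterns) gradually appears in the models. This search process leans neither towards randomness nor towards system, and strikes a balance between the two.
Under genetic programming, which model is chosen depends on the experimenter's time cost. In other words, genetic programming provides the experimenter with the best model found so far, and whether to spend more time to obtain a "more accurate" model is left to the experimenter. This gives a more appropriate criterion for model selection: the general outline of the model is obtained by evolution rather than born from a sudden stroke of inspiration, and the accuracy of the model is decided by the individual's time cost. In this sense, this mode of selection is closer to "human nature" and is consistent with the spirit of economics.
The purpose of this thesis is to understand some of the properties of genetic programming in actual operation, and how to use it correctly to obtain the greatest benefit, in the hope that it can become an important tool for work on bounded-rationality macroeconomics.
Genetic Algorithm; Genetic Programming; Bounded Rationality; Evolution; Classifier System; Inductive Learning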
17	Aprendizado de máquina baseado em separabilidade linear em sistema de classificação híbrido-nebuloso aplicado a problemas multiclasse / Machine learning based on linear separability in a hybrid fuzzy classification system applied to multiclass problems
Tuma, Carlos Cesar Mansur, 29 June 2009
Previous issue date: 2009-06-29 / Financiadora de Estudos e Projetos
This master's thesis describes an intelligent classifier system, called Slicer, applied to multiclass non-linearly separable problems. The system adopts a low computational cost supervised learning strategy (evaluated as ) based on linear separability. During the learning period the system determines a set of hyperplanes associated with one-class regions (sub-spaces). In classification tasks the classifier system uses the hyperplanes as a set of if-then-else rules to infer the class of the input attribute vector (the object to be classified). Among other characteristics, the intelligent classifier system is able to: deal with examples that have missing attribute values; reject noisy examples during learning; adjust hyperplane parameters to improve the definition of the one-class regions; and eliminate redundant rules. Fuzzy theory is used to design a hybrid version with features such as approximate reasoning and parallel inference computation. Different classification methods and benchmarks are considered for evaluation. The Slicer classifier system reaches acceptable results in terms of accuracy, justifying future investigation effort.
Artificial intelligence; Machine learning; Classification; Geometric classification method; Fuzzy classifier system; Linear separability; Multiclass non-linear problems
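The classification step described above, in which hyperplanes act as a chain of if-then-else rules, can be illustrated with a small sketch. It assumes each rule is a triple (weights, bias, label) tested in order, with a default class when no rule fires. How Slicer actually learns and orders its hyperplanes is not given in the abstract, so the rule format and the example values are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical rule format: (weights, bias, class label).
HyperplaneRule = Tuple[Sequence[float], float, str]


def classify(x: Sequence[float], rules: List[HyperplaneRule], default: str) -> str:
    """Return the label of the first hyperplane whose positive side contains x."""
    for weights, bias, label in rules:
        # if w . x + b > 0 then label, else fall through to the next rule
        if sum(w * xi for w, xi in zip(weights, x)) + bias > 0:
            return label
    return default
```

For example, with `rules = [((1.0, 0.0), -2.0, "A"), ((0.0, 1.0), -2.0, "B")]`, points with a first coordinate above 2 are labelled "A", remaining points with a second coordinate above 2 are labelled "B", and everything else receives the default class.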
18	[en] MULTIPLE CLASSIFIER SYSTEM FOR MOTOR IMAGERY TASK CLASSIFICATION / [pt] SISTEMA DE MÚLTIPLOS CLASSIFICADORES PARA CLASSIFICAÇÃO DE TAREFAS DE IMAGINAÇÃO MOTORA
ALIMED CELECIA RAMOS, 09 August 2017
Brain Computer Interfaces (BCIs) are artificial systems that allow interaction between a person and their environment by translating the brain's electrical signals into control commands for an external device. An EEG-based neurorehabilitation system can combine portability and affordability with good temporal resolution and no health risks to the user. Such a system can stimulate brain plasticity, provided that it reliably recognises the motor imagery (MI) tasks performed by the user. Therefore, the aim of this work is the design of a machine learning system that, based on the EEG signal from only two electrodes, C3 and C4, can classify MI tasks with high accuracy, robustness to inter-trial and inter-subject signal variations, and reasonable processing time.
The proposed machine learning system has four main stages: preprocessing, feature extraction, feature selection, and classification. The preprocessing and feature extraction are implemented by extracting statistical, power and phase features from the frequency sub-bands obtained with the Wavelet Packet Decomposition. Feature selection is performed by a Genetic Algorithm, and the classification model is a Multiple Classifier System composed of different classifiers combined by a Multilayer Perceptron neural network acting as meta-classifier. The system is tested on six subjects from datasets provided by the BCI Competitions and compared with benchmark works found in the literature, outperforming the other methods. In addition, a real BCI system for neurorehabilitation is designed, developed and tested, also producing good results.
SIGNAL PROCESSING; MLP; FUSION TECHNIQUES; EEG; MOTOR IMAGERY; BRAIN COMPUTER INTERFACE; MULTIPLE CLASSIFIER SYSTEM; GENETIC ALGORITHM
19	A scalable evolutionary learning classifier system for knowledge discovery in stream data mining
Dam, Hai Huong, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW, January 2008
Data mining (DM) is the process of finding patterns and relationships in databases. The breakthrough in computer technologies triggered a massive growth in data collected and maintained by organisations. In many applications, these data arrive continuously in large volumes as a sequence of instances known as a data stream. Mining these data is known as stream data mining. Due to the large amount of data arriving in a data stream, each record is normally expected to be processed only once. Moreover, this process can be carried out at different sites in the organisation simultaneously, making the problem distributed in nature. Distributed stream data mining poses many challenges to the data mining community, including scalability and coping with changes in the underlying concept over time.
In this thesis, the author hypothesizes that learning classifier systems (LCSs) - a class of classification algorithms - have the potential to work efficiently in distributed stream data mining. LCSs are incremental learners and, being evolutionary based, they are inherently adaptive. However, they suffer from two main drawbacks that hinder their use as fast data mining algorithms. First, they require a large population size, which slows down the processing of arriving instances. Second, they require a large number of parameter settings, some of which are very sensitive to the nature of the learning problem. As a result, it becomes difficult to choose the right setup for totally unknown problems. The aim of this thesis is to attack these two problems in LCS, with a specific focus on UCS - a supervised evolutionary learning classifier system. UCS is chosen as it has been tested extensively on classification tasks and it is the supervised version of XCS, a state-of-the-art LCS.
In this thesis, the architectural design for a distributed stream data mining system is first introduced. The problems that UCS faces in a distributed data stream task are confirmed through a large number of experiments with UCS and the proposed architectural design. To overcome the problem of large population sizes, the idea of using a neural network to represent the action in UCS is proposed. This new system - called NLCS - was validated experimentally using a small fixed population size and has shown a large reduction in the population size needed to learn the underlying concept in the data. An adaptive version of NLCS called ANCS is then introduced. The adaptive version dynamically controls the population size of NLCS. A comprehensive analysis of the behaviour of ANCS revealed interesting patterns in the behaviour of the parameters, which motivated an ensemble version of the algorithm with 9 nodes, each using a different parameter setting. In total they cover all patterns of behaviour noticed in the system. A voting gate is used for the ensemble. The resultant ensemble does not require any parameter setting, and showed better performance on all datasets tested. The thesis concludes with testing the ANCS system in the architectural design for distributed environments proposed earlier.
The contributions of the thesis are: (1) reducing the UCS population size by an order of magnitude using a neural representation; (2) introducing a mechanism for adapting the population size; (3) proposing an ensemble method that does not require parameter setting; and primarily (4) showing that the proposed LCS can work efficiently for distributed stream data mining tasks.
Data mining; Action map; Classification; Data stream; Neural network; Noisy data; Non-stationary environment; Reinforcement learning; Rule-based system; Static environment; Stream data mining; Supervised learning; Distributed data mining; Dynamic environment; Ensemble learning; Evolutionary computation; Genetic algorithm; Knowledge discovery; Learning classifier system; Negative correlation learning
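Two ideas from the abstract above lend themselves to a short sketch: each stream record is processed only once, and an ensemble of 9 nodes with different parameter settings is combined through a voting gate. The sketch below is not NLCS or ANCS; it substitutes a tiny online perceptron for each node and uses the learning rate as the single varied parameter, both of which are assumptions chosen for brevity.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


class OnlinePerceptron:
    """Stand-in node: a binary perceptron updated one record at a time."""

    def __init__(self, n_features: int, learning_rate: float) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: Sequence[float]) -> int:
        score = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if score > 0 else 0

    def learn(self, x: Sequence[float], y: int) -> None:
        error = y - self.predict(x)
        if error:
            step = self.learning_rate * error
            self.weights = [w + step * xi for w, xi in zip(self.weights, x)]
            self.bias += step


class VotingEnsemble:
    """Nodes with different parameter settings combined by majority vote."""

    def __init__(self, n_features: int, learning_rates: List[float]) -> None:
        self.nodes = [OnlinePerceptron(n_features, lr) for lr in learning_rates]

    def predict(self, x: Sequence[float]) -> int:
        # An odd number of nodes avoids ties on a binary problem.
        votes = Counter(node.predict(x) for node in self.nodes)
        return votes.most_common(1)[0][0]

    def learn(self, x: Sequence[float], y: int) -> None:
        for node in self.nodes:
            node.learn(x, y)


def run_stream(ensemble: VotingEnsemble,
               stream: Iterable[Tuple[Sequence[float], int]]) -> int:
    """Test-then-train over the stream; every record is seen exactly once."""
    correct = 0
    for x, y in stream:
        correct += int(ensemble.predict(x) == y)
        ensemble.learn(x, y)
    return correct
```

A typical call is `VotingEnsemble(1, [0.1 * k for k in range(1, 10)])` followed by `run_stream(ensemble, stream)`, which returns the number of records predicted correctly before being learnt. Because the user no longer picks one learning rate, the sketch mirrors the abstract's point that the ensemble removes the need for manual parameter setting.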
