201 |
An online and adaptive signature-based approach for intrusion detection using learning classifier systems
Shafi, Kamran, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW, January 2008
This thesis presents the case of dynamically and adaptively learning signatures for network intrusion detection using genetic based machine learning techniques. The two major criticisms of the signature based intrusion detection systems are their i) reliance on domain experts to handcraft intrusion signatures and ii) inability to detect previously unknown attacks or the attacks for which no signatures are available at the time. In this thesis, we present a biologically-inspired computational approach to address these two issues. This is done by adaptively learning maximally general rules, which are referred to as signatures, from network traffic through a supervised learning classifier system, UCS. The rules are learnt dynamically (i.e., using machine intelligence and without the requirement of a domain expert), and adaptively (i.e., as the data arrives without the need to relearn the complete model after presenting each data instance to the current model). Our approach is hybrid in that signatures for both intrusive and normal behaviours are learnt. The rule based profiling of normal behaviour allows for anomaly detection in that the events not matching any of the rules are considered potentially harmful and could be escalated for an action. We study the effect of key UCS parameters and operators on its performance and identify areas of improvement through this analysis. Several new heuristics are proposed that improve the effectiveness of UCS for the prediction of unseen and extremely rare intrusive activities. A signature extraction system is developed that adaptively retrieves signatures as they are discovered by UCS. The signature extraction algorithm is augmented by introducing novel subsumption operators that minimise overlap between signatures. Mechanisms are provided to adapt the main algorithm parameters to deal with online noisy and imbalanced class data. 
The performance of UCS, its variants and the signature extraction system is measured through standard evaluation metrics on a publicly available intrusion detection dataset provided during the 1999 KDD Cup intrusion detection competition. We show that the extended UCS significantly improves test accuracy and hit rate while significantly reducing the false alarm rate and cost-per-example score compared with the standard UCS. The results are competitive with the best systems that participated in the competition, with the added advantage that our systems are online and incremental rule learners. The signature extraction system built on top of the extended UCS retrieves a rule set an order of magnitude smaller than that of the base UCS learner without any significant performance loss. We extend the evaluation of our systems to real-time network traffic captured from a university departmental server. A methodology is developed to build a fully labelled intrusion detection dataset by mixing real background traffic with attacks simulated in a controlled environment. Tools are developed to pre-process the raw network data into a feature vector format suitable for UCS and other related machine learning systems. We show the effectiveness of our feature set in detecting payload-based attacks.
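The hybrid use of signatures described above (rules for both normal and intrusive behaviour, with unmatched events escalated) can be sketched roughly as follows. This is only an illustration under stated assumptions: the interval-based rule encoding, the two features (connection duration and bytes sent), the example rules and the first-match policy are all hypothetical, and UCS itself combines the matching rules through a fitness-weighted vote rather than taking the first match.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Signature:
    """A rule: one (low, high) interval per feature, plus the class it predicts."""
    intervals: Tuple[Tuple[float, float], ...]
    label: str  # "normal" or "attack" in this sketch

    def matches(self, event: Sequence[float]) -> bool:
        # The rule matches when every feature falls inside its interval.
        return all(lo <= x <= hi for (lo, hi), x in zip(self.intervals, event))


def classify(event: Sequence[float], signatures: Sequence[Signature]) -> str:
    """Return the label of the first matching signature, or "unknown" if none match."""
    for sig in signatures:
        if sig.matches(event):
            return sig.label
    # No rule covers this event: treat it as a potential anomaly to escalate.
    return "unknown"


# Hypothetical rules over (duration in seconds, bytes sent); not taken from the thesis.
RULES = (
    Signature(((0.0, 10.0), (0.0, 500.0)), "normal"),
    Signature(((0.0, 1.0), (10_000.0, math.inf)), "attack"),
)

print(classify((5.0, 100.0), RULES))     # covered by the "normal" rule
print(classify((0.5, 20_000.0), RULES))  # covered by the "attack" rule
print(classify((50.0, 100.0), RULES))    # matches nothing, so it is escalated
```

Keeping the "unknown" outcome separate from both labels is what lets a rule set of this kind act as a signature matcher and an anomaly detector at the same time.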
|
202 |
Evolving complexity towards risk : a massive scenario generation approach for evaluating advanced air traffic management concepts
Alam, Sameer, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW, January 2008
Present-day air traffic control is reaching its operational limits, and accommodating future traffic growth will be a challenging task for air traffic service providers and airline operators. Free Flight is a proposed transition from a highly structured and centrally controlled air traffic system to a self-optimized and highly distributed system. In Free Flight, pilots will have the flexibility of real-time trajectory planning and dynamic route optimization given airspace constraints (traffic, weather etc.). A variety of advanced air traffic management (ATM) concepts are proposed as enabling technologies for the realization of Free Flight. Since these concepts can be exposed to unforeseen and challenging scenarios in Free Flight, they need to be validated and evaluated in order to implement the most effective systems in the field. Evaluation of advanced ATM concepts is a challenging task due to the limitations in the existing scenario generation methodologies and the limited availability of a common platform (air traffic simulator) where diverse ATM concepts can be modeled and evaluated. Their rigorous evaluation on safety metrics, in a variety of complex scenarios, can provide an insight into their performance, which can help improve upon them while developing new ones. In this thesis, I propose a non-proprietary, non-commercial air traffic simulation system, with a novel representation of airspace, which can prototype advanced ATM concepts such as conflict detection and resolution, airborne weather avoidance and cockpit display of traffic information. I then propose a novel evolutionary computation methodology to algorithmically generate a massive number of conflict scenarios of increasing complexity in order to evaluate conflict detection algorithms. I illustrate the methodology in detail by quantitative evaluation of three conflict detection algorithms, from the literature, on safety metrics.
I then propose the use of data mining techniques for the discovery of interesting relationships that may exist implicitly in the algorithms' performance data. The data mining techniques formulate the conflict characteristics that may lead to algorithm failure as if-then rules. Using the rule sets for each algorithm, I propose an ensemble of conflict detection algorithms which uses a switch mechanism to direct the subsequent conflict probes to an algorithm which is less vulnerable to failure in a given conflict scenario. The objective is to form a predictive model of an algorithm's vulnerability which can then be included in an ensemble that can minimize the overall vulnerability of the system. In summary, the contributions of this thesis are:
1. A non-proprietary, non-commercial air traffic simulation system with a novel representation of airspace for efficient modeling of advanced ATM concepts.
2. An Ant-based dynamic weather avoidance algorithm for traffic-constrained enroute airspace.
3. A novel representation of 4D air traffic scenarios that allows the use of an evolutionary computation methodology to evolve complex conflict scenarios for the evaluation of conflict detection algorithms.
4. An evaluation framework where scenario generation, scenario evaluation and scenario evolution processes can be carried out in an integrated manner for rigorous evaluation of advanced ATM concepts.
5. A methodology for forming an intelligent ensemble of conflict detection algorithms by data mining the scenario space.
|
203 |
Audio-video based handwritten mathematical content recognition
Vemulapalli, Smita 12 November 2012
Recognizing handwritten mathematical content is a challenging problem, and more so when such content appears in classroom videos. However, because the handwritten text and the accompanying audio in such videos refer to the same content, a combination of video- and audio-based recognizers has the potential to significantly improve the content recognition accuracy. This dissertation, using a combination of video- and audio-based recognizers, focuses on improving the recognition accuracy associated with handwritten mathematical content in such videos.
Our approach makes use of a video recognizer as the primary recognizer and a multi-stage assembly, developed as part of this research, is used to facilitate effective combination with an audio recognizer. Specifically, we address the following challenges related to audio-video based handwritten mathematical content recognition: (1) Video Preprocessing - generates a timestamped sequence of segmented characters from the classroom video in the face of occlusions and shadows caused by the instructor, (2) Ambiguity Detection - determines the subset of input characters that may have been incorrectly recognized by the video based recognizer and forwards this subset for disambiguation, (3) A/V Synchronization - establishes correspondence between the handwritten character and the spoken content, (4) A/V Combination - combines the synchronized outputs from the video and audio based recognizers and generates the final recognized character, and (5) Grammar Assisted A/V Based Mathematical Content Recognition - utilizes a base mathematical speech grammar for both character and structure disambiguation. Experiments conducted using videos recorded in a classroom-like environment demonstrate the significant improvements in recognition accuracy that can be achieved using our techniques.
|
204 |
Design of Comprehensible Learning Machine Systems for Protein Structure Prediction
Hu, Hae-Jin 06 August 2007
With the efforts to understand protein structure, many computational approaches have been developed recently. Among them, Support Vector Machine (SVM) methods have recently been applied and have shown successful performance compared with other machine learning schemes. However, despite their high performance, SVM approaches suffer from a lack of understandability because the SVM is a black-box model; the predictions made by an SVM cannot be interpreted in a biologically meaningful way. To overcome this limitation, a new association rule based classifier, PCPAR, was devised based on the existing classifier, CPAR, to handle sequential data. The performance of PCPAR was further improved by designing the following two hybrid schemes. The PCPAR/SVM method is a parallel combination of PCPAR and the SVM, and the PCPAR_SVM method is a sequential combination of PCPAR and the SVM. To understand the SVM prediction, the SVM_PCPAR scheme was developed. The experimental results show that the PCPAR scheme performs better than the CPAR method with respect to accuracy and the number of generated patterns. The PCPAR/SVM scheme performs better than PCPAR, PCPAR_SVM or SVM_PCPAR and almost as well as the SVM. The generated patterns are easily understandable and biologically meaningful. The system sturdiness evaluation and the ROC curve analysis showed that this new scheme is robust and competent.
|
205 |
Semi-automated search for abnormalities in mammographic X-ray images
Barnett, Michael Gordon 24 October 2006
Breast cancer is the most commonly diagnosed cancer among Canadian women; x-ray mammography is the leading screening technique for early detection. This work introduces a semi-automated technique for analyzing mammographic x-ray images to measure their degree of suspiciousness for containing abnormalities. The designed system applies the discrete wavelet transform to parse the images and extracts statistical features that characterize an image's content, such as the mean intensity and the skewness of the intensity. A naïve Bayesian classifier uses these features to classify the images, achieving sensitivities as high as 99.5% for a data set containing 1714 images. To generate confidence levels, multiple classifiers are combined in three possible ways: a sequential series of classifiers, a vote-taking scheme of classifiers, and a network of classifiers tuned to detect particular types of abnormalities. The third method offers sensitivities of 99.85% or higher with specificities above 60%, making it an ideal candidate for pre-screening images. Two confidence level measures are developed: first, a real confidence level measures the true probability that an image was suspicious; and second, a normalized confidence level assumes that normal and suspicious images were equally likely to occur. The second confidence measure allows for more flexibility and could be combined with other factors, such as patient age and family history, to give a better true confidence level than assuming a uniform incidence rate. The system achieves sensitivities exceeding those in other current approaches while maintaining reasonable specificity, especially for the sequential series of classifiers and for the network of tuned classifiers.
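One way to read the two confidence measures described above is as the same application of Bayes' rule with different priors. The sketch below takes that reading as an assumption; the abstract does not give the thesis's actual formulas. The function name and the 0.5% incidence figure are hypothetical and chosen purely for illustration, while the sensitivity and specificity are the approximate operating point quoted for the network of tuned classifiers.

```python
def confidence(sensitivity: float, specificity: float, prior: float) -> float:
    """P(image is suspicious | classifier flags it), by Bayes' rule."""
    true_pos = sensitivity * prior                   # suspicious and flagged
    false_pos = (1.0 - specificity) * (1.0 - prior)  # normal but flagged
    return true_pos / (true_pos + false_pos)


# Approximate operating point quoted in the abstract.
SENS, SPEC = 0.9985, 0.60

# Normalized confidence: normal and suspicious images assumed equally likely.
normalized = confidence(SENS, SPEC, prior=0.5)

# "Real" confidence under a hypothetical 0.5% incidence rate (illustrative only).
real = confidence(SENS, SPEC, prior=0.005)

print(f"normalized: {normalized:.3f}, real: {real:.3f}")
```

Under this reading the normalized value does not depend on the incidence rate, which is what would allow it to be recombined later with patient-specific priors such as age or family history.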
|
206 |
Holistic Face Recognition By Dimension Reduction
Gul, Ahmet Bahtiyar 01 January 2003
Face recognition is a popular research area in which different approaches have been studied in the literature. In this thesis, a holistic Principal Component Analysis (PCA) based method, namely the Eigenface method, is studied in detail, and three methods based on the Eigenface method are compared. These are Bayesian PCA, where a Bayesian classifier is applied after dimension reduction with PCA; Subspace Linear Discriminant Analysis (LDA), where LDA is applied after PCA; and Eigenface, where a Nearest Mean Classifier is applied after PCA. All three methods are implemented on the Olivetti Research Laboratory (ORL) face database, the Face Recognition Technology (FERET) database and the CNN-TURK Speakers face database. The results are compared with respect to the effects of changes in illumination, pose and aging. Simulation results show that Subspace LDA and Bayesian PCA perform slightly better than PCA under changes in pose; however, even Subspace LDA and Bayesian PCA do not perform well under changes in illumination and aging, although they perform better than PCA.
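The Eigenface variant mentioned above (a Nearest Mean Classifier applied after PCA) can be sketched in a few lines. This is a toy illustration, not the thesis's implementation: it keeps only the first principal component, finds it by power iteration, and uses four-pixel "images" with hypothetical labels; a real Eigenface system would retain many components and work on full face images.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit(faces, iterations=100):
    """faces maps a label to a list of flattened images (lists of pixel values).

    Returns (project, class_means): a function mapping an image to its 1-D
    PCA coordinate, and the mean coordinate of each class.
    """
    samples = [img for imgs in faces.values() for img in imgs]
    mean = [sum(col) / len(samples) for col in zip(*samples)]
    centered = [[x - m for x, m in zip(img, mean)] for img in samples]

    # Power iteration for the leading eigenvector of X^T X (the first "eigenface").
    # Starting from a data row avoids a start that is orthogonal to it on this toy set.
    component = centered[0]
    for _ in range(iterations):
        scores = [dot(row, component) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(len(mean))]
        norm = math.sqrt(dot(w, w))
        component = [x / norm for x in w]

    def project(img):
        return dot([x - m for x, m in zip(img, mean)], component)

    class_means = {label: sum(project(i) for i in imgs) / len(imgs)
                   for label, imgs in faces.items()}
    return project, class_means


def nearest_mean_label(img, project, class_means):
    """Assign the class whose mean lies closest in the reduced space."""
    z = project(img)
    return min(class_means, key=lambda label: abs(class_means[label] - z))


# Hypothetical training set: two people, two four-pixel images each.
TRAIN = {
    "alice": [[10, 10, 0, 0], [9, 11, 1, 0]],
    "bob": [[0, 0, 10, 10], [1, 0, 9, 11]],
}
project, class_means = fit(TRAIN)
print(nearest_mean_label([8, 9, 2, 1], project, class_means))
print(nearest_mean_label([1, 2, 9, 9], project, class_means))
```

Because classification only compares distances to class means, the arbitrary sign of the principal component does not affect the result.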
|
207 |
Ensembles of Artificial Neural Networks: Analysis and Development of Design Methods
Torres Sospedra, Joaquín 30 September 2011
This thesis is focused on the analysis and development of Ensembles of Neural Networks. An ensemble is a system in which a set of heterogeneous Artificial Neural Networks is generated in order to outperform single-network classifiers. However, this thesis differs from others related to ensembles of neural networks [1, 2, 3, 4, 5, 6, 7] in its organization, which is as follows.
Firstly, a comparison of ensemble methods is introduced in order to provide a ranked list of the best ensemble methods found in the literature. This comparison has been split into two studies, each of which forms a chapter of the thesis.
Moreover, another important step related to ensembles of neural networks is how to combine the information provided by the networks in the ensemble. The literature offers several alternatives for obtaining an accurate combination of the information provided by the heterogeneous set of networks. For this reason, a comparison of combiners is also introduced in this thesis.
Furthermore, Ensembles of Neural Networks are only one kind of Multiple Classifier System (MCS) based on neural networks. There are other alternatives for generating an MCS based on neural networks which are quite different from Ensembles. The most important of these are Stacked Generalization and Mixture of Experts. These two systems are also analysed in this thesis and new alternatives are proposed.
One of the results of this comparative research is a deep understanding of the field of ensembles, so new ensemble methods and combiners can be designed after analyzing the results of the research performed. Specifically, two new ensemble methods, a new ensemble methodology called Cross-Validated Boosting and two reordering algorithms are proposed in this thesis. The best overall results are obtained by the proposed ensemble methods.
Finally, all the experiments done have been carried out on a common experimental setup. The experiments have been repeated ten times on nineteen different datasets from the UCI repository in order to validate the results. Moreover, the procedure applied to set up specific parameters is quite similar in all the experiments performed.
It is important to conclude by remarking that the main contributions are:
1) An experimental setup to prepare the experiments which can be applied for further comparisons.
2) A guide to selecting the most appropriate methods to build and combine ensembles and multiple classifier systems.
3) New methods proposed to build ensembles and other multiple classifier systems.
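The combination step discussed in the abstract (merging the outputs of the individual networks into one decision) can be illustrated with two widely used combiners, output averaging and majority voting. This sketch assumes nothing about which combiners the thesis actually compares or proposes; the function names and the score values are hypothetical.

```python
from collections import Counter


def average_combiner(outputs):
    """outputs: one list of per-class scores per network. Returns the winning class index."""
    n_classes = len(outputs[0])
    avg = [sum(net[c] for net in outputs) / len(outputs) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)


def voting_combiner(outputs):
    """Each network votes for its top-scoring class; the most voted class wins."""
    votes = [max(range(len(net)), key=net.__getitem__) for net in outputs]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical outputs of three networks on one two-class example.
OUTPUTS = [
    [0.90, 0.10],  # network 1 is very confident in class 0
    [0.45, 0.55],  # networks 2 and 3 lean weakly towards class 1
    [0.45, 0.55],
]

print(average_combiner(OUTPUTS))  # averaging weighs confidence
print(voting_combiner(OUTPUTS))   # voting counts heads
```

The example is chosen so that the two combiners disagree, which is the kind of behaviour a combiner comparison has to account for.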
|
209 |
A Clustering-based Approach to Document-Category Integration
Cheng, Tsang-Hsiang 04 September 2003
E-commerce applications generate and consume a tremendous amount of online information that is typically available as textual documents. Observations of textual document management practices by organizations or individuals suggest the popularity of using categories (or category hierarchies) to organize, archive and access documents. At the same time, an organization (or individual) constantly acquires new documents from various Internet sources. Consequently, integrating relevant categorized documents into the existing categories of the organization (or individual) becomes an important issue in the e-commerce era. The existing categorization-based approach for document-category integration (specifically, the Enhanced Naïve Bayes classifier) has several limitations, including the assumption that the master and source catalogs use homogeneous categorization schemes and the requirement for large master categories as training data. In this study, we developed a Clustering-based Category Integration (CCI) technique for integrating two document catalogs, each of which is organized non-hierarchically (i.e., in a flat set). Using the Enhanced Naïve Bayes classifier as the benchmark, the empirical evaluation results showed that the proposed CCI technique appeared to improve the accuracy of document-category integration in different integration scenarios and seemed to be less sensitive to the size of master categories than the categorization-based approach.
Furthermore, to integrate document categories that are organized hierarchically, we proposed a Clustering-based category-Hierarchy Integration (CHI) technique, which extends the CCI technique to category-hierarchy integration. The empirical evaluation results showed that the CHI technique appeared to improve the effectiveness of hierarchical document-category integration compared with that attained by CCI under homogeneous and comparable scenarios.
|
210 |
SVM-BASED ROBUST TEMPLATE DESIGN FOR CELLULAR NEURAL NETWORKS IMPLEMENTING AN ARBITRARY BOOLEAN FUNCTION
Teng, Wei-chih 27 June 2005
In this thesis, the geometric margin is used for the first time as the robustness indicator of an uncoupled cellular neural network implementing a given Boolean function. First, robust template design for uncoupled cellular neural networks implementing linearly separable Boolean functions by support vector machines is proposed. A fast sequential minimal optimization algorithm is presented to find maximal margin classifiers, which in turn determine the robust templates. Some general properties of robust templates are investigated. An improved CFC algorithm implementing an arbitrarily given Boolean function is proposed. Two illustrative examples are provided to demonstrate the validity of the proposed method.
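The geometric margin used above as a robustness indicator is the standard quantity from support vector machines: the smallest signed distance from any input pattern to the separating hyperplane. The sketch below computes it for a two-input AND function over bipolar inputs. The weight vector `w` and bias `b` are hypothetical stand-ins for the template parameters of an uncoupled cell, and the two-input setting is a simplification chosen for brevity.

```python
import math
from itertools import product


def geometric_margin(w, b, truth_table):
    """Smallest value of y * (w . u + b) / ||w|| over all (input, output) pairs.

    A positive result means the linear threshold unit realizes the function;
    a larger value leaves more room for parameter perturbations.
    """
    norm = math.sqrt(sum(x * x for x in w))
    return min(
        y * (sum(wi * ui for wi, ui in zip(w, u)) + b) / norm
        for u, y in truth_table
    )


# Two-input AND over bipolar values: output +1 only when both inputs are +1.
AND = [(u, 1 if all(x == 1 for x in u) else -1)
       for u in product((-1, 1), repeat=2)]

# Two hypothetical parameter choices that both implement AND.
margin_a = geometric_margin((1.0, 1.0), -1.0, AND)  # centred between the classes
margin_b = geometric_margin((1.0, 1.0), -1.8, AND)  # close to the (+1, +1) pattern

print(margin_a, margin_b)
```

Both parameter choices implement the same Boolean function, but the first leaves a wider margin, which is the sense in which a maximal margin classifier yields a more robust template.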
|