21 |
A cluster classification method for the extraction of knowledge from text documents. Saad Mohammed, Fathi Hassan. January 2008 (has links)
No description available.
|
22 |
Applying the XCS Learning Classifier System to continuous-valued data-mining problems. Wyatt, David Ian. January 2004 (has links)
No description available.
|
23 |
A correlation framework for continuous user authentication using data mining. Singh, Harjit. January 2003 (has links)
The increasing number of security breaches revealed in recent surveys, and the security threats reported in the media, reaffirm the inadequacy of current security measures in IT systems. While most reported work in this area has focussed on enhancing the initial login stage in order to counteract unauthorised access, there remains the problem of detecting when an intruder has compromised the front-line controls. This poses a serious threat, since any subsequent indicator of an intrusion in progress could be quite subtle and may remain hidden to the casual observer. Having passed the front-line controls and acquired the appropriate access privileges, the intruder may be in a position to do virtually anything without further challenge. This has generated interest in the concept of continuous authentication, which inevitably involves the analysis of vast amounts of data. The primary objective of the research is to develop and evaluate a suitable correlation engine in order to automate the processes involved in authenticating and monitoring users in a networked system environment. The aim is to further develop the Anomaly Detection module previously illustrated in a PhD thesis [1] as part of the conceptual architecture of an Intrusion Monitoring System (IMS) framework.
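The abstract does not describe how the correlation engine combines evidence, so the following is only a minimal illustrative sketch, assuming a per-user behavioural baseline and a weighted combination of anomaly scores from several monitors; all metric names and thresholds are hypothetical, not taken from the thesis.

```python
# Minimal sketch of per-user behavioural anomaly scoring, illustrating the kind of
# continuous-authentication check a correlation engine might perform. The feature
# names and the alert threshold are hypothetical placeholders.
from statistics import mean, stdev

class UserProfile:
    """Running baseline of a single behavioural metric (e.g. keystroke latency)."""
    def __init__(self, history):
        self.mu = mean(history)
        self.sigma = stdev(history) or 1e-9

    def anomaly_score(self, observation):
        # Standardised deviation of the new observation from the user's own baseline.
        return abs(observation - self.mu) / self.sigma

def correlate(scores, weights, threshold=3.0):
    """Combine anomaly scores from several monitors into a single alert decision."""
    combined = sum(w * s for w, s in zip(weights, scores))
    return combined > threshold

# Usage: two monitors (typing latency in ms, actions per minute) with equal weight.
typing = UserProfile([105, 98, 110, 102, 99])
activity = UserProfile([12, 15, 11, 14, 13])
scores = [typing.anomaly_score(160), activity.anomaly_score(14)]
print(correlate(scores, weights=[0.5, 0.5]))
```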
|
24 |
Formal concept matching and reinforcement learning in adaptive information retrieval. Rajapakse, Rohana Kithsiri. January 2003 (has links)
The superiority of the human brain in information retrieval (IR) tasks seems to come firstly from its ability to read and understand the concepts, ideas or meanings central to documents, in order to reason out the usefulness of documents to information needs, and secondly from its ability to learn from experience and be adaptive to the environment. In this work we attempt to incorporate these properties into the development of an IR model to improve document retrieval. We investigate the applicability of concept lattices, which are based on the theory of Formal Concept Analysis (FCA), to the representation of documents. This allows the use of more elegant representation units, as opposed to keywords, in order to better capture concepts/ideas expressed in natural language text. We also investigate the use of a reinforcement learning strategy to learn and improve document representations, based on the information present in query statements and user relevance feedback. Features or concepts of each document/query, formulated using FCA, are weighted separately with respect to the documents they are in, and organised into separate concept lattices according to a subsumption relation. Furthermore, each concept lattice is encoded in a two-layer neural network structure known as a Bidirectional Associative Memory (BAM), for efficient manipulation of the concepts in the lattice representation. This avoids implementation drawbacks faced by other FCA-based approaches. Retrieval of a document for an information need is based on concept matching between the concept lattice representations of a document and a query. The learning strategy works by strengthening the similarity of relevant documents and weakening that of non-relevant documents for each query, depending on the relevance judgements of the users on retrieved documents. Our approach is radically different to existing FCA-based approaches in the following respects: concept formulation; weight assignment to object-attribute pairs; the representation of each document in a separate concept lattice; and encoding concept lattices in BAM structures. Furthermore, in contrast to the traditional relevance feedback mechanism, our learning strategy makes use of relevance feedback information to enhance document representations, thus making the document representations dynamic and adaptive to the user interactions. The results obtained on the CISI, CACM and ASLIB Cranfield collections are presented and compared with published results. In particular, the performance of the system is shown to improve significantly as the system learns from experience.
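As a purely illustrative aside, the sketch below shows the concept-derivation step of Formal Concept Analysis on a toy document/term incidence table: a formal concept is a pair of object and attribute sets that are exactly each other's common elements. The weighting scheme, BAM encoding and retrieval matching described in the abstract are not reproduced, and the toy data is hypothetical.

```python
# Sketch of the formal-concept derivation step in FCA: a formal concept is a pair
# (objects, attributes) in which each set is exactly the set of elements common to
# the other. The toy document/term context is a hypothetical example.
from itertools import combinations

context = {                      # document -> set of terms (binary incidence relation)
    "doc1": {"retrieval", "lattice"},
    "doc2": {"retrieval", "learning"},
    "doc3": {"retrieval", "lattice", "learning"},
}

def common_attrs(objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    sets = [context[o] for o in objs]
    return set.intersection(*sets) if sets else set().union(*context.values())

def common_objs(attrs):
    """Objects possessing every attribute in attrs."""
    return {o for o, a in context.items() if attrs <= a}

def formal_concepts():
    """Enumerate all formal concepts by closing every subset of objects."""
    concepts = set()
    objects = list(context)
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            attrs = common_attrs(set(objs))
            extent = common_objs(attrs)          # close back from attributes to objects
            concepts.add((frozenset(extent), frozenset(attrs)))
    return concepts

for extent, intent in sorted(formal_concepts(), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by subsumption of their extents yields the concept lattice that the abstract uses as a document representation.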
|
25 |
An adaptive anomaly detection system using data mining and an artificial immune system. Ong, Arlene. January 2007 (has links)
No description available.
|
26 |
Named entity recognition: challenges in document annotation, gazetteer construction and disambiguation. Zhang, Ziqi. January 2013 (has links)
The 'information explosion' has generated an unprecedented amount of published information that is still growing at an astonishing rate. As the amount of information grows, the problem of managing it becomes challenging. A key to this challenge rests on the technology of Information Extraction, which automatically transforms unstructured textual data into structured representations that can be interpreted and manipulated by machines. It is recognised that a fundamental task in Information Extraction is Named Entity Recognition (NER), the goals of which are identifying references to named entities in unstructured documents and classifying them into pre-defined semantic categories. Further, due to the polysemous nature of natural language, name references are often ambiguous. Resolving ambiguity concerns recognising the true referent entity of a name reference, essentially a further named entity 'recognition' step and often a compulsory process required by tasks built on top of NER. This research presents a body of work aimed at addressing three research questions for NER. The first question concerns effective and efficient methods for training data annotation, which is the task of creating essential training examples for machine-learning-based NER methods. The second question studies automatically generating background knowledge for NER in the form of gazetteers, which are often critical resources for improving the performance of NER methods. The third question addresses resolving ambiguous name references, a further 'recognition' step that ensures that the output of NER is usable by many complex tasks and applications. For each research question, the related literature has been carefully studied and its limitations have been identified and discussed. New hypotheses and methods have been proposed, leading to a number of contributions: - an approach to training data annotation for supervised NER methods, based on the study of annotator suitability and suitability-based task allocation; - a method of automatically expanding existing gazetteers of pre-defined semantic categories exploiting the structure and knowledge of Wikipedia; - a method of automatically generating untyped gazetteers for NER based on the 'topic-representativeness' of words in documents; - a method of named entity disambiguation based on maximising the semantic relatedness between candidate entities in a text discourse; - a review of lexical semantic relatedness measures; and a new lexical semantic relatedness measure that harnesses knowledge from different resources. The proposed methods have been evaluated by carefully designed experiments, following the standard practice in each related research area. The results have confirmed the validity of their corresponding hypotheses, as well as the empirical effectiveness of these methods. Overall it is believed that this research has made a solid contribution to research on NER and related areas.
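One of the listed contributions is disambiguation by maximising semantic relatedness between candidate entities in a discourse. The sketch below illustrates that general idea with a brute-force search over candidate assignments; the candidate lists and relatedness scores are toy placeholders, and the thesis's own relatedness measure, which combines several knowledge resources, is not reproduced.

```python
# Sketch of collective named-entity disambiguation by maximising pairwise semantic
# relatedness between the chosen candidates. Candidate lists and the relatedness
# table are hypothetical toy values.
from itertools import product

candidates = {                      # mention -> possible referent entities
    "Java":   ["Java_(programming_language)", "Java_(island)"],
    "Oracle": ["Oracle_Corporation", "Oracle_(prophecy)"],
}

relatedness = {                     # symmetric toy scores in [0, 1]
    ("Java_(programming_language)", "Oracle_Corporation"): 0.9,
    ("Java_(island)", "Oracle_Corporation"): 0.1,
    ("Java_(programming_language)", "Oracle_(prophecy)"): 0.05,
    ("Java_(island)", "Oracle_(prophecy)"): 0.05,
}

def rel(a, b):
    return relatedness.get((a, b), relatedness.get((b, a), 0.0))

def disambiguate(candidates):
    """Pick the assignment of entities that maximises total pairwise relatedness."""
    mentions = list(candidates)
    best, best_score = None, float("-inf")
    for assignment in product(*(candidates[m] for m in mentions)):
        score = sum(rel(a, b)
                    for i, a in enumerate(assignment)
                    for b in assignment[i + 1:])
        if score > best_score:
            best, best_score = dict(zip(mentions, assignment)), score
    return best

print(disambiguate(candidates))     # -> programming language + corporation
```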
|
27 |
Data mining integrated architecture for shop floor control system. Srinivas. January 2007 (has links)
Organizations are becoming increasingly complex, with an emphasis on decentralized decision making. Recent advances in the field of information systems and networking have greatly changed the characteristics of the demands placed on the shop floor of an enterprise. It is no longer viewed only as a production centre but is also considered a nucleus of information and knowledge. This knowledge may concern the system's behaviour, limitations, capabilities, and so on. Therefore, the manufacturing system must have an information system that facilitates the generation, sharing and integration of knowledge for effective and efficient decision making. In a competitive environment, organizational knowledge is not perpetual but has a lifecycle; its value deteriorates with time due to changes in the competitive environment. Enterprises generate an avalanche of data and information that may be critical and valuable in nature but hard to manage and leverage properly. Effective decision making in a data-intensive environment is likely to determine future business activities and differentiate a company from its competitors. Knowledge generation, accumulation and the maintenance of knowledge bases are time-consuming processes but are essential to the development and successful application of a knowledge base system.
|
28 |
Perceptron-like large margin classifiers. Tsampouka, Petroula. January 2007 (has links)
We address the problem of binary linear classification with emphasis on algorithms that lead to separation of the data with large margins. We motivate large margin classification from statistical learning theory and review two broad categories of large margin classifiers, namely Support Vector Machines, which operate in a batch setting, and Perceptron-like algorithms, which operate in an incremental setting and are driven by their mistakes. We subsequently examine in detail the class of Perceptron-like large margin classifiers. The algorithms belonging to this category are further classified on the basis of criteria such as the type of the misclassification condition or the behaviour of the effective learning rate, i.e. the ratio of the learning rate to the length of the weight vector, as a function of the number of mistakes. Moreover, their convergence is examined, with the notion of stepwise convergence playing a prominent role in this investigation since it offers the possibility of a rather unified approach. Whenever possible, mistake bounds implying convergence in a finite number of steps are derived and discussed. Two novel families of approximate maximum margin algorithms called CRAMMA and MICRA are introduced and analysed theoretically. In addition, in order to deal with linearly inseparable data, a soft margin approach for Perceptron-like large margin classifiers is discussed. Finally, a series of experiments on artificial as well as real-world data employing the newly introduced algorithms is conducted, allowing a detailed comparative assessment of their performance with respect to other well-known Perceptron-like large margin classifiers and state-of-the-art Support Vector Machines.
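The abstract does not specify the CRAMMA or MICRA update rules, so the sketch below only illustrates the general class it describes: a Perceptron-like algorithm whose misclassification condition requires each example to clear a target margin, with the effective learning rate (learning rate divided by the weight-vector length) shrinking as training proceeds. The data and parameters are toy values.

```python
# Generic sketch of a Perceptron-like large margin update: an example counts as a
# mistake unless its functional margin exceeds beta * ||w||. The abstract's
# "effective learning rate" is eta / ||w||; with eta held fixed here it decays as the
# weight vector grows. This illustrates the general class only, not CRAMMA or MICRA.
import numpy as np

def margin_perceptron(X, y, beta=0.5, eta=1.0, epochs=100):
    """X: (n, d) array of inputs; y: labels in {-1, +1}; beta: target geometric margin."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            # Misclassification condition: margin not cleared (always true while w == 0).
            if label * np.dot(w, x) <= beta * np.linalg.norm(w):
                w += eta * label * x             # Perceptron-style additive update
                mistakes += 1
        if mistakes == 0:                        # every example clears the margin
            break
    return w

# Usage on a tiny linearly separable set.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w = margin_perceptron(X, y)
print(w, np.sign(X @ w))
```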
|
29 |
Object-oriented data mining. Rawles, Simon Alan. January 2007 (has links)
Attempts to overcome limitations in the attribute-value representation for machine learning have led to much interest in learning from structured data, concentrated in the research areas of inductive logic programming (ILP) and multi-relational data mining (MRDM). The expressiveness and encapsulation of the object-oriented data model have led to its widespread adoption in software and database design. The considerable congruence between this model and individual-centred models in inductive logic programming presents new opportunities for mining object data specific to its domain. This thesis investigates the use of object-orientation in knowledge representation for multi-relational data mining. We propose a language for expressing object-model metaknowledge and use it to extend the reasoning mechanisms of an object-oriented logic. A refinement operator is then defined and used for feature search in an object-oriented propositionalisation-based ILP classifier. An algorithm is proposed for reducing the large number of redundant features typical in propositionalisation. A data mining system based on the refinement operator is implemented, demonstrated on a real-world computational linguistics task and compared with a conventional ILP system. Keywords: object orientation; data mining; inductive logic programming; propositionalisation; refinement operators; feature reduction
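For illustration only, the sketch below shows propositionalisation in its simplest form: individual-centred, structured data is flattened into an attribute-value table by evaluating features on each individual. The toy objects and feature definitions are hypothetical and do not reproduce the thesis's object-oriented refinement operator or feature-reduction algorithm.

```python
# Sketch of propositionalisation: structured, individual-centred data is flattened
# into an attribute-value table by evaluating boolean or aggregate features on each
# individual. The toy molecules and feature definitions are hypothetical placeholders.

molecules = [   # each individual is an object with nested structure
    {"id": "m1", "atoms": [{"elem": "C", "charge": 0.1}, {"elem": "O", "charge": -0.4}]},
    {"id": "m2", "atoms": [{"elem": "C", "charge": 0.0}, {"elem": "C", "charge": 0.2}]},
]

features = {    # feature name -> function evaluated on one individual
    "has_oxygen": lambda m: any(a["elem"] == "O" for a in m["atoms"]),
    "n_carbons":  lambda m: sum(a["elem"] == "C" for a in m["atoms"]),
    "max_charge": lambda m: max(a["charge"] for a in m["atoms"]),
}

def propositionalise(individuals, features):
    """Return a flat table: one row per individual, one column per feature."""
    return [{"id": ind["id"], **{name: f(ind) for name, f in features.items()}}
            for ind in individuals]

for row in propositionalise(molecules, features):
    print(row)
```

The resulting flat table can then be handed to any conventional attribute-value learner, which is what makes redundant-feature reduction, as mentioned in the abstract, important when the feature search generates many overlapping features.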
|
30 |
Robust methods in data mining. Mwitondi, K. S. January 2003 (has links)
The thesis focuses on two problems in Data Mining, namely clustering, an exploratory technique for grouping similar observations, and classification, a technique used to assign new observations to one of the known groups. A thorough study of the two problems, which are also known in the Machine Learning literature as unsupervised and supervised classification respectively, is central to decision making in different fields, and the thesis seeks to contribute towards that end. In the first part of the thesis we consider whether robust methods can be applied to clustering; in particular, we perform clustering on fuzzy data using two methods originally developed for outlier detection. The fuzzy data clusters are characterised by two intersecting lines, such that points belonging to the same cluster lie close to the same line. This part of the thesis also investigates a new application of finite mixtures of normals to the fuzzy data problem. The second part of the thesis addresses issues relating to classification, in particular classification trees and boosting. The boosting algorithm is a relative newcomer to the classification portfolio that seeks to enhance the performance of classifiers by iteratively re-weighting the data according to their previous classification status. We explore the performance of "boosted" trees (mainly stumps) based on three different models, all characterised by a sine-wave boundary. We also carry out a thorough study of the factors that affect the boosting algorithm. Other results include a new look at the concept of randomness in the classification context, particularly because the form of randomness in both training and testing data directly affects the accuracy and reliability of domain-partitioning rules. Further, we provide statistical interpretations of some of the classification-related concepts originally used in Computer Science, Machine Learning and Artificial Intelligence. This is important since there exists a need for a unified interpretation of some of the "landmark" concepts in various disciplines, as a step towards establishing principles that can guide and strengthen practical applications.
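As an illustration of the iterative re-weighting the abstract refers to, the sketch below gives a generic textbook AdaBoost formulation with decision stumps; it is not the thesis's experimental setup, and the data shown is a toy example.

```python
# Compact AdaBoost sketch with decision stumps, illustrating iterative re-weighting
# of examples according to their previous classification status. Generic textbook
# formulation on hypothetical toy data.
import numpy as np

def best_stump(X, y, w):
    """Exhaustively pick the feature/threshold/polarity with lowest weighted error."""
    best = (None, None, None, np.inf)               # (feature, threshold, polarity, error)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - t) >= 0, 1, -1)
                err = np.sum(w[pred != y])
                if err < best[3]:
                    best = (j, t, pol, err)
    return best

def adaboost(X, y, rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                         # uniform initial weights
    ensemble = []
    for _ in range(rounds):
        j, t, pol, err = best_stump(X, y, w)
        err = max(err, 1e-12)                       # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)       # stump's vote weight
        pred = np.where(pol * (X[:, j] - t) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)              # up-weight misclassified examples
        w /= w.sum()
        ensemble.append((alpha, j, t, pol))
    return ensemble

def predict(ensemble, X):
    score = sum(a * np.where(p * (X[:, j] - t) >= 0, 1, -1) for a, j, t, p in ensemble)
    return np.sign(score)

# Usage on a tiny one-dimensional toy problem.
X = np.array([[0.1], [0.4], [0.35], [0.8], [0.9], [0.6]])
y = np.array([-1, -1, -1, 1, 1, 1])
model = adaboost(X, y, rounds=5)
print(predict(model, X))
```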
|