31 |
SQL Implementation of Value Reduction with Multiset Decision TablesChen, Chen 16 May 2014 (has links)
No description available.
|
32 |
Automating debugging through data mining / Automatisering av felsökning genom data miningThun, Julia, Kadouri, Rebin January 2017 (has links)
Contemporary technological systems generate massive quantities of log messages. These messages can be stored, searched and visualized efficiently using log management and analysis tools. The analysis of log messages offer insights into system behavior such as performance, server status and execution faults in web applications. iStone AB wants to explore the possibility to automate their debugging process. Since iStone does most parts of their debugging manually, it takes time to find errors within the system. The aim was therefore to find different solutions to reduce the time it takes to debug. An analysis of log messages within access – and console logs were made, so that the most appropriate data mining techniques for iStone’s system would be chosen. Data mining algorithms and log management and analysis tools were compared. The result of the comparisons showed that the ELK Stack as well as a mixture between Eclat and a hybrid algorithm (Eclat and Apriori) were the most appropriate choices. To demonstrate their feasibility, the ELK Stack and Eclat were implemented. The produced results show that data mining and the use of a platform for log analysis can facilitate and reduce the time it takes to debug. / Dagens system genererar stora mängder av loggmeddelanden. Dessa meddelanden kan effektivt lagras, sökas och visualiseras genom att använda sig av logghanteringsverktyg. Analys av loggmeddelanden ger insikt i systemets beteende såsom prestanda, serverstatus och exekveringsfel som kan uppkomma i webbapplikationer. iStone AB vill undersöka möjligheten att automatisera felsökning. Eftersom iStone till mestadels utför deras felsökning manuellt så tar det tid att hitta fel inom systemet. Syftet var att därför att finna olika lösningar som reducerar tiden det tar att felsöka. En analys av loggmeddelanden inom access – och konsolloggar utfördes för att välja de mest lämpade data mining tekniker för iStone’s system. Data mining algoritmer och logghanteringsverktyg jämfördes. Resultatet av jämförelserna visade att ELK Stacken samt en blandning av Eclat och en hybrid algoritm (Eclat och Apriori) var de lämpligaste valen. För att visa att så är fallet så implementerades ELK Stacken och Eclat. De framställda resultaten visar att data mining och användning av en plattform för logganalys kan underlätta och minska den tid det tar för att felsöka.
|
33 |
Improving RDF data with data miningAbedjan, Ziawasch January 2014 (has links)
Linked Open Data (LOD) comprises very many and often large public data sets and knowledge bases. Those datasets are mostly presented in the RDF triple structure of subject, predicate, and object, where each triple represents a statement or fact. Unfortunately, the heterogeneity of available open data requires significant integration steps before it can be used in applications. Meta information, such as ontological definitions and exact range definitions of predicates, are desirable and ideally provided by an ontology. However in the context of LOD, ontologies are often incomplete or simply not available. Thus, it is useful to automatically generate meta information, such as ontological dependencies, range definitions, and topical classifications.
Association rule mining, which was originally applied for sales analysis on transactional databases, is a promising and novel technique to explore such data. We designed an adaptation of this technique for min-ing Rdf data and introduce the concept of “mining configurations”, which allows us to mine RDF data sets in various ways. Different configurations enable us to identify schema and value dependencies that in combination result in interesting use cases. To this end, we present rule-based approaches for auto-completion, data enrichment, ontology improvement, and query relaxation. Auto-completion remedies the problem of inconsistent ontology usage, providing an editing user with a sorted list of commonly used predicates. A combination of different configurations step extends this approach to create completely new facts for a knowledge base. We present two approaches for fact generation, a user-based approach where a user selects the entity to be amended with new facts and a data-driven approach where an algorithm discovers entities that have to be amended with missing facts.
As knowledge bases constantly grow and evolve, another approach to improve the usage of RDF data is to improve existing ontologies. Here, we present an association rule based approach to reconcile ontology and data. Interlacing different mining configurations, we infer an algorithm to discover synonymously used predicates. Those predicates can be used to expand query results and to support users during query formulation.
We provide a wide range of experiments on real world datasets for each use case. The experiments and evaluations show the added value of association rule mining for the integration and usability of RDF data and confirm the appropriateness of our mining configuration methodology. / Linked Open Data (LOD) umfasst viele und oft sehr große öffentlichen Datensätze und Wissensbanken, die hauptsächlich in der RDF Triplestruktur bestehend aus Subjekt, Prädikat und Objekt vorkommen. Dabei repräsentiert jedes Triple einen Fakt. Unglücklicherweise erfordert die Heterogenität der verfügbaren öffentlichen Daten signifikante Integrationsschritte bevor die Daten in Anwendungen genutzt werden können. Meta-Daten wie ontologische Strukturen und Bereichsdefinitionen von Prädikaten sind zwar wünschenswert und idealerweise durch eine Wissensbank verfügbar. Jedoch sind Wissensbanken im Kontext von LOD oft unvollständig oder einfach nicht verfügbar. Deshalb ist es nützlich automatisch Meta-Informationen, wie ontologische Abhängigkeiten, Bereichs-und Domänendefinitionen und thematische Assoziationen von Ressourcen generieren zu können.
Eine neue und vielversprechende Technik um solche Daten zu untersuchen basiert auf das entdecken von Assoziationsregeln, welche ursprünglich für Verkaufsanalysen in transaktionalen Datenbanken angewendet wurde. Wir haben eine Adaptierung dieser Technik auf RDF Daten entworfen und stellen das Konzept der Mining Konfigurationen vor, welches uns befähigt in RDF Daten auf unterschiedlichen Weisen Muster zu erkennen. Verschiedene Konfigurationen erlauben uns Schema- und Wertbeziehungen zu erkennen, die für interessante Anwendungen genutzt werden können. In dem Sinne, stellen wir assoziationsbasierte Verfahren für eine Prädikatvorschlagsverfahren, Datenvervollständigung, Ontologieverbesserung und Anfrageerleichterung vor.
Das Vorschlagen von Prädikaten behandelt das Problem der inkonsistenten Verwendung von Ontologien, indem einem Benutzer, der einen neuen Fakt einem Rdf-Datensatz hinzufügen will, eine sortierte Liste von passenden Prädikaten vorgeschlagen wird. Eine Kombinierung von verschiedenen Konfigurationen erweitert dieses Verfahren sodass automatisch komplett neue Fakten für eine Wissensbank generiert werden. Hierbei stellen wir zwei Verfahren vor, einen nutzergesteuertenVerfahren, bei dem ein Nutzer die Entität aussucht die erweitert werden soll und einen datengesteuerten Ansatz, bei dem ein Algorithmus selbst die Entitäten aussucht, die mit fehlenden Fakten erweitert werden.
Da Wissensbanken stetig wachsen und sich verändern, ist ein anderer Ansatz um die Verwendung von RDF Daten zu erleichtern die Verbesserung von Ontologien. Hierbei präsentieren wir ein Assoziationsregeln-basiertes Verfahren, der Daten und zugrundeliegende Ontologien zusammenführt. Durch die Verflechtung von unterschiedlichen Konfigurationen leiten wir einen neuen Algorithmus her, der gleichbedeutende Prädikate entdeckt. Diese Prädikate können benutzt werden um Ergebnisse einer Anfrage zu erweitern oder einen Nutzer während einer Anfrage zu unterstützen.
Für jeden unserer vorgestellten Anwendungen präsentieren wir eine große Auswahl an Experimenten auf Realweltdatensätzen. Die Experimente und Evaluierungen zeigen den Mehrwert von Assoziationsregeln-Generierung für die Integration und Nutzbarkeit von RDF Daten und bestätigen die Angemessenheit unserer konfigurationsbasierten Methodologie um solche Regeln herzuleiten.
|
34 |
Association Rule Based ClassificationPalanisamy, Senthil Kumar 03 May 2006 (has links)
In this thesis, we focused on the construction of classification models based on association rules. Although association rules have been predominantly used for data exploration and description, the interest in using them for prediction has rapidly increased in the data mining community. In order to mine only rules that can be used for classification, we modified the well known association rule mining algorithm Apriori to handle user-defined input constraints. We considered constraints that require the presence/absence of particular items, or that limit the number of items, in the antecedents and/or the consequents of the rules. We developed a characterization of those itemsets that will potentially form rules that satisfy the given constraints. This characterization allows us to prune during itemset construction itemsets such that neither they nor any of their supersets will form valid rules. This improves the time performance of itemset construction. Using this characterization, we implemented a classification system based on association rules and compared the performance of several model construction methods, including CBA, and several model deployment modes to make predictions. Although the data mining community has dealt only with the classification of single-valued attributes, there are several domains in which the classification target is set-valued. Hence, we enhanced our classification system with a novel approach to handle the prediction of set-valued class attributes. Since the traditional classification accuracy measure is inappropriate in this context, we developed an evaluation method for set-valued classification based on the E-Measure. Furthermore, we enhanced our algorithm by not relying on the typical support/confidence framework, and instead mining for the best possible rules above a user-defined minimum confidence and within a desired range for the number of rules. This avoids long mining times that might produce large collections of rules with low predictive power. For this purpose, we developed a heuristic function to determine an initial minimum support and then adjusted it using a binary search strategy until a number of rules within the given range was obtained. We implemented all of our techniques described above in WEKA, an open source suite of machine learning algorithms. We used several datasets from the UCI Machine Learning Repository to test and evaluate our techniques.
|
35 |
Suporte a sistemas de auxílio ao diagnóstico e de recuperação de imagens por conteúdo usando mineração de regras de associação / Supporting Computer-Aided Diagnosis and Content-Based Image Retrieval Systems through Association Rule MiningRibeiro, Marcela Xavier 16 December 2008 (has links)
Neste trabalho, a mineração de regras de associação é utilizada para dar suporte a dois tipos de sistemas médicos: os sistemas de busca por conteúdo em imagens (Content-based Image Retrieval - CBIR) e os sistemas de auxílio ao diagnóstico (Computer Aided Diagnosis - CAD). Na busca por conteúdo, regras de associação são empregadas para reduzir a dimensionalidade dos vetores de características que representam as imagens e para diminuir o ``gap semântico\'\', que existe entre as características de baixo nível das imagens e seu significado semântico. O algoritmo StARMiner (Statistical Association Rule Miner) foi desenvolvido para associar características de baixo nível das imagens com o seu significado semântico, sendo também utilizado para realizar seleção de características em bases de imagens médicas, melhorando a precisão dos sistemas CBIR. Para dar suporte aos sistemas CAD, o método IDEA (Image Diagnosis Enhancement through Association rules) foi desenvolvido. Nesse método regras de associação são empregadas para sugerir uma segunda opinião ou diagnóstico preliminar de uma nova imagem para o radiologista. A segunda opinião automaticamente gerada pelo método pode acelerar o processo de diagnóstico de uma imagem ou reforçar uma hipótese, trazendo ao especialista médico um apoio estatístico da situação sendo analisada. Dois novos algoritmos foram propostos: um para pré-processar as características de baixo nível das imagens médicas e, o outro, para propor diagnósticos baseados em regras de associação. Vários experimentos foram realizados para validar os métodos desenvolvidos. Os experimentos realizados indicam que o uso de regras de associação pode contribuir para melhorar a busca por conteúdo e o diagnóstico de imagens médicas, consistindo numa poderosa ferramenta para descoberta de padrões em sistemas médicos / In this work we take advantage of association rule mining to support two types of medical systems: the Content-based Image Retrieval (CBIR) and the Computer-Aided Diagnosis (CAD) systems. For content-based retrieval, association rules are employed to reduce the dimensionality of the feature vectors that represent the images and to diminish the semantic gap that exists between low-level features and its high-level semantical meaning. The StARMiner (Statistical Association Rule Miner) algorithm was developed to associate low-level features with their semantical meaning. StARMiner is also employed to perform feature selection in medical image datasets, improving the precision of CBIR systems. To improve CAD systems, we developed the IDEA (Image Diagnosis Enhancement through Association rules) method. Association rules are employed to suggest a second opinion to the radiologist or a preliminary diagnosis of a new image. A second opinion automatically obtained can accelerate the process of diagnosing or strengthen a hypothesis, giving to the physician a statistical support to the decision making process. Two new algorithms are developed to support the IDEA method: to pre-process low-level features and to propose a diagnosis based on association rules. We performed several experiments to validate the developed methods. The results indicate that association rules can be successfully applied to improve CBIR and CAD systems, empowering the arsenal of techniques to support medical image analysis in medical systems
|
36 |
Rule Mining and Sequential Pattern Based Predictive Modeling with EMR DataAbar, Orhan 01 January 2019 (has links)
Electronic medical record (EMR) data is collected on a daily basis at hospitals and other healthcare facilities to track patients’ health situations including conditions, treatments (medications, procedures), diagnostics (labs) and associated healthcare operations. Besides being useful for individual patient care and hospital operations (e.g., billing, triaging), EMRs can also be exploited for secondary data analyses to glean discriminative patterns that hold across patient cohorts for different phenotypes. These patterns in turn can yield high level insights into disease progression with interventional potential. In this dissertation, using a large scale realistic EMR dataset of over one million patients visiting University of Kentucky healthcare facilities, we explore data mining and machine learning methods for association rule (AR) mining and predictive modeling with mood and anxiety disorders as use-cases. Our first work involves analysis of existing quantitative measures of rule interestingness to assess how they align with a practicing psychiatrist’s sense of novelty/surprise corresponding to ARs identified from EMRs. Our second effort involves mining causal ARs with depression and anxiety disorders as target conditions through matching methods accounting for computationally identified confounding attributes. Our final effort involves efficient implementation (via GPUs) and application of contrast pattern mining to predictive modeling for mental conditions using various representational methods and recurrent neural networks. Overall, we demonstrate the effectiveness of rule mining methods in secondary analyses of EMR data for identifying causal associations and building predictive models for diseases.
|
37 |
Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical ApplicationsHe, Yuanchen 04 December 2006 (has links)
Due to complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are highly needed to help humans to understand the inherent mechanism of diseases. For biomedical classification problems, typically it is impossible to build a perfect classifier with 100% prediction accuracy. Hence a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive to state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support on disease diagnoses due to their easy interpretability. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from huge microarray gene expression data. With fuzzy granulation, information loss in the process of gene selection is decreased. As a result, more informative genes for cancer classification are selected and more accurate classifiers can be modeled. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification. And hence we expect that genes being selected can be more helpful for further biological studies.
|
38 |
A data mining approach to ontology learning for automatic content-related question-answering in MOOCsShatnawi, Safwan January 2016 (has links)
The advent of Massive Open Online Courses (MOOCs) allows massive volume of registrants to enrol in these MOOCs. This research aims to offer MOOCs registrants with automatic content related feedback to fulfil their cognitive needs. A framework is proposed which consists of three modules which are the subject ontology learning module, the short text classification module, and the question answering module. Unlike previous research, to identify relevant concepts for ontology learning a regular expression parser approach is used. Also, the relevant concepts are extracted from unstructured documents. To build the concept hierarchy, a frequent pattern mining approach is used which is guided by a heuristic function to ensure that sibling concepts are at the same level in the hierarchy. As this process does not require specific lexical or syntactic information, it can be applied to any subject. To validate the approach, the resulting ontology is used in a question-answering system which analyses students' content-related questions and generates answers for them. Textbook end of chapter questions/answers are used to validate the question-answering system. The resulting ontology is compared vs. the use of Text2Onto for the question-answering system, and it achieved favourable results. Finally, different indexing approaches based on a subject's ontology are investigated when classifying short text in MOOCs forum discussion data; the investigated indexing approaches are: unigram-based, concept-based and hierarchical concept indexing. The experimental results show that the ontology-based feature indexing approaches outperform the unigram-based indexing approach. Experiments are done in binary classification and multiple labels classification settings . The results are consistent and show that hierarchical concept indexing outperforms both concept-based and unigram-based indexing. The BAGGING and random forests classifiers achieved the best result among the tested classifiers.
|
39 |
Elicitation of Protein-Protein Interactions from Biomedical Literature Using Association Rule DiscoverySamuel, Jarvie John 08 1900 (has links)
Extracting information from a stack of data is a tedious task and the scenario is no different in proteomics. Volumes of research papers are published about study of various proteins in several species, their interactions with other proteins and identification of protein(s) as possible biomarker in causing diseases. It is a challenging task for biologists to keep track of these developments manually by reading through the literatures. Several tools have been developed by computer linguists to assist identification, extraction and hypotheses generation of proteins and protein-protein interactions from biomedical publications and protein databases. However, they are confronted with the challenges of term variation, term ambiguity, access only to abstracts and inconsistencies in time-consuming manual curation of protein and protein-protein interaction repositories. This work attempts to attenuate the challenges by extracting protein-protein interactions in humans and elicit possible interactions using associative rule mining on full text, abstracts and captions from figures available from publicly available biomedical literature databases. Two such databases are used in our study: Directory of Open Access Journals (DOAJ) and PubMed Central (PMC). A corpus is built using articles based on search terms. A dataset of more than 38,000 protein-protein interactions from the Human Protein Reference Database (HPRD) is cross-referenced to validate discovered interactive pairs. A set of an optimal size of possible binary protein-protein interactions is generated to be made available for clinician or biological validation. A significant change in the number of new associations was found by altering the thresholds for support and confidence metrics. This study narrows down the limitations for biologists in keeping pace with discovery of protein-protein interactions via manually reading the literature and their needs to validate each and every possible interaction.
|
40 |
Suporte a sistemas de auxílio ao diagnóstico e de recuperação de imagens por conteúdo usando mineração de regras de associação / Supporting Computer-Aided Diagnosis and Content-Based Image Retrieval Systems through Association Rule MiningMarcela Xavier Ribeiro 16 December 2008 (has links)
Neste trabalho, a mineração de regras de associação é utilizada para dar suporte a dois tipos de sistemas médicos: os sistemas de busca por conteúdo em imagens (Content-based Image Retrieval - CBIR) e os sistemas de auxílio ao diagnóstico (Computer Aided Diagnosis - CAD). Na busca por conteúdo, regras de associação são empregadas para reduzir a dimensionalidade dos vetores de características que representam as imagens e para diminuir o ``gap semântico\'\', que existe entre as características de baixo nível das imagens e seu significado semântico. O algoritmo StARMiner (Statistical Association Rule Miner) foi desenvolvido para associar características de baixo nível das imagens com o seu significado semântico, sendo também utilizado para realizar seleção de características em bases de imagens médicas, melhorando a precisão dos sistemas CBIR. Para dar suporte aos sistemas CAD, o método IDEA (Image Diagnosis Enhancement through Association rules) foi desenvolvido. Nesse método regras de associação são empregadas para sugerir uma segunda opinião ou diagnóstico preliminar de uma nova imagem para o radiologista. A segunda opinião automaticamente gerada pelo método pode acelerar o processo de diagnóstico de uma imagem ou reforçar uma hipótese, trazendo ao especialista médico um apoio estatístico da situação sendo analisada. Dois novos algoritmos foram propostos: um para pré-processar as características de baixo nível das imagens médicas e, o outro, para propor diagnósticos baseados em regras de associação. Vários experimentos foram realizados para validar os métodos desenvolvidos. Os experimentos realizados indicam que o uso de regras de associação pode contribuir para melhorar a busca por conteúdo e o diagnóstico de imagens médicas, consistindo numa poderosa ferramenta para descoberta de padrões em sistemas médicos / In this work we take advantage of association rule mining to support two types of medical systems: the Content-based Image Retrieval (CBIR) and the Computer-Aided Diagnosis (CAD) systems. For content-based retrieval, association rules are employed to reduce the dimensionality of the feature vectors that represent the images and to diminish the semantic gap that exists between low-level features and its high-level semantical meaning. The StARMiner (Statistical Association Rule Miner) algorithm was developed to associate low-level features with their semantical meaning. StARMiner is also employed to perform feature selection in medical image datasets, improving the precision of CBIR systems. To improve CAD systems, we developed the IDEA (Image Diagnosis Enhancement through Association rules) method. Association rules are employed to suggest a second opinion to the radiologist or a preliminary diagnosis of a new image. A second opinion automatically obtained can accelerate the process of diagnosing or strengthen a hypothesis, giving to the physician a statistical support to the decision making process. Two new algorithms are developed to support the IDEA method: to pre-process low-level features and to propose a diagnosis based on association rules. We performed several experiments to validate the developed methods. The results indicate that association rules can be successfully applied to improve CBIR and CAD systems, empowering the arsenal of techniques to support medical image analysis in medical systems
|
Page generated in 0.1306 seconds