81

A Comprehensive Approach to Posterior Jointness Analysis in Bayesian Model Averaging Applications

Crespo Cuaresma, Jesus, Grün, Bettina, Hofmarcher, Paul, Humer, Stefan, Moser, Mathias 03 1900 (has links) (PDF)
Posterior analysis in Bayesian model averaging (BMA) applications often includes the assessment of measures of jointness (joint inclusion) across covariates. We link the discussion of jointness measures in the econometric literature to the literature on association rules in data mining exercises. We analyze a group of alternative jointness measures that include those proposed in the BMA literature and several others put forward in the field of data mining. The way these measures address the joint exclusion of covariates appears particularly important in terms of the conclusions that can be drawn from them. Using a dataset of economic growth determinants, we assess how the measurement of jointness in BMA can affect inference about the structure of bivariate inclusion patterns across covariates. (authors' abstract) / Series: Department of Economics Working Paper Series
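As a rough illustration of how an association-rule measure can be read as a jointness measure, the sketch below computes lift for the joint inclusion of two covariates from a set of models and their posterior probabilities. The function name, the toy models and the weights are assumptions made for illustration; lift is used only as one well-known data mining measure and is not claimed to be the measure the authors recommend.

```python
def inclusion_lift(models, weights, a, b):
    """Lift of the joint inclusion of covariates a and b.

    models  -- list of sets, each holding the covariates included in one model
    weights -- posterior model probabilities (assumed to sum to 1)
    Assumes both covariates have non-zero posterior inclusion probability.
    """
    pairs = list(zip(models, weights))
    p_a = sum(w for m, w in pairs if a in m)              # P(a included)
    p_b = sum(w for m, w in pairs if b in m)              # P(b included)
    p_ab = sum(w for m, w in pairs if a in m and b in m)  # P(both included)
    return p_ab / (p_a * p_b)


# Hypothetical model space with two covariates, x1 and x2.
models = [{"x1", "x2"}, {"x1"}, {"x2"}, set()]
weights = [0.4, 0.1, 0.1, 0.4]

lift = inclusion_lift(models, weights, "x1", "x2")
print(f"lift(x1, x2) = {lift:.2f}")
```

A lift above 1 suggests the two covariates appear together more often than independence would imply. Lift uses only the marginal and joint inclusion probabilities, so it treats the models that exclude both covariates differently from measures that count joint exclusion explicitly.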
82

Selecionando candidatos a descritores para agrupamentos hierárquicos de documentos utilizando regras de associação / Selecting candidate labels for hierarchical document clusters using association rules

Santos, Fabiano Fernandes dos 17 September 2010 (has links)
One way to extract and organize knowledge, which has received much attention in recent years, is to create a structural representation divided into hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters, since most algorithms do not produce simple conceptual descriptions and interpreting the clusters is a difficult task for users. Methods in the literature treat each document as a bag-of-words and do not explicitly explore the relationships between the terms of the documents in a cluster. However, these relationships can provide important information for deciding which terms should be chosen as descriptors of the nodes, and they could be represented by association rules. This work aims to evaluate the use of association rules to support the identification of labels for hierarchical document clusters. To this end, it presents the SeCLAR (Selecting Candidate Labels using Association Rules) method, which explores the use of association rules to select good candidate labels for hierarchical clusters of documents. The method generates association rules from transactions built from each document in the collection, and uses the relationships between the nodes of the hierarchical clustering to select candidate labels. The experimental results indicate that a significant improvement in precision and recall over traditional methods is possible.
83

Contribution à l'extraction des règles d'association basée sur des préférences / Contribution to the extraction of association rules based on preferences

Bouker, Slim 30 June 2015 (has links)
Abstract unavailable.
84

Praktické uplatnění technologií data mining ve zdravotních pojišťovnách / Practical applications of data mining technologies in health insurance companies

Kulhavý, Lukáš January 2010 (has links)
This thesis focuses on data mining technology and its possible practical use in health insurance companies. It defines the term data mining and its relation to knowledge discovery in databases, explaining it, among other things, through methodologies that describe the individual phases of the knowledge discovery process (CRISP-DM, SEMMA). It also covers possible practical applications and the technologies and products available on the market, both free and commercial. An introduction to the main data mining methods and specific algorithms (decision trees, association rules, neural networks and other methods) serves as the theoretical basis on which practical applications to real data from real health insurance companies are built. These applications seek the causes of increased remittances and predict churn. They were solved in the freely available systems Weka and LISP-Miner. The objective is to introduce and demonstrate data mining capabilities on this type of data, and to demonstrate the capabilities of Weka and LISP-Miner in solving tasks according to the CRISP-DM methodology. The last part of the thesis is devoted to cloud and grid computing in conjunction with data mining. It offers an insight into the possibilities of these technologies and their benefits for data mining. The possibilities of cloud computing are presented on the Amazon EC2 system; grid computing can be used through the Weka Experimenter interface.
85

Visualização como suporte à extração e exploração de regras de associação / Visualization as support to the extraction and exploration of association rules

Claudio Haruo Yamamoto 17 April 2009 (has links)
Since the definition of the association rule mining problem, many efficient algorithms have been introduced to deal with it. However, the problem still presents many practical difficulties to miners, such as determining suitable minimum support and minimum confidence thresholds, manipulating large rule sets, and comprehending rules (especially those containing many items). To deal with these problems, researchers have investigated interactive techniques, summarization (of rule sets) and visual representations. Nonetheless, no approach has been introduced in which users can understand and control the process by interacting with the analytical algorithm during its execution. We introduce an interactive approach to extracting and exploring association rules that inserts the user into the process through: interactive execution of Apriori; interactive selection of frequent itemsets; itemset-based extraction of rules guided by clusters of similar itemsets; and pairwise exploration of rules. To validate the approach, several studies were conducted, supported by the I2E System, aiming at: comparing the interactive approach, under several aspects, with a conventional approach to obtaining association rules; evaluating the effect of varying some process parameters on the final results; and illustrating its application in real situations with real users.
Results of these studies indicate that the approach is adequate, both in exploratory scenarios and in scenarios in which there is initial guidance for the process, for certain association rule extraction tasks, because it: provides resources that avoid complete algorithm executions before results are analyzed; generates more compact rule sets for exploration; preserves rule diversity; favors the reformulation of tasks or the formulation of new ones; and provides means for visual comparison of rules, enhancing the miner's analysis capability.
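To make the role of the minimum support and minimum confidence thresholds concrete, here is a minimal, non-interactive sketch of level-wise Apriori followed by rule generation. It is a plain textbook version written for illustration, not the interactive procedure of the thesis; the function names and the toy transactions are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def support(itemset):
        # Fraction of transactions that contain the whole itemset.
        return sum(itemset <= t for t in tx) / n

    level = {frozenset([item]) for t in tx for item in t}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that reach the support threshold.
        current = {c: s for c in level if (s := support(c)) >= min_support}
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                confidence = supp / frequent[lhs]  # supp(X u Y) / supp(X)
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, confidence))
    return rules


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=0.5)
rules = generate_rules(frequent, min_confidence=0.9)
for lhs, rhs, supp, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={supp:.2f} confidence={conf:.2f}")
```

Lowering either threshold quickly enlarges the rule set, which is the practical difficulty the abstract describes: the miner usually has to rerun the whole algorithm to see the effect of a different setting.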
86

Regras de associação aplicadas aos filtros de mensagens e canais de informação do projeto direto / Association rules applied to messages filters and information channel in the direto environment

Frighetto, Michele January 2003 (has links)
This work presents a brief study of the process of knowledge discovery in databases, focusing on the data mining stage through association rules. Proposed by Agrawal in 1993 in a study known as market basket analysis, an association rule states that, with a certain degree of support and confidence, one set of items may be present in a transaction given that another set is present. The need for analyses similar to Agrawal's arose in other fields, and association rules were extended to other applications. The main variations on association rules found in the literature are presented, along with the main algorithms.
This work proposes mining association rules over the message filters and information channels of Direto, a catalog, schedule and e-mail software package. Three data mining tools were used: Intelligent Miner, CBA and Magnus Opus. They were applied to a Java language discussion list because the Direto project does not yet have public messages. Each tool has distinct features: Intelligent Miner allows hierarchies to be defined over the data to be mined; Magnus Opus works with many filters and allows ranges to be defined for handling numeric fields; and CBA allows multiple minimum supports to be specified for the items.
87

Redes de regras de associação filtradas e multialvo / Filtered and multi-target association rules networks

Calçada, Dario Brito 21 March 2019 (has links)
The discovery of Association Rules is a data mining task that seeks to identify patterns in datasets, making it possible, after their interpretation, to identify specific knowledge about the problem under analysis. Association Rules Mining can be used as a methodology for discovering hypotheses or candidate theories in a knowledge domain. However, the mining process generates a large number of rules, exceeding the user's capacity to explore them. This can make the analysis infeasible and can negatively affect the outcome of some knowledge extraction algorithms. Several approaches have therefore been proposed to guide the user in exploring the discovered Association Rules, especially through Network structures, which make it possible to analyze the relations between the rules. In this context, this work was motivated by the potential use of Networks to optimize knowledge identification in Association Rules Mining processes by formulating explainable approaches. Another motivation arises from the gap regarding the use of Networks in multi-target tasks, which are inherent to several real-world applications.
The work aimed to advance research on Association Rules Mining with Networks with respect to methods for generating verifiable hypotheses with one or two target items, in terms of both the interpretability and the expressiveness of the representations built. A Systematic Mapping of the literature was carried out to establish the state of the art on how Networks can assist Association Rules Mining processes. A method is proposed and developed for selecting and evaluating the minimum support and confidence measures used in extracting Association Rules, based on Network Centrality Measures; its main contribution is an objective criterion for the extraction of Association Rules. Two new networks were also proposed, developed and validated: the Filtered Association Rules Networks (Filtered-ARNs) and the Multi-Target Association Rules Networks (MTARNs). These had a positive impact on knowledge identification through mathematical proof of the influence between the elements of an Association Rule, and they extended the capacity for knowledge extraction in studies of multi-target applications.
88

如何在資料庫中發掘空間性週期關聯規則--以便利商店交易資料為例 / Data Mining of Spatial Cyclic Association Rules in Databases -- A Convenience Store Transaction Data Example

郭家佑, Guo, Jia-You Unknown Date (has links)
Much research has addressed data mining in traditional relational databases. By further integrating spatial and temporal dimensions, more specific and concrete knowledge can be mined from transaction data. Statistical analysis has so far been a common technique for analyzing spatial data, but it still has many unresolved problems. Han et al. used concept hierarchies to mine multiple-level association rules, a mature technique worth learning from. On the temporal side, other scholars have proposed the notion of cyclic association rules. This research combines the merits of these lines of work in the hope of creating new applications. It integrates spatial characteristics with cyclic association rules and proposes the idea of spatial cyclic association rules. We first survey the current state of research in spatial and temporal data mining and, by integrating the related work, propose a research framework. Using dynamic web page technology and a hypothetical transaction database of Taipei convenience stores, we then develop a prototype system to verify the feasibility of the framework; at present it handles 1-itemsets and 2-itemsets only. Finally, we offer suggestions for further research.
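The notion of a cyclic rule evaluated per location can be sketched as follows. The sketch assumes the usual reading of a cycle as a (length, offset) pair, meaning the rule holds in every time unit i with i % length == offset, and it assumes that whether the rule holds in each time unit has already been decided for each district. The district labels and data are hypothetical and do not come from the thesis.

```python
def find_cycles(holds, max_length):
    """Return all (length, offset) pairs for which the rule holds in every
    time unit i satisfying i % length == offset."""
    cycles = []
    for length in range(1, max_length + 1):
        for offset in range(length):
            units = holds[offset::length]  # the time units on this cycle
            if units and all(units):
                cycles.append((length, offset))
    return cycles


# Hypothetical per-district sequences: True means the rule met the support
# and confidence thresholds in that time unit.
rule_holds_by_district = {
    "district_a": [True, False, True, False, True, False],
    "district_b": [True, True, True, True, True, True],
}

cycles_by_district = {
    district: find_cycles(holds, max_length=3)
    for district, holds in rule_holds_by_district.items()
}
for district, cycles in cycles_by_district.items():
    print(district, cycles)
```

Keeping one sequence per district makes it easy to see that the same rule can be cyclic in one area and hold continuously in another.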
89

Association Pattern Analysis for Pattern Pruning, Clustering and Summarization

Li, Chung Lam 12 September 2008 (has links)
Automatic pattern mining from databases and the analysis of the discovered patterns for useful information are important and in great demand in science, engineering and business. Effective pattern mining methods, such as association rule mining and pattern discovery, have been developed and are widely used in challenging industrial and business applications. These methods attempt to uncover the valuable information trapped in large collections of raw data, and the patterns they reveal provide significant and useful information for decision makers. Paradoxically, pattern mining itself can produce such huge amounts of data that it poses a new knowledge management problem: how to handle the thousands or more patterns discovered in a data set. Unlike raw data, patterns often overlap, entangle and interrelate with each other. The relationships among them are usually complex, and the notion of distance between them is difficult to qualify and quantify. Such phenomena pose great challenges to the existing data mining discipline. In this thesis, the analysis of patterns after their discovery by existing pattern mining methods is referred to as pattern post-analysis. Because the volume of discovered patterns is overwhelming, it is virtually impossible for a human user to analyze them manually; the valuable information trapped in the data is merely shifted to a large collection of patterns. Hence, pattern post-analysis that automatically analyzes the discovered patterns and presents the results in a user-friendly manner is badly needed. This thesis attempts to solve the problems listed below.
It addresses 1) the important factors contributing to the interrelationships among patterns, and hence more accurate measurements of the distances between them; 2) the objective pruning of redundant patterns from the discovered patterns; 3) the objective clustering of the patterns into coherent pattern clusters for better organization; 4) the automatic summarization of each pattern cluster for human interpretation; and 5) the application of pattern post-analysis to large database analysis and data mining. The thesis presents the conceptualization, theoretical formulation, algorithm design and system development of pattern post-analysis for categorical or discrete-valued data. It starts by presenting a natural dual relationship between patterns and data. This relationship furnishes an explicit one-to-one correspondence between a pattern and its associated data and provides a basis for analyzing patterns effectively by relating them back to the data. The thesis then discusses the important factors that differentiate patterns and formulates the notion of distance among patterns using a formal graphical approach. To measure the distances between patterns and their associated data accurately, both the samples and the attributes matched by the patterns are considered. To achieve this, the distance measure between patterns has to account for the differences between their associated data clusters at the attribute value (i.e., item) level. Furthermore, to capture the degree of variation of the items matched by patterns, entropy-based distance measures are developed; they attempt to quantify the uncertainty of the matched items. Such distances give an accurate and robust measurement of the distance between patterns and their associated data. To understand the properties and behavior of the new distance measures, the mathematical relation between the new distances and the existing sample-matching distances is derived analytically.
The new pattern distances, based on the dual pattern-data relationship and its related concepts, are used and adapted for pattern pruning, pattern clustering and pattern summarization to furnish an integrated, flexible and generic framework for pattern post-analysis that can meet the challenges of today’s complex real-world problems. In pattern pruning, the system defines the amount of redundancy of a pattern with respect to another pattern at the item level. This definition generalizes classical closed itemset pruning and maximal itemset pruning, which define redundancy at the sample level. A new generalized itemset pruning method is developed using the new definition. It includes the closed and maximal itemsets as two extreme special cases and provides a control parameter that lets the user adjust the tradeoff between the number of patterns pruned and the amount of information lost after pruning. The mathematical relation between the proposed generalized itemsets and the existing closed and maximal itemsets is also given. In pattern clustering, a dual clustering method, known as simultaneous pattern and data clustering, is developed using two common yet very different types of clustering algorithm: hierarchical clustering and k-means clustering. Hierarchical clustering generates the entire clustering hierarchy but is slow and not scalable. K-means clustering produces only a partition, so it is fast and scalable. Between them, the two algorithms cover most real-world requirements for speed and clustering quality. The new clustering method can cluster patterns and their associated data simultaneously while maintaining an explicit pattern-data relationship. This relationship enables subsequent analysis of individual pattern clusters through their associated data clusters. One important analysis of a pattern cluster is pattern summarization.
In pattern summarization, a subset of representative patterns is selected to summarize each pattern cluster. Again, the system measures how representative a pattern is at the item level and takes into account how the patterns overlap each other. The proposed method, called AreaCover, is extended from the well-known RuleCover algorithm, and the relationship between the two methods is given. AreaCover is less prone to yielding large, trivial patterns (large patterns may make a summary too general and not informative enough), and the resulting summary is more concise (with fewer duplicated attribute values among summary patterns) and more informative (describing more attribute values in the cluster and having longer summary patterns). The thesis also covers the implementation of the major ideas of the pattern post-analysis framework in an integrated software system. It ends with a discussion of the experimental results of pattern post-analysis on both synthetic and real-world benchmark data. Compared with existing systems, the methodology presented in this thesis stands out, possessing significant and superior characteristics in pattern post-analysis and decision support.
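The two classical pruning criteria that the abstract names as extreme special cases can be sketched in a few lines. The sketch below covers only the textbook definitions of closed and maximal itemsets, computed from a dictionary of frequent itemsets and their support counts; it does not reproduce the generalized, item-level pruning method of the thesis, and the function name and toy data are assumptions.

```python
def closed_and_maximal(frequent):
    """Split frequent itemsets into closed and maximal ones.

    frequent -- dict mapping frozenset -> support count, assumed to contain
                every frequent itemset for the chosen threshold
    closed   -- no proper superset has the same support
    maximal  -- no proper superset is frequent at all
    """
    closed, maximal = set(), set()
    for itemset, count in frequent.items():
        supersets = [other for other in frequent if itemset < other]
        if not supersets:
            maximal.add(itemset)
        if all(frequent[other] < count for other in supersets):
            closed.add(itemset)
    return closed, maximal


# Frequent itemsets (minimum count 2) of the hypothetical transactions
# {a, b}, {a, b, c}, {a, b, c}.
frequent = {
    frozenset("a"): 3,
    frozenset("b"): 3,
    frozenset("c"): 2,
    frozenset("ab"): 3,
    frozenset("ac"): 2,
    frozenset("bc"): 2,
    frozenset("abc"): 2,
}
closed, maximal = closed_and_maximal(frequent)
print("closed: ", sorted("".join(sorted(s)) for s in closed))
print("maximal:", sorted("".join(sorted(s)) for s in maximal))
```

Closed pruning loses no support information, while maximal pruning keeps the fewest patterns but discards the supports of their subsets; a tunable method between these two ends trades pattern count against information loss.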
