Global ETD Search

21	A Data Mining Approach to New Library-Book Recommendations Lai, Yu-Ting 11 August 2003 In this thesis, we propose a data mining approach to recommending new library books that have never been rated or borrowed by users. In our problem context, users are characterized by their demographic attributes, and concept hierarchies can be defined for some of these attributes. Books are assigned to the base categories of a taxonomy. The proposed approach first identifies the types of users who are interested in specific types of books; we call such knowledge generalized profile association rules. Less interesting or redundant generalized profile association rules are then pruned to form a concise rule set, which is used to promote new books. We develop a new definition of rule interestingness for book recommendation, propose an approximation scheme for estimating the interestingness of a rule, and construct a scheme that uses the interesting rules to recommend new books effectively. Finally, we evaluate the performance of the proposed approach using book circulation data from a university library. Keywords: data mining; generalized profile association rules; concept hierarchies
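The rule-then-recommend flow described in this abstract can be illustrated with a small sketch. Its assumptions are not taken from the thesis. It uses a hypothetical two-level concept hierarchy (department → college) on a single demographic attribute and treats each loan record as a transaction. Plain support and confidence thresholds stand in for the thesis' own interestingness definition and approximation scheme.

```python
from collections import Counter

# Hypothetical concept hierarchy for one demographic attribute: department -> college.
DEPT_TO_COLLEGE = {
    "CS": "Engineering",
    "EE": "Engineering",
    "History": "Humanities",
    "Literature": "Humanities",
}


def generalize(profile):
    """Return the profile's attribute values at the base level and one level up."""
    dept = profile["dept"]
    return [("dept", dept), ("college", DEPT_TO_COLLEGE[dept])]


def mine_profile_rules(loans, users, min_support, min_confidence):
    """Mine rules (attribute, value) -> book category from (user_id, category) loans."""
    n = len(loans)
    lhs_counts = Counter()
    pair_counts = Counter()
    for user_id, category in loans:
        for attr in generalize(users[user_id]):
            lhs_counts[attr] += 1
            pair_counts[(attr, category)] += 1
    rules = []
    for (attr, category), count in pair_counts.items():
        support = count / n
        confidence = count / lhs_counts[attr]
        if support >= min_support and confidence >= min_confidence:
            rules.append((attr, category, support, confidence))
    # Highest confidence first; ties broken deterministically by attribute and category.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


def recommend(new_book_category, rules, users):
    """Users whose generalized profile matches a rule predicting the new book's category."""
    wanted = {attr for attr, category, _, _ in rules if category == new_book_category}
    return sorted(uid for uid, p in users.items() if wanted & set(generalize(p)))


users = {
    "u1": {"dept": "CS"},
    "u2": {"dept": "EE"},
    "u3": {"dept": "History"},
    "u4": {"dept": "Literature"},
    "u5": {"dept": "EE"},  # has never borrowed anything
}
loans = [("u1", "Databases"), ("u2", "Databases"), ("u3", "Medieval"),
         ("u4", "Medieval"), ("u1", "Databases")]
rules = mine_profile_rules(loans, users, min_support=0.3, min_confidence=0.8)
print(recommend("Databases", rules, users))  # ['u1', 'u2', 'u5']
```

The department-level rule for EE lacks the support to survive on its own (0.2). The college-level rule Engineering → Databases still reaches u5, who has no borrowing history. This is the kind of coverage that concept hierarchies are meant to provide.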
22	An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams Peng, Wei-hau 25 June 2009 Online mining of association rules over data streams is an important issue in data mining. An association rule states that the presence of some items in a transaction implies the presence of other items in the same transaction. Association rules over data streams have many applications, such as market analysis, network security, sensor networks, and web tracking. Mining closed frequent itemsets extends association rule mining: it aims to find a subset of the frequent itemsets from which all frequent itemsets can be derived. Formally, a closed frequent itemset is a frequent itemset that has no proper superset with the same support. Because data streams are continuous, high-speed, and unbounded, archiving the entire stream is impossible; the data can be scanned only once, and mining must be done in main memory. Therefore, algorithms designed to mine closed frequent itemsets in traditional databases are not suitable for data streams. Moreover, many applications are interested only in the most recent data. The sliding window model captures this by keeping only the transactions inside a window of fixed size. One well-known algorithm for mining closed frequent itemsets under the sliding window model is NewMoment. However, NewMoment cannot mine closed frequent itemsets in data streams efficiently, because it generates many unclosed frequent itemsets in addition to the closed ones. Moreover, when the data in the sliding window are incrementally updated, NewMoment needs to reconstruct its whole tree structure.
Therefore, in this thesis, we propose a sliding window approach, the Subset-Lattice algorithm, which embeds the subset property into a lattice structure to mine closed frequent itemsets efficiently. When data items are inserted, our algorithm considers five set relations: (1) equivalence, (2) superset, (3) subset, (4) intersection, and (5) empty intersection. Using these five relations, we identify closed frequent itemsets without generating unclosed frequent itemsets. Moreover, when the data in the sliding window are incrementally updated, the Subset-Lattice algorithm does not reconstruct the whole lattice structure, which makes it more efficient than the Moment algorithm. Furthermore, we represent itemsets as bit patterns and use bit operations to speed up set checking. Our simulation results show that the Subset-Lattice algorithm needs less memory and less processing time than the NewMoment algorithm; when the window slides, execution time is reduced by up to 50%. Keywords: frequent itemset; closed frequent itemset; Data Streams; Association Rules
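The following brute-force sketch makes the definition of a closed frequent itemset and the five set relations concrete. It encodes itemsets as integer bit patterns, as the abstract describes, and keeps a fixed-size sliding window with `collections.deque`. Unlike the Subset-Lattice algorithm, it recomputes everything from the current window instead of maintaining a lattice incrementally, so it only illustrates the definitions. The item-to-bit index and the window size are arbitrary assumptions.

```python
from collections import deque
from itertools import combinations


def to_bits(itemset, index):
    """Encode an itemset as an integer bit pattern (bit i set <=> item with index i present)."""
    mask = 0
    for item in itemset:
        mask |= 1 << index[item]
    return mask


def relation(a, b):
    """Classify bit-pattern itemset a against b into one of five set relations."""
    common = a & b
    if a == b:
        return "equivalent"
    if common == b:
        return "superset"      # a contains every item of b
    if common == a:
        return "subset"        # every item of a is in b
    if common:
        return "intersection"  # they share some items, neither contains the other
    return "empty"             # no items in common


def closed_frequent(window, min_count):
    """Brute-force closed frequent itemsets over the bit-pattern transactions in a window."""
    support = {}
    for t in window:
        bits = [i for i in range(t.bit_length()) if (t >> i) & 1]
        for k in range(1, len(bits) + 1):
            for combo in combinations(bits, k):
                mask = sum(1 << i for i in combo)
                support[mask] = support.get(mask, 0) + 1
    frequent = {m: c for m, c in support.items() if c >= min_count}
    # Closed: no frequent proper superset has the same support count.
    return {
        m: c for m, c in frequent.items()
        if not any(relation(o, m) == "superset" and frequent[o] == c for o in frequent)
    }


index = {"a": 0, "b": 1, "c": 2}
window = deque(maxlen=3)  # sliding window of the 3 most recent transactions
for t in [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"a", "b"}]:
    window.append(to_bits(t, index))  # the oldest transaction falls out automatically
closed = closed_frequent(window, min_count=2)
print({bin(m): c for m, c in closed.items()})  # {'0b1': 3, '0b11': 2, '0b101': 2}
```

The final window is {a, b, c}, {a, c}, {a, b}. In it, {b} is frequent but not closed, because its superset {a, b} has the same support (2).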
23	Topics in Association Rules Shaikh, Mateen 21 June 2013 Association rules are a useful data mining concept whose goal is to summarize the strong patterns that exist in data. We have identified several issues in mining association rules and addressed them in three main areas. The first area we explore is standardized interestingness measures. Different interestingness measures take values on different ranges, and interpreting them can be subtly problematic. We standardize several interestingness measures and show, through three examples, how the standardized measures are useful in association rule mining. The second area we address is incomplete transactions. By applying statistical methods to association rules in new ways, we provide a more comprehensive means of analyzing incomplete transactions. We also describe how to find families of distributions for interestingness measure values when transactions are incomplete. Finally, we address the common result of mining: a plethora of association rules. Unlike methods that attempt to reduce the number of resulting rules, we harness this large quantity to find a higher-level set of patterns. / NSERC Discovery Grant and OMRI Early Researcher Award Keywords: Association Rules; Data Mining; Statistics; Missing Data; Hierarchies; Clustering
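One simple way to put lift on a common scale is to rescale it against the smallest and largest values it can take, given the rule's marginal probabilities. This sketch uses the Fréchet bounds on P(A and B) for that range. It is an illustration under that assumption only. The thesis may use different or tighter bounds, for example ones that also account for minimum support and confidence thresholds.

```python
def lift(p_a, p_b, p_ab):
    """Lift of rule A -> B from P(A), P(B) and P(A and B)."""
    return p_ab / (p_a * p_b)


def lift_bounds(p_a, p_b):
    """Smallest and largest lift attainable when only P(A) and P(B) are fixed.

    P(A and B) can range from max(0, P(A) + P(B) - 1) up to min(P(A), P(B)).
    """
    lower = max(0.0, p_a + p_b - 1.0) / (p_a * p_b)
    upper = min(p_a, p_b) / (p_a * p_b)  # equals 1 / max(P(A), P(B))
    return lower, upper


def standardized_lift(p_a, p_b, p_ab):
    """Min-max rescaling of lift onto [0, 1] within its attainable range."""
    lower, upper = lift_bounds(p_a, p_b)
    if upper == lower:
        return float("nan")  # lift is fully determined by the marginals
    return (lift(p_a, p_b, p_ab) - lower) / (upper - lower)


# Two rules with the same lift (1.6) but different positions in their ranges.
print(round(standardized_lift(0.5, 0.5, 0.4), 3))   # 0.8
print(round(standardized_lift(0.2, 0.25, 0.08), 3))  # 0.4
```

Both rules have a raw lift of 1.6. The first rule sits near the top of what its marginals allow, and the second sits well below the top. This is the kind of subtlety a standardized measure exposes.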
24	Efficient mining of interesting emerging patterns and their effective use in classification Fan, Hongjian Unknown Date Knowledge Discovery in Databases (KDD), or data mining, is used to discover interesting or useful patterns and relationships in data, with an emphasis on large observational databases. Among the many types of information (knowledge) that can be discovered in data, patterns expressed in terms of features are popular because people can understand and use them directly. The recently proposed Emerging Pattern (EP) is one such type of knowledge pattern. Emerging Patterns are sets of items (conjunctions of attribute values) whose frequency changes significantly from one dataset to another. They are useful for discovering distinctions inherently present among a collection of datasets and have been shown to be a powerful basis for constructing accurate classifiers.
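The definition of an Emerging Pattern can be made concrete with the growth-rate formulation commonly used in the EP literature. The growth rate is the itemset's support in the target dataset divided by its support in the background dataset. The rate is infinite (a "jumping" EP) when the itemset never occurs in the background. This brute-force sketch enumerates small itemsets only. The datasets, the `min_growth` threshold, and `max_size` are illustrative assumptions, and the sketch does not reflect the thesis' own efficient mining methods.

```python
import math
from itertools import combinations


def support(itemset, dataset):
    """Fraction of transactions in dataset that contain every item of itemset."""
    return sum(itemset <= t for t in dataset) / len(dataset)


def growth_rate(itemset, background, target):
    """Ratio of target support to background support (inf if absent from background)."""
    s_bg, s_tg = support(itemset, background), support(itemset, target)
    if s_bg == 0:
        return math.inf if s_tg > 0 else 0.0
    return s_tg / s_bg


def emerging_patterns(background, target, min_growth, max_size=2):
    """All itemsets of up to max_size items whose growth rate reaches min_growth."""
    items = sorted(set().union(*background, *target))
    found = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            rate = growth_rate(x, background, target)
            if rate >= min_growth:
                found[x] = rate
    return found


background = [{"a", "b"}, {"a"}, {"b", "c"}, {"c"}]
target = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]
for pattern, rate in emerging_patterns(background, target, min_growth=2.0).items():
    print(sorted(pattern), rate)
```

With these toy datasets, {a, b} has a growth rate of 3.0. {a, c} is a jumping EP (rate `inf`) because it never appears in the background data. Itemsets like these are what an EP-based classifier would treat as evidence for the target class.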
25	Interesting Association Rules Mining Based on Improved Rarity Algorithm Xiang, Lan January 2018 With the rapid development of science and technology, society has entered the big data era. Human activities produce large amounts of data every second, and that data contains a great deal of information. How to select useful information from such complicated data is therefore a significant issue. Association rule mining, a technique for mining patterns or associations between itemsets, emerged to address it; it aims to find important associations in data and turn them into useful knowledge. Currently, most researchers focus on frequent pattern mining. However, rare pattern mining also plays an important role in many areas, such as the medical, financial, and scientific fields. Compared with frequent pattern mining, rare pattern mining is more valuable because it tends to find unknown, unexpected, and more interesting rules. However, rare pattern mining is more difficult because few data are available to verify the rules. In frequent pattern mining, there are two general algorithms for discovering frequent itemsets. The first is Apriori, the earliest algorithm, proposed by R. Agrawal in 1994. The second is FP-Tree, an improved algorithm that reduces the time complexity. In rare pattern mining, there are likewise two algorithms, Arima and Rarity, which are analogous to Apriori and FP-Tree, but both have problems. Arima is time-consuming because it repeatedly scans the large database, and Rarity is space-consuming because it builds a full-combination tree. Therefore, based on the Rarity algorithm, this report presents an improved method that efficiently discovers interesting association rules among rare itemsets and aims to balance time and space.
It is a top-down strategy that uses a graph structure to represent all combinations of existing items, defines a pattern matrix to record itemsets, and uses a hash table to accelerate computation. Compared with Arima, this method decreases both time and space costs, and it reduces the wasted space that is Rarity's main problem. However, it takes longer than Rarity to search for rare itemsets, and we verified the feasibility of the algorithm only on small, abstract databases. In future work, we will continue improving the method by reducing its search time and by adjusting the hash function to optimize space utilization. We will also apply it to actual large databases, such as a clinical database of diabetic patients, to mine association rules in diabetic complications. Keywords: Association rules mining; Rare pattern mining; Computer Systems
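A small, exhaustive sketch can show what "rules among rare itemsets" means. It stores support counts in a hash table (a Python dict) and enumerates each transaction's subsets top-down, from the whole transaction toward single items. It is not the report's graph and pattern-matrix method. It simply treats an itemset as rare when its support count is below a hypothetical `max_count`. Its cost grows exponentially with transaction length, so it only suits small databases.

```python
from collections import defaultdict
from itertools import combinations


def itemset_supports(transactions):
    """Support count of every itemset occurring in some transaction, held in a hash table."""
    counts = defaultdict(int)
    for t in transactions:
        items = sorted(t)
        # Top-down: enumerate from the whole transaction down to single items.
        for k in range(len(items), 0, -1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return dict(counts)


def rare_rules(transactions, max_count, min_confidence):
    """Rules X -> Y drawn from rare itemsets (support count below max_count)."""
    counts = itemset_supports(transactions)
    rules = []
    for itemset, count in counts.items():
        if count >= max_count or len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                # lhs occurred wherever itemset did, so it is always in the table.
                confidence = count / counts[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, count, confidence))
    return rules


baskets = [{"milk", "bread"}, {"milk", "bread"}, {"milk", "bread"},
           {"milk", "caviar", "champagne"}, {"bread"}]
for lhs, rhs, count, conf in rare_rules(baskets, max_count=2, min_confidence=1.0):
    print(sorted(lhs), "->", sorted(rhs), count, conf)
```

With these toy baskets, rules such as caviar -> milk reach confidence 1.0 although caviar appears only once. A frequent-pattern miner with any minimum support count above 1 would never report them.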
26	Descoberta direta e eficiente de regras de associação ótimas / Direct and Efficient Discovery of Optimal Association Rules Alinson Sousa de Assunção 16 December 2011 The induction of association rules is one of the main interests in knowledge discovery and data mining. Association rules describe relationships in a transactional dataset, in which each transaction contains a subset of items. Let X and Y be two disjoint itemsets; a rule X → Y then expresses a relationship, that is, a dependence or co-occurrence, between X and Y. Apriori is the best-known algorithm for generating association rules. It generates rules that satisfy a user-defined minimum support threshold, meaning that each rule must occur in at least a minimum number of transactions of the dataset. This threshold controls the number of rules Apriori extracts. However, support alone cannot measure how interesting a rule is, so interestingness measures were developed to assess the importance or interest of a rule relative to others. These measures are computed from the frequencies of X, Y, and XY.
Although interestingness measures filter out uninteresting rules, they do not reduce mining time: it is still expensive to mine all association rules and then filter them by an interestingness measure. To overcome this difficulty, techniques that directly induce optimal association rules have been developed. A set of optimal association rules is a ruleset that optimizes a given interestingness measure. Many works in the literature search for such rules directly and efficiently. This MSc thesis follows the same direction and aims to improve this task by discovering an arbitrary number of optimal association rules. Previous approaches share one obstacle in particular: the use of Apriori. Apriori performs a breadth-first search over the itemset space, whereas the most promising techniques for finding optimal rules perform a depth-first search over the rule space. Hence, this research adopted the FP-growth algorithm, which performs a depth-first search over the itemset space. In addition, new rule-pruning strategies and a new search strategy for traversing the rule space were developed. The algorithms developed in this research incorporate all of these innovations. In all tests, they were more efficient (in execution time) than the baseline. The tests were conducted on real and artificial datasets. Keywords: Association rules; Data mining
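The contrast between breadth-first Apriori and depth-first search can be sketched as follows. This example uses an Eclat-style depth-first search over transaction-id sets, which is a simpler stand-in for the FP-growth algorithm that the thesis adopts. It then keeps the k rules with the highest leverage, P(XY) - P(X)P(Y), which is one interestingness measure computed from the frequencies of X, Y, and XY. It prunes only on support; optimal-rule miners go further and prune with bounds on the measure itself. The choice of leverage and of `k` are assumptions.

```python
import heapq
from itertools import combinations


def dfs_frequent(transactions, min_count):
    """Depth-first (Eclat-style) enumeration of frequent itemsets via tid-set intersection."""
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    frequent = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) < min_count:
                continue  # no superset of prefix + item can be frequent
            itemset = prefix | {item}
            frequent[itemset] = len(new_tids)
            extend(itemset, new_tids, candidates[i + 1:])  # go deeper first

    extend(frozenset(), set(), sorted(tidsets.items()))
    return frequent


def top_k_rules(transactions, min_count, k):
    """The k rules X -> Y with the highest leverage among frequent itemsets."""
    n = len(transactions)
    freq = dfs_frequent(transactions, min_count)
    scored = []
    for itemset, count in freq.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                x = frozenset(lhs)
                y = itemset - x
                # Subsets of a frequent itemset are frequent, so freq[x] and freq[y] exist.
                leverage = count / n - (freq[x] / n) * (freq[y] / n)
                scored.append((leverage, sorted(x), sorted(y)))
    return heapq.nlargest(k, scored)


data = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}, {"c"}, {"a", "c"}]
for leverage, x, y in top_k_rules(data, min_count=2, k=2):
    print(x, "->", y, round(leverage, 3))
```

Here the two rules between a and b (leverage 1/6) win. The frequent itemset {a, c} yields rules with negative leverage, because a and c co-occur less often than independence would predict.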
27	Using nuclear receptor interactions as biomarkers for metabolic syndrome Hettne, Kristina January 2003 Metabolic syndrome is reaching epidemic proportions, especially in developed countries. Each risk factor component of the syndrome independently increases the risk of developing coronary artery disease. The risk factors are obesity, dyslipidemia, hypertension, type 2 diabetes, insulin resistance, and microalbuminuria. Nuclear receptors are a family of receptors that have recently received considerable attention because of their possible involvement in metabolic syndrome. Putting the receptors into context with their co-factors and ligands may reveal therapeutic targets not found by studying the receptors alone. Therefore, in this thesis, interactions between genes in nuclear receptor pathways were analysed to investigate whether these interactions can supply leads to biomarkers for metabolic syndrome. Gene expression data from metabolic syndrome donors in the BioExpress™ database were analysed with the APRIORI algorithm (Agrawal et al. 1993) to generate and mine association rules. No association rules were found that could function as biomarkers for metabolic syndrome, but the resulting rules show that the data mining technique successfully found associations between genes in signaling pathways. Keywords: metabolic syndrome; pathways; association rules; Bioinformatics (Computational Biology)
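The level-wise Apriori search used here can be sketched in a few lines. Candidates of size k are joined from frequent (k-1)-itemsets, pruned when any (k-1)-subset is infrequent, and counted in one pass over the data. The item names (`geneA_up` and so on) are hypothetical discretized expression states, with each donor sample treated as a transaction. The abstract does not say how the BioExpress™ data were encoded.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise (breadth-first) Apriori; returns {frozenset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        # Join: merge pairs of frequent (k-1)-itemsets whose union has k items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Count: one scan of the data per level.
        level = {}
        for c in candidates:
            count = sum(c <= t for t in transactions)
            if count >= min_count:
                level[c] = count
        frequent.update(level)
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Rules X -> Y with confidence count(XY) / count(X) >= min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((sorted(lhs), sorted(itemset - lhs), confidence))
    return rules


samples = [
    {"geneA_up", "geneB_up", "geneC_down"},
    {"geneA_up", "geneB_up"},
    {"geneA_up", "geneB_up", "geneC_down"},
    {"geneC_down"},
]
frequent = apriori(samples, min_count=2)
for lhs, rhs, conf in generate_rules(frequent, min_confidence=0.9):
    print(lhs, "->", rhs, conf)
```

The pruning step only works because support is anti-monotone: if any subset is infrequent, no superset can be frequent. This is what lets Apriori discard candidates before scanning the data.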
28	Methodology For Generating High-Confidence Cost-Sensitive Rules For Classification Bakshi, Arjun 21 October 2013 No description available. Keywords: Computer Science; Cost sensitive; High Confidence; Association Rules; Classification; Non forced
29	Redes de regras de associação filtradas e multialvo / Filtered and Multi-Target Association Rules Networks Calçada, Dario Brito 21 March 2019 The discovery of association rules is a data mining task that seeks to identify patterns in datasets; once interpreted, these patterns reveal specific knowledge about the problem under analysis. Association rule mining can be used as a methodology for discovering hypotheses or candidate theories in a knowledge domain. However, the association rule mining process generates a large number of rules, exceeding users' capacity to explore them. This can make the analysis impracticable and can also negatively affect the outcome of some knowledge extraction algorithms. Several approaches have therefore been proposed to guide users in exploring the discovered association rules, in particular network structures, which allow the relations between rules to be analyzed. In this context, this work was motivated by the potential of networks to optimize knowledge identification in association rule mining processes through explainable approaches. Another motivation is the gap regarding the use of networks in multi-target tasks, which are inherent to several real-world applications. This work aimed to advance research on association rule mining with networks. Its focus was on methods for generating verifiable hypotheses with one or two target items, in terms of both the interpretability and the expressiveness of the representations built. A systematic mapping of the literature was carried out to establish the state of the art on how networks can help association rule mining processes.
In this work, a method was proposed and developed for selecting and evaluating minimum support and confidence thresholds for association rule extraction using network centrality measures; its main contribution is an objective criterion for extracting association rules. Two new networks were also proposed, developed, and validated: Filtered Association Rules Networks (Filtered-ARNs) and Multi-Target Association Rules Networks (MTARNs). Through mathematical proof of the influence between the elements of an association rule, they had a positive impact on knowledge identification. They also extended the capacity for knowledge extraction in multi-target applications. Keywords: Association rules; Association rules networks; Data mining; Generation of hypotheses; Multi-target; Networks
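The idea of organizing rules as a network around target items can be sketched as follows. This is a hypothetical simplification, not the thesis' Filtered-ARN or MTARN construction. Rules are filtered by a minimum confidence, and the network is grown backward from one or more target items; two targets give a multi-target network. Each rule becomes a node linking its antecedent items to its consequent, and normalized degree centrality is computed for every node. The rule format (antecedent tuple, consequent item, confidence) is an assumption.

```python
from collections import defaultdict


def build_arn(rules, targets, min_confidence):
    """Grow a rule network backward from the target items.

    rules: list of (antecedent tuple, consequent item, confidence).
    A rule is kept if it passes the confidence filter and its consequent is a
    target or an antecedent item of a rule already in the network.
    Edges run antecedent item -> rule node -> consequent item.
    """
    edges = defaultdict(set)
    reached = set(targets)
    frontier = set(targets)
    while frontier:
        next_frontier = set()
        for idx, (lhs, rhs, conf) in enumerate(rules):
            if rhs not in frontier or conf < min_confidence:
                continue
            node = f"R{idx}"
            edges[node].add(rhs)
            for item in lhs:
                edges[item].add(node)
                if item not in reached:
                    reached.add(item)
                    next_frontier.add(item)
        frontier = next_frontier
    return dict(edges)


def degree_centrality(edges):
    """Degree (in + out) of each node divided by (number of nodes - 1)."""
    nodes = set(edges) | {v for succ in edges.values() for v in succ}
    degree = dict.fromkeys(nodes, 0)
    for u, succ in edges.items():
        degree[u] += len(succ)
        for v in succ:
            degree[v] += 1
    if len(nodes) < 2:
        return {v: 0.0 for v in nodes}
    return {v: d / (len(nodes) - 1) for v, d in degree.items()}


rules = [
    (("a",), "t1", 0.90),  # R0
    (("b",), "a", 0.80),   # R1: explains an antecedent of R0
    (("c",), "t2", 0.95),  # R2
    (("d",), "t1", 0.40),  # R3: removed by the confidence filter
    (("e",), "f", 0.90),   # R4: not connected to any target
]
network = build_arn(rules, targets={"t1", "t2"}, min_confidence=0.5)
centrality = degree_centrality(network)
print(sorted(network))
print(round(centrality["a"], 3))
```

Item a ends up among the most central items because it both feeds a target rule and is explained by another rule. This kind of structural signal is what centrality-based criteria can exploit.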
30	Parallel Mining of Association Rules Using a Lattice Based Approach Thomas, Wessel Morant 01 January 2009 The discovery of interesting patterns from database transactions is one of the major problems in knowledge discovery in databases. One such pattern is the association rule extracted from these transactions. Because the databases used to store the transactions are very large, parallel algorithms are required to mine association rules. In this paper, we present a parallel algorithm that uses a lattice-based approach to mine association rules. The Dynamic Distributed Rule Mining (DDRM) algorithm partitions the lattice into sublattices, which are assigned to processors that identify the frequent itemsets. Experimental results show that DDRM uses the processors efficiently and performs better than the prefix-based and partition algorithms, which assign classes to processors statically. The DDRM algorithm scales well and shows good speedup. Keywords: Apriori; Association rules; Data mining; Distributed algorithms; Lattice; Parallel systems; Computer Sciences
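The partition-then-assign idea can be sketched with Python's `concurrent.futures`. Each sublattice here is a prefix class, meaning all itemsets whose smallest item is a given item, mined depth-first with tid-set intersections. The classes are submitted as separate tasks, so an idle worker picks up the next one; this is a simple form of dynamic assignment. This is an illustration under those assumptions, not DDRM itself. Because of CPython's global interpreter lock, threads demonstrate the scheduling but not a real CPU speedup; a `ProcessPoolExecutor` is the usual substitute for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def mine_class(first_item, later_items, tidsets, min_count):
    """Mine every frequent itemset whose smallest item is first_item (one sublattice)."""
    found = {}
    stack = [((first_item,), tidsets[first_item], later_items)]
    while stack:
        itemset, tids, rest = stack.pop()
        if len(tids) < min_count:
            continue  # prune this branch: supersets cannot be frequent
        found[frozenset(itemset)] = len(tids)
        for i, item in enumerate(rest):
            stack.append((itemset + (item,), tids & tidsets[item], rest[i + 1:]))
    return found


def parallel_mine(transactions, min_count, workers=4):
    """Split the itemset lattice into prefix classes and mine them concurrently."""
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    items = sorted(tidsets)
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per sublattice; free workers take the next queued task.
        futures = [pool.submit(mine_class, item, items[i + 1:], tidsets, min_count)
                   for i, item in enumerate(items)]
        for future in as_completed(futures):
            result.update(future.result())
    return result


data = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}, {"c"}, {"a", "c"}]
for itemset, count in sorted(parallel_mine(data, min_count=2).items(),
                             key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

Every itemset belongs to exactly one prefix class, so the per-class results can be merged without coordination. Prefix classes differ greatly in size, which is why pulling tasks from a shared queue tends to balance load better than a fixed up-front assignment.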
