Global ETD Search

11	Descoberta direta e eficiente de regras de associação ótimas / Discovery direct and efficient of optimal association rules Assunção, Alinson Sousa de 16 December 2011 (has links) Um dos principais interesses na descoberta do conhecimento e mineração de dados é a indução de regras de associação. Regras de associação caracterizam as relações entre os dados a partir de um conjunto de dados estruturado com transações, onde cada transação contém um subconjunto de itens. Seja X e Y dois conjuntos de itens disjuntos, então a regra X → Y define um relacionamento, isto é, a dependência ou a co-ocorrência entre os conjuntos X e Y. Um dos algoritmos mais conhecidos para geração de regras de associação é o algoritmo Apriori. Ele explora regras de associação que respeitam o limiar suporte mínimo, ou seja, as regras devem aparecer em uma quantidade mínima de transações. Esse limiar tem a capacidade de controlar a quantidade de regras extraídas durante a mineração. Entretanto, a frequência ou suporte não consegue medir o nível de interesse de uma regra. Para medir a importância ou interesse de uma regra em relação a outras foram desenvolvidas medidas de interesse. Tais medidas são calculadas a partir das frequências dos conjuntos de itens X, Y e do par XY. Apesar das medidas de interesse realizarem uma filtragem das regras desinteressantes, elas não acarretam na diminuição no tempo de execução da mineração. Para vencer essa dificuldade, técnicas que exploram diretamente regras de associação ótimas foram desenvolvidas. Um conjunto de regras de associação ótimas é um conjunto de regras que otimiza uma determinada medida de interesse. Na literatura existem muitos trabalhos que buscam esse tipo de conjunto de regras de forma direta e eficiente. O trabalho corrente segue esta mesma direção e visou a melhoria dessa tarefa por descobrir uma quantidade arbitrária de regras de associação ótimas. As abordagens anteriores apresentam um entrave em especial, que é a utilização do algoritmo Apriori. Tal técnica realiza uma busca em largura sobre os conjuntos de itens. No entanto, as técnicas mais promissoras que descobrem regras ótimas realizam busca em profundidade sobre o espaço de busca de regras. Em virtude dessa característica, neste trabalho foi adotada a técnica FP-growth, que realiza uma busca em profundidade sobre os conjuntos de itens explorados. Além da adoção da técnica FP-growth, foram desenvolvidas novas estratégias de poda e uma nova estratégia de busca na travessia do espaço de regras. Todas essas inovações foram adicionadas aos algoritmos desenvolvidos no corrente trabalho e proporcionaram melhor eficiência (tempo de execução) em relação ao algoritmo baseline em todos os testes. Tais testes foram realizados sobre conjuntos de dados reais e artificiais. / The induction of association rules is one of the main interests in knowledge discovery and data mining. Association rules describe the relationships between data from a transactional dataset, so that each transaction contains a subset of items. Let X and Y be two disjoint itemsets, then any rule X → Y defines a relationship that represents the dependence or co-occurrence between itemsets X and Y. Apriori is the best-known algorithm to generate association rules. It generates association rules that satisfy a user defined minimum support threshold. This means the rules should occur at least in an arbitrary number of transactions from a dataset. This threshold limits the number of association rules generated by Apriori. Yet, it is not possible to measure the interest of a rule through support. For that, interestingness measures were developed to assess the importance or interest of a rule. The values of these interestingness measures are obtained through frequencies of X, Y and XY. However, it is still an expensive task mining all the association rules and then filter them according to an interestingness measure. To overcome this difficulty, techniques to induce optimal association rules have been developed. Optimal association rules are a ruleset that optimize an arbitrary interestingness measure. In the literature, there are many papers which aim at searching for optimal association rules directly and efficiently. The current MSc thesis follows this direction, aiming at improving this objective. Previous approaches share one obstacle in particular: the use of Apriori. This algorithm performs a breadth-first search on the itemsets space. However, the most promising techniques to find optimal rules perform a depth-first search on the space of rules. Hence, in this research we adopted the FP-growth algorithm, which performs a depth-first search on the itemsets space. Besides using this algorithm, new rule pruning techniques and a new search space traversing on the space rules were developed. The algorithms developed in the current research contain all these innovations. In all tests, the proposed algorithms surpassed the baseline algorithms in terms of efficiency. These tests were conducted on real and articial datasets. Association rules Data mining Mineração de dados Regras de associação
12	Using Association Rules to Guide a Search for Best Fitting Transfer Models of Student Learning Freyberger, Jonathan E 30 April 2004 (has links) Transfer models provide a viable means of determining which skills a student needs in order to solve a given problem. However, constructing a good fitting transfer model requires a lot of trial and error. The main goal of this thesis was to develop a procedure for developing better fit transfer models for intelligent tutoring systems. The procedure implements a search method using association rules as a means of guiding the search. The association rules are mined from the instances in the dataset that the transfer model predicts incorrectly. The association rules found in the mining process determines what operation to perform on the current transfer model. Our search algorithm using association rules was compared to a blind search method that finds all possible transfer models for a given set of factors. Our search process was able to find statistically similar models to the ones the blind search method finds in a considerably shorter amount of time. The difference in times between our search process and the blind search method is days to minutes. Being able to find good transfer models quicker will help intelligent tutor system builders as well as cognitive science researchers better assess what makes certain problems hard and other problems easy for students. aprior ASAS association rules logistic regression transfer models predicting performance
13	Association rules for exploit code analysis to prevent Buffer Overflow Li, Chang-Yu 01 August 2007 (has links) As the development of software applications and Internet, the security issues that come with get more serious. Buffer Overflow is an unavoidable problem while software programming. According to the advisories of each year, they show that many security vulnerabilities are from Buffer Overflow. Buffer Overflow is also the cause of intrusion made by hackers. The users of software applications usually depend on the software updates released by software venders to prevent the attacks caused by Buffer Overflow. So before applying software updates, that how to avoid attacks to software and prolong the save period of software is an important issue to prevent Buffer Overflow. By collecting and analyzing the exploit codes used by hackers, we can build the overall pattern of Buffer Overflow attacks, and we can take this pattern as the basis for preventing future Buffer Overflow attacks. Association rules can find the relations of unknown things, so it can help to build the common pattern between Buffer Overflow attacks. So this work applies association rules to build the pattern of Buffer Overflow attacks, and to find out the relations of system calls inside the exploit codes. We experiment and build a group of system call rules that can differentiate the attack behavior and the normal behavior. These rules can detect the Buffer Overflow attacks exactly and perform well in false positives. And then they can help to do further defenses after detecting attacks and alleviate the seriousness of Buffer Overflow attacks to computer systems. system call association rules Buffer Overflow exploit code
14	Efficient Mining Approaches for Coherent Association Rules Lin, Yui-Kai 29 August 2012 (has links) The goal of data mining is to help market managers find relationships among items from large datasets to increase profits. Among the mining techniques, the Apriori algorithm is the most basic and important for association rule mining. Although a lot of mining approaches have been proposed based on the Apriori algorithm, most of them focus on positive association rules, such as R1: ¡§If milk is bought, then bread is bought¡¨. However, rule R1 may confuses users and makes wrong decision if the negative relation rules are not considered. For example, the rule such as R2: ¡§If milk is not bought, then bread is bought¡¨ may also be found. Then, the rule R2 conflicts with the positive rule R1. So, if two rules such as ¡§If milk is bought, then bread is bought¡¨ and ¡§If milk is not bought, then bread is not bought¡¨ are found at the same time, the rules which is called coherent rule may be more valuable.In this thesis, we thus propose two algorithms for solving this problem. The first proposed algorithm is named Highly Coherent Rule Mining algorithm (HCRM), which takes the properties of propositional logic into consideration and is based on Apriori approach for finding coherent rules. The lower and upper bounds of itemsets are also tightened to remove unnecessary check. Besides, in order to improve the efficiency of the mining process, the second algorithm, namely Projection-based Coherent Mining Algorithm (PCA), based on data projection is proposed for speeding up the execution time. Experiments are conducted on real and simulation datasets to demonstrate the performance of the proposed approaches and the results show that both HCRM and PCA can find more reliable rules and PCA is more efficient. projection coherent rules propositional logic association rules data mining
15	A Sliding-Window Approach to Mining Maximal Large Itemsets for Large Databases Chang, Yuan-feng 28 July 2004 (has links) Mining association rules, means a process of nontrivial extraction of implicit, previously and potentially useful information from data in databases. Mining maximal large itemsets is a further work of mining association rules, which aims to find the set of all subsets of large (frequent) itemsets that could be representative of all large itemsets. Previous algorithms to mining maximal large itemsets can be classified into two approaches: exhausted and shortcut. The shortcut approach could generate smaller number of candidate itemsets than the exhausted approach, resulting in better performance in terms of time and storage space. On the other hand, when updates to the transaction databases occur, one possible approach is to re-run the mining algorithm on the whole database. The other approach is incremental mining, which aims for efficient maintenance of discovered association rules without re-running the mining algorithms. However, previous algorithms for mining maximal large itemsets based on the shortcut approach can not support incremental mining for mining maximal large itemsets. While the algorithms for incremental mining, {it e.g.}, the SWF algorithm, could not efficiently support mining maximal large itemsets, since it is based on the exhausted approach. Therefore, in this thesis, we focus on the design of an algorithm which could provide good performance for both mining maximal itemsets and incremental mining. Based on some observations, for example, ``{it if an itemset is large, all its subsets must be large; therefore, those subsets need not to be examined further}", we propose a Sliding-Window approach, the SWMax algorithm, for efficiently mining maximal large itemsets and incremental mining. Our SWMax algorithm is a two-passes partition-based approach. We will find all candidate 1-itemsets ($C_1$), candidate 3-itemsets ($C_3$), large 1-itemsets ($L_1$), and large 3-itemsets ($L_3$) in the first pass. We generate the virtual maximal large itemsets after the first pass. Then, we use $L_1$ to generate $C_2$, use $L_3$ to generate $C_4$, use $C_4$ to generate $C_5$, until there is no $C_k$ generated. In the second pass, we use the virtual maximal large itemsets to prune $C_k$, and decide the maximal large itemsets. For incremental mining, we consider two cases: (1) data insertion, (2) data deletion. Both in Case 1 and Case 2, if an itemset with size equal to 1 is not large in the original database, it could not be found in the updated database based on the SWF algorithm. That is, a missing case could occur in the incremental mining process of the SWF algorithm, because the SWF algorithm only keeps the $C_2$ information. While our SWMax algorithm could support incremental mining correctly, since $C_1$ and $C_3$ are maintained in our algorithm. We generate some synthetic databases to simulate the real transaction databases in our simulation. From our simulation, the results show that our SWMax algorithm could generate fewer number of candidates and needs less time than the SWF algorithm. partition Incremental Mining Maximal Large Itemsets Data Mining Association Rules
16	Targeted Advertising Based on GP-association rules Tsai, Chai-wen 13 August 2004 (has links) Targeting a small portion of customers for advertising has long been recognized by businesses. In this thesis we proposed a novel approach to promoting products with no prior transaction records. This approach starts with discovering the GP-association rules between customer types and product genres that had occurred frequently in transaction records. Customers are characterized by demographic attributes, some of these attributes have concept hierarchies and products can be generalized through some product taxonomy. Based on GP-association rules set, we developed a comprehensive algorithm to locating a short list of prospective customers for a given promotion product. The new approach was evaluated using the patron¡¦s circulation data from OPAC system of our university library. We measured the accuracy of estimated method and the effectiveness of targeted advertising in different parameters. The result shows that our approach achieved higher accuracy and effectiveness than other methods. GP-association rules Targeted Advertising Product Recommendation Concept Hierarchies
17	A Class-rooted FP-tree Approach to Data Classification Chen, Chien-hung 29 June 2005 (has links) Classification, an important problem of data mining, is one of useful techniques for prediction. The goal of the classification problem is to construct a classifier from a given database for training, and to predict new data with the unknown class. Classification has been widely applied to many areas, such as medical diagnosis and weather prediction. The decision tree is the most popular model among classifiers, since it can generate understandable rules and perform classification without requiring any computation. However, a major drawback of the decision tree model is that it only examines a single attribute at a time. In the real world, attributes in some databases are dependent on each other. Thus, we may improve the accuracy of the decision tree by discovering the correlation between attributes. The CAM method applies the method of mining association rules, like the Apriori method, for discovering the attribute dependence. However, traditional methods for mining association rules are inefficient in the classification applications and could have five problems: (1) the combinatorial explosion problem, (2) invalid candidates, (3) unsuitable minimal support, (4) the ignored meaningful class values, and (5) itemsets without class data. The FP-growth avoids the first two problems. However, it is still suffered from the remaining three problems. Moreover, one more problem occurs: Unnecessary nodes for the classification problem which make the FP-tree incompact and huge. Furthermore, the workload of the CAM method is expensive due to too many times of database scanning, and the attribute combination problem causes some misclassification. Therefore, in this thesis, we present an efficient and accurate decision tree building method which resolves the above six problems and reduces the overhead of database scanning in the CAM method. We build a structure named class-rooted FP-tree which is a tree similar to the FP-tree, except the root of the tree is always a class item. Instead of using a static minimal support applied in the FP-growth method, we decide the minimal support dynamically, which can avoid some misjudgement of large itemsets used for the classification problem. In the decision tree building phase, we provide a pruning strategy that can reduce the times of database scanning. We also solve the attribute combination problem in the CAM method and improve the accuracy. From our simulation, we show that the performance of the proposed class-rooted FP-tree mining method is better than that of other mining association rule methods in terms of storage usage. Our simulation also shows the performance improvement of our method in terms of the times of database scanning and classification accuracy as compared with the CAM method. Therefore, the mining strategy of our proposed method is applicable to any method for building decision tree, and provides high accuracy in the real world. data mining correlated attributes association rules classification decision trees
18	A Randomness Based Analysis on the Data Size Needed for Removing Deceptive Patterns IBARAKI, Toshihide, BOROS, Endre, YAGIURA, Mutsunori, HARAGUCHI, Kazuya 01 March 2008 (has links) No description available. probabilistic analysis knowledge discovery association rules frequent/infrequent item sets
19	Data Mining in Acquiring Association Knowledge Between Diseases and Medicine Treatments Chen, Shih-Yuan 02 August 2000 (has links) None Association Rules Medical Treatments Data Mining Sequential Patterns
20	The Research on Finding Generalized Association Rules from Library Circulation Records Hung, Chin-Yuan 02 August 2001 (has links) Abstract Libraries have long been widely recognized as import information-offering institutes. Thousands of new books are acquired per month by our university¡Xa mid-sized university in Taiwan), and patrons may have difficulties identifying the small set of books that really interest them. This gives rise to the problem of finding an effective way to recommend patrons the newly arrived books in a library. In this work, we address this problem in finding generalized association rules between patrons and books. We first discuss how to identify relevant but independent patron attributes in regard of the books they checked out. Then, we propose a set of algorithms for generating large itemsets and evaluate their performance experimentally. In addition, we define interestingness of rules and propose an algorithm for pruning uninteresting rules. Finally, we apply our approach to the circulation data of National SUN Yat-Sen University library and report our experiences. Association Rules Digital Library New Book Recommendation Selective Dissemination of Information

Search results