Spelling suggestions: "subject:"associative classification rule"" "subject:"dissociative classification rule""
1 |
Enhancing fuzzy associative rule mining approaches for improving prediction accuracy : integration of fuzzy clustering, apriori and multiple support approaches to develop an associative classification rule baseSowan, Bilal Ibrahim January 2011 (has links)
Building an accurate and reliable model for prediction for different application domains, is one of the most significant challenges in knowledge discovery and data mining. This thesis focuses on building and enhancing a generic predictive model for estimating a future value by extracting association rules (knowledge) from a quantitative database. This model is applied to several data sets obtained from different benchmark problems, and the results are evaluated through extensive experimental tests. The thesis presents an incremental development process for the prediction model with three stages. Firstly, a Knowledge Discovery (KD) model is proposed by integrating Fuzzy C-Means (FCM) with Apriori approach to extract Fuzzy Association Rules (FARs) from a database for building a Knowledge Base (KB) to predict a future value. The KD model has been tested with two road-traffic data sets. Secondly, the initial model has been further developed by including a diversification method in order to improve a reliable FARs to find out the best and representative rules. The resulting Diverse Fuzzy Rule Base (DFRB) maintains high quality and diverse FARs offering a more reliable and generic model. The model uses FCM to transform quantitative data into fuzzy ones, while a Multiple Support Apriori (MSapriori) algorithm is adapted to extract the FARs from fuzzy data. The correlation values for these FARs are calculated, and an efficient orientation for filtering FARs is performed as a post-processing method. The FARs diversity is maintained through the clustering of FARs, based on the concept of the sharing function technique used in multi-objectives optimization. The best and the most diverse FARs are obtained as the DFRB to utilise within the Fuzzy Inference System (FIS) for prediction. The third stage of development proposes a hybrid prediction model called Fuzzy Associative Classification Rule Mining (FACRM) model. This model integrates the ii improved Gustafson-Kessel (G-K) algorithm, the proposed Fuzzy Associative Classification Rules (FACR) algorithm and the proposed diversification method. The improved G-K algorithm transforms quantitative data into fuzzy data, while the FACR generate significant rules (Fuzzy Classification Association Rules (FCARs)) by employing the improved multiple support threshold, associative classification and vertical scanning format approaches. These FCARs are then filtered by calculating the correlation value and the distance between them. The advantage of the proposed FACRM model is to build a generalized prediction model, able to deal with different application domains. The validation of the FACRM model is conducted using different benchmark data sets from the University of California, Irvine (UCI) of machine learning and KEEL (Knowledge Extraction based on Evolutionary Learning) repositories, and the results of the proposed FACRM are also compared with other existing prediction models. The experimental results show that the error rate and generalization performance of the proposed model is better in the majority of data sets with respect to the commonly used models. A new method for feature selection entitled Weighting Feature Selection (WFS) is also proposed. The WFS method aims to improve the performance of FACRM model. The prediction performance is improved by minimizing the prediction error and reducing the number of generated rules. The prediction results of FACRM by employing WFS have been compared with that of FACRM and Stepwise Regression (SR) models for different data sets. The performance analysis and comparative study show that the proposed prediction model provides an effective approach that can be used within a decision support system.
|
2 |
Análise de desempenho dos algoritmos Apriori e Fuzzy Apriori na extração de regras de associação aplicados a um Sistema de Detecção de Intrusos. / Performance analysis of algorithms Apriori and Fuzzy Apriori in association rules mining applied to a System for Intrusion Detection.Ricardo Ferreira Vieira de Castro 20 February 2014 (has links)
A extração de regras de associação (ARM - Association Rule Mining) de dados quantitativos tem sido pesquisa de grande interesse na área de mineração de dados. Com o crescente aumento das bases de dados, há um grande investimento na área de pesquisa na criação de algoritmos para melhorar o desempenho relacionado a quantidade de regras, sua relevância e a performance computacional. O algoritmo APRIORI, tradicionalmente usado na extração de regras de associação, foi criado originalmente para trabalhar com atributos categóricos. Geralmente, para usá-lo com atributos contínuos, ou quantitativos, é necessário transformar os atributos contínuos, discretizando-os e, portanto, criando categorias a partir dos intervalos discretos. Os métodos mais tradicionais de discretização produzem intervalos com fronteiras sharp, que podem subestimar ou superestimar elementos próximos dos limites das partições, e portanto levar a uma representação imprecisa de semântica. Uma maneira de tratar este problema é criar partições soft, com limites suavizados. Neste trabalho é utilizada uma partição fuzzy das variáveis contínuas, que baseia-se na teoria dos conjuntos fuzzy e transforma os atributos quantitativos em partições de termos linguísticos. Os algoritmos de mineração de regras de associação fuzzy (FARM - Fuzzy Association Rule Mining) trabalham com este princípio e, neste trabalho, o algoritmo FUZZYAPRIORI, que pertence a esta categoria, é utilizado. As regras extraídas são expressas em termos linguísticos, o que é mais natural e interpretável pelo raciocício humano. Os algoritmos APRIORI tradicional e FUZZYAPRIORI são comparado, através de classificadores associativos, baseados em regras extraídas por estes algoritmos. Estes classificadores foram aplicados em uma base de dados relativa a registros de conexões TCP/IP que destina-se à criação de um Sistema de Detecção de Intrusos. / The mining of association rules of quantitative data has been of great research interest in the area of data mining. With the increasing size of databases, there is a large investment in research in creating algorithms to improve performance related to the amount of rules, its relevance and computational performance. The APRIORI algorithm, traditionally used in the extraction of association rules, was originally created to work with categorical attributes. In order to use continuous attributes, it is necessary to transform the continuous attributes, through discretization, into categorical attributes, where each categorie corresponds to a discrete interval. The more traditional discretization methods produce intervals with sharp boundaries, which may underestimate or overestimate elements near the boundaries of the partitions, therefore inducing an inaccurate semantical representation. One way to address this problem is to create soft partitions with smoothed boundaries. In this work, a fuzzy partition of continuous variables, which is based on fuzzy set theory is used. The algorithms for mining fuzzy association rules (FARM - Fuzzy Association Rule Mining) work with this principle, and, in this work, the FUZZYAPRIORI algorithm is used. In this dissertation, we compare the traditional APRIORI and the FUZZYAPRIORI, through classification results of associative classifiers based on rules extracted by these algorithms. These classifiers were applied to a database of records relating to TCP / IP connections that aims to create an Intrusion Detection System.
|
3 |
Análise de desempenho dos algoritmos Apriori e Fuzzy Apriori na extração de regras de associação aplicados a um Sistema de Detecção de Intrusos. / Performance analysis of algorithms Apriori and Fuzzy Apriori in association rules mining applied to a System for Intrusion Detection.Ricardo Ferreira Vieira de Castro 20 February 2014 (has links)
A extração de regras de associação (ARM - Association Rule Mining) de dados quantitativos tem sido pesquisa de grande interesse na área de mineração de dados. Com o crescente aumento das bases de dados, há um grande investimento na área de pesquisa na criação de algoritmos para melhorar o desempenho relacionado a quantidade de regras, sua relevância e a performance computacional. O algoritmo APRIORI, tradicionalmente usado na extração de regras de associação, foi criado originalmente para trabalhar com atributos categóricos. Geralmente, para usá-lo com atributos contínuos, ou quantitativos, é necessário transformar os atributos contínuos, discretizando-os e, portanto, criando categorias a partir dos intervalos discretos. Os métodos mais tradicionais de discretização produzem intervalos com fronteiras sharp, que podem subestimar ou superestimar elementos próximos dos limites das partições, e portanto levar a uma representação imprecisa de semântica. Uma maneira de tratar este problema é criar partições soft, com limites suavizados. Neste trabalho é utilizada uma partição fuzzy das variáveis contínuas, que baseia-se na teoria dos conjuntos fuzzy e transforma os atributos quantitativos em partições de termos linguísticos. Os algoritmos de mineração de regras de associação fuzzy (FARM - Fuzzy Association Rule Mining) trabalham com este princípio e, neste trabalho, o algoritmo FUZZYAPRIORI, que pertence a esta categoria, é utilizado. As regras extraídas são expressas em termos linguísticos, o que é mais natural e interpretável pelo raciocício humano. Os algoritmos APRIORI tradicional e FUZZYAPRIORI são comparado, através de classificadores associativos, baseados em regras extraídas por estes algoritmos. Estes classificadores foram aplicados em uma base de dados relativa a registros de conexões TCP/IP que destina-se à criação de um Sistema de Detecção de Intrusos. / The mining of association rules of quantitative data has been of great research interest in the area of data mining. With the increasing size of databases, there is a large investment in research in creating algorithms to improve performance related to the amount of rules, its relevance and computational performance. The APRIORI algorithm, traditionally used in the extraction of association rules, was originally created to work with categorical attributes. In order to use continuous attributes, it is necessary to transform the continuous attributes, through discretization, into categorical attributes, where each categorie corresponds to a discrete interval. The more traditional discretization methods produce intervals with sharp boundaries, which may underestimate or overestimate elements near the boundaries of the partitions, therefore inducing an inaccurate semantical representation. One way to address this problem is to create soft partitions with smoothed boundaries. In this work, a fuzzy partition of continuous variables, which is based on fuzzy set theory is used. The algorithms for mining fuzzy association rules (FARM - Fuzzy Association Rule Mining) work with this principle, and, in this work, the FUZZYAPRIORI algorithm is used. In this dissertation, we compare the traditional APRIORI and the FUZZYAPRIORI, through classification results of associative classifiers based on rules extracted by these algorithms. These classifiers were applied to a database of records relating to TCP / IP connections that aims to create an Intrusion Detection System.
|
4 |
Enhancing Fuzzy Associative Rule Mining Approaches for Improving Prediction Accuracy. Integration of Fuzzy Clustering, Apriori and Multiple Support Approaches to Develop an Associative Classification Rule BaseSowan, Bilal I. January 2011 (has links)
Building an accurate and reliable model for prediction for different application domains, is one of the most significant challenges in knowledge discovery and data mining. This thesis focuses on building and enhancing a generic predictive model for estimating a future value by extracting association rules (knowledge) from a quantitative database. This model is applied to several data sets obtained from different benchmark problems, and the results are evaluated through extensive experimental tests.
The thesis presents an incremental development process for the prediction model with three stages. Firstly, a Knowledge Discovery (KD) model is proposed by integrating Fuzzy C-Means (FCM) with Apriori approach to extract Fuzzy Association Rules (FARs) from a database for building a Knowledge Base (KB) to predict a future value. The KD model has been tested with two road-traffic data sets.
Secondly, the initial model has been further developed by including a diversification method in order to improve a reliable FARs to find out the best and representative rules. The resulting Diverse Fuzzy Rule Base (DFRB) maintains high quality and diverse FARs offering a more reliable and generic model. The model uses FCM to transform quantitative data into fuzzy ones, while a Multiple Support Apriori (MSapriori) algorithm is adapted to extract the FARs from fuzzy data. The correlation values for these FARs are calculated, and an efficient orientation for filtering FARs is performed as a post-processing method. The FARs diversity is maintained through the clustering of FARs, based on the concept of the sharing function technique used in multi-objectives optimization. The best and the most diverse FARs are obtained as the DFRB to utilise within the Fuzzy Inference System (FIS) for prediction.
The third stage of development proposes a hybrid prediction model called Fuzzy Associative Classification Rule Mining (FACRM) model. This model integrates the
ii
improved Gustafson-Kessel (G-K) algorithm, the proposed Fuzzy Associative Classification Rules (FACR) algorithm and the proposed diversification method. The improved G-K algorithm transforms quantitative data into fuzzy data, while the FACR generate significant rules (Fuzzy Classification Association Rules (FCARs)) by employing the improved multiple support threshold, associative classification and vertical scanning format approaches. These FCARs are then filtered by calculating the correlation value and the distance between them. The advantage of the proposed FACRM model is to build a generalized prediction model, able to deal with different application domains. The validation of the FACRM model is conducted using different benchmark data sets from the University of California, Irvine (UCI) of machine learning and KEEL (Knowledge Extraction based on Evolutionary Learning) repositories, and the results of the proposed FACRM are also compared with other existing prediction models. The experimental results show that the error rate and generalization performance of the proposed model is better in the majority of data sets with respect to the commonly used models.
A new method for feature selection entitled Weighting Feature Selection (WFS) is also proposed. The WFS method aims to improve the performance of FACRM model. The prediction performance is improved by minimizing the prediction error and reducing the number of generated rules. The prediction results of FACRM by employing WFS have been compared with that of FACRM and Stepwise Regression (SR) models for different data sets. The performance analysis and comparative study show that the proposed prediction model provides an effective approach that can be used within a decision support system. / Applied Science University (ASU) of Jordan
|
Page generated in 0.1341 seconds