1 |
Identification of human gait using genetic algorithms tuned fuzzy logic
Mahmoud, Abdallah Abdel-Rahman Hassan, January 2009
Thesis (M.S.)--University of Texas at El Paso, 2009. / Title from title screen. Vita. CD-ROM. Includes bibliographical references. Also available online.
|
2 |
Automating Geographic Object-Based Image Analysis and Assessing the Methods Transferability : A Case Study Using High Resolution Geografiska Sverigedata™ Orthophotos
Hast, Isak, Mehari, Asmelash January 2016
Geographic object-based image analysis (GEOBIA) is an innovative image classification technique that treats spatial features in an image as objects rather than as pixels, and thus more closely resembles human perception of geographic space. However, the process of a GEOBIA application allows for multiple interpretations. Particularly sensitive parts of the process include image segmentation and training data selection. The multiresolution segmentation algorithm (MSA) is commonly applied. The performance of segmentation depends primarily on the algorithm's scale parameter, since scale controls the size of the image objects produced. Because the scale parameter is unitless, selecting a suitable value is a challenge, which leaves the analyst to a method of trial and error. This can introduce bias. Additionally, apart from the segmentation, training area selection usually means that the data has to be collected manually. This is not only time-consuming but also prone to subjectivity. In order to overcome these challenges, we tested a GEOBIA scheme that involved automatic methods of MSA scale parameterisation and training area selection, which enabled us to classify images more objectively. Three study areas within Sweden were selected. The data used were high-resolution Geografiska Sverigedata (GSD) orthophotos from the Swedish mapping agency, Lantmäteriet. We objectively determined the scale for each classification using a previously published technique embedded as a tool in the eCognition software. Based on the orthophoto inputs, the tool calculated local variance and rate of change at different scales. These figures helped us to determine the scale value for the MSA segmentation. Moreover, we developed in this study a novel method for automatic training area selection. The method is based on thresholded feature statistics layers computed from the orthophoto band derivatives. Thresholds were detected by Otsu’s single and multilevel algorithms. The layers were run through a filtering process which left only those fit for use in the classification process. We also tested the transferability of classification rule-sets for two of the study areas. This test helped us to investigate the degree to which automation can be realised. In this study we have made progress toward a more objective way of object-based image classification, realised by automating the scheme. Particularly noteworthy is the proposed algorithm for automatic training area selection, which, compared to manual selection, restricts human intervention to a minimum. Results of the classification show overall well-delineated classes, in particular the border between open area and forest, to which the elevation data contributed. On the other hand, some challenges persist in separating deciduous from coniferous forest. Furthermore, although water was accurately classified in most instances, in one of the study areas the water class showed contradictory results between its thematic and positional accuracy, which stresses the importance of assessing the result on more than thematic accuracy alone. From the transferability test we noted the importance of considering the spatial and spectral characteristics of an area before transferring rule-sets, as these factors are key to determining whether a transfer is possible.
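The abstract above says thresholds were detected with Otsu's algorithms but gives no implementation details. As a rough illustration only, the following Python sketch shows single-level Otsu thresholding on a plain list of numbers. The function name `otsu_threshold`, the 256-bin histogram and the idea of feeding it one layer of feature statistics are assumptions made for this example, not details taken from the thesis.

```python
def otsu_threshold(values, bins=256):
    """Return the upper edge of the histogram bin that maximises the
    between-class variance (single-level Otsu). Illustrative sketch only."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # constant input: no meaningful split exists

    # Build a fixed-width histogram over the value range.
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        hist[idx] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    weight_low = 0      # number of samples in the lower class
    sum_low = 0.0       # sum of bin indices in the lower class
    best_var, best_bin = -1.0, 0
    for i, h in enumerate(hist):
        weight_low += h
        sum_low += i * h
        if weight_low == 0:
            continue
        weight_high = total - weight_low
        if weight_high == 0:
            break
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to a constant factor.
        var_between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, i

    return lo + (best_bin + 1) * width


# Hypothetical feature values with two clear groups.
sample = [1, 2, 3, 2, 1, 50, 51, 52, 51]
threshold = otsu_threshold(sample)
low_class = [v for v in sample if v <= threshold]
```

Values at or below the returned threshold fall in the lower class. A multilevel variant, which the abstract also mentions, would search for several cut points at once and is not shown here.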
|
3 |
Data Mining Using Neural Networks
Rahman, Sardar Muhammad Monzurur, mrahman99@yahoo.com January 2006
Data mining is the search for relationships and global patterns in large databases that are increasing in size. Data mining is beneficial for anyone who has a huge amount of data, for example customer and business data, or transaction, marketing, financial, manufacturing and web data. The results of data mining are also referred to as knowledge, in the form of rules, regularities and constraints. Rule mining is one of the popular data mining methods, since rules provide concise statements of potentially important information that is easily understood by end users, as well as actionable patterns. Rule mining has received a good deal of attention and enthusiasm from data mining researchers because it is capable of solving many data mining problems, such as classification, association, customer profiling, summarization, segmentation and many others. This thesis makes several contributions by proposing rule mining methods using genetic algorithms and neural networks. The thesis first proposes rule mining methods using a genetic algorithm. These methods are based on an integrated framework but are capable of mining three major classes of rules. Moreover, the rule mining processes in these methods are controlled by tuning two data mining measures, support and confidence. The thesis shows how to build predictive data mining models using the rules that result from the proposed methods. Another key contribution of the thesis is the proposal of rule mining methods using supervised neural networks. The thesis mathematically analyses the Widrow-Hoff learning algorithm of a single-layered neural network, which results in a foundation for rule mining algorithms using single-layered neural networks. Three rule mining algorithms using single-layered neural networks are proposed for the three major classes of rules on the basis of the proposed theorems. The thesis also looks at the problem of rule mining where user guidance is absent and proposes a guided rule mining system to overcome this problem. The thesis extends this work further by comparing the performance of the algorithm used in the proposed guided rule mining system with the Apriori data mining algorithm. Finally, the thesis studies the Kohonen self-organizing map as an unsupervised neural network for rule mining algorithms. Two approaches are adopted, based on how self-organizing maps are applied in rule mining models. In the first approach, a self-organizing map is used for clustering, which provides class information to the rule mining process. In the second approach, automated rule mining takes the place of trained neurons as it grows in a hierarchical structure.
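The abstract above names support and confidence as the two measures used to control rule mining. For readers unfamiliar with them, here is a minimal Python sketch of their usual textbook definitions. The function names and the toy transactions are assumptions for illustration and do not come from the thesis.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    support_antecedent = support(transactions, antecedent)
    if support_antecedent == 0:
        return 0.0  # assumption: treat a never-firing rule as confidence 0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support_antecedent


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
```

A rule miner would typically keep only rules whose support and confidence both exceed user-chosen minimum values, which is one plausible reading of "tuning" these measures.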
|
4 |
Enhancing fuzzy associative rule mining approaches for improving prediction accuracy : integration of fuzzy clustering, apriori and multiple support approaches to develop an associative classification rule base
Sowan, Bilal Ibrahim January 2011
Building an accurate and reliable prediction model for different application domains is one of the most significant challenges in knowledge discovery and data mining. This thesis focuses on building and enhancing a generic predictive model for estimating a future value by extracting association rules (knowledge) from a quantitative database. This model is applied to several data sets obtained from different benchmark problems, and the results are evaluated through extensive experimental tests. The thesis presents an incremental development process for the prediction model with three stages. Firstly, a Knowledge Discovery (KD) model is proposed by integrating Fuzzy C-Means (FCM) with the Apriori approach to extract Fuzzy Association Rules (FARs) from a database for building a Knowledge Base (KB) to predict a future value. The KD model has been tested with two road-traffic data sets. Secondly, the initial model has been further developed by including a diversification method in order to improve the reliability of the FARs and to find the best and most representative rules. The resulting Diverse Fuzzy Rule Base (DFRB) maintains high-quality and diverse FARs, offering a more reliable and generic model. The model uses FCM to transform quantitative data into fuzzy data, while a Multiple Support Apriori (MSapriori) algorithm is adapted to extract the FARs from the fuzzy data. The correlation values for these FARs are calculated, and an efficient orientation for filtering FARs is performed as a post-processing method. The diversity of the FARs is maintained through clustering of the FARs, based on the concept of the sharing function technique used in multi-objective optimization. The best and most diverse FARs are obtained as the DFRB for use within the Fuzzy Inference System (FIS) for prediction. The third stage of development proposes a hybrid prediction model called the Fuzzy Associative Classification Rule Mining (FACRM) model. This model integrates the improved Gustafson-Kessel (G-K) algorithm, the proposed Fuzzy Associative Classification Rules (FACR) algorithm and the proposed diversification method. The improved G-K algorithm transforms quantitative data into fuzzy data, while the FACR algorithm generates significant rules (Fuzzy Classification Association Rules (FCARs)) by employing the improved multiple support threshold, associative classification and vertical scanning format approaches. These FCARs are then filtered by calculating the correlation value and the distance between them. The advantage of the proposed FACRM model is that it builds a generalized prediction model that is able to deal with different application domains. The validation of the FACRM model is conducted using different benchmark data sets from the University of California, Irvine (UCI) machine learning repository and the KEEL (Knowledge Extraction based on Evolutionary Learning) repository, and the results of the proposed FACRM are also compared with other existing prediction models. The experimental results show that the error rate and generalization performance of the proposed model are better on the majority of data sets than those of the commonly used models. A new method for feature selection entitled Weighting Feature Selection (WFS) is also proposed. The WFS method aims to improve the performance of the FACRM model. The prediction performance is improved by minimizing the prediction error and reducing the number of generated rules. The prediction results of FACRM with WFS have been compared with those of FACRM and Stepwise Regression (SR) models on different data sets. The performance analysis and comparative study show that the proposed prediction model provides an effective approach that can be used within a decision support system.
|
5 |
Implementation of a classification algorithm for institutional analysis
Sun, Hongliang, University of Lethbridge. Faculty of Arts and Science January 2008
The report presents an implementation of a classification algorithm for the Institutional Analysis Project. The algorithm used in this project is the decision tree classification algorithm, which uses a gain ratio attribute selection method. The algorithm discovers the hidden rules from the student records, which are used to predict whether or not other students are at risk of dropping out. It is shown that special rules exist in different data sets, each with their natural hidden knowledge. In other words, the rules that are obtained depend on the data that is used for classification. In our preliminary experiments, we show that between 55 and 78 percent of data with unknown class labels can be correctly classified, using the rules obtained from data whose class labels are known. We feel this is acceptable, given the large number of records, attributes, and attribute values that are used in the experiments. The project results are useful for large data set analysis. / viii, 38 leaves ; 29 cm.
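The abstract above refers to gain ratio as the attribute selection method of the decision tree. As a hedged illustration, the Python sketch below computes gain ratio in its usual form (information gain divided by split information) for a categorical attribute. The record fields `attendance`, `part_time` and `dropped_out` are hypothetical and are not taken from the report's student data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Information gain of `attribute` about `target`, divided by the
    split information of `attribute`."""
    labels = [row[target] for row in rows]

    # Group the target labels by attribute value.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])

    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder

    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical student records.
students = [
    {"attendance": "low",  "part_time": "yes", "dropped_out": "yes"},
    {"attendance": "low",  "part_time": "no",  "dropped_out": "yes"},
    {"attendance": "high", "part_time": "yes", "dropped_out": "no"},
    {"attendance": "high", "part_time": "no",  "dropped_out": "no"},
]
best = max(["attendance", "part_time"],
           key=lambda a: gain_ratio(students, a, "dropped_out"))
```

A tree builder would split on the attribute with the highest gain ratio and then repeat the procedure on each resulting subset.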
|
6 |
Αυτόματη παραγωγή έμπειρων συστημάτων με συντελεστές βεβαιότητας από σύνολα δεδομένων / Automatic generation of expert systems with certainty factors from datasets
Κόβας, Κωνσταντίνος 11 August 2011
The main objective of this thesis is to present a method for the automatic generation of expert systems, by extracting knowledge from datasets and representing it in the form of production rules. We use a supervised machine learning method resembling Classification Rule Mining, although classification is not our only goal. Important operational characteristics of expert systems, such as explanation of conclusions and dynamic update of the knowledge base, are also taken into account. Our approach is implemented within an existing tool, initially developed in the author's undergraduate thesis to compare methods for combining uncertain conclusions about the same event, based on the uncertainty model of Certainty Factors. That tool could generate expert systems (in the CLIPS language) that use those methods. The main aim of this thesis is to investigate the field of machine learning in order to extend that tool so that it generates expert systems in a more automatic, efficient and functional fashion.
More specifically, the architecture has been modified to support output variables classified in more than two classes (Multiclass Classification). An extension of the system made it possible to generate classification rules for additional variables (apart from the output variable), for which the final user of the expert system cannot provide values. This gives the ability to design more complex rule hierarchies, which are represented in an easy-to-understand tree form. Furthermore, the certainty factors model has been revised and an additional method of computing them is offered, following the definitions in MYCIN’s model. Experimental results showed improved performance, especially for prediction of minority classes in imbalanced datasets. Feature ranking and subset selection techniques help to achieve the generation task in a more automatic and efficient way. Other enhancements include the ability to produce expert systems that dynamically update the certainty factors in their rules, the generation of rules and functions for interaction with the end-user and a graphical interface for the produced expert system.
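The abstract above concerns combining uncertain conclusions about the same event under the Certainty Factors model. The thesis does not spell out a formula here, so the Python sketch below uses the commonly cited MYCIN/EMYCIN-style parallel combination function as an assumption. The name `combine_cf` and the choice to raise an error on total conflict are illustrative, not the tool's actual behaviour.

```python
from functools import reduce


def combine_cf(cf1, cf2):
    """Combine two certainty factors in [-1, 1] that support or oppose the
    same conclusion (MYCIN/EMYCIN-style rule, assumed for this sketch)."""
    for cf in (cf1, cf2):
        if not -1.0 <= cf <= 1.0:
            raise ValueError("certainty factors must lie in [-1, 1]")

    if cf1 >= 0 and cf2 >= 0:
        # Two pieces of supporting evidence reinforce each other.
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        # Two pieces of opposing evidence reinforce each other.
        return cf1 + cf2 * (1 + cf1)

    # Mixed evidence: the weaker factor discounts the stronger one.
    denominator = 1 - min(abs(cf1), abs(cf2))
    if denominator == 0:
        raise ValueError("total conflict (+1 combined with -1) is undefined")
    return (cf1 + cf2) / denominator


# Hypothetical certainty factors produced by three rules for one conclusion.
overall = reduce(combine_cf, [0.6, 0.4, -0.2])
```

Folding the function over a list, as in the last line, combines the conclusions of any number of rules that fire for the same fact.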
|
7 |
Análise de desempenho dos algoritmos Apriori e Fuzzy Apriori na extração de regras de associação aplicados a um Sistema de Detecção de Intrusos. / Performance analysis of algorithms Apriori and Fuzzy Apriori in association rules mining applied to a System for Intrusion Detection.
Ricardo Ferreira Vieira de Castro 20 February 2014
The mining of association rules (ARM - Association Rule Mining) from quantitative data has been of great research interest in the area of data mining. With the increasing size of databases, there is large investment in research on algorithms that improve performance with respect to the number of rules, their relevance and computational performance. The APRIORI algorithm, traditionally used in the extraction of association rules, was originally created to work with categorical attributes. In order to use continuous, or quantitative, attributes, it is necessary to transform them, through discretization, into categorical attributes, where each category corresponds to a discrete interval. The more traditional discretization methods produce intervals with sharp boundaries, which may underestimate or overestimate elements near the boundaries of the partitions, and therefore lead to an inaccurate semantic representation. One way to address this problem is to create soft partitions with smoothed boundaries. In this work, a fuzzy partition of the continuous variables is used, which is based on fuzzy set theory and transforms the quantitative attributes into partitions of linguistic terms. The algorithms for mining fuzzy association rules (FARM - Fuzzy Association Rule Mining) work on this principle and, in this work, the FUZZYAPRIORI algorithm, which belongs to this category, is used. The extracted rules are expressed in linguistic terms, which is more natural and interpretable for human reasoning. In this dissertation, we compare the traditional APRIORI and FUZZYAPRIORI algorithms through the classification results of associative classifiers based on rules extracted by these algorithms. These classifiers were applied to a database of TCP/IP connection records intended for the creation of an Intrusion Detection System.
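The abstract above contrasts sharp discretization with a fuzzy partition of continuous attributes. The dissertation does not state which membership functions were used, so the Python sketch below assumes triangular ones over a hypothetical attribute ranging from 0 to 100 with the linguistic terms low, medium and high. All names and cut points are illustrative.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x == peak:
        return 1.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Assumed fuzzy partition of a 0..100 attribute into three linguistic terms.
TERMS = {
    "low": (-50.0, 0.0, 50.0),
    "medium": (0.0, 50.0, 100.0),
    "high": (50.0, 100.0, 150.0),
}


def fuzzify(x):
    """Degree of membership of x in each linguistic term."""
    return {name: triangular(x, *abc) for name, abc in TERMS.items()}


def crisp_label(x):
    """Sharp-boundary discretization with cut points at 25 and 75."""
    if x < 25:
        return "low"
    if x < 75:
        return "medium"
    return "high"


# Two nearly identical values straddle a sharp boundary.
labels = crisp_label(24), crisp_label(26)
degrees = fuzzify(24), fuzzify(26)
```

With the sharp cut, 24 and 26 land in different categories, whereas their fuzzy membership degrees differ only slightly. This is the boundary effect the abstract describes.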
|
9 |
Enhancing Fuzzy Associative Rule Mining Approaches for Improving Prediction Accuracy. Integration of Fuzzy Clustering, Apriori and Multiple Support Approaches to Develop an Associative Classification Rule Base
Sowan, Bilal I. January 2011
Building an accurate and reliable prediction model for different application domains is one of the most significant challenges in knowledge discovery and data mining. This thesis focuses on building and enhancing a generic predictive model for estimating a future value by extracting association rules (knowledge) from a quantitative database. This model is applied to several data sets obtained from different benchmark problems, and the results are evaluated through extensive experimental tests.
The thesis presents an incremental development process for the prediction model with three stages. Firstly, a Knowledge Discovery (KD) model is proposed by integrating Fuzzy C-Means (FCM) with the Apriori approach to extract Fuzzy Association Rules (FARs) from a database for building a Knowledge Base (KB) to predict a future value. The KD model has been tested with two road-traffic data sets.
Secondly, the initial model has been further developed by including a diversification method in order to improve the reliability of the FARs and to find the best and most representative rules. The resulting Diverse Fuzzy Rule Base (DFRB) maintains high-quality and diverse FARs, offering a more reliable and generic model. The model uses FCM to transform quantitative data into fuzzy data, while a Multiple Support Apriori (MSapriori) algorithm is adapted to extract the FARs from the fuzzy data. The correlation values for these FARs are calculated, and an efficient orientation for filtering FARs is performed as a post-processing method. The diversity of the FARs is maintained through clustering of the FARs, based on the concept of the sharing function technique used in multi-objective optimization. The best and most diverse FARs are obtained as the DFRB for use within the Fuzzy Inference System (FIS) for prediction.
The third stage of development proposes a hybrid prediction model called the Fuzzy Associative Classification Rule Mining (FACRM) model. This model integrates the improved Gustafson-Kessel (G-K) algorithm, the proposed Fuzzy Associative Classification Rules (FACR) algorithm and the proposed diversification method. The improved G-K algorithm transforms quantitative data into fuzzy data, while the FACR algorithm generates significant rules (Fuzzy Classification Association Rules (FCARs)) by employing the improved multiple support threshold, associative classification and vertical scanning format approaches. These FCARs are then filtered by calculating the correlation value and the distance between them. The advantage of the proposed FACRM model is that it builds a generalized prediction model that is able to deal with different application domains. The validation of the FACRM model is conducted using different benchmark data sets from the University of California, Irvine (UCI) machine learning repository and the KEEL (Knowledge Extraction based on Evolutionary Learning) repository, and the results of the proposed FACRM are also compared with other existing prediction models. The experimental results show that the error rate and generalization performance of the proposed model are better on the majority of data sets than those of the commonly used models.
A new method for feature selection entitled Weighting Feature Selection (WFS) is also proposed. The WFS method aims to improve the performance of the FACRM model. The prediction performance is improved by minimizing the prediction error and reducing the number of generated rules. The prediction results of FACRM with WFS have been compared with those of FACRM and Stepwise Regression (SR) models on different data sets. The performance analysis and comparative study show that the proposed prediction model provides an effective approach that can be used within a decision support system. / Applied Science University (ASU) of Jordan
|
10 |
Σύγκριση μεθόδων δημιουργίας έμπειρων συστημάτων με κανόνες για προβλήματα κατηγοριοποίησης από σύνολα δεδομένων / Comparison of methods for creating rule-based expert systems for classification problems from datasets
Τζετζούμης, Ευάγγελος 31 January 2013
The aim of this thesis is to compare several classification methods that are based on knowledge representation with rules, by creating expert systems from known data sets. To apply those methods and to create and implement the corresponding expert systems, we use various tools: (a) ACRES, a tool for the automatic generation of expert systems with certainty factors. The certainty factors can be calculated in two different ways, and two types of expert systems can be produced, based on two different methods of combining certainty factors (that of MYCIN and a generalization of the MYCIN method that uses weights calculated by a genetic algorithm). (b) WEKA, a tool that contains machine learning algorithms. Specifically, we use J48, an implementation of the well-known C4.5 algorithm, which produces decision trees, i.e. rules. (c) CLIPS, a shell for rule-based programming. Here, the rules encoded in the decision tree produced by WEKA are extracted and implemented in CLIPS, with possible modifications. (d) FuzzyCLIPS, a shell for creating fuzzy expert systems. It is an extension of CLIPS that uses fuzzy rules and certainty factors. Here, the expert system created with CLIPS is converted into a fuzzy expert system by making some variables fuzzy. (e) GUI Ant-Miner, a tool for extracting classification rules from a given data set using a sequential covering model, such as the AntMiner algorithm.
Based on the above methods and tools, expert systems were created from five classification data sets from the UCI Machine Learning Repository. Those systems were evaluated on their classification performance using well-known metrics (accuracy, sensitivity, specificity and precision). From the comparison of the methods on the five data sets, we conclude the following: (a) If we want results with greater accuracy and high speed, we should probably turn to WEKA. (b) If we also want to do parallel calculations, the only tool that provides this capability is FuzzyCLIPS, at the cost of a little speed and accuracy. (c) GUI Ant-Miner works as well as WEKA in terms of accuracy, but it is slower. (d) ACRES works well when we work with subsets of the variables, so that it produces a relatively small number of rules and covers almost all the instances of the test set. For our data sets, ACRES is not considered very reliable, in the sense that we are forced to work with a subset of the variables rather than all the variables of the data set. The more variables we include in the subset in ACRES, the slower it becomes.
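The abstract above evaluates the generated expert systems with accuracy, sensitivity, specificity and precision. As a brief reminder of how these are usually defined for a two-class problem, here is a small Python sketch. The function name and the confusion-matrix counts are made up for illustration and are not results from the thesis.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives.
    """
    def ratio(numerator, denominator):
        # Assumption for this sketch: report 0.0 when a metric is undefined.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # recall of the positive class
        "specificity": ratio(tn, tn + fp),   # recall of the negative class
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical counts for one classifier on a 100-instance test set.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
```

For multi-class data sets, these values are often computed per class (one class against the rest) and then averaged. Whether the thesis did so is not stated in the abstract.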
|