  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Enhancing association rules algorithms for mining distributed databases : integration of fast BitTable and multi-agent association rules mining in distributed medical databases for decision support

Abdo, Walid Adly Atteya January 2012
Over the past few years, mining data located in heterogeneous and geographically distributed sites has become a key research issue. Loading distributed data into a centralized location for mining interesting rules is not a good approach, because it raises data privacy concerns and imposes network overhead. The situation becomes worse when the network has limited bandwidth, which is the case in most real-time systems. This has prompted the need for intelligent data analysis to discover the information hidden in these large distributed databases. In this research, we present an incremental approach for building an efficient Multi-Agent based algorithm for mining real-world databases in geographically distributed sites. First, we propose the Distributed Multi-Agent Association Rules algorithm (DMAAR) to minimize all-to-all broadcasting between distributed sites. Analytical calculations show that DMAAR reduces the algorithm complexity and minimizes the message communication cost. The proposed Multi-Agent based algorithm complies with the Foundation for Intelligent Physical Agents (FIPA) specifications, which are regarded as the global standard for communication between agents, thus enabling the proposed algorithm's agents to cooperate with other standard agents. Second, the BitTable Multi-Agent Association Rules algorithm (BMAAR) is proposed. BMAAR includes an efficient BitTable data structure that compresses the database so that it can easily fit into the memory of the local sites. It also includes two bitwise AND/OR operations for quick candidate itemset generation and support counting. Moreover, the algorithm includes three transaction trimming techniques to reduce the size of the mined data.
Third, we propose the Pruning Multi-Agent Association Rules algorithm (PMAAR), which includes three candidate itemset pruning techniques that reduce the large number of generated candidate itemsets and, consequently, the total time of the mining process. The proposed PMAAR algorithm has been compared with existing association rule algorithms on different benchmark datasets and has shown better performance and execution time. Moreover, PMAAR has been implemented on real-world distributed medical databases obtained from more than one hospital in Egypt to discover the hidden association rules in patients' records and to further demonstrate the merits and capabilities of the proposed model. Medical data were obtained anonymously, without the patients' personal details. The analysis helped to identify the presence or absence of the disease based on a minimum number of effective examinations and tests. Thus, the proposed algorithm can help in providing accurate medical decisions based on cost-effective treatments, improving the medical service for patients, reducing the real-time response of the health system and improving the quality of clinical decision making.
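
The bitwise support counting mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's BMAAR implementation: the column-wise layout (one integer bit vector per item), the function names `build_bittable` and `support`, and the sample transactions are all assumptions made for illustration.

```python
def build_bittable(transactions):
    """Map each item to an integer whose bit i is set when transaction i contains it."""
    table = {}
    for tid, items in enumerate(transactions):
        for item in items:
            table[item] = table.get(item, 0) | (1 << tid)
    return table


def support(table, itemset):
    """Count transactions containing every item: AND the bit vectors, then count set bits."""
    items = list(itemset)
    bits = table.get(items[0], 0)
    for item in items[1:]:
        bits &= table.get(item, 0)
    return bin(bits).count("1")


# Hypothetical transactions, invented for this example.
transactions = [
    {"fever", "cough"},
    {"fever", "cough", "fatigue"},
    {"cough"},
    {"fever", "fatigue"},
]
table = build_bittable(transactions)
pair_support = support(table, {"fever", "cough"})
```

Because each item's column is a single integer, counting the support of a candidate itemset needs one AND per item and one bit count, with no rescan of the original transactions.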

Αλγόριθμοι εξαγωγής κανόνων συσχέτισης και εφαρμογές / Association rule mining algorithms and applications

Μουσουρούλη, Ιωάννα 24 October 2008
This master's thesis studies "hidden knowledge" problems in e-commerce and e-learning systems and applications, with the main aim of improving the quality and performance of the services provided to end users. The first chapter presents a scenario for semantically personalized e-learning. The proposed algorithm is based on an ontology that helps structure and manage the content related to a given series of courses, a single course, or a topic. The process is divided into two stages: the offline stage, which comprises data preparation, ontology creation and usage mining, and the online stage, which comprises the production of personalized recommendations. The proposed system first finds an initial set of recommendations based on the domain ontology and then uses frequent itemsets to enrich it, taking into account the navigation of other similar users. In this way, the time required to analyse all frequent itemsets and association rules is reduced. The analysis focuses only on those sets that result from combining the user's current session with the recommendations of the ontology. Although personalization requires several processing and analysis steps, in this approach that obstacle is avoided by executing a significant part of the process offline. The second chapter studies the problem of generating recommendations in an e-commerce application. The proposed hybrid approach aims to produce effective recommendations for the customers of an online store that rents movies.
Knowledge about customers and products is derived from usage data and the structure of the ontology, combined with the customers' ratings of the movies and the application of techniques for matching "similar" customers. When one or more matching criteria are satisfied, other movies can be identified, according to the ontological schema, that have characteristics similar to those the customer has already rented. In the case of a new customer whose history is empty, information from the registration form is analysed in order to classify the customer into a specific customer class and to generate recommendations based on the ontological schema. This integration provides additional knowledge about customer preferences and allows successful recommendations to be produced. Even in the case of the "cold-start problem", where no initial information about the customer's behaviour is available, the approach makes relevant recommendations. The third chapter presents a new approach to the problem of generating recommendations. The previous approaches do not take into account the order in which the user accesses the data, whether e-learning or e-commerce data. This chapter proposes a technique that takes this ordering into account. More specifically, the technique is studied in e-commerce systems and shopping carts. The implementation of the proposed algorithm is presented and analysed. In addition, the results of the algorithm are evaluated on test input data, which show the quality of the recommendations produced.

Didelių duomenų sekų analizės problemos / Data mining problems

Ambraziūnas, Valdas 11 June 2004
The main goal of this thesis is to compare algorithms for finding association rules and to show the usefulness of association rules in business. To achieve this goal, three algorithms are analysed theoretically: 1. The Apriori algorithm, the best-known association rule algorithm, is based on the property “Any subset of a large itemset must be large”. This algorithm assumes that the database is memory-resident. The maximum number of database scans is one more than the cardinality of the largest large itemset. 2. The Sampling algorithm works on a sample of the database before the full database scan. The sample is drawn so that it can be memory-resident. The Sampling algorithm reduces the number of database scans to one in the best case and two in the worst case. 3. The Partitioning algorithm divides the database into partitions and relies on the property “A large itemset must be large in at least one of the partitions”. This algorithm reduces the number of database scans to two and divides the database into partitions such that each partition fits into main memory. Programs were written for all three algorithms, plus a program for the algorithm that enumerates the full set of itemsets. The programs are written in C++. To achieve the best performance, no GUI is provided. Nine test data sets were created to compare the algorithms. Six of them contain real-life data from the telecommunications business. The datasets vary from the...
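
The Apriori property quoted in the abstract ("any subset of a large itemset must be large") can be sketched as follows. This is a minimal illustration in Python rather than the thesis's C++ programs; the function name `apriori`, the absolute `min_support` threshold and the sample baskets are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    large = dict(current)
    k = 2
    while current:
        # Join step: unite pairs of large (k-1)-itemsets into k-item candidates.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # One scan of the transactions per level to count surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        large.update(current)
        k += 1
    return large


# Hypothetical baskets, invented for this example.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(baskets, min_support=3)
```

The loop performs one scan per itemset size and stops after the first level that yields no large itemsets, which matches the scan bound stated in the abstract.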

Association Rule Interactive Post-processing using Rule Schemas and Ontologies - ARIPSO

Marinica, Claudia 26 October 2010
This thesis is concerned with the merging of two active research domains: Knowledge Discovery in Databases (KDD), more precisely the Association Rule Mining technique, and Knowledge Engineering (KE), with a main interest in knowledge representation languages developed around the Semantic Web. In Data Mining, the usefulness of the association rule technique is strongly limited by the huge number and the low quality of delivered rules. Experiments show that rules become almost impossible to use when their number exceeds 100. At the same time, nuggets are often represented by those rare (low support) unexpected association rules which are surprising to the user. Unfortunately, the lower the support is, the larger the volume of rules becomes. Thus, it is crucial to help the decision maker with an efficient technique to reduce the number of rules. To overcome this drawback, several methods have been proposed in the literature, such as itemset concise representations, redundancy reduction, filtering, ranking and post-processing. Even though rule interestingness strongly depends on user knowledge and goals, most of the existing methods are generally based on data structure. For instance, if the user looks for unexpected rules, all the already known rules should be pruned. Or, if the user wants to focus on a specific family of rules, only this subset of rules should be selected. In this context, we address two main issues: the integration of user knowledge in the discovery process and the interactivity with the user. The first issue requires defining an adapted formalism to express user knowledge with accuracy and flexibility, such as ontologies in the Semantic Web. Second, the interactivity with the user allows a more iterative mining process where the user can successively test different hypotheses or preferences and focus on interesting rules. The main contributions of this work can be summarized as follows: (i) A model to represent user knowledge.
First, we propose a new rule-like formalism, called Rule Schema, which allows the user to define his/her expectations regarding the rules through ontology concepts. Second, ontologies allow the user to express his/her domain knowledge by means of a high semantic model. Last, the user can choose, from a set of Operators for interactive processing, the one to be applied over each Rule Schema (i.e. pruning, conforming, unexpectedness, ...). (ii) A new post-processing approach, called ARIPSO (Association Rule Interactive Post-processing using rule Schemas and Ontologies), which helps the user to reduce the volume of the discovered rules and to improve their quality. It consists of an interactive process integrating user knowledge and expectations by means of the proposed model. At each step of ARIPSO, the interactive loop allows the user to change the provided information and to reiterate the post-processing phase, which produces new results. (iii) The implementation in post-processing of the proposed approach. The developed tool is complete and operational, and it implements all the functionalities described in the approach. Also, it makes the connection between different elements like the set of rules and rule schemas stored in PMML/XML files, and the ontologies stored in OWL files and inferred by the Pellet reasoner. (iv) An adapted implementation without post-processing, called ARLIUS (Association Rule Local mining Interactive Using rule Schemas), consisting of an interactive local mining process guided by the user. It allows the user to focus on interesting rules without the necessity to extract all of them, and without a minimum support limit. In this way, the user may explore the rule space incrementally, a small amount at each step, starting from his/her own expectations and discovering their related rules. (v) The experimental study analyzing the approach efficiency and the discovered rule quality.
For this purpose, we used a large real-life questionnaire database concerning customer satisfaction. For ARIPSO, the experimentation was carried out in complete cooperation with the domain expert. For different scenarios, from an input set of nearly 400 thousand association rules, ARIPSO filtered between 3 and 200 rules validated by the expert. Clearly, ARIPSO allows the user to significantly and efficiently reduce the input rule set. For ARLIUS, we experimented with different scenarios over the same questionnaire database and we obtained reduced sets of rules (fewer than 100) with very low support.

Current practices in athletic training clinical education

Pipkin, Jennifer B. January 2001
Many reforms in athletic training education requirements have been taking place in order to strengthen the profession. This research project may help make educational institutions aware of the current changes and requirements in clinical education. The purpose of this study was to determine the current practices in athletic training clinical education at National Collegiate Athletic Association (NCAA) institutions and at accredited versus non-accredited athletic training institutions. The participants (N = 93) consisted of a purposeful sample of head athletic trainers representing accredited and non-accredited athletic training education programs in the three divisions of the NCAA. The on-line survey instrument developed for this study, Current Practices in Athletic Training Clinical Education, obtained information about the demographics, the clinical education of athletic training students, and the certified athletic trainer coverage of sports at NCAA institutions. The instrument was posted on an Internet website through the inQsit computer program. The respondents consisted of 28 (30.4%) head athletic trainers from Division I, 34 (37.0%) from Division II, and 30 (32.6%) from Division III. Thirty-four (38.2%) respondents represented accredited athletic training education programs, 20 (22.5%) represented athletic training programs in candidacy, and 35 (39.3%) represented non-accredited or internship programs. Data were analyzed using percentages, frequency counts, trend analysis, and nonparametric Pearson chi-square analyses. Pearson chi-square analyses revealed that Division I permits athletic training students to cover individual skill sessions and informal summer workouts unsupervised more often than the other NCAA divisions. Chi-square analyses also found that athletic training students at accredited athletic training education programs were more likely to possess CPR and first-aid certification and education on the prevention of disease transmission.
A trend analysis was performed to determine the amount of time freshmen, sophomores, juniors, and seniors spent in direct clinical supervision, supervised field experience, and unsupervised field experience. A linear relationship was found with respect to direct clinical supervision, and quadratic relationships were found with respect to supervised and unsupervised field experience. The results also revealed that athletic training programs that are accredited or in candidacy were more likely than internship programs to respond to 81 to 100% of the moderate-risk sports within four minutes or less. The common perception of many athletic trainers regarding clinical education and the misuse of athletic training students is inconsistent with the current practices. Overall, athletic training students were seldom unsupervised for team practices and home events. In conclusion, the results of this study indicate that collegiate athletic trainers have adjusted well to the recent changes in clinical education requirements and to the recommended guidelines for medical health care coverage. Future research should address athletic training student and athletic training program director responses relative to their head athletic trainers' responses. / School of Physical Education

Aplicando algoritmos de mineração de regras de associação para recuperação de informações multilíngues. / Cross-language information retrieval using algorithms for mining association rules

Geraldo, André Pinto January 2009
This work proposes the use of algorithms for mining association rules as an approach to Cross-Language Information Retrieval. These algorithms have been widely used to analyze market basket data. The idea is to map the problem of finding associations between sales items to the problem of finding equivalent terms in different languages over a parallel corpus. The proposal was validated by means of experiments using different languages, query sets and corpora. The results show that the effectiveness of the proposed approach is comparable to the state of the art, to the monolingual baseline and to query translation via machine translation, even though the latter employs more complex natural language processing techniques. A prototype for cross-language web querying was implemented to test the proposed method. The system accepts keywords in Portuguese, translates them into English and submits the query to several web sites that provide search functionality.
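
The mapping described in the abstract, from market-basket associations to term translations over a parallel corpus, can be sketched as follows. Each aligned sentence pair is treated as one transaction, and a rule "source term -> target term" is kept when its support and confidence pass a threshold. The function name `translation_rules`, the thresholds and the tiny corpus are assumptions for illustration, not the thesis's actual method or data.

```python
from collections import Counter


def translation_rules(pairs, min_support=2, min_confidence=0.6):
    """Return {source_term: [(target_term, confidence), ...]} from aligned sentence pairs."""
    src_count = Counter()
    pair_count = Counter()
    for src_sentence, tgt_sentence in pairs:
        src_terms = set(src_sentence.lower().split())
        tgt_terms = set(tgt_sentence.lower().split())
        src_count.update(src_terms)
        # Every (source term, target term) co-occurrence in an aligned pair counts once.
        pair_count.update((s, t) for s in src_terms for t in tgt_terms)
    rules = {}
    for (s, t), n in pair_count.items():
        confidence = n / src_count[s]
        if n >= min_support and confidence >= min_confidence:
            rules.setdefault(s, []).append((t, confidence))
    for s in rules:
        rules[s].sort(key=lambda tc: -tc[1])
    return rules


# Hypothetical Portuguese-English sentence pairs, invented for this example.
corpus = [
    ("casa branca", "white house"),
    ("casa grande", "big house"),
    ("carro branco", "white car"),
    ("carro grande", "big car"),
]
rules = translation_rules(corpus)
```

On this toy corpus, terms that appear only once (such as "branca") produce no rule because they fall below the support threshold, which is the same filtering that keeps rare co-occurrences out of a market-basket analysis.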

3D matrix-based visualization system of association rules

Wang, Biying January 2016
As the number of association rules grows, it becomes more and more difficult for users to explore interesting rules because of their inherent complexity. Studies based on human perception and intuition show that graphical representation can better illustrate how to handle data by using the capabilities of the human visual system to seek information. A 3D matrix-based visualization system for association rules, called 3DMVS, was implemented in the present study. The main visual representation employs the extended matrix-based approach with rule-to-items mapping for general transaction data sets. A novel method of merging rules and assigning weights is proposed to generate new rules and reduce the dimension of the association rules, which helps users find the more important items in the new rules. Additionally, several interactions, such as sorting, filtering, zooming and rotation, help decision makers explore the rules they are interested in from various aspects. Finally, various evaluation techniques were employed to assess the system from a logical reasoning point of view.
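
A rule-to-items mapping of the kind the abstract mentions can be pictured as a matrix with one row per rule and one column per item. The sketch below is only an assumed encoding (0 = item absent, 1 = item in the antecedent, 2 = item in the consequent); the function name `rule_item_matrix` and the sample rules are hypothetical, and 3DMVS itself may encode and render the matrix differently.

```python
def rule_item_matrix(rules):
    """Build a rules-by-items matrix from (antecedent, consequent) pairs of sets."""
    items = sorted({i for lhs, rhs in rules for i in lhs | rhs})
    column = {item: c for c, item in enumerate(items)}
    matrix = [[0] * len(items) for _ in rules]
    for r, (lhs, rhs) in enumerate(rules):
        for item in lhs:
            matrix[r][column[item]] = 1  # item appears in the antecedent
        for item in rhs:
            matrix[r][column[item]] = 2  # item appears in the consequent
    return items, matrix


# Hypothetical rules, invented for this example.
example_rules = [
    ({"bread"}, {"butter"}),
    ({"bread", "butter"}, {"jam"}),
]
items, matrix = rule_item_matrix(example_rules)
```

A third axis (for example, bar height for support or confidence) could then be attached to each row, which is one plausible way a matrix view extends to 3D.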

Pós-processamento de regras de associação via redes e propagação de rótulos / Post-processing association rules using networks and label propagation

Renan de Padua 27 February 2015
Association is one of the existing data mining techniques; it identifies relationships that occur in a data set. Although association is one of the most widely used techniques, the number of extracted patterns can overload the user to the point that finding something interesting among the large number of patterns obtained becomes a new challenge. To solve this problem, a large part of the association-related work focuses on the post-processing step. These works generally propose post-processing approaches that, according to a given strategy, aim to facilitate the search for patterns that are interesting to the domain. In recent years, these approaches have incorporated the user's knowledge of and/or interest in the domain into the process. However, in the existing approaches, the user must explicitly describe this knowledge and/or interest through some formalism, which requires considerable time and may even lead to incomplete and/or incorrect specifications. In addition, most of the time the user has no idea what is likely to be interesting or from which relations to begin the search. One of the challenges of these approaches is therefore to take the user's knowledge and/or interest into account. It is also necessary to consider the number of rules the user will analyse. The analysis of rules by an expert is expensive and, in most cases, the user wants to explore the generated rules without limiting the exploration to the knowledge he or she already has. It is therefore important that the user evaluates as few rules as possible and that, based on this evaluation, post-processing approaches can assist in the search for the rules the user may consider interesting.
To this end, this work proposes treating post-processing as a transductive semi-supervised classification problem: the user labels only some rules of the set to be explored, using predefined classes (for example, "Interesting" or "Not Interesting"), and all other rules are labeled automatically. Furthermore, labeling some rules makes it possible to capture the user's knowledge of and/or interest in the domain implicitly. The rules therefore need to be modeled in a way that allows: (a) selecting the rules to be labeled by the user in order to capture this knowledge and/or interest implicitly; (b) propagating the labels of the rules already classified by the user to all unlabeled rules. In this work the rules were modeled via networks, because: (i) a wide range of network measures can be used, together with the information provided by the user, to make item (a) possible; (ii) label propagation algorithms can be used to make item (b) possible. The contributions of this work thus lie in the ability to extract the user's knowledge and/or interest according to the characteristics of the database and to direct the exploration without defining in advance what will be explored. In addition, the results demonstrate that the proposed approach, PARLP, is able to direct the user to the knowledge considered interesting while reducing the number of rules to be explored. Finally, this work also shows that it is possible to treat the post-processing of association rules as a label propagation problem.
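
Item (b) of the abstract, propagating a few user labels to the remaining rules over a network, can be sketched as below. This is a generic clamped label-propagation loop, not the PARLP algorithm itself: the Jaccard similarity used as the edge weight, the +1/-1 seed encoding, the function names and the sample rules are all assumptions.

```python
def jaccard(a, b):
    """Similarity between two rules, each given as a set of items."""
    return len(a & b) / len(a | b)


def propagate_labels(rules, seeds, iterations=50):
    """rules: list of item sets; seeds: {rule index: +1.0 (interesting) or -1.0 (not)}."""
    n = len(rules)
    weights = [
        [jaccard(rules[i], rules[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = [seeds.get(i, 0.0) for i in range(n)]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in seeds:
                updated.append(seeds[i])  # user-provided labels stay fixed
                continue
            total = sum(weights[i])
            # Unlabeled rule: weighted average of its neighbours' current scores.
            updated.append(
                sum(w * s for w, s in zip(weights[i], scores)) / total if total else 0.0
            )
        scores = updated
    return ["interesting" if s > 0 else "not interesting" for s in scores]


# Hypothetical rules (as item sets) and two user labels, invented for this example.
rule_items = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "jam"}),
    frozenset({"beer", "chips"}),
    frozenset({"beer", "chips", "salsa"}),
]
labels = propagate_labels(rule_items, seeds={0: 1.0, 2: -1.0})
```

Here the user labels only two of the four rules, and each unlabeled rule inherits the label of the seed it shares items with. A rule with no connection to any seed keeps a score of zero and falls into "not interesting" under this sketch's tie-breaking choice.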

Generalização de regras de associação utilizando conhecimento de domínio e avaliação do conhecimento generalizado / Generalization of association rules through domain knowledge and generalized knowledge evaluation

Veronica Oliveira de Carvalho 23 August 2007
Association, one of the data mining techniques, identifies all the intrinsic associations contained in a database. This characteristic, advantageous on the one hand, causes a large number of patterns to be generated, many of which, even though statistically accepted, are trivial, spurious, or irrelevant to the application. In addition, the traditional association technique generates patterns composed only of items contained in the database, which in general leads to the extraction of very specific knowledge. This specificity makes it difficult for end users, who aim to use and explore useful and comprehensible knowledge, to obtain a general view of the domain. Thus, the post-processing of the discovered rules becomes an important topic, since the rules obtained need to be validated. In this context, this work presents an approach for post-processing association rules that uses domain knowledge, expressed via taxonomies, to obtain a compact and representative set of generalized association rules. In addition, in order to evaluate the representativeness of generalized patterns, a study of the use of objective interest measures applied to generalized association rules is presented. In this study, the semantics of the generalization is taken into account, since each semantics provides a distinct view of the domain. As results of this thesis, it was possible to observe that: a set of association rules can be compacted in the presence of a set of taxonomies; for each generalization semantics there is a set of measures that is more appropriate for evaluating generalized rules.
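
The compaction of a rule set through a taxonomy, as described in the abstract, can be sketched as follows. Each item is replaced by its parent in a one-level taxonomy and rules that become identical are merged. The function name `generalize_rules`, the taxonomy and the rules are hypothetical, and the sketch does not recompute support or any interest measure for the generalized rules, which the thesis treats as a separate evaluation step.

```python
def generalize_rules(rules, taxonomy):
    """rules: iterable of (antecedent, consequent) sets; taxonomy: {item: parent}."""
    generalized = set()
    for antecedent, consequent in rules:
        lhs = frozenset(taxonomy.get(i, i) for i in antecedent)
        # Drop consequent items that already appear in the generalized antecedent.
        rhs = frozenset(taxonomy.get(i, i) for i in consequent) - lhs
        if rhs:
            generalized.add((lhs, rhs))
    return generalized


# Hypothetical taxonomy and specific rules, invented for this example.
taxonomy = {
    "skim milk": "milk",
    "whole milk": "milk",
    "rye bread": "bread",
    "white bread": "bread",
}
specific_rules = [
    ({"skim milk"}, {"rye bread"}),
    ({"whole milk"}, {"white bread"}),
    ({"whole milk"}, {"rye bread"}),
]
general_rules = generalize_rules(specific_rules, taxonomy)
```

In this toy case three specific rules collapse into the single generalized rule milk -> bread, which illustrates how a taxonomy can compact a rule set.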
