Global ETD Search

51	"Pós-processamento de regras de associação" / Post-processing of association rules Edson Augusto Melanda 30 November 2004 The growing demand for methods of analysis and knowledge discovery in large databases has strengthened research in data mining. Association rules are among the tasks associated with this area. Several algorithms have been proposed for mining association rules, but they generally produce a very large number of rules, which makes knowledge post-processing a complex and challenging phase. Measures exist to support this rule-evaluation phase, but there is no intuitive method for ranking and selecting rules, and there are no specific methodologies for selecting rules using more than one measure simultaneously. The objective of this thesis is to propose, develop and implement a methodology for post-processing association rules. In the proposed methodology, small groups of rules identified as potentially interesting are presented to the expert user for evaluation. To this end, methods and techniques used in knowledge post-processing, objective measures for evaluating association rules, and rule-generating algorithms were analyzed. Experiments were then carried out to identify the potential of these measures as filters for association rules. A graphical evaluation supported the study of the measures and the specification of the proposed methodology. The novel aspect of the methodology is the use of the Pareto method and the combination of measures to select association rules. Finally, an environment for evaluating association rules, named ARInE, was implemented, making the proposed methodology usable in practice. Association Rules Data mining post-processing
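The Pareto-based selection mentioned in entry 51 can be illustrated with a minimal sketch. It is not the ARInE implementation: the rule records, the measure values, and the function names below are hypothetical, and the sketch simply assumes that every measure is to be maximized.

```python
# Minimal sketch of Pareto-based rule selection (illustrative only).
# The rule ids and measure values are made up for the example.

def dominates(a, b, measures):
    """True if rule a is at least as good as b on every measure
    and strictly better on at least one."""
    return (all(a[m] >= b[m] for m in measures)
            and any(a[m] > b[m] for m in measures))

def pareto_front(rules, measures):
    """Keep the rules that no other rule dominates."""
    return [r for r in rules
            if not any(dominates(other, r, measures) for other in rules)]

rules = [
    {"id": 1, "support": 0.30, "confidence": 0.60, "lift": 1.1},
    {"id": 2, "support": 0.10, "confidence": 0.90, "lift": 2.0},
    {"id": 3, "support": 0.10, "confidence": 0.80, "lift": 1.5},  # dominated by rule 2
    {"id": 4, "support": 0.25, "confidence": 0.55, "lift": 1.0},  # dominated by rule 1
]

front = pareto_front(rules, ["support", "confidence", "lift"])
front_ids = [r["id"] for r in front]
print(front_ids)
```

Rules 1 and 2 survive because neither beats the other on all three measures, which is what lets several measures be combined without collapsing them into a single score.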
52	Método para mapeamento entre terminologias em saúde, visando a interoperabilidade entre sistemas de informação / Method for the mapping between health terminologies aiming systems interoperability Thiago Fernandes de Freitas Dias 11 September 2014 High availability of health information through information systems can be provided only by systems that are able to exchange data securely and consistently. To this end, these systems need to be interoperable, that is, capable of exchanging information that is understood at both ends. One of the most important characteristics of such systems is the use of health terminologies, which allow clinical terms to be coded in a robust and consistent manner. Some of the best-known and most widely used terminologies are SNOMED-CT, ICD-CM, ICD, LOINC, NANDA, TUSS, CBHPM, and the SUS Procedures Table, among others. When systems do not use the same terminology to encode the same concept, mappings and translations between the terminologies are necessary. Mapping between terminologies consists of establishing the relevant associations between them, so that each term belonging to one can be associated unambiguously with a term of the other. This mapping is typically created by domain experts, who analyze the two terminologies in question and set these associations manually. In this work, we propose a methodology that aims to facilitate this type of mapping through the use of two resources: Association Rules, for extracting preexisting associations between the terminologies from clinical records; and Textual Search, for pairing concepts of the two terminologies based on the identification of common terms. The method aids the experts in creating these mappings by suggesting links between the terminologies obtained through Association Rules or Textual Search. As a result of this work, we obtained a generic methodology for mapping between terminologies that is able to successfully assist the experts. In approximately 40% of cases, the experts agreed with one of the suggestions presented. As a complement, we obtained a partial mapping between two specific terminologies for coding surgical procedures, ICD9-CM and TUSS, which served as the use case to validate the methodology. Association rules Health terminologies Interoperability
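The two suggestion sources described in entry 52 can be sketched roughly as follows. This is only an illustration under stated assumptions: the codes ("A1", "B1", ...) and term descriptions are placeholders rather than real ICD9-CM or TUSS entries, rule confidence is taken as the plain co-occurrence ratio in clinical records, and Textual Search is approximated by word-set (Jaccard) overlap, which may differ from the thesis's actual search method.

```python
from collections import Counter

def rule_suggestions(records, min_conf=0.5):
    """records: list of (source_codes, target_codes) pairs, one per clinical record.
    Suggest target codes whose confidence given the source code is >= min_conf."""
    src_count, pair_count = Counter(), Counter()
    for src_codes, tgt_codes in records:
        for s in set(src_codes):
            src_count[s] += 1
            for t in set(tgt_codes):
                pair_count[(s, t)] += 1
    suggestions = {}
    for (s, t), n in pair_count.items():
        conf = n / src_count[s]  # confidence of the rule s -> t
        if conf >= min_conf:
            suggestions.setdefault(s, []).append((t, conf))
    for ranked in suggestions.values():
        ranked.sort(key=lambda pair: -pair[1])
    return suggestions

def text_suggestions(src_terms, tgt_terms, min_overlap=0.3):
    """Pair terms whose descriptions share enough words (Jaccard overlap)."""
    suggestions = {}
    for s_code, s_text in src_terms.items():
        s_words = set(s_text.lower().split())
        for t_code, t_text in tgt_terms.items():
            t_words = set(t_text.lower().split())
            union = s_words | t_words
            score = len(s_words & t_words) / len(union) if union else 0.0
            if score >= min_overlap:
                suggestions.setdefault(s_code, []).append((t_code, score))
    for ranked in suggestions.values():
        ranked.sort(key=lambda pair: -pair[1])
    return suggestions

# Placeholder data, not real terminology codes.
records = [(["A1"], ["B1"]), (["A1"], ["B1"]), (["A1"], ["B2"]), (["A2"], ["B2"])]
by_rules = rule_suggestions(records)
by_text = text_suggestions(
    {"A1": "repair of inguinal hernia"},
    {"B1": "inguinal hernia repair", "B2": "knee arthroscopy"},
)
print(by_rules, by_text)
```

In a workflow like the one the abstract describes, both suggestion lists would be shown to a domain expert, who accepts or rejects each candidate link.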
53	FINDING TEMPORAL ASSOCIATION RULES BETWEEN FREQUENT PATTERNS IN MULTIVARIATE TIME SERIES TATAVARTY, GIRIDHAR 03 April 2006 No description available. Computer Science multivariate time series data mining temporal association rules suffix trees summarization
54	資料挖掘在房地產價格上之運用 / Data Mining Technique with an Application to the Real Estate Price Estimation 高健維 Unknown Date In today's information age, statistical and artificial intelligence techniques can uncover valuable hidden patterns in a corporation's large databases. In-depth analysis of the data yields knowledge, and different models can be built for different business problems to serve as a reference for decision making. Further benefits include predicting future events and estimating how the public will respond to new products or promotions, which helps companies understand their customers' needs and behaviors and deliver quality services at the right time. Data mining has become a popular topic in the field of database applications in recent years. Although the term is fashionable, the underlying discipline is not new: before World War II, the US government already used such analysis methods for the census and for military purposes. With advances in information technology, the emergence of new tools, and the development of network communication, data mining can often uncover relationships beyond the reach of simple summarization, and it has become part of business intelligence. This thesis applies association rules, a data mining technique, to the analysis of real estate prices, deriving effective association rules and giving an integrated discussion of how housing prices relate to factors in the surrounding environment. With these rules, home buyers can follow a suitable investment plan, and real estate agents can prepare suitable sales plans for their target groups. data mining Apriori algorithm association rules multi-dimensional association rules
55	Fuzzy GUHA Ralbovský, Martin January 2006 The GUHA method is one of the oldest methods of exploratory data analysis, which is regarded as part of the data mining or knowledge discovery in databases (KDD) scientific area. Unlike many other methods of data mining, the GUHA method has firm theoretical foundations in logic and statistics. Within the method, finding interesting knowledge corresponds to finding special formulas in a sufficiently rich logical calculus, called observational calculus. The main topic of the thesis is the application of the "fuzzy paradigm" to the GUHA method. By "fuzzy paradigm" we mean approaches that use many-valued membership degrees or truth values, namely fuzzy set theory and fuzzy logic. The thesis does not aim to cover all aspects of this application; it focuses mainly on: association rules as the most prevalent type of formulas mined by the GUHA method; the use of fuzzy data; logical aspects of fuzzy association rule mining; comparison of the GUHA theory with mainstream fuzzy association rules; and implementation of the theory using the bit string approach. The thesis thoroughly elaborates the theory of fuzzy association rules, using the theoretical apparatus of both fuzzy set theory and fuzzy logic. Fuzzy set theory is used mainly to compare the GUHA method with existing mainstream approaches to formalizing fuzzy association rules, which were studied in detail. Fuzzy logic is used to define a novel class of logical calculi, called logical calculi of fuzzy association rules (LCFAR), for the logical representation of fuzzy association rules. The problem of the existence of deduction rules in LCFAR is dealt with in depth. A suitable part of the proposed theory is implemented in the Ferda system using the bit string approach. In this approach, characteristics of the examined objects are represented as strings of bits, which in the crisp case enables efficient computation. 
To maintain this feature in the fuzzy case as well, thorough low-level testing of data structures and algorithms for fuzzy bit strings was carried out as part of the thesis.
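The crisp bit string approach mentioned in entry 55 can be illustrated with a small sketch. It is not taken from the Ferda system: the data columns and helper names are hypothetical, and Python integers stand in for bit strings. Each attribute value becomes one bit string with one bit per examined object, so the contingency counts of a rule reduce to bitwise operations and bit counting.

```python
def to_bits(column, value):
    """Bit i is set when object i has the given value."""
    bits = 0
    for i, v in enumerate(column):
        if v == value:
            bits |= 1 << i
    return bits

def popcount(bits):
    """Number of set bits in a non-negative integer."""
    return bin(bits).count("1")

# Hypothetical data: one entry per examined object.
smoker = ["yes", "no", "yes", "yes", "no", "yes"]
cough = ["yes", "yes", "yes", "no", "no", "yes"]

n = len(smoker)
mask = (1 << n) - 1           # keeps negations within n bits
ant = to_bits(smoker, "yes")  # antecedent: smoker = yes
suc = to_bits(cough, "yes")   # succedent: cough = yes

# Contingency counts of the rule "smoker = yes -> cough = yes".
a = popcount(ant & suc)           # antecedent and succedent
b = popcount(ant & ~suc & mask)   # antecedent without succedent
c = popcount(~ant & suc & mask)   # succedent without antecedent
d = popcount(~ant & ~suc & mask)  # neither

support = a / n
confidence = a / (a + b)
print(a, b, c, d, support, confidence)
```

The fuzzy case replaces the single bit per object with a membership degree, which is why the thesis needed dedicated data structures to keep the computation efficient.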
56	Association Rule Based Classification Palanisamy, Senthil Kumar 03 May 2006 In this thesis, we focused on the construction of classification models based on association rules. Although association rules have been predominantly used for data exploration and description, the interest in using them for prediction has rapidly increased in the data mining community. In order to mine only rules that can be used for classification, we modified the well-known association rule mining algorithm Apriori to handle user-defined input constraints. We considered constraints that require the presence/absence of particular items, or that limit the number of items, in the antecedents and/or the consequents of the rules. We developed a characterization of those itemsets that will potentially form rules that satisfy the given constraints. This characterization allows us to prune, during itemset construction, itemsets such that neither they nor any of their supersets will form valid rules. This improves the time performance of itemset construction. Using this characterization, we implemented a classification system based on association rules and compared the performance of several model construction methods, including CBA, and several model deployment modes to make predictions. Although the data mining community has dealt only with the classification of single-valued attributes, there are several domains in which the classification target is set-valued. Hence, we enhanced our classification system with a novel approach to handle the prediction of set-valued class attributes. Since the traditional classification accuracy measure is inappropriate in this context, we developed an evaluation method for set-valued classification based on the E-Measure. 
Furthermore, we enhanced our algorithm by not relying on the typical support/confidence framework, and instead mining for the best possible rules above a user-defined minimum confidence and within a desired range for the number of rules. This avoids long mining times that might produce large collections of rules with low predictive power. For this purpose, we developed a heuristic function to determine an initial minimum support and then adjusted it using a binary search strategy until a number of rules within the given range was obtained. We implemented all of our techniques described above in WEKA, an open source suite of machine learning algorithms. We used several datasets from the UCI Machine Learning Repository to test and evaluate our techniques. Itemset Pruning Association Rules Adaptive Minimal Support Associative Classification Classification Association rule mining Data mining Classification Data processing
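The adaptive minimum support search described in entry 56 can be sketched as follows. This is an assumption-laden illustration, not the WEKA implementation from the thesis: the brute-force rule counter, the function names, the toy transactions, and the fixed initial support (standing in for the thesis's heuristic function) are all hypothetical.

```python
from itertools import combinations

def count_rules(transactions, min_support, min_conf):
    """Brute-force count of rules meeting both thresholds (toy data only)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    total = 0
    for k in range(2, len(items) + 1):
        for itemset in combinations(items, k):
            s = supp(frozenset(itemset))
            if s == 0 or s < min_support:
                continue
            for r in range(1, k):
                for antecedent in combinations(itemset, r):
                    if s / supp(frozenset(antecedent)) >= min_conf:
                        total += 1
    return total

def tune_min_support(counter, initial, target_min, target_max, max_iter=20):
    """Binary search on min support until the rule count falls in the target range."""
    lo, hi = 0.0, 1.0
    support = initial
    for _ in range(max_iter):
        n_rules = counter(support)
        if n_rules < target_min:    # too few rules: lower the threshold
            hi = support
        elif n_rules > target_max:  # too many rules: raise the threshold
            lo = support
        else:
            return support, n_rules
        support = (lo + hi) / 2
    return support, counter(support)

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
support, n_rules = tune_min_support(
    lambda s: count_rules(transactions, s, min_conf=0.6),
    initial=0.1, target_min=4, target_max=7,
)
print(support, n_rules)
```

The search works because the number of rules never increases as the minimum support rises, so each evaluation halves the interval that can still contain an acceptable threshold.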
57	DARM: Distance-Based Association Rule Mining Icev, Aleksandar 06 May 2003 The main goal of this thesis work was to develop, implement and evaluate an algorithm that enables mining association rules from datasets that contain quantified distance information among the items. This was accomplished by extending and enhancing the Apriori Algorithm, which is the standard algorithm to mine association rules. The Apriori algorithm is not able to mine association rules that contain distance information among the items that construct the rules. This thesis enhances the main Apriori property by requiring itemsets forming rules to "deviate properly" in addition to satisfying the minimal support threshold. We say that an itemset deviates properly if all combinations of pair-wise distances among the items are highly conserved in the dataset instances where these items occur. This thesis introduces the notion of proper deviation and provides the precise procedure and measures that characterize it. Integrating the notion of distance preserving frequent itemset and proper deviation into the standard Apriori algorithm leads to the construction of our Distance-Based Association Rule Mining (DARM) algorithm. DARM can be applied in data mining and knowledge discovery from genetic, financial, retail, time sequence data, or any domain where the distance information between items is of importance. This thesis chose the area of gene expression and regulation in eukaryotic organisms as the application domain. The data from the domain was used to produce DARM rules. Sets of those rules were used for building predictive models. The accuracy of those models was tested. In addition, predictive accuracies of the models built with and without distance information were compared. spatial data mining distance-based association rules distance-based Apriori algorithm Data mining Gene expression Data processing Eukaryotic cells
58	Association Rule Mining for Collaborative Recommender Systems Lin, Weiyang 15 May 2000 This thesis provides a novel approach to using data mining for e-commerce. The focus of our work is to apply association rule mining to collaborative recommender systems, which recommend articles to a user on the basis of other users' ratings for these articles as well as the similarities between this user's and other users' tastes. In this work, we propose a new algorithm for association rule mining specially tailored for use in collaborative recommendation. We make recommendations by exploring associations between users, associations between articles, and a combination of the two. We experimentally evaluated our approach on real data for many different parameter settings and compared its performance with that of other approaches under similar experimental conditions. Through our analysis and experiments, we have found that association rules are quite appropriate for collaborative recommendation domains and that they can achieve a performance that is comparable to the current state of the art in recommender systems research. Data Mining Association Rules Electronic Commerce Collaborative Recommender Systems Association marketing Internet marketing Electronic commerce Recommender systems (Computer science)
59	Une approche de recherche d'images basée sur la sémantique et les descripteurs visuels / An Image Retrieval approach based on semantics and visual features Allani Atig, Olfa 27 June 2017 Image retrieval is a very active research area. Several image retrieval approaches that map low-level features to high-level semantics have been proposed. Among these are object recognition, ontologies, and relevance feedback. However, their main limitations are a high dependence on reliable external resources and an inability to combine semantic and visual information effectively. This thesis proposes a system based on a pattern graph combining semantic and visual features, targeted selection of visual features for the online retrieval phase, and improved visualization of results. The idea is to (1) build a pattern graph composed of a modular ontology and a graph-based model for organizing semantic information, (2) build a set of visual feature collections to guide the selection of features applied during retrieval, and (3) improve the visualization of retrieval results by integrating the semantic relations deduced from the pattern graph. During pattern graph construction, the ontology modules associated with each domain are built automatically from textual corpora and external resources. The region graphs summarize the visual information in a condensed form and classify it according to its semantic domain. The pattern graph is obtained by composing ontology modules. In building the visual feature collections, association rules are used to deduce best practices for the use of visual features in image retrieval. Finally, results visualization uses the rich information available on the images to improve the presentation of results. Our system has been tested on three image databases. The results show an improvement in the retrieval process, a better adaptation of the visual features to the domains covered, and a richer visualization of the results that makes the logic behind their generation less abstract. Image retrieval Ontology Modular ontology Pattern Graph Association rules
60	Classificação linear de bovinos: criação de um modelo de decisão baseado na conformação de tipo “true type” como auxiliar a tomada de decisão na seleção de bovinos leiteiros / Linear classification of cattle: creation of a decision model based on "true type" conformation to support decision making in the selection of dairy cattle Sousa, Rogério Pereira de 29 August 2016 IFTO - Instituto Federal de Educação, Ciência e Tecnologia do Tocantins / The selection of dairy cattle using a classification system based on linear type traits is reflected in production gains, the productive life of the animal, and herd standardization, among others. This operational research obtained its information from a literature review and from the analysis of a database of real classifications. The study aimed to generate a dairy cattle classification model based on "true type" to assist evaluators in processing and analyzing the data, supporting decisions on the selection of cows for dairy aptitude and keeping the data safe for future reference. The research applies computational methods to the classification of dairy cows using data mining and fuzzy logic. To that end, the analysis was carried out on a database of 144 records of animals classified between the good and excellent categories. The WEKA tool was used to extract association rules with the Apriori algorithm, using support and confidence as objective metrics, and lift to determine the degree of dependence of each rule. The decision model with fuzzy logic was created in R using the sets package. The rule mining results made it possible to identify rules relevant to the classification model with confidence above 90%, indicating that the evaluated characteristics (antecedent) imply other characteristics (consequent) with high confidence. As for the fuzzy decision model, the results show that a classification model based on subjective assessments is susceptible to classification errors, which suggests using the results obtained from association rules as an objective aid in the final classification of the cow for dairy aptitude. Linear classification Association rules Fuzzy logic Data mining
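The three objective metrics named in entry 60 (support, confidence, and lift) can be computed as in the following sketch. It is plain Python rather than the WEKA workflow used in the thesis, and the trait labels and records are invented for illustration; they are not drawn from the 144-record database.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n                # share of records matching the whole rule
    confidence = n_both / n_ante        # share of antecedent records that also match the consequent
    lift = confidence / (n_cons / n)    # > 1 suggests positive dependence
    return support, confidence, lift

# Hypothetical classification records (labels are made up).
records = [
    {"udder=high", "legs=high", "class=excellent"},
    {"udder=high", "legs=low", "class=excellent"},
    {"udder=low", "legs=high", "class=good"},
    {"udder=high", "legs=high", "class=excellent"},
]

support, confidence, lift = rule_metrics(
    records, {"udder=high"}, {"class=excellent"}
)
print(support, confidence, lift)
```

A lift of 1 would mean the antecedent and consequent occur together no more often than expected under independence, which is why lift is used to judge the degree of dependence of a rule.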
