Global ETD Search

71	資料挖掘在房地產價格上之運用 / Data Mining Technique with an Application to the Real Estate Price Estimation 高健維 Unknown Date (has links) In today's information-driven environment, statistical methods and artificial intelligence techniques can uncover valuable hidden patterns in the large databases that companies maintain. In-depth analysis of the data yields knowledge, and models built for specific business problems give companies a reference for strategic decisions. Further benefits include predicting future events and estimating how the public will respond to new products or promotions. Such estimates help companies deliver quality services at the right time: by understanding their customers' needs, wants, and behavior, they can serve those customers better. Data mining has been a widely discussed topic in database applications in recent years. Although the term is relatively new and fashionable, the underlying analysis methods are not: before World War II, the US government already applied such techniques to census statistics and military affairs. 
Advances in information technology, new tools, and network communications now allow data mining to go beyond simple analysis, which has made knowledge discovery a part of business intelligence. This thesis applies association rules, a data mining technique, to real estate price analysis and derives effective rules that relate housing prices to factors of the surrounding environment. With these rules, buyers gain guidance for their investment plans, and real estate agents can design sales plans for their target markets. data mining Apriori algorithm association rules multi-dimensional association rules
72	Fuzzy GUHA / Fuzzy GUHA Ralbovský, Martin January 2006 (has links) The GUHA method is one of the oldest methods of exploratory data analysis and is regarded as part of the scientific area of data mining, or knowledge discovery in databases (KDD). Unlike many other data mining methods, GUHA has firm theoretical foundations in logic and statistics. Within the method, finding interesting knowledge corresponds to finding special formulas in a sufficiently rich logical calculus, called observational calculus. The main topic of the thesis is the application of the "fuzzy paradigm" to the GUHA method. By "fuzzy paradigm" we mean approaches that use many-valued membership degrees or truth values, namely fuzzy set theory and fuzzy logic. The thesis does not aim to cover all aspects of this application; it concentrates on association rules as the most prevalent type of formula mined by the GUHA method; the usage of fuzzy data; logical aspects of fuzzy association rule mining; a comparison of the GUHA theory with mainstream fuzzy association rules; and an implementation of the theory using the bit string approach. The thesis thoroughly elaborates the theory of fuzzy association rules using the theoretical apparatus of both fuzzy set theory and fuzzy logic. Fuzzy set theory is used mainly to compare the GUHA method with existing mainstream approaches to formalizing fuzzy association rules, which were studied in detail. Fuzzy logic is used to define a novel class of logical calculi, called logical calculi of fuzzy association rules (LCFAR), for the logical representation of fuzzy association rules. The problem of the existence of deduction rules in LCFAR is dealt with in depth. A suitable part of the proposed theory is implemented in the Ferda system using the bit string approach. In this approach, characteristics of examined objects are represented as strings of bits, which in the crisp case enables efficient computation. 
To retain this efficiency in the fuzzy case, thorough low-level testing of data structures and algorithms for fuzzy bit strings was carried out as part of the thesis.
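The bit string approach named in the Fuzzy GUHA abstract can be illustrated for the crisp case. This is a minimal sketch and not code from the Ferda system: the toy data, the helper names `to_bitstring` and `support_count`, and the use of Python integers as bit strings are all assumptions made for illustration.

```python
# Crisp bit-string sketch: one bit per examined object, one bit string per attribute.
# Toy data and helper names are hypothetical; this is not the Ferda implementation.

def to_bitstring(flags):
    """Pack a list of booleans into an int; bit i is set if object i has the attribute."""
    bits = 0
    for i, flag in enumerate(flags):
        if flag:
            bits |= 1 << i
    return bits


def support_count(*bitstrings):
    """Count objects that have all given attributes: AND the strings, then count set bits."""
    combined = bitstrings[0]
    for b in bitstrings[1:]:
        combined &= b
    return bin(combined).count("1")


# Six objects described by two made-up attributes.
smoker = to_bitstring([True, True, False, True, False, True])
cough = to_bitstring([True, False, False, True, True, True])

both = support_count(smoker, cough)          # objects with both attributes
confidence = both / support_count(smoker)    # confidence of the rule smoker -> cough
```

A conjunction of attributes costs one bitwise AND per attribute regardless of how the objects are stored, which is the efficiency the abstract refers to. In the fuzzy case each position holds a membership degree instead of a single bit, so this packing no longer applies directly.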
73	Association Rule Based Classification Palanisamy, Senthil Kumar 03 May 2006 (has links) In this thesis, we focused on the construction of classification models based on association rules. Although association rules have been predominantly used for data exploration and description, the interest in using them for prediction has rapidly increased in the data mining community. In order to mine only rules that can be used for classification, we modified the well known association rule mining algorithm Apriori to handle user-defined input constraints. We considered constraints that require the presence/absence of particular items, or that limit the number of items, in the antecedents and/or the consequents of the rules. We developed a characterization of those itemsets that will potentially form rules that satisfy the given constraints. This characterization allows us to prune during itemset construction itemsets such that neither they nor any of their supersets will form valid rules. This improves the time performance of itemset construction. Using this characterization, we implemented a classification system based on association rules and compared the performance of several model construction methods, including CBA, and several model deployment modes to make predictions. Although the data mining community has dealt only with the classification of single-valued attributes, there are several domains in which the classification target is set-valued. Hence, we enhanced our classification system with a novel approach to handle the prediction of set-valued class attributes. Since the traditional classification accuracy measure is inappropriate in this context, we developed an evaluation method for set-valued classification based on the E-Measure. 
Furthermore, we enhanced our algorithm by not relying on the typical support/confidence framework, and instead mining for the best possible rules above a user-defined minimum confidence and within a desired range for the number of rules. This avoids long mining times that might produce large collections of rules with low predictive power. For this purpose, we developed a heuristic function to determine an initial minimum support and then adjusted it using a binary search strategy until a number of rules within the given range was obtained. We implemented all of our techniques described above in WEKA, an open source suite of machine learning algorithms. We used several datasets from the UCI Machine Learning Repository to test and evaluate our techniques. Itemset Pruning Association Rules Adaptive Minimal Support Associative Classification Classification Association rule mining Data mining Classification Data processing
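The adaptive minimum support search described in entry 73 can be sketched as a binary search. This is a sketch under stated assumptions, not the thesis's WEKA implementation: `count_rules` is a hypothetical stand-in for a full mining run, the fixed initial bounds replace the thesis's heuristic function, and the number of rules is assumed to be non-increasing as the minimum support grows.

```python
from typing import Callable, Optional


def find_min_support(count_rules: Callable[[float], int],
                     min_rules: int, max_rules: int,
                     low: float = 0.0, high: float = 1.0,
                     max_iter: int = 30) -> Optional[float]:
    """Binary-search a minimum support so that min_rules <= rule count <= max_rules.

    Returns None if no suitable support is found within max_iter steps.
    """
    for _ in range(max_iter):
        mid = (low + high) / 2
        n = count_rules(mid)
        if n > max_rules:
            low = mid       # too many rules: raise the support threshold
        elif n < min_rules:
            high = mid      # too few rules: lower the support threshold
        else:
            return mid
    return None


def toy_count_rules(min_support: float) -> int:
    """Hypothetical stand-in for a mining run: fewer rules as support rises."""
    return int(1000 * (1 - min_support) ** 3)


support = find_min_support(toy_count_rules, min_rules=50, max_rules=100)
```

Each probe corresponds to one mining run, so the iteration cap bounds the total mining time even when no support value yields a rule count inside the requested range.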
74	DARM: Distance-Based Association Rule Mining Icev, Aleksandar 06 May 2003 (has links) The main goal of this thesis work was to develop, implement and evaluate an algorithm that enables mining association rules from datasets that contain quantified distance information among the items. This was accomplished by extending and enhancing the Apriori algorithm, which is the standard algorithm to mine association rules. The Apriori algorithm is not able to mine association rules that contain distance information among the items that construct the rules. This thesis enhances the main Apriori property by requiring itemsets forming rules to "deviate properly" in addition to satisfying the minimal support threshold. We say that an itemset deviates properly if all combinations of pair-wise distances among the items are highly conserved in the dataset instances where these items occur. This thesis introduces the notion of proper deviation and provides the precise procedure and measures that characterize it. Integrating the notions of distance-preserving frequent itemsets and proper deviation into the standard Apriori algorithm leads to the construction of our Distance-Based Association Rule Mining (DARM) algorithm. DARM can be applied in data mining and knowledge discovery from genetic, financial, retail, time sequence data, or any domain where the distance information between items is of importance. This thesis chose the area of gene expression and regulation in eukaryotic organisms as the application domain. The data from the domain was used to produce DARM rules. Sets of those rules were used for building predictive models. The accuracy of those models was tested. In addition, predictive accuracies of the models built with and without distance information were compared. spatial data mining distance-based association rules distance-based Apriori algorithm Data mining Gene expression Data processing Eukaryotic cells
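The idea of an itemset that "deviates properly" can be illustrated with a small check on pairwise distances. The abstract does not give the thesis's actual measure, so everything here is an assumption: the coefficient of variation as the conservation measure, the `max_cv` threshold, the function name, and the toy positions are all hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def deviates_properly(itemset, instances, max_cv=0.1):
    """Assumed criterion: every pairwise distance varies little across covering instances.

    instances: list of dicts mapping item -> position within that instance.
    max_cv: hypothetical cap on the coefficient of variation (std / mean).
    """
    covering = [inst for inst in instances if all(item in inst for item in itemset)]
    if not covering:
        return False
    for a, b in combinations(sorted(itemset), 2):
        distances = [abs(inst[a] - inst[b]) for inst in covering]
        avg = mean(distances)
        if avg == 0:
            continue  # all distances are zero, so they are perfectly conserved
        if pstdev(distances) / avg > max_cv:
            return False
    return True


# Made-up positions of three items in three instances.
instances = [
    {"m1": 10, "m2": 30, "m3": 80},
    {"m1": 15, "m2": 35, "m3": 40},
    {"m1": 50, "m2": 70},
]

stable = deviates_properly({"m1", "m2"}, instances)    # distance is 20 in every instance
unstable = deviates_properly({"m1", "m3"}, instances)  # distances 70 and 25 differ widely
```

In an Apriori-style search, a check of this kind would be applied alongside the minimum support test when deciding whether a candidate itemset is kept.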
75	Association Rule Mining for Collaborative Recommender Systems Lin, Weiyang 15 May 2000 (has links) This thesis provides a novel approach to using data mining for e-commerce. The focus of our work is to apply association rule mining to collaborative recommender systems, which recommend articles to a user on the basis of other users' ratings for these articles as well as the similarities between this user's and other users' tastes. In this work, we propose a new algorithm for association rule mining specially tailored for use in collaborative recommendation. We make recommendations by exploring associations between users, associations between articles, and a combination of the two. We experimentally evaluated our approach on real data for many different parameter settings and compared its performance with that of other approaches under similar experimental conditions. Through our analysis and experiments, we have found that association rules are quite appropriate for collaborative recommendation domains and that they can achieve a performance that is comparable to current state of the art in recommender systems research. Data Mining Association Rules Electronic Commerce Collaborative Recommender Systems Association marketing Internet marketing Electronic commerce Recommender systems (Computer science)
76	Une approche de recherche d'images basée sur la sémantique et les descripteurs visuels / An Image Retrieval approach based on semantics and visual features Allani Atig, Olfa 27 June 2017 (has links) Image retrieval is a very active research area. Several approaches that map low-level features to high-level semantics have been proposed, among them object recognition, ontologies, and relevance feedback. Their main limitations are a high dependence on reliable external resources and an inability to combine semantic and visual information effectively. This thesis proposes a system based on a pattern graph that combines semantic and visual features, targeted selection of visual features for the online retrieval phase, and improved visualization of results. The idea is to (1) build a pattern graph composed of a modular ontology and a graph-based model for organizing semantic information, (2) build a set of visual feature collections to guide the selection of features applied during retrieval, and (3) improve the visualization of retrieval results by integrating the semantic relations deduced from the pattern graph. 
During pattern graph construction, the ontology modules associated with each domain are built automatically from textual corpora and external resources. The region graphs summarize the visual information in a condensed form and classify it by domain. The pattern graph is obtained by composing ontology modules. When building the visual feature collections, association rules are used to deduce best practices for the use of visual features in image retrieval. Finally, results visualization uses the rich information on images to improve the presentation of results. The system was tested on three image databases. The results show an improvement in the retrieval process, a better adaptation of the visual features to the domains covered, and a richer visualization of the results. Image retrieval Ontology Modular ontology Pattern Graph Association rules
77	Classificação linear de bovinos: criação de um modelo de decisão baseado na conformação de tipo “true type” como auxiliar a tomada de decisão na seleção de bovinos leiteiros / Linear classification of cattle: creation of a decision model based on "true type" conformation to support decision making in the selection of dairy cattle Sousa, Rogério Pereira de 29 August 2016 (has links) IFTO - Instituto Federal de Educação, Ciência e Tecnologia do Tocantins / Selecting dairy cattle with a classification system based on linear type traits improves production gains, the productive life of the animal, and herd standardization, among other benefits. This operational research drew its information from a literature review and from the analysis of a database of real classifications. The study aimed to generate a classification model for dairy cattle based on "true type" that assists evaluators in processing and analyzing data and supports decisions on selecting cows for dairy aptitude, while keeping the data safe for future reference. The research applies computational methods, namely data mining and fuzzy logic, to the classification of dairy cows. The analysis used a database of 144 records of animals classified in the categories good to excellent. Association rules were extracted with the WEKA tool using the Apriori algorithm, with support and confidence as objective metrics and lift to determine the degree of dependence within a rule. The fuzzy logic decision model was created in R using the sets package. 
The rule mining identified rules relevant to the classification model with confidence above 90%, indicating that the evaluated characteristics (antecedent) imply other characteristics (consequent) with high confidence. The results of the fuzzy decision model show that a classification model based on subjective assessments is susceptible to classification errors, which suggests using the results obtained from association rules as an objective aid in the final classification of a cow for dairy aptitude. Data mining Linear classification Association rules Fuzzy logic
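The three metrics named in entry 77 (support, confidence, and lift) can be computed directly from a list of records. This is a minimal sketch in plain Python rather than the WEKA workflow used in the thesis; the trait names and the five records are invented for illustration and do not come from the study's data.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: list of sets of items; antecedent and consequent: sets of items.
    """
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_both / n                  # P(A and C)
    confidence = n_both / n_a             # P(C | A)
    lift = (n_both * n) / (n_a * n_c)     # P(A and C) / (P(A) * P(C))
    return support, confidence, lift


# Hypothetical records: each set lists the evaluated traits of one animal.
records = [
    {"udder=good", "legs=good", "class=excellent"},
    {"udder=good", "class=excellent"},
    {"udder=good", "legs=fair", "class=excellent"},
    {"udder=fair", "legs=good", "class=good"},
    {"udder=good", "class=good"},
]

support, confidence, lift = rule_metrics(records, {"udder=good"}, {"class=excellent"})
```

A lift above 1 means the antecedent and consequent occur together more often than they would if they were independent, which is how lift indicates the degree of dependence within a rule.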
78	Interestingness Measures for Association Rules in a KDD Process : PostProcessing of Rules with ARQAT Tool Huynh, Xuan-Hiep 07 December 2006 (has links) (PDF) This work takes place in the framework of Knowledge Discovery in Databases (KDD), often called "Data Mining". This domain is both a major research topic and an application field in companies. KDD aims at discovering previously unknown and useful knowledge in large databases. In the last decade, much research has been published on association rules, which are frequently used in data mining. Association rules, which are implicative tendencies in data, have the advantage of being an unsupervised model. On the other hand, they often deliver a large number of rules. As a consequence, a postprocessing task is required to help the user understand the results. One way to reduce the number of rules, by validating or selecting the most interesting ones, is to use interestingness measures adapted to both the user's goals and the dataset studied. Selecting the right interestingness measures is an open problem in KDD. Many measures have been proposed to extract knowledge from large databases, and many authors have introduced interestingness properties for selecting a suitable measure for a given application. Some measures are adequate for some applications but not for others. In our thesis, we propose to study the set of interestingness measures available in the literature in order to evaluate their behavior according to the nature of the data and the preferences of the user. The final objective is to guide the user's choice towards the measures best adapted to their needs and, in fine, to select the most interesting rules. For this purpose, we propose a new approach implemented in a new tool, ARQAT (Association Rule Quality Analysis Tool), in order to facilitate the analysis of the behavior of about 40 interestingness measures. 
In addition to elementary statistics, the tool allows a thorough analysis of the correlations between measures using correlation graphs based on the coefficients suggested by Pearson, Spearman and Kendall. These graphs are also used to identify the clusters of similar measures. Moreover, we proposed a series of comparative studies on the correlations between interestingness measures on several datasets. We discovered a set of correlations that are not very sensitive to the nature of the data used, which we called stable correlations. Finally, 14 complementary graphical views, structured on 5 levels of analysis (ruleset analysis, correlation and clustering analysis, most interesting rules analysis, sensitivity analysis, and comparative analysis), are illustrated to show the value of both the exploratory approach and the use of complementary views. [INFO] Computer Science Knowledge Discovery in Databases (KDD) interestingness measures postprocessing of association rules clustering correlation graph stability analysis
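The correlation analysis between interestingness measures described in entry 78 can be sketched on a toy ruleset. This is not the ARQAT implementation: the three measures chosen (confidence, lift, Jaccard), the four rules with their counts, and the 0.7 edge threshold are assumptions for illustration, and only the Spearman coefficient is shown.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y)))


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical rules A -> B given as (n, n_a, n_b, n_ab) counts.
rules = [(100, 40, 50, 30), (100, 20, 60, 10), (100, 50, 30, 20), (100, 10, 80, 9)]

measures = {
    "confidence": [n_ab / n_a for n, n_a, n_b, n_ab in rules],
    "lift": [n * n_ab / (n_a * n_b) for n, n_a, n_b, n_ab in rules],
    "jaccard": [n_ab / (n_a + n_b - n_ab) for n, n_a, n_b, n_ab in rules],
}

# Correlation between every pair of measures over the ruleset.
corr = {(p, q): spearman(measures[p], measures[q]) for p, q in combinations(measures, 2)}

# Correlation graph: connect measures whose rankings of the rules agree strongly.
edges = [pair for pair, rho in corr.items() if abs(rho) >= 0.7]
```

Connected groups in such a graph are candidates for clusters of similar measures; repeating the computation on several datasets and keeping the edges that persist would correspond to the stable correlations the abstract mentions.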
79	Mining Association Rules For Quality Related Data In An Electronics Company Kilinc, Yasemin 01 March 2009 (has links) (PDF) Quality has become a central concern as it has been observed that reducing defects lowers the cost of production. Hence, companies generate and store vast amounts of quality-related data. Analysis of this data is critical in order to understand quality problems and their causes, and to take preventive actions. In this thesis, we propose a methodology for this analysis based on one of the data mining techniques, association rules. The methodology is applied to the quality-related data of an electronics company. The Apriori algorithm used in this application generates an excessively large number of rules, most of which are redundant. Therefore, we implement a three-phase elimination process on the generated rules to arrive at a reasonably small set of interesting rules. The approach is applied to two different data sets of the company, one for production defects and one for raw material non-conformities. We then validate the resultant rules using a test data set for each problem type and analyze the final set of rules.
80	An Improved Organization Method For Association Rules And A Basis For Comparison Of Methods Jabarnejad, Masood 01 June 2010 (has links) (PDF) In large datasets, the sets of mined association rules are typically large and hard to interpret. Some grouping and pruning methods have been developed to make rules more understandable. In this study, one of these methods is modified to be more effective and more efficient in applications that involve low thresholds for support or confidence, such as association analysis for product/process quality improvement. Results of experiments on benchmark datasets show that the proposed method groups and prunes more rules. In the literature, many rule reduction methods, including grouping and pruning methods, have been proposed for different applications. The variety of methods makes it hard to select the right method for applications such as those of quality improvement. In this study, a novel performance comparison basis is introduced to address this problem. It is applied here to compare the improved method with the original one. The introduced basis is tailored for quality data, but it is flexible and can be adapted to other application domains.

Search results