41

Data Mining For Rule Discovery In Relational Databases

Toprak, Serkan 01 September 2004 (has links) (PDF)
Most data today is stored in relational databases. However, most data mining algorithms cannot operate on relational data directly; they require a preprocessing step that transforms relational data into an algorithm-specific form. Moreover, several data mining algorithms handle only a single relation, so valuable hidden knowledge involving multiple relations remains undiscovered. In this thesis, an implementation is developed for discovering multi-relational association rules in relational databases. The implementation is based on a framework that provides a representation of patterns in relational databases, refinement methods for patterns, and primitives for obtaining from the database the record counts needed to calculate pattern measures. The framework exploits the meta-data of relational databases to prune the search space of patterns. The implementation extends the framework by employing the Apriori algorithm to prune the search space further and to discover relational recursive patterns. The Apriori algorithm is used for finding large itemsets of tables, which are then used to refine patterns; it is modified by changing the support calculation method for itemsets. A method for determining recursive relations is described, and a solution is provided for handling recursive patterns using aliases. Additionally, continuous attributes of tables are discretized using equal-depth partitioning. The implementation is tested on the gene localization prediction task of KDD Cup 2001, and the results are compared to those of the winning approach.
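The equal-depth partitioning step mentioned above is compact enough to illustrate. The following is a minimal sketch of equal-depth (equal-frequency) discretization with made-up measurements — an illustration of the general technique, not the thesis's implementation:

```python
import numpy as np

def equal_depth_bins(values, n_bins):
    """Cut points that split a continuous attribute into bins holding
    roughly equal numbers of records (equal-depth partitioning)."""
    values = np.asarray(values, dtype=float)
    # Cut points sit at the quantiles dividing the data into n_bins groups.
    quantiles = np.linspace(0, 1, n_bins + 1)[1:-1]
    return np.quantile(values, quantiles)

def assign_bin(value, cut_points):
    """Return the bin index for a single value given the cut points."""
    return int(np.searchsorted(cut_points, value, side="right"))

# Example: ten measurements split into three roughly equal-sized bins.
data = [1.2, 3.4, 2.2, 9.8, 5.5, 4.1, 7.7, 6.3, 8.0, 2.9]
cuts = equal_depth_bins(data, 3)
labels = [assign_bin(v, cuts) for v in data]
print(cuts, labels)
```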
42

Otimização computacional e estudo comparativo das técnicas de extração de conhecimento de grandes repositórios de dados. / Computational optimization and comparative study of techniques for extracting knowledge from large data repositories.

Fernando Luiz Coelho Senra 16 September 2009 (has links)
In any field of knowledge, the more data one has available, the harder it becomes to extract useful knowledge from that database. The purpose of this work is to present some so-called intelligent tools for extracting knowledge from large data repositories. Although the term has several connotations, in this work knowledge extraction from data repositories is understood as the combined occurrence of certain data with a frequency and confidence considered interesting; that is, to the extent that a given item or set of items appears in the data repository with reasonable frequency, another item or set of items will also appear. Running on repositories of georeferenced data about students of UERJ (Universidade do Estado do Rio de Janeiro), the work analyzes the results of two data extraction tools and presents possibilities for computationally optimizing these tools.
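The "frequency and confidence" that make a combined occurrence interesting correspond to the standard support and confidence measures of association rules. A minimal sketch, using hypothetical student attributes rather than the thesis's actual data:

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """conf(A -> B) = sup(A u B) / sup(A): of the transactions containing
    the antecedent, the fraction that also contain the consequent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)

# Hypothetical records: each row is one student's set of attributes.
transactions = [
    {"zone_north", "evening_course", "bus_commute"},
    {"zone_north", "evening_course"},
    {"zone_south", "day_course", "bus_commute"},
    {"zone_north", "evening_course", "bus_commute"},
]
print(support(transactions, {"zone_north"}))                         # 0.75
print(confidence(transactions, {"zone_north"}, {"evening_course"}))  # 1.0
```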
44

Modul víceúrovňových asociačních pravidel systému pro dolování z dat / Multi-Level Association Rules Module of a Data Mining System

Pospíšil, Jan January 2010 (has links)
This thesis focuses on implementing a multi-level association rule mining module for an existing data mining project. Two main algorithms are explained: Apriori and MLT2L1. The thesis then describes the implementation of the mining module and the design of the corresponding DMSL elements. The final chapters present an example data mining task, compare its results, and summarize what the thesis achieved.
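Multi-level association rule mining works over a concept hierarchy: items are first generalized to their categories and mined with a high support threshold, then the raw items under the surviving categories are mined with a reduced threshold. A minimal sketch of that level-wise idea, with a hypothetical two-level hierarchy (not the module's actual code):

```python
from collections import Counter

# Hypothetical two-level concept hierarchy: item -> category.
hierarchy = {
    "skim_milk": "milk", "whole_milk": "milk",
    "white_bread": "bread", "wheat_bread": "bread",
}

transactions = [
    {"skim_milk", "white_bread"},
    {"whole_milk", "wheat_bread"},
    {"skim_milk", "wheat_bread"},
    {"whole_milk"},
]

def frequent_items(transactions, min_support):
    """Frequent 1-itemsets: count each item, keep those meeting
    the minimum support chosen for this level."""
    counts = Counter(item for t in transactions for item in t)
    n = len(transactions)
    return {item for item, c in counts.items() if c / n >= min_support}

# Level 1: generalize items to categories, mine with a high threshold.
level1 = [{hierarchy[i] for i in t} for t in transactions]
frequent_l1 = frequent_items(level1, min_support=0.75)

# Level 2: mine raw items whose category was frequent at level 1,
# with a reduced threshold (the idea behind the ML-family algorithms).
level2 = [{i for i in t if hierarchy[i] in frequent_l1} for t in transactions]
frequent_l2 = frequent_items(level2, min_support=0.5)
print(frequent_l1, frequent_l2)
```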
45

Evaluation of parametric CAD models from a manufacturing perspective to aid simulation driven design

Satish Prabhu, Nachiketh, Sarapady, Ranjan Tunga January 2019 (has links)
Scania is known as one of the world's leading suppliers of transport solutions for heavy trucks and buses. Scania's goal is to develop combustion engines that achieve low pollutant emissions and a lower carbon footprint with higher efficiency. To achieve this, Scania has invested resources in simulation-driven design of parametric CAD models, in which simulation drives design innovation rather than following the design, enabling flexible and robust models in the design process. This master thesis was conducted in collaboration with Scania's exhaust aftertreatment systems department and focuses on developing a methodology to automatically evaluate the cost and manufacturability of a parametric model, intended for an agile working environment with fast iterations within Scania. From the data collected through a literature study, earlier thesis work, and interviews with designers and cost engineers at Scania, a method is proposed that can be applied during the design process. The method involves four phases: a design phase, an analysis phase, a validation phase, and an improvement phase. The proposed method is evaluated to check its feasibility for assessing parametric CAD parts with respect to manufacturability and cost, and is applied to two different parts of a silencer in a case study whose main aim is to evaluate the results of the improvement phase. The focus of this thesis is to realize the proposed method through simulation software, such as sheet-metal stamping/forming simulation and a cost evaluation tool, so that a simulation-driven design process is achieved. This is done by coupling the parametric CAD models with the above simulation software in a common MDO framework through DOE or optimization study runs. The resulting designs are then considered improved in terms of manufacturability and cost.
46

Integrace Business Inteligence nástrojů do IS / Integration of Business Intelligence Tools into IS

Novák, Josef January 2009 (has links)
This Master's thesis deals with the integration of Business Intelligence tools into an information system. It introduces the concepts of BI, data warehouses, and OLAP analysis, as well as knowledge discovery in databases, especially association rule mining. The chapters devoted to the practical part describe the design and implementation of the resulting application, along with the technologies applied, such as Microsoft SQL Server 2005.
47

Development of a data-driven marketing strategy for an online pharmacy

Holmér, Gelaye Worku, Gamage, Ishara H. January 2022 (has links)
The term electronic commerce (e-commerce) refers to a business model that allows companies and individuals to buy and sell goods and services over the internet. The focus of this thesis is online pharmacies, a segment of the e-commerce market. Even though internet pharmacies are subject to the same stringent rules imposed on traditional pharmacies, which limit the scope for market growth, the segment has shown notable growth over the past decades. The main goal of this thesis is to develop a data-driven marketing strategy based on the daily sales data of a Swedish online pharmacy. The data analysis comprises exploratory data analysis (EDA) and market basket analysis (MBA) using the Apriori algorithm, together with the application of marketing frameworks and theories from a data-driven standpoint. In addition, the thesis proposes a conceptual framework for a digital marketing strategy based on the RACE framework (reach, act, convert, and engage). The analysis leads to the following data-driven marketing strategy: special attention should be paid to association rules with a high lift ratio; products in a high gross profit margin percentile (GPMP) should have a volume-based marketing strategy that focuses on lower prices for subsequent items; and price bundling is the best marketing strategy for low-GPMP products. Practical ideas mentioned in the thesis include optimizing keyword search for high-GPMP product types and sending reminder emails and push alerts to avoid cart abandonment. The findings and recommendations can be used by online pharmacies to extract knowledge supporting decisions that range from raising overall order size and designing marketing campaigns to increasing the sales of products with a high gross profit margin.
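The lift ratio highlighted above measures how much more often an antecedent and consequent co-occur than they would by chance: lift(A -> B) = sup(A u B) / (sup(A) * sup(B)), with values above 1 indicating positive association. A minimal sketch with hypothetical pharmacy baskets (item names are invented, not from the thesis data):

```python
def support(baskets, items):
    """Fraction of baskets containing all the given items."""
    items = set(items)
    return sum(1 for b in baskets if items <= b) / len(baskets)

def lift(baskets, antecedent, consequent):
    """lift(A -> B) = sup(A u B) / (sup(A) * sup(B));
    above 1 suggests A and B co-occur more than by chance."""
    joint = support(baskets, set(antecedent) | set(consequent))
    return joint / (support(baskets, antecedent) * support(baskets, consequent))

# Hypothetical pharmacy baskets.
baskets = [
    {"vitamin_d", "omega_3"},
    {"vitamin_d", "omega_3", "plasters"},
    {"plasters", "bandage"},
    {"vitamin_d"},
]
print(lift(baskets, {"vitamin_d"}, {"omega_3"}))  # ~1.33 > 1: positive association
```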
48

以雲端運算之概念建構資料採礦中關聯規則與集群分析系統 / Construct a concept of cloud computing and data mining system with association rules and clustering analysis

賴建佑 Unknown Date (has links)
Cloud computing and data mining have become important development directions in the twenty-first century. Cloud computing technology has gradually been integrated into many aspects of everyday life, so combining applications with cloud computing has become a trend. In short, cloud computing is a technology that makes things faster, more convenient, and cheaper for users. Data mining, meanwhile, has evolved from mining mainly numerical data to mining diverse data such as text and images. Although data mining predates cloud computing, the two are complementary. In view of this, this study develops a data mining analysis system that is convenient and easy for users to operate. It focuses on two specific data mining methods, association rules and clustering analysis, and builds the system around the Apriori algorithm and the K-means method, implemented by combining Microsoft Excel VBA and the R software.
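The thesis implements its clustering in Excel VBA and R; as a language-neutral illustration of the K-means step it describes, here is a minimal self-contained sketch (illustrative only, not the thesis's code):

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Minimal K-means on 2-D points: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p[0] - centroids[i][0]) ** 2
                                                + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old centroid if a cluster goes empty
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids

points = [(1, 1), (1.5, 2), (8, 8), (9, 9), (0.5, 1.5), (8.5, 9.5)]
print(kmeans(points, k=2))  # two centroids, one per visible cluster
```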
49

A New Hybrid Multi-relational Data Mining Technique

Daglar Toprak, Seda 01 July 2005 (has links) (PDF)
Multi-relational learning has become popular due to the limitations of propositional problem definitions in structured domains and the tendency to store data in relational databases. As patterns involve multiple relations, the search space of possible hypotheses becomes intractably complex. Many relational knowledge discovery systems have been developed, employing various search strategies, search heuristics, and pattern language limitations in order to cope with the complexity of the hypothesis space. In this work, we propose a relational concept learning technique that adopts concept descriptions as associations between the concept and its preconditions, and employs a relational upgrade of the association rule mining search heuristic, the APRIORI rule, to prune the search space effectively. The proposed system is a hybrid predictive inductive logic system that utilizes inverse resolution to generalize concept instances in the presence of background knowledge and refines these general patterns into frequent and strong concept definitions with a modified APRIORI-based specialization operator. Two versions of the system are tested on three real-world learning problems: learning a linearly recursive relation, predicting the carcinogenicity of molecules within the Predictive Toxicology Evaluation (PTE) challenge, and mesh design. The experimental results show that the proposed hybrid method is competitive with state-of-the-art systems.
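The "APRIORI rule" named above is the anti-monotonicity property: a pattern can only be frequent if all of its sub-patterns are frequent, which lets a learner discard candidates without counting them. A minimal propositional sketch of that pruning step (the thesis applies a relational upgrade of it; the itemsets here are hypothetical):

```python
from itertools import combinations

def apriori_prune(candidates, frequent_prev):
    """Keep only k-itemset candidates whose every (k-1)-item subset
    was found frequent at the previous level."""
    kept = []
    for cand in candidates:
        subsets = combinations(sorted(cand), len(cand) - 1)
        if all(frozenset(s) in frequent_prev for s in subsets):
            kept.append(cand)
    return kept

# Frequent 2-itemsets from the previous level (hypothetical).
frequent_2 = {frozenset({"a", "b"}), frozenset({"a", "c"}),
              frozenset({"b", "c"}), frozenset({"b", "d"})}
candidates_3 = [frozenset({"a", "b", "c"}), frozenset({"a", "b", "d"})]
print(apriori_prune(candidates_3, frequent_2))
# Only {a, b, c} survives: {a, d} was never frequent, so {a, b, d} is pruned.
```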
50

Aplicação do processo de descoberta de conhecimento em dados do poder judiciário do estado do Rio Grande do Sul / Applying the Knowledge Discovery in Database (KDD) Process to Data of the Judiciary Power of Rio Grande do Sul

Schneider, Luís Felipe January 2003 (has links)
Exploring the relationships that exist among data opened the way for the search for useful knowledge and previously unknown information in large sets of stored data. This field was named Knowledge Discovery in Databases (KDD) and was formalized in 1989. KDD consists of a process of stages or phases of an iterative and interactive nature. This work was based on the CRISP-DM methodology. Regardless of the methodology employed, the process has a phase that may be considered the core of KDD, "data mining" (or modeling, in CRISP-DM terms), with which the concept of a problem-type class is associated, as well as the techniques and algorithms that can be employed in a KDD application. We highlight the association and clustering classes, the techniques associated with them, and the Apriori and K-means algorithms. All of this is embodied in the chosen data mining tool, Weka (Waikato Environment for Knowledge Analysis). The research plan centers on applying the KDD process to the core activity of the Judiciary Power, the judgment of court cases, looking for discoveries about the influence of procedural classification on the incidence of cases, processing time, the types of sentences pronounced, and the presence of a hearing. The search for defendant profiles in criminal cases is also explored, using characteristics such as sex, marital status, education level, profession, and race. Chapters 2 and 3 present the theoretical grounding of KDD, detailing the CRISP-DM methodology; Chapter 4 covers the application carried out on the Judiciary Power's data; and Chapter 5 presents the conclusions.
