  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Interesting Association Rules Mining Based on Improved Rarity Algorithm

Xiang, Lan January 2018
With the rapid development of science and technology, our society has entered the big data era. Human activities produce vast amounts of data every second, and this data contains a great deal of information. How to select the useful information from such complicated data is therefore a significant issue. Association rule mining, a technique for mining patterns or associations between itemsets, arose to address it: it aims to find important associations in data and so obtain useful knowledge. Most researchers currently focus on frequent pattern mining. However, rare pattern mining also plays an important role in many areas, such as the medical, financial and scientific fields. Compared with frequent pattern mining, rare pattern mining can be more valuable, because it tends to find unknown, unexpected and more interesting rules. It is also more difficult, because little data is available for verifying the rules. In frequent pattern mining there are two general algorithms for discovering frequent itemsets: Apriori, the earliest, proposed by R. Agrawal in 1994, and FP-Tree, an improved algorithm with reduced time complexity. In rare pattern mining there are likewise two algorithms, Arima and Rarity, which are similar to the Apriori and FP-Tree algorithms, but both have problems: Arima is time-consuming because it repeatedly scans the large database, and Rarity is space-consuming because it builds a full-combination tree. Therefore, building on the Rarity algorithm, this report presents an improved method that efficiently discovers interesting association rules among rare itemsets and aims to strike a balance between time and space.
The method is a top-down strategy that uses a graph structure to represent all combinations of existing items, defines a pattern matrix to record itemsets, and uses a hash table to accelerate computation. Compared with Arima, it decreases both time and space costs; compared with Rarity, it reduces wasted space, although its search time for mining rare itemsets is longer than Rarity's. In addition, we have verified the feasibility of the algorithm only on abstract, small databases. In future work we will therefore continue improving the method, on the one hand by decreasing its search time and adjusting the hash function to optimize space utilization, and on the other hand by applying it to large real databases, such as a clinical database of diabetic patients, to mine association rules among diabetic complications.
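As a rough illustration of the ideas named above (a pattern matrix recording which transactions contain which items, a hash table caching computed supports, and a top-down pass over item combinations), the following Python sketch enumerates rare itemsets by brute force. It is not the thesis's graph-based algorithm, nor Arima or Rarity; the function names and the definition of "rare" used here (support above zero but below `min_sup`) are assumptions for illustration.

```python
from itertools import combinations


def item_masks(transactions):
    """Pattern-matrix sketch: one integer bitmask per item, where bit t is set
    when transaction t contains that item."""
    masks = {}
    for t, basket in enumerate(transactions):
        for item in basket:
            masks[item] = masks.get(item, 0) | (1 << t)
    return masks


def itemset_mask(itemset, masks, cache):
    """Bitmask of transactions containing every item of `itemset` (a sorted
    tuple). Results are memoised in `cache`, a dict acting as the hash table."""
    if itemset not in cache:
        if len(itemset) == 1:
            cache[itemset] = masks[itemset[0]]
        else:
            cache[itemset] = itemset_mask(itemset[:-1], masks, cache) & masks[itemset[-1]]
    return cache[itemset]


def rare_itemsets(transactions, min_sup):
    """Return {frozenset: support} for itemsets with 0 < support < min_sup.

    Candidates are visited top-down, from the combination of all existing items
    down to single items. This is exhaustive, so cost grows exponentially with
    the number of distinct items.
    """
    masks = item_masks(transactions)
    items = sorted(masks)
    cache = {}
    rare = {}
    for k in range(len(items), 0, -1):
        for itemset in combinations(items, k):
            support = bin(itemset_mask(itemset, masks, cache)).count("1")
            if 0 < support < min_sup:
                rare[frozenset(itemset)] = support
    return rare
```

For the baskets `[{"a", "b"}, {"a", "c"}, {"a", "b"}, {"d"}]` with `min_sup=2`, the rare itemsets are {c}, {d} and {a, c}, each with support 1.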
2

Multi-Purpose Boundary-Based Clustering on Proximity Graphs for Geographical Data Mining

Lee, Ickjai January 2002
With the growth of geo-referenced data and the increasing sophistication and complexity of spatial databases, data mining and knowledge discovery techniques have become essential tools for the successful analysis of large spatial datasets. Spatial clustering is fundamental and central to geographical data mining. It partitions a dataset into smaller homogeneous groups based on spatial proximity. The resulting groups represent geographically interesting concentrations that warrant further investigation to find possible causal factors. In this thesis, we propose a spatial-dominant generalization approach that mines multivariate causal associations among geographical data layers using clustering analysis. First, we propose a generic framework for multi-purpose exploratory spatial clustering in the form of the Template Method pattern. Based on this object-oriented framework, we design and implement an automatic multi-purpose exploratory spatial clustering tool. The first instance of the framework uses the Delaunay diagram as the underlying proximity graph. Our spatial clustering incorporates the peculiar characteristics of spatial data that make space special. Thus, our method identifies high-quality spatial clusters, including clusters of arbitrary shapes, clusters of heterogeneous densities, clusters of different sizes, closely located high-density clusters, clusters connected by multiple chains, sparse clusters near high-density clusters, and clusters containing clusters, within O(n log n) time. It derives parameter values from the data and thus maximizes user-friendliness. Therefore, our approach minimizes the user-oriented bias and constraints that hinder exploratory data analysis and geographical data mining. The sheer volume of spatial data stored in spatial databases is not the only concern. The heterogeneity of datasets is a common issue in data-rich environments, yet exploratory tools leave it unaddressed.
Our spatial clustering extends to the Minkowski metric, in the absence or presence of obstacles, to deal with situations where interactions between spatial objects are not adequately modeled by the Euclidean distance. The framework is generic enough that our clustering methodology extends to various spatial proximity graphs beyond the default Delaunay diagram. We also investigate an extension of our clustering to higher-dimensional datasets that robustly identifies higher-dimensional clusters within O(n log n) time. The versatility of our clustering is further illustrated by its deployment in multi-level clustering. We develop a multi-level clustering method that reveals hierarchical structures hidden in complex datasets within O(n log n) time. We also introduce weighted dendrograms to visualize the cluster hierarchies effectively. The interpretability and usability of clustering results are of great importance. We propose an automatic pattern spotter that produces high-level descriptions of clusters. We develop an effective and efficient cluster polygonization process for mining causal associations. It automatically approximates the shapes of clusters and robustly reveals asymmetric causal associations among data layers. Since it does not require domain-specific concept hierarchies, its applicability is broadened. / PhD Doctorate
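The boundary-based idea of building a proximity graph and cutting its long edges can be sketched in Python as below. This is a simplified stand-in, not the thesis's method: it swaps in the Gabriel graph (a subgraph of the Delaunay diagram), built by brute force in O(n^3) rather than O(n log n), and it uses one global cutoff (mean edge length plus `k` standard deviations, where `k` is a hypothetical user parameter), whereas the thesis derives its parameters from the data automatically.

```python
import math
from statistics import mean, pstdev


def gabriel_edges(points):
    """Brute-force Gabriel graph: keep edge (i, j) when no other point lies
    strictly inside the circle whose diameter is the segment i-j."""
    def d2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    n = len(points)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            dij = d2(points[i], points[j])
            # r is inside the diametral circle iff |ir|^2 + |jr|^2 < |ij|^2
            if all(d2(points[i], points[r]) + d2(points[j], points[r]) >= dij
                   for r in range(n) if r != i and r != j):
                edges.append((i, j, math.sqrt(dij)))
    return edges


def boundary_clusters(points, k=1.0):
    """Label points by cluster: drop edges longer than mean + k * std of all
    edge lengths, then return connected-component labels (0, 1, ...)."""
    edges = gabriel_edges(points)
    if not edges:
        return list(range(len(points)))
    lengths = [w for _, _, w in edges]
    cutoff = mean(lengths) + k * pstdev(lengths)

    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, w in edges:
        if w <= cutoff:
            parent[find(i)] = find(j)  # union the two components

    labels, roots = [], {}
    for i in range(len(points)):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels
```

For the four corners of the unit square plus the same square shifted by (10, 10), the single long edge joining the two groups is cut and the labels are `[0, 0, 0, 0, 1, 1, 1, 1]`.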
3

Integrating network analysis and data mining techniques into effective framework for Web mining and recommendation : a framework for Web mining and recommendation

Nagi, Mohamad January 2015
The main motivation for the study described in this dissertation is to benefit from advances in technology and from the huge amount of available data that can easily be captured, stored and maintained electronically. We concentrate on Web usage (i.e., log) mining and Web structure mining. Analysing Web log data reveals how effective the current structure of a web site is and helps the site owner understand the behaviour of its visitors. We developed a framework that integrates statistical analysis, frequent pattern mining, clustering, classification, and network construction and analysis. We concentrated on statistical data about the visitors and about how they move through the various pages of a given web site to reach target pages. Further, frequent pattern mining was used to study the relationships between the pages that make up a web site. Clustering is used to study the similarity of users and of pages. Classification suggests a target class for a new entity by comparing its characteristics with those of the known classes. Network construction and analysis is employed to identify and investigate the links between the pages of a web site: a network is built from page access frequencies, in which two pages are linked if the frequent pattern mining process identifies them as frequently accessed together. The knowledge discovered by analysing a web site and its related data should be valuable to online shoppers and commercial web site owners. Building on the outcome of the study, a recommendation system was developed that suggests pages to visitors by comparing their profiles with similar profiles of other visitors.
The experiments, conducted on popular datasets, demonstrate the applicability and effectiveness of the proposed framework for Web mining and recommendation. As a by-product, we show that the proposed method is also effective for feature reduction in another domain, gene expression data analysis, with some interesting results reported in Chapter 5.
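A minimal Python sketch of two of the framework's steps, assuming each session is a list of page identifiers and each visitor profile is a dict of page visit counts: pages are linked when they co-occur in at least `min_support` sessions (a pairwise stand-in for the frequent pattern mining step), and a visitor is offered the unseen pages of the most similar other visitor under cosine similarity. The function names, the cosine measure and the single-nearest-neighbour rule are illustrative assumptions, not the dissertation's exact design.

```python
import math
from collections import Counter
from itertools import combinations


def co_access_network(sessions, min_support):
    """Link two pages when they occur together in at least `min_support`
    sessions; edge weights are the co-occurrence counts."""
    pair_counts = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            pair_counts[(a, b)] += 1
    network = {}
    for (a, b), count in pair_counts.items():
        if count >= min_support:
            network.setdefault(a, {})[b] = count
            network.setdefault(b, {})[a] = count
    return network


def cosine(u, v):
    """Cosine similarity of two sparse page-visit profiles (dicts)."""
    dot = sum(x * v.get(page, 0) for page, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend(target, profiles, top_n=3):
    """Suggest pages that the most similar other visitor used but `target`
    has not, ordered by that visitor's visit counts."""
    others = [v for v in profiles if v != target]
    if not others:
        return []
    nearest = max(others, key=lambda v: cosine(profiles[target], profiles[v]))
    unseen = {p: n for p, n in profiles[nearest].items() if p not in profiles[target]}
    return sorted(unseen, key=unseen.get, reverse=True)[:top_n]
```

For the sessions (home, products, cart), (home, products), (home, about) and (products, cart) with `min_support=2`, only the pairs products/cart and products/home are linked.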
4

Método para identificação de perfis de produtos : estudo de caso automobilístico / Method of identification of product profiles : automotive case study

Miguel, Carlos Henrique, 1983- 27 August 2018
Advisor: Antônio Batocchio / Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Mecânica / Previous issue date: 2015 / Abstract: This study aimed to develop a method for identifying product profiles, that is, the groups of product characteristics that occur frequently in the purchases made by the product's customers. A literature review was conducted on the management areas influenced by the identification of product profiles, among them Demand Planning, Value Chain, Supply Chain and Logistics Chain. More specifically, the most affected sub-areas are Just-in-Time delivery from key suppliers and Continuous Replenishment Systems. Electronic identification technologies for mass-produced products (e.g. RFID, barcodes and QR codes) provide a way to record each product sale for use by the method. Among the techniques applied in the method, Fuzzy Sets were used to categorize the quantitative characteristics of the products; these categories became the input to Market Basket Analysis, making it possible to determine each product profile by association rule mining. Apriori proved to be an appropriate algorithm for Market Basket Analysis, since it mines association rules from frequent itemsets using the interest measures support, confidence and lift. The algorithm is available in the arules package for the R statistical software, and the arulesViz package, also for R, displays the relationships between product items graphically.
The method was applied to a research database from the automotive sector and successfully returned the frequent car profiles among customers' purchases. / Master's / Materiais e Processos de Fabricação (Materials and Manufacturing Processes) / Mestre em Engenharia Mecânica (Master of Mechanical Engineering)
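The pipeline described above (fuzzy categorization of a quantitative attribute, then rule evaluation by support, confidence and lift) can be sketched in Python as follows, using support(X → Y) = share of baskets containing X ∪ Y, confidence = support(X ∪ Y) / support(X), and lift = confidence / support(Y). The engine-power attribute, the trapezoidal fuzzy sets and their breakpoints are hypothetical; each value is mapped to the single label with the highest membership, and the code only scores a given rule rather than running Apriori itself (in R, the dissertation's setting, the `arules` package provides Apriori).

```python
import math

INF = math.inf

# Hypothetical fuzzy sets for engine power (hp), as trapezoids (a, b, c, d);
# the outer sets are open shoulders so every value gets some membership.
POWER_SETS = {
    "low": (-INF, -INF, 70, 110),
    "medium": (70, 110, 110, 150),
    "high": (110, 150, INF, INF),
}


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], linear on (a, b) and (c, d), else 0."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def fuzzy_label(x, sets):
    """Crisp item label: the fuzzy set in which x has the highest membership."""
    return max(sets, key=lambda name: trapezoid(x, *sets[name]))


def support(itemset, baskets):
    """Fraction of baskets that contain every item of `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def rule_metrics(lhs, rhs, baskets):
    """Support, confidence and lift of lhs -> rhs (lhs must occur at least once)."""
    sup = support(lhs | rhs, baskets)
    conf = sup / support(lhs, baskets)
    return sup, conf, conf / support(rhs, baskets)
```

With hypothetical cars (75 hp, manual), (80, manual), (150, automatic), (160, automatic) and (110, manual), the power labels are low, low, high, high and medium, and the rule {power=high} → {transmission=automatic} scores support 0.4, confidence 1.0 and lift 2.5.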
5

OLAP Recommender: Supporting Navigation in Data Cubes Using Association Rule Mining / OLAP Recommender

Koukal, Bohuslav January 2017
Manual exploration of data cubes in search of potentially interesting and useful information becomes time-consuming and ineffective once the data reach a certain volume. In my thesis, I designed, implemented and tested a system that automates data cube exploration and offers potentially interesting views of OLAP data to the end user. The system integrates two data analytics methods: OLAP analysis with data visualisation, and data mining, represented by GUHA association rule mining. Another contribution of my work is an investigation of how to resolve the differences between OLAP analysis and association rule mining. The implemented solutions to these differences include data discretization, commensurability of dimensions, an algorithm that automatically designs data mining tasks based on the data structure, and a mapping definition between mined association rules and the corresponding OLAP visualisations. The system was tested with real retail sales data and with EU structural funds data. The experiments showed that the complementary use of association rule mining and OLAP analysis identifies relationships in the data with a higher success rate than either technique used in isolation.
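GUHA procedures evaluate each candidate rule on a four-fold contingency table. A minimal Python sketch, assuming records are dicts and the antecedent and succedent are plain predicates: numeric values are discretised into equal-width bins (one simple discretization choice, assumed here), the table (a, b, c, d) is counted, and the founded implication quantifier accepts the rule when a / (a + b) ≥ p and a ≥ Base. The function names, field names and thresholds are hypothetical; this is not the thesis's task-design algorithm or its OLAP mapping.

```python
def equal_width_bins(values, k):
    """Discretise numbers into k equal-width bins, returning indices 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # avoid division by zero when all values are equal
    return [min(int((v - lo) / width), k - 1) for v in values]


def fourfold(records, antecedent, succedent):
    """GUHA four-fold table (a, b, c, d) for two predicates over records:
    a = both hold, b = antecedent only, c = succedent only, d = neither."""
    a = b = c = d = 0
    for r in records:
        phi, psi = antecedent(r), succedent(r)
        if phi and psi:
            a += 1
        elif phi:
            b += 1
        elif psi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def founded_implication(table, p, base):
    """Founded implication: accept when a / (a + b) >= p and a >= base."""
    a, b, _, _ = table
    return a >= base and a + b > 0 and a / (a + b) >= p
```

In a retail setting, the antecedent might test a region and product category, and the succedent whether a record's discretised sales fall in the top bin.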
