Global ETD Search

1	arules - A Computational Environment for Mining Association Rules and Frequent Item Sets Hornik, Kurt, Grün, Bettina, Hahsler, Michael January 2005 Mining frequent itemsets and association rules is a popular and well researched approach for discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules. (authors' abstract)
2	A computational environment for mining association rules and frequent item sets Hahsler, Michael, Grün, Bettina, Hornik, Kurt January 2005 Mining frequent itemsets and association rules is a popular and well researched approach to discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules. (authors' abstract) / Series: Research Report Series / Department of Statistics and Mathematics
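The three kinds of itemsets named in the abstracts above can be illustrated with a small sketch. This is plain Python written for this listing, not the arules API and not Borgelt's C code; the function names, the toy transactions and the `min_count` threshold are all assumptions made for illustration. It uses a simple level-wise (Apriori-style) search and then filters the frequent itemsets down to the closed and maximal ones.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Level-wise (Apriori-style) search; returns {frozenset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    result = {}
    while level:
        # Count each candidate by scanning all transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates and keep only
        # those whose k-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            u = a | b
            if len(u) == len(a) + 1 and all(
                frozenset(s) in frequent for s in combinations(u, len(a))
            ):
                candidates.add(u)
        level = list(candidates)
    return result

def closed_itemsets(freq):
    # Closed: no proper superset has the same support.
    return {s: n for s, n in freq.items()
            if not any(s < t and n == m for t, m in freq.items())}

def maximal_itemsets(freq):
    # Maximal: no proper superset is frequent at all.
    return {s: n for s, n in freq.items() if not any(s < t for t in freq)}

def confidence(freq, antecedent, consequent):
    # confidence(A -> C) = support(A and C together) / support(A)
    a = frozenset(antecedent)
    return freq[a | frozenset(consequent)] / freq[a]

# Invented toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]
freq = frequent_itemsets(transactions, min_count=2)
```

On this toy data `{butter}` is frequent but not closed, because `{bread, butter}` has the same support, and the rule butter -> bread has confidence 1.0.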
3	Mining and management of association rules using information retrieval techniques Βαρσάμης, Θεόδωρος 11 June 2013 In a world flooded with data, it becomes necessary to organize the data efficiently and then process it, in order to find and retrieve information for decision making. As part of this effort, various studies have been published that aim to discover relationships among data; these relationships can reveal previously unknown dependencies and allow future outcomes and decisions to be forecast and predicted. In this thesis we study the most widely used algorithms for finding association rules and then propose a scheme that uses inverted files as the basic structure for retrieving information from transaction databases. Our goal is the easy generation of association rules between items, based on the efficient storage and retrieval of Frequent Itemsets. We first focus on how to find and store a minimal set of transactions, exploiting the information contained in Closed Frequent Itemsets and Maximum Frequent Itemsets (MFI). Then, making use of the information stored in the MFIs and at minimal computational cost, we propose the MFI-drive algorithm, which answers superset and subset queries over items, as well as queries for itemsets with a predefined degree of similarity to a given set. / -- Information mining Inverted files 006.312 Data mining Closed frequent itemsets Maximum frequent itemsets Invert index files MFI-drive
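As a rough illustration of the inverted-file idea this record refers to, the sketch below maps each item to the set of transaction ids that contain it and computes the support of an itemset by intersecting those sets. It is not the MFI-drive algorithm, whose details the abstract does not give; the function names and the toy data are assumptions.

```python
from collections import defaultdict

def build_inverted_index(transactions):
    """Map each item to the set of ids of the transactions containing it."""
    index = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            index[item].add(tid)
    return index

def support_count(index, itemset):
    """Number of transactions that contain every item of a non-empty itemset."""
    postings = [index.get(item, set()) for item in itemset]
    if not postings:
        return 0  # this sketch treats the empty query as "no match"
    # Intersect the smallest posting sets first to keep intermediates small.
    postings.sort(key=len)
    return len(set.intersection(*postings))

# Invented toy data, for illustration only.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
index = build_inverted_index(transactions)
```

With this index, `support_count(index, {"a", "b"})` needs no scan of the transactions themselves, only one intersection of two id sets.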
4	[pt] MINERAÇÃO DE ITENS FREQUENTES EM SEQUÊNCIAS DE DADOS: UMA IMPLEMENTAÇÃO EFICIENTE USANDO VETORES DE BITS / [en] MINING FREQUENT ITEMSETS IN DATA STREAMS: AN EFFICIENT IMPLEMENTATION USING BIT VECTORS FRANKLIN ANDERSON DE AMORIM 11 February 2016 [pt] A mineração de conjuntos de itens frequentes em sequências de dados possui diversas aplicações práticas como, por exemplo, análise de comportamento de usuários, teste de software e pesquisa de mercado. Contudo, a grande quantidade de dados gerada pode representar um obstáculo para o processamento dos mesmos em tempo real e, consequentemente, na sua análise e tomada de decisão. Sendo assim, melhorias na eficiência dos algoritmos usados para estes fins podem trazer grandes benefícios para os sistemas que deles dependem. Esta dissertação apresenta o algoritmo MFI-TransSWmais, uma versão otimizada do algoritmo MFI-TransSW, que utiliza vetores de bits para processar sequências de dados em tempo real. Além disso, a dissertação descreve a implementação de um sistema de recomendação de matérias jornalísticas, chamado ClickRec, baseado no MFI-TransSWmais, para demonstrar o uso da nova versão do algoritmo. Por último, a dissertação descreve experimentos com dados reais e apresenta resultados da comparação de performance dos dois algoritmos e dos acertos do sistema de recomendações ClickRec. / [en] The mining of frequent itemsets in data streams has several practical applications, such as user behavior analysis, software testing and market research. Nevertheless, the massive amount of data generated may pose an obstacle to processing it in real time and, consequently, to its analysis and to decision making. Thus, improvements in the efficiency of the algorithms used for these purposes may bring great benefits to the systems that depend on them. This thesis presents the MFI-TransSWplus algorithm, an optimized version of the MFI-TransSW algorithm, which uses bit vectors to process data streams in real time. 
In addition, this thesis describes the implementation of a news article recommendation system, called ClickRec, based on MFI-TransSWplus, to demonstrate the use of the new version of the algorithm. Finally, the thesis describes experiments with real data and presents a comparison of the two algorithms' performance and the hit rate of the ClickRec recommendation system. [pt] MINERACAO DE DADOS [pt] CONJUNTOS DE ITENS FREQUENTES [pt] SEQUENCIAS DE DADOS [en] DATA MINING [en] FREQUENT ITEMSETS [en] DATASTREAM
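The general idea of using bit vectors over a sliding window can be sketched as follows. This is a minimal illustration and not MFI-TransSW or MFI-TransSWplus themselves; the class name, its methods and the toy stream are assumptions. Each item keeps one bit per transaction in the window, sliding the window is a left shift, and the support of an itemset is the number of set bits in the AND of its items' vectors.

```python
class BitVectorWindow:
    """Sliding window over a transaction stream, one bit vector per item."""

    def __init__(self, size):
        self.size = size
        self.mask = (1 << size) - 1   # keeps only the last `size` positions
        self.bits = {}                # item -> int used as a bit vector

    def add(self, transaction):
        # Shift every vector by one position; the oldest bit falls off.
        for item in list(self.bits):
            shifted = (self.bits[item] << 1) & self.mask
            if shifted:
                self.bits[item] = shifted
            else:
                del self.bits[item]   # item no longer occurs in the window
        # Mark the newest position for the items of the arriving transaction.
        for item in transaction:
            self.bits[item] = self.bits.get(item, 0) | 1

    def support(self, itemset):
        # AND the vectors of all items, then count the remaining set bits.
        acc = self.mask
        for item in itemset:
            acc &= self.bits.get(item, 0)
        return bin(acc).count("1")

# Invented toy stream, for illustration only.
window = BitVectorWindow(size=3)
for t in [{"a", "b"}, {"a"}, {"a", "b"}]:
    window.add(t)
support_before = window.support({"a", "b"})   # window: {a,b}, {a}, {a,b}
window.add({"b"})                             # oldest transaction drops out
support_after = window.support({"a", "b"})    # window: {a}, {a,b}, {b}
```

The appeal of this layout is that both window maintenance and support counting reduce to a handful of integer operations per item.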
5	Efficient Frequent Closed Itemset Algorithms With Applications To Stream Mining And Classification Ranganath, B N 09 1900 Data mining aims to find valid, novel, potentially useful, and ultimately understandable abstractions in data. Frequent itemset mining is one of the important data mining approaches for finding those abstractions in the form of patterns. Frequent closed itemsets provide complete and condensed information for generating non-redundant association rules. For many applications, mining all the frequent itemsets is not necessary, and mining frequent closed itemsets is adequate. Compared to frequent itemset mining, frequent closed itemset mining generates fewer itemsets and therefore improves the efficiency and effectiveness of these tasks. Recently, much research has been done on closed itemset mining, but it is mainly for traditional databases where multiple scans are needed, and whenever new transactions arrive, additional scans must be performed on the updated transaction database; these methods are therefore not suitable for data stream mining. Mining frequent itemsets from data streams has many potential and broad applications. Some of the emerging applications of data streams that require association rule mining are network traffic monitoring and web click stream analysis. Unlike data in traditional static databases, data streams typically arrive continuously, at high speed, in huge volume and with a changing data distribution. This raises new issues that need to be considered when developing association rule mining techniques for stream data. Recent works on data stream mining based on the sliding window method slide the window by one transaction at a time. But when the window size is large and the support threshold is low, the existing methods consume significant time and lead to a large increase in user response time. 
In our first work, we propose a novel algorithm, Stream-Close, based on the sliding window model, to mine frequent closed itemsets from data streams within the current sliding window. We enhance the scalability of the algorithm by introducing several optimization techniques, such as sliding the window by multiple transactions at a time and novel pruning techniques that considerably reduce the number of candidate itemsets to be examined for closure checking. Our experimental studies show that the proposed algorithm scales well with large data sets. Still, the notion of frequent closed itemsets generates a huge number of closed itemsets in some applications. This drawback makes frequent closed itemset mining infeasible in many applications, since users cannot interpret the large volume of output (which is sometimes larger than the data itself when the support threshold is low), and it may create the overhead of developing extra applications that post-process the output of the original algorithm to reduce its size. Recent work on clustering of itemsets considers strictly either the expression (the items present in the itemset) or the support of the itemsets, or partially both, to reduce the number of itemsets. The drawback of these approaches is that in some situations the number of itemsets is not reduced, because of their restricted view of considering either expressions or support. So we propose a new notion of frequent itemsets, called clustered itemsets, which considers both the expressions and the support of the itemsets in summarizing the output. We introduce a new distance measure with respect to expressions and also prove that the problem of mining clustered itemsets is NP-hard. In our second work, we propose a deterministic locality sensitive hashing based classifier using clustered itemsets. Locality sensitive hashing (LSH) is a technique for efficiently finding a nearest neighbour in high dimensional data sets. 
The idea of locality sensitive hashing is to hash the points using several hash functions so that, for each function, the probability of collision is much higher for objects that are close to each other than for those that are far apart. We propose an LSH based approximate nearest neighbour classification strategy. The problem with LSH is that it chooses hash functions randomly, and evaluating a large number of hash functions can increase query time. From a classification point of view, since LSH chooses randomly from a family of hash functions, the buckets may contain points belonging to other classes, which may affect classification accuracy. To overcome these problems, we propose to use hash functions based on class association rules, which ensure that the buckets corresponding to the class association rules contain points from the same class. But associative classification involves the generation and examination of a large number of candidate class association rules. So we use the clustered itemsets, which reduce the number of class association rules to be examined. We also establish a formal connection between the clustering parameter (the delta used in the generation of clustered frequent itemsets) and discriminative measures such as information gain. Our experimental studies show that the proposed method achieves an increase in accuracy over the LSH based near neighbour classification strategy. Data Mining Classification - Algorithms Frequent Itemset Mining Clustered Itemsets Data Stream Mining Locality Sensitive Hashing Stream-Close Algorithm Associative Classification Clustered Frequent Itemsets Closed Frequent Itemsets Stream Mining Computer Science
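The randomized LSH baseline that this abstract contrasts itself with can be sketched briefly. The example below uses random-hyperplane hashing, a standard LSH family chosen here as an assumption; it is not the thesis's deterministic hashing based on class association rules, and the class name, parameters and toy points are invented for illustration. Points that fall into the same bucket in at least one table become candidates, and the query takes the label of the nearest candidate.

```python
import random
from collections import defaultdict

class HyperplaneLSH:
    """Approximate nearest-neighbour classifier with random-hyperplane hashing."""

    def __init__(self, dim, n_bits, n_tables, seed=0):
        rng = random.Random(seed)
        self.tables = []
        for _ in range(n_tables):
            # Each table hashes with its own set of random hyperplanes.
            planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
            self.tables.append((planes, defaultdict(list)))

    @staticmethod
    def _key(planes, point):
        # One bit per hyperplane: which side of the plane the point lies on.
        return tuple(sum(w * x for w, x in zip(plane, point)) >= 0 for plane in planes)

    def add(self, point, label):
        for planes, buckets in self.tables:
            buckets[self._key(planes, point)].append((point, label))

    def candidates(self, query):
        # Union of the buckets the query falls into, without duplicates.
        seen = []
        for planes, buckets in self.tables:
            for entry in buckets.get(self._key(planes, query), []):
                if entry not in seen:
                    seen.append(entry)
        return seen

    def classify(self, query):
        cands = self.candidates(query)
        if not cands:
            return None  # no collision in any table
        nearest = min(cands, key=lambda e: sum((a - b) ** 2 for a, b in zip(e[0], query)))
        return nearest[1]

# Invented toy data, for illustration only.
lsh = HyperplaneLSH(dim=2, n_bits=4, n_tables=3)
lsh.add((1.0, 0.0), "x")
lsh.add((-1.0, 0.0), "y")
```

Because the hyperplanes are drawn at random, a bucket can mix points of different classes; that is the weakness the abstract's rule-based hash functions are meant to address.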
6	[en] APPLYING PROCESS MINING TO THE ACADEMIC ADMINISTRATION DOMAIN / [pt] APLICAÇÃO DE MINERAÇÃO DE PROCESSOS AO DOMÍNIO ACADÊMICO ADMINISTRATIVO HAYDÉE GUILLOT JIMÉNEZ 12 December 2017 [pt] As instituições de ensino superior mantêm uma quantidade considerável de dados que incluem tanto os registros dos alunos como a estrutura dos currículos dos cursos de graduação. Este trabalho, adotando uma abordagem de mineração de processos, centra-se no problema de identificar quão próximo os alunos seguem a ordem recomendada das disciplinas em um currículo de graduação, e até que ponto o desempenho de cada aluno é afetado pela ordem que eles realmente adotam. O problema é abordado aplicando-se duas técnicas já existentes aos registros dos alunos: descoberta de processos e verificação de conformidade; e frequência de conjuntos de itens. Finalmente, a dissertação cobre experimentos realizados aplicando-se essas técnicas a um estudo de caso com mais de 60.000 registros de alunos da PUC-Rio. Os experimentos indicam que a técnica de frequência de conjuntos de itens produz melhores resultados do que as técnicas de descoberta de processos e verificação de conformidade. E confirmam igualmente a relevância de análises baseadas na abordagem de mineração de processos para ajudar coordenadores acadêmicos na busca de melhores currículos universitários. / [en] Higher Education Institutions keep a sizable amount of data, including student records and the structure of degree curricula. This work, adopting a process mining approach, focuses on the problem of identifying how closely students follow the recommended order of the courses in a degree curriculum, and to what extent their performance is affected by the order they actually adopt. It addresses this problem by applying two existing techniques to student records: process discovery and conformance checking; and frequent itemsets. 
Finally, the dissertation covers experiments performed by applying these techniques to a case study involving over 60,000 student records from PUC-Rio. The experiments show that the frequent itemsets technique performs better than the process discovery and conformance checking techniques. They equally confirm the relevance of analyses based on the process mining approach to help academic coordinators in their quest for better degree curricula. [pt] ORGANIZACAO CURRICULAR [en] CURRICULUM ORGANIZATION [pt] ANALISE ACADEMICA [en] ACADEMIC ANALYTICS [pt] FREQUENCIA DE CONJUNTOS [en] FREQUENT ITEMSETS [pt] MINERACAO DE PROCESSOS [en] PROCESS MINING
7	Získávání znalostí z obchodních procesů / Business Process Mining Skácel, Jan January 2015 This thesis explains business process mining and its principles. A substantial part is devoted to the problems of process discovery. Based on the analysis of a specific manufacturing process, three methods are proposed that try to identify shortcomings in the process. The first discovers the manufacturing process and renders it as a graph. The second uses a simulator of production history to find products that may have caused delays in the process; the acquired data are used to mine frequent itemsets. The third tries to predict the processing time at a selected workplace using association rules. The last two methods employ the Frequent Pattern Growth algorithm. The knowledge obtained in this thesis improves the efficiency of the manufacturing process and enables better production planning.
8	Data Mining in a Multidimensional Environment Günzel, Holger, Albrecht, Jens, Lehner, Wolfgang 12 January 2023 Data Mining and Data Warehousing are two hot topics in the database research area. Until recently, conventional data mining algorithms were primarily developed for a relational environment. But a data warehouse database is based on a multidimensional model. In our paper we use this basis to integrate data mining seamlessly into the multidimensional model, taking the discovery of association rules as an example. Furthermore, we propose this method as a user-guided technique because of the clear structure of both model and data. We present both the theoretical basis and efficient algorithms for data mining in the multidimensional data model. Our approach directly uses the requirements of dimensions, classifications and sparsity of the cube. Additionally, we give heuristics for optimizing the search for rules. info:eu-repo/classification/ddc/004 ddc:004
9	Leveraging formal concept analysis and pattern mining for moving object trajectory analysis / Exploitation de l'analyse formelle de concepts et de l'extraction de motifs pour l'analyse de trajectoires d'objets mobiles Almuhisen, Feda 10 December 2018 Cette thèse présente un cadre de travail d'analyse de trajectoires contenant une phase de prétraitement et un processus d’extraction de trajectoires d’objets mobiles. Le cadre offre des fonctions visuelles reflétant le comportement d'évolution des motifs de trajectoires. L'originalité de l’approche est d’allier extraction de motifs fréquents, extraction de motifs émergents et analyse formelle de concepts pour analyser les trajectoires. A partir des données de trajectoires, les méthodes proposées détectent et caractérisent les comportements d'évolution des motifs. Trois contributions sont proposées : Une méthode d'analyse des trajectoires, basée sur les concepts formels fréquents, est utilisée pour détecter les différents comportements d’évolution de trajectoires dans le temps. Ces comportements sont "latent", "emerging", "decreasing", "lost" et "jumping". Ils caractérisent la dynamique de la mobilité par rapport à l'espace urbain et le temps. Les comportements détectés sont visualisés sur des cartes générées automatiquement à différents niveaux spatio-temporels pour affiner l'analyse de la mobilité dans une zone donnée de la ville. Une deuxième méthode basée sur l'extraction de concepts formels séquentiels fréquents a également été proposée pour exploiter la direction des mouvements dans la détection de l'évolution. Enfin, une méthode de prédiction basée sur les chaînes de Markov est présentée pour prévoir le comportement d’évolution dans la future période pour une région. Ces trois méthodes sont évaluées sur des ensembles de données réelles. 
Les résultats expérimentaux obtenus sur ces données valident la pertinence de la proposition et l'utilité des cartes produites. / This dissertation presents a trajectory analysis framework that includes both a preprocessing phase and a trajectory mining process. The framework also offers visual functions that reflect the evolution behavior of trajectory patterns. The originality of the mining process is that it combines frequent pattern mining, emerging pattern mining and formal concept analysis to analyze the trajectories of moving objects. These methods detect and characterize time-bound pattern evolution behaviors in trajectory data. Three contributions are proposed: (1) a method for analyzing trajectories, based on frequent formal concepts, is used to detect how trajectory patterns evolve over time. The behaviors detected are "latent", "emerging", "decreasing", "lost" and "jumping". They characterize the dynamics of mobility in relation to urban space and time. The detected behaviors are visualized on automatically generated maps at different spatio-temporal levels to refine the analysis of mobility in a given area of the city; (2) a second trajectory analysis method, based on sequential concept lattice extraction, is proposed to exploit the direction of movement in the evolution detection process; and (3) a prediction method based on Markov chains is presented to predict the evolution behavior of a region in the next period. These three methods are evaluated on two real-world datasets. The experimental results obtained on these data show the relevance of the proposal and the utility of the generated maps. Trajectoires Données spatio-Temporelles Motifs fréquents Motifs séquentiels Motifs émergents Concepts formels Treillis de concepts Comportement Visualisation Prédiction Modèle de Markov Trajectories Spatio-Temporal data Frequent itemsets Sequential patterns Emerging patterns Formal concepts Concept lattice Behavior Visualization Prediction Markov model.
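A Markov-chain prediction of the kind mentioned in this abstract can be sketched with a first-order model over the five behavior labels. The abstract does not describe the dissertation's actual model, so the order of the chain, the function names and the behavior histories below are all assumptions. Transition probabilities are estimated from observed sequences, and the prediction is the most probable successor of the current behavior.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Estimate first-order transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    # Normalize each row of counts into probabilities.
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }

def predict_next(model, state):
    """Most probable next state, or None if the state was never left."""
    row = model.get(state)
    if not row:
        return None
    return max(row, key=row.get)

# Invented behavior histories of three regions, for illustration only.
histories = [
    ["latent", "emerging", "decreasing", "lost"],
    ["latent", "emerging", "decreasing"],
    ["emerging", "jumping"],
]
model = fit_transitions(histories)
```

Here a region currently labeled "emerging" is predicted to become "decreasing", since two of its three observed transitions lead there.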
10	Získávání znalostí z textových dat / Knowledge Discovery in Text Smékal, Luděk January 2007 This MSc thesis deals with data mining. Data mining is about obtaining data or information from databases where it is not directly visible but can be accessed using special algorithms. The thesis focuses mainly on classifying documents with a selected method in the context of a digital library. The selected method, called the "itemsets method", is based on sets of items. It extends the application field of the Apriori algorithm, which was originally designed for processing transaction databases and generating frequent itemsets.
