Global ETD Search

1	New approaches to weighted frequent pattern mining
Yun, Unil. 25 April 2007.
Researchers have proposed frequent pattern mining algorithms that are more efficient than earlier algorithms and that generate fewer but more important patterns. Many techniques have been developed for frequent pattern mining, such as depth-first and breadth-first search, tree-based and other data structures, top-down and bottom-up traversal, and vertical and horizontal data formats. Most frequent pattern mining algorithms use a support measure to prune the combinatorial search space. However, support-based pruning alone is not enough once the characteristics of real datasets are taken into account. In addition, after a dataset has been mined, there is no way to adjust the number of frequent patterns through user feedback other than by changing the minimum support. Alternative measures for mining frequent patterns have been suggested to address these issues. One of the main limitations of the traditional approach is that all items are treated uniformly when, in reality, items differ in importance. For this reason, weighted frequent pattern mining algorithms have been suggested that assign different weights to items according to their significance. The main concern in weighted frequent pattern mining is satisfying the downward closure property. In this research, we suggest frequent pattern mining approaches with weight constraints. Our main approach is to push weight constraints into the pattern growth algorithm while maintaining the downward closure property.
We develop WFIM (Weighted Frequent Itemset Mining with a weight range and a minimum weight), WLPMiner (Weighted frequent Pattern Mining with length-decreasing constraints), WIP (Weighted Interesting Pattern mining with a strong weight and/or support affinity), WSpan (Weighted Sequential pattern mining with a weight range and a minimum weight), and WIS (Weighted Interesting Sequential pattern mining with a similar level of support and/or weight affinity). An extensive performance analysis shows that the proposed approaches are efficient and scalable for weighted frequent pattern mining.
Keywords: Data mining; Frequent pattern mining
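The downward-closure problem in this entry can be made concrete. If an itemset's weight is, say, the average weight of its items, then adding a heavy item can raise a superset's weighted support above that of its subset, so weighted support alone is not anti-monotone. One way to keep a safe pruning rule is to prune with an upper bound: support times the largest item weight, which can only shrink as items are added. The Python sketch below is our illustration, not the WFIM code from the thesis; it assumes average-weight weighted support and a single global maximum weight, and for brevity it uses level-wise (Apriori-style) enumeration rather than the pattern-growth framework the abstract describes. The function name `weighted_frequent_itemsets` and the parameter `min_wsup` are hypothetical.

```python
from itertools import combinations


def weighted_frequent_itemsets(transactions, weights, min_wsup):
    """Return {itemset: weighted_support} for every itemset whose
    average item weight times relative support is >= min_wsup.

    Hypothetical sketch: level-wise enumeration with a max-weight bound."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    w_max = max(weights.values())  # largest weight of any item

    def support(itemset):
        return sum(1 for t in tsets if itemset <= t) / n

    def avg_weight(itemset):
        return sum(weights[i] for i in itemset) / len(itemset)

    result = {}
    level = [frozenset([i]) for i in sorted({i for t in tsets for i in t})]
    k = 1
    while level:
        survivors = []
        for cand in level:
            sup = support(cand)
            # Upper bound on the weighted support of cand and all its supersets:
            # avg_weight <= w_max, and support can only shrink as items are added.
            if sup * w_max < min_wsup:
                continue
            survivors.append(cand)
            wsup = avg_weight(cand) * sup
            if wsup >= min_wsup:
                result[cand] = wsup
        # Join surviving k-itemsets into (k+1)-itemset candidates.
        k += 1
        level = list({a | b for a, b in combinations(survivors, 2) if len(a | b) == k})
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
    w = {"a": 0.9, "b": 0.6, "c": 0.5, "d": 0.2}
    found = weighted_frequent_itemsets(data, w, 0.3)
    for itemset, ws in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), round(ws, 3))
```

Because this bound uses the largest weight anywhere in the data, it is safe but loose; a bound computed over only the items that can still extend the current pattern would prune more.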
2	Contrasting sequence groups by emerging sequences
Deng, Kang. January 2009.
Thesis (M. Sc.)--University of Alberta, 2009.
Title from PDF file main screen (viewed on Nov. 27, 2009). "A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Master of Science, Department of Computing Science, University of Alberta." Includes bibliographical references.
3	Mining simple and complex patterns efficiently using Binary Decision Diagrams
Loekito, E. January 2009.
Pattern mining is a knowledge discovery task that is useful for finding interesting data characteristics. Existing mining techniques sometimes perform poorly in challenging situations, such as finding patterns in high-dimensional datasets. Binary Decision Diagrams and their variants are compact and efficient graph data structures for representing and manipulating Boolean functions, which makes them potentially attractive for solving many problems in pattern mining. This thesis explores techniques for using Binary Decision Diagrams to mine both simple and complex types of patterns.
First, we investigate the use of Binary Decision Diagrams for mining the fundamental types of patterns, including frequent patterns (also known as frequent itemsets). We introduce a structure called the Weighted Zero-suppressed Binary Decision Diagram and evaluate its use on high-dimensional data. This type of decision diagram is extremely useful for reusing intermediate patterns during computation.
Second, we study the problem of mining patterns in sequential databases. Here, we introduce a new structure called the Sequence Binary Decision Diagram, which can be used for mining frequent subsequences. We show that our technique is competitive with the state of the art and identify situations where it is superior.
Third, we show how Weighted Zero-suppressed Binary Decision Diagrams can be used to discover new and complex types of patterns. We introduce new types of highly expressive patterns for capturing contrasts, which express disjunctions of attribute values. To investigate the usefulness of these disjunctive patterns for knowledge discovery, we employ a statistical methodology for testing their significance and study their use in solving classification problems. Our findings show that classifiers based on significant disjunctive patterns can be more robust than those based only on simple patterns.
Finally, we introduce patterns for capturing second-order differences between two groups of classes, which can provide useful insights for human experts. Again, we show how Binary Decision Diagrams can be deployed to discover this type of knowledge efficiently.
In summary, we demonstrate that Binary Decision Diagrams are a powerful and scalable tool for pattern mining. We believe their use is very promising for a range of current and future data mining tasks.
Keywords: Binary Decision Diagrams; Pattern Mining
4	Efficient frequent pattern mining from big data and its applications
Jiang, Fan. January 2016.
Frequent pattern mining is an important research area in data mining. Since its introduction, it has drawn the attention of many researchers, and many algorithms have consequently been proposed. Popular algorithms include level-wise Apriori-based algorithms, tree-based algorithms, and hyperlinked-array-structure-based algorithms. While these algorithms are popular and beneficial owing to some nice properties, they also suffer from drawbacks such as multiple database scans, recursive tree constructions, or multiple hyperlink adjustments. In the current era of big data, high volumes of a wide variety of valuable data of differing veracity can easily be collected or generated at high velocity in various real-life applications. Among these five V's of big data (volume, variety, value, veracity, and velocity), I focus on handling high volumes of big data in my Ph.D. thesis. Specifically, I design and implement a new, efficient frequent pattern mining technique called B-mine, which overcomes some of the aforementioned drawbacks and achieves better performance than existing algorithms. I also extend B-mine into a family of algorithms that can mine big data efficiently. Moreover, I design four different frameworks that apply this family of algorithms to the real-life application of social network mining. Evaluation results show the efficiency and practicality of all these algorithms. / February 2017
Keywords: Frequent Pattern Mining; Social Network Mining
5	On the discovery of relevant structures in dynamic and heterogeneous data
Preti, Giulia. 22 October 2019.
We are witnessing an explosion of available data coming from a huge number of sources and domains, which is leading to the creation of ever larger and richer datasets. Understanding, processing, and extracting useful information from these datasets requires specialized algorithms that take into account both the dynamism and the heterogeneity of the data they contain. Although several pattern mining techniques have been proposed in the literature, most of them fall short of providing interesting structures when the data can be interpreted differently by different users, when it can change over time, and when it has different representations. In this thesis, we propose novel approaches that go beyond traditional pattern mining algorithms and can effectively and efficiently discover relevant structures in dynamic and heterogeneous settings. In particular, we address pattern mining in multi-weighted graphs, pattern mining in dynamic graphs, and pattern mining in heterogeneous temporal databases.
In pattern mining in multi-weighted graphs, we consider the problem of mining patterns for a new category of graphs called multi-weighted graphs. In these graphs, nodes and edges can carry multiple weights that represent, for example, the preferences of different users or applications, and that are used to assess the relevance of the patterns. We introduce a novel family of scoring functions that assign a score to each pattern based on both the weights of its appearances and their number, and that respect the anti-monotone property, which is pivotal for efficient implementations. We then propose a centralized and a distributed algorithm that solve the problem both exactly and approximately. The approximate solution scales better with the number of edge-weighting functions while achieving good accuracy. An extensive experimental study shows the advantages and disadvantages of our strategies and demonstrates their effectiveness.
Then, in pattern mining in dynamic graphs, we focus on the task of discovering structures that are both well connected and correlated over time, in graphs whose nodes and edges can change over time. These structures represent edges that are topologically close and that exhibit similar patterns of appearance and disappearance across the snapshots of the graph. To this end, we introduce two measures for computing the density of a subgraph whose edges change over time, and a measure for computing their correlation. The density measures can detect subgraphs that are silent in some periods but highly connected in others, and can therefore detect events or anomalies that occurred in the network. The correlation measure can identify groups of edges that tend to appear together, as well as edges characterized by similar levels of activity. For both variants of the density measure, we provide an effective solution that enumerates all the maximal subgraphs whose density and correlation exceed given minimum thresholds, and that can also return a more compact subset of representative subgraphs with high pairwise dissimilarity. Furthermore, we propose an approximate algorithm that scales well with the size of the network while achieving high accuracy. We evaluate our framework with an extensive set of experiments on both real and synthetic datasets and compare its performance with that of the main competing algorithm. The results confirm the correctness of the exact solution, the high accuracy of the approximate one, and the superiority of our framework over existing solutions. In addition, they demonstrate the scalability of the framework and its applicability to networks of different kinds.
Finally, we address the problem of entity resolution in heterogeneous temporal databases, which are datasets containing records that give different descriptions of the status of real-world entities at different periods of time, and which are therefore characterized by different sets of attributes that can change over time. Detecting records that refer to the same entity in such a scenario requires a record similarity measure that takes temporal information into account and that does not assume a common fixed schema across records. However, existing record matching approaches either ignore changes in the records' attribute values or assume that all records share the same set of attributes throughout time. In this thesis, we propose a novel time-aware, schema-agnostic similarity measure for temporal records to find pairs of matching records, and we integrate it into an exact and an approximate algorithm. The exact algorithm can find all the maximal groups of pairwise similar records in the database. The approximate algorithm, on the other hand, achieves higher scalability with respect to the size of the dataset and the number of attributes by relying on a technique called meta-blocking. This algorithm can find a good-quality approximation of the actual groups of similar records by adopting an effective and efficient clustering algorithm.
6	Using Association Analysis for Medical Diagnoses
Nunna, Shinjini. 01 January 2016.
To fully examine the application of association analysis to medical data for deriving medical diagnoses, we survey classical association analysis and its approaches, the current challenges faced by medical association analysis together with proposed solutions, and finally bring this knowledge together in a proposal for applying medical association analysis to the identification of food intolerance. Classical association analysis has been well studied since its introduction in the seminal paper on market basket analysis in the 1990s. While the theory itself is relatively simple, the brute-force approach is prohibitively expensive; thus, creative approaches using various data structures and strategies must be explored for efficiency. Medical association analysis is a burgeoning field with various focuses, including diagnosis systems and gene analysis. The field faces a number of challenges, stemming primarily from the complex, voluminous, and high-dimensional nature of medical data. We examine the challenges faced in the pre-processing, analysis, and post-processing phases, along with corresponding solutions. Additionally, we survey proposed measures for ensuring that the results of medical association analysis will meet medical diagnosis standards. Finally, we explore how medical association analysis can be used to identify food intolerances. The proposed analysis system is based on a method of diagnosis currently used by medical professionals, and it seeks to eliminate manual analysis while more efficiently and intelligently identifying interesting and less obvious patterns between patients' food consumption and symptoms in order to propose a food intolerance diagnosis.
Keywords: Association Analysis; Frequent Pattern Mining; Databases and Information Systems
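The brute-force cost mentioned in this entry comes from the number of candidate itemsets, which grows as 2^d for d distinct items. The classical remedy from the market-basket literature is the Apriori principle: every subset of a frequent itemset is frequent, so any candidate with an infrequent subset can be skipped. The Python sketch below illustrates support/confidence rule mining on hypothetical patient-day records of foods eaten and symptoms reported. The data, the function names `frequent_itemsets` and `association_rules`, and the thresholds are our own illustration, not the food-intolerance system proposed in the thesis, which would also need the stricter evaluation measures the abstract says it surveys.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_sup):
    """Apriori: return {itemset: relative support} for itemsets with support >= min_sup."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)

    def support(s):
        return sum(1 for t in tsets if s <= t) / n

    freq = {}
    level = {frozenset([i]) for t in tsets for i in t}
    k = 1
    while level:
        current = {s: support(s) for s in level}
        current = {s: v for s, v in current.items() if v >= min_sup}
        freq.update(current)
        k += 1
        keys = list(current)
        # Join frequent (k-1)-itemsets, then drop candidates with an infrequent subset.
        level = {a | b for a, b in combinations(keys, 2) if len(a | b) == k}
        level = {c for c in level
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return freq


def association_rules(freq, min_conf):
    """Return a list of (lhs, rhs, support, confidence) with confidence >= min_conf."""
    rules = []
    for itemset, sup in freq.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / freq[lhs]  # lhs is frequent by downward closure
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules


if __name__ == "__main__":
    # Hypothetical patient-day logs: foods eaten plus symptoms reported that day.
    logs = [["milk", "bread", "bloating"], ["milk", "bloating"], ["bread", "eggs"],
            ["milk", "eggs", "bloating"], ["bread"], ["milk"]]
    for lhs, rhs, s, c in association_rules(frequent_itemsets(logs, 0.4), 0.7):
        print(sorted(lhs), "->", sorted(rhs), f"support={s:.2f} confidence={c:.2f}")
```

On this toy data both directions pass the thresholds, but with different confidences: every bloating day involves milk (confidence 1.0), while only three of the four milk days involve bloating (confidence 0.75), which is why rules are read directionally.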
7	Parallel itemset mining in massively distributed environments
Salah, Saber. 20 April 2016.
The volume of data keeps growing, to the point that we now speak of "Big Data". The main reason lies in the progress of computing tools, which have offered great flexibility to produce, but also to store, ever larger quantities of data. Data analysis methods have always been confronted with quantities that strain, or exceed, processing capacities. To overcome the technological barriers associated with these analysis questions, the community can turn to distributed computing techniques. In particular, pattern mining, one of the most studied problems in data mining, still often presents great difficulties in the context of massive distribution and parallelism. In this thesis, we address two major topics related to pattern mining: frequent patterns and informative patterns (i.e., patterns with high entropy).
Keywords: Pattern Mining; Data distribution; Classification
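The informative patterns mentioned in this entry (patterns with high entropy) can be scored, in one standard formulation, by the joint entropy of an itemset: record for each transaction which of the itemset's items it contains, and take the Shannon entropy of the resulting presence/absence distribution. Because the score depends only on counts of these vectors, the counts can be computed independently on each data partition and then summed, which is what makes the measure amenable to distributed computation. The Python sketch below assumes this joint-entropy definition; the function names are hypothetical, and the plain loop over partitions stands in for the map and reduce steps of a real distributed framework rather than reproducing the thesis's algorithms.

```python
from collections import Counter
from math import log2


def pattern_counts(partition, items):
    """'Map' step: count the presence/absence vectors of `items` in one partition."""
    return Counter(tuple(i in s for i in items) for s in map(set, partition))


def joint_entropy(partitions, itemset):
    """Joint entropy, in bits, of the itemset's presence/absence vectors over all partitions."""
    items = sorted(itemset)
    total = Counter()
    for part in partitions:  # each call could run on a separate worker
        total.update(pattern_counts(part, items))  # 'reduce' step: sum the counts
    n = sum(total.values())
    return -sum((c / n) * log2(c / n) for c in total.values())


if __name__ == "__main__":
    parts = [[["a", "b"], ["a"]], [["b"], ["c"]]]
    # Each of the four presence/absence vectors for {a, b} occurs once.
    print(joint_entropy(parts, {"a", "b"}))  # 2.0
```

An itemset whose items always co-occur (or never appear) scores 0 bits, while one that splits the data evenly across all 2^k vectors reaches the maximum of k bits.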
8	Pattern Mining and Concept Discovery for Multimodal Content Analysis
Li, Hongzhi. January 2016.
With recent advances in computer vision, researchers have demonstrated impressive, near-human-level performance on difficult tasks such as image recognition. For example, for images taken under typical conditions, computer vision systems can now recognize whether a dog, cat, or car appears in an image. These advances are made possible by massive image datasets and label annotations, which include category labels and sometimes bounding boxes around the objects of interest within each image. However, one major limitation of current solutions is that when users apply recognition models to new domains, they need to manually define the target classes and label the training data to prepare the annotations required for training the recognition models. Manually identifying the target classes and constructing the concept ontology for a new domain are time-consuming tasks, as they require users to be familiar with the content of the image collection, and the manual process of defining target classes is difficult to scale up to a large number of classes.
In addition, there has been significant interest in developing knowledge bases to improve content analysis and information retrieval. A knowledge base is an object model (ontology) with classes, subclasses, attributes, instances, and the relations among them. The knowledge base generation problem is to identify the (sub)classes and their structured relations for a given domain of interest. As with ontology construction, a knowledge base is usually built manually by human experts, which is a time-consuming and difficult task. Thus, it is important to find a way to explore the semantic concepts, and their structural relations, that matter for a target data collection or domain of interest, so that an ontology or knowledge base for visual data or multimodal content can be constructed automatically or semi-automatically.
Visual patterns are the discriminative and representative image content found in objects or local image regions across an image collection. Visual patterns can also be used to summarize the major visual concepts in an image collection. Therefore, automatic discovery of visual patterns can help users understand the content and structure of a data collection and, in turn, help them construct the ontology and knowledge base mentioned earlier. In this dissertation, we aim to answer the following question: given a new target domain and associated data corpora, how can we rapidly discover nameable content patterns that are semantically coherent, visually consistent, and can be automatically named with semantic concepts related to the events of interest in the target domains? We develop pattern discovery methods that focus on visual content as well as on multimodal data, including text and images. Traditional visual pattern mining methods focus only on analysis of the visual content and cannot automatically name the patterns. To address this shortcoming, we propose a new multimodal visual pattern mining and naming method. The named visual patterns can be used as discovered semantic concepts relevant to the target data corpora. By combining information from multiple modalities, we can ensure that the discovered patterns are not only visually similar but also consistent in meaning. The ability to name visual patterns accurately is also important for finding relevant classes or attributes in the knowledge base construction process mentioned earlier.
Our framework contains a visual model and a text model to jointly represent the text and visual content. We use the joint multimodal representation and association rule mining to discover semantically coherent and visually consistent visual patterns. To discover better visual patterns, we further improve the visual model in the multimodal visual pattern mining pipeline by developing a convolutional neural network (CNN) architecture that allows the discovery of scale-invariant patterns. In this dissertation, we use news as an example domain and image-caption pairs as example multimodal corpora to demonstrate the effectiveness of the proposed methods. However, the overall framework is general and can easily be extended to other domains.
The problem of concept discovery becomes more challenging if the target application domain involves fine-grained object categories (e.g., closely related dog categories or consumer product categories). In such cases, the content of different classes can be quite similar, making automatic separation of classes difficult. In the proposed multimodal pattern mining framework, representation models for visual and text data play an important role, as they shape the pool of candidates fed to the pattern mining process. General models such as CNNs trained on ImageNet, though shown to generalize to various domains, are unable to capture the small differences in fine-grained datasets. To address this problem, we propose a new representation model that uses an end-to-end artificial neural network architecture to discover visual patterns. This model can be fine-tuned on a fine-grained dataset so that its convolutional layers are optimized to capture the features and patterns of the fine-grained images, which enables it to discover visual patterns in such datasets.
Finally, to demonstrate the advantages of the proposed multimodal visual pattern mining and naming framework, we apply it to two applications. In the first, we use visual pattern mining to find visual anchors that summarize video news events. In the second, we use the visual patterns as important cues to link video news events to social media events.
The contributions of this dissertation can be summarized as follows:
(1) We develop a novel multimodal mining framework for discovering visual patterns and nameable concepts from a collection of multimodal data and automatically naming the discovered patterns, producing a large pool of semantic concepts specifically relevant to a high-level event. The framework combines a CNN-based visual representation with an embedding-based text representation. The named visual patterns can be used to construct the event schema needed in the knowledge base construction process.
(2) We propose a scale-invariant visual pattern mining model to improve the multimodal visual pattern mining framework. The improved visual model leads to better overall performance in discovering and naming concepts. To localize the visual patterns discovered in this framework, we propose a deconvolutional neural network model that localizes them within the image.
(3) To learn directly from data in the target domain, we propose a novel end-to-end neural network architecture called PatternNet for finding high-quality visual patterns even in datasets that consist of fine-grained classes.
(4) We demonstrate visual pattern mining in two novel applications: video news event summarization and video news event linking.
Keywords: Data mining; Computer science; Computer vision; Sequential pattern mining
9	Mining Mobile Group Patterns: A Trajectory-based Approach
Liu, Ying-Han. 30 July 2004.
In recent years, with the popularization of mobile devices, more and more location-based applications have been developed. As a result, location data for various objects is widely available. Identifying groups of objects that tend to move together is an emerging research topic. Existing approaches for identifying mobile group patterns assume the existence of raw location data recording each object's position at equally spaced time points. However, a moving object may become disconnected, voluntarily or involuntarily, from time to time, so this assumption may not always hold. In this research, we describe the locations of a moving object as a (non-continuous) trajectory function. Based on this new model, we redefine the mobile group mining problem and develop efficient algorithms for mining mobile groups. The proposed algorithms are evaluated on synthetic data generated by IBM City Simulator.
Keywords: Mobile data mining; Group pattern mining; Mobile group pattern; Trajectory
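The trajectory model in this entry can be illustrated with a small sketch. The Python code below is our own illustration under stated assumptions, not the thesis's algorithm: each object's trajectory is a list of connected segments of time-stamped (t, x, y) samples, positions between samples are linearly interpolated, and the object's position is undefined (disconnected) between segments. A candidate group counts as close at time t only if every member's position is defined and all pairwise Euclidean distances are at most a threshold. The names `position`, `is_close`, `longest_close_run`, and `max_dist` are hypothetical.

```python
from math import hypot


def position(segments, t):
    """Interpolated (x, y) of an object at time t, or None while it is disconnected.

    segments: list of connected pieces, each a time-sorted list of (t, x, y) samples."""
    for seg in segments:
        if seg[0][0] <= t <= seg[-1][0]:
            for (t0, x0, y0), (t1, x1, y1) in zip(seg, seg[1:]):
                if t0 <= t <= t1:
                    f = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
                    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
            return seg[0][1], seg[0][2]  # segment with a single sample
    return None


def is_close(group, trajectories, t, max_dist):
    """True if every member is connected at time t and all pairs are within max_dist."""
    pts = [position(trajectories[obj], t) for obj in group]
    if any(p is None for p in pts):
        return False
    return all(hypot(p[0] - q[0], p[1] - q[1]) <= max_dist
               for i, p in enumerate(pts) for q in pts[i + 1:])


def longest_close_run(group, trajectories, times, max_dist):
    """Length of the longest run of consecutive query times at which the group is close."""
    best = run = 0
    for t in times:
        run = run + 1 if is_close(group, trajectories, t, max_dist) else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    traj = {
        "A": [[(0, 0, 0), (10, 10, 0)]],
        # B is disconnected between t=4 and t=6.
        "B": [[(0, 0, 1), (4, 4, 1)], [(6, 6, 1), (10, 10, 1)]],
    }
    print(longest_close_run(("A", "B"), traj, range(11), max_dist=1.5))  # 5
```

Comparing the longest run against a minimum-duration threshold would then decide whether the group qualifies; an actual miner would also need to enumerate candidate groups efficiently, which this sketch does not attempt.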
10	The Discovery of Calendar-Based Mobile Group Patterns in Spatial-Temporal Databases
Lee, Chung-Han. 01 August 2006.
In the past few years, owing to the development of mobile devices and improvements in database technology, geometric information has become widely available. Identifying groups of objects along the spatial-temporal dimension is an emerging research topic. Previous work has incorporated the spatial and temporal information of moving objects to find mobile groups. Considering that mobile groups tend to exhibit calendar-like temporal features, we define a new temporal representation mechanism called the flexible calendar pattern, which allows users to specify the desired calendar patterns at a coarse level. In addition, we develop efficient algorithms for mining mobile groups that satisfy a user-specified flexible calendar pattern. The proposed algorithms are evaluated on synthetic data generated by IBM City Simulator. The results show that our approaches perform more efficiently than other intuitive approaches.
Keywords: Calendar pattern; Group pattern mining; Mobile group pattern
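A calendar pattern specified "at a coarse level" can be pictured as a date template with wildcards. The Python sketch below assumes a (year, month, weekday) template in which any field may be the wildcard "*" (so ("*", "*", 0) means every Monday), and scores a group by the fraction of matching days on which it was observed together. This representation and the names `matches` and `calendar_support` are our assumptions for illustration, not the flexible calendar pattern as defined in the thesis. The weekday convention follows Python's `date.weekday()`, where Monday is 0.

```python
from datetime import date, timedelta

WILDCARD = "*"


def matches(pattern, day):
    """True if `day` fits a (year, month, weekday) template; '*' matches any value."""
    fields = (day.year, day.month, day.weekday())  # weekday(): Monday == 0
    return all(p == WILDCARD or p == f for p, f in zip(pattern, fields))


def calendar_support(pattern, group_days, all_days):
    """Fraction of the days matching `pattern` on which the group was observed."""
    covered = [d for d in all_days if matches(pattern, d)]
    if not covered:
        return 0.0
    return sum(1 for d in covered if d in group_days) / len(covered)


if __name__ == "__main__":
    august = [date(2006, 8, 1) + timedelta(days=i) for i in range(31)]
    mondays = [d for d in august if matches((2006, 8, 0), d)]
    seen = set(mondays[:3])  # hypothetical: group observed on three of the Mondays
    print(len(mondays), calendar_support((WILDCARD, WILDCARD, 0), seen, august))
```

Leaving more fields as wildcards makes a template cover more days and therefore makes it harder for a group to reach a high score; fixing more fields makes the pattern more selective.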
