  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

An Efficient Subset-Lattice Algorithm for Mining Closed Frequent Itemsets in Data Streams

Peng, Wei-hau 25 June 2009 (has links)
Online mining of association rules over data streams is an important issue in the area of data mining, where an association rule states that the presence of some items in a transaction implies the presence of other items in the same transaction. Association rules over data streams have many applications, such as market analysis, network security, sensor networks, and web tracking. Mining closed frequent itemsets is a refinement of mining association rules: it aims to find the subset of frequent itemsets from which all frequent itemsets can be derived. Formally, a closed frequent itemset is a frequent itemset that has no superset with the same support. Since data streams are continuous, high-speed, and unbounded, archiving everything from a data stream is impossible; we can scan the data only once, and processing must be done in main memory. Therefore, previous algorithms for mining closed frequent itemsets in traditional databases are not suitable for data streams. On the other hand, many applications are interested only in the most recent data, and the Sliding Window Model addresses this by retaining only the most recent data within a window of fixed size. One well-known algorithm for mining closed frequent itemsets based on the sliding window model is the NewMoment algorithm. However, NewMoment cannot efficiently mine closed frequent itemsets in data streams, since it generates many unclosed frequent itemsets in addition to the closed ones. Moreover, when the data in the sliding window is incrementally updated, the NewMoment algorithm needs to reconstruct the whole tree structure. Therefore, in this thesis, we propose a sliding window approach, the Subset-Lattice algorithm, which embeds the subset property into the lattice structure to efficiently mine closed frequent itemsets.
Basically, our proposed algorithm considers five kinds of set relations when data items are inserted: (1) equivalence, (2) superset, (3) subset, (4) intersection, and (5) empty relation. Using these five relations, we identify closed frequent itemsets without generating unclosed frequent itemsets. Moreover, when the data in the sliding window is incrementally updated, our Subset-Lattice algorithm does not reconstruct the whole lattice structure. Therefore, our Subset-Lattice algorithm is more efficient than the Moment algorithm. Furthermore, we use bit-patterns to represent itemsets and bit-operations to speed up the set checking. Our simulation results show that the Subset-Lattice algorithm needs less memory and less processing time than the NewMoment algorithm. When the window slides, the execution time can be reduced by up to 50%.
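The bit-pattern idea is easy to illustrate. In this rough sketch (toy item names and window contents, not the thesis's actual data structures), each itemset is encoded as an integer bitmask, so subset containment and closure checks reduce to single bitwise operations per transaction:

```python
# Sketch: itemsets as bitmasks. An itemset X is *closed* if no proper
# superset of X has the same support. Items a..d map to bits 0..3.
items = {"a": 1, "b": 2, "c": 4, "d": 8}

def to_mask(itemset):
    """Encode a set of item names as an integer bit-pattern."""
    mask = 0
    for i in itemset:
        mask |= items[i]
    return mask

# Toy sliding-window content: each transaction is one bitmask.
window = [to_mask(t) for t in
          [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"b", "d"}]]

def support(mask):
    # X is a subset of T  <=>  X & T == X  (one bit-operation per transaction)
    return sum(1 for t in window if mask & t == mask)

def is_closed(mask):
    """True if no single-item extension of X keeps the same support."""
    s = support(mask)
    for bit in items.values():
        if not mask & bit and support(mask | bit) == s:
            return False
    return True

print(is_closed(to_mask({"a", "b"})))  # {a,b} has no equal-support superset
print(is_closed(to_mask({"a"})))       # {a,b} has the same support as {a}
```

The payoff is that the "five set relations" between two itemsets (equal, superset, subset, intersecting, disjoint) can all be decided from `x & y` alone, which is what makes bit-operations attractive for this kind of lattice maintenance.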
2

New approaches to weighted frequent pattern mining

Yun, Unil 25 April 2007 (has links)
Researchers have proposed frequent pattern mining algorithms that are more efficient than previous algorithms and generate fewer but more important patterns. Many techniques, such as depth-first/breadth-first search, tree and other data structures, top-down/bottom-up traversal, and vertical/horizontal formats, have been developed for frequent pattern mining. Most frequent pattern mining algorithms use a support measure to prune the combinatorial search space. However, support-based pruning is not enough when the characteristics of real datasets are taken into consideration. Additionally, after mining datasets to obtain the frequent patterns, there is no way to adjust the number of frequent patterns through user feedback, except by changing the minimum support. Alternative measures for mining frequent patterns have been suggested to address these issues. One of the main limitations of the traditional approach to mining frequent patterns is that all items are treated uniformly when, in reality, items have different importance. For this reason, weighted frequent pattern mining algorithms have been suggested that give different weights to items according to their significance. The main focus in weighted frequent pattern mining concerns satisfying the downward closure property. In this research, frequent pattern mining approaches with weight constraints are suggested. Our main approach is to push the weight constraints into the pattern growth algorithm while maintaining the downward closure property.
We develop WFIM (Weighted Frequent Itemset Mining with a weight range and a minimum weight), WLPMiner (Weighted frequent Pattern Mining with length-decreasing constraints), WIP (Weighted Interesting Pattern mining with a strong weight and/or support affinity), WSpan (Weighted Sequential pattern mining with a weight range and a minimum weight), and WIS (Weighted Interesting Sequential pattern mining with a similar level of support and/or weight affinity). The extensive performance analysis shows that the suggested approaches are efficient and scalable in weighted frequent pattern mining.
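The downward-closure difficulty the abstract mentions can be made concrete with a minimal sketch (invented toy transactions and weights, not the authors' implementations): weighted support is not anti-monotone by itself, because adding a heavy item can raise the average weight, so pruning must use an overestimate such as support times the maximum item weight:

```python
# Sketch: weighted support = support(X) * average weight of X's items.
# Pruning with support(X) * max_weight preserves downward closure, since
# no superset's weighted support can exceed that overestimate.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "c"}]
weights = {"a": 0.9, "b": 0.6, "c": 0.3}   # hypothetical item weights
max_weight = max(weights.values())

def support(itemset):
    return sum(1 for t in transactions if itemset <= t)

def weighted_support(itemset):
    avg_w = sum(weights[i] for i in itemset) / len(itemset)
    return support(itemset) * avg_w

def can_prune(itemset, min_wsup):
    # Safe to prune: even the optimistic overestimate fails the threshold,
    # so every superset fails it too.
    return support(itemset) * max_weight < min_wsup

print(weighted_support({"a", "b"}))  # support 2 * average weight 0.75
```

Note that `{"c"}` has plain support 3 but, with its low weight, a superset of it can never reach a weighted-support threshold above `3 * max_weight`, which is exactly the kind of bound a weight-constrained pattern-growth algorithm exploits.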
3

Efficient frequent pattern mining from big data and its applications

Jiang, Fan January 2016 (has links)
Frequent pattern mining is an important research area in data mining. Since its introduction, it has drawn the attention of many researchers, and consequently, many algorithms have been proposed. Popular algorithms include level-wise Apriori-based algorithms, tree-based algorithms, and hyperlinked array structure based algorithms. While these algorithms are popular and beneficial due to some nice properties, they also suffer from drawbacks such as multiple database scans, recursive tree constructions, or multiple hyperlink adjustments. In the current era of big data, high volumes of a wide variety of valuable data of different veracities can be easily collected or generated at high velocity in various real-life applications. Among these 5V's of big data, I focus on handling high volumes of big data in my Ph.D. thesis. Specifically, I design and implement a new efficient frequent pattern mining algorithmic technique called B-mine, which overcomes some of the aforementioned drawbacks and achieves better performance when compared with existing algorithms. I also extend my B-mine algorithm into a family of algorithms that can perform big data mining efficiently. Moreover, I design four different frameworks that apply this family of algorithms to the real-life application of social network mining. Evaluation results show the efficiency and practicality of all these algorithms. / February 2017
4

arules - A Computational Environment for Mining Association Rules and Frequent Item Sets

Hornik, Kurt, Grün, Bettina, Hahsler, Michael January 2005 (has links) (PDF)
Mining frequent itemsets and association rules is a popular and well researched approach for discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules. (authors' abstract)
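For readers without R at hand, the level-wise search that Apriori performs can be sketched in a few lines of Python (a toy re-implementation on invented transactions, not the arules interface or Borgelt's C code):

```python
from itertools import combinations

# Simplified level-wise Apriori: count candidates of size k, keep those
# meeting minimum support, and build size-(k+1) candidates from survivors.
transactions = [{"milk", "bread"}, {"milk", "bread", "butter"},
                {"bread", "butter"}, {"milk", "butter"}]
min_support = 2  # absolute count

def frequent_itemsets():
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items]
    k = 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        k += 1
        # Candidate generation: unions of surviving itemsets of size k.
        # (Anti-monotonicity: any infrequent subset rules a candidate out.)
        level = {a | b for a, b in combinations(survivors, 2) if len(a | b) == k}
    return frequent

for itemset, count in sorted(frequent_itemsets().items(), key=lambda x: -x[1]):
    print(sorted(itemset), count)
```

On this toy data all three items and all three pairs are frequent, while the full triple appears in only one transaction and is pruned, which is the anti-monotonicity that both Apriori and Eclat rely on.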
5

A computational environment for mining association rules and frequent item sets

Hahsler, Michael, Grün, Bettina, Hornik, Kurt January 2005 (has links) (PDF)
Mining frequent itemsets and association rules is a popular and well researched approach to discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules. (author's abstract) / Series: Research Report Series / Department of Statistics and Mathematics
6

Vocabulary Learning Strategies: A Study of Congolese English Language Learners

Kaya, Jean 01 August 2014 (has links)
The present study investigated the vocabulary learning strategies that English language teachers in Congo most and least frequently encourage students to use, and the strategies that Congolese students actually use to build their vocabulary. Another point of interest was whether the students' most-used strategies were teacher-encouraged or independently learned. A Likert-scale questionnaire of 34 statements and four short-answer questions was designed to collect data. The participants included 20 male and 23 female Congolese learners of English, ages 18 to 22, all of them students in the Arts program at the Reconciliation High School in Brazzaville, Congo. Statistical and content analysis methods were employed. Attention to suffixes was the only strategy that showed a significant difference between teacher-encouraged and student-used strategies. Two other strategies, guessing word meanings from context and learning words in collocations, approached significance, but the difference between teacher encouragement and student use was not of practical importance. This strong correspondence between the strategies that teachers frequently encourage and those that students use provided evidence of the important role that language teachers play in students' learning in general, and in strategy use in particular. Quantitative results revealed contextual guessing and dictionary use to be the most frequently used strategies, whereas pronunciation was the least frequently used. Participants' narrative descriptions revealed that notebooks and notepads were frequently used in participants' independent learning of vocabulary. Furthermore, 52.38% (N = 22) of the participants attributed their frequently used strategies to their teachers' practices and advice, while 38.10% (N = 16) claimed that their strategies were independently learned.
In view of theory and empirical research, the present study provided evidence that Congolese learners of English are taking responsibility for their vocabulary learning progress by employing a variety of strategies, some acquired as a result of classroom learning and others developed through independent learning outside of school.
7

Using geo-spatial analysis for effective community paramedicine

Leyenaar, Matthew 11 1900 (has links)
Paramedic services are developing a new model of service delivery known as community paramedicine (CP). This service delivery model seeks to build on existing paramedic skills, establish collaboration with non-traditional health care partners, and create alternative pathways for accessing care. Frequent users of paramedic services represent patients that are of particular interest to CP programs. Chapters 2 and 3 of this thesis address questions of effective delivery of these programs. The second chapter is a spatial-temporal analysis of frequent users in Hamilton, ON. Drawing on concepts of time-geography and dynamic ambulance deployment, this analysis identifies space-time patterns in paramedic service utilization by frequent users. Data were aggregated to represent daily demand in terms of space and time. Analysis employed generalized linear mixed models that included a random slope effect for time intervals for each geographic unit. Fixed effects included distance to emergency department, proportion of residential addresses, and proportion of older adult population. Locations and times that had greater or less than expected daily demand from frequent users were identified. The findings can be used to tailor deployment of community paramedics in dual-capacity roles to address the system demand of frequent users. The third chapter analyzes the geographic influence of CP service delivery in Renfrew County, ON. This research draws on concepts of spatial accessibility and geographic profiling to estimate spatially defined probabilities of paramedic service use by frequent users. Due to ongoing CP programs within the county, the resultant community health profiles serve as an evaluation of the benefit of these programs. The community health profiles can also be used to assess community level probabilities of patient needs for future interventions. 
This analysis can serve as a new way to assess spatial accessibility to health care services and identify locations with increased risk of frequent use of paramedic services. / Thesis / Master of Arts (MA)
8

Frequent Itemset Hiding Algorithm Using Frequent Pattern Tree Approach

Alnatsheh, Rami H. 01 January 2012 (has links)
A problem that has been the focus of much recent research in privacy-preserving data mining is the frequent itemset hiding (FIH) problem. Identifying itemsets that appear together frequently in customer transactions is a common task in association rule mining. Organizations that share data with business partners may consider some of the frequent itemsets sensitive and aim to hide such sensitive itemsets by removing items from certain transactions. Since such modifications adversely affect the utility of the database for data mining applications, the goal is to remove as few items as possible. Since the frequent itemset hiding problem is NP-hard and practical instances of this problem are too large to be solved optimally, there is a need for heuristic methods that provide good solutions. This dissertation developed a new method called Min_Items_Removed, using the Frequent Pattern Tree (FP-Tree), that outperforms extant methods for the FIH problem. The FP-Tree enables the compression of large databases into significantly smaller data structures. As a result of this compression, a search may be performed with increased speed and efficiency. To evaluate the effectiveness and performance of the Min_Items_Removed algorithm, eight experiments were conducted. The results showed that the Min_Items_Removed algorithm yields better-quality solutions than extant methods in terms of minimizing the number of removed items. In addition, the results showed that the newly introduced metric (normalized number of leaves) is a very good indicator of the size or difficulty of a problem instance that is independent of the number of sensitive itemsets.
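The hiding setting itself is simple to state. This sketch shows a naive greedy baseline on invented transactions, NOT the Min_Items_Removed algorithm: repeatedly delete one item of the sensitive itemset from a supporting transaction until its support falls below the frequency threshold:

```python
# Naive itemset-hiding baseline (illustrative only): lower a sensitive
# itemset's support below min_support by removing items, counting removals.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]
min_support = 2

def support(itemset):
    return sum(1 for t in transactions if itemset <= t)

def hide(sensitive):
    """Remove items until `sensitive` is no longer frequent; return count."""
    removed = 0
    while support(sensitive) >= min_support:
        # Pick any transaction still supporting the itemset and drop one
        # sensitive item from it. A real method chooses both the transaction
        # and the item carefully, to minimize side effects on the other
        # (non-sensitive) frequent itemsets.
        t = next(t for t in transactions if sensitive <= t)
        t.discard(next(iter(sensitive)))
        removed += 1
    return removed

removed = hide({"a", "b"})
print(removed, support({"a", "b"}))
```

The hard part, and the point of the dissertation's heuristic, is the choice made inside the loop: which transaction and which item to sacrifice so that the fewest items are removed and non-sensitive patterns survive.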
9

EMERGENCY DEPARTMENT FREQUENT USERS: A LATENT CLASS ANALYSIS AND ECONOMIC EVALUATION TO POTENTIALLY GUIDE UTILIZATION MANAGEMENT INTERVENTIONS

Birmingham, Lauren E. 21 July 2017 (has links)
No description available.
10

A Contrast Pattern based Clustering Algorithm for Categorical Data

Fore, Neil Koberlein 13 October 2010 (has links)
No description available.
