21. Proximity based association rules for spatial data mining in genomes
Saha, Surya, 08 August 2009
Our knowledge discovery algorithm combines association rule mining and graph mining to identify frequent spatial proximity relationships in genomic data, where the data is viewed as a one-dimensional space. We apply mining techniques and metrics from association rule mining to identify frequently co-occurring features in genomes, followed by graph mining to extract sets of co-occurring features. Using a case study of ab initio repeat finding, we show that our algorithm, ProxMiner, can be successfully applied to identify weakly conserved patterns among features in genomic data. Using pairwise spatial relationships increases the sensitivity of the algorithm, while a confidence threshold based on the false discovery rate reduces the noise in the results. Unlike available defragmentation algorithms, ProxMiner discovers associations among ab initio repeat families to identify larger, more complete repeat families. ProxMiner will increase the effectiveness of repeat discovery techniques for newly sequenced genomes, where ab initio repeat finders are only able to identify partial repeat families. In this dissertation, we provide two detailed examples of novel repeat families discovered by ProxMiner and one example of a known rice repeat family that ProxMiner extended. These examples cover several of the different types of repeat families that our algorithm can discover. We have also discovered many other potentially interesting novel repeat families that biologists can study further.
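The two-stage idea (proximity rules first, then graph mining) can be illustrated with a minimal Python sketch. It assumes repeat annotations arrive as hypothetical `(family, start, end)` tuples on one chromosome, scores a directed rule A -> B as the fraction of A occurrences that have some B within a window, keeps rules above a fixed confidence (a simple stand-in for the false-discovery-rate threshold described above), and reports connected components of the resulting graph as candidate merged families. All names and parameters are invented for illustration; this is not ProxMiner's implementation.

```python
from collections import Counter

def proximity_edges(features, window, min_conf):
    """Directed rules A -> B: share of A occurrences with some B within `window`."""
    occurrences = Counter(fam for fam, _, _ in features)
    near = Counter()
    for i, (fam_a, start_a, end_a) in enumerate(features):
        # Gap between two intervals on the 1-D sequence; negative when they overlap.
        neighbours = {
            fam_b
            for j, (fam_b, start_b, end_b) in enumerate(features)
            if j != i and fam_b != fam_a
            and max(start_a, start_b) - min(end_a, end_b) <= window
        }
        for fam_b in neighbours:
            near[(fam_a, fam_b)] += 1
    return {(a, b) for (a, b), n in near.items() if n / occurrences[a] >= min_conf}

def merged_families(features, window, min_conf):
    """Connected components of the proximity graph = candidate larger families."""
    graph = {fam: set() for fam, _, _ in features}
    for a, b in proximity_edges(features, window, min_conf):
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups

# Hypothetical repeat-family annotations on one chromosome: (family, start, end).
features = [("R1", 100, 200), ("R2", 250, 400), ("R1", 5000, 5100),
            ("R2", 5150, 5300), ("R3", 9000, 9100), ("R1", 12000, 12100)]
print(merged_families(features, window=100, min_conf=0.6))  # [['R1', 'R2'], ['R3']]
```

The quadratic scan keeps the sketch short; a genome-scale tool would sort features by coordinate and sweep a window instead.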

22. Improving the Scalability of an Exact Approach for Frequent Item Set Hiding
LaMacchia, Carolyn, 01 January 2013
Technological advances have led to the generation of large databases of organizational data, recognized as an information-rich strategic asset for internal analysis and for sharing with trading partners. Data mining techniques can discover patterns in large databases, including relationships considered strategically relevant to the owner of the data. The frequent item set hiding problem is an active research area that studies approaches for hiding sensitive knowledge patterns before the data is disclosed outside the organization. Several methods address hiding sensitive item sets, including an exact approach that generates an extension to the original database which, when combined with the original database, limits the discovery of sensitive association rules without affecting other, non-sensitive information. To generate the database extension, this method formulates a constraint optimization problem (COP). Solving the COP formulation is the dominant factor in the computational resource requirements of the exact approach. This dissertation developed heuristics that address the scalability of the exact hiding method. The heuristics aim to improve the performance of the COP solver by reducing the size of the COP formulation without significantly affecting the quality of the generated solutions. The first heuristic decomposes the COP formulation into multiple smaller problem instances that the COP solver processes separately to generate partial extensions of the database. These smaller database extensions are then combined into a database extension that is close to the one generated from the original, larger COP formulation. The second heuristic evaluates the revised border used to formulate the COP and reduces the number of variables and constraints by selectively substituting composite variables for multiple item sets. Solving a COP with fewer variables and constraints reduces the computational cost.
Results of heuristic processing were compared with an existing exact approach based on the size of the database extension, the ability to hide sensitive data, and the impact on nonsensitive data.
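The reason an extension can hide an item set at all is simple arithmetic on relative support, shown in a minimal Python sketch below. It assumes that none of the added transactions contains a sensitive item set, so each sensitive count `s` stays fixed while the database grows from `N` to `N + Q` transactions; hiding then requires `s / (N + Q) < min_support`. The function name is hypothetical, and this computes only a lower bound on the extension size, not the COP formulation itself, which must also preserve the non-sensitive item sets.

```python
import math
from fractions import Fraction

def min_extension_size(n_transactions, sensitive_counts, min_support):
    """Smallest Q such that every sensitive item set drops strictly below
    min_support, assuming no added transaction contains a sensitive item set."""
    sigma = Fraction(min_support)  # exact arithmetic avoids float edge cases
    q = 0
    for count in sensitive_counts:
        # count / (N + Q) < sigma  <=>  Q > count / sigma - N
        bound = Fraction(count) / sigma - n_transactions
        q = max(q, math.floor(bound) + 1)
    return q

# 1,000 transactions, two sensitive item sets, 10% minimum support.
print(min_extension_size(1000, [120, 90], Fraction(1, 10)))  # 201
```

With Q = 200 the first item set would sit exactly at 120/1200 = 10% and still be frequent, which is why the strict inequality yields 201.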

23. A data mining framework for targeted category promotions
Reutterer, Thomas; Hornik, Kurt; March, Nicolas; Gruber, Kathrin, 06 1900
This research presents a new approach to deriving recommendations for segment-specific, targeted marketing campaigns at the product category level. The proposed methodological framework serves as a decision support tool for customer relationship managers or direct marketers to select attractive product categories for their target marketing efforts, such as segment-specific rewards in loyalty programs, cross-merchandising activities, targeted direct mailings, customized supplements in catalogues, or customized promotions. The proposed methodology requires customers' multi-category purchase histories as input data and proceeds in a stepwise manner. It combines various data compression techniques and integrates an optimization approach that suggests candidate product categories for segment-specific targeted marketing such that cross-category spillover effects for non-promoted categories are maximized. To demonstrate the empirical performance of our proposed procedure, we examine the transactions from a real-world loyalty program of a major grocery retailer. A simple scenario-based analysis using promotion responsiveness reported in previous empirical studies and the prior experience of domain experts suggests that targeted promotions might boost profitability by between 15% and 128% relative to an undifferentiated standard campaign.
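The selection step, picking categories to promote so that spillover to the categories left unpromoted is as large as possible, can be illustrated with a minimal greedy sketch in Python. It assumes a hypothetical per-segment weight `weights[i][j]` (for example, how strongly buying category i is associated with buying j) and defines spillover as the sum, over non-promoted categories, of the strongest link from any promoted one. Both the score and the greedy search are illustrative assumptions, not the optimization model used in the paper.

```python
def spillover(weights, promoted):
    """Hypothetical score: for each non-promoted category, the strongest
    association from any promoted category, summed."""
    total = 0.0
    for target in weights:
        if target in promoted:
            continue
        total += max((weights[src].get(target, 0.0) for src in promoted), default=0.0)
    return total

def greedy_promotions(weights, k):
    """Add one category at a time, each time taking the best-scoring set."""
    promoted = []
    for _ in range(min(k, len(weights))):
        best = max((c for c in weights if c not in promoted),
                   key=lambda c: spillover(weights, promoted + [c]))
        promoted.append(best)
    return promoted

# Invented association weights for one customer segment.
weights = {
    "pasta":  {"sauce": 0.6, "cheese": 0.4, "wine": 0.1},
    "sauce":  {"pasta": 0.5, "cheese": 0.3, "wine": 0.1},
    "cheese": {"pasta": 0.2, "sauce": 0.2, "wine": 0.5},
    "wine":   {"pasta": 0.1, "sauce": 0.1, "cheese": 0.4},
}
print(greedy_promotions(weights, 2))  # ['pasta', 'cheese']
```

Promoting "pasta" and "sauce" together scores lower than "pasta" and "cheese", because the two strongly linked categories would no longer spill over onto each other.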

24. A Formal Concept Analysis Approach to Association Rule Mining: The QuICL Algorithms
Smith, David T., 01 January 2009
Association rule mining (ARM) is the task of identifying meaningful implication rules exhibited in a data set. Most research has focused on extracting frequent item (FI) sets and has thus fallen short of the overall ARM objective. FI miners fail to identify the upper covers that are needed to generate a set of association rules of a size an end user can exploit. An alternative to FI mining can be found in formal concept analysis (FCA), a branch of applied mathematics. FCA derives a concept lattice whose concepts identify closed FI sets and whose connections identify the upper covers. However, most FCA algorithms construct a complete lattice and therefore include item sets that are not frequent. An iceberg lattice, on the other hand, is a concept lattice whose concepts contain only FI sets. Only three algorithms for constructing an iceberg lattice were found in the literature. Given that an iceberg concept lattice provides an analysis tool to succinctly identify association rules, this study investigated additional algorithms for constructing one. This report presents the development and analysis of the Quick Iceberg Concept Lattice (QuICL) algorithms, which construct an iceberg lattice incrementally. QuICL uses recursion instead of iteration to navigate the lattice and establish connections, thereby eliminating costly processing incurred by past algorithms. The QuICL algorithms were evaluated against leading FI miners and FCA construction algorithms using benchmarks cited in the literature. Results demonstrate that QuICL performs on the order of FI miners while additionally deriving the upper covers. When combined with known algorithms for extracting a basis of association rules from a lattice, QuICL offers a "best known" ARM solution. Beyond this, the QuICL algorithms have proved very efficient, providing order-of-magnitude gains over other incremental lattice construction algorithms.
For example, on the Mushroom data set QuICL completes in less than 3 seconds, whereas past algorithms exceed 200 seconds. On T10I4D100k, QuICL completes in less than 120 seconds, whereas past algorithms approach 10,000 seconds. QuICL is proved to be the "best known" all-around incremental lattice construction algorithm. Its runtime complexity is shown to be O(l · d · i), where l is the cardinality of the lattice, d is the average degree of the lattice, and i is a mean function on the frequent item extents.
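What an iceberg lattice contains can be checked on a toy data set with a deliberately naive Python sketch. It enumerates every frequent item set, maps each to its Galois closure (the items shared by all transactions that contain it), and links each closed set to its upper covers, the closed sets immediately below it in set inclusion. This brute force is exponential in the number of items and is not QuICL; it only shows the output such an algorithm builds incrementally. The function name is hypothetical.

```python
from itertools import combinations

def iceberg_lattice(transactions, min_count):
    """Brute-force iceberg lattice: closed frequent item sets and upper covers."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(frozenset().union(*transactions))
    concepts = {}  # closed item set (intent) -> support count
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            extent = [t for t in transactions if frozenset(combo) <= t]
            if extent and len(extent) >= min_count:
                # Galois closure: items shared by every transaction in the extent.
                concepts[frozenset.intersection(*extent)] = len(extent)
    covers = {}
    for intent in concepts:
        below = [j for j in concepts if j < intent]
        # Keep only immediate neighbours: no closed set strictly in between.
        covers[intent] = [j for j in below if not any(j < k < intent for k in below)]
    return concepts, covers

data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
concepts, covers = iceberg_lattice(data, min_count=2)
for intent, parents in covers.items():
    for parent in parents:
        if parent:  # skip rules with an empty antecedent
            conf = concepts[intent] / concepts[parent]
            print(sorted(parent), "->", sorted(intent - parent), round(conf, 2))
```

Each cover pair (J, I) yields the approximate rule J -> I \ J with confidence supp(I) / supp(J), which is why the covers, and not just the frequent item sets, are needed for a compact rule basis.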

25. Apriori Sets And Sequences: Mining Association Rules from Time Sequence Attributes
Pray, Keith A, 06 May 2004
We introduce an algorithm for mining expressive temporal relationships from complex data. Our algorithm, AprioriSetsAndSequences (ASAS), extends the Apriori algorithm to data sets in which a single data instance may consist of a combination of attribute values that are nominal sequences, time series, sets, and traditional relational values. Data sets of this type occur naturally in many domains, including health care, financial analysis, complex system diagnostics, and domains that use multiple sensors. AprioriSetsAndSequences identifies predefined events of interest in the sequential data attributes. It then mines for association rules that make explicit all frequent temporal relationships among the occurrences of those events, as well as the relationships between those events and other data attributes. Our algorithm inherently handles different levels of time granularity in the same data set. We have implemented AprioriSetsAndSequences within the Weka environment and applied it to computer performance, stock market, and clinical sleep disorder data. We show that AprioriSetsAndSequences produces rules that express significant temporal relationships describing patterns of behavior observed in the data set.
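The "events first, then Apriori" pipeline can be sketched in Python under simplifying assumptions. A hypothetical detector marks the times at which a series rises above a threshold, each pair of events in an instance is encoded as a relation item such as `cpu<disk`, and these items are mixed with nominal attributes before an ordinary level-wise Apriori pass. ASAS supports richer event definitions, relations, and time granularities; this shows only the basic shape.

```python
from itertools import combinations

def rising_events(series, threshold, name):
    """Hypothetical event detector: times at which `series` crosses above `threshold`."""
    return [(name, t) for t, v in enumerate(series)
            if v > threshold and (t == 0 or series[t - 1] <= threshold)]

def temporal_items(events):
    """Encode pairwise temporal relations between events as items, e.g. 'cpu<disk'."""
    items = set()
    ordered = sorted(events, key=lambda e: e[1])
    for (a, ta), (b, tb) in combinations(ordered, 2):
        if a != b:
            items.add(f"{a}<{b}" if ta < tb else "=".join(sorted((a, b))))
    return items

def apriori(transactions, min_count):
    """Plain level-wise Apriori over sets of items."""
    frequent = {}
    level = {frozenset([i]) for t in transactions for i in t}
    k = 1
    while level:
        counts = {c: sum(c <= t for t in transactions) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
        # Join step, then prune candidates that have an infrequent (k-1)-subset.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        level = {c for c in joined
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent

# Invented instances: a nominal attribute plus CPU and disk load series.
instances = [
    ({"os=linux"}, [1, 5, 5, 1], [0, 0, 6, 6]),
    ({"os=linux"}, [0, 7, 2, 2], [0, 1, 1, 8]),
    ({"os=win"}, [0, 0, 0, 9], [9, 0, 0, 0]),
]
transactions = [
    nominal | temporal_items(rising_events(cpu, 4, "cpu") + rising_events(disk, 4, "disk"))
    for nominal, cpu, disk in instances
]
freq = apriori(transactions, min_count=2)
print(freq[frozenset({"os=linux", "cpu<disk"})])  # 2
```

From these counts, the rule os=linux -> cpu<disk has confidence 2/2, a rule that mixes a relational attribute with a temporal relationship.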

26. Combined map personalisation algorithm for delivering preferred spatial features in a map to everyday mobile device users
Bookwala, Avinash Turab, January 2009
In this thesis, we present a novel approach to personalising maps/geo-spatial services for mobile users. With the proposed map personalisation approach, only relevant data are extracted from detailed maps/geo-spatial services on the fly, based on a user's current location, preferences and requirements. This would greatly improve the legibility of maps on mobile device screens and significantly reduce the amount of data transmitted, which in turn would reduce the download time and cost of transferring the required geo-spatial data across mobile networks. Furthermore, the proposed approach has been implemented as a working system based on a four-tier client-server architecture, in which fully detailed maps/services are stored on the server and, upon a user's request, personalised maps/services extracted from them based on the user's current location and preferences are sent to the user's mobile device through mobile networks. Because it uses open and standard development tools, our system is available to everyday mobile devices rather than only to smartphones and Personal Digital Assistants (PDAs), a restriction common to most current map personalisation systems. The proposed approach combines content-based and collaborative information filtering into an algorithmic solution: content-based filtering is used for regular users who have a user profile stored on the system, and collaborative filtering is used for new/occasional users who have no stored profile.
Maps/geo-spatial services are personalised for regular users by analysing the spatial feature preferences automatically collected from their previous usage and stored in their user profile, whereas map personalisation for new/occasional users is achieved by analysing the spatial feature preferences of like-minded users in the system to make an inference for the target user. Furthermore, association rule mining, an advanced inference technique, can be used to obtain the spatial features retrieved for new/occasional users through collaborative filtering. Spatial features are selected through association rule mining by finding interesting and similar patterns in the spatial features most commonly retrieved by different user groups, based on their past transactions or usage sessions with the system.
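The combined dispatch, profile-based selection for regular users and rule-based inference from other users' sessions for new ones, can be sketched in Python. It assumes sessions are sets of hypothetical spatial-feature names, ranks a regular user's features by how often their profile recorded them (a simple stand-in for content-based filtering), and extends a new user's requested features with consequents of single-antecedent association rules. Function names and thresholds are invented; the thesis's actual algorithm may weigh preferences differently.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support, min_confidence):
    """Single-antecedent rules X -> Y mined from past usage sessions."""
    n = len(sessions)
    item_count = Counter(f for s in sessions for f in s)
    pair_count = Counter(frozenset(p) for s in sessions for p in combinations(sorted(s), 2))
    rules = {}
    for pair, count in pair_count.items():
        if count / n < min_support:
            continue
        for x in pair:
            (y,) = pair - {x}
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules.setdefault(x, []).append((y, confidence))
    return rules

def personalise(profile, requested, sessions, min_support=0.3, min_confidence=0.6):
    """Pick spatial features: profile-based for regular users, rule-based otherwise."""
    if profile:
        # Regular user: rank features by how often this user retrieved them before.
        return [f for f, _ in Counter(profile).most_common()]
    # New/occasional user: extend the requested features with rule consequents
    # learned from other users' sessions.
    rules = pair_rules(sessions, min_support, min_confidence)
    features = set(requested)
    for f in requested:
        features.update(y for y, _ in rules.get(f, []))
    return sorted(features)

sessions = [{"roads", "petrol"}, {"roads", "petrol", "atm"}, {"roads", "atm"}, {"parks"}]
print(personalise(None, ["petrol"], sessions))                  # ['petrol', 'roads']
print(personalise(["roads", "atm", "roads"], [], sessions))     # ['roads', 'atm']
```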

27. DS-ARM: An Association Rule Based Predictor that Can Learn from Imperfect Data
Sooriyaarachchi Wickramaratna, Kasun Jayamal, 13 January 2010
Over the past decades, many industries have spent heavily on computerizing their work environments with the intention of simplifying and expediting access to information and its processing. Real-world data typically exhibit various types of imperfections, uncertainties, and ambiguities that have complicated attempts at automated knowledge discovery. Indeed, it soon became obvious that adequate methods to deal with these problems were critically needed. Because simple methods such as "interpolating" or simply ignoring data imperfections were often found to lead to inferences of dubious practical value, the search for appropriate modifications of knowledge-induction techniques began. Sometimes, rather non-standard approaches turned out to be necessary. For instance, the probabilistic approaches of earlier work are not sufficiently capable of handling the wider range of data imperfections that appear in many new applications (e.g., medical data). Dempster-Shafer theory provides a much stronger framework, which is why it has been chosen as the fundamental paradigm exploited in this dissertation. The task of association rule mining is to detect frequently co-occurring groups of items in transactional databases. Most papers in this field concentrate on how to expedite the search. Less attention has been devoted to how to employ the identified frequent itemsets for prediction purposes; worse still, methods that tailor association-mining techniques to handle data imperfections are virtually nonexistent. This dissertation proposes a technique referred to by the acronym DS-ARM (Dempster-Shafer based Association Rule Mining), in which the DS-theoretic framework is used to enhance a more traditional association-mining mechanism. Of particular interest here is a method that uses knowledge of the partial contents of a "shopping cart" to predict what else the customer is likely to add to it.
This formalized problem has many applications in the analysis of medical databases. A recently proposed data structure, the itemset tree (IT-tree), is used to extract association rules in a computationally efficient manner, thus addressing the scalability problem that has disqualified more traditional techniques from real-world applications. The proposed algorithm is based on the Dempster-Shafer theory of evidence combination. Extensive experiments explore the algorithm's behavior: some use synthetically generated data, others rely on data obtained from a machine-learning repository, and yet others use a movie ratings dataset or an HIV/AIDS patient dataset.
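Dempster's rule of combination, the evidence-combination step that DS-based approaches build on, can be written in a few lines of Python. Mass functions are dictionaries from frozenset focal elements to masses; mass on the whole frame expresses ignorance, one way a missing or imperfect value can be represented. The example frame and masses are invented: two hypothetical rules give evidence about which item the customer will add next. This illustrates the standard rule only, not DS-ARM's full prediction procedure.

```python
from fractions import Fraction

def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass functions over frozenset focal elements."""
    combined = {}
    conflict = Fraction(0)
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, Fraction(0)) + ma * mb
            else:
                conflict += ma * mb  # mass assigned to contradictory evidence
    if conflict == 1:
        raise ValueError("total conflict: the evidence cannot be combined")
    # Renormalize by the non-conflicting mass 1 - K.
    return {s: v / (1 - conflict) for s, v in combined.items()}

theta = frozenset({"milk", "bread", "eggs"})  # frame of discernment
m1 = {frozenset({"milk"}): Fraction(3, 5), theta: Fraction(2, 5)}
m2 = {frozenset({"milk", "bread"}): Fraction(1, 2),
      frozenset({"eggs"}): Fraction(1, 4), theta: Fraction(1, 4)}
result = dempster_combine(m1, m2)
print(result[frozenset({"milk"})])  # 9/17
```

Here the conflict K is 3/20 (the first source's {milk} against the second's {eggs}), so every surviving product is divided by 17/20.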

28. A Text Mining Framework for Discovering Technological Intelligence to Support Science and Technology Management
Kongthon, Alisa, 07 April 2004
Science and Technology (S&T) information is a rich resource, essential for managing research and development (R&D) programs. Management of R&D has long been a labor-intensive process that relies extensively on the accumulated knowledge of experts within the organization. Furthermore, the rapid pace of S&T growth has increased the complexity of R&D management significantly. Fortunately, the parallel growth of information and of analytical tools offers the promise of advanced decision aids that support R&D management more effectively. Information retrieval, data mining, and other information-based technologies are receiving increased attention.
In this thesis, a framework based on text mining techniques is proposed to discover useful intelligence implicit in large bodies of electronic text sources. This intelligence is a prime requirement for successful R&D management. This research extends the approach called Technology Opportunities Analysis (developed by the Technology Policy and Assessment Center, Georgia Institute of Technology, in conjunction with Search Technology, Inc.) to create the proposed framework. The commercial software VantagePoint is used to perform most of the basic analyses. In addition to using functions in VantagePoint, this thesis implements a novel text association rule mining algorithm for gathering related concepts in text data, along with two further algorithms based on text association rule mining. The first, called tree-structured networks, captures important aspects of both parent-child (hierarchical) and sibling (non-hierarchical) relations among related terms. The second, called concept-grouping, constructs term thesauri for data preprocessing. Finally, the framework is applied to Thai S&T publication abstracts with the objective of improving R&D management. The results of the study can help support strategic decision-making on the direction of S&T programs in Thailand.
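A concept-grouping step for thesaurus building can be approximated with a minimal Python sketch. It treats each abstract as a set of terms, computes document-level confidence for the rules X -> Y and Y -> X, and merges two terms with union-find only when both directions reach a threshold, on the hypothetical criterion that terms which nearly always co-occur are likely variants of one concept. The grouping rule and all names are assumptions for illustration, not the thesis's algorithm.

```python
from collections import Counter
from itertools import combinations

def concept_groups(docs, min_confidence):
    """Group terms whose rules X -> Y and Y -> X both meet min_confidence."""
    term_df = Counter(t for d in docs for t in set(d))
    pair_df = Counter(p for d in docs for p in combinations(sorted(set(d)), 2))
    parent = {t: t for t in term_df}

    def find(t):
        # Union-find root lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for (x, y), count in pair_df.items():
        if count / term_df[x] >= min_confidence and count / term_df[y] >= min_confidence:
            parent[find(x)] = find(y)
    groups = {}
    for t in term_df:
        groups.setdefault(find(t), set()).add(t)
    return sorted(sorted(g) for g in groups.values())

docs = [
    ["data mining", "knowledge discovery", "thailand"],
    ["data mining", "knowledge discovery"],
    ["text mining", "thailand"],
    ["text mining", "data mining", "knowledge discovery"],
]
print(concept_groups(docs, 0.8))
# [['data mining', 'knowledge discovery'], ['text mining'], ['thailand']]
```

Requiring both directions avoids merging a broad term into every narrower term it often accompanies; one-directional rules are better suited to hierarchical (parent-child) relations.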

29. A context-aware system to predict user's intention on smartphone based on ECA Model
Lee, Ko-han, 21 August 2012
With the development of artificial intelligence, recommender systems have been extended to fields such as e-commerce shopping cart analysis and video recommendation. These systems provide users with a set of recommended resources based on their habits or behavior patterns, reducing the cost of searching. However, these techniques have not been successfully adopted to help users find functions on smartphones more efficiently. This research aims to build a context-aware system that generates a list of operations predicting which function a user is likely to use in a given context, by continuously learning the user's operation patterns and the related scenarios perceived by the device. The system uses event-condition-action (ECA) patterns to describe users' frequent behaviors, and the research focuses on developing a novel Action-Condition-Fit algorithm to compute the similarity between action pattern sets and the real-time scenario. The proposed system and algorithm will then be built on Google App Engine and an Android device, and their performance validated empirically through field tests.
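To make event-condition-action patterns concrete, here is a small Python sketch. An ECA rule pairs a triggering event with a set of context conditions and the action the user usually takes. Ranking uses a hypothetical fit score, the Jaccard similarity between a rule's conditions and the current context; the abstract does not specify the Action-Condition-Fit measure, so this score, the class, and the function names are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ECARule:
    event: str            # triggering event, e.g. screen unlock
    conditions: frozenset # (attribute, value) pairs describing the context
    action: str           # function the user typically opens

def fit(rule, context):
    """Hypothetical fit score: Jaccard similarity of rule conditions and context."""
    ctx = frozenset(context.items())
    union = rule.conditions | ctx
    return len(rule.conditions & ctx) / len(union) if union else 0.0

def predict_actions(rules, event, context, top_n=3):
    """Rank candidate actions for an event by their best-fitting rule."""
    scores = {}
    for r in rules:
        if r.event == event:
            scores[r.action] = max(scores.get(r.action, 0.0), fit(r, context))
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]

rules = [
    ECARule("unlock", frozenset({("place", "office"), ("time", "morning")}), "email"),
    ECARule("unlock", frozenset({("place", "car")}), "navigation"),
    ECARule("unlock", frozenset({("place", "home"), ("time", "evening")}), "music"),
    ECARule("headset", frozenset({("place", "car")}), "music"),
]
context = {"place": "office", "time": "morning", "battery": "high"}
print(predict_actions(rules, "unlock", context))  # ['email', 'music', 'navigation']
```

Ties are broken alphabetically only to keep the output deterministic; a learned system would more likely fall back on usage frequency.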

30. Temporal Data Mining with a Hierarchy of Time Granules
Wu, Pei-Shan, 28 August 2012
Data mining techniques have been widely applied to extract desirable knowledge from existing databases for specific purposes. In real-world applications, a database usually records the time periods in which transactions occurred and the exhibition periods of items, in addition to the items bought in each transaction. To handle this kind of data, temporal data mining techniques have been proposed to find temporal association rules in databases with time information. Most existing studies consider only differing item lifespans when finding general temporal association rules, which may neglect some useful information. For example, an item that is not frequent over its whole exhibition period may still be frequent within part of that period. To deal with this, a hierarchy of time with suitable user-defined time granules is applied to temporal data mining. This thesis addresses the problem of mining temporal association rules with a hierarchy of time granules from a temporal database and proposes three novel mining algorithms for different item lifespan definitions. In the first definition, the lifespan of an item in a time granule runs from the item's first appearance in the granule to the end of the granule. In the second definition, it runs from the publication time of the item to the end of the granule. In the third definition, it is the item's entire exhibition period. Experimental results on a simulated dataset show the performance of the three proposed algorithms under the different item lifespan definitions and compare the temporal association rules mined with and without the hierarchy of time granules under different parameter settings.
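The effect of lifespan-relative support across a granule hierarchy can be shown with a minimal Python sketch. Support is computed only over the transactions inside an item's lifespan, and that lifespan is derived per granule using the three definitions above. Two details are assumptions made here: the second definition clips the publication time to the granule start, and the third simply uses an exhibition period supplied by the caller. The data and the two-level hierarchy are invented.

```python
def temporal_support(transactions, item, span):
    """Share of transactions inside `span` (inclusive) that contain `item`."""
    start, end = span
    inside = [items for t, items in transactions if start <= t <= end]
    return sum(item in items for items in inside) / len(inside) if inside else 0.0

def lifespan(transactions, item, granule, definition=1, publication=None, exhibition=None):
    """Item lifespan within a time granule under the three definitions."""
    g_start, g_end = granule
    if definition == 1:  # first appearance in the granule -> end of the granule
        times = [t for t, items in transactions if item in items and g_start <= t <= g_end]
        return (min(times), g_end) if times else None
    if definition == 2:  # publication time -> end of the granule (clipping is assumed)
        return (max(publication, g_start), g_end)
    return exhibition  # definition 3: entire exhibition period, given by the caller

tx = [(1, {"a"}), (2, {"b"}), (3, {"b"}), (4, {"a", "b"}), (5, {"a"}), (6, {"a"})]
hierarchy = {"whole": (1, 6), "first half": (1, 3), "second half": (4, 6)}
for name, granule in hierarchy.items():
    print(name, round(temporal_support(tx, "a", lifespan(tx, "a", granule)), 2))
# whole 0.67 / first half 0.33 / second half 1.0
```

With a minimum support of 0.8, item "a" is infrequent over the whole period but frequent in the second half, the situation the hierarchy of time granules is meant to capture. Definition 1 returns `None` for an item absent from a granule, so a caller would skip that granule before computing support.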