31 |
Apriori Sets And Sequences: Mining Association Rules from Time Sequence Attributes
Pray, Keith A 06 May 2004
We introduce an algorithm for mining expressive temporal relationships from complex data. Our algorithm, AprioriSetsAndSequences (ASAS), extends the Apriori algorithm to data sets in which a single data instance may consist of a combination of attribute values that are nominal sequences, time series, sets, and traditional relational values. Datasets of this type occur naturally in many domains, including health care, financial analysis, complex system diagnostics, and domains in which multiple sensors are used. AprioriSetsAndSequences identifies predefined events of interest in the sequential data attributes. It then mines for association rules that make explicit all frequent temporal relationships among the occurrences of those events, as well as relationships between those events and other data attributes. Our algorithm inherently handles different levels of time granularity in the same data set. We have implemented AprioriSetsAndSequences within the Weka environment and have applied it to computer performance, stock market, and clinical sleep disorder data. We show that AprioriSetsAndSequences produces rules that express significant temporal relationships describing patterns of behavior observed in the data set.
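For orientation, the level-wise search of the classic Apriori algorithm, which the abstract says ASAS extends, can be sketched as follows. This is a minimal illustration over plain item sets, not the ASAS implementation: the function name `apriori`, the event labels, and the support threshold are all assumptions made for the example, and the handling of sequences, time series, and temporal relationships described above is not modelled.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical event labels, invented for illustration only.
transactions = [
    {"cpu_spike", "mem_rise", "alert"},
    {"cpu_spike", "mem_rise"},
    {"cpu_spike", "alert"},
    {"mem_rise"},
]
frequent = apriori(transactions, min_support=0.5)
```

The prune step is what makes the search tractable: a candidate is counted against the data only if every one of its subsets was already found frequent.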
|
32 |
Combined map personalisation algorithm for delivering preferred spatial features in a map to everyday mobile device users
Bookwala, Avinash Turab January 2009
In this thesis, we present a novel approach to personalising maps/geo-spatial services for mobile users. With the proposed map personalisation approach, only relevant data is extracted from detailed maps/geo-spatial services on the fly, based on a user’s current location, preferences and requirements. This would result in dramatic improvements in the legibility of maps on mobile device screens, as well as significant reductions in the amount of data transmitted, which in turn would reduce the download time and cost of transferring the required geo-spatial data across mobile networks. The proposed approach has been implemented as a working system based on a four-tier client-server architecture: fully detailed maps/services are stored on the server and, upon a user’s request, personalised maps/services are extracted from them based on the user’s current location and preferences and sent to the user’s mobile device over mobile networks. Because it is built with open and standard system development tools, our system is open to everyday mobile devices rather than only smart phones and Personal Digital Assistants (PDAs), as is prevalent in most current map personalisation systems. The proposed approach combines content-based and collaborative information filtering techniques into one algorithmic solution: content-based filtering is used for regular users who have a user profile stored on the system, and collaborative filtering is used for new/occasional users who have none.
Maps/geo-spatial services are personalised for regular users by analysing the spatial feature preferences automatically collected from previous usage and stored in their user profile, whereas map personalisation for new/occasional users is achieved by analysing the spatial feature preferences of like-minded users in the system in order to make an inference for the target user. In addition, association rule mining, an advanced inference technique, is used to select the spatial features retrieved for new/occasional users through collaborative filtering. This selection is achieved by finding interesting and similar patterns in the spatial features most commonly retrieved by different user groups, based on their past transactions or usage sessions with the system.
|
33 |
DS-ARM: An Association Rule Based Predictor that Can Learn from Imperfect Data
Sooriyaarachchi Wickramaratna, Kasun Jayamal 13 January 2010
Over the past decades, many industries have spent heavily on computerizing their work environments with the intention of simplifying and expediting access to information and its processing. Real-world data typically contain various types of imperfections, uncertainties, and ambiguities that have complicated attempts at automated knowledge discovery. Indeed, it soon became obvious that adequate methods to deal with these problems were critically needed. Because simple methods such as "interpolating" or just ignoring data imperfections were often found to lead to inferences of dubious practical value, the search for appropriate modifications of knowledge-induction techniques began. Sometimes, rather non-standard approaches turned out to be necessary. For instance, the probabilistic approaches of earlier works are not sufficiently capable of handling the wider range of data imperfections that appear in many new applications (e.g., medical data). Dempster-Shafer theory provides a much stronger framework, and this is why it has been chosen as the fundamental paradigm exploited in this dissertation. The task of association rule mining is to detect frequently co-occurring groups of items in transactional databases. The majority of the papers in this field concentrate on how to expedite the search. Less attention has been devoted to how to employ the identified frequent itemsets for prediction purposes; worse still, methods to tailor association-mining techniques so that they can handle data imperfections are virtually nonexistent. This dissertation proposes a technique referred to by the acronym DS-ARM (Dempster-Shafer based Association Rule Mining), in which the DS-theoretic framework is used to enhance a more traditional association-mining mechanism. Of particular interest here is a method that uses knowledge of the partial contents of a "shopping cart" to predict what else the customer is likely to add to it.
This formalized problem has many applications in the analysis of medical databases. A recently proposed data structure, the itemset tree (IT-tree), is used to extract association rules in a computationally efficient manner, thus addressing the scalability problem that has disqualified more traditional techniques from real-world applications. The proposed algorithm is based on the Dempster-Shafer theory of evidence combination. Extensive experiments explore the algorithm's behavior; some of them use synthetically generated data, others rely on data obtained from a machine-learning repository, and yet others use a movie ratings dataset or an HIV/AIDS patient dataset.
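The evidence-combination step that the abstract refers to is, in standard Dempster-Shafer theory, Dempster's rule of combination. The sketch below shows that rule only; it is not the DS-ARM algorithm, and the function name `combine`, the two-item frame, and the mass values are assumptions invented for illustration.

```python
def combine(m1, m2):
    """Dempster's rule: combine two mass functions keyed by frozenset focal elements."""
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                # Agreeing evidence supports the intersection of the two sets.
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                # Disjoint focal elements contribute to the conflict mass.
                conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Normalise by the non-conflicting mass so the result sums to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Hypothetical masses from two rules about what a cart will contain next.
MILK, BREAD = frozenset({"milk"}), frozenset({"bread"})
EITHER = MILK | BREAD  # mass on the whole frame expresses ignorance
m1 = {MILK: 0.6, EITHER: 0.4}
m2 = {BREAD: 0.5, EITHER: 0.5}
fused = combine(m1, m2)
```

Mass assigned to the whole frame (`EITHER`) is how a source expresses that it cannot discriminate between outcomes, which is the kind of imperfection that a plain probability distribution has no direct way to represent.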
|
34 |
A Text Mining Framework for Discovering Technological Intelligence to Support Science and Technology Management
Kongthon, Alisa 07 April 2004
Science and Technology (S&T) information presents a rich resource, essential for managing research and development (R&D) programs. Management of R&D has long been a labor-intensive process, relying extensively on the accumulated knowledge of experts within the organization. Furthermore, the rapid pace of S&T growth has increased the complexity of R&D management significantly. Fortunately, the parallel growth of information and of analytical tools offers the promise of advanced decision aids to support R&D management more effectively. Information retrieval, data mining and other information-based technologies are receiving increased attention.
In this thesis, a framework based on text mining techniques is proposed to discover useful intelligence implicit in large bodies of electronic text sources. This intelligence is a prime requirement for successful R&D management. This research extends the approach called Technology Opportunities Analysis (developed by the Technology Policy and Assessment Center, Georgia Institute of Technology, in conjunction with Search Technology, Inc.) to create the proposed framework. The commercial software VantagePoint is mainly used to perform basic analyses. In addition to utilizing functions in VantagePoint, this thesis implements a novel text association rule mining algorithm for gathering related concepts among text data. Two algorithms based on text association rule mining are also implemented. The first, called tree-structured networks, is used to capture important aspects of both parent-child (hierarchical) and sibling (non-hierarchical) relations among related terms. The second, called concept-grouping, is used to construct term thesauri for data preprocessing. Finally, the framework is applied to Thai S&T publication abstracts with the objective of improving R&D management. The results of the study can help support strategic decision-making on the direction of S&T programs in Thailand.
|
35 |
A Study on Fuzzy Temporal Data Mining
Lin, Shih-Bin 06 September 2011
Data mining is an important process of extracting desirable knowledge from existing databases for specific purposes. Nearly all transactions in real-world databases involve items bought, quantities of the items, and the time periods in which they appear. In the past, temporal quantitative mining was proposed to find temporal quantitative rules from a temporal quantitative database. However, the quantitative values of items are not well suited to human reasoning. To deal with this, fuzzy set theory was applied to temporal quantitative mining because of its simplicity and similarity to human reasoning. In this thesis, we handle the problem of mining fuzzy temporal association rules from a publication database and propose three algorithms to achieve it, each handling a different lifespan definition. In the first algorithm, the lifespan of an item runs from the time of the first transaction containing the item to the end time of the whole database. In the second algorithm, an additional publication table, which includes the publication date of each item in stores, is given, and the lifespan of an item is measured by its entire publication period. In the third algorithm, the lifespan of an item is measured backward from the end time of the whole database to the earliest time at which the item is still a fuzzy temporal frequent item within that duration. In addition, an effective itemset table structure is designed to store and retrieve information about itemsets and thus speed up the mining process. Finally, experimental results on two simulation datasets compare the mined fuzzy temporal quantitative itemsets and rules with and without consideration of item lifespans under different parameter settings.
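The first lifespan definition above can be illustrated with a small sketch: the fuzzy support of an item is computed only over the transactions from its first appearance to the end of the database. The triangular membership function, its breakpoints, the function names, and the toy database are all assumptions made for this example; the thesis's actual membership functions and itemset table structure are not shown here.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at or outside a and c, rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzy_support_with_lifespan(transactions, item, membership):
    """transactions: list of (time, {item: quantity}), sorted by time.

    The lifespan runs from the item's first transaction to the end of the
    database, so earlier transactions do not dilute the support.
    """
    first = next((i for i, (_, t) in enumerate(transactions) if item in t), None)
    if first is None:
        return 0.0
    lifespan = transactions[first:]
    total = sum(membership(t[item]) for _, t in lifespan if item in t)
    return total / len(lifespan)

# Toy database, invented for illustration: item "B" first appears at time 2.
db = [
    (1, {"A": 3}),
    (2, {"B": 5}),
    (3, {"A": 1, "B": 4}),
    (4, {"B": 5}),
]
middle = lambda q: triangular(q, 1, 5, 9)  # hypothetical "Middle" quantity region
support_b = fuzzy_support_with_lifespan(db, "B", middle)
```

In this toy database the memberships of "B" sum to 2.75. Dividing by the three transactions in its lifespan gives a higher support than dividing by all four transactions, which is the effect the lifespan definitions are designed to capture.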
|
36 |
A context-aware system to predict user's intention on smartphone based on ECA Model
Lee, Ko-han 21 August 2012
With the development of artificial intelligence, recommender systems have been applied in fields such as e-commerce shopping cart analysis and video recommendation. These systems present users with a set of recommended resources based on their habits or behavior patterns, reducing their search costs. However, these techniques have not yet been successfully adopted to help users find functions on smartphones more efficiently. This research aims to build a context-aware system that generates a list of operations predicting which function a user might use in a given context, by continuously learning the user's operation patterns and the related scenarios perceived by the device. The system uses event-condition-action (ECA) patterns to describe frequent user behaviors, and the research focuses on developing a new Action-Condition-Fit algorithm to compute the similarity between action pattern sets and the real-time scenario. The proposed system and algorithm will then be built on Google App Engine and an Android device, and their performance will be validated empirically through a field test.
|
37 |
Temporal Data Mining with a Hierarchy of Time Granules
Wu, Pei-Shan 28 August 2012
Data mining techniques have been widely applied to extract desirable knowledge from existing databases for specific purposes. In real-world applications, a database usually involves the time periods when transactions occurred and exhibition periods of items, in addition to the items bought in the transactions. To handle this kind of data, temporal data mining techniques are thus proposed to find temporal association rules from a database with time. Most of the existing studies only consider different item lifespans to find general temporal association rules, and this may neglect some useful information. For example, while an item within the whole exhibition period may not be a frequent one, it may be frequent within part of this time. To deal with this, the concept of a hierarchy of time is thus applied to temporal data mining along with suitable time granules, as defined by users. In this thesis, we thus handle the problem of mining temporal association rules with a hierarchy of time granules from a temporal database, and also propose three novel mining algorithms for different item lifespan definitions. In the first definition, the lifespan of an item in a time granule is calculated from the first appearance time to the end time in the time granule. In the second definition, the lifespan of an item in a time granule is evaluated from the publication time of the item to the end time in the time granule. Finally, in the third definition, the lifespan of an item in a time granule is measured by its entire exhibition period. The experimental results on a simulation dataset show the performance of the three proposed algorithms under different item lifespan definitions, and compare the mined temporal association rules with and without consideration of the hierarchy of time granules under different parameter settings.
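The motivating observation above, that an item may be infrequent over a whole period yet frequent within part of it, can be shown with a short sketch that measures support separately at two levels of a time hierarchy. The function name `support_by_granule`, the year and month granules, and the toy transactions are assumptions for illustration; the three lifespan definitions and the proposed mining algorithms are not reproduced.

```python
from collections import defaultdict
from datetime import date

def support_by_granule(transactions, item, granule_of):
    """Support of `item` inside each time granule.

    granule_of maps a transaction date to a granule label, for example the
    year (coarse level) or a (year, month) pair (finer level).
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    for day, items in transactions:
        g = granule_of(day)
        total[g] += 1
        if item in items:
            hits[g] += 1
    return {g: hits[g] / total[g] for g in total}

# Toy transactions, invented for illustration.
db = [
    (date(2012, 1, 5), {"tea"}),
    (date(2012, 1, 9), {"tea"}),
    (date(2012, 1, 17), {"tea", "milk"}),
    (date(2012, 1, 28), {"milk"}),
    (date(2012, 2, 3), {"fan", "tea"}),
    (date(2012, 2, 20), {"fan"}),
]
by_year = support_by_granule(db, "fan", lambda d: d.year)
by_month = support_by_granule(db, "fan", lambda d: (d.year, d.month))
```

With a minimum support of 0.5, "fan" would be rejected at the year level (support 1/3) but accepted in the February granule (support 1.0), so mining only at the coarse level would miss it.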
|
38 |
Profiling by Association Analysis Using a Hierarchical Visualization Method
MITSUMATSU, Sawako, FURUHASHI, Takeshi, YOSHIKAWA, Tomohiro, ITO, Akira, 光松, 佐和子, 古橋, 武, 吉川, 大弘, 伊藤, 晃 12 1900
No description available.
|
39 |
Applying the Apriori and FP-Growth Association Algorithms to Liver Cancer Data
Pinheiro, Fabiola M. R. 27 August 2013
Cancer is a leading cause of death globally. Although liver cancer ranks only fourth in incidence worldwide among all types of cancer, its survival rate is the lowest. Liver cancer is often diagnosed at an advanced stage, because in the early stages of the disease patients usually do not have signs or symptoms. After initial diagnosis, therapeutic options are limited and tend to be effective only for small tumors with limited spread and minimal vascular invasion. As a result, long-term patient survival remains minimal and has not improved in the past three decades. In order to reduce morbidity and mortality from liver cancer, improvements in early diagnosis and the evaluation of current treatments are essential.
This study tested the applicability of the Apriori and FP-Growth association data mining algorithms to liver cancer patient data obtained from the British Columbia Cancer Agency. The data were used to develop association rules that indicate which combinations of factors are most commonly observed with liver cancer incidence, as well as with increased or decreased rates of mortality.
Ideally, these association rules will be applied in future studies using liver cancer data extracted from other Electronic Health Record (EHR) systems. The main objective of making these rules available is to facilitate early detection guidelines for liver cancer and to evaluate current treatment options.
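Association rules of the kind described above are usually ranked by support, confidence, and lift. The sketch below computes these three standard measures for a single rule; the function name `rule_metrics` and the placeholder records are invented for illustration and do not come from the study's patient data.

```python
def rule_metrics(records, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(records)
    n_a = sum(1 for r in records if antecedent <= r)
    n_c = sum(1 for r in records if consequent <= r)
    n_ac = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_ac / n                        # share of records matching the whole rule
    confidence = n_ac / n_a if n_a else 0.0   # P(consequent | antecedent)
    lift = confidence / (n_c / n) if n_c else 0.0  # above 1 means positive association
    return support, confidence, lift

# Placeholder records with hypothetical labels, not real patient data.
records = [
    {"factor_x", "outcome_y"},
    {"factor_x", "outcome_y"},
    {"factor_x"},
    {"outcome_y"},
    set(),
]
support, confidence, lift = rule_metrics(records, {"factor_x"}, {"outcome_y"})
```

Apriori and FP-Growth differ in how they find the frequent itemsets, not in how rules are scored, so measures like these apply to the output of either algorithm.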
|
40 |
Domain-concept mining: an efficient on-demand data mining approach
Mahamaneerat, Wannapa Kay, Shyu, Chi-Ren. January 2008
Title from PDF of title page (University of Missouri--Columbia, viewed on February 24, 2010). The entire thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file; a non-technical public abstract appears in the public.pdf file. Dissertation advisor: Dr. Chi-Ren Shyu. Vita. Includes bibliographical references.
|