Global ETD Search

201	Fast frequent pattern mining. January 2003 Yabo Xu. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. / Includes bibliographical references (leaves 57-60). / Abstracts in English and Chinese. / Abstract / Acknowledgement / Chapter 1 --- Introduction / Chapter 1.1 --- Frequent Pattern Mining / Chapter 1.2 --- Biosequence Pattern Mining / Chapter 1.3 --- Organization of the Thesis / Chapter 2 --- PP-Mine: Fast Mining Frequent Patterns In-Memory / Chapter 2.1 --- Background / Chapter 2.2 --- The Overview / Chapter 2.3 --- PP-tree Representations and Its Construction / Chapter 2.4 --- PP-Mine / Chapter 2.5 --- Discussions / Chapter 2.6 --- Performance Study / Chapter 3 --- Fast Biosequence Patterns Mining / Chapter 3.1 --- Background / Chapter 3.1.1 --- Differences in Biosequences / Chapter 3.1.2 --- Mining Sequential Patterns / Chapter 3.1.3 --- Mining Long Patterns / Chapter 3.1.4 --- Related Works in Bioinformatics / Chapter 3.2 --- The Overview / Chapter 3.2.1 --- The Problem / Chapter 3.2.2 --- The Overview of Our Approach / Chapter 3.3 --- The Segment Phase / Chapter 3.3.1 --- Finding Frequent Segments / Chapter 3.3.2 --- The Index-based Querying / Chapter 3.3.3 --- The Compression-based Querying / Chapter 3.4 --- The Pattern Phase / Chapter 3.4.1 --- The Pruning Strategies / Chapter 3.4.2 --- The Querying Strategies / Chapter 3.5 --- Experiment / Chapter 3.5.1 --- Synthetic Data Sets / Chapter 3.5.2 --- Biological Data Sets / Chapter 4 --- Conclusion / Bibliography. Pattern recognition systems Data mining
202	Proposta para visualização de dados no website de carpooling www.rotapartilhada.com [Proposal for data visualization on the carpooling website www.rotapartilhada.com] Sousa, João Pedro Lopes de January 2009 Master's thesis. Multimedia. Faculdade de Engenharia. Universidade do Porto. 2009 Multimedia infographics Data mining Websites
203	Towards the discovery of temporal patterns in music listening using Last.fm profiles Carneiro, Mário João Teixeira January 2011 Integrated master's thesis. Informatics and Computing Engineering. Universidade do Porto. Faculdade de Engenharia. 2011 Last.fm Musical preferences Data mining
204	Novel Techniques for Efficient and Effective Subgroup Discovery / Neue Techniken für effiziente und effektive Subgruppenentdeckung Lemmerich, Florian January 2014 Large volumes of data are collected today in many domains. Often, there is so much data available that it is difficult to identify the relevant pieces of information. Knowledge discovery seeks to obtain novel, interesting and useful information from large datasets. One key technique for that purpose is subgroup discovery. It aims at identifying descriptions for subsets of the data, which have an interesting distribution with respect to a predefined target concept. This work improves the efficiency and effectiveness of subgroup discovery in different directions. For efficient exhaustive subgroup discovery, algorithmic improvements are proposed for three important variations of the standard setting: First, novel optimistic estimate bounds are derived for subgroup discovery with numeric target concepts. These allow for skipping the evaluation of large parts of the search space without influencing the results. Additionally, necessary adaptations to data structures for this setting are discussed. Second, for exceptional model mining, that is, subgroup discovery with a model over multiple attributes as target concept, a generic extension of the well-known FP-tree data structure is introduced. The modified data structure stores intermediate condensed data representations, which depend on the chosen model class, in the nodes of the trees. This allows the application for many popular model classes. Third, subgroup discovery with generalization-aware measures is investigated. These interestingness measures compare the target share or mean value in the subgroup with the respective maximum value in all its generalizations. For this setting, a novel method for deriving optimistic estimates is proposed. In contrast to previous approaches, the novel measures are not exclusively based on the anti-monotonicity of instance coverage, but also take the difference of coverage between the subgroup and its generalizations into account. In all three areas, the advances lead to runtime improvements of more than an order of magnitude. The second part of the contributions focuses on the effectiveness of subgroup discovery. These improvements aim to identify more interesting subgroups in practical applications. For that purpose, the concept of expectation-driven subgroup discovery is introduced as a new family of interestingness measures. It computes the score of a subgroup based on the difference between the actual target share and the target share that could be expected given the statistics for the separate influence factors that are combined to describe the subgroup. In doing so, previously undetected interesting subgroups are discovered, while other, partially redundant findings are suppressed. Furthermore, this work also approaches practical issues of subgroup discovery: In that direction, the VIKAMINE II tool is presented, which extends its predecessor with a rebuilt user interface, novel algorithms for automatic discovery, new interactive mining techniques, as well as novel options for result presentation and introspection. Finally, some real-world applications are described that utilized the presented techniques. These include the identification of influence factors on the success and satisfaction of university students and the description of locations using tagging data of geo-referenced images. Data Mining Knowledge extraction ddc:000
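The pruning idea behind optimistic estimates in record 204 can be illustrated with a minimal sketch. It assumes the standard binary-target setting with the Piatetsky-Shapiro quality n * (p - p0) and its well-known bound tp * (1 - p0), not the numeric-target or generalization-aware bounds derived in the thesis. The function names, the selector format, and the toy data are hypothetical.

```python
def ps_quality(rows, target, p0):
    """Piatetsky-Shapiro quality n * (p - p0) of the rows a subgroup covers."""
    positives = sum(1 for r in rows if r[target])
    return positives - len(rows) * p0


def optimistic_estimate(rows, target, p0):
    """Upper bound on the quality of any refinement: keep only the positives."""
    positives = sum(1 for r in rows if r[target])
    return positives * (1 - p0)


def discover(data, target, selectors):
    """Depth-first search for the best subgroup, pruning with the bound above.

    `selectors` is a list of (attribute, value) pairs (an assumed format).
    """
    p0 = sum(1 for r in data if r[target]) / len(data)
    best = (0.0, ())  # (quality, description)

    def search(rows, description, start):
        nonlocal best
        for i in range(start, len(selectors)):
            attr, value = selectors[i]
            covered = [r for r in rows if r[attr] == value]
            if not covered:
                continue
            refined = description + ((attr, value),)
            quality = ps_quality(covered, target, p0)
            if quality > best[0]:
                best = (quality, refined)
            # Refine further only if some refinement could still beat the best.
            if optimistic_estimate(covered, target, p0) > best[0]:
                search(covered, refined, i + 1)

    search(data, (), 0)
    return best


# Hypothetical toy data: which description best explains "ill"?
data = [
    {"sex": "f", "smoker": "y", "ill": True},
    {"sex": "f", "smoker": "y", "ill": True},
    {"sex": "m", "smoker": "y", "ill": True},
    {"sex": "m", "smoker": "n", "ill": False},
    {"sex": "f", "smoker": "n", "ill": False},
    {"sex": "m", "smoker": "n", "ill": False},
]
selectors = [("sex", "f"), ("sex", "m"), ("smoker", "y"), ("smoker", "n")]
best = discover(data, "ill", selectors)
```

In this toy run the branch starting at `sex = m` is never refined, because its bound (0.5) is already below the best quality found so far. That is the sense in which such bounds skip parts of the search space without changing the result.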
205	Generating sporadic association rules Koh, Yun Sing January 2007 Association rule mining is an essential part of data mining, which tries to discover associations, relationships, or correlations among sets of items. As it was initially proposed for market basket analysis, most of the previous research focuses on generating frequent patterns. This thesis focuses on finding infrequent patterns, which we call sporadic rules. They represent rare itemsets that are scattered sporadically throughout the database but with high confidence of occurring together. As sporadic rules have low support, the minabssup (minimum absolute support) measure was proposed to filter out any rules with low support whose occurrence is indistinguishable from that of coincidence. There are two classes of sporadic rules: perfectly sporadic and imperfectly sporadic rules. Apriori-Inverse was then proposed for perfectly sporadic rule generation. It uses a maximum support threshold and a user-defined minimum confidence threshold. This method is designed to find itemsets which consist only of items falling below a maximum support threshold. However, imperfectly sporadic rules may contain items with a frequency of occurrence over the maximum support threshold. To look for these rules, variations of Apriori-Inverse, namely Fixed Threshold, Adaptive Threshold, and Hill Climbing, were proposed. However, these extensions are heuristic. Thus the MIISR algorithm was proposed to find imperfectly sporadic rules using item constraints, which capture rules with a single-item consequent below the maximum support threshold. A comprehensive evaluation of sporadic rules and current interestingness measures was carried out. Our investigation suggests that current interestingness measures are not suitable for detecting sporadic rules. data mining sporadic groups mathematics
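The perfectly sporadic case in record 205 can be illustrated with a simplified sketch. It is not the Apriori-Inverse algorithm itself: it considers only rules between two single items, and it treats the minimum absolute support as a plain user-supplied count, whereas the thesis derives it as a measure. The function name, parameters, and transactions are hypothetical.

```python
from itertools import combinations


def sporadic_pair_rules(transactions, max_support, min_abs_support, min_confidence):
    """Return rules (antecedent, consequent, count, confidence) between rare items.

    Simplification: only single-item antecedents and consequents are considered.
    """
    n = len(transactions)

    def count(itemset):
        # Number of transactions containing every item of `itemset`.
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    # Keep items that are rare (below max_support) but not negligibly rare.
    rare = sorted(
        item for item in items
        if count({item}) >= min_abs_support and count({item}) / n < max_support
    )

    rules = []
    for a, b in combinations(rare, 2):
        together = count({a, b})
        if together < min_abs_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = together / count({antecedent})
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, together, confidence))
    return rules


# Hypothetical market-basket data: caviar and champagne are rare but co-occur.
transactions = [
    {"bread", "milk"}, {"bread", "milk"}, {"bread", "milk", "eggs"},
    {"bread"}, {"milk"}, {"bread", "milk"},
    {"bread", "caviar", "champagne"}, {"milk", "caviar", "champagne"},
    {"bread", "milk"}, {"bread", "eggs"},
]
rules = sporadic_pair_rules(transactions, max_support=0.3,
                            min_abs_support=2, min_confidence=0.8)
```

Frequent items such as bread and milk never enter the candidate set, so the only rules reported link caviar and champagne, each with confidence 1.0. Eggs are rare too, but they never co-occur with another rare item.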
206	A comparison and selection of methods for handling missing data in data mining / Zou, Ying. January 2004 Thesis (M.Sc.)--York University, 2004. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 104-109). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://gateway.proquest.com/openurl?url%5Fver=Z39.88-2004&res%5Fdat=xri:pqdiss&rft%5Fval%5Ffmt=info:ofi/fmt:kev:mtx:dissertation&rft%5Fdat=xri:pqdiss:MQ99411
207	Using data mining to dynamically build up just in time learner models Liu, Wengang 09 February 2010 Using rich data collected from e-learning systems, it may be possible to build up just in time dynamic learner models to analyze learners' behaviours and to evaluate learners' performance in online education systems. The goal is to create metrics to measure learners' characteristics from usage data. To achieve this goal we need to use data mining methods, especially clustering algorithms, to find patterns from which metrics can be derived from usage data. In this thesis, we propose a six layer model (raw data layer, fact data layer, data mining layer, measurement layer, metric layer and pedagogical application layer) to create a just in time learner model which draws inferences from usage data. In this approach, we collect raw data from online systems, filter fact data from raw data, and then use clustering methods to create measurements and metrics. In a pilot study, we used usage data collected from the iHelp system to create measurements and metrics to observe learners' behaviours in a real online system. The measurements and metrics relate to a learner's sociability, activity levels, learning styles, and knowledge levels. To validate the approach we designed two experiments to compare the metrics and measurements extracted from the iHelp system: expert evaluations and learner self evaluations. Even though the experiments did not produce statistically significant results, this approach shows promise to describe learners' behaviours through dynamically generated measurements and metrics. Continued research on these kinds of methodologies is promising. Educational Data Mining Learner Model
209	The Research of Supporting Customer Values' Resolutions with "Data Warehousing"~ A Case Study of Concerning Subscribers' Churn Rate in TransAsia Telecommunications Yen, Yu-Lung 28 June 2001 In recent years, Customer Relationship Management (CRM) and One to One Marketing have become two hot topics. Many enterprises have invested huge amounts of money and manpower in these fields, hoping to build a perfect model of customer management. Their main purpose is to raise customer loyalty and thereby increase corporate profits. To achieve this goal, they must first understand their customers. Peppers and Rogers (1995), advocates of One to One Marketing, have declared that reducing churn rate by 5% increases profit by 100%. The core value of marketing is shifting from "product" to "customer". Whoever owns the most customer knowledge owns the most customer capital in the 21st century. Through data mining, a business can turn its mass database into valuable information about customer behavior. For learning to take place, data from many sources (billing records, scanner data, registration forms, applications, call records, coupon redemptions, surveys) must first be gathered together and organized in a consistent and useful way. This is called data warehousing. Data warehousing allows the enterprise to remember what it has noticed about its customers. Next, the data must be analyzed, understood, and turned into actionable information. That is where data mining comes in. Using case study and grounded theory, this research examines the link between data warehousing and increased corporate value. As many businesses do not share their findings and experience on customer knowledge, this research provides evidence of how data warehousing can efficiently support a business in reducing its churn rate and creating more business value. Custome Data Warehousing Data Mining
210	The Application of Fuzzy Decision Trees in Data Mining - Using Taiwan Stock Market as An Example Cheng, Yuan-Chung 18 June 2002 A distinctive feature of the Taiwan stock market is that over 80% of participants are natural persons while only 20% are legal persons. Compared to the latter, natural persons have less expertise in stock trading, so the effectiveness of the local stock market is an interesting subject for research. In this paper, we try to answer this question by applying technical analysis to the past two years of trading data to see whether it yields investment gains. Most similar past research has problems: it uses only a single technical index or a pair of indices for prediction, predicts only a specific stock, or filters out unwanted training and testing data in preprocessing. Its results may therefore not really reflect the effectiveness of the market. In this paper, we adopt a different experiment design. Past research has shown that a fuzzy decision tree outperforms a normal crisp decision tree in data classification when there are numerical attributes in the target domain to be classified (Y.M. Jeng, 1993). Since most technical indices are expressed as numerical values, we choose the fuzzy decision tree as the tool to generate rules from the eight stocks in the local market with the largest capital and highest turnover rate. The trees are evaluated with more objective criteria and used to predict whether stock prices will rise or fall the next day. The experimental results show that the created fuzzy trees have better predictive accuracy than a random walk, and the investment returns based on the trees are much better than those of the buy-and-hold policy. Fuzzy decision tree Data mining
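Why fuzzy trees suit numerical attributes, as record 210 argues, can be illustrated with a minimal sketch of fuzzification. A crisp tree sends an index value down exactly one branch, while a fuzzy tree lets it belong partly to several branches and blends their leaf predictions. Everything here is an assumption for illustration: the 0 to 100 indicator scale, the breakpoints 30/50/70, and the per-branch probabilities do not come from the thesis.

```python
def falling(x, full, zero):
    """Membership that is 1 up to `full` and falls linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def rising(x, zero, full):
    """Membership that is 0 up to `zero` and rises linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def triangular(x, left, peak, right):
    """Membership that peaks at `peak` and is 0 outside (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(value):
    """Degrees to which an indicator on an assumed 0-100 scale is low/medium/high."""
    return {
        "low": falling(value, 30.0, 50.0),
        "medium": triangular(value, 30.0, 50.0, 70.0),
        "high": rising(value, 50.0, 70.0),
    }


def predict_up(value, leaf_up_probability):
    """Blend the leaf predictions of all branches by membership degree."""
    memberships = fuzzify(value)
    total = sum(memberships.values())
    return sum(memberships[k] * leaf_up_probability[k] for k in memberships) / total


# Hypothetical probabilities that the price rises, one per fuzzy branch.
leaf_up_probability = {"low": 0.7, "medium": 0.5, "high": 0.3}
memberships = fuzzify(60.0)
probability = predict_up(60.0, leaf_up_probability)
```

A value of 60 is half "medium" and half "high", so the prediction is the average of those two leaves rather than a jump from one to the other at a hard threshold.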
