381 |
Análisis e implementación de técnicas para el descubrimiento de reglas de asociación temporales generalizadas [Analysis and implementation of techniques for discovering generalized temporal association rules] / Blazquez, Ana Belén; Colareda, María Adriana. January 2004
No description available.
|
382 |
Two new approaches to evaluate association rules / Delpisheh, Elnaz; University of Lethbridge. Faculty of Arts and Science. January 2010
Data mining aims to discover interesting and unknown patterns in large-volume data. Association rule mining is one of the major data mining tasks, which attempts to find inherent relationships among data items in an application domain, such as supermarket basket analysis. An essential post-process in an association rule mining task is the evaluation of association rules by measures for their interestingness. Different interestingness measures have been proposed and studied. Given an association rule mining task, measures are assessed against a set of user-specified properties. However, in practice, given the subjectivity and inconsistencies in property specifications, it is a non-trivial task to make appropriate measure selections. In this work, we propose two novel approaches to assess interestingness measures. Our first approach utilizes the analytic hierarchy process to quantitatively capture domain-dependent requirements on properties, which are later used in assessing measures. This approach not only eliminates any inconsistencies in an end user’s property specifications through consistency checking but is also invariant to the number of association rules. Our second approach dynamically evaluates association rules according to a composite and collective effect of multiple measures. It interactively snapshots the end user’s domain-dependent requirements in evaluating association rules. In essence, our approach uses neural networks along with back-propagation learning to capture the relative importance of measures in evaluating association rules. Case studies and simulations have been conducted to show the effectiveness of our two approaches. / viii, 85 leaves : ill. ; 29 cm
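As an illustration of the consistency checking that the analytic hierarchy process provides, the following is a minimal sketch and not the thesis's implementation. It assumes a reciprocal pairwise-comparison matrix over three hypothetical properties, approximates the priority weights with the row geometric-mean method, and computes Saaty's consistency ratio (values below 0.1 are conventionally treated as acceptable). The function names and the example matrix are assumptions made for this sketch.

```python
import math

# Saaty's random consistency index for a few matrix sizes.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix):
    """Approximate AHP priority weights via normalized row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Estimate lambda_max from A*w, then return CR = CI / RI."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical judgments: property 0 is preferred 3x over property 1 and
# 5x over property 2; property 1 is preferred 2x over property 2.
comparisons = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]

weights = ahp_weights(comparisons)
cr = consistency_ratio(comparisons, weights)
print("weights:", [round(w, 3) for w in weights])
print("consistency ratio:", round(cr, 4))
```

The resulting weights could then be used to score how well each interestingness measure satisfies the weighted properties; a large consistency ratio would signal that the pairwise judgments should be revised first.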
|
383 |
Streaming Random Forests / Abdulsalam, Hanady. 16 July 2008
Recent research addresses the problem of data-stream mining to deal with applications that require processing huge amounts of data, such as sensor data analysis and financial applications. Data-stream mining algorithms incorporate special provisions to meet the requirements of stream-management systems; that is, stream algorithms must be online and incremental, processing each data record only once (or a few times); adaptive to distribution changes; and fast enough to accommodate high arrival rates. We consider the problem of data-stream classification, introducing an online and incremental stream-classification ensemble algorithm, Streaming Random Forests, an extension of the Random Forests algorithm by Breiman, which is a standard classification algorithm. Our algorithm is designed to handle multi-class classification problems. It is able to deal with data streams having an evolving nature and a random arrival rate of training/test data records. The algorithm, in addition, automatically adjusts its parameters based on the data seen so far. Experimental results on real and synthetic data demonstrate that the algorithm performs well. Without losing classification accuracy, our algorithm is able to handle multi-class problems for which the underlying class boundaries drift, and to handle the case when blocks of training records are not big enough to build/update the classification model. / Thesis (Ph.D, Computing) -- Queen's University, 2008-07-15 16:12:33.221
|
384 |
Aggregation and Privacy in Multi-Relational Databases / Jafer, Yasser. 11 April 2012
Most existing data mining approaches perform data mining tasks on a single data table. However, increasingly, data repositories such as financial data and medical records, amongst others, are stored in relational databases. The inability to apply traditional data mining techniques directly to such relational databases thus poses a serious challenge. To address this issue, a number of researchers convert a relational database into one or more flat files and then apply traditional data mining algorithms. This process of transforming a relational database into one or more flat files usually involves aggregation. Aggregation functions such as maximum, minimum, average, standard deviation, count and sum are commonly used in such a flattening process.
Our research aims to address the following question: Is there a link between aggregation and possible privacy violations during relational database mining? In this research we investigate whether, and how, applying aggregation functions affects the privacy of a relational database during supervised learning, or classification, where the target concept is known. To this end, we introduce the PBIRD (Privacy Breach Investigation in Relational Databases) methodology. The PBIRD methodology combines multi-view learning with feature selection to discover the potentially dangerous sets of features hidden within a database. Our approach creates a number of views, which consist of subsets of the data, with and without aggregation. Then, by identifying and investigating the set of selected features in each view, potential privacy breaches are detected. In this way, our PBIRD algorithm is able to discover those features that are correlated with the classification target and that may also reveal sensitive information in the database.
Our experimental results show that aggregation functions do, indeed, change the correlation between attributes and the classification target. We show that with aggregation, we obtain a set of features which can be accurately linked to the classification target and used to predict (with high accuracy) the confidential information. On the other hand, the results show that without aggregation we obtain a different set of potentially harmful features. By identifying the complete set of potentially dangerous attributes, the PBIRD methodology provides a solution where the database designers/owners can be warned, to subsequently perform necessary adjustments to protect the privacy of the relational database.
In our research, we also perform a comparative study to investigate the impact of aggregation on the classification accuracy and on the time required to build the models. Our results suggest that when a database consists only of categorical data, aggregation should be used with particular caution, because it decreases the overall accuracy of the resulting models. When the database contains mixed attributes, the results show that the accuracies without aggregation and with aggregation are comparable. However, even in such scenarios, schemas without aggregation tend to slightly outperform those with aggregation. With regard to the impact of aggregation on the model building time, the results show that, in general, the models constructed with aggregation require shorter building time. However, when the database is small and consists of nominal attributes with high cardinality, aggregation causes a slower model building time.
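The flattening step described in this abstract can be sketched as follows. This is a minimal illustration, not the PBIRD implementation: it assumes a hypothetical one-to-many schema (an `accounts` table and a `transactions` table joined on `account_id`) and summarizes each parent row's child values with the six aggregation functions the abstract names. The population standard deviation is used here so that a single child value does not raise an error; that choice, the table names, and the column names are assumptions of this sketch.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flatten(parents, children, key, value):
    """Join child rows onto parent rows by aggregating one numeric column."""
    groups = defaultdict(list)
    for row in children:
        groups[row[key]].append(row[value])

    flat = []
    for parent in parents:
        vals = groups.get(parent[key], [])
        features = dict(parent)  # copy the parent's own attributes
        features[f"{value}_count"] = len(vals)
        features[f"{value}_sum"] = sum(vals)
        # The remaining aggregates are undefined for parents with no children.
        features[f"{value}_min"] = min(vals) if vals else None
        features[f"{value}_max"] = max(vals) if vals else None
        features[f"{value}_mean"] = mean(vals) if vals else None
        features[f"{value}_std"] = pstdev(vals) if vals else None
        flat.append(features)
    return flat


# Hypothetical example data.
accounts = [
    {"account_id": 1, "branch": "north"},
    {"account_id": 2, "branch": "south"},
    {"account_id": 3, "branch": "south"},
]
transactions = [
    {"account_id": 1, "amount": 10.0},
    {"account_id": 1, "amount": 30.0},
    {"account_id": 2, "amount": 5.0},
]

flat_table = flatten(accounts, transactions, key="account_id", value="amount")
for row in flat_table:
    print(row)
```

Each output row is a single flat record that a conventional single-table classifier could consume; the aggregated columns are the kind of derived features whose link to sensitive information the abstract investigates.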
|
385 |
Location Prediction in Social Media Based on Tie Strength / McGee, Jeffrey A. 03 October 2013
We propose a novel network-based approach for location estimation in social media that integrates evidence of the social tie strength between users for improved location estimation. Concretely, we propose a location estimator, FriendlyLocation, that leverages the relationship between the strength of the tie between a pair of users and the distance between the pair. Based on an examination of over 100 million geo-encoded tweets and 73 million Twitter user profiles, we identify several factors, such as the number of followers and how the users interact, that can strongly reveal the distance between a pair of users. We use these factors to train a decision tree to distinguish between pairs of users who are likely to live nearby and pairs of users who are likely to live in different areas. We use the results of this decision tree as the input to a maximum likelihood estimator to predict a user’s location. We find that this proposed method significantly improves the results of location estimation relative to a state-of-the-art technique. Our system reduces the average error distance for 80% of Twitter users from 40 miles to 21 miles using only information from the user’s friends and friends-of-friends, which has great significance for augmenting traditional social media and enriching location-based services with more refined and accurate location estimates.
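The idea of feeding a tie-strength prediction into a maximum likelihood estimator can be sketched roughly as below. This is a simplified illustration and not the FriendlyLocation estimator itself: it assumes each friend has a known location and a tie class (`"near"` or `"far"`) that would come from some upstream classifier, models the likelihood of a tie at distance d as proportional to (1 + d)^(-alpha) with a hypothetical per-class exponent, and picks the friend location that maximizes the summed log-likelihood. The exponents, function names, and coordinates are all assumptions made for this sketch.

```python
import math

EARTH_RADIUS_MILES = 3958.8

# Hypothetical decay exponents: ties predicted "near" are assumed to be far
# more sensitive to distance than ties predicted "far".
DECAY = {"near": 1.0, "far": 0.2}


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def log_likelihood(candidate, friends):
    """Sum of log p(tie | distance), up to an additive constant."""
    total = 0.0
    for location, tie_class in friends:
        distance = haversine_miles(candidate, location)
        total += -DECAY[tie_class] * math.log(1.0 + distance)
    return total


def estimate_location(friends):
    """Return the friend location with the highest log-likelihood."""
    candidates = [location for location, _ in friends]
    return max(candidates, key=lambda c: log_likelihood(c, friends))


# Hypothetical friends: two strong ties around Houston, one weak tie in New York.
friends = [
    ((29.76, -95.37), "near"),
    ((29.70, -95.40), "near"),
    ((40.71, -74.01), "far"),
]
estimated = estimate_location(friends)
print("estimated location:", estimated)
```

Restricting candidates to friends' locations keeps the search trivial; a fuller estimator would also account for users who are not connected, which this sketch omits.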
|
386 |
Knowledge discovery in spatio-temporal databases / Abraham, Tamas. Unknown Date
Thesis (PhD) -- University of South Australia, 1999
|
387 |
Data discretization simplified: randomized binary search trees for data preprocessing / Boland, Donald Joseph. January 1900
Thesis (M.S.)--West Virginia University, 2007. / Title from document title page. Document formatted into pages; contains xiv, 174 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 172-174).
|
388 |
An implementation of correspondence analysis in R and its application in the analysis of web usage / Nenadić, Oleg. January 2007
Also submitted as: dissertation, University of Göttingen, 2007.
|
389 |
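Data mining und graph mining auf molekularen Graphen - Cheminformatik und molekulare Kodierungen für ADME/Tox-QSAR-Analysen [Data mining and graph mining on molecular graphs: cheminformatics and molecular encodings for ADME/Tox QSAR analyses] / Wegner, Jörg Kurt. January 2006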
Also submitted as: dissertation, University of Tübingen, 2006.
|
390 |
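Analytische Betrugserkennung im Bankenumfeld: Evaluation von relationalem Data-Mining für die Suche nach Betrugsmustern [Analytical fraud detection in banking: an evaluation of relational data mining for finding fraud patterns] / Reolon, Patrick. January 2006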
Also submitted as: diploma thesis, University of Zürich, 2006, under the title: Reolon, Patrick: Analytische Betrugserkennung / Produced on demand
|