Global ETD Search

381	Análisis e implementación de técnicas para el descubrimiento de reglas de asociación temporales generalizadas [Analysis and implementation of techniques for discovering generalized temporal association rules] Blazquez, Ana Belén, Colareda, María Adriana January 2004 No description available. Computer Science Data Mining Informatics Data Processing
382	Two new approaches to evaluate association rules Delpisheh, Elnaz, University of Lethbridge. Faculty of Arts and Science January 2010 Data mining aims to discover interesting and unknown patterns in large-volume data. Association rule mining is one of the major data mining tasks; it attempts to find inherent relationships among data items in an application domain, such as supermarket basket analysis. An essential post-processing step in an association rule mining task is the evaluation of association rules by measures of their interestingness. Different interestingness measures have been proposed and studied. Given an association rule mining task, measures are assessed against a set of user-specified properties. In practice, however, the subjectivity and inconsistencies in property specifications make selecting appropriate measures a non-trivial task. In this work, we propose two novel approaches to assess interestingness measures. Our first approach uses the analytic hierarchy process to quantitatively capture domain-dependent requirements on properties, which are later used in assessing measures. This approach not only eliminates inconsistencies in an end user’s property specifications through consistency checking but is also invariant to the number of association rules. Our second approach dynamically evaluates association rules according to the composite and collective effect of multiple measures. It interactively snapshots the end user’s domain-dependent requirements in evaluating association rules. In essence, this approach uses neural networks with back-propagation learning to capture the relative importance of measures in evaluating association rules. Case studies and simulations have been conducted to show the effectiveness of our two approaches. / viii, 85 leaves : ill. ; 29 cm Data mining Association rule mining Dissertations, Academic
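As a minimal illustration of what an interestingness measure computes, the sketch below scores a single rule on toy basket data using support, confidence and lift. These are standard textbook measures chosen here only as examples; the abstract does not say which measures the thesis evaluates. The transactions and the function name `rule_measures` are hypothetical.

```python
from typing import Dict, FrozenSet, List


def rule_measures(transactions: List[FrozenSet[str]],
                  antecedent: FrozenSet[str],
                  consequent: FrozenSet[str]) -> Dict[str, float]:
    """Score the rule antecedent -> consequent (hypothetical helper)."""
    n = len(transactions)
    # Count transactions containing the antecedent, the consequent, and both.
    count_a = sum(1 for t in transactions if antecedent <= t)
    count_c = sum(1 for t in transactions if consequent <= t)
    count_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = count_ac / n                                  # P(A and C)
    confidence = count_ac / count_a if count_a else 0.0     # P(C | A)
    lift = confidence / (count_c / n) if count_c else 0.0   # P(C | A) / P(C)
    return {"support": support, "confidence": confidence, "lift": lift}


# Toy supermarket baskets (invented for illustration).
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
measures = rule_measures(baskets, frozenset({"bread"}), frozenset({"milk"}))
print(measures)
```

A lift below 1 on this toy data means that bread and milk co-occur slightly less often than independence would predict, which shows how different measures can rank the same rule differently.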
383	Streaming Random Forests Abdulsalam, Hanady 16 July 2008 Recent research addresses the problem of data-stream mining to deal with applications that require processing huge amounts of data, such as sensor data analysis and financial applications. Data-stream mining algorithms incorporate special provisions to meet the requirements of stream-management systems: stream algorithms must be online and incremental, processing each data record only once (or a few times); adaptive to distribution changes; and fast enough to accommodate high arrival rates. We consider the problem of data-stream classification, introducing an online and incremental stream-classification ensemble algorithm, Streaming Random Forests, an extension of the Random Forests algorithm by Breiman, which is a standard classification algorithm. Our algorithm is designed to handle multi-class classification problems. It is able to deal with data streams that evolve over time and have a random arrival rate of training/test data records. In addition, the algorithm automatically adjusts its parameters based on the data seen so far. Experimental results on real and synthetic data demonstrate that the algorithm behaves successfully. Without losing classification accuracy, our algorithm is able to handle multi-class problems for which the underlying class boundaries drift, and to handle the case when blocks of training records are not big enough to build/update the classification model. / Thesis (Ph.D, Computing) -- Queen's University, 2008-07-15 16:12:33.221 Data mining Streams Classification algorithms Streaming algorithms
384	Aggregation and Privacy in Multi-Relational Databases Jafer, Yasser 11 April 2012 Most existing data mining approaches perform data mining tasks on a single data table. Increasingly, however, data repositories such as financial data and medical records, amongst others, are stored in relational databases. The inability to apply traditional data mining techniques directly to such relational databases thus poses a serious challenge. To address this issue, a number of researchers convert a relational database into one or more flat files and then apply traditional data mining algorithms. This process of transforming a relational database into one or more flat files usually involves aggregation. Aggregation functions such as maximum, minimum, average, standard deviation, count and sum are commonly used in such a flattening process. Our research aims to address the following question: Is there a link between aggregation and possible privacy violations during relational database mining? In this research we investigate whether, and how, applying aggregation functions affects the privacy of a relational database during supervised learning, or classification, where the target concept is known. To this end, we introduce the PBIRD (Privacy Breach Investigation in Relational Databases) methodology. The PBIRD methodology combines multi-view learning with feature selection to discover the potentially dangerous sets of features hidden within a database. Our approach creates a number of views, which consist of subsets of the data, with and without aggregation. Then, by identifying and investigating the set of selected features in each view, potential privacy breaches are detected. In this way, our PBIRD algorithm is able to discover those features that are correlated with the classification target and that may also reveal sensitive information in the database.
Our experimental results show that aggregation functions do, indeed, change the correlation between attributes and the classification target. We show that with aggregation, we obtain a set of features which can be accurately linked to the classification target and used to predict (with high accuracy) the confidential information. On the other hand, the results show that without aggregation we obtain a different set of potentially harmful features. By identifying the complete set of potentially dangerous attributes, the PBIRD methodology provides a solution in which the database designers/owners can be warned and can then make the adjustments necessary to protect the privacy of the relational database. In our research, we also perform a comparative study to investigate the impact of aggregation on the classification accuracy and on the time required to build the models. Our results suggest that when a database consists only of categorical data, aggregation should be used with particular caution, because aggregation decreases the overall accuracy of the resulting models. When the database contains mixed attributes, the results show that the accuracies without aggregation and with aggregation are comparable. However, even in such scenarios, schemas without aggregation tend to slightly outperform those with aggregation. With regard to the impact of aggregation on the model building time, the results show that, in general, the models constructed with aggregation require shorter building time. However, when the database is small and consists of nominal attributes with high cardinality, aggregation slows model building. Aggregation Privacy Relational Database Data Mining
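The flattening step described in this abstract can be sketched as follows: a one-to-many child table is collapsed into one row per parent key using the six aggregation functions the abstract names. This is only an illustration under assumed data. The table layout, the column names `customer_id` and `amount`, and the helper `flatten_with_aggregation` are hypothetical and are not taken from the PBIRD methodology itself.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flatten_with_aggregation(rows, key, value):
    """Collapse child rows into one aggregated record per key (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])

    flat = {}
    for k, vals in groups.items():
        flat[k] = {
            "max": max(vals),
            "min": min(vals),
            "avg": mean(vals),
            # Population standard deviation, so a single-row group yields 0.0.
            "std": pstdev(vals),
            "count": len(vals),
            "sum": sum(vals),
        }
    return flat


# Toy child table: several transactions per customer (invented data).
transactions = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]
flat = flatten_with_aggregation(transactions, "customer_id", "amount")
print(flat)
```

Each aggregated column becomes a candidate feature in the flat file, which is why aggregation can change which attributes correlate with the classification target.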
385	Location Prediction in Social Media Based on Tie Strength McGee, Jeffrey A 03 October 2013 We propose a novel network-based approach for location estimation in social media that integrates evidence of the social tie strength between users for improved location estimation. Concretely, we propose a location estimator, FriendlyLocation, that leverages the relationship between the strength of the tie between a pair of users and the distance between the pair. Based on an examination of over 100 million geo-encoded tweets and 73 million Twitter user profiles, we identify several factors, such as the number of followers and how the users interact, that can strongly reveal the distance between a pair of users. We use these factors to train a decision tree to distinguish between pairs of users who are likely to live nearby and pairs of users who are likely to live in different areas. We use the results of this decision tree as the input to a maximum likelihood estimator to predict a user’s location. We find that this proposed method significantly improves the results of location estimation relative to a state-of-the-art technique. Our system reduces the average error distance for 80% of Twitter users from 40 miles to 21 miles using only information from the user’s friends and friends-of-friends, which has great significance for augmenting traditional social media and enriching location-based services with more refined and accurate location estimates. location prediction social media Twitter data mining
386	Knowledge discovery in spatio-temporal databases / Abraham, Tamas Unknown Date Thesis (PhD) -- University of South Australia, 1999 Data mining Spatial systems Temporal databases
387	Data discretization simplified: randomized binary search trees for data preprocessing / Boland, Donald Joseph. January 1900 Thesis (M.S.)--West Virginia University, 2007. / Title from document title page. Document formatted into pages; contains xiv, 174 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 172-174).
388	An implementation of correspondence analysis in R and its application in the analysis of web usage / Nenadić, Oleg. January 2007 Also: dissertation, University of Göttingen, 2007.
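389	Data mining und graph mining auf molekularen Graphen - Cheminformatik und molekulare Kodierungen für ADME/Tox-QSAR-Analysen [Data mining and graph mining on molecular graphs: cheminformatics and molecular encodings for ADME/Tox QSAR analyses] Wegner, Jörg Kurt January 2006 Also: dissertation, University of Tübingen, 2006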
390	Analytische Betrugserkennung im Bankenumfeld Evaluation von relationalem Data-Mining für die Suche nach Betrugsmustern [Analytical fraud detection in banking: evaluation of relational data mining for finding fraud patterns] Reolon, Patrick January 2006 Also: diploma thesis, University of Zürich, 2006, under the title: Reolon, Patrick: Analytische Betrugserkennung / Produced on demand
