  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Outlier Detection with Applications in Graph Data Mining

Ranga Suri, N N R January 2013
Outlier detection is an important data mining task because of its applicability to many contemporary problems, such as fraud detection and anomaly detection in networks. Its significance stems from the general perception that outliers represent evolving novel patterns in data that are critical to many discovery tasks. The extensive use of data mining techniques in different application domains has given rise to a rapid proliferation of research on the outlier detection problem. This has led to the development of numerous methods for detecting outliers in various problem settings. However, most of these methods deal primarily with numeric data. Therefore, this work considers the problem of outlier detection in categorical data and develops novel methods that address several research issues. First, a ranking-based algorithm for detecting a likely set of outliers in a given categorical data set has been developed, employing two independent ranking schemes. Subsequently, the issue of data dimensionality has been addressed by proposing a novel unsupervised feature selection algorithm for categorical data. Similarly, the uncertainty associated with the outlier detection task has been dealt with by developing a novel rough-sets-based categorical clustering algorithm. Because the data in many real-life applications are networked (computer communication networks, social networks of friends, citation networks of documents, hyperlinked networks of web pages, etc.), outlier detection (also known as anomaly detection) in the graph representation of network data is an important pattern discovery activity. Accordingly, a novel graph mining method has been envisaged in this thesis based on the concept of community detection in graphs.
In addition to finding anomalous nodes and anomalous edges, this method is capable of detecting various higher-level anomalies that are arbitrary sub-graphs of the input graph. These ideas have been further extended in this thesis to characterize the time-varying behavior of outliers (anomalies) in dynamic network data by defining various categories of temporal outliers (anomalies). Characterizing the behavior of such outliers as the network evolves over time is critical for discovering anomalous connectivity patterns with potentially adverse effects, such as intrusions into a computer network. To deal with temporal outlier detection when only a single instance of the network/graph data is available, the link prediction task has been leveraged in this thesis to produce multiple instances of the input graph. Thus, various outlier detection principles have been successfully applied to mining various categories of temporal outliers (anomalies) in the graph representation of network data.
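The abstract above does not describe its two ranking schemes, so the following is only a minimal sketch of the general idea of ranking categorical records by how rare their values are. It uses a simple attribute-value-frequency score, which is a swapped-in technique and not the thesis's algorithm. The function names `avf_scores` and `rank_outliers` and the toy records are hypothetical.

```python
from collections import Counter


def avf_scores(rows):
    """Return one score per row: the mean relative frequency of its values.

    Lower scores mean rarer attribute values, i.e. more outlier-like rows.
    This is an illustrative frequency-based score, not the thesis's method.
    """
    if not rows:
        return []
    n_rows = len(rows)
    n_attrs = len(rows[0])
    # Count how often each value occurs in each attribute (column).
    counts = [Counter(row[j] for row in rows) for j in range(n_attrs)]
    return [
        sum(counts[j][row[j]] for j in range(n_attrs)) / (n_attrs * n_rows)
        for row in rows
    ]


def rank_outliers(rows, k):
    """Return the indices of the k rows with the lowest scores."""
    scores = avf_scores(rows)
    order = sorted(range(len(rows)), key=lambda i: scores[i])
    return order[:k]


# Made-up toy records (protocol, service, flag); the last one is unusual.
records = [
    ("tcp", "http", "ok"),
    ("tcp", "http", "ok"),
    ("tcp", "http", "ok"),
    ("tcp", "ftp", "ok"),
    ("udp", "dns", "error"),
]
print(rank_outliers(records, 1))  # prints [4]
```

A frequency score like this treats attributes independently, so it can miss records whose individual values are common but whose combination is rare.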
2

Využití metod dolování dat pro analýzu sociálních sítí / Using of Data Mining Method for Analysis of Social Networks

Novosad, Andrej January 2013
This thesis discusses data mining of social media. It introduces the topic of data mining and the available mining methods. The thesis also explores social media and social networks: what they can offer and what problems they bring. The APIs of three social networking sites are examined, together with the opportunities they provide for data mining. Techniques of text mining and document classification are explored. The implementation of a web application that mines data from the social site Twitter using the SVM algorithm is described. The application classifies tweets based on their text, where the classes represent the tweets' continents of origin. Several experiments, executed both in the RapidMiner software and in the implemented web application, are then proposed and their results examined.
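As a rough illustration of SVM-based text classification of the kind the abstract mentions, the sketch below trains a linear SVM on bag-of-words counts by subgradient descent on the hinge loss. It is not the thesis's implementation. It assumes only two classes (+1 for Europe, -1 for Asia) instead of all continents, omits the bias term, and uses made-up example texts. The names `tokenize`, `train_linear_svm`, and `predict` are hypothetical.

```python
def tokenize(text):
    """Lowercase whitespace tokenization; real tweets would need more cleaning."""
    return text.lower().split()


def train_linear_svm(samples, epochs=20, lr=0.1, lam=0.01):
    """Train a bias-free linear SVM by subgradient descent on the hinge loss.

    samples: list of (text, label) pairs with label +1 or -1.
    Returns a dict mapping each word to its learned weight.
    """
    weights = {}
    for _ in range(epochs):
        for text, label in samples:
            words = tokenize(text)
            margin = label * sum(weights.get(w, 0.0) for w in words)
            # L2 regularization shrinks every weight on every step.
            for w in weights:
                weights[w] *= 1.0 - lr * lam
            # Hinge-loss subgradient: update only if the margin is violated.
            if margin < 1.0:
                for w in words:
                    weights[w] = weights.get(w, 0.0) + lr * label
    return weights


def predict(weights, text):
    """Return +1 or -1 from the sign of the linear score (ties go to +1)."""
    score = sum(weights.get(w, 0.0) for w in tokenize(text))
    return 1 if score >= 0 else -1


# Made-up toy training texts: +1 = Europe, -1 = Asia (assumed labels).
train_tweets = [
    ("rainy morning london", 1),
    ("paris cafe by the seine", 1),
    ("tokyo cherry blossoms", -1),
    ("bangkok night market", -1),
]
weights = train_linear_svm(train_tweets)
print(predict(weights, "london cafe"))  # prints 1
print(predict(weights, "tokyo night"))  # prints -1
```

Extending this to several continents would typically mean training one such classifier per class (one-vs-rest) and picking the class with the highest score.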
