Reverse Top-k search using random walk with restart
Yu, Wei, 余韡. January 2013
With the increasing popularity of social networking applications, large volumes of graph data are becoming available. Large graphs are also derived by extracting structure from relational, text, or scientific data (e.g., relational tuple networks, citation graphs, ontology networks, protein-protein interaction graphs). Node-to-node proximity is the key building block for many graph-based applications that search or analyze such data. Among the various proximity measures, random walk with restart (RWR) is widely adopted because it takes the global structure of the whole network into account.
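The RWR proximity from a source node s is the stationary distribution of a walker that, at every step, jumps back to s with restart probability c and otherwise follows a random out-edge, i.e. the fixed point of p = c·e_s + (1 − c)·A·p, where A is the column-stochastic transition matrix. The following is a minimal power-method sketch, not the thesis's implementation. It assumes a hypothetical graph representation (a dict of out-neighbour lists with no dangling nodes) and an illustrative c = 0.15; the abstract fixes neither.

```python
def rwr(graph, source, c=0.15, tol=1e-10, max_iter=1000):
    """RWR proximity from `source` to every node via the power method.

    graph: dict mapping each node to a non-empty list of out-neighbours
           (hypothetical representation; no dangling nodes assumed).
    c:     restart probability (illustrative default, an assumption).
    """
    p = {u: 0.0 for u in graph}
    for _ in range(max_iter):
        # Restart term: the walker returns to the source with probability c.
        nxt = {u: 0.0 for u in graph}
        nxt[source] = c
        # Walk term: the remaining mass moves uniformly along out-edges.
        for u, outs in graph.items():
            share = (1 - c) * p[u] / len(outs)
            for w in outs:
                nxt[w] += share
        # Stop once successive iterates differ by less than tol (L1 norm).
        diff = sum(abs(nxt[u] - p[u]) for u in graph)
        p = nxt
        if diff < tol:
            break
    return p
```

On the 3-node cycle {0: [1], 1: [2], 2: [0]} with source 0, the fixed point gives node 0 the value c / (1 − (1 − c)^3), and each further hop along the cycle multiplies the value by (1 − c); the values sum to 1.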
Given a query node q, a reverse top-k query retrieves every node v for which q is among the k nodes with the highest RWR proximity from v. Although RWR-based similarity search has been studied extensively, there is no prior work on reverse top-k proximity search in graphs based on RWR. We discuss the applicability of this query and show that directly applying existing RWR-based similarity search methods to reverse top-k queries has very high computational and storage demands. To address this issue, we propose an indexing technique paired with an online reverse top-k search algorithm.
In the indexing step, we compute from the graph G a graph index based on a K × |V| matrix, whose column v contains the K largest approximate proximity values from v to any other node in G. K is application-dependent and represents the largest value of k expected in a practical reverse top-k query. In each column v of the index, the approximate values are lower bounds of the K largest proximity values from v to all other nodes.
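The abstract does not say how the approximate values are obtained. The sketch below assumes one simple option: run a fixed number of power iterations from the zero vector. Every RWR proximity is a sum of nonnegative terms c(1 − c)^t (A^t e_v), so each truncated iterate is an element-wise lower bound. Because an element-wise lower bound also bounds every order statistic from below, the K largest truncated values lower-bound the K largest exact values. The names `rwr_lower_bound` and `build_index`, the dict-of-lists graph, and the defaults are hypothetical.

```python
import heapq


def rwr_lower_bound(graph, source, c=0.15, iters=10):
    """Truncated RWR iteration started from the zero vector.

    After `iters` steps every entry is a lower bound on the exact proximity,
    because the iterate is a partial sum of a series of nonnegative terms.
    graph: dict node -> non-empty list of out-neighbours (assumed format).
    """
    p = {u: 0.0 for u in graph}
    for _ in range(iters):
        nxt = {u: 0.0 for u in graph}
        nxt[source] = c
        for u, outs in graph.items():
            share = (1 - c) * p[u] / len(outs)
            for w in outs:
                nxt[w] += share
        p = nxt
    return p


def build_index(graph, K, c=0.15, iters=10):
    """Column v of the K x |V| index: the K largest lower-bound proximities
    from v to the other nodes, in descending order (row k is column[k - 1])."""
    index = {}
    for v in graph:
        lb = rwr_lower_bound(graph, v, c, iters)
        others = (val for u, val in lb.items() if u != v)
        index[v] = heapq.nlargest(K, others)
    return index
```

Storing only values, not node identifiers, follows the abstract's description of the index. Fewer iterations make indexing cheaper but leave looser bounds for the query step to refine.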
Given the graph index and a reverse top-k query q (k ≤ K), we prove that the exact proximities from every node v to the query node q can be computed efficiently by applying the power method. By comparing these with the corresponding lower bounds taken from the k-th row of the graph index, we can determine which nodes are certainly not in the reverse top-k result of q: if the proximity from v to q is below a lower bound on the k-th largest proximity from v, then at least k nodes other than q have higher proximity from v. For some of the remaining nodes, we may also be able to determine that they are certainly in the result, based on derived upper bounds for the k-th largest proximity value from them. Finally, for any candidate that remains, we progressively refine its approximate proximities until its lower or upper bound shows that it is not, or is, in the result. The proximities refined during a reverse top-k query are used to update the graph index, making its values progressively more accurate for future queries.
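The query procedure can be sketched as follows, under stated assumptions. The proximities from all nodes v to q form row q of the RWR matrix, so one power iteration on the transposed walk, x = c·e_q + (1 − c)·Aᵀ·x, yields all of them at once. The pruning rule is the one described above. The upper bound is an assumption, since the thesis's derivation is not given in the abstract: if the stored lower bounds for v sum to 1 − r (no dangling nodes), no exact value can exceed its bound by more than the residual mass r, so the k-th lower bound plus r bounds the k-th largest value from above. Undecided nodes are refined by rerunning more iterations, and the result is written back to the index. All names, `step`, and the tie tolerance `eps` are hypothetical.

```python
import heapq


def rwr_partial(graph, source, c, iters):
    """Truncated RWR from the zero vector: returns (lower_bounds, residual).

    residual = 1 - sum(lower_bounds) bounds how much any entry can still grow
    (assumes every node has an out-edge, so exact proximities sum to 1).
    """
    p = {u: 0.0 for u in graph}
    for _ in range(iters):
        nxt = {u: 0.0 for u in graph}
        nxt[source] = c
        for u, outs in graph.items():
            share = (1 - c) * p[u] / len(outs)
            for w in outs:
                nxt[w] += share
        p = nxt
    return p, 1.0 - sum(p.values())


def index_column(graph, v, K, c, iters):
    """Hypothetical index column: (iterations used, K largest bounds, residual)."""
    lb, residual = rwr_partial(graph, v, c, iters)
    top = heapq.nlargest(K, (val for u, val in lb.items() if u != v))
    return iters, top, residual


def proximities_to(graph, q, c, tol=1e-12, max_iter=10_000):
    """Proximity from every node to q via the transposed walk:
    x = c*e_q + (1-c)*A^T x, where (A^T x)[u] averages x over u's out-edges."""
    x = {u: 0.0 for u in graph}
    for _ in range(max_iter):
        nxt = {u: (1 - c) * sum(x[w] for w in outs) / len(outs)
               for u, outs in graph.items()}
        nxt[q] += c
        diff = max(abs(nxt[u] - x[u]) for u in graph)
        x = nxt
        if diff < tol:
            break
    return x


def reverse_topk(graph, index, q, k, K, c=0.15, step=5, eps=1e-9):
    """Nodes v != q whose top-k proximities (over nodes other than v) include q.

    `index` maps v -> index_column(...) and is refined in place. Ties within
    `eps` are treated as "in the result" (an assumption of this sketch).
    """
    assert 1 <= k <= K
    to_q = proximities_to(graph, q, c)
    result = set()
    for v in graph:
        if v == q:
            continue
        while True:
            iters, top, residual = index[v]
            kth_lb = top[k - 1]            # row k of column v
            kth_ub = kth_lb + residual     # assumed upper bound on the k-th value
            if to_q[v] < kth_lb - eps:     # k other nodes beat q: prune v
                break
            if to_q[v] >= kth_ub - eps:    # q reaches the k-th value: accept v
                result.add(v)
                break
            # Undecided: tighten v's bounds and write them back to the index.
            index[v] = index_column(graph, v, K, c, iters + step)
    return result
```

Rerunning the iteration from scratch keeps the sketch short; an incremental refinement would resume from the stored iterate instead. The loop always terminates because the residual shrinks as (1 − c)^iters until the undecided interval is empty.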
Our experimental evaluation shows that the technique is efficient and has manageable storage requirements even when applied to very large graphs. We also demonstrate the effectiveness of reverse top-k search in two scenarios: spam detection and determining the popularity of authors.
Computer Science / Master of Philosophy
Applying blended conceptual spaces to variable choice and aesthetics in data visualisation
Featherstone, Coral. 09 1900
Computational creativity is an active area of research within artificial intelligence that investigates which aspects of computing can be considered analogous to the human creative process. Computers can be programmed to emulate some of the things the human mind does. Artificial creativity is worth studying for two reasons: first, it can help us understand human creativity, and second, it can help with the design of computer programs that appear to be creative. Although implementing creativity in computer algorithms is an active field, much of the research fails to specify which of the known theories of creativity it aligns with.
Combining computational creativity with computer-generated visualisations could produce visualisations that are context-sensitive with respect to the data, and could solve some of the automation problems that computers currently face. In addition, theories of creativity could in principle be used to compute unusual data combinations or to introduce graphical elements that draw attention to patterns in the data. Such work could also reveal more about the creativity humans exercise when generating a visualisation.
The purpose of this dissertation was to develop a computer program that automates the generation of a visualisation, for a suitably chosen visualisation type over a small domain of knowledge, using a subset of the computational creativity criteria, in order to explore the effects of introducing conceptual blending techniques. The problem is that existing programs that generate visualisations lack the creativity, intuition, background information, and visual perception that enable a human to decide which aspects of a visualisation will expose patterns useful to its consumer.
The main research question guiding this dissertation was: “How can criteria derived from theories of creativity be used in the generation of visualisations?” To answer this question, an analysis was done to determine which creativity theories are relevant to computer-generated visualisations and which artificial intelligence techniques could be used to implement them. Measurable attributes and criteria sufficient for an algorithm that claims to model creativity were explored. The stages of the visualisation pipeline were identified, and the aspects of visualisation generation at which humans outperform computers were examined. Themes that emerged in both the computational creativity and the visualisation literature were highlighted.
Finally, a prototype was built to begin investigating the use of computational creativity methods in the ‘variable choice’ and ‘aesthetics’ stages of the data visualisation pipeline.
School of Computing / M. Sc. (Computing)