Return to search

Reverse Top-k search using random walk with restart

With the increasing popularity of social networking applications, large volumes of graph data are becoming available. Large graphs are also derived by structure extraction from relational, text, or scientific data (e.g., relational tuple networks, citation graphs, ontology networks, protein-protein interaction graphs). Nodeto-node proximity is the key building block for many graph based applications that search or analyze the data. Among various proximity measures, random walk with restart (RWR) is widely adapted because of its ability to consider the global structure of the whole network.
Although RWR-based similarity search has been well studied before, there is no prior work on reverse top-k proximity search in graphs based on RWR. We discuss the applicability of this query and show that the direct application of existing methods on RWR-based similarity search to solve reverse top-k queries has very high computational and storage demands. To address this issue, we propose an indexing technique, paired with an on-line reverse top-k search algorithm.
In the indexing step, we compute from the graph G a graph index, which is based on a K X |V| matrix, containing in each column v the K largest approximate proximity values from v to any other node in G. K is application-dependent and represents the highest value of k in a practical reverse top-k query. At each column v of the index, the approximate values are lower bounds of the K largest proximity values from v to all other nodes.

Given the graph index and a reverse top-k query q (k _ K), we prove that the exact proximities from any node v to query q can be efficiently computed by applying the power method. By comparing these with the corresponding lower bounds taken from the k-th row of the graph index, we are able to determine which nodes are certainly not in the reverse top-k result of q. For some of the remaining nodes, we may also be able to determine that they are certainly in the reverse top-k result of q, based on derived upper bounds for the k-th largest proximity value from them. Finally, for any candidate that remains, we progressively refine its approximate proximities, until based on its lower or upper bound it can be determined not to be or to be in the result. The proximities refined during a reverse top-k are used to update the graph index, making its values progressively more accurate for future queries.

Our experimental evaluation shows that our technique is efficient and has manageable storage requirements even when applied on very large graphs. We also show the effectiveness of the reverse top-k search in the scenarios of spam detection and determining the popularity of authors. / published_or_final_version / Computer Science / Master / Master of Philosophy

Identiferoai:union.ndltd.org:HKU/oai:hub.hku.hk:10722/197515
Date January 2013
CreatorsYu, Wei, 余韡
ContributorsMamoulis, N
PublisherThe University of Hong Kong (Pokfulam, Hong Kong)
Source SetsHong Kong University Theses
LanguageEnglish
Detected LanguageEnglish
TypePG_Thesis
RightsCreative Commons: Attribution 3.0 Hong Kong License, The author retains all proprietary rights, (such as patent rights) and the right to use in future works.
RelationHKU Theses Online (HKUTO)

Page generated in 0.0053 seconds