Return to search

Unsupervised random walk node embeddings for network block structure representation

There has been an explosion of network data in the physical, chemical, biological, computational, and social sciences in the last few decades. Node embeddings, i.e., Euclidean-space representations of nodes in a network, make it possible to apply to network data, tools and algorithms from multivariate statistics and machine learning that were developed for Euclidean-space data. Random walk node embeddings are a class of recently developed node embedding techniques where the vector representations are learned by optimizing objective functions involving skip-bigram statistics computed from random walks on the network. They have been applied to many supervised learning problems such as link prediction and node classification and have demonstrated state-of-the-art performance. Yet, their properties remain poorly understood.

This dissertation studies random walk based node embeddings in an unsupervised setting within the context of capturing hidden block structure in the network, i.e., learning node representations that reflect their patterns of adjacencies to other nodes. This doctoral research (i) Develops VEC, a random walk based unsupervised node embedding algorithm, and a series of relaxations, and experimentally validates their performance for the community detection problem under the Stochastic Block Model (SBM). (ii) Characterizes the ergodic limits of the embedding objectives to create non-randomized versions. (iii) Analyzes the embeddings for expected SBM networks and establishes certain concentration properties of the limiting ergodic objective in the large network asymptotic regime.

Comprehensive experimental results on real world and SBM random networks are presented to illustrate and compare the distributional and block-structure properties of node embeddings generated by VEC and related algorithms. As a step towards theoretical understanding, it is proved that for the variants of VEC with ergodic limits and convex relaxations, the embedding Grammian of the expected network of a two-community SBM has rank at most 2. Further experiments reveal that these extensions yield embeddings whose distribution is Gaussian-like, centered at the node embeddings of the expected network within each community, and concentrate in the linear degree-scaling regime as the number of nodes increases. / 2023-09-24T00:00:00Z

Identiferoai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/43083
Date25 September 2021
CreatorsLin, Christy
ContributorsIshwar, Prakash, Sussman, Daniel L.
Source SetsBoston University
Languageen_US
Detected LanguageEnglish
TypeThesis/Dissertation
RightsAttribution-NonCommercial-NoDerivatives 4.0 International, http://creativecommons.org/licenses/by-nc-nd/4.0/

Page generated in 0.0016 seconds