Many real-world network datasets can be represented as graphs, in which edges encode a binary relation between nodes. One of the fundamental problems in network data analysis is community detection, which clusters the nodes into groups. Statistically, the question of whether communities exist can be formulated as a hypothesis test: under the null hypothesis there is no community structure, while under the alternative hypothesis community structure exists. One method, proposed by Bickel and Sarkar (2016) and referred to here as BS, uses the largest eigenvalue of the scaled adjacency matrix and works for dense graphs. Another, the subgraph counting method proposed by Gao and Lafferty (2017a) and referred to here as GL, is valid for sparse networks. In this thesis, we first study empirically whether either the BS or the GL method works for moderately sparse networks. Second, we propose a subsampling method to reduce the computational cost of the BS method and evaluate its performance through simulations.
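To make the eigenvalue test concrete, here is a minimal plain-Python sketch of a BS-style statistic. It rests on some assumptions. The scaled matrix is taken to be Ã_ij = (A_ij − p̂) / sqrt((n−1) p̂ (1−p̂)) for i ≠ j, with a zero diagonal, where p̂ is the observed edge density. The statistic is n^(2/3) (λ1(Ã) − 2), which under the null would be compared with Tracy-Widom (β = 1) quantiles. This follows our reading of Bickel and Sarkar (2016) and omits any small-sample correction. The function names `jacobi_eigenvalues`, `bs_statistic`, `sample_sbm` and `subsampled_bs` are hypothetical. A cyclic Jacobi eigensolver stands in for a library routine so the block has no dependencies. `subsampled_bs` shows only one possible way to subsample: it averages the statistic over random induced subgraphs. It is not the scheme proposed in the thesis.

```python
import math
import random
from itertools import combinations


def jacobi_eigenvalues(matrix, tol=1e-20, max_sweeps=100):
    """All eigenvalues (ascending) of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [[float(x) for x in row] for row in matrix]  # work on a float copy
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:  # off-diagonal mass is negligible: diagonal holds eigenvalues
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Choose the rotation that zeroes the (p, q) entry.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def bs_statistic(adj):
    """Assumed BS-style statistic: n^(2/3) * (largest eigenvalue of scaled matrix - 2)."""
    n = len(adj)
    edges = sum(adj[i][j] for i, j in combinations(range(n), 2))
    p_hat = edges / (n * (n - 1) / 2)  # observed edge density
    if p_hat in (0.0, 1.0):
        raise ValueError("graph is empty or complete; statistic is undefined")
    scale = math.sqrt((n - 1) * p_hat * (1 - p_hat))
    # Center off-diagonal entries by p_hat and rescale; the diagonal stays 0.
    a_tilde = [[0.0 if i == j else (adj[i][j] - p_hat) / scale for j in range(n)]
               for i in range(n)]
    lam1 = jacobi_eigenvalues(a_tilde)[-1]
    return n ** (2.0 / 3.0) * (lam1 - 2.0)


def sample_sbm(n, p_in, p_out, seed=0):
    """Hypothetical helper: two equal blocks; p_in == p_out gives an Erdos-Renyi graph."""
    rng = random.Random(seed)
    half = n // 2
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        p = p_in if (i < half) == (j < half) else p_out
        if rng.random() < p:
            adj[i][j] = adj[j][i] = 1
    return adj


def subsampled_bs(adj, b, m, seed=0):
    """Hypothetical: average the statistic over m induced subgraphs on b random nodes."""
    rng = random.Random(seed)
    stats = []
    for _ in range(m):
        nodes = rng.sample(range(len(adj)), b)
        sub = [[adj[i][j] for j in nodes] for i in nodes]
        stats.append(bs_statistic(sub))
    return sum(stats) / m


if __name__ == "__main__":
    er = sample_sbm(40, 0.3, 0.3, seed=1)     # no community structure
    sbm = sample_sbm(40, 0.6, 0.05, seed=1)   # two strong communities
    print("ER :", bs_statistic(er))
    print("SBM:", bs_statistic(sbm))
    print("SBM, subsampled:", subsampled_bs(sbm, b=20, m=5, seed=2))
```

A full symmetric eigen-decomposition of an n × n matrix costs on the order of n^3 operations. Averaging over m subgraphs of b nodes therefore costs roughly m · b^3, and that is the kind of saving a subsampling approach aims for when b is much smaller than n.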
Identifier | oai:union.ndltd.org:ndsu.edu/oai:library.ndsu.edu:10365/31640 |
Date | January 2019 |
Creators | Nan, Yehong |
Publisher | North Dakota State University |
Source Sets | North Dakota State University |
Detected Language | English |
Type | text/thesis |
Format | application/pdf |
Rights | NDSU policy 190.6.2, https://www.ndsu.edu/fileadmin/policy/190.pdf |