1 |
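Studies on Network Graph Analysis with Decision Diagram Structures / 決定グラフ構造によるネットワーク解析の研究
Nakamura, Kengo, 25 March 2024
Kyoto University / New system, course doctorate / Doctor of Informatics / Kō No. 25443 / Jōhaku No. 881 / 新制||情||148 (University Library) / Department of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University / (Chief examiner) Prof. Shin-ichi Minato, Prof. Eiji Oki, Prof. Akihiro Yamamoto / Conferred under Article 4, Paragraph 1 of the Degree Regulations / DFAM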
|
2 |
Efficient and Scalable Subgraph Statistics using Regenerative Markov Chain Monte Carlo
Mayank Kakodkar (12463929), 26 April 2022
<p>In recent years there has been growing interest, in data mining and graph machine learning, in techniques that can obtain frequencies of <em>k</em>-node Connected Induced Subgraphs (<em>k</em>-CIS) contained in large real-world graphs. While recent work has shown that 5-CISs can be counted exactly, no exact polynomial-time algorithms are known that solve this task for <em>k</em> > 5. Sampling-based algorithms that work well on moderately sized graphs for <em>k</em> ≤ 8 have been proposed in the past. In this thesis I push this boundary up to <em>k</em> ≤ 16 for graphs containing up to 120M edges, and to <em>k</em> ≤ 25 for smaller graphs containing between 1M and 20M edges. I do so by re-imagining two older but elegant and memory-efficient algorithms, FANMOD and PSRW, which have large estimation errors by modern standards: FANMOD produces highly correlated <em>k</em>-CIS samples, and sampling the PSRW Markov chain becomes prohibitively expensive for <em>k</em> > 8.</p>
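<p>To make the quantity being estimated concrete, here is a minimal brute-force sketch that enumerates every <em>k</em>-node connected induced subgraph of a tiny graph. It is not FANMOD, PSRW, or any method from the thesis; the function names and the adjacency-dict representation are assumptions made for illustration. The sketch groups subgraphs by their number of induced edges, which separates open paths from triangles when <em>k</em> = 3 but does not identify isomorphism classes for larger <em>k</em>.</p>

```python
from collections import Counter
from itertools import combinations


def is_connected(nodes, adj):
    """Return True if the subgraph induced by `nodes` is connected (DFS)."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def k_cis_by_edge_count(adj, k):
    """Count k-node connected induced subgraphs, keyed by induced edge count.

    `adj` maps each node to the set of its neighbours (undirected graph).
    Brute force over all k-subsets: exponential in k, tiny graphs only.
    """
    counts = Counter()
    for subset in combinations(sorted(adj), k):
        if is_connected(subset, adj):
            edges = sum(1 for u, v in combinations(subset, 2) if v in adj[u])
            counts[edges] += 1
    return counts
```

<p>The number of <em>k</em>-subsets grows combinatorially with the graph size, which is why the abstract turns to sampling for larger <em>k</em>.</p>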
<p>In this thesis, I introduce:</p>
<p>(a) <strong>RTS:</strong> a novel regenerative Markov chain Monte Carlo (MCMC) sampling procedure on the tree that the FANMOD algorithm generates on the fly. RTS can run on multiple cores and multiple machines (it is embarrassingly parallel) and can compute confidence intervals for its estimates, all while preserving the memory-efficient nature of FANMOD. RTS is thus able to estimate subgraph statistics for <em>k</em> ≤ 16 on larger graphs containing up to 120M edges, and for <em>k</em> ≤ 25 on smaller graphs containing between 1M and 20M edges.</p>
<p>(b) <strong>R-PSRW:</strong> a method that scales the PSRW algorithm to larger CIS sizes by using a rejection sampling procedure to efficiently sample transitions from the PSRW Markov chain. R-PSRW matches RTS in terms of scaling to larger CIS sizes.</p>
<p>(c) <strong>Ripple:</strong> a method that achieves unprecedented scalability by stratifying the R-PSRW Markov chain state space into ordered strata via a new technique that I call <em>sequential stratified regeneration</em>. I show that the Ripple estimator is consistent, highly parallelizable, and scales well. Ripple is able to <em>count</em> CISs of size <em>k</em> ≤ 12 in real-world graphs containing up to 120M edges.</p>
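<p>The regeneration idea shared by the methods listed above can be illustrated with a textbook sketch, which is not RTS, R-PSRW, or Ripple themselves. A simple random walk on an undirected graph is cut into independent "tours" that start and end at a fixed node, and a stationary expectation is estimated as the ratio of the summed function values to the summed tour lengths. The function name, the adjacency-list representation, and the choice of a simple random walk are assumptions made for illustration.</p>

```python
import random


def regenerative_estimate(adj, f, start, num_tours, rng):
    """Estimate the stationary expectation of f under a simple random walk.

    The walk is split into tours that begin at `start` and end just before
    the next return to `start`. Tours are i.i.d., so they could be generated
    on separate workers and merged by summing the two accumulators.
    `adj` maps each node to a list of neighbours (undirected, connected).
    """
    total_f = 0.0   # sum of f over all visited states
    total_len = 0   # total number of steps across all tours
    for _ in range(num_tours):
        node = start
        while True:
            total_f += f(node)
            total_len += 1
            node = rng.choice(adj[node])
            if node == start:  # regeneration: the tour is complete
                break
    return total_f / total_len
```

<p>For a simple random walk the stationary probability of a node is proportional to its degree, so taking <code>f(v) = 1 / deg(v)</code> yields an estimate of the number of nodes divided by twice the number of edges. Because tours are independent, the spread of per-tour sums can also be used to attach a confidence interval to the ratio.</p>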
<p>My empirical results show that the proposed methods offer a considerable improvement over the state of the art. Moreover, my methods are able to run at a scale that has been considered unreachable until now, not only by prior MCMC-based methods but also by other sampling approaches.</p>
<p><strong>Optimization of Restricted Boltzmann Machines. </strong>In addition, I propose a regenerative transformation of MCMC samplers for Restricted Boltzmann Machines (RBMs). My approach, Markov Chain Las Vegas (MCLV), gives statistical guarantees in exchange for random running times. MCLV uses a stopping set built from the training data and caps the number of Markov chain steps at <em>K</em> (referred to as MCLV-<em>K</em>). I present an MCLV-<em>K</em> gradient estimator (LVS-<em>K</em>) for RBMs and explore the correspondence and differences between LVS-<em>K</em> and Contrastive Divergence (CD-<em>K</em>). LVS-<em>K</em> significantly outperforms CD-<em>K</em> in the task of training RBMs on the MNIST dataset, indicating that MCLV is a promising direction for learning generative models.</p>
|