Graph Neural Networks (GNNs) have emerged as a powerful model for machine learning on complex graph-structured data, in contrast to traditional deep learning techniques that primarily target image and text data. However, scaling GNN training to large graphs with billions of nodes and trillions of edges remains a challenge. Existing approaches either partition the graph across distributed systems or rely on a single machine with GPU caching techniques during the sampling phase. While the former incurs maintenance costs and increased latency, the latter is bottlenecked by data movement, resulting in inefficient resource utilization and suboptimal training. To address the limitations of single-machine techniques, we focus on the sampling stage and introduce a novel approach based on the Samsung SmartSSD computational storage device. Computational storage devices allow computation to be offloaded to their on-board processing units; in our method, we compute the required sampling subset on the SmartSSD's Field-Programmable Gate Array (FPGA) and transfer only that subset to host DRAM, significantly reducing unnecessary data movement overhead and minimizing overall training time. Our experimental evaluation shows that, compared to a baseline MMAP sampling method, our proposed solution achieves up to a 9x speedup in sampling time and a 5x improvement in host DRAM utilization.
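For context, the following is a minimal sketch of the kind of host-side, MMAP-based neighbor sampling that serves as the baseline: the graph is assumed to be stored on the SSD in CSR form (an index-pointer file and a neighbor-index file), and every per-node lookup pulls raw edge data through the page cache into host DRAM. The file names, dtypes, and fanout parameter are illustrative assumptions rather than the thesis's actual implementation; the proposed approach instead performs this per-node sampling on the SmartSSD's FPGA and ships only the sampled subset to the host.

# Sketch of baseline MMAP-based neighbor sampling over an on-disk CSR graph.
# File layout, dtypes, and fanout are assumptions made for illustration only.
import mmap
import numpy as np

def mmap_sample_neighbors(indptr_path, indices_path, seed_nodes, fanout, rng=None):
    """Sample up to `fanout` neighbors per seed node from a CSR graph on disk.

    Each neighbor lookup faults edge data from the SSD into host DRAM via the
    page cache; this data-movement overhead is what the FPGA offload avoids.
    """
    rng = rng or np.random.default_rng()
    # Memory-map the CSR index-pointer and neighbor-index files.
    with open(indptr_path, "rb") as f_ptr, open(indices_path, "rb") as f_idx:
        ptr_mm = mmap.mmap(f_ptr.fileno(), 0, access=mmap.ACCESS_READ)
        idx_mm = mmap.mmap(f_idx.fileno(), 0, access=mmap.ACCESS_READ)
    indptr = np.frombuffer(ptr_mm, dtype=np.int64)
    indices = np.frombuffer(idx_mm, dtype=np.int64)

    sampled = {}
    for v in seed_nodes:
        start, end = indptr[v], indptr[v + 1]
        neighbors = indices[start:end]          # pages faulted from SSD into DRAM
        if neighbors.size > fanout:
            neighbors = rng.choice(neighbors, size=fanout, replace=False)
        sampled[v] = neighbors.copy()
    return sampled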
Identifier | oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/48117 |
Date | 16 February 2024 |
Creators | Kritharakis, Emmanouil |
Contributors | Kalavri, Vasiliki |
Source Sets | Boston University |
Language | en_US |
Detected Language | English |
Type | Thesis/Dissertation |
Rights | Attribution 4.0 International, http://creativecommons.org/licenses/by/4.0/ |