Spelling suggestions: "subject:"graph aprocessing"" "subject:"graph eprocessing""
11 |
Performance Optimization of Memory-Bound Programs on Data Parallel AcceleratorsSedaghati Mokhtari, Naseraddin 08 June 2016 (has links)
No description available.
|
12 |
Um modelo multiperspectiva para avaliação de desempenho de plataformas de processamento de grafos / A multiperspective model for performance evaluation of graph processing platformsSilva, Daniel Nascimento Ramos da 21 February 2017 (has links)
Submitted by Maria Cristina (library@lncc.br) on 2017-05-02T19:25:42Z
No. of bitstreams: 1
dissertacao Daniel.pdf: 13436496 bytes, checksum: e52ea76aa8685ff28f62aea6a22f98cf (MD5) / Approved for entry into archive by Maria Cristina (library@lncc.br) on 2017-05-02T19:25:53Z (GMT) No. of bitstreams: 1
dissertacao Daniel.pdf: 13436496 bytes, checksum: e52ea76aa8685ff28f62aea6a22f98cf (MD5) / Made available in DSpace on 2017-05-02T19:26:02Z (GMT). No. of bitstreams: 1
dissertacao Daniel.pdf: 13436496 bytes, checksum: e52ea76aa8685ff28f62aea6a22f98cf (MD5)
Previous issue date: 2017-02-21 / Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) / Many emerging challenges currently arising in science relate to understanding the dynamics and structure of complex systems consisting of interacting components. People on online social networks, power grids, air transportation networks, human brain connections, and agents in the financial market are some examples of network systems present in many domains. These systems, given their scale and non-trivial connectivity patterns, are called complex networks. In this sense, the dimension of these networks turns imperative the adoption of computing systems in support to their analysis. In this case, graphs are the usual tool for the representation of complex networks. Moreover, as the scientific investigation for these graphs is of great significance for many domains, the researcher and developer communities have proposed many computing platforms for their processing. However, the multitude of graph processing platforms brings up questions about the implications of their adoption, given the analysis, network and computing environment characteristics of the analysts. Therefore, in this dissertation, we propose a multi-perspective model for the performance evaluation of graph processing platforms which differs from related works in simultaneously approaching the topic from four perspectives: algorithms, computing environment, networks, and platforms. Additionally, we carry out a performance evaluation study that follows the directives indicated by the proposed model for a representative set of algorithms, platforms, and networks, demonstrating the proposed model applicability as exhibits the relationship between algorithms, networks and platforms efficiency. / Alguns dos desafios mais relevantes que surgem no cenário científico atual envolvem a compreensão da dinâmica e estrutura de sistemas complexos constituídos por componentes em interação. Pessoas em redes sociais online, sistemas de distribuição de energia, malhas aéreas, conexões no cérebro humano ou mesmo agentes no mercado financeiro são apenas alguns exemplos de tais sistemas em rede oriundos de diversas áreas. Esses sistemas, por causa de sua escala e da não trivialidade de seus padrões de conectividade, são chamados de redes complexas. Nesse contexto, a dimensão dessas redes torna imprescindível a utilização de sistemas computacionais em apoio às suas análises, sendo grafos a ferramenta típica de representação de redes complexas para sua modelagem computacional e estudo. Mais ainda, como a análise de grafos em diversos domínios é de grande relevância, muitas plataformas computacionais para o seu processamento têm sido propostas recentemente; o que provoca questionamentos pertinentes de quais as implicações da escolha de uma delas, dadas as características de análise, rede e ambiente computacional dos interessados. Portanto, esta dissertação propõe um modelo multiperspectiva de avaliação de desempenho de plataformas de processamento de grafos, o qual se distingue da literatura ao abordar o problema considerando simultaneamente quatro perspectivas: algoritmos, arquitetura computacional, plataformas e redes. Além disso, um estudo de avaliação de desempenho de um conjunto diverso e representativo de algoritmos, plataformas computacionais e redes é realizado utilizando as diretivas indicadas pelo modelo, demonstrando sua aplicabilidade ao expor o relacionamento entre as características de redes complexas e algoritmos com a eficiência computacional das plataformas.
|
13 |
An Analysis of the Feasibility of Graph Compression Techniques for Indexing Regular Path QueriesTetzel, Frank, Voigt, Hannes, Paradies, Marcus, Lehner, Wolfgang 13 June 2022 (has links)
Regular path queries (RPQs) are a fundamental part of recent graph query languages like SPARQL and PGQL. They allow the definition of recursive path structures through regular expressions in a declarative pattern matching environment. We study the use of the K2-tree graph compression technique to materialize RPQ results with low memory consumption for indexing. Compact index representations enable the efficient storage of multiple indexes for varying RPQs.
|
14 |
A Survey Of Persistent Graph DatabasesLiu, Yufan 23 April 2014 (has links)
No description available.
|
15 |
Scalable Management and Analysis of Temporal Property GraphsRost, Christopher 17 May 2024 (has links)
Graphs, as simple yet powerful data structures, play a pivotal role in modeling and analyzing relationships among real-world entities. In the data representation and analysis landscape, graph data structures have established themselves as a fundamental paradigm for modeling and understanding complex relationships in various domains. The intrinsic domain independence, expressiveness, and the wide variety of analysis options based on graph theory have gained significant attention in both research and industry.
In recent years, companies have increasingly leveraged graph technology to represent, store, query, and analyze graph-shaped data. This has been notably impactful in uncovering hidden patterns and predicting relationships within diverse domains such as social networks, Internet of Things (IoT), biological systems, and medical networks. However, the dynamic nature of most real-world graphs is often neglected in existing approaches, which might lead to inaccurate analytical results or an incomplete understanding of evolving patterns within the graph over time.
Temporal graphs, in contrast, are a particular type of graphs that maintain changing structures and properties over time. They have gained significant attention in various domains, from financial networks over micromobility networks to supply chains and biological networks. A majority of these real-world networks are not static but rather exhibit high dynamics, which are rarely considered in data models, query languages, and analyses, although analytical questions often require an evaluation of the network's evolution.
This doctoral thesis addresses this critical gap by presenting a comprehensive study on analyzing and exploring temporal property graphs. It focuses on scalability and proposes novel methodologies to enhance accuracy and comprehensiveness in analyzing evolving graph patterns over time. It also offers insights into real-time querying, addressing various challenges that emerge when the time dimension is treated as an integral part of the graph.
This thesis introduces the Temporal Property Graph Model (TPGM), a sophisticated data model designed for bitemporal modeling of vertices and edges, as well as logical abstractions of subgraphs and graph collections. The reference implementation of this model, namely Gradoop, is a graph dataflow system explicitly designed for scalable and distributed analysis of static and temporal property graphs. Gradoop empowers analysts to construct comprehensive and flexible temporal graph processing workflows through a declarative analytical language. The system supports various analytical temporal graph operators, such as snapshot retrieval, temporal graph pattern matching, time-dependent grouping, and temporal metrics such as degree evolution.
The thesis provides an in-depth analysis of the data model, system architecture, and implementation details of Gradoop and its operators. Comprehensive performance evaluations have been conducted on large real-world and synthetic temporal graphs, providing valuable insights into the system's capabilities and efficiency.
Furthermore, this thesis demonstrates the flexibility of the temporal graph model and its operators through a practical use case based on a call center network. In this scenario, a TPGM operator pipeline is developed to answer a complex and time-dependent analytical question. We also showcase the Temporal Graph Explorer (TGE), a web-based user interface designed to explore temporal graphs, leveraging Gradoop as a backend. The TGE empowers users to delve into temporal graph dynamics by enabling the retrieval of snapshots from the graph's past states, computing differences between these snapshots, and providing temporal summaries of graphs. This functionality allows for a comprehensive understanding of graph evolution through diverse visualizations. Real-world temporal graph data from bicycle rentals highlight the system's flexibility and configurability of the selected temporal operators.
The impact of graph changes on its characteristics can also be explored by examining centrality measures over time. Centrality measures, encompassing both node and graph metrics, quantify the characteristics of individual nodes or the entire graph. In the dynamic context of temporal graphs, where the graph changes over time, node and graph metrics also undergo implicit changes. This thesis tackles the challenge of adapting static node and graph metrics to temporal graphs. It proposes temporal extensions for selected degree-dependent metrics and aggregations, emphasizing the importance of including the time dimension in the metrics.
This thesis demonstrates that a metric conventionally representing a scalar value for static graphs results in a time series when applied to temporal graphs. It introduces a baseline algorithm for calculating the degree evolution of vertices within a temporal graph, and its practical implementation in Gradoop is presented. The scalability of this algorithm is evaluated using both real-world and synthetic datasets, providing valuable insights into its performance across diverse scenarios.
Such time series data can also be captured from the application scenario as properties of nodes and edges, such as sensor readings in the IoT domain.
In light of this, we showcase significant advancements, including an extended version of the TPGM that supports time series data in temporal graphs. Additionally, we introduce a temporal graph query language based on Oracle's language PGQL to facilitate seamless querying of time-oriented graph structures. Furthermore, we present a novel continuous graph-based event detection approach to support scenarios involving more time-sensitive use cases.
The increasing frequency of graph changes and the need to react quickly to emerging patterns leads to the need for a unified declarative graph query language that can evaluate queries on graph streams. To address the increasing importance of real-time data analysis and management, the thesis introduces the syntax and semantics of Seraph, a Cypher-based language that supports native streaming features within property graph query languages. The semantics of Seraph combine stream processing with property graphs and time-varying relations, treating time as a first-class citizen. Real-world industrial use cases demonstrate the usage of Seraph for graph-based continuous queries.
After evaluating lessons learned from the development and research on Gradoop, a dissertation summary and an outlook on future work are given in a final section. This doctoral thesis comprehensively examines scalable analysis and exploration techniques for temporal property graphs, focusing on Gradoop and its system architecture, data model, operators, and performance evaluations. It also explores the evolution of node and graph metrics and the theoretical foundation of a real-time query language, contributing to the advancement of temporal graph analysis in various domains.:1 Introduction
2 Background and Related Work
3 The TPGM and Gradoop
4 Gradoop Application Examples
5 Evolution of Degree Metrics
6 The Fusion of Graph and Time-Series Data
7 Seraph: Continuous Queries on Property Graph Streams
8 Lessons Learned from Gradoop
9 Conclusion and Outlook
Bibliography
|
16 |
Exploration of parallel graph-processing algorithms on distributed architectures / Exploration d’algorithmes de traitement parallèle de graphes sur architectures distribuéesCollet, Julien 06 December 2017 (has links)
Avec l'explosion du volume de données produites chaque année, les applications du domaine du traitement de graphes ont de plus en plus besoin d'être parallélisées et déployées sur des architectures distribuées afin d'adresser le besoin en mémoire et en ressource de calcul. Si de telles architectures larges échelles existent, issue notamment du domaine du calcul haute performance (HPC), la complexité de programmation et de déploiement d’algorithmes de traitement de graphes sur de telles cibles est souvent un frein à leur utilisation. De plus, la difficile compréhension, a priori, du comportement en performances de ce type d'applications complexifie également l'évaluation du niveau d'adéquation des architectures matérielles avec de tels algorithmes. Dans ce contexte, ces travaux de thèses portent sur l’exploration d’algorithmes de traitement de graphes sur architectures distribuées en utilisant GraphLab, un Framework de l’état de l’art dédié à la programmation parallèle de tels algorithmes. En particulier, deux cas d'applications réelles ont été étudiées en détails et déployées sur différentes architectures à mémoire distribuée, l’un venant de l’analyse de trace d’exécution et l’autre du domaine du traitement de données génomiques. Ces études ont permis de mettre en évidence l’existence de régimes de fonctionnement permettant d'identifier des points de fonctionnements pertinents dans lesquels on souhaitera placer un système pour maximiser son efficacité. Dans un deuxième temps, une étude a permis de comparer l'efficacité d'architectures généralistes (type commodity cluster) et d'architectures plus spécialisées (type serveur de calcul hautes performances) pour le traitement de graphes distribué. Cette étude a démontré que les architectures composées de grappes de machines de type workstation, moins onéreuses et plus simples, permettaient d'obtenir des performances plus élevées. Cet écart est d'avantage accentué quand les performances sont pondérées par les coûts d'achats et opérationnels. L'étude du comportement en performance de ces architectures a également permis de proposer in fine des règles de dimensionnement et de conception des architectures distribuées, dans ce contexte. En particulier, nous montrons comment l’étude des performances fait apparaitre les axes d’amélioration du matériel et comment il est possible de dimensionner un cluster pour traiter efficacement une instance donnée. Finalement, des propositions matérielles pour la conception de serveurs de calculs plus performants pour le traitement de graphes sont formulées. Premièrement, un mécanisme est proposé afin de tempérer la baisse significative de performance observée quand le cluster opère dans un point de fonctionnement où la mémoire vive est saturée. Enfin, les deux applications développées ont été évaluées sur une architecture à base de processeurs basse-consommation afin d'étudier la pertinence de telles architectures pour le traitement de graphes. Les performances mesurés en utilisant de telles plateformes sont encourageantes et montrent en particulier que la diminution des performances brutes par rapport aux architectures existantes est compensée par une efficacité énergétique bien supérieure. / With the advent of ever-increasing graph datasets in a large number of domains, parallel graph-processing applications deployed on distributed architectures are more and more needed to cope with the growing demand for memory and compute resources. Though large-scale distributed architectures are available, notably in the High-Performance Computing (HPC) domain, the programming and deployment complexity of such graphprocessing algorithms, whose parallelization and complexity are highly data-dependent, hamper usability. Moreover, the difficult evaluation of performance behaviors of these applications complexifies the assessment of the relevance of the used architecture. With this in mind, this thesis work deals with the exploration of graph-processing algorithms on distributed architectures, notably using GraphLab, a state of the art graphprocessing framework. Two use-cases are considered. For each, a parallel implementation is proposed and deployed on several distributed architectures of varying scales. This study highlights operating ranges, which can eventually be leveraged to appropriately select a relevant operating point with respect to the datasets processed and used cluster nodes. Further study enables a performance comparison of commodity cluster architectures and higher-end compute servers using the two use-cases previously developed. This study highlights the particular relevance of using clustered commodity workstations, which are considerably cheaper and simpler with respect to node architecture, over higher-end systems in this applicative context. Then, this thesis work explores how performance studies are helpful in cluster design for graph-processing. In particular, studying throughput performances of a graph-processing system gives fruitful insights for further node architecture improvements. Moreover, this work shows that a more in-depth performance analysis can lead to guidelines for the appropriate sizing of a cluster for a given workload, paving the way toward resource allocation for graph-processing. Finally, hardware improvements for next generations of graph-processing servers areproposed and evaluated. A flash-based victim-swap mechanism is proposed for the mitigation of unwanted overloaded operations. Then, the relevance of ARM-based microservers for graph-processing is investigated with a port of GraphLab on a NVIDIA TX2-based architecture.
|
17 |
Partitioning Strategy Selection for In-Memory Graph Pattern Matching on Multiprocessor SystemsKrause, Alexander, Kissinger, Thomas, Habich, Dirk, Voigt, Hannes, Lehner, Wolfgang 19 July 2023 (has links)
Pattern matching on large graphs is the foundation for a variety of application domains. The continuously increasing size of the underlying graphs requires highly parallel in-memory graph processing engines that need to consider non-uniform memory access (NUMA) and concurrency issues to scale up on modern multiprocessor systems. To tackle these aspects, a fine-grained graph partitioning becomes increasingly important. Hence, we present a classification of graph partitioning strategies and evaluate representative algorithms on medium and large-scale NUMA systems in this paper. As a scalable pattern matching processing infrastructure, we leverage a data-oriented architecture that preserves data locality and minimizes concurrency-related bottlenecks on NUMA systems. Our in-depth evaluation reveals that the optimal partitioning strategy depends on a variety of factors and consequently, we derive a set of indicators for selecting the optimal partitioning strategy suitable for a given graph and workload.
|
18 |
Graph Partitioning Algorithms for Minimizing Inter-node Communication on a Distributed SystemGadde, Srimanth January 2013 (has links)
No description available.
|
19 |
Compilation of Graph Algorithms for Hybrid, Cross-Platform and Distributed ArchitecturesPatel, Parita January 2017 (has links) (PDF)
1. Main Contributions made by the supplicant:
This thesis proposes an Open Computing Language (OpenCL) framework to address the challenges of implementation of graph algorithms on parallel architectures and large scale graph processing. The proposed framework uses the front-end of the existing Falcon DSL compiler, andso, programmers enjoy conventional, imperative and shared memory programming style. The back-end of the framework generates implementations of graph algorithms in OpenCL to target single device architectures. The generated OpenCL code is portable across various platforms, e.g., CPU and GPU, and also vendors, e.g., NVIDIA, Intel and AMD. The framework automatically generates code for thread management and memory management for the devices. It hides all the lower level programming details from the programmers. A few optimizations are applied to reduce the execution time.
The large graph processing challenge is tackled through graph partitioning over multiple devices of a single node and multiple nodes of a distributed cluster. The programmer codes a graph algorithm in Falcon assuming that the graph fits into single machine memory and the framework handles graph partitioning without any intervention by the programmer. The framework analyses the Abstract Syntax Tree (AST) generated by Falcon to find all the necessary information about communication and synchronization. It automatically generates code for message passing to hide the complexity of programming in a distributed environment. The framework also applies a set of optimizations to minimize the communication latency. The thesis reports results of several experiments conducted on widely used graph algorithms: single source shortest path, pagerank and minimum spanning tree to name a few. Experimental evaluations show that the reported results are comparable to the state-of-art non-portable graph DSLs and frameworks on a single node. Experiments in a distributed environment to show the scalability and efficiency of the framework are also described.
2. Summary of the Referees' Written Comments:
Extracts from the referees' reports are provided below. A copy of the written replies to the clarifications sought by the external examiner is appended to this report.
Referee 1: This thesis extends the Falcon framework with OpenCL for parallel graph processing on multi-device and multi-node architectures. The thesis makes important contributions. Processing large graphs in short time is very important, and making use of multiple nodes and devices is perhaps the only way to achieve this. Towards this, the thesis makes good contributions for easy programming, compiler transformations and efficient runtime systems. One of the commendable aspects of the thesis that it demonstrates with graphs that cannot be accommodated In the memory of a single device. The thesis is generally written well. The related work coverage is very good. The magnitude of thesis excellent for a Masters work. The experimental setup is very comprehensive with good set of graphs, good experimental comparisons with state-of-art works and good platforms. Particularly. the demonstration with a GPU cluster with multiple GPU nodes (Chapter 5) is excellent. The attempt to demonstrate scalability with 2, 4 and 8 nodes is also noteworthy.
However, the contributions on optimizations are weak. Most of the optimizations and compiler transformations are straight-forward. There should be summary observations on the results in Chapter 3, especially given that the results are mixed and don't quite clearly convey the clear advantages of their work. The same is the case with multi-device results in chapter 4, where the results are once again mixed. Similarly, the speedups and scalability achieved with multiple nodes are not great. The problem size justification in the multi-node results is not clear. (Referee 1 also indicates a couple of minor changes to the thesis).
Referee 2: The thesis uses the OpenCL framework to address the problem of programming graph algorithms on distributed systems. The use of OpenCL ensures that the generated code is platform-agnoistic and vendor-agnoistic. Sufficient experimentation with large scale graphs and reasonable size clusters have been conducted to demonstrate the scalability and portability of the code generated by the framework. The automatically generated code is almost as efficient as manually written code. The thesis is well written and is of high quality. The related work section is well organized and displays a good knowledge of the subject matter under consideration. The author has made important contributions to a good publication as well.
3. An Account of the Open Oral Examination:
The oral examination of Ms. Parita Patel took place during 10 AM and 11AM on 27th November 2017, in the Seminar Hall of the Department of Computer Science and Automation. The members of the Oral Examination Board present were, Prof. Sathish Vadhiyar, external examiner and Prof. Y. N. Srikant, research supervisor.
The candidate presented the work in an open defense seminar highlighting the problem domain, the methodology used, the investigations carried out by her, and the resulting contributions documented in the thesis before an audience consisting of the examiners, some faculty members, and students. Some of the questions posed by the examiners and the members of the audience during the oral examination are listed below.
1. How much is the overlap between Falcon work and this thesis?
Response: We have used the Falcon front end in our work. Further, the existing Falcon compiler was useful to us to test our own implementation of algorithms in Falcon.
2. Why are speedup and scalability not very high with multiple nodes?
Response: For the multi-node architecture, we were not able to achieve linear scalability because, with the increase in number of nodes, communication cost increases significantly. Unless the computation cost in the nodes is significant and is much more than the communication cost, this is bound to happen. 3. Do you have plans of making the code available for use by the community?
Response: The code includes some part of Falcon implementation (front-end parsing/grammar) also. After discussion with the author of Falcon, the code can be made available to the community.
4. How can a graph that does not fit into a single device fit into a single node in the case of multiple nodes?
Response: Single node machine used in the experiments of “multi-device architecture” contains multiple devices while each node used in experiments of “multi-node architecture” contains only a single device. So, the graph which does not fit into single-node-single-device memory can fit into single-node-multi-device after partitioning.
5. Is there a way to permit morph algorithms to be coded in your framework?
Response: Currently, our framework does not translate morph algorithms. Supporting morph algorithms will require some kind of runtime system to manage memory on GPU since morph algorithms add and remove the vertices and edges to the graph dynamically. This can be further explored in future work.
6. Is it possible to accommodate FPGA devices in your framework?
Response: Yes, we can support FPGA devices (or any other device that is compatible for OpenCL) just by specifying the device type in the command line argument. We did not work with other devices because CPU and GPU are generally used to process graph algorithms.
The candidate provided satisfactory answers to all the questions posed and the clarifications sought by the audience and the examiners during the presentation. The candidate's overall performance during the open defense and the oral examination was very satisfactory to the oral examination board.
4. Certificate of Corrections and Changes: All the necessary corrections and changes suggested by the examiners have been made in the thesis and these have been verified by the members of the oral examination board. The thesis has been recommended for acceptance in its revised form.
5. Final Recommendation:
In view of the recommendations of the referees and the satisfactory performance of the candidate in the oral examination, the oral examination board recommends that the thesis of Ms. ParitaPatel be accepted for the award of the M.Sc(Engg.) Degree of the Institute.
Response to the comments by the external examiner on the M.Sc(Engg.) thesis “Compilation of Graph Algorithms for Hybrid, Cross-Platform, and Distributed Architectures” by Parita Patel
1. Comment: The contributions on optimizations are weak.
Response: The novelty of this thesis is to make the Falcon platform agnostic, and additionally process large scale graphs on multi-devices of a single node and multi-node clusters seamlessly. Our framework performs similar to the existing frameworks, but at the same time, it targets several types of architectures which are not possible in the existing works. Advanced optimizations are beyond the scope of this thesis.
2. Comment: The translation of Falcon to OpenCL is simple.
While the translation of Falcon to OpenCL was not hard, figuring out the details of the translation for multi-device and multi-node architectures was not simple. For example, design of implementations for collection, set, global variables, concurrency, etc., were non-trivial. These designs have already been explained in the appropriate places in the thesis. Further, such large software introduced its own intricacies during development.
3. Comment: Lines between Falcon work and this work are not clear.
Response: Appendix-A shows the falcon implementation of all the algorithms which we used to run the experiments. We compiled these falcon implementations through our framework and subsequently ran the generated code on different types of target architectures and compared the results with other framework's generated code. These falcon programs were written by us. We have also used the front-end of the Falcon compiler and this has already been stated in the thesis (page 16).
4. Comment: There should be a summary of observations in chapter 3.
Response: Summary of observations have been added to chapter 3 (pages 35-36), chapter 4 (page 46), and chapter 5 (page 51) of the thesis.
5. Comment: Speedup and scalability achieved with multiple nodes are not great.
Response: For the multi-node architecture, we were not able to achieve linear scalability because, with the increase in number of nodes, communication cost increases significantly. Unless the computation cost in the nodes is significant and is much more than the communication cost, this is bound to happen.
6. Comment: It will be good to separate the related work coverage into a separate chapter.
Response: The related work is coherent with the flow in chapter 1. It consists of just 4.5 pages and separating it into a separate chapter would make both (rest of) chapter 1 and the new chapter very small. Therefore, we do not recommend it.
7. Comment: The code should be made available for use by the community.
Response: The code includes some part of Falcon code (front-end parsing/grammar) also. After discussion with the author of Falcon, the code can be made available to the community.
8. Comment: Page 28: Shouldn’t the else part be inside the kernel?
Response: There was some missing text and a few minor changes in Figure 3.14 (page 28) which have been incorporated in the corrected thesis.
9. Comment: Figure 4.1 needs to be explained better.
Response: Explanation for Figure 4.1 (pages 38-39) has been added to the thesis.
10. Comment: The problem size justification in the multi-node results is not clear.
Response: Single node machine used in the experiments of “multi-device architecture” contains multiple devices while each node used in experiments of “multi-node architecture” contains only a single device. So, the graph which does not fit into single-node-single-device memory can fit into single-node-multi-device after partitioning.
Name of the Candidate: Parita Patel (S.R. No. 04-04-00-10-21-14-1-11610)
Degree Registered: M.Sc(Engg.)
Department: Computer Science & Automation
Title of the Thesis: Compilation of Graph Algorithms for Hybrid, Cross-Platform and
Graph algorithms are abundantly used in various disciplines. These algorithms perform poorly
due to random memory access and negligible spatial locality. In order to improve performance, parallelism exhibited by these algorithms can be exploited by leveraging modern high performance parallel computing resources. Implementing graph algorithms for these parallel architectures requires manual thread management and memory management which becomes tedious for a programmer.
Large scale graphs cannot fit into the memory of a single machine. One solution is to partition the graph either on multiple devices of a single node or on multiple nodes of a distributed network. All the available frameworks for such architectures demand unconventional programming which is difficult and error prone.
To address these challenges, we propose a framework for compilation of graph algorithms written in an intuitive graph domain-specific language, Falcon. The framework targets shared memory parallel architectures, computational accelerators and distributed architectures (CPU and GPU cluster). First, it analyses the abstract syntax tree (generated by Falcon) and gathers essential information. Subsequently, it generates optimized code in OpenCL for shared-memory parallel architectures and computational accelerators, and OpenCL coupled with MPI code for distributed architectures. Motivation behind generating OpenCL code is its platform-agnostic and vendor-agnostic behavior, i.e., it is portable to all kinds of devices. Our framework makes memory management, thread management, message passing, etc., transparent to the user. None of the available domain-specific languages, frameworks or parallel libraries handle portable implementations of graph algorithms.
Experimental evaluations demonstrate that the generated code performs comparably to the state-of-the-art non-portable implementations and hand-tuned implementations. The results also show portability and scalability of our framework.
|
20 |
Block-based and structure-based techniques for large-scale graph processing and visualization / Técnicas baseadas em bloco e em estrutura para o processamento e visualização de grafos em larga escalaHugo Armando Gualdron Colmenares 23 November 2015 (has links)
Data analysis techniques can be useful in decision-making processes, when patterns of interest can indicate trends in specific domains. Such trends might support evaluation, definition of alternatives, or prediction of events. Currently, datasets have increased in size and complexity, posing challenges to modern hardware resources. In the case of large datasets that can be represented as graphs, issues of visualization and scalable processing are of current concern. Distributed frameworks are commonly used to deal with this data, but the deployment and the management of computational clusters can be complex, demanding technical and financial resources that can be prohibitive in several scenarios. Therefore, it is desirable to design efficient techniques for processing and visualization of large scale graphs that optimize hardware resources in a single computational node. In this course of action, we developed a visualization technique named StructMatrix to find interesting insights on real-life graphs. In addition, we proposed a graph processing framework M-Flash that used a novel, bimodal block processing strategy (BBP) to boost computation speed by minimizing I/O cost. Our results show that our visualization technique allows an efficient and interactive exploration of big graphs and our framework MFlash significantly outperformed all state-of-the-art approaches based on secondary memory. Our contributions have been validated in peer-review events demonstrating the potential of our finding in fostering the analytical possibilities related to large-graph data domains. / Técnicas de análise de dados podem ser úteis em processos de tomada de decisão, quando padrões de interesse indicam tendências em domínios específicos. Tais tendências podem auxiliar a avaliação, a definição de alternativas ou a predição de eventos. Atualmente, os conjuntos de dados têm aumentado em tamanho e complexidade, impondo desafios para recursos modernos de hardware. No caso de grandes conjuntos de dados que podem ser representados como grafos, aspectos de visualização e processamento escalável têm despertado interesse. Arcabouços distribuídos são comumente usados para lidar com esses dados, mas a implantação e o gerenciamento de clusters computacionais podem ser complexos, exigindo recursos técnicos e financeiros que podem ser proibitivos em vários cenários. Portanto é desejável conceber técnicas eficazes para o processamento e visualização de grafos em larga escala que otimizam recursos de hardware em um único nó computacional. Desse modo, este trabalho apresenta uma técnica de visualização chamada StructMatrix para identificar relacionamentos estruturais em grafos reais. Adicionalmente, foi proposta uma estratégia de processamento bimodal em blocos, denominada Bimodal Block Processing (BBP), que minimiza o custo de I/O para melhorar o desempenho do processamento. Essa estratégia foi incorporada a um arcabouço de processamento de grafos denominado M-Flash e desenvolvido durante a realização deste trabalho.Foram conduzidos experimentos a fim de avaliar as técnicas propostas. Os resultados mostraram que a técnica de visualização StructMatrix permitiu uma exploração eficiente e interativa de grandes grafos. Além disso, a avaliação do arcabouço M-Flash apresentou ganhos significativos sobre todas as abordagens baseadas em memória secundária do estado da arte. Ambas as contribuições foram validadas em eventos de revisão por pares, demonstrando o potencial analítico deste trabalho em domínios associados a grafos em larga escala.
|
Page generated in 0.0964 seconds