• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 52
  • 13
  • 6
  • 3
  • 3
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 97
  • 97
  • 30
  • 22
  • 19
  • 16
  • 12
  • 11
  • 11
  • 10
  • 10
  • 9
  • 8
  • 8
  • 8
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Falcon : A Graph Manipulation Language for Distributed Heterogeneous Systems

Cheramangalath, Unnikrishnan January 2017 (has links) (PDF)
Graphs model relationships across real-world entities in web graphs, social network graphs, and road network graphs. Graph algorithms analyze and transform a graph to discover graph properties or to apply a computation. For instance, a pagerank algorithm computes a rank for each page in a webgraph, and a community detection algorithm discovers likely communities in a social network, while a shortest path algorithm computes the quickest way to reach a place from another, in a road network. In Domains such as social information systems, the number of edges can be in billions or trillions. Such large graphs are processed on distributed computer systems or clusters. Graph algorithms can be executed on multi-core CPUs, GPUs with thousands of cores, multi-GPU devices, and CPU+GPU clusters, depending on the size of the graph object. While programming such algorithms on heterogeneous targets, a programmer is required to deal with parallelism and and also manage explicit data communication between distributed devices. This implies that a programmer is required to learn CUDA, OpenMP, MPI, etc., and also the details of the hardware architecture. Such codes are error prone and di cult to debug. A Domain Speci c Language (DSL) which hides all the hardware details and lets the programmer concentrate only the algorithmic logic will be very useful. With this as the research goal, Falcon, graph DSL and its compiler have been developed. Falcon programs are explicitly parallel and Falcon hides all the hardware details from the programmer. Large graphs that do not t into the memory of a single device are automatically partitioned by the Falcon compiler. Another feature of Falcon is that it supports mutation of graph objects and thus enables programming dynamic graph algorithms. The Falcon compiler converts a single DSL code to heterogeneous targets such as multi-core CPUs, GPUs, multi-GPU devices, and CPU+GPU clusters. Compiled codes of Falcon match or outperform state-of-the-art graph frameworks for di erent target platforms and benchmarks.
82

Large Scale Graph Processing in a Distributed Environment

Upadhyay, Nitesh January 2017 (has links) (PDF)
Graph algorithms are ubiquitously used across domains. They exhibit parallelism, which can be exploited on parallel architectures, such as multi-core processors and accelerators. However, real world graphs are massive in size and cannot fit into the memory of a single machine. Such large graphs are partitioned and processed in a distributed cluster environment which consists of multiple GPUs and CPUs. Existing frameworks that facilitate large scale graph processing in the distributed cluster have their own style of programming and require extensive involvement by the user in communication and synchronization aspects. Adaptation of these frameworks appears to be an overhead for a programmer. Furthermore, these frameworks have been developed to target only CPU clusters and lack the ability to harness the GPU architecture. We provide a back-end framework to the graph Domain Specific Language, Falcon, for large scale graph processing on CPU and GPU clusters. The Motivation behind choosing this DSL as a front-end is its shared-memory based imperative programmability feature. Our framework generates Giraph code for CPU clusters. Giraph code runs on the Hadoop cluster and is known for scalable and fault-tolerant graph processing. For GPU cluster, Our framework applies a set of optimizations to reduce computation and communication latency, and generates efficient CUDA code coupled with MPI. Experimental evaluations show the scalability and performance of our framework for both CPU and GPU clusters. The performance of the framework generated code is comparable to the manual implementations of various algorithms in distributed environments.
83

Connecting hitting sets and hitting paths in graphs

Camby, Eglantine 30 June 2015 (has links)
Dans cette thèse, nous étudions les aspects structurels et algorithmiques de différents problèmes de théorie des graphes. Rappelons qu’un graphe est un ensemble de sommets éventuellement reliés par des arêtes. Deux sommets sont adjacents s’ils sont reliés par une arête.<p>Tout d’abord, nous considérons les deux problèmes suivants :le problème de vertex cover et celui de dominating set, deux cas particuliers du problème de hitting set. Un vertex cover est un ensemble de sommets qui rencontrent toutes les arêtes alors qu’un dominating set est un ensemble X de sommets tel que chaque sommet n’appartenant pas à X est adjacent à un sommet de X. La version connexe de ces problèmes demande que les sommets choisis forment un sous-graphe connexe. Pour les deux problèmes précédents, nous examinons le prix de la connexité, défini comme étant le rapport entre la taille minimum d’un ensemble répondant à la version connexe du problème et celle d’un ensemble du problème originel. Nous prouvons la difficulté du calcul du prix de la connexité d’un graphe. Cependant, lorsqu’on exige que le prix de la connexité d’un graphe ainsi que de tous ses sous-graphes induits soit borné par une constante fixée, la situation change complètement. En effet, pour les problèmes de vertex cover et de dominating set, nous avons pu caractériser ces classes de graphes pour de petites constantes.<p>Ensuite, nous caractérisons en termes de dominating sets connexes les graphes Pk- free, graphes n’ayant pas de sous-graphes induits isomorphes à un chemin sur k sommets. Beaucoup de problèmes sur les graphes sont étudiés lorsqu’ils sont restreints à cette classe de graphes. De plus, nous appliquons cette caractérisation à la 2-coloration dans les hypergraphes. Pour certains hypergraphes, nous prouvons que ce problème peut être résolu en temps polynomial.<p>Finalement, nous travaillons sur le problème de Pk-hitting set. Un Pk-hitting set est un ensemble de sommets qui rencontrent tous les chemins sur k sommets. Nous développons un algorithme d’approximation avec un facteur de performance de 3. Notre algorithme, basé sur la méthode primal-dual, fournit un Pk-hitting set dont la taille est au plus 3 fois la taille minimum d’un Pk-hitting set. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished
84

Sur quelques problèmes algorithmiques relatifs à la détermination de structure à partir de données de spectrométrie de masse / Topics in mass spectrometry based structure determination

Agarwal, Deepesh 18 May 2015 (has links)
La spectrométrie de masse, initialement développée pour de petites molécules, a permis au cours de la dernière écoulée d’étudier en phase gazeuse des assemblages macro-moléculaires intacts, posant nombre de questions algorithmiques difficiles, dont trois sont étudiées dans cette thèse. La première contribution concerne la détermination de stoichiométrie (SD), et vise à trouver le nombre de copies de chaque constituant dans un assemblage. On étudie le cas où la masse cible se trouve dans un intervalle dont les bornes rendent compte des incertitudes des mesures des masses. Nous présentons un algorithme de taille mémoire constante (DIOPHANTINE), et un algorithme de complexité sensible à la sortie (DP++), plus performants que l’état de l’art, pour des masses en nombre entier ou flottant. La seconde contribution traite de l’inférence de connectivité à partir d’une liste d’oligomères dont la composition en termes de sous-unités est connue. On introduit le problème d’inférence de connectivité minimale (MCI) et présente deux algorithmes pour le résoudre. On montre aussi un accord excellent entre les contacts trouvés et ceux détermines expérimentalement. La troisième contribution aborde le problème d’inférence de connectivité de poids minimal, lorsque chaque contact potentiel a un poids reflétant sa probabilité d’occurrence. On présente en particulier un algorithme de bootstrap permettant de trouver un ensemble d’arêtes de sensitivité et spécificité meilleures que celles obtenues pour les solutions du problème MCI. / Mass spectrometry (MS), an analytical technique initially invented to deal with small molecules, has emerged over the past decade as a key approach in structural biology. The recent advances have made it possible to transfer large macromolecular assemblies into the vacuum without their dissociation, raising challenging algorithmic problems. This thesis makes contributions to three such problems. The first contribution deals with stoichiometry determination (SD), namely the problem of determining the number of copies of each subunit of an assembly, from mass measurements. We deal with the interval SD problem, where the target mass belongs to an interval accounting for mass measurement uncertainties. We present a constant memory space algorithm (DIOPHANTINE), and an output sensitive dynamic programming based algorithm (DP++), outperforming state-of-the-art methods both for integer type and float type problems. The second contribution deals with the inference of pairwise contacts between subunits, using a list of sub-complexes whose composition is known. We introduce the Minimum Connectivity Inference problem (MCI) and present two algorithms solving it. We also show an excellent agreement between the contacts reported by these algorithms and those determined experimentally. The third contribution deals with Minimum Weight Connectivity Inference (MWCI), a problem where weights on candidate edges are available, reflecting their likelihood. We present in particular a bootstrap algorithm allowing one to report a set of edges with improved sensitivity and specificity with respect to those obtaining upon solving MCI.
85

Registrace fotografií do 3D modelu terénu / Registration of Photos to 3D Model

Deák, Jaromír January 2017 (has links)
This work refers existing solutions and options for the task registration of photos to 3D model based on the previous knowledge of the geographic position of the camera. The contribution of the work are new ways and possibilities of the solution with the usage of graph algorithms. In this area, the work interests are useful points of interest detection in input data, a construction of graphs and graph matching possibilities.
86

Modelování rizik v dopravě / Risk modelling in transportation

Lipovský, Tomáš January 2016 (has links)
This thesis deals with theoretical basics of risk modelling in transportation and optimization using aggregated traffic data. In this thesis is suggested the procedure and implemented the application solving network problem of shortest path between geographical points. The thesis includes method for special paths evaluation depending on the frequency of traffic incidents based on real historical data. The thesis also includes a~graphical interface for presentation of the achieved results.
87

Two Problems in Computational Genomics

Belal, Nahla Ahmed 22 March 2011 (has links)
This work addresses two novel problems in the field of computational genomics. The first is whole genome alignment and the second is inferring horizontal gene transfer using posets. We define these two problems and present algorithmic approaches for solving them. For the whole genome alignment, we define alignment graphs for representing different evolutionary events, and define a scoring function for those graphs. The problem defined is proven to be NP-complete. Two heuristics are presented to solve the problem, one is a dynamic programming approach that is optimal for a class of sequences that we define in this work as breakable arrangements. And, the other is a greedy approach that is not necessarily optimal, however, unlike the dynamic programming approach, it allows for reversals. For inferring horizontal gene transfer, we define partial order sets among species, with respect to different genes, and infer genes involved in horizontal gene transfer by comparing posets for different genes. The posets are used to construct a tree for each gene. Those trees are then compared and tested for contradiction, where contradictory trees correspond to genes that are candidates of horizontal gene transfer. / Ph. D.
88

Capacité opérative des réseaux de transfert de pétrole / Operative capacity of crude oil local transportation networks

Rojas d'Onofrio, Jorge 17 March 2011 (has links)
Cette thèse étudie des systèmes locaux de gestion de transfert de pétrole ayant une architecture de réseau de canalisation. Pour leur représentativité, deux systèmes localisés au Venezuela et appartenant à l'entreprise PDVSA (Pétroles du Venezuela) ont été retenus pour illustrer les méthodes proposées et les valider : le Terminal Maritime de Pétrole de Guaraguao et le Centre de Stockage de Punta de Palmas. Dans ces réseaux des connexions, appelées « alignements », sont établies en ouvrant/fermant des vannes à travers d'un système SCADA (Supervisory Control and Data Acquisition). Le choix d'un alignement doit tenir compte de critères d'optimisation. La minimisation des interférences avec d'autres alignements, liée à la notion de capacité opérative, a été identifiée comme le critère de choix le plus important. Les contributions de cette thèse reposent sur une modélisation sous forme de graphes, et sur des algorithmes appartenant au domaine de la recherche opérationnelle. Elles contribuent à fournir aux opérateurs de supervision des outils d'analyse permettant d'optimiser le choix des alignements. Des indicateurs permettant de quantifier l'impact des opérations d'alignement ou des défaillances, sur la capacité opérative du système, sont proposés. La minimisation de l'impact sur la capacité opérative, va correspondre à la minimisation des interférences avec des alignements potentiels. Un algorithme de calcul de ces indicateurs, est présenté, ainsi que des algorithmes de recherche de chemin, de détermination d'éléments critiques, et de recherche d'alignements utilisant des pompes. Ces algorithmes sont basés sur des algorithmes classiques s'adressant au problème du plus court chemin, du flot maximum et du nombre maximum de chemins disjoints. Cependant, ils utilisent des méthodes innovantes, comme l'ajout de contraintes considérant l'existence de sous-types d'alignements, le calcul dynamique des coûts des chemins à partir de son impact sur la capacité opérative, et la recherche de chemins via un point intermédiaire obligatoire. Les contributions sont potentiellement applicables dans des domaines autres que le transport de pétrole. Les algorithmes ont été mis en œuvre en utilisant le langage Python et ont été testés en utilisant les données réelles des réseaux étudiés. L'objectif à moyen terme de ces travaux est le développement d'un logiciel d'assistance à la prise de décision. / This thesis studies local crude oil transportation systems with a pipe network architecture. Two representative systems, belonging to PDVSA (Venezuelan oil company), have been studied: the Guaraguao Crude Oil Seaport and the Punta de Palmas Tanks Yard. In this systems, connections, called "alignments", are established by opening/closing valves using a SCADA(Supervisory Control and Data Acquisition) system. Alignment choice is made based on optimization criteria. Interferences minimization with other alignments, related to the notion of operative capacity, has been identified as the most important criterion. The contributions of this thesis are based on graph modelling and algorithms from operational research. The main goal is to provide analysis tools allowing alignment choice optimization. Indexes permitting the quantification of alignments or failures impact on the operative capacity of the system are proposed. Minimizing the impact on the operative capacity will correspond to minimizing interferences with potential alignments. An algorithm computing these indexes is presented, as well as complementary developments such as a path search algorithm, an algorithm for critical elements determination, and algorithm for alignments using pumps. These algorithms are based on classical algorithms for the shortest path problem, the maximum flow problem and the maximum disjoint paths problem. However, they use innovative methods such as adding constraints when considering alignment sub-types, the dynamic computation of path costs based on their impact on operative capacity, and path search considering an obligatory intermediate node. These contributions can potentially be applied in areas other than oil transportation. The algorithms had been implemented in Python and had been tested using real data from the studied systems. The middle term goal of these works is the development of assistance software for decision making.
89

A Modified Sum-Product Algorithm over Graphs with Short Cycles

Raveendran, Nithin January 2015 (has links) (PDF)
We investigate into the limitations of the sum-product algorithm for binary low density parity check (LDPC) codes having isolated short cycles. Independence assumption among messages passed, assumed reasonable in all configurations of graphs, fails the most in graphical structures with short cycles. This research work is a step forward towards understanding the effect of short cycles on error floors of the sum-product algorithm. We propose a modified sum-product algorithm by considering the statistical dependency of the messages passed in a cycle of length 4. We also formulate a modified algorithm in the log domain which eliminates the numerical instability and precision issues associated with the probability domain. Simulation results show a signal to noise ratio (SNR) improvement for the modified sum-product algorithm compared to the original algorithm. This suggests that dependency among messages improves the decisions and successfully mitigates the effects of length-4 cycles in the Tanner graph. The improvement is significant at high SNR region, suggesting a possible cause to the error floor effects on such graphs. Using density evolution techniques, we analysed the modified decoding algorithm. The threshold computed for the modified algorithm is higher than the threshold computed for the sum-product algorithm, validating the observed simulation results. We also prove that the conditional entropy of a codeword given the estimate obtained using the modified algorithm is lower compared to using the original sum-product algorithm.
90

Compilation of Graph Algorithms for Hybrid, Cross-Platform and Distributed Architectures

Patel, Parita January 2017 (has links) (PDF)
1. Main Contributions made by the supplicant: This thesis proposes an Open Computing Language (OpenCL) framework to address the challenges of implementation of graph algorithms on parallel architectures and large scale graph processing. The proposed framework uses the front-end of the existing Falcon DSL compiler, andso, programmers enjoy conventional, imperative and shared memory programming style. The back-end of the framework generates implementations of graph algorithms in OpenCL to target single device architectures. The generated OpenCL code is portable across various platforms, e.g., CPU and GPU, and also vendors, e.g., NVIDIA, Intel and AMD. The framework automatically generates code for thread management and memory management for the devices. It hides all the lower level programming details from the programmers. A few optimizations are applied to reduce the execution time. The large graph processing challenge is tackled through graph partitioning over multiple devices of a single node and multiple nodes of a distributed cluster. The programmer codes a graph algorithm in Falcon assuming that the graph fits into single machine memory and the framework handles graph partitioning without any intervention by the programmer. The framework analyses the Abstract Syntax Tree (AST) generated by Falcon to find all the necessary information about communication and synchronization. It automatically generates code for message passing to hide the complexity of programming in a distributed environment. The framework also applies a set of optimizations to minimize the communication latency. The thesis reports results of several experiments conducted on widely used graph algorithms: single source shortest path, pagerank and minimum spanning tree to name a few. Experimental evaluations show that the reported results are comparable to the state-of-art non-portable graph DSLs and frameworks on a single node. Experiments in a distributed environment to show the scalability and efficiency of the framework are also described. 2. Summary of the Referees' Written Comments: Extracts from the referees' reports are provided below. A copy of the written replies to the clarifications sought by the external examiner is appended to this report. Referee 1: This thesis extends the Falcon framework with OpenCL for parallel graph processing on multi-device and multi-node architectures. The thesis makes important contributions. Processing large graphs in short time is very important, and making use of multiple nodes and devices is perhaps the only way to achieve this. Towards this, the thesis makes good contributions for easy programming, compiler transformations and efficient runtime systems. One of the commendable aspects of the thesis that it demonstrates with graphs that cannot be accommodated In the memory of a single device. The thesis is generally written well. The related work coverage is very good. The magnitude of thesis excellent for a Masters work. The experimental setup is very comprehensive with good set of graphs, good experimental comparisons with state-of-art works and good platforms. Particularly. the demonstration with a GPU cluster with multiple GPU nodes (Chapter 5) is excellent. The attempt to demonstrate scalability with 2, 4 and 8 nodes is also noteworthy. However, the contributions on optimizations are weak. Most of the optimizations and compiler transformations are straight-forward. There should be summary observations on the results in Chapter 3, especially given that the results are mixed and don't quite clearly convey the clear advantages of their work. The same is the case with multi-device results in chapter 4, where the results are once again mixed. Similarly, the speedups and scalability achieved with multiple nodes are not great. The problem size justification in the multi-node results is not clear. (Referee 1 also indicates a couple of minor changes to the thesis). Referee 2: The thesis uses the OpenCL framework to address the problem of programming graph algorithms on distributed systems. The use of OpenCL ensures that the generated code is platform-agnoistic and vendor-agnoistic. Sufficient experimentation with large scale graphs and reasonable size clusters have been conducted to demonstrate the scalability and portability of the code generated by the framework. The automatically generated code is almost as efficient as manually written code. The thesis is well written and is of high quality. The related work section is well organized and displays a good knowledge of the subject matter under consideration. The author has made important contributions to a good publication as well. 3. An Account of the Open Oral Examination: The oral examination of Ms. Parita Patel took place during 10 AM and 11AM on 27th November 2017, in the Seminar Hall of the Department of Computer Science and Automation. The members of the Oral Examination Board present were, Prof. Sathish Vadhiyar, external examiner and Prof. Y. N. Srikant, research supervisor. The candidate presented the work in an open defense seminar highlighting the problem domain, the methodology used, the investigations carried out by her, and the resulting contributions documented in the thesis before an audience consisting of the examiners, some faculty members, and students. Some of the questions posed by the examiners and the members of the audience during the oral examination are listed below. 1. How much is the overlap between Falcon work and this thesis? Response: We have used the Falcon front end in our work. Further, the existing Falcon compiler was useful to us to test our own implementation of algorithms in Falcon. 2. Why are speedup and scalability not very high with multiple nodes? Response: For the multi-node architecture, we were not able to achieve linear scalability because, with the increase in number of nodes, communication cost increases significantly. Unless the computation cost in the nodes is significant and is much more than the communication cost, this is bound to happen. 3. Do you have plans of making the code available for use by the community? Response: The code includes some part of Falcon implementation (front-end parsing/grammar) also. After discussion with the author of Falcon, the code can be made available to the community. 4. How can a graph that does not fit into a single device fit into a single node in the case of multiple nodes? Response: Single node machine used in the experiments of “multi-device architecture” contains multiple devices while each node used in experiments of “multi-node architecture” contains only a single device. So, the graph which does not fit into single-node-single-device memory can fit into single-node-multi-device after partitioning. 5. Is there a way to permit morph algorithms to be coded in your framework? Response: Currently, our framework does not translate morph algorithms. Supporting morph algorithms will require some kind of runtime system to manage memory on GPU since morph algorithms add and remove the vertices and edges to the graph dynamically. This can be further explored in future work. 6. Is it possible to accommodate FPGA devices in your framework? Response: Yes, we can support FPGA devices (or any other device that is compatible for OpenCL) just by specifying the device type in the command line argument. We did not work with other devices because CPU and GPU are generally used to process graph algorithms. The candidate provided satisfactory answers to all the questions posed and the clarifications sought by the audience and the examiners during the presentation. The candidate's overall performance during the open defense and the oral examination was very satisfactory to the oral examination board. 4. Certificate of Corrections and Changes: All the necessary corrections and changes suggested by the examiners have been made in the thesis and these have been verified by the members of the oral examination board. The thesis has been recommended for acceptance in its revised form. 5. Final Recommendation: In view of the recommendations of the referees and the satisfactory performance of the candidate in the oral examination, the oral examination board recommends that the thesis of Ms. ParitaPatel be accepted for the award of the M.Sc(Engg.) Degree of the Institute. Response to the comments by the external examiner on the M.Sc(Engg.) thesis “Compilation of Graph Algorithms for Hybrid, Cross-Platform, and Distributed Architectures” by Parita Patel 1. Comment: The contributions on optimizations are weak. Response: The novelty of this thesis is to make the Falcon platform agnostic, and additionally process large scale graphs on multi-devices of a single node and multi-node clusters seamlessly. Our framework performs similar to the existing frameworks, but at the same time, it targets several types of architectures which are not possible in the existing works. Advanced optimizations are beyond the scope of this thesis. 2. Comment: The translation of Falcon to OpenCL is simple. While the translation of Falcon to OpenCL was not hard, figuring out the details of the translation for multi-device and multi-node architectures was not simple. For example, design of implementations for collection, set, global variables, concurrency, etc., were non-trivial. These designs have already been explained in the appropriate places in the thesis. Further, such large software introduced its own intricacies during development. 3. Comment: Lines between Falcon work and this work are not clear. Response: Appendix-A shows the falcon implementation of all the algorithms which we used to run the experiments. We compiled these falcon implementations through our framework and subsequently ran the generated code on different types of target architectures and compared the results with other framework's generated code. These falcon programs were written by us. We have also used the front-end of the Falcon compiler and this has already been stated in the thesis (page 16). 4. Comment: There should be a summary of observations in chapter 3. Response: Summary of observations have been added to chapter 3 (pages 35-36), chapter 4 (page 46), and chapter 5 (page 51) of the thesis. 5. Comment: Speedup and scalability achieved with multiple nodes are not great. Response: For the multi-node architecture, we were not able to achieve linear scalability because, with the increase in number of nodes, communication cost increases significantly. Unless the computation cost in the nodes is significant and is much more than the communication cost, this is bound to happen. 6. Comment: It will be good to separate the related work coverage into a separate chapter. Response: The related work is coherent with the flow in chapter 1. It consists of just 4.5 pages and separating it into a separate chapter would make both (rest of) chapter 1 and the new chapter very small. Therefore, we do not recommend it. 7. Comment: The code should be made available for use by the community. Response: The code includes some part of Falcon code (front-end parsing/grammar) also. After discussion with the author of Falcon, the code can be made available to the community. 8. Comment: Page 28: Shouldn’t the else part be inside the kernel? Response: There was some missing text and a few minor changes in Figure 3.14 (page 28) which have been incorporated in the corrected thesis. 9. Comment: Figure 4.1 needs to be explained better. Response: Explanation for Figure 4.1 (pages 38-39) has been added to the thesis. 10. Comment: The problem size justification in the multi-node results is not clear. Response: Single node machine used in the experiments of “multi-device architecture” contains multiple devices while each node used in experiments of “multi-node architecture” contains only a single device. So, the graph which does not fit into single-node-single-device memory can fit into single-node-multi-device after partitioning. Name of the Candidate: Parita Patel (S.R. No. 04-04-00-10-21-14-1-11610) Degree Registered: M.Sc(Engg.) Department: Computer Science & Automation Title of the Thesis: Compilation of Graph Algorithms for Hybrid, Cross-Platform and Graph algorithms are abundantly used in various disciplines. These algorithms perform poorly due to random memory access and negligible spatial locality. In order to improve performance, parallelism exhibited by these algorithms can be exploited by leveraging modern high performance parallel computing resources. Implementing graph algorithms for these parallel architectures requires manual thread management and memory management which becomes tedious for a programmer. Large scale graphs cannot fit into the memory of a single machine. One solution is to partition the graph either on multiple devices of a single node or on multiple nodes of a distributed network. All the available frameworks for such architectures demand unconventional programming which is difficult and error prone. To address these challenges, we propose a framework for compilation of graph algorithms written in an intuitive graph domain-specific language, Falcon. The framework targets shared memory parallel architectures, computational accelerators and distributed architectures (CPU and GPU cluster). First, it analyses the abstract syntax tree (generated by Falcon) and gathers essential information. Subsequently, it generates optimized code in OpenCL for shared-memory parallel architectures and computational accelerators, and OpenCL coupled with MPI code for distributed architectures. Motivation behind generating OpenCL code is its platform-agnostic and vendor-agnostic behavior, i.e., it is portable to all kinds of devices. Our framework makes memory management, thread management, message passing, etc., transparent to the user. None of the available domain-specific languages, frameworks or parallel libraries handle portable implementations of graph algorithms. Experimental evaluations demonstrate that the generated code performs comparably to the state-of-the-art non-portable implementations and hand-tuned implementations. The results also show portability and scalability of our framework.

Page generated in 0.0632 seconds