71 |
Comportamento de células pulpares humanas expostas ao TGFβ1 e ao aFGF em cultura [Behavior of human dental pulp cells exposed to TGFβ1 and aFGF in culture]
Luisi, Simone Bonato, January 2006
The aim of the present work was to evaluate the behavior of human dental pulp cells exposed to TGFβ1 and aFGF in culture, at the following concentrations: TGFβ1 1 ng/mL, TGFβ1 5 ng/mL, TGFβ1 1 ng/mL + aFGF 5 ng/mL, TGFβ1 5 ng/mL + aFGF 5 ng/mL, and aFGF 5 ng/mL. We assessed cell morphology; alkaline phosphatase activity, assayed with pNPP as substrate; and expression of osteocalcin, bone sialoprotein (BSP) and dentin sialophosphoprotein (DSPP) by RT-PCR.
After four days, the mean number of nucleoli in the group treated with TGFβ1 1 ng/mL was significantly higher than in the group treated with aFGF 5 ng/mL. The mean alkaline phosphatase activity in the group treated with TGFβ1 1 ng/mL was significantly higher than in the group treated with TGFβ1 5 ng/mL + aFGF 5 ng/mL. Osteocalcin expression was observed in all human dental pulp cells that proliferated in culture. However, osteocalcin expression decreased in the group treated with aFGF 5 ng/mL. Exposure to the growth factors did not induce the expression of dentin matrix components such as BSP or DSPP. These data suggest that cells exposed to TGFβ1 1 ng/mL were stimulated, showing higher cell activity, and that cells exposed to aFGF 5 ng/mL were inhibited, showing lower cell activity.
|
73 |
Compilation of Graph Algorithms for Hybrid, Cross-Platform and Distributed Architectures
Patel, Parita, January 2017
1. Main Contributions made by the supplicant:
This thesis proposes an Open Computing Language (OpenCL) framework to address the challenges of implementing graph algorithms on parallel architectures and of large-scale graph processing. The proposed framework uses the front-end of the existing Falcon DSL compiler, and so programmers enjoy a conventional, imperative, shared-memory programming style. The back-end of the framework generates implementations of graph algorithms in OpenCL to target single-device architectures. The generated OpenCL code is portable across platforms, e.g., CPU and GPU, and across vendors, e.g., NVIDIA, Intel and AMD. The framework automatically generates code for thread management and memory management for the devices. It hides all the lower-level programming details from the programmers. A few optimizations are applied to reduce the execution time.
The large graph processing challenge is tackled through graph partitioning over multiple devices of a single node and multiple nodes of a distributed cluster. The programmer codes a graph algorithm in Falcon assuming that the graph fits into the memory of a single machine, and the framework handles graph partitioning without any intervention by the programmer. The framework analyses the Abstract Syntax Tree (AST) generated by Falcon to find all the necessary information about communication and synchronization. It automatically generates code for message passing to hide the complexity of programming in a distributed environment. The framework also applies a set of optimizations to minimize the communication latency. The thesis reports results of several experiments conducted on widely used graph algorithms: single-source shortest path, PageRank and minimum spanning tree, to name a few. Experimental evaluations show that the reported results are comparable to the state-of-the-art non-portable graph DSLs and frameworks on a single node. Experiments in a distributed environment to show the scalability and efficiency of the framework are also described.
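To make the partitioning idea concrete, the following is a minimal sketch in Python, not the OpenCL/MPI code the framework generates. It assumes a simple contiguous block partition of vertex ids (the thesis summary does not state which partitioning scheme is used) and shows how edges split into those local to a part and those crossing parts; the crossing edges are the ones that would require communication. The function names `block_partition` and `split_edges` are hypothetical.

```python
from collections import defaultdict


def block_partition(num_vertices, num_parts):
    """Assign each vertex id to a part using contiguous, near-equal blocks.

    This is an assumed scheme chosen for illustration only.
    """
    base, extra = divmod(num_vertices, num_parts)
    owner = []
    for part in range(num_parts):
        size = base + (1 if part < extra else 0)
        owner.extend([part] * size)
    return owner


def split_edges(edges, owner):
    """Separate edges into per-part local edges and cross-part (cut) edges."""
    local = defaultdict(list)
    cut = []
    for u, v in edges:
        if owner[u] == owner[v]:
            local[owner[u]].append((u, v))
        else:
            # The endpoints live on different devices or nodes, so updates
            # along this edge would have to be exchanged between them.
            cut.append((u, v))
    return dict(local), cut


# A 6-vertex ring split over two parts.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
owner = block_partition(6, 2)
local, cut = split_edges(edges, owner)
```

In this toy ring, two of the six edges cross the partition boundary; in general, the fraction of cut edges gives a rough indication of how much message passing a partitioned run would need.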
2. Summary of the Referees' Written Comments:
Extracts from the referees' reports are provided below. A copy of the written replies to the clarifications sought by the external examiner is appended to this report.
Referee 1: This thesis extends the Falcon framework with OpenCL for parallel graph processing on multi-device and multi-node architectures. The thesis makes important contributions. Processing large graphs in a short time is very important, and making use of multiple nodes and devices is perhaps the only way to achieve this. Towards this, the thesis makes good contributions for easy programming, compiler transformations and efficient runtime systems. One of the commendable aspects of the thesis is that it demonstrates results with graphs that cannot be accommodated in the memory of a single device. The thesis is generally well written. The related work coverage is very good. The magnitude of the thesis is excellent for a Master's work. The experimental setup is very comprehensive, with a good set of graphs, good experimental comparisons with state-of-the-art works and good platforms. In particular, the demonstration with a GPU cluster with multiple GPU nodes (Chapter 5) is excellent. The attempt to demonstrate scalability with 2, 4 and 8 nodes is also noteworthy.
However, the contributions on optimizations are weak. Most of the optimizations and compiler transformations are straightforward. There should be summary observations on the results in Chapter 3, especially given that the results are mixed and do not clearly convey the advantages of the work. The same is the case with the multi-device results in Chapter 4, where the results are once again mixed. Similarly, the speedups and scalability achieved with multiple nodes are not great. The problem size justification in the multi-node results is not clear. (Referee 1 also indicates a couple of minor changes to the thesis.)
Referee 2: The thesis uses the OpenCL framework to address the problem of programming graph algorithms on distributed systems. The use of OpenCL ensures that the generated code is platform-agnostic and vendor-agnostic. Sufficient experimentation with large-scale graphs and reasonably sized clusters has been conducted to demonstrate the scalability and portability of the code generated by the framework. The automatically generated code is almost as efficient as manually written code. The thesis is well written and is of high quality. The related work section is well organized and displays a good knowledge of the subject matter under consideration. The author has made important contributions to a good publication as well.
3. An Account of the Open Oral Examination:
The oral examination of Ms. Parita Patel took place between 10 AM and 11 AM on 27th November 2017, in the Seminar Hall of the Department of Computer Science and Automation. The members of the Oral Examination Board present were Prof. Sathish Vadhiyar, external examiner, and Prof. Y. N. Srikant, research supervisor.
The candidate presented the work in an open defense seminar highlighting the problem domain, the methodology used, the investigations carried out by her, and the resulting contributions documented in the thesis before an audience consisting of the examiners, some faculty members, and students. Some of the questions posed by the examiners and the members of the audience during the oral examination are listed below.
1. How much is the overlap between Falcon work and this thesis?
Response: We have used the Falcon front end in our work. Further, the existing Falcon compiler was useful to us to test our own implementation of algorithms in Falcon.
2. Why are speedup and scalability not very high with multiple nodes?
Response: For the multi-node architecture, we were not able to achieve linear scalability because, with the increase in number of nodes, communication cost increases significantly. Unless the computation cost in the nodes is significant and is much more than the communication cost, this is bound to happen.
3. Do you have plans of making the code available for use by the community?
Response: The code includes some part of Falcon implementation (front-end parsing/grammar) also. After discussion with the author of Falcon, the code can be made available to the community.
4. How can a graph that does not fit into a single device fit into a single node in the case of multiple nodes?
Response: The single-node machine used in the “multi-device architecture” experiments contains multiple devices, while each node used in the “multi-node architecture” experiments contains only a single device. So, a graph which does not fit into single-node-single-device memory can fit into single-node-multi-device memory after partitioning.
5. Is there a way to permit morph algorithms to be coded in your framework?
Response: Currently, our framework does not translate morph algorithms. Supporting morph algorithms will require some kind of runtime system to manage memory on GPU since morph algorithms add and remove the vertices and edges to the graph dynamically. This can be further explored in future work.
6. Is it possible to accommodate FPGA devices in your framework?
Response: Yes, we can support FPGA devices (or any other device that is compatible with OpenCL) just by specifying the device type in the command line argument. We did not work with other devices because CPU and GPU are generally used to process graph algorithms.
The candidate provided satisfactory answers to all the questions posed and the clarifications sought by the audience and the examiners during the presentation. The candidate's overall performance during the open defense and the oral examination was very satisfactory to the oral examination board.
4. Certificate of Corrections and Changes: All the necessary corrections and changes suggested by the examiners have been made in the thesis and these have been verified by the members of the oral examination board. The thesis has been recommended for acceptance in its revised form.
5. Final Recommendation:
In view of the recommendations of the referees and the satisfactory performance of the candidate in the oral examination, the oral examination board recommends that the thesis of Ms. Parita Patel be accepted for the award of the M.Sc(Engg.) Degree of the Institute.
Response to the comments by the external examiner on the M.Sc(Engg.) thesis “Compilation of Graph Algorithms for Hybrid, Cross-Platform, and Distributed Architectures” by Parita Patel
1. Comment: The contributions on optimizations are weak.
Response: The novelty of this thesis is to make the Falcon platform agnostic, and additionally process large scale graphs on multi-devices of a single node and multi-node clusters seamlessly. Our framework performs similar to the existing frameworks, but at the same time, it targets several types of architectures which are not possible in the existing works. Advanced optimizations are beyond the scope of this thesis.
2. Comment: The translation of Falcon to OpenCL is simple.
Response: While the translation of Falcon to OpenCL was not hard, figuring out the details of the translation for multi-device and multi-node architectures was not simple. For example, the design of implementations for collection, set, global variables, concurrency, etc., was non-trivial. These designs have already been explained in the appropriate places in the thesis. Further, such large software introduced its own intricacies during development.
3. Comment: Lines between Falcon work and this work are not clear.
Response: Appendix-A shows the Falcon implementations of all the algorithms which we used to run the experiments. We compiled these Falcon implementations through our framework, subsequently ran the generated code on different types of target architectures, and compared the results with the code generated by other frameworks. These Falcon programs were written by us. We have also used the front-end of the Falcon compiler, and this has already been stated in the thesis (page 16).
4. Comment: There should be a summary of observations in chapter 3.
Response: Summaries of observations have been added to chapter 3 (pages 35-36), chapter 4 (page 46), and chapter 5 (page 51) of the thesis.
5. Comment: Speedup and scalability achieved with multiple nodes are not great.
Response: For the multi-node architecture, we were not able to achieve linear scalability because, with the increase in number of nodes, communication cost increases significantly. Unless the computation cost in the nodes is significant and is much more than the communication cost, this is bound to happen.
6. Comment: It will be good to separate the related work coverage into a separate chapter.
Response: The related work is coherent with the flow in chapter 1. It consists of just 4.5 pages and separating it into a separate chapter would make both (rest of) chapter 1 and the new chapter very small. Therefore, we do not recommend it.
7. Comment: The code should be made available for use by the community.
Response: The code includes some part of Falcon code (front-end parsing/grammar) also. After discussion with the author of Falcon, the code can be made available to the community.
8. Comment: Page 28: Shouldn’t the else part be inside the kernel?
Response: There was some missing text and a few minor changes in Figure 3.14 (page 28) which have been incorporated in the corrected thesis.
9. Comment: Figure 4.1 needs to be explained better.
Response: Explanation for Figure 4.1 (pages 38-39) has been added to the thesis.
10. Comment: The problem size justification in the multi-node results is not clear.
Response: The single-node machine used in the “multi-device architecture” experiments contains multiple devices, while each node used in the “multi-node architecture” experiments contains only a single device. So, a graph which does not fit into single-node-single-device memory can fit into single-node-multi-device memory after partitioning.
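The scalability argument in the response to Comment 5 can be illustrated with a toy cost model. The sketch below is an assumption made for illustration, not a model taken from the thesis: computation is taken to divide evenly across nodes, while communication cost is assumed to grow linearly with the number of additional nodes. All parameter values are made up.

```python
def estimated_time(num_nodes, compute_time, comm_cost_per_node):
    """Toy model: compute time divides across nodes, communication grows.

    Both the even division of work and the linear growth of communication
    are simplifying assumptions.
    """
    return compute_time / num_nodes + comm_cost_per_node * (num_nodes - 1)


def speedup(num_nodes, compute_time, comm_cost_per_node):
    """Speedup of a multi-node run relative to a single node."""
    single = estimated_time(1, compute_time, comm_cost_per_node)
    return single / estimated_time(num_nodes, compute_time, comm_cost_per_node)


# Communication-heavy case: speedup peaks and then falls as nodes are added.
heavy = [speedup(n, 100.0, 5.0) for n in (2, 4, 8)]

# Computation-dominated case: speedup stays close to the node count.
light = [speedup(n, 100.0, 0.1) for n in (2, 4, 8)]
```

Under these assumptions, the communication-heavy configuration is slower on 8 nodes than on 4, whereas the computation-dominated configuration scales nearly linearly, which matches the qualitative point made in the response.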
Name of the Candidate: Parita Patel (S.R. No. 04-04-00-10-21-14-1-11610)
Degree Registered: M.Sc(Engg.)
Department: Computer Science & Automation
Title of the Thesis: Compilation of Graph Algorithms for Hybrid, Cross-Platform and Distributed Architectures
Graph algorithms are abundantly used in various disciplines. These algorithms perform poorly due to random memory access and negligible spatial locality. In order to improve performance, the parallelism exhibited by these algorithms can be exploited by leveraging modern high-performance parallel computing resources. Implementing graph algorithms for these parallel architectures requires manual thread management and memory management, which becomes tedious for a programmer.
Large scale graphs cannot fit into the memory of a single machine. One solution is to partition the graph either on multiple devices of a single node or on multiple nodes of a distributed network. All the available frameworks for such architectures demand unconventional programming which is difficult and error prone.
To address these challenges, we propose a framework for compilation of graph algorithms written in an intuitive graph domain-specific language, Falcon. The framework targets shared-memory parallel architectures, computational accelerators and distributed architectures (CPU and GPU clusters). First, it analyses the abstract syntax tree (generated by Falcon) and gathers essential information. Subsequently, it generates optimized code in OpenCL for shared-memory parallel architectures and computational accelerators, and OpenCL coupled with MPI code for distributed architectures. The motivation behind generating OpenCL code is its platform-agnostic and vendor-agnostic behavior, i.e., it is portable to all kinds of devices. Our framework makes memory management, thread management, message passing, etc., transparent to the user. None of the available domain-specific languages, frameworks or parallel libraries handles portable implementations of graph algorithms.
Experimental evaluations demonstrate that the generated code performs comparably to the state-of-the-art non-portable implementations and hand-tuned implementations. The results also show portability and scalability of our framework.
|
75 |
A novel purification method for binder of SPerm proteins and characterization of the protein interaction network of BSPH1
Sabouhi Zarafshan, Samin, 08 1900
Binder of SPerm (BSP) proteins belong to a superfamily of proteins expressed in the male reproductive tract, particularly in the seminal vesicles of ungulates (e.g., bovine, ram) and in the epididymis of humans and mice. So far, BSP proteins have been shown to play different roles in different species, such as in motility and capacitation in bovine; however, their role remains unclear in other mammals such as mice and humans. For instance, recent in vivo studies showed that deletion of the Bsph1 and Bsph2 genes in mice had no effect on fertility and caused no abnormality of the male reproductive tract. In order to elucidate the specific role of the BSP protein in humans (BSPH1), I first developed an efficient purification method to produce functional BSPH1, as these proteins are only present in minute amounts in the human epididymis. Following purification of BSPH1, I carried out in vitro experiments and sought to identify its protein interaction network. BSP proteins have been shown to interact with pseudo-choline groups such as diethylaminoethyl through affinity rather than ionic interactions. Diethylaminoethyl is positively charged and is therefore a weak anion exchanger, but BSPs bind to this resin through affinity. This study presents a new, rapid and cost-effective purification method that provides recombinant BSP proteins of high purity, which can be used to study their roles in mammalian fertilization. We showed that pre-incubation of oocytes with recombinant BSPH1 can decrease the fertilization rate in a dose-dependent manner. Sperm pre-incubated with an anti-BSPH1 antibody also showed a decrease in fertilization rate.
Secondly, I used BioID (proximity-dependent biotin identification) coupled with mass spectrometry to identify the protein-protein interaction network of BSPH1 by proximity labeling. Mass spectrometry results showed an interaction between BSPH1 and all subunits of the CCT/TRiC complex (chaperonin containing tailless complex polypeptide 1 (CCT), or tailless complex polypeptide 1 ring complex (TRiC)). This complex interacts with another complex called the BBSome (Bardet–Biedl syndrome complex), which plays a role in protein trafficking through the primary cilium. I also identified BBS proteins, as well as other proteins, that interact with the BBSome complex and regulate protein trafficking in the cilia.
BSPH1 also interacted with a large number of proteins of the CEP (centrosome-associated proteins) family, which are important in the formation of the primary cilium through microtubules and in centrosome maturation; this further supports the potential involvement of BSPH1 in the primary cilia. Overall, this study demonstrates that BSPH1 may have a new role as a chaperone involved in protein trafficking through the primary cilia in cells that express it in the male reproductive system.
|
76 |
Efficient Betweenness Centrality Computations on Hybrid CPU-GPU Systems
Mishra, Ashirbad, January 2016
Analysis of networks is of considerable interest because networks can be interpreted for several purposes. Different features require different metrics to measure and interpret them. Measuring the relative importance of each vertex in a network is one of the most fundamental building blocks of network analysis. Betweenness Centrality (BC) is one such metric, and it plays a key role in many real-world applications. BC is an important graph analytics application for large-scale graphs. However, it is one of the most computationally intensive kernels to execute, and measuring centrality in billion-scale graphs is quite challenging.
While there are several existing efforts towards parallelizing BC algorithms on multi-core CPUs and many-core GPUs, in this work we propose a novel fine-grained CPU-GPU hybrid algorithm that partitions a graph into two partitions, one each for the CPU and the GPU. Our method performs BC computations for the graph on both the CPU and GPU resources simultaneously, resulting in a very small number of CPU-GPU synchronizations and hence less time spent on communication. The BC algorithm consists of two phases, the forward phase and the backward phase. In the forward phase, we initially find the paths that are needed by either partition, after which each partition is executed on its processor in an asynchronous manner. We initially compute a border matrix for each partition, which stores the relative distances between each pair of border vertices in the partition. The matrices are used in the forward phase calculations of all the sources. In this way, our hybrid BC algorithm leverages the multi-source property inherent in the BC problem. We present a proof of correctness and bounds on the number of iterations for each source. We also perform a novel hybrid and asynchronous backward phase, in which each partition communicates with the other only when there is a path that crosses the partition, so that it performs minimal CPU-GPU synchronizations.
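For readers unfamiliar with the two phases, the sketch below shows a sequential Python version of the textbook Brandes formulation for unweighted graphs. It is an assumption that this is the baseline the abstract has in mind; it is not the hybrid CPU-GPU algorithm proposed in the thesis, and it involves no partitioning or border matrices.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes-style BC for an unweighted graph.

    adj maps each vertex to a list of neighbours. For an undirected graph
    given with both edge directions, every pair is counted twice, so halve
    the result if the undirected convention is wanted.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma)
        # and recording predecessors on those paths.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s] = 1
        dist[s] = 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Path graph 0 - 1 - 2 - 3, stored with both edge directions.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = betweenness_centrality(path)
```

On this path graph the two interior vertices receive equal, non-zero scores and the endpoints receive zero, as expected. The outer loop over sources is independent per source, which is the multi-source property the abstract refers to.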
We use a variety of implementations for our work, such as node-based and edge-based parallelism, which include data-driven and topology-based techniques. In the implementation, we show that our method also works with a variable partitioning technique. The technique partitions the graph into unequal parts, accounting for the processing power of each processor. As a result, our implementations achieve almost equal percentage utilization on both processors. For large-scale graphs, the border matrix also becomes large, so we present various techniques to accommodate it. The techniques use properties inherent in the shortest path problem to reduce the matrix. We describe the drawbacks of performing shortest path computations on a large scale and provide various solutions to them.
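The idea behind variable partitioning can be sketched as follows. This is only an illustration: it assumes the split is made by vertex count in proportion to integer relative-throughput weights, whereas the abstract does not say what quantity is balanced or how processing power is measured. The function name `variable_partition_sizes` is hypothetical.

```python
def variable_partition_sizes(num_vertices, throughputs):
    """Split a vertex count in proportion to each processor's throughput.

    throughputs is a list of positive integer weights, one per processor
    (an assumed stand-in for measured processing power).
    """
    total = sum(throughputs)
    sizes = [num_vertices * t // total for t in throughputs]
    # Integer division can leave a few vertices unassigned;
    # hand them to the fastest processor.
    remainder = num_vertices - sum(sizes)
    fastest = max(range(len(throughputs)), key=lambda i: throughputs[i])
    sizes[fastest] += remainder
    return sizes


# Example: a GPU assumed to be three times as fast as the CPU.
cpu_size, gpu_size = variable_partition_sizes(1000, [1, 3])
```

With these made-up weights the GPU partition is three times the size of the CPU partition, so both processors would finish at roughly the same time if work were proportional to vertex count.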
Evaluations using a large number of graphs with different characteristics show that our hybrid approach, without variable partitioning and border matrix reduction, gives a 67% improvement in performance and 64-98.5% fewer CPU-GPU communications than the state-of-the-art hybrid algorithm based on the popular Bulk Synchronous Paradigm (BSP) approach implemented in TOTEM. This shows the strength of our algorithm, which reduces the need for larger synchronizations. Implementing variable partitioning, border matrix reduction and backward phase optimizations on our hybrid algorithm provides up to 10x speedup. We compare our optimized implementation with CPU and GPU standalone codes based on our forward phase and backward phase kernels, and show around 2-8x speedup over the CPU-only code; our implementation can also accommodate large graphs that cannot be accommodated by the GPU-only code. We also show that, on large graphs, our method's performance is competitive with state-of-the-art multi-core CPU implementations and 40-52% better than GPU implementations. We show the drawbacks of CPU-only and GPU-only implementations and try to motivate the reader about the challenges that graph algorithms face in large-scale computing, suggesting that a hybrid or distributed approach is a better way of overcoming the hurdles.
|