Spelling suggestions: "subject:"1atrix reduction"" "subject:"béatrix reduction""
1 |
Efficient betweenness Centrality Computations on Hybrid CPU-GPU SystemsMishra, Ashirbad January 2016 (has links) (PDF)
Analysis of networks is quite interesting, because they can be interpreted for several purposes. Various features require different metrics to measure and interpret them. Measuring the relative importance of each vertex in a network is one of the most fundamental building blocks in network analysis. Between’s Centrality (BC) is one such metric that plays a key role in many real world applications. BC is an important graph analytics application for large-scale graphs. However it is one of the most computationally intensive kernels to execute, and measuring centrality in billion-scale graphs is quite challenging.
While there are several existing e orts towards parallelizing BC algorithms on multi-core CPUs and many-core GPUs, in this work, we propose a novel ne-grained CPU-GPU hybrid algorithm that partitions a graph into two partitions, one each for CPU and GPU. Our method performs BC computations for the graph on both the CPU and GPU resources simultaneously, resulting in a very small number of CPU-GPU synchronizations, hence taking less time for communications. The BC algorithm consists of two phases, the forward phase and the backward phase. In the forward phase, we initially and the paths that are needed by either partitions, after which each partition is executed on each processor in an asynchronous manner. We initially compute border matrices for each partition which stores the relative distances between each pair of border vertex in a partition. The matrices are used in the forward phase calculations of all the sources. In this way, our hybrid BC algorithm leverages the multi-source property inherent in the BC problem. We present proof of correctness and the bounds for the number of iterations for each source. We also perform a novel hybrid and asynchronous backward phase, in which each partition communicates with the other only when there is a path that crosses the partition, hence it performs minimal CPU-GPU synchronizations.
We use a variety of implementations for our work, like node-based and edge based parallelism, which includes data-driven and topology based techniques. In the implementation we show that our method also works using variable partitioning technique. The technique partitions the graph into unequal parts accounting for the processing power of each processor. Our implementations achieve almost equal percentage of utilization on both the processors due to the technique. For large scale graphs, the size of the border matrix also becomes large, hence to accommodate the matrix we present various techniques. The techniques use the properties inherent in the shortest path problem for reduction. We mention the drawbacks of performing shortest path computations on a large scale and also provide various solutions to it.
Evaluations using a large number of graphs with different characteristics show that our hybrid approach without variable partitioning and border matrix reduction gives 67% improvement in performance, and 64-98.5% less CPU-GPU communications than the state of art hybrid algorithm based on the popular Bulk Synchronous Paradigm (BSP) approach implemented in TOTEM. This shows our algorithm's strength which reduces the need for larger synchronizations. Implementing variable partitioning, border matrix reduction and backward phase optimizations on our hybrid algorithm provides up to 10x speedup. We compare our optimized implementation, with CPU and GPU standalone codes based on our forward phase and backward phase kernels, and show around 2-8x speedup over the CPU-only code and can accommodate large graphs that cannot be accommodated in the GPU-only code. We also show that our method`s performance is competitive to the state of art multi-core CPU and performs 40-52% better than GPU implementations, on large graphs. We show the drawbacks of CPU and GPU only implementations and try to motivate the reader about the challenges that graph algorithms face in large scale computing, suggesting that a hybrid or distributed way of approaching the problem is a better way of overcoming the hurdles.
|
2 |
Sur des méthodes préservant les structures d'une classe de matrices structurées / On structure-preserving methods of a class of structured matricesBen Kahla, Haithem 14 December 2017 (has links)
Les méthodes d'algèbres linéaire classiques, pour le calcul de valeurs et vecteurs propres d'une matrice, ou des approximations de rangs inférieurs (low-rank approximations) d'une solution, etc..., ne tiennent pas compte des structures de matrices. Ces dernières sont généralement détruites durant le procédé du calcul. Des méthodes alternatives préservant ces structures font l'objet d'un intérêt important par la communauté. Cette thèse constitue une contribution dans ce domaine. La décomposition SR peut être calculé via l'algorithme de Gram-Schmidt symplectique. Comme dans le cas classique, une perte d'orthogonalité peut se produire. Pour y remédier, nous avons proposé deux algorithmes RSGSi et RMSGSi qui consistent à ré-orthogonaliser deux fois les vecteurs à calculer. La perte de la J-orthogonalité s'est améliorée de manière très significative. L'étude directe de la propagation des erreurs d'arrondis dans les algorithmes de Gram-Schmidt symplectique est très difficile à effectuer. Nous avons réussi à contourner cette difficulté et donner des majorations pour la perte de la J-orthogonalité et de l'erreur de factorisation. Une autre façon de calculer la décomposition SR est basée sur les transformations de Householder symplectique. Un choix optimal a abouti à l'algorithme SROSH. Cependant, ce dernier peut être sujet à une instabilité numérique. Nous avons proposé une version modifiée nouvelle SRMSH, qui a l'avantage d'être aussi stable que possible. Une étude approfondie a été faite, présentant les différentes versions : SRMSH et SRMSH2. Dans le but de construire un algorithme SR, d'une complexité d'ordre O(n³) où 2n est la taille de la matrice, une réduction (appropriée) de la matrice à une forme condensée (J(Hessenberg forme) via des similarités adéquates, est cruciale. Cette réduction peut être effectuée via l'algorithme JHESS. Nous avons montré qu'il est possible de réduire une matrice sous la forme J-Hessenberg, en se basant exclusivement sur les transformations de Householder symplectiques. Le nouvel algorithme, appelé JHSJ, est basé sur une adaptation de l'algorithme SRSH. Nous avons réussi à proposer deux nouvelles variantes, aussi stables que possible : JHMSH et JHMSH2. Nous avons constaté que ces algorithmes se comportent d'une manière similaire à l'algorithme JHESS. Une caractéristique importante de tous ces algorithmes est qu'ils peuvent rencontrer un breakdown fatal ou un "near breakdown" rendant impossible la suite des calculs, ou débouchant sur une instabilité numérique, privant le résultat final de toute signification. Ce phénomène n'a pas d'équivalent dans le cas Euclidien. Nous avons réussi à élaborer une stratégie très efficace pour "guérir" le breakdown fatal et traîter le near breakdown. Les nouveaux algorithmes intégrant cette stratégie sont désignés par MJHESS, MJHSH, JHM²SH et JHM²SH2. Ces stratégies ont été ensuite intégrées dans la version implicite de l'algorithme SR lui permettant de surmonter les difficultés rencontrées lors du fatal breakdown ou du near breakdown. Rappelons que, sans ces stratégies, l'algorithme SR s'arrête. Finalement, et dans un autre cadre de matrices structurées, nous avons présenté un algorithme robuste via FFT et la matrice de Hankel, basé sur le calcul approché de plus grand diviseur commun (PGCD) de deux polynômes, pour résoudre le problème de la déconvolution d'images. Plus précisément, nous avons conçu un algorithme pour le calcul du PGCD de deux polynômes bivariés. La nouvelle approche est basée sur un algorithme rapide, de complexité quadratique O(n²), pour le calcul du PGCD des polynômes unidimensionnels. La complexité de notre algorithme est O(n²log(n)) où la taille des images floues est n x n. Les résultats expérimentaux avec des images synthétiquement floues illustrent l'efficacité de notre approche. / The classical linear algebra methods, for calculating eigenvalues and eigenvectors of a matrix, or lower-rank approximations of a solution, etc....do not consider the structures of matrices. Such structures are usually destroyed in the numerical process. Alternative structure-preserving methods are the subject of an important interest mattering to the community. This thesis establishes a contribution in this field. The SR decomposition is usually implemented via the symplectic Gram-Schmidt algorithm. As in the classical case, a loss of orthogonality can occur. To remedy this, we have proposed two algorithms RSGSi and RMSGSi, where the reorthogonalization of a current set of vectors against the previously computed set is performed twice. The loss of J-orthogonality has significantly improved. A direct rounding error analysis of symplectic Gram-Schmidt algorithm is very hard to accomplish. We managed to get around this difficulty and give the error bounds on the loss of the J-orthogonality and on the factorization. Another way to implement the SR decomposition is based on symplectic Householder transformations. An optimal choice of free parameters provided an optimal version of the algorithm SROSH. However, the latter may be subject to numerical instability. We have proposed a new modified version SRMSH, which has the advantage of being numerically more stable. By a detailes study, we are led to two new variants numerically more stables : SRMSH and SRMSH2. In order to build a SR algorithm of complexity O(n³), where 2n is the size of the matrix, a reduction to the condensed matrix form (upper J-Hessenberg form) via adequate similarities is crucial. This reduction may be handled via the algorithm JHESS. We have shown that it is possible to perform a reduction of a general matrix, to an upper J-Hessenberg form, based only on the use of symplectic Householder transformations. The new algorithm, which will be called JHSH algorithm, is based on an adaptation of SRSH algorithm. We are led to two news variants algorithms JHMSH and JHMSH2 which are significantly more stable numerically. We found that these algortihms behave quite similarly to JHESS algorithm. The main drawback of all these algorithms (JHESS, JHMSH, JHMSH2) is that they may encounter fatal breakdowns or may suffer from a severe form of near-breakdowns, causing a brutal stop of the computations, the algorithm breaks down, or leading to a serious numerical instability. This phenomenon has no equivalent in the Euclidean case. We sketch out a very efficient strategy for curing fatal breakdowns and treating near breakdowns. Thus, the new algorithms incorporating this modification will be referred to as MJHESS, MJHSH, JHM²SH and JHM²SH2. These strategies were then incorporated into the implicit version of the SR algorithm to overcome the difficulties encountered by the fatal breakdown or near-breakdown. We recall that without these strategies, the SR algorithms breaks. Finally ans in another framework of structured matrices, we presented a robust algorithm via FFT and a Hankel matrix, based on computing approximate greatest common divisors (GCD) of polynomials, for solving the problem pf blind image deconvolution. Specifically, we designe a specialized algorithm for computing the GCD of bivariate polynomials. The new algorithm is based on the fast GCD algorithm for univariate polynomials , of quadratic complexity O(n²) flops. The complexitiy of our algorithm is O(n²log(n)) where the size of blurred images is n x n. The experimental results with synthetically burred images are included to illustrate the effectiveness of our approach
|
Page generated in 0.0861 seconds