1 |
Parallelization of backward deleted distance calculation in graph based features using HadoopPillamari, Jayachandran January 1900 (has links)
Master of Science / Department of Computing & Information Sciences / Daniel Andresen / The current project presents an approach to parallelize the calculation of Backward Deleted Distance (BDD) in Graph Based Features (GBF) computation using Hadoop. In this project the issues concerned with the calculation of BDD are identified and parallel computing technologies like Hadoop are applied to solve them. The project introduces a new algorithm to parallelize the APSP problem in BDD calculation using Hadoop Map Reduce feature. The project is implemented in Java and Hadoop technologies. The aim of this project is to parallelize the calculation of BDD thereby reducing GBF computation time. The process of BDD calculation is examined to identify the key places where it could be parallelized. Since the BDD calculation involves calculating the shortest paths between all pairs of given users, it can viewed as All Pairs Shortest Path (APSP) problem. The internal structure and implementation of Hadoop Map-Reduce framework is studied and applied to the process of APSP problem. The GBF features are one of the features set used in the Ontology classifiers. In the current project, GBF features are used to predict the friendship relationship between the users whose direct link is deleted. The computation involves calculating BDD between all pairs of users. The BDD for a user pair represents the shortest path between them when their direct link is deleted. In real terms, it is the shortest distance between them other than the direct path. The project uses train and test data sets consisting of positive instances and negative instances. The positive instances consist of user pairs having a friendship link between them whereas the negative instances do not have any direct link between them. Apache Hadoop is a latest emerging technology in the market introduced for scalable, distributed computing across clusters of computers. It has a Map Reduce framework used for developing applications which process large amounts of data in parallel on large clusters.
The project is developed and implemented successfully and has the best time complexity. The project is tested for its reliability and performance. Different data sets are used in this testing by considering various factors and typical graph representations. The test results were analyzed to predict the behavior of the system. The test results show that the system has best speedup and considerably decreased the processing time from 10 hours to 20 minutes which is rewarding.
|
2 |
Implementações paralelas para os problemas do fecho transitivo e caminho mínimo APSP na GPU / Parallel implementations for transitive closure and minimum path APSP problems in GPUGaioso, Roussian Di Ramos Alves 08 August 2014 (has links)
Submitted by Luciana Ferreira (lucgeral@gmail.com) on 2014-10-30T14:24:27Z
No. of bitstreams: 2
Dissertação - Roussian Di Ramos Alves Gaioso - 2014.pdf: 6127790 bytes, checksum: 9990f791c0f9abaee7e3e03e4cdc8ee4 (MD5)
license_rdf: 23148 bytes, checksum: 9da0b6dfac957114c6a7714714b86306 (MD5) / Approved for entry into archive by Luciana Ferreira (lucgeral@gmail.com) on 2014-10-30T14:29:29Z (GMT) No. of bitstreams: 2
Dissertação - Roussian Di Ramos Alves Gaioso - 2014.pdf: 6127790 bytes, checksum: 9990f791c0f9abaee7e3e03e4cdc8ee4 (MD5)
license_rdf: 23148 bytes, checksum: 9da0b6dfac957114c6a7714714b86306 (MD5) / Made available in DSpace on 2014-10-30T14:29:29Z (GMT). No. of bitstreams: 2
Dissertação - Roussian Di Ramos Alves Gaioso - 2014.pdf: 6127790 bytes, checksum: 9990f791c0f9abaee7e3e03e4cdc8ee4 (MD5)
license_rdf: 23148 bytes, checksum: 9da0b6dfac957114c6a7714714b86306 (MD5)
Previous issue date: 2014-08-08 / Conselho Nacional de Pesquisa e Desenvolvimento Científico e Tecnológico - CNPq / This paper presents a Graphics Processing Unit (GPU) based parallels implementations
for the All Pairs Shortest Paths and Transitive Closure problems in graph. The implementations
are based on the main sequential algorithms and takes full advantage of the
highly multithreaded architecture of current manycore GPUs. Our solutions reduces the
communication between CPU and GPU, improves the Streaming Multiprocessors (SMs)
utilization, and makes intensive use of coalesced memory access to optimize graph data
access. The advantages of the proposed implementations are demonstrated for several
graphs randomly generated using the widely known graph library GTgraph. Graphs containing
thousands of vertices and different edges densities, varying from sparse to complete
graphs, were generated and used in the experiments. Our results confirm that GPU
implementations can be competitive even for graph algorithms whose memory accesses
and work distribution are both irregular and data-dependent.
Keywords / Este trabalho apresenta implementações paralelas baseadas em Graphics Processing Unit
(GPU) para os problemas da identificação dos caminhos mínimos entre todos os pares
de vértices e do fecho transitivo em um grafo. As implementações são baseadas nos
principais algoritmos sequenciais e tiram o máximo proveito da arquitetura multithreaded
das GPUs atuais. Nossa solução reduz a comunicação entre a Central Processing Unit
(CPU) e a GPU, melhora a utilização dos Streaming Multiprocessors (SMs) e faz um
uso intensivo de acesso aglutinado em memória para otimizar o acesso de dados do
grafo. As vantagens dessas implementações propostas são demonstradas por vários grafos
gerados aleatoriamente utilizando a ferramenta GTgraph. Grafos contendo milhares de
vértices foram gerados e utilizados nos experimentos. Nossos resultados confirmam que
implementações baseadas em GPU podem ser viáveis mesmo para algoritmos de grafos
cujo acessos à memória e distribuição de trabalho são irregulares e causam dependência
de dados.
|
3 |
Combining Shortest Paths, Bottleneck Paths and Matrix MultiplicationShinn, Tong-Wook January 2014 (has links)
We provide a formal mathematical definition of the Shortest Paths for All Flows (SP-AF) problem and provide many efficient algorithms. The SP-AF problem combines the well known Shortest Paths (SP) and Bottleneck Paths (BP) problems, and can be solved by utilising matrix multiplication. Thus in our research of the SP-AF problem, we also make a series of contributions to the underlying topics of the SP problem, the BP problem, and matrix multiplication.
For the topic of matrix multiplication we show that on an n-by-n two dimensional (2D) square mesh array, two n-by-n matrices can be multiplied in exactly 1.5n ‒ 1 communication steps. This halves the number of communication steps required by the well known Cannon’s algorithm that runs
on the same sized mesh array.
We provide two contributions for the SP problem. Firstly, we enhance the breakthrough algorithm by Alon, Galil and Margalit (AGM), which was the first algorithm to achieve a deeply sub-cubic time bound for solving the All Pairs Shortest Paths (APSP) problem on dense directed graphs. Our enhancement allows the algorithm by AGM to remain sub-cubic for larger upper bounds on integer edge costs. Secondly, we show that for graphs with n vertices, the APSP problem can be solved in exactly 3n ‒ 2 communication steps on an n-by-n 2D square mesh array. This improves on the previous result of 3.5n communication steps achieved by Takaoka and Umehara.
For the BP problem, we show that we can compute the bottleneck of the entire graph without solving the All Pairs Bottleneck Paths (APBP) problem, resulting in a much more efficient time bound.
Finally we define an algebraic structure called the distance/flow semi-ring to formally introduce the SP-AF problem, and we provide many algorithms for solving the Single Source SP-AF (SSSP-AF) problem and the All Pairs SP-AF (APSP-AF) problem. For the APSP-AF problem, algebraic algorithms are given that utilise faster matrix multiplication over a ring.
|
4 |
Average case analysis of algorithms for the maximum subarray problemBashar, Mohammad Ehsanul January 2007 (has links)
Maximum Subarray Problem (MSP) is to find the consecutive array portion that maximizes the sum of array elements in it. The goal is to locate the most useful and informative array segment that associates two parameters involved in data in a 2D array. It's an efficient data mining method which gives us an accurate pattern or trend of data with respect to some associated parameters. Distance Matrix Multiplication (DMM) is at the core of MSP. Also DMM and MSP have the worst-case complexity of the same order. So if we improve the algorithm for DMM that would also trigger the improvement of MSP. The complexity of Conventional DMM is O(n³). In the average case, All Pairs Shortest Path (APSP) Problem can be modified as a fast engine for DMM and can be solved in O(n² log n) expected time. Using this result, MSP can be solved in O(n² log² n) expected time. MSP can be extended to K-MSP. To incorporate DMM into K-MSP, DMM needs to be extended to K-DMM as well. In this research we show how DMM can be extended to K-DMM using K-Tuple Approach to solve K-MSP in O(Kn² log² n log K) time complexity when K ≤ n/log n. We also present Tournament Approach which solves K-MSP in O(n² log² n + Kn²) time complexity and outperforms the K-Tuple
|
Page generated in 0.0273 seconds