Spelling suggestions: "subject:"graph algorithms"" "subject:"graph a.lgorithms""
81 |
Algorithm design on multicore processors for massive-data analysisAgarwal, Virat 28 June 2010 (has links)
Analyzing massive-data sets and streams is computationally very challenging. Data sets in
systems biology, network analysis and security use network abstraction to construct large-scale
graphs. Graph algorithms such as traversal and search are memory-intensive and typically require
very little computation, with access patterns that are irregular and fine-grained. The increasing
streaming data rates in various domains such as security, mining, and finance leaves algorithm
designers with only a handful of clock cycles (with current general purpose computing technology)
to process every incoming byte of data in-core at real-time. This along with increasing complexity of
mining patterns and other analytics puts further pressure on already high computational requirement.
Processing streaming data in finance comes with an additional constraint to process at low latency,
that restricts the algorithm to use common techniques such as batching to obtain high throughput.
The primary contributions of this dissertation are the design of novel parallel data analysis algorithms
for graph traversal on large-scale graphs, pattern recognition and keyword scanning on massive
streaming data, financial market data feed processing and analytics, and data transformation,
that capture the machine-independent aspects, to guarantee portability with performance to future
processors, with high performance implementations on multicore processors that embed processorspecific
optimizations. Our breadth first search graph traversal algorithm demonstrates a capability
to process massive graphs with billions of vertices and edges on commodity multicore processors
at rates that are competitive with supercomputing results in the recent literature. We also present
high performance scalable keyword scanning on streaming data using novel automata compression
algorithm, a model of computation based on small software content addressable memories (CAMs)
and a unique data layout that forces data re-use and minimizes memory traffic. Using a high-level
algorithmic approach to process financial feeds we present a solution that decodes and normalizes
option market data at rates an order of magnitude more than the current needs of the market, yet
portable and flexible to other feeds in this domain. In this dissertation we discuss in detail algorithm
design challenges to process massive-data and present solutions and techniques that we believe can
be used and extended to solve future research problems in this domain.
|
82 |
Falcon : A Graph Manipulation Language for Distributed Heterogeneous SystemsCheramangalath, Unnikrishnan January 2017 (has links) (PDF)
Graphs model relationships across real-world entities in web graphs, social network graphs, and road network graphs. Graph algorithms analyze and transform a graph to discover graph properties or to apply a computation. For instance, a pagerank algorithm computes a rank for each page in a webgraph, and a community detection algorithm discovers likely communities in a social network, while a shortest path algorithm computes the quickest way to reach a place from another, in a road network. In Domains such as social information systems, the number of edges can be in billions or trillions. Such large graphs are processed on distributed computer systems or clusters.
Graph algorithms can be executed on multi-core CPUs, GPUs with thousands of cores, multi-GPU devices, and CPU+GPU clusters, depending on the size of the graph object. While programming such algorithms on heterogeneous targets, a programmer is required to deal with parallelism and and also manage explicit data communication between distributed devices. This implies that a programmer is required to learn CUDA, OpenMP, MPI, etc., and also the details of the hardware architecture. Such codes are error prone and di cult to debug. A Domain Speci c Language (DSL) which hides all the hardware details and lets the programmer concentrate only the algorithmic logic will be very useful.
With this as the research goal, Falcon, graph DSL and its compiler have been developed. Falcon programs are explicitly parallel and Falcon hides all the hardware details from the programmer. Large graphs that do not t into the memory of a single device are automatically partitioned by the Falcon compiler. Another feature of Falcon is that it supports mutation of graph objects and thus enables programming dynamic graph algorithms. The Falcon compiler converts a single DSL code to heterogeneous targets such as multi-core CPUs, GPUs, multi-GPU devices, and CPU+GPU clusters. Compiled codes of Falcon match or outperform state-of-the-art graph frameworks for di erent target platforms and benchmarks.
|
83 |
Large Scale Graph Processing in a Distributed EnvironmentUpadhyay, Nitesh January 2017 (has links) (PDF)
Graph algorithms are ubiquitously used across domains. They exhibit parallelism, which can be exploited on parallel architectures, such as multi-core processors and accelerators. However, real world graphs are massive in size and cannot fit into the memory of a single machine. Such large graphs are partitioned and processed in a distributed cluster environment which consists of multiple GPUs and CPUs.
Existing frameworks that facilitate large scale graph processing in the distributed cluster have their own style of programming and require extensive involvement by the user in communication and synchronization aspects. Adaptation of these frameworks appears to be an overhead for a programmer. Furthermore, these frameworks have been developed to target only CPU clusters and lack the ability to harness the GPU architecture.
We provide a back-end framework to the graph Domain Specific Language, Falcon, for large scale graph processing on CPU and GPU clusters. The Motivation behind choosing this DSL as a front-end is its shared-memory based imperative programmability feature. Our framework generates Giraph code for CPU clusters. Giraph code runs on the Hadoop cluster and is known for scalable and fault-tolerant graph processing. For GPU cluster, Our framework applies a set of optimizations to reduce computation and communication latency, and generates efficient CUDA code coupled with MPI.
Experimental evaluations show the scalability and performance of our framework for both CPU and GPU clusters. The performance of the framework generated code is comparable to the manual implementations of various algorithms in distributed environments.
|
84 |
Connecting hitting sets and hitting paths in graphsCamby, Eglantine 30 June 2015 (has links)
Dans cette thèse, nous étudions les aspects structurels et algorithmiques de différents problèmes de théorie des graphes. Rappelons qu’un graphe est un ensemble de sommets éventuellement reliés par des arêtes. Deux sommets sont adjacents s’ils sont reliés par une arête.<p>Tout d’abord, nous considérons les deux problèmes suivants :le problème de vertex cover et celui de dominating set, deux cas particuliers du problème de hitting set. Un vertex cover est un ensemble de sommets qui rencontrent toutes les arêtes alors qu’un dominating set est un ensemble X de sommets tel que chaque sommet n’appartenant pas à X est adjacent à un sommet de X. La version connexe de ces problèmes demande que les sommets choisis forment un sous-graphe connexe. Pour les deux problèmes précédents, nous examinons le prix de la connexité, défini comme étant le rapport entre la taille minimum d’un ensemble répondant à la version connexe du problème et celle d’un ensemble du problème originel. Nous prouvons la difficulté du calcul du prix de la connexité d’un graphe. Cependant, lorsqu’on exige que le prix de la connexité d’un graphe ainsi que de tous ses sous-graphes induits soit borné par une constante fixée, la situation change complètement. En effet, pour les problèmes de vertex cover et de dominating set, nous avons pu caractériser ces classes de graphes pour de petites constantes.<p>Ensuite, nous caractérisons en termes de dominating sets connexes les graphes Pk- free, graphes n’ayant pas de sous-graphes induits isomorphes à un chemin sur k sommets. Beaucoup de problèmes sur les graphes sont étudiés lorsqu’ils sont restreints à cette classe de graphes. De plus, nous appliquons cette caractérisation à la 2-coloration dans les hypergraphes. Pour certains hypergraphes, nous prouvons que ce problème peut être résolu en temps polynomial.<p>Finalement, nous travaillons sur le problème de Pk-hitting set. Un Pk-hitting set est un ensemble de sommets qui rencontrent tous les chemins sur k sommets. Nous développons un algorithme d’approximation avec un facteur de performance de 3. Notre algorithme, basé sur la méthode primal-dual, fournit un Pk-hitting set dont la taille est au plus 3 fois la taille minimum d’un Pk-hitting set. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished
|
85 |
Sur quelques problèmes algorithmiques relatifs à la détermination de structure à partir de données de spectrométrie de masse / Topics in mass spectrometry based structure determinationAgarwal, Deepesh 18 May 2015 (has links)
La spectrométrie de masse, initialement développée pour de petites molécules, a permis au cours de la dernière écoulée d’étudier en phase gazeuse des assemblages macro-moléculaires intacts, posant nombre de questions algorithmiques difficiles, dont trois sont étudiées dans cette thèse. La première contribution concerne la détermination de stoichiométrie (SD), et vise à trouver le nombre de copies de chaque constituant dans un assemblage. On étudie le cas où la masse cible se trouve dans un intervalle dont les bornes rendent compte des incertitudes des mesures des masses. Nous présentons un algorithme de taille mémoire constante (DIOPHANTINE), et un algorithme de complexité sensible à la sortie (DP++), plus performants que l’état de l’art, pour des masses en nombre entier ou flottant. La seconde contribution traite de l’inférence de connectivité à partir d’une liste d’oligomères dont la composition en termes de sous-unités est connue. On introduit le problème d’inférence de connectivité minimale (MCI) et présente deux algorithmes pour le résoudre. On montre aussi un accord excellent entre les contacts trouvés et ceux détermines expérimentalement. La troisième contribution aborde le problème d’inférence de connectivité de poids minimal, lorsque chaque contact potentiel a un poids reflétant sa probabilité d’occurrence. On présente en particulier un algorithme de bootstrap permettant de trouver un ensemble d’arêtes de sensitivité et spécificité meilleures que celles obtenues pour les solutions du problème MCI. / Mass spectrometry (MS), an analytical technique initially invented to deal with small molecules, has emerged over the past decade as a key approach in structural biology. The recent advances have made it possible to transfer large macromolecular assemblies into the vacuum without their dissociation, raising challenging algorithmic problems. This thesis makes contributions to three such problems. The first contribution deals with stoichiometry determination (SD), namely the problem of determining the number of copies of each subunit of an assembly, from mass measurements. We deal with the interval SD problem, where the target mass belongs to an interval accounting for mass measurement uncertainties. We present a constant memory space algorithm (DIOPHANTINE), and an output sensitive dynamic programming based algorithm (DP++), outperforming state-of-the-art methods both for integer type and float type problems. The second contribution deals with the inference of pairwise contacts between subunits, using a list of sub-complexes whose composition is known. We introduce the Minimum Connectivity Inference problem (MCI) and present two algorithms solving it. We also show an excellent agreement between the contacts reported by these algorithms and those determined experimentally. The third contribution deals with Minimum Weight Connectivity Inference (MWCI), a problem where weights on candidate edges are available, reflecting their likelihood. We present in particular a bootstrap algorithm allowing one to report a set of edges with improved sensitivity and specificity with respect to those obtaining upon solving MCI.
|
86 |
Registrace fotografií do 3D modelu terénu / Registration of Photos to 3D ModelDeák, Jaromír January 2017 (has links)
This work refers existing solutions and options for the task registration of photos to 3D model based on the previous knowledge of the geographic position of the camera. The contribution of the work are new ways and possibilities of the solution with the usage of graph algorithms. In this area, the work interests are useful points of interest detection in input data, a construction of graphs and graph matching possibilities.
|
87 |
Modelování rizik v dopravě / Risk modelling in transportationLipovský, Tomáš January 2016 (has links)
This thesis deals with theoretical basics of risk modelling in transportation and optimization using aggregated traffic data. In this thesis is suggested the procedure and implemented the application solving network problem of shortest path between geographical points. The thesis includes method for special paths evaluation depending on the frequency of traffic incidents based on real historical data. The thesis also includes a~graphical interface for presentation of the achieved results.
|
88 |
Two Problems in Computational GenomicsBelal, Nahla Ahmed 22 March 2011 (has links)
This work addresses two novel problems in the field of computational genomics. The first is whole genome alignment and the second is inferring horizontal gene transfer using posets. We define these two problems and present algorithmic approaches for solving them. For the whole genome alignment, we define alignment graphs for representing different evolutionary events, and define a scoring function for those graphs. The problem defined is proven to be NP-complete. Two heuristics are presented to solve the problem, one is a dynamic programming approach that is optimal for a class of sequences that we define in this work as breakable arrangements. And, the other is a greedy approach that is not necessarily optimal, however, unlike the dynamic programming approach, it allows for reversals. For inferring horizontal gene transfer, we define partial order sets among species, with respect to different genes, and infer genes involved in horizontal gene transfer by comparing posets for different genes. The posets are used to construct a tree for each gene. Those trees are then compared and tested for contradiction, where contradictory trees correspond to genes that are candidates of horizontal gene transfer. / Ph. D.
|
89 |
Capacité opérative des réseaux de transfert de pétrole / Operative capacity of crude oil local transportation networksRojas d'Onofrio, Jorge 17 March 2011 (has links)
Cette thèse étudie des systèmes locaux de gestion de transfert de pétrole ayant une architecture de réseau de canalisation. Pour leur représentativité, deux systèmes localisés au Venezuela et appartenant à l'entreprise PDVSA (Pétroles du Venezuela) ont été retenus pour illustrer les méthodes proposées et les valider : le Terminal Maritime de Pétrole de Guaraguao et le Centre de Stockage de Punta de Palmas. Dans ces réseaux des connexions, appelées « alignements », sont établies en ouvrant/fermant des vannes à travers d'un système SCADA (Supervisory Control and Data Acquisition). Le choix d'un alignement doit tenir compte de critères d'optimisation. La minimisation des interférences avec d'autres alignements, liée à la notion de capacité opérative, a été identifiée comme le critère de choix le plus important. Les contributions de cette thèse reposent sur une modélisation sous forme de graphes, et sur des algorithmes appartenant au domaine de la recherche opérationnelle. Elles contribuent à fournir aux opérateurs de supervision des outils d'analyse permettant d'optimiser le choix des alignements. Des indicateurs permettant de quantifier l'impact des opérations d'alignement ou des défaillances, sur la capacité opérative du système, sont proposés. La minimisation de l'impact sur la capacité opérative, va correspondre à la minimisation des interférences avec des alignements potentiels. Un algorithme de calcul de ces indicateurs, est présenté, ainsi que des algorithmes de recherche de chemin, de détermination d'éléments critiques, et de recherche d'alignements utilisant des pompes. Ces algorithmes sont basés sur des algorithmes classiques s'adressant au problème du plus court chemin, du flot maximum et du nombre maximum de chemins disjoints. Cependant, ils utilisent des méthodes innovantes, comme l'ajout de contraintes considérant l'existence de sous-types d'alignements, le calcul dynamique des coûts des chemins à partir de son impact sur la capacité opérative, et la recherche de chemins via un point intermédiaire obligatoire. Les contributions sont potentiellement applicables dans des domaines autres que le transport de pétrole. Les algorithmes ont été mis en œuvre en utilisant le langage Python et ont été testés en utilisant les données réelles des réseaux étudiés. L'objectif à moyen terme de ces travaux est le développement d'un logiciel d'assistance à la prise de décision. / This thesis studies local crude oil transportation systems with a pipe network architecture. Two representative systems, belonging to PDVSA (Venezuelan oil company), have been studied: the Guaraguao Crude Oil Seaport and the Punta de Palmas Tanks Yard. In this systems, connections, called "alignments", are established by opening/closing valves using a SCADA(Supervisory Control and Data Acquisition) system. Alignment choice is made based on optimization criteria. Interferences minimization with other alignments, related to the notion of operative capacity, has been identified as the most important criterion. The contributions of this thesis are based on graph modelling and algorithms from operational research. The main goal is to provide analysis tools allowing alignment choice optimization. Indexes permitting the quantification of alignments or failures impact on the operative capacity of the system are proposed. Minimizing the impact on the operative capacity will correspond to minimizing interferences with potential alignments. An algorithm computing these indexes is presented, as well as complementary developments such as a path search algorithm, an algorithm for critical elements determination, and algorithm for alignments using pumps. These algorithms are based on classical algorithms for the shortest path problem, the maximum flow problem and the maximum disjoint paths problem. However, they use innovative methods such as adding constraints when considering alignment sub-types, the dynamic computation of path costs based on their impact on operative capacity, and path search considering an obligatory intermediate node. These contributions can potentially be applied in areas other than oil transportation. The algorithms had been implemented in Python and had been tested using real data from the studied systems. The middle term goal of these works is the development of assistance software for decision making.
|
90 |
A Modified Sum-Product Algorithm over Graphs with Short CyclesRaveendran, Nithin January 2015 (has links) (PDF)
We investigate into the limitations of the sum-product algorithm for binary low density parity check (LDPC) codes having isolated short cycles. Independence assumption among messages passed, assumed reasonable in all configurations of graphs, fails the most
in graphical structures with short cycles. This research work is a step forward towards
understanding the effect of short cycles on error floors of the sum-product algorithm.
We propose a modified sum-product algorithm by considering the statistical dependency
of the messages passed in a cycle of length 4. We also formulate a modified algorithm in
the log domain which eliminates the numerical instability and precision issues associated
with the probability domain. Simulation results show a signal to noise ratio (SNR) improvement for the modified sum-product algorithm compared to the original algorithm.
This suggests that dependency among messages improves the decisions and successfully
mitigates the effects of length-4 cycles in the Tanner graph. The improvement is significant at high SNR region, suggesting a possible cause to the error floor effects on such graphs. Using density evolution techniques, we analysed the modified decoding algorithm. The threshold computed for the modified algorithm is higher than the threshold computed for the sum-product algorithm, validating the observed simulation results. We also prove that the conditional entropy of a codeword given the estimate obtained using the modified algorithm is lower compared to using the original sum-product algorithm.
|
Page generated in 0.0584 seconds