Global ETD Search

1	Performance Optimization Techniques and Tools for Distributed Graph Processing Kalavri, Vasiliki January 2016 (has links) In this thesis, we propose optimization techniques for distributed graph processing. First, we describe a data processing pipeline that leverages an iterative graph algorithm for automatic classification of web trackers. Using this application as a motivating example, we examine how asymmetrical convergence of iterative graph algorithms can be used to reduce the amount of computation and communication in large-scale graph analysis. We propose an optimization framework for fixpoint algorithms and a declarative API for writing fixpoint applications. Our framework uses a cost model to automatically exploit asymmetrical convergence and evaluate execution strategies during runtime. We show that our cost model achieves speedup of up to 1.7x and communication savings of up to 54%. Next, we propose to use the concepts of semi-metricity and the metric backbone to reduce the amount of data that needs to be processed in large-scale graph analysis. We provide a distributed algorithm for computing the metric backbone using the vertex-centric programming model. Using the backbone, we can reduce graph sizes up to 88% and achieve speedup of up to 6.7x. / <p>QC 20160919</p> Graph processing distributed systems big data
2	Temporal Graph Mining and Distributed Processing Kumar, Rohit 19 June 2018 (has links) With the recent growth of social media platforms and the human desire to interact with the digital world a lot of human-human and human-device interaction data is getting generated every second. With the boom of the Internet of Things (IoT) devices, a lot of device-device interactions are also now on the rise. All these interactions are nothing but a representation of how the underlying network is connecting different entities over time. These interactions when modeled as an interaction network presents a lot of unique opportunities to uncover interesting patterns and to understand the dynamics of the network. Understanding the dynamics of the network is very important because it encapsulates the way we communicate, socialize, consume information and get influenced. To this end, in this PhD thesis, we focus on analyzing an interaction network to understand how the underlying network is being used. We define interaction network as a sequence of time-stamped interactions E over edges of a static graph G=(V, E). Interaction networks can be used to model many real-world networks for example, in a social network or a communication network, each interaction over an edge represents an interaction between two users, e.g. emailing, making a call, re-tweeting, or in case of the financial network an interaction between two accounts to represent a transaction.We analyze interaction network under two settings. In the first setting, we study interaction network under a sliding window model. We assume a node could pass information to other nodes if they are connected to them using edges present in a time window. In this model, we study how the importance or centrality of a node evolves over time. In the second setting, we put additional constraints on how information flows between nodes. We assume a node could pass information to other nodes only if there is a temporal path between them. To restrict the length of the temporal paths we consider a time window in this approach as well. We apply this model to solve the time-constrained influence maximization problem. By analyzing the interaction network data under our model we find the top-k most influential nodes. We test our model both on human-human interaction using social network data as well as on location-location interaction using location-based social network(LBSNs) data. In the same setting, we also mine temporal cyclic paths to understand the communication patterns in a network. Temporal cycles have many applications and appear naturally in communication networks where one person posts a message and after a while reacts to a thread of reactions from peers on the post. In financial networks, on the other hand, the presence of a temporal cycle could be indicative of certain types of fraud. We provide efficient algorithms for all our analysis and test their efficiency and effectiveness on real-world data.Finally, given that many of the algorithms we study have huge computational demands, we also studied distributed graph processing algorithms. An important aspect of these algorithms is to correctly partition the graph data between different machines. A lot of research has been done on efficient graph partitioning strategies but there is no one good partitioning strategy for all kind of graphs and algorithms. Choosing the best partitioning strategy is nontrivial and is mostly a trial and error exercise. To address this problem we provide a cost model based approach to give a better understanding of how a given partitioning strategy is performing for a given graph and algorithm. / Doctorat en Sciences de l'ingénieur et technologie / info:eu-repo/semantics/nonPublished Sciences de l'ingénieur Temporal Graph Graph Mining Distributed Graph Processing
3	Scaling Big Data Cleansing Khayyat, Zuhair 31 July 2017 (has links) Data cleansing approaches have usually focused on detecting and fixing errors with little attention to big data scaling. This presents a serious impediment since identify- ing and repairing dirty data often involves processing huge input datasets, handling sophisticated error discovery approaches and managing huge arbitrary errors. With large datasets, error detection becomes overly expensive and complicated especially when considering user-defined functions. Furthermore, a distinctive algorithm is de- sired to optimize inequality joins in sophisticated error discovery rather than na ̈ıvely parallelizing them. Also, when repairing large errors, their skewed distribution may obstruct effective error repairs. In this dissertation, I present solutions to overcome the above three problems in scaling data cleansing. First, I present BigDansing as a general system to tackle efficiency, scalability, and ease-of-use issues in data cleansing for Big Data. It automatically parallelizes the user’s code on top of general-purpose distributed platforms. Its programming inter- face allows users to express data quality rules independently from the requirements of parallel and distributed environments. Without sacrificing their quality, BigDans- ing also enables parallel execution of serial repair algorithms by exploiting the graph representation of discovered errors. The experimental results show that BigDansing outperforms existing baselines up to more than two orders of magnitude. Although BigDansing scales cleansing jobs, it still lacks the ability to handle sophisticated error discovery requiring inequality joins. Therefore, I developed IEJoin as an algorithm for fast inequality joins. It is based on sorted arrays and space efficient bit-arrays to reduce the problem’s search space. By comparing IEJoin against well- known optimizations, I show that it is more scalable, and several orders of magnitude faster. BigDansing depends on vertex-centric graph systems, i.e., Pregel, to efficiently store and process discovered errors. Although Pregel scales general-purpose graph computations, it is not able to handle skewed workloads efficiently. Therefore, I introduce Mizan, a Pregel system that balances the workload transparently during runtime to adapt for changes in computing needs. Mizan is general; it does not assume any a priori knowledge of the graph structure or the algorithm behavior. Through extensive evaluations, I show that Mizan provides up to 84% improvement over techniques leveraging static graph pre-partitioning. data cleansing inequality join graph processing big data pregel spark
4	Dynamic memory management for the Loci framework Zhang, Yang 08 May 2004 (has links) Resource management is a critical part in high-performance computing software. While management of processing resources to increase performance is the most critical, efficient management of memory resources plays an important role in solving large problems. This thesis research seeks to create an effective dynamic memory management scheme for a declarative data-parallel programming system. In such systems, some sort of automatic resource management is a requirement. Using the Loci framework, this thesis research focuses on exploring such opportunities. We believe there exists an automatic memory management scheme for such declarative data-parallel systems that provides good compromise between memory utilization and performance. In addition to basic memory management, this thesis research also seeks to develop methods that take advantages of the cache memory subsystem and explore balances between memory utilization and parallel communication costs in such declarative data-parallel frameworks. graph processing parallel programming declarative languages software synthesis
5	Using Workload Characterization to Guide High Performance Graph Processing Hassan, Mohamed Wasfy Abdelfattah 24 May 2021 (has links) Graph analytics represent an important application domain widely used in many fields such as web graphs, social networks, and Bayesian networks. The sheer size of the graph data sets combined with the irregular nature of the underlying problem pose a significant challenge for performance, scalability, and power efficiency of graph processing. With the exponential growth of the size of graph datasets, there is an ever-growing need for faster more power efficient graph solvers. The computational needs of graph processing can take advantage of the FPGAs' power efficiency and customizable architecture paired with CPUs' general purpose processing power and sophisticated cache policies. CPU-FPGA hybrid systems have the potential for supporting performant and scalable graph solvers if both devices can work coherently to make up for each other's deficits. This study aims to optimize graph processing on heterogeneous systems through interdisciplinary research that would impact both the graph processing community, and the FPGA/heterogeneous computing community. On one hand, this research explores how to harness the computational power of FPGAs and how to cooperatively work in a CPU-FPGA hybrid system. On the other hand, graph applications have a data-driven execution profile; hence, this study explores how to take advantage of information about the graph input properties to optimize the performance of graph solvers. The introduction of High Level Synthesis (HLS) tools allowed FPGAs to be accessible to the masses but they are yet to be performant and efficient, especially in the case of irregular graph applications. Therefore, this dissertation proposes automated frameworks to help integrate FPGAs into mainstream computing. This is achieved by first exploring the optimization space of HLS-FPGA designs, then devising a domain-specific performance model that is used to build an automated framework to guide the optimization process. Moreover, the architectural strengths of both CPUs and FPGAs are exploited to maximize graph processing performance via an automated framework for workload distribution on the available hardware resources. / Doctor of Philosophy / Graph processing is a very important application domain, which is emphasized by the fact that many real-world problems can be represented as graph applications. For instance, looking at the internet, web pages can be represented as the graph vertices while hyper links between them represent the edges. Analyzing these types of graphs is used for web search engines, ranking websites, and network analysis among other uses. However, graph processing is computationally demanding and very challenging to optimize. This is due to the irregular nature of graph problems, which can be characterized by frequent indirect memory accesses. Such a memory access pattern is dependent on the data input and impossible to predict, which renders CPUs' sophisticated caching policies useless to performance. With the rise of heterogeneous computing that enabled using hardware accelerators, a new research area was born, attempting to maximize performance by utilizing the available hardware devices in a heterogeneous ecosystem. This dissertation aims to improve the efficiency of utilizing such heterogeneous systems when targeting graph applications. More specifically, this research focuses on the collaboration of CPUs and FPGAs (Field Programmable Gate Arrays) in a CPU-FPGA hybrid system. Innovative ideas are presented to exploit the strengths of each available device in such a heterogeneous system, as well as addressing some of the inherent challenges of graph processing. Automated frameworks are introduced to efficiently utilize the FPGA devices, in addition to distributing and scheduling the workload across multiple devices to maximize the performance of graph applications. Configurable Computing Graph Processing Heterogeneous Systems High Performance Computing
6	Road traffic congestion detection and tracking with Spark Streaming analytics Thorri Sigurdsson, Thorsteinn January 2018 (has links) Road traffic congestion causes several problems. For instance, slow moving traffic in congested regions poses a safety hazard to vehicles approaching the congested region and increased commuting times lead to higher transportation costs and increased pollution.The work carried out in this thesis aims to detect and track road traffic congestion in real time. Real-time road congestion detection is important to allow for mechanisms to e.g. improve traffic safety by sending advanced warnings to drivers approaching a congested region and to mitigate congestion by controlling adaptive speed limits. In addition, the tracking of the evolution of congestion in time and space can be a valuable input to the development of the road network. Traffic sensors in Stockholm’s road network are represented as a directed weighted graph and the congestion detection problem is formulated as a streaming graph processing problem. The connected components algorithm and existing graph processing algorithms originally used for community detection in social network graphs are adapted for the task of road congestion detection. The results indicate that a congestion detection method based on the streaming connected components algorithm and the incremental Dengraph community detection algorithm can detect congestion with accuracy at best up to 94% for connected components and up to 88% for Dengraph. A method based on hierarchical clustering is able to detect congestion while missing details such as shockwaves, and the Louvain modularity algorithm for community detection fails to detect congested regions in the traffic sensor graph.Finally, the performance of the implemented streaming algorithms is evaluated with respect to the real-time requirements of the system, their throughput and memory footprint. / Vägtrafikstockningar orsakar flera problem. Till exempel utgör långsam trafik i överbelastade områden en säkerhetsrisk för fordon som närmar sig den överbelastade regionen och ökade pendeltider leder till ökade transportkostnader och ökad förorening.Arbetet i denna avhandling syftar till att upptäcka och spåra trafikstockningar i realtid. Detektering av vägtrafiken i realtid är viktigt för att möjliggöra mekanismer för att t.ex. förbättra trafiksäkerheten genom att skicka avancerade varningar till förare som närmar sig en överbelastad region och för att mildra trängsel genom att kontrollera adaptiva hastighetsgränser. Dessutom kan spårningen av trängselutveckling i tid och rum vara en värdefull inverkan på utvecklingen av vägnätet. Trafikavkännare i Stockholms vägnät representeras som en riktad vägd graf och problemet med överbelastningsdetektering är formulerat som ett problem med behandling av flödesgrafer. Den anslutna komponentalgoritmen och befintliga grafbehandlingsalgoritmer som ursprungligen användes för communitydetektering i sociala nätgravar är anpassade för uppgiften att detektera vägtäthet. Resultaten indikerar att en överbelastningsdetekteringsmetod baserad på den strömmande anslutna komponentalgoritmen och den inkrementella Dengraph communitydetekteringsalgoritmen kan upptäcka överbelastning med noggrannhet i bästa fall upp till 94% för anslutna komponenter och upp till 88% för Dengraph. En metod baserad på hierarkisk klustring kan detektera överbelastning men saknar detaljer som shockwaves, och Louvain modularitetsalgoritmen för communitydetektering misslyckas med att detektera överbelastade områden i trafiksensorns graf.Slutligen utvärderas prestandan hos de implementerade strömmalgoritmerna med hänsyn till systemets realtidskrav, deras genomströmning och minnesfotavtryck. streaming graph processing congestion community detection connected components Computer and Information Sciences Data- och informationsvetenskap
7	Untersuchung zur Eignung von Graphdatenbanksystemen für die Analyse von Informationsnetzwerken Junghanns, Martin 27 February 2018 (has links) In der vorliegenden Masterarbeit werden verschiedene Graphdatenbanksysteme in einer funktionalen und technischen Evaluation hinsichtlich ihrer Eignung für ein aktuelles Forschungsvorhaben der Abteilung Datenbanken der Universität Leipzig untersucht. Ziel des Forschungsprojektes ist die Integration von Unternehmensdaten in ein Informationsnetzwerk und eine darauf aufbauendegraphenorientierte Analyse der Daten. info:eu-repo/classification/ddc/000 ddc:000
8	Scalable Map-Reduce Algorithms for Mining Formal Concepts and Graph Substructures Kumar, Lalit January 2018 (has links) No description available. Computer Science Data Mining Graph Processing Formal Concepts Hadoop, Map-Reduce Maximal Cliques Real Valued Biclusters
9	Distributed Local Trust Propagation Model and its Cloud-based Implementation. Althuru, Dharan Kumar Reddy 23 May 2014 (has links) No description available. Computer Science Trust Propagation Scalable Trust Computation Trust Aware Recommender Systems Distributed Graph Processing Information Extraction
10	Towards Energy Efficient Data Mining & Graph Processing Faisal, S M January 2015 (has links) No description available. Computer Science Energy Efficiency Data Mining Graph Processing Energy Efficient Computing

Search results