Social network data analysis has become increasingly important. To improve the integration and reuse of their data, many social networks have started to use RDF to represent it. Accordingly, one common approach to social network data analysis is to query the RDF data with SPARQL.
As social networks grow rapidly, queries need to be executed in parallel, for example with the MapReduce framework. However, the state-of-the-art translation from SPARQL queries to MapReduce jobs mainly follows a two-layer scheme in which SPARQL is first translated into SQL joins, and this scheme is not efficient. In this thesis, we introduce two primitives that enable automatic translation from SPARQL to MapReduce as well as efficient execution of SPARQL queries. We use multiple-join-with-filter to replace traditional SQL multiple joins where feasible, and we merge different stages of the MapReduce query workflow. Evaluation on social network benchmarks shows that these two primitives can achieve up to a 2x speedup in query running time compared with the original two-layer scheme.
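The abstract does not spell out how multiple-join-with-filter works. As a rough illustration only, the sketch below simulates in plain Python one generic way a star-shaped SPARQL pattern could be answered in a single MapReduce-style pass: the map step tags each matching triple with its pattern and keys it on the shared subject variable, and the reduce step joins all patterns at once and applies the FILTER, rather than running one binary join per job and filtering afterwards. The triples, the `PATTERNS` list, and all function names are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy RDF graph as (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "name", "Alice"),
    ("alice", "age", 34),
    ("bob", "knows", "carol"),
    ("bob", "name", "Bob"),
    ("bob", "age", 25),
    ("carol", "name", "Carol"),
]

# Star-shaped pattern sharing the subject variable ?x:
#   ?x knows ?y . ?x name ?n . ?x age ?a . FILTER(?a > 30)
# Each entry is (predicate, variable bound to the object).
PATTERNS = [("knows", "y"), ("name", "n"), ("age", "a")]


def map_phase(triples):
    """Emit (join key, (pattern index, object)) for each triple matching a pattern."""
    for s, p, o in triples:
        for i, (pred, _var) in enumerate(PATTERNS):
            if p == pred:
                yield s, (i, o)


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values, keep):
    """Join every pattern for one ?x in a single step and apply the filter."""
    per_pattern = [[] for _ in PATTERNS]
    for i, obj in values:
        per_pattern[i].append(obj)
    if any(not objs for objs in per_pattern):
        return  # some pattern has no match for this ?x, so no result
    for combo in product(*per_pattern):
        binding = {"x": key}
        binding.update({var: obj for (_p, var), obj in zip(PATTERNS, combo)})
        if keep(binding):
            yield binding


def multi_join_with_filter(triples, keep):
    """One simulated map/shuffle/reduce pass instead of a chain of binary joins."""
    groups = shuffle(map_phase(triples))
    results = []
    for key in sorted(groups):
        results.extend(reduce_phase(key, groups[key], keep))
    return results


rows = multi_join_with_filter(TRIPLES, keep=lambda b: b["a"] > 30)
print(rows)  # [{'x': 'alice', 'y': 'bob', 'n': 'Alice', 'a': 34}]
```

Here "bob" is joined but discarded by the filter inside the reducer, and "carol" produces nothing because two of the three patterns have no match. In a real cluster, the map, shuffle, and reduce steps would run as one distributed job; this in-memory version only shows the data flow.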
Identifier | oai:union.ndltd.org:UMASS/oai:scholarworks.umass.edu:theses-2175 |
Date | 01 January 2013 |
Creators | Liu, Liu |
Publisher | ScholarWorks@UMass Amherst |
Source Sets | University of Massachusetts, Amherst |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Masters Theses 1911 - February 2014 |