
Snippet Generation for Provenance Workflows

Bhatti, Ayesha, January 2011
Scientists often need to know how data was derived, in addition to what it is. Detailed tracking of data transformations, or provenance, enables result reproducibility, knowledge reuse and data analysis. Scientific workflows are increasingly used to represent provenance, as they can record complicated processes at various levels of detail. In the context of knowledge reuse and sharing, search technology is of paramount importance, especially considering the huge and ever-increasing amount of scientific data. Owing to the sheer volume and complicated structure of provenance, it is computationally hard to produce a single exact answer to a user's query. One solution to this difficult problem is to produce a list of candidate matches and let the user select the most relevant result. Search result presentation then becomes very important, as the user must make the final decision by examining the workflows in the result list. The presentation of these candidate matches needs to be brief, precise, clear and revealing. This is a challenging task for workflows, as they contain textual content as well as graphical structure. Current workflow search engines such as Yahoo! Pipes or myExperiment ignore the actual workflow specification and use metadata to create summaries. Workflows that lack metadata therefore yield poor summaries even when they are useful and relevant to the query. This work investigates the possibility of creating meaningful and usable summaries, or snippets, based on the structure and specification of workflows.
We shall (1) present relevant published work on snippet-building techniques, (2) explain how we mapped current techniques to our problem, (3) describe how we identified techniques from interface design theory to build a usable graphical interface, (4) present the implementation of two new algorithms for workflow graph compression, together with their complexity analysis, and (5) identify future work on our implementation and outline open research problems in the field of snippet building.
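One natural building block for workflow graph compression is collapsing linear chains of steps into single summary nodes, so that a long pipeline renders as a compact snippet. The abstract does not specify the thesis's two algorithms, so the following is only a minimal single-machine sketch of the general idea; the function name `compress_chains` and the edge-tuple representation are assumptions made for illustration:

```python
from collections import defaultdict

def compress_chains(edges):
    """Collapse linear chains (nodes with exactly one predecessor and one
    successor) into single collapsed edges, a common graph-summarization
    step when rendering compact workflow snippets."""
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))
    # A node is "internal" to a chain if it has exactly one in- and out-edge.
    internal = {n for n in nodes
                if len(pred[n]) == 1 and len(succ[n]) == 1}
    out = []
    for u, v in edges:
        if u in internal:
            continue              # interior edge, absorbed into a chain
        chain, w = [], v
        while w in internal:      # walk to the end of the chain
            chain.append(w)
            w = succ[w][0]
        if chain:
            out.append((u, w, tuple(chain)))   # collapsed chain
        else:
            out.append((u, v, ()))             # ordinary edge
    return out
```

For the four-step workflow A→B→C→D with a shortcut A→D, this yields two edges: the chain B,C collapses into a single A→D summary edge, which is the kind of reduction a snippet view needs.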

Mining Tera-Scale Graphs: Theory, Engineering and Discoveries

Kang, U, 01 May 2012
How do we find patterns and anomalies in graphs with billions of nodes and edges, which do not fit in memory? How do we use parallelism for such tera- or peta-scale graphs? In this thesis, we propose PEGASUS, a large-scale graph mining system implemented on top of the HADOOP platform, the open-source implementation of MAPREDUCE. PEGASUS includes algorithms that help us spot patterns and anomalous behaviors in large graphs. PEGASUS enables structure analysis on large graphs: we unify many different structure-analysis algorithms, including connected components, PageRank, and radius/diameter computation, into a general primitive called GIM-V. GIM-V is highly optimized, achieving good scale-up in the number of edges and available machines. Using GIM-V, we discover surprising patterns, including the seven degrees of separation in one of the largest publicly available Web graphs, with 7 billion edges. PEGASUS also enables inference and spectral analysis on large graphs. We design an efficient distributed belief propagation algorithm that infers the states of unlabeled nodes given a set of labeled nodes. We also develop an eigensolver for computing the top k eigenvalues and eigenvectors of the adjacency matrices of very large graphs, and use it to discover anomalous adult advertisers in the who-follows-whom Twitter graph with 3 billion edges. In addition, we develop an efficient tensor decomposition algorithm and use it to analyze a large knowledge-base tensor. Finally, PEGASUS supports the management of large graphs: we propose efficient graph storage and indexing methods to answer graph mining queries quickly, and develop an edge layout algorithm for better graph compression.
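The GIM-V primitive generalizes iterated matrix-vector multiplication through three user-supplied operations: combine2 (pair a matrix entry with a vector entry), combineAll (aggregate the partial results for a node), and assign (merge the aggregate into the node's current value). A minimal single-machine sketch of this idea, not the HADOOP implementation the abstract describes, shows how connected components reduces to minimum-label propagation; the function names and edge-list representation here are assumptions for illustration:

```python
from collections import defaultdict

def gim_v(edges, v, combine2, combine_all, assign):
    """One GIM-V step: v'_i = assign(v_i, combineAll_j(combine2(m_ij, v_j)))."""
    gathered = defaultdict(list)
    for i, j in edges:                      # m_ij = 1 for each edge i -> j
        gathered[i].append(combine2(1, v[j]))
    return {i: assign(v[i], combine_all(gathered[i])) if gathered[i] else v[i]
            for i in v}

def connected_components(edges, nodes):
    """Minimum-label propagation expressed as iterated GIM-V."""
    und = edges + [(j, i) for i, j in edges]   # undirected for components
    v = {n: n for n in nodes}                  # initial label = node id
    while True:
        nv = gim_v(und, v,
                   combine2=lambda m, vj: vj,  # take the neighbor's label
                   combine_all=min,            # smallest label seen
                   assign=min)                 # keep the minimum overall
        if nv == v:
            return v
        v = nv
```

Swapping in a different operation triple (for example, weighted sums for PageRank, or max-plus updates for radius estimation) reuses the same iteration skeleton, which is what makes a single optimized primitive worthwhile at tera-scale.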
