Global ETD Search

Return to search

Digital Provenance Techniques and Applications

This thesis describes a data provenance framework and other associated frameworks for utilizing provenance for data quality and reproducibility. We first identify the requirements for the design of a comprehensive provenance framework which can be applicable to various applications, supports a rich set of provenance metadata, and is interoperable with other provenance management systems. We then design and develop a provenance framework, called SimP, addressing such requirements. Next, we present four prominent applications and investigate how provenance data can be beneficial to such applications. The first application is the quality assessment of access control policies. Towards this, we design and implement the ProFact framework which uses provenance techniques for collecting comprehensive data about actions which were either triggered due to a network context or a user (i.e., a human or a device) action. Provenance data are used to determine whether the policies meet the quality requirements. ProFact includes two approaches for policy analysis: structure-based and classification-based. For the structure-based approach, we design tree structures to organize and assess the policy set efficiently. For the classification-based approach, we employ several classification techniques to learn the characteristics of policies and predict their quality. In addition, ProFact supports policy evolution and the assessment of its impact on the policy quality. The second application is workflow reproducibility. Towards this, we implement ProWS which is a provenance-based architecture for retrieving workflows. Specifically, ProWS transforms data provenance into workflows and then organizes data into a set of indexes to support efficient querying mechanisms. ProWS supports composite queries on three types of search criteria: keywords of workflow tasks, patterns of workflow structure, and metadata about workflows (e.g., how often a workflow was used). The third application is the access control policy reproducibility. Towards this, we propose a novel framework, Polisma, which generates attribute-based access control policies from data, namely from logs of historical access requests and their corresponding decisions. Polisma combines data mining, statistical, and machine learning techniques, and capitalizes on potential context information obtained from external sources (e.g., LDAP directories) to enhance the learning process. The fourth application is the policy reproducibility by utilizing knowledge and experience transferability. Towards this, we propose a novel framework, FLAP, which transfer attribute-based access control policies between different parties in a collaborative environment, while considering the challenges of minimal sharing of data and support policy adaptation to address conflict. All frameworks are evaluated with respect to performance and accuracy.

10.25394/pgs.12799934.v1

Applied Computer Science

data provenance

access control

policy quality analysis

policy learning

scientific workflows

Identifer	oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/12799934
Date	13 August 2020
Creators	Amani M Abu Jabal (9237002)
Source Sets	Purdue University
Detected Language	English
Type	Text, Thesis
Relation	https://figshare.com/articles/thesis/Digital_Provenance_Techniques_and_Applications/12799934

Page generated in 0.0023 seconds

Digital Provenance Techniques and Applications

Description

Links & Downloads

Tags

Additional Fields