Traditional database systems manage data, but often do not address its provenance. In the past, users were often implicitly familiar with data they used, how it was created (and hence how it might be appropriately used), and from which sources it came. Today, users may be physically and organizationally remote from the data they use, so this information may not be easily accessible to them. In recent years, several models have been proposed for recording provenance of data. Our work is motivated by opportunities to make provenance easy to manage and query. For example, current approaches model provenance as expressions that may be easily stored alongside data, but are difficult to parse and reconstruct for querying, and are difficult to query with available languages. We contribute a conceptual model for data and provenance, and evaluate how well it addresses these opportunities. We compare the expressive power of our model's language to that of other models. We also define a benchmark suite with which to study performance of our model, and use this suite to study key model aspects implemented on existing software platforms. We discover some salient performance bottlenecks in these implementations, and suggest future work to explore improvements. Finally, we show that our implementations can comprise a logical model that faithfully supports our conceptual model.
Identifer | oai:union.ndltd.org:pdx.edu/oai:pdxscholar.library.pdx.edu:open_access_etds-1132 |
Date | 01 January 2011 |
Creators | Archer, David William |
Publisher | PDXScholar |
Source Sets | Portland State University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Dissertations and Theses |
Page generated in 0.0018 seconds