71 |
MPPI: um modelo de procedência para subsidiar processos de integração / MPPI: a provenance model to support data integration processes
Bruno Tomazela 05 February 2010
Data provenance is the set of metadata that makes it possible to identify the sources of data and the transformations applied to them, from their creation to their current state. There are several reasons to incorporate data provenance into data integration processes: to assess the quality and reliability of data from heterogeneous sources, to audit data, to attribute authorship and ownership of data, and to reproduce data integration decisions. This master's thesis proposes MPPI, a novel data provenance model that supports data integration processes. The model focuses on systems in which only owners can update their data sources, i.e., the integration process cannot correct data conflicts directly in the sources. The main requirement of MPPI is to handle all integration decisions made by the user in previous integration processes, so that they can be automatically reapplied in subsequent integration processes.
MPPI has four characteristics. First, it maps data provenance into copy, edit, insert and remove operations, which are stored in an operation repository. Second, it handles overlapping operations through four policies: blind, restrict, undo and redo. Third, it identifies anomalies that arise because autonomous sources may update their data between two integration processes, and proposes four validation approaches to deal with these anomalies: full validation, source validation, target validation and no validation. Fourth, it introduces two methods for reapplying operations, VRS (Validate and Reapply in Separate) and VRT (Validate and Reapply in Tandem), together with safe reordering of the repository; these guarantee that all integration decisions taken by the user in previous integration processes are resolved automatically and in the same way in subsequent ones.
MPPI was validated through performance tests that investigated the handling of overlapping operations, the VRT method and safe reordering. The tests showed that the policies proposed for overlapping operations are feasible to implement in real integration systems. The results also showed that the VRT method provides significant performance gains over data gathering when the goal is to re-establish the results of integration processes that have already been executed at least once: the average performance gain of VRT was at least 93%. Furthermore, the tests showed that reordering the operations before reapplication can further improve the performance of the VRT method.
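The operation repository and the validate-then-reapply idea can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Operation` fields, the dictionary-based source and target, and the function names `is_valid`, `apply_operation` and `reapply_in_tandem` are all assumptions made for illustration, and the overlap policies (blind, restrict, undo, redo) and safe reordering are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Operation:
    """One stored integration decision (hypothetical record layout)."""
    kind: str                              # "copy", "edit", "insert" or "remove"
    target_key: str                        # key in the integrated dataset
    source_key: Optional[str] = None       # key in the source (copy only)
    expected_source: Optional[str] = None  # source value seen when the decision was made
    expected_target: Optional[str] = None  # target value seen before the operation
    new_value: Optional[str] = None        # value written by edit/insert


def is_valid(op: Operation, source: Dict[str, str], target: Dict[str, str],
             mode: str = "full") -> bool:
    """Validate an operation; mode is "full", "source", "target" or "none"."""
    check_source = mode in ("full", "source")
    check_target = mode in ("full", "target")
    if check_source and op.kind == "copy":
        if source.get(op.source_key) != op.expected_source:
            return False  # the autonomous source changed since the decision
    if check_target and op.kind in ("copy", "edit", "remove"):
        if target.get(op.target_key) != op.expected_target:
            return False  # the target is not in the state the decision assumed
    if check_target and op.kind == "insert" and op.target_key in target:
        return False      # inserting over an existing item would be an anomaly
    return True


def apply_operation(op: Operation, source: Dict[str, str], target: Dict[str, str]) -> None:
    """Apply one operation to the target; with mode "none" stale operations may fail here."""
    if op.kind == "copy":
        target[op.target_key] = source[op.source_key]
    elif op.kind in ("edit", "insert"):
        target[op.target_key] = op.new_value
    elif op.kind == "remove":
        target.pop(op.target_key, None)
    else:
        raise ValueError(f"unknown operation kind: {op.kind}")


def reapply_in_tandem(repository: List[Operation], source: Dict[str, str],
                      target: Dict[str, str], mode: str = "full") -> List[Operation]:
    """Validate and reapply each operation in one pass; return the ones skipped."""
    skipped = []
    for op in repository:
        if is_valid(op, source, target, mode):
            apply_operation(op, source, target)
        else:
            skipped.append(op)
    return skipped
```

Interleaving validation and application in a single loop is what the sketch takes "in tandem" to mean; a "separate" variant would validate the whole repository first and apply afterwards.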
|
72 |
[en] PROVENANCE FOR BIOINFORMATICS WORKFLOWS / [pt] PROVENIÊNCIA PARA WORKFLOWS DE BIOINFORMÁTICA
LUCIANA DA SILVA ALMENDRA GOMES 25 October 2011
[en] Many scientific experiments are designed as computational workflows,
which can be implemented using traditional programming languages. In the
Bioinformatics domain ad-hoc scripts are often used to build workflows. Scientific
Workflow Management Systems (SWMS) have emerged as an alternative to
those scripts. One particular SWMS feature that has received much attention from
the scientific community is the automatic capture of provenance data. Such data
allow users to track which resources and parameters were used to obtain the
results, among much other information required to validate and publish an
experiment. In the present work we have elicited some data provenance
challenges in the SWMS context, such as (i) the heterogeneity of data
representation schemes across systems, which hinders understanding and interoperability; (ii)
the storage of consumed and produced data and (iii) the reproducibility of a
specific execution. These challenges have motivated the proposal of a data
provenance conceptual scheme for workflow representation. We have
implemented an extension of a particular SWMS (Bioside) to include
provenance data and store them using the proposed conceptual scheme. We
have focused on some requirements commonly found in bioinformatics
workflows.
|
73 |
Provenance of visual interpretations in the exploration of data
Al-Naser, Aqeel January 2015
The thesis addresses the problem of capturing and tracking multi-user interpretations of 3D spatial datasets. These interpretations are made after the end of the visualization pipeline to identify and extract features of interest, and are subject to human intuition and knowledge. Users may also assess regions of these interpretations. Consequently, the thesis proposes a provenance-enabled interpretation pipeline. It adopts and extends the W3C PROV data model, producing a provenance model for visual interpretations. This was implemented for seismic imaging interpretation in a proof-of-concept prototype architecture and application. The accumulation of users' interpretations and annotations is captured by the provenance model in a fine-grained form. The captured provenance information can be used to filter data. The work was evaluated in three parts. First, a usability evaluation was conducted with postgraduate students in geoscience to illustrate the system's ability to let users amend others' interpretations and trace the history of amendments. Second, a conceptual evaluation was carried out by interviewing domain experts, who affirmed the importance of this research to the industry, shared potential applications of the work in the seismic interpretation workflow, and highlighted its limitations and concerns. Third, a performance evaluation was conducted to illustrate the behaviour of the architecture on commodity machines as well as on a multi-node parallel database, so that new functionality for fine-grained provenance can be implemented simply but with acceptable performance in realistic visualization tasks. The measurements suggested that the current implementation achieved acceptable performance in comparison with conventional methods.
The proposed provenance model in an interpretation pipeline is believed to be a promising shift in methods of data management and storage which can record and preserve interpretations by users as a result of visualization. The approach and software development in this thesis represented a step in this direction.
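Tracing the history of amendments can be pictured as following derivation links between versions of an interpretation. The Python sketch below is only an illustration under stated assumptions: it borrows the relation names wasDerivedFrom and wasAttributedTo from W3C PROV, but the `Entity` class, its fields and the helper functions are hypothetical and do not reproduce the thesis's extended model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entity:
    """One version of an interpretation (hypothetical, PROV-inspired record)."""
    entity_id: str
    attributed_to: str           # agent, in the spirit of PROV wasAttributedTo
    derived_from: Optional[str]  # earlier version, in the spirit of PROV wasDerivedFrom


def amendment_history(entities: Dict[str, Entity], entity_id: str) -> List[Entity]:
    """Walk derivation links from a version back to the original interpretation."""
    history = []
    seen = set()
    current = entity_id
    while current is not None and current not in seen:
        seen.add(current)  # guards against accidental cycles in the records
        entity = entities[current]
        history.append(entity)
        current = entity.derived_from
    return history


def by_agent(entities: Dict[str, Entity], agent: str) -> List[Entity]:
    """Filter versions by the user they are attributed to."""
    return [e for e in entities.values() if e.attributed_to == agent]
```

The same stored links support both uses named in the abstract: walking them reconstructs who amended what, and selecting on the attribution field filters data by user.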
|
74 |
Sedimentology and basin context of the Numidian Flysch Formation; Sicily and Tunisia
Thomas, Myron January 2011
The Numidian Flysch Formation is a regionally extensive series of deep marine sandstones and mudstones which crop out in Spain, Morocco, Algeria, Tunisia, Sicily, and southern mainland Italy. The formation is dated as Oligocene to mid Miocene and represents an approximately linear series of submarine fans characterised by a quartz rich petrofacies. Their unique regional extent is nearly twice the length of the Angolan margin, although issues surrounding provenance and basin context have hampered understanding. The Numidian Flysch Formation was deposited into the Maghrebian Flysch Basin (MFB), which was a foreland basin remnant of the neo-Tethys ocean in the western portion of the present day Mediterranean Basin. The basin was bordered to the north by an active margin which consisted of a southward verging accretionary prism, underlain by European crustal blocks which rode above northwards subducting oceanic crust. To the south, the African margin formed a passive margin to the basin.
The huge amount of geophysical and outcrop data which is becoming increasingly available suggests that submarine slope systems are more complex than previously thought, including topographically complex slopes, a wide variety of density flow types, and flow transformations. This thesis aims to review the sedimentology of the Numidian Flysch Formation in Sicily and Tunisia in light of these developments. Constraining the provenance and basin context of the formation is therefore of paramount importance, and this is also addressed.
Commonly used evidence for the provenance of Numidian Flysch sandstones includes their quartz rich petrology, an Eburnian and Pan-African age detrital zircon suite, their structural position within the foreland fold and thrust belt, and complex palaeocurrent orientations. When reviewed in their entirety and placed in the context of other basin successions, these lines of evidence constrain the Numidian Flysch to a depositional location in the south of the basin, with polycyclic sediment sourced from African basement. The Numidian Flysch Formation is therefore a 'passive margin' sequence as opposed to a flysch sensu stricto. The timing of Numidian Flysch deposition also coincides with uplift of the Atlas chain in North Africa, during a period of significantly wetter conditions. A switch from carbonate to clastic deposition results from these conditions, and the Numidian Flysch Formation is considered an offshore extension of this regional sedimentation.
Characterisation of outcrops in Sicily and Tunisia shows remarkably similar lithofacies and depositional elements. Sinuous upper slope channel complexes are entrenched within slope deposits to a depth of 100 m and occur within channel systems up to 5.7 km in width. They are filled predominantly with massive ungraded sandstones interpreted to aggrade through quasi-steady turbidity currents, interbedded with normally graded turbidites. Channel elements are subseismic in scale, are nested within complexes and show sinuosity. Coupled with lateral offset stacking, this strongly affects the architecture and facies heterogeneity of channel complexes. When compared to globally reviewed data, the thickness of channel elements as shown through their frequency distribution also suggests a fundamental control upon the degree of slope incision which is as yet unconstrained.
In lower slope settings, channel complexes stack aggradationally with a width of over 1000 m. They are also predominantly filled with massive sandstones in fining upwards cycles, and show heterogeneous margins and large scale slumping. In central Sicily, large channel complexes are overlain by a stacked lobe complex, in turn overlain by a channel lobe transition zone. This progression coupled with palaeocurrent variability suggests intraslope deformation strongly impacts transiting flows through changes in flow capacity. Salt tectonics, present in Algeria and Tunisia, is a possible forcing mechanism.
Taken in context, the sections in Sicily record a proximal to distal palaeogeographic trend which is reconstructed towards the north/northeast once well constrained tectonic rotations are taken into account. Given regional similarities, controls upon slope architecture are interpreted to be similar throughout the basin, and deposits in Sicily therefore provide a good analogue for the remainder of the basin. These results therefore allow for a better constrained fan architecture, along with the allogenic controls upon it. Given the continental extent of this formation, the Numidian Flysch Formation provides a unique opportunity to study controls upon fan architecture once provenance and intraslope topography are factored in.
|
75 |
Reconstructing the depositional history of the Eel River paleo meltwater channel, northeastern Indiana using sediment provenance techniques
Goodwin, Charles B. 03 1900
Indiana University-Purdue University Indianapolis (IUPUI) / The outwash deposits of the Eel River paleo meltwater channel in DeKalb and Allen Counties, Indiana predominantly originated from the Erie Lobe of the Laurentide Ice Sheet, but do contain some sediment from the Saginaw Lobe. This determination helps clarify the ice dynamics and Last Glacial Maximum sediment depositional history in northeastern Indiana, which is complicated because of the interactions between the Erie and Saginaw Lobes. Outwash deposits were analyzed from IGS core SC0802 in the Eel River paleo meltwater channel, which intersects the previously identified Huntertown Formation. The core includes 29.2 m of deposits underlain by the hard glacial till of the Trafalgar Formation. Mean grain size, sediment skewness, lithology, magnetic susceptibility, and quantitative X-ray diffraction were used to evaluate the provenance of the outwash deposits. Representative samples of Erie Lobe and Saginaw Lobe deposits were analyzed to develop end member provenance signatures.
A weight of evidence approach was developed and revealed that deposits from 8.0-13.8 m are of mixed origin from the Erie and Saginaw Lobes, whereas the 0-8.0 and 13.8-29.2 m deposits are Erie Lobe in origin. Cluster analysis and discriminant function analysis supported the findings of this approach. These findings suggest that the Eel River paleo meltwater channel was formed as an outwash channel, and that the adjacent Huntertown Formation does not appear to have been directly deposited by the Saginaw Lobe. The sediments of Saginaw origin from ~8-14 m in the Eel River paleo meltwater channel were likely transported from an upgradient source. The sediments from this zone have a larger mean grain size, indicating that deposition occurred during higher meltwater discharge, such as the release of meltwater from the drainage of proglacial or subglacial lake(s) associated with the disintegration of the Saginaw Lobe, thus resulting in the mixing of Saginaw Lobe deposits with Erie Lobe deposits. However, the majority of the sediment in the Eel River paleo channel near SC0802 is Erie Lobe in origin. Based on the provenance and depositional sequence at SC0802, the Saginaw Lobe disintegrated prior to the Erie Lobe retreat from the Wabash moraine around 16-17 cal ka.
|
76 |
Facilitating reproducible computing via scientific workflows – an integrated system approach
Cao, Yuan 04 May 2017
Indiana University-Purdue University Indianapolis (IUPUI) / Reproducible computing and research are of great importance for scientific investigation in any discipline. This thesis presents a general approach to provenance in the context of workflows for widely used script languages. Our solution is based on system integration, and is demonstrated by integrating MATLAB with VisTrails, an open source scientific workflow system. The integrated VisTrails-MATLAB system supports reproducible computing with truly prospective and retrospective provenance at multiple granularity levels as scientists choose for their scripts, and at the same time, is very easy to use.
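The distinction between prospective provenance (the plan of what should run) and retrospective provenance (the record of what actually ran) can be shown with a small Python sketch. It does not use the VisTrails or MATLAB APIs; the function `run_workflow` and the record layout are assumptions chosen purely for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_workflow(steps: List[Step], value: Any):
    """Run steps in order and return the result with both kinds of provenance."""
    # Prospective provenance: the declared plan, known before anything executes.
    prospective = [name for name, _ in steps]
    # Retrospective provenance: what each step actually consumed and produced.
    retrospective: List[Dict[str, Any]] = []
    for name, func in steps:
        result = func(value)
        retrospective.append({"step": name, "input": value, "output": result})
        value = result
    return value, prospective, retrospective
```

Recording at the level of whole steps is one granularity choice; finer levels, such as individual script statements, would add records inside each step in the same way.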
|
77 |
Using U-Pb Dating of Detrital Zircons to Determine Major Ice Stream Flow History in the Weddell Sea Embayment, Antarctica
Agrios, Liana Marie 08 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Tills from major ice streams (Institute, Foundation, Academy, Recovery, and Slessor) of the Weddell Sea Embayment contain detrital zircons with distinct U-Pb age populations that can be used as a provenance tool to better understand ice stream dynamics. U-Pb ages of detrital zircons were measured in 21 samples of onshore till, erratics, and bedrock of potential source rocks, and 12 samples of offshore till. Grains were analyzed by LA-ICPMS at the University of Arizona (n=5447). Relative probability U-Pb age density plots of till in moraines along the Institute Ice Stream have dominant Grenville (1070 Ma) and secondary Ross/Pan-African peaks (560 Ma, 630 Ma). The Foundation and Academy show prominent Ross/Pan-African peaks (500-530 Ma and 615-650 Ma). The Recovery transports zircons with prominent 530 Ma and 635 Ma peaks along the southern margin, and 1610 and 1770 Ma along the northern margin. The Slessor carries zircons with prominent populations at 1710 Ma and secondary 2260-2420 Ma.
U-Pb ages in zircons from offshore till samples show a general trend of fewer Mesozoic ages from west to east. The westernmost core, PS 1423, has dominant Jurassic populations, while cores PS 1197 and PS 1278 have a high proportion of early Ross/Pan-African ages relative to Grenville ages. The similar zircon age distributions between PS 1278 and the Foundation Ice Stream tills suggest that the Foundation switched to an easterly flow path around Berkner Island (BI) at some point during the LGM. In the eastern Weddell Sea (PS 1400), there was a near absence of Proterozoic zircon age populations carried by the Slessor and northern side of the Recovery. Another unexpected find was a lack of Grenville ages in PS 1423 relative to the Institute tills.
The U-Pb data in this study provide a basis for two possible LGM ice flow reconstructions. In the first, the Institute flowed west around the unnamed isolated bedrock highs and deposited till between PS 1423 and PS 1197, providing a westerly flow path around BI for the Foundation. In the second, the Institute flowed over the subglacial topography and deposited till closer to PS 1197, forcing the Foundation east around BI.
|
78 |
Determining the Laurentide Ice Sheet and Bedrock Provenance of Midwestern Till by Applying U-Pb Geochronology to Detrital Zircons
Mickey, Jeremiah Lee 10 1900
Indiana University-Purdue University Indianapolis (IUPUI) / A broad range of samples were collected from the Huron-Erie Lobe, Lake Michigan Lobe, Saginaw Lobe, and Tipton Till Plain of northern Indiana to determine the provenance of Laurentide Ice Sheet till in the Midwest U.S. during the Illinoian and Wisconsinan glaciations. U-Pb age distributions from approximately 300 detrital zircons (DZ) were used as provenance indicators for each till sample. Till from the Lake Michigan Lobe and was found to be largely homogenized. The distinct lobe DZ age distributions are the Lake Michigan Lobe till with a dominant ~1465 Ma peak, the northern Huron-Erie Lobe till with a dominant ~1060 Ma peak and a secondary peak at ~1450 Ma, the southern Huron-Erie Lobe till with nearly equal peaks at ~1435 Ma, ~1175 Ma, and ~1065 Ma, and the southern Saginaw Lobe till with a dominant peak at ~1095 Ma. Those four DZ age distributions were treated as endmembers in a nonlinear least-squares mixing model to calculate the contribution of each lobe to till in the Tipton Till Plain. Huron-Erie and Saginaw lobe tills were found to be the primary components of the Tipton Till Plain, and Lake Michigan Lobe till was only found in the western Tipton Till Plain. Zircons from the Saginaw Lobe till increased 39% in the eastern Tipton Till Plain between the Illinoian and Wisconsinan glaciations. The mixing model was also applied to relate the DZ age distributions of the lobes to bedrock within and near their flow paths. When comparing nearby bedrock to each lobe’s till, the mixing model results yield an approximate maximum transport distance between 500 and 630 kilometers for the matrix fraction of till in the Lake Michigan, Huron-Erie, and Saginaw lobes. Samples for the southern Huron-Erie Lobe indicate that most of the zircon ages within the southern Huron-Erie Lobe till in Indiana were specifically entrained between Niagara County, New York and east-central Indiana. Within the model’s error, 93-100% of the detrital zircons in each of the three lobes are relatable to nearby Paleozoic and Precambrian sedimentary and metamorphic bedrock formations.
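The endmember mixing idea can be sketched in a few lines of Python. This is not the thesis's model: it assumes each age distribution has been reduced to a binned histogram, and it swaps in a simple constrained least-squares fit (projected gradient descent with non-negative weights that sum to one) for the nonlinear least-squares procedure named in the abstract. The function names and parameters are hypothetical.

```python
from typing import List


def project_to_simplex(values: List[float]) -> List[float]:
    """Euclidean projection onto weights that are non-negative and sum to 1."""
    ordered = sorted(values, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / count
        if value - candidate > 0:
            theta = candidate  # keep the threshold from the largest valid prefix
    return [max(v - theta, 0.0) for v in values]


def fit_mixture(endmembers: List[List[float]], target: List[float],
                steps: int = 2000, rate: float = 0.1) -> List[float]:
    """Estimate each endmember's contribution to the target distribution."""
    count = len(endmembers)
    weights = [1.0 / count] * count
    for _ in range(steps):
        # Mixed distribution predicted by the current weights, bin by bin.
        mixed = [sum(w * e[j] for w, e in zip(weights, endmembers))
                 for j in range(len(target))]
        residual = [m - t for m, t in zip(mixed, target)]
        # Gradient of the summed squared residual with respect to each weight.
        gradient = [2.0 * sum(r * e[j] for j, r in enumerate(residual))
                    for e in endmembers]
        weights = project_to_simplex([w - rate * g for w, g in zip(weights, gradient)])
    return weights
```

The fixed step size is adequate for a handful of normalised histograms, as in a four-endmember case; larger or unnormalised inputs would need a smaller `rate` or a line search.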
|
79 |
RECONSTRUCTING PAST ANTARCTIC ICE FLOW PATHS IN THE ROSS EMBAYMENT, ANTARCTICA USING SAND PETROGRAPHY, PARTICLE SIZE AND DETRITAL ZIRCON PROVENANCE
Schilling, Andrea J. 03 May 2010
Indiana University-Purdue University Indianapolis (IUPUI) / Tills for this study were analyzed from sites in East Antarctica (EA), West Antarctica (WA) and along a transect in the Ross Sea. Particle size, sand petrography, and detrital zircons were used to provide new information on the subglacial geology of Antarctica, as well as to assist in the reconstruction of Last Glacial Maximum (LGM) ice flow paths. Statistical analyses using the Kolmogorov-Smirnov (K-S) test reveal that EA and WA zircon age distributions are distinct at a p-value < 0.05. This makes it possible to trace the unique signatures from EA and WA into the Ross Sea.
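For readers unfamiliar with the test, a two-sample K-S comparison of two sets of zircon ages can be sketched as follows. The code is a generic illustration, not the analysis used in the thesis: the function names are hypothetical, and the p-value uses the standard large-sample (asymptotic) series, which is only an approximation for small samples.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Largest gap between the two empirical cumulative distributions."""
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    largest = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest


def ks_p_value(statistic: float, size_a: int, size_b: int, terms: int = 100) -> float:
    """Asymptotic p-value for the two-sample statistic (approximate)."""
    lam = statistic * math.sqrt(size_a * size_b / (size_a + size_b))
    if lam < 0.2:
        return 1.0  # the series is numerically indistinguishable from 1 here
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))
```

Two age populations would be called distinct at the 0.05 level when `ks_p_value(ks_statistic(a, b), len(a), len(b))` falls below 0.05.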
|
80 |
Blockchain Use for Data Provenance in Scientific Workflow
Sigurjonsson, Sindri Már Kaldal January 2018
In scientific workflows, data provenance plays a major role. Through data provenance, the execution of the workflow is documented and information about the data involved is stored. This can be used to reproduce scientific experiments or to prove how the results of the workflow came to be. It is therefore vital that the provenance data stored in the provenance database is always synchronized with its corresponding workflow, so that it can be verified that the provenance database has not been tampered with. Blockchain technology has gained a lot of attention in recent years, since Satoshi Nakamoto released his Bitcoin paper in 2009. Blockchain technology consists of a peer-to-peer network in which an append-only ledger is stored and replicated, and it offers high tamper-resistance through its consensus protocols. This thesis explores whether blockchain technology is a suitable solution for synchronizing a workflow with its provenance data. A system that generates a workflow, based on a definition written in a Domain Specific Language, was extended to use blockchain technology to synchronize the workflow itself and its results. Furthermore, the InterPlanetary File System was used to assist with the versioning of individual executions of the workflow. The InterPlanetary File System made it possible to compare individual workflow executions in more detail and to discover how they differ. The solution was analyzed with respect to the 21 CFR Part 11 regulations imposed by the FDA in order to see how it could assist with fulfilling the requirements of the regulations. Analysis of the system shows that the blockchain extension can be used to verify whether the synchronization between a workflow and its results has been tampered with. Experiments revealed that the size of the workflow did not have a significant effect on the execution time of the extension. Additionally, the proposed solution offers a constant cost in digital currency regardless of the size of the workflow. However, even though the extension shows some promise of assisting with fulfilling the requirements of the 21 CFR Part 11 regulations, analysis revealed that the extension does not fully comply with them due to the complexity of the regulations.
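The tamper-evidence property of an append-only ledger can be illustrated with a local hash chain in Python. This is a simplified sketch, not the system described in the thesis: there is no peer-to-peer network, consensus protocol or InterPlanetary File System here, and the record layout and function names are assumptions made for illustration.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def record_hash(record: Dict[str, Any], previous_hash: str) -> str:
    """Hash a provenance record together with the hash of its predecessor."""
    payload = json.dumps({"record": record, "prev": previous_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(ledger: List[Dict[str, Any]], record: Dict[str, Any]) -> None:
    """Append a record, linking it to the current tail of the ledger."""
    previous_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "record": record,
        "prev": previous_hash,
        "hash": record_hash(record, previous_hash),
    })


def verify_ledger(ledger: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited record or broken link fails the check."""
    previous_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != previous_hash:
            return False
        if entry["hash"] != record_hash(entry["record"], previous_hash):
            return False
        previous_hash = entry["hash"]
    return True
```

Because each hash covers the previous one, changing an earlier workflow result invalidates every later entry, which is the property that makes it possible to check whether a workflow and its stored results still match.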
|