1 |
Data preservation and reproducibility at the LHCb experiment at CERN
Trisovic, Ana, January 2018
This dissertation presents the first study of data preservation and research reproducibility in data science at the Large Hadron Collider at CERN. In particular, provenance capture of the experimental data and the reproducibility of physics analyses at the LHCb experiment were studied. First, the preservation of the software and hardware dependencies of the LHCb experimental data and simulations was investigated. It was found that the links between the data processing information and the datasets themselves were obscure. In order to document these dependencies, a graph database was designed and implemented. The nodes in the graph represent the data with their processing information, software and computational environment, whilst the edges represent their dependence on the other nodes. The database provides a central place to preserve information that was previously scattered across the LHCb computing infrastructure. Using the developed database, a methodology to recreate the LHCb computational environment and to execute the data processing on the cloud was implemented with the use of virtual containers. It was found that the produced physics events were identical to the official LHCb data, meaning that the system can aid in data preservation. Furthermore, the developed method can be used for outreach purposes, providing a streamlined way for a person external to CERN to process and analyse the LHCb data. Following this, the reproducibility of data analyses was studied. A data provenance tracking service was implemented within the LHCb software framework Gaudi. The service allows analysts to capture, within the dataset itself, the data processing configuration needed to reproduce that dataset. Furthermore, to assess the current status of the reproducibility of LHCb physics analyses, the major parts of an analysis were reproduced by following methods described in publicly and internally available documentation.
This study allowed the identification of barriers to reproducibility and specific points where documentation is lacking. With this knowledge, one can specifically target areas that need improvement and encourage practices that would improve reproducibility in the future. Finally, contributions were made to the CERN Analysis Preservation portal, which is a general knowledge preservation framework developed at CERN to be used across all the LHC experiments. In particular, the functionality to preserve source code from git repositories and Docker images in one central location was implemented.
|
2 |
Hantering av data genom tid och rum : En records continuum-analys av hur humanistiska forskare hanterar forskningsdata för tillgängliggörande och bevarande / Data management through time and space : A records continuum analysis of how researchers in the humanities manage research data for sharing and preservation purposes
Sundberg, Sara, January 2024
While the preservation and sharing of research data are two well-researched topics, there is a need to better understand the connections between them, especially from a Swedish perspective. In relation to this, it is interesting to investigate how researchers themselves are involved in these processes: where and how they preserve their data, for what reasons, how they manage their data for preservation and sharing, and what consequences this might have for the archiving of research data. The method used in this thesis is semi-structured interviews with 10 researchers from various Swedish universities, conducted in person, via Zoom or by e-mail. The researchers were chosen based on their sharing activities on research data repositories. The interviews were processed in Taguette, where the data was organized by tags. The tags with their related statements were then organized into themes. Furthermore, a theoretical analysis based on the records continuum model was conducted. The primary reason stated for preservation was data sharing. The researchers expressed a wish for their research data to be reused, and also gave reasons related to transparency. The researchers also said that their data management was influenced by future data sharing. Only one researcher had archived their data at the university, although most of the participants were positive about doing so in the future. It appears that the researchers in this study take the initiative when it comes to the preservation and sharing of their data. Most of the participants view the data repository as a good platform for preservation, possibly because such platforms can fulfil their reasons for participating in preservation activities. This is a two-year master’s thesis in Archival Science.
|