About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
361

Generic Metadata Handling in Scientific Data Life Cycles

Grunzke, Richard 12 April 2016 (has links)
Scientific data life cycles define how data is created, handled, accessed, and analyzed by users. Such data life cycles become increasingly sophisticated as the sciences they serve become more demanding and complex with the coming advent of exascale data and computing. The overarching data life cycle management background includes multiple abstraction categories with data sources, data and metadata management, computing and workflow management, security, data sinks, and methods on how to enable utilization. Challenges in this context are manifold. One is to hide the complexity from the user and to enable the seamless use of resources in order to improve usability and efficiency. Another is to enable generic metadata management that is not restricted to one use case but can be adapted to further ones with limited effort. Metadata management is essential to enable scientists to save time by avoiding the need to manually keep track of data, for example by its content and location. As the number of files grows into the millions, managing data without metadata becomes increasingly difficult. Thus, the solution is to employ metadata management to enable the organization of data based on information about it. Previously, use cases tended to support either highly specific metadata management or none at all. Now, a generic metadata management concept is available that can be used to efficiently integrate metadata capabilities with use cases. The concept was implemented within the MoSGrid data life cycle, which enables molecular simulations on distributed HPC-enabled data and computing infrastructures. The implementation enables easy-to-use and effective metadata management. Automated extraction, annotation, and indexing of metadata were designed, developed, and integrated, and search capabilities are provided via a seamless user interface. Further analysis runs can be started directly from search results. A complete evaluation of the concept, both in general and along the example implementation, is presented. In conclusion, the generic metadata management concept advances the state of the art in scientific data life cycle management.
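To make the extract-annotate-index-search chain described above concrete, the following minimal Python sketch builds an in-memory inverted index over metadata pulled from files. The field names, the keyword heuristic and the index structure are assumptions made for illustration; this is not the MoSGrid implementation.

```python
# Illustrative sketch only: a toy "extract -> annotate -> index -> search" pipeline.
# Metadata fields, the keyword heuristic and the in-memory index are assumptions.
import os
import re
from collections import defaultdict

def extract_metadata(path):
    """Extract a few simple metadata fields from a simulation output file."""
    meta = {"filename": os.path.basename(path), "size_bytes": os.path.getsize(path)}
    with open(path, encoding="utf-8", errors="ignore") as fh:
        text = fh.read()
    # Hypothetical annotation step: pull keyword-like tokens out of the content.
    meta["keywords"] = sorted(set(re.findall(r"[A-Za-z]{4,}", text.lower())))[:50]
    return meta

def build_index(paths):
    """Map each keyword to the files it was extracted from (an inverted index)."""
    index = defaultdict(set)
    catalog = {}
    for path in paths:
        meta = extract_metadata(path)
        catalog[meta["filename"]] = meta
        for kw in meta["keywords"]:
            index[kw].add(meta["filename"])
    return index, catalog

def search(index, *terms):
    """Return files whose extracted metadata contains all search terms."""
    hits = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*hits) if hits else set()
```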
362

Analýza institucionálních repozitářů provozovaných v systému DSpace v České republice / The analysis of the institutional repositories in the Czech Republic using DSpace system

Kovaříková, Lenka January 2019 (has links)
The goal of this thesis is to analyze, compare and evaluate the available institutional repositories (IRs) in Czechia that run on the DSpace system. The development of the IRs and its circumstances is described through a comparison of information available from registries of repositories, journal articles, presentations, and qualitative interviews. Each IR was compared by version, area of expertise, types of stored documents, organization, language versions, number of records, administration of data entry and data exit processes, system login and user rights. A substantial part of this thesis is dedicated to search options and metadata. The analysis of the collected data yielded a summary of the systems' characteristics and capabilities. It highlighted positives, such as the possibility of systematically interconnecting repositories, as well as shortcomings, such as the neglect of subject description.
363

Nyttiggörande av maskininlärningsmodeller i verksamheten : Ökad metadatakvalitet med stöd från maskininlärning / Utilization of machine learning models in the business

Engblom, Emil January 2020 (has links)
Photographs, documents and other types of digitised data from the cultural heritage are collected in central databases to be made available to the public. These databases are known as aggregators. The aggregated data often have different purposes and formats, since they are created to suit the purpose of an individual institution. Metadata is data describing other data and is used to streamline the search through the different objects stored within the aggregators. If all the stored metadata follows the same agreed standard, the search among the objects is quick and efficient. It is a common problem within aggregators that the stored metadata is of lacking quality. When the quality of the metadata is lacking, the search among the objects within the aggregator is slow, difficult and time-consuming. The search may even give faulty results. In some cases data can be lost within large collections if the metadata is incorrect or missing. The knowledge about digitisation and the resources to perform it are often lacking in, for example, a museum. This can sometimes lead to errors in the metadata. In 2019 a machine learning model was developed during a project with the purpose of identifying errors in the metadata of the Swedish National Heritage Board's aggregator K-samsök. In this study the model's ability to identify errors has been evaluated. This evaluation was used to answer the following question: How can good metadata quality be maintained within an organisation with support from a machine learning model? This research contributes to the academy by showing that the quality of metadata in aggregators is still a problem. The research also provides suggestions for solutions to the problem, which in turn can give rise to further research. These solution suggestions are also of value to the Swedish National Heritage Board, as the study has been conducted with a focus on their aggregator K-samsök. The machine learning models can also be further developed and implemented by the Swedish National Heritage Board, which means that the models can provide value in the form of a basis to start from when improving the quality of the metadata stored in K-samsök.
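As a rough illustration of how a machine learning model could flag suspect metadata values for curator review, the sketch below trains a tiny character-n-gram classifier with scikit-learn. The training examples, labels and threshold are invented; this is not the model developed for K-samsök.

```python
# A minimal sketch, not the K-samsök model: a simple supervised classifier that
# flags metadata field values which look erroneous, so a curator can review them.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: metadata values labelled as valid (0) or erroneous (1).
values = ["photograph", "painting", "coin", "building", "asdfgh", "12345", "", "N/A"]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character n-grams cope with short field values
    LogisticRegression(max_iter=1000),
)
model.fit(values, labels)

# Flag incoming values whose predicted error probability exceeds a threshold.
incoming = ["sculpture", "qwerty", "photograph"]
for value, prob in zip(incoming, model.predict_proba(incoming)[:, 1]):
    if prob > 0.5:
        print(f"review suggested: {value!r} (error probability {prob:.2f})")
```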
364

Metadata Management in Multi-Grids and Multi-Clouds

Espling, Daniel January 2011 (has links)
Grid computing and cloud computing are two related paradigms used to access and use vast amounts of computational resources. The resources are often owned and managed by a third party, relieving the users from the costs and burdens of acquiring and managing a considerably large infrastructure themselves. Commonly, the resources are either contributed by different stakeholders participating in shared projects (grids), or owned and managed by a single entity and made available to its users with charging based on actual resource consumption (clouds). Individual grid or cloud sites can form collaborations with other sites, giving each site access to more resources that can be used to execute tasks submitted by users. There are several different models of collaborations between sites, each suitable for different scenarios and each posing additional requirements on the underlying technologies. Metadata concerning the status and resource consumption of tasks are created during the execution of the tasks on the infrastructure. This metadata is used as the primary input in many core management processes, e.g., as a base for accounting and billing, as input when prioritizing and placing incoming tasks, and as a base for managing the amount of resources allocated to different tasks. Focusing on management and utilization of metadata, this thesis contributes to a better understanding of the requirements and challenges imposed by different collaboration models in both grids and clouds. The underlying design criteria and resulting architectures of several software systems are presented in detail. Each system addresses different challenges imposed by cross-site grid and cloud architectures: The LUTSfed approach provides a lean and optional mechanism for filtering and management of usage data between grid or cloud sites. An accounting and billing system natively designed to support cross-site clouds demonstrates usage data management despite unknown placement and dynamic task resource allocation. The FSGrid system enables fairshare job prioritization across different grid sites, mitigating the problems of heterogeneous scheduling software and local management policies. The results and experiences from these systems are both theoretical and practical, as full-scale implementations of each system have been developed and analyzed as a part of this work. Early theoretical work on structure-based service management forms a foundation for future work on structure-aware service placement in cross-site clouds.
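As a purely illustrative sketch of the kind of per-task usage metadata such accounting and billing systems consume, the snippet below rolls usage records collected from several sites up into a per-user charge. The record fields and unit prices are assumptions; it does not reproduce LUTSfed or the billing system described above.

```python
# Illustrative only: toy per-task usage records and a per-user billing roll-up.
from dataclasses import dataclass

@dataclass
class UsageRecord:
    site: str         # site that executed the task
    user: str         # task owner, used as the billing subject
    cpu_hours: float  # consumed CPU time reported by the site
    gb_hours: float   # storage consumption over the task's lifetime

PRICE = {"cpu_hours": 0.05, "gb_hours": 0.01}  # hypothetical unit prices

def bill(records, user):
    """Aggregate one user's usage across all sites into a single charge."""
    mine = [r for r in records if r.user == user]
    return sum(r.cpu_hours * PRICE["cpu_hours"] + r.gb_hours * PRICE["gb_hours"] for r in mine)

records = [
    UsageRecord("site-a", "alice", 12.0, 40.0),
    UsageRecord("site-b", "alice", 3.5, 10.0),
    UsageRecord("site-a", "bob", 7.0, 5.0),
]
print(f"alice owes {bill(records, 'alice'):.2f} credits")
```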
365

KTHFS – A HIGHLY AVAILABLE AND SCALABLE FILE SYSTEM

D'Souza, Jude Clement January 2013 (has links)
KTHFS is a highly available and scalable file system built from version 0.24 of the Hadoop Distributed File System. It provides a platform to overcome the limitations of existing distributed file systems. These limitations include the scalability of the metadata server in terms of memory usage, throughput and availability. This document describes the KTHFS architecture and how it addresses these problems by providing a well-coordinated, distributed, stateless metadata server (or in our case, Namenode) architecture. This is backed by a persistence layer such as an NDB cluster. Its primary focus is the high availability of the Namenode. It achieves scalability and recovery by persisting the metadata to an NDB cluster. All namenodes are connected to this NDB cluster and hence are aware of the state of the file system at any point in time. In terms of high availability, KTHFS provides a multi-Namenode architecture. Since these namenodes are stateless and have a consistent view of the metadata, clients can issue requests on any of the namenodes. Hence, if one of these servers goes down, clients can retry their operations on the next available namenode. We next discuss the evaluation of KTHFS in terms of its metadata capacity for medium and large size clusters, throughput and high availability of the Namenode, and an analysis of the underlying NDB cluster. Finally, we conclude this document with a few words on the ongoing and future work in KTHFS.
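A minimal sketch of the client-side failover that a stateless multi-Namenode design enables is shown below, assuming a hypothetical HTTP metadata endpoint; KTHFS's actual RPC interface is not shown.

```python
# A minimal sketch, assuming a hypothetical HTTP metadata API: any namenode can
# answer because the shared NDB cluster holds the file system state, so the
# client simply retries on the next one. Endpoint paths are invented.
import urllib.request
import urllib.error

NAMENODES = ["http://namenode1:50070", "http://namenode2:50070", "http://namenode3:50070"]

def metadata_request(path, namenodes=NAMENODES, timeout=2.0):
    """Try each namenode in turn; stateless servers mean any of them can answer."""
    last_error = None
    for nn in namenodes:
        try:
            with urllib.request.urlopen(f"{nn}/meta?path={path}", timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, OSError) as exc:
            last_error = exc  # namenode unreachable, fall through to the next one
    raise RuntimeError(f"all namenodes failed for {path!r}") from last_error
```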
366

Algoritm för automatiserad generering av metadata

Karlsson, Fredrik, Berg, Fredrik January 2015 (has links)
Sveriges Radio stores their data in large archives which makes it hard to retrieve specific information. The sheer size of the archives makes retrieving information about a specific event difficult and causes a big problem. To solve this problem a more consistent use of metadata is needed. This resulted in an investigation about metadata and keyword generation. The appointed task was to automatically generate keywords from transcribed radio shows. This included an investigation of which systems and algorithms can be used to generate keywords, based on previous works. An application was also developed which suggests keywords for a text to a user. This application was tested and compared to other already existing software, as well as different methods/techniques based on both linguistic and statistical algorithms. The resulting analysis showed that the developed application generated many accurate keywords, but also a large number of keywords overall. The comparison also showed that the developed algorithm achieved better recall than the already existing software, which in turn achieved better precision in its keywords. / Sveriges Radio sparar sin data i stora arkiv vilket gör det svårt att hitta specifik information. På grund av denna storlek blir uppgiften att hitta specifik information om händelser ett stort problem. För att lösa problemet krävs en mer konsekvent användning av metadata, därför har en undersökning om metadata och nyckelordsgenerering gjorts. Arbetet gick ut på att utveckla en algoritm som automatiskt kan generera nyckelord från transkriberade radioprogram. Det ingick också i arbetet att göra en undersökning av tidigare arbeten för att se vilka system och algoritmer som kan användas för att generera nyckelord. Dessutom utvecklades en applikation som genererar färdiga nyckelord som förslag till en användare. Denna applikation jämfördes och utvärderades med redan existerande program. Metoderna som använts bygger på både lingvistiska och statistiska algoritmer. En analys av resultaten gjordes och visade att den utvecklade applikationen genererade många precisa nyckelord, men även till antalet stora mängder nyckelord. Jämförelsen med ett redan existerande program visade att täckningen var bättre för den utvecklade applikationen, samtidigt som precisionen var bättre för det redan existerande programmet.
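As an example of the statistical family of techniques surveyed in the thesis, the sketch below ranks candidate keywords with a simple TF-IDF weighting. The toy transcripts are invented and the snippet is not the algorithm developed in the thesis.

```python
# A small sketch of statistical keyword generation (TF-IDF); transcripts are placeholders.
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-zåäöA-ZÅÄÖ]{3,}", text.lower())

def keywords(target, corpus, top_n=5):
    """Rank words in one transcript by term frequency weighted against the corpus."""
    tf = Counter(tokenize(target))
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for doc in corpus if word in tokenize(doc))
        scores[word] = count * math.log((1 + n_docs) / (1 + df))
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]]

transcripts = [
    "idag rapporterar vi om valet och riksdagen",
    "nyheter om vädret och stormen över kusten",
    "sporten idag handlar om fotboll och allsvenskan",
]
print(keywords(transcripts[2], transcripts))
```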
367

Mapping and identifying misplaced devices on a network by use of metadata

Fenn, Edward, Fornling, Eric January 2017 (has links)
Context. Network placement of devices is an issue of operational security for most companies today. Since a misplaced device can compromise an entire network and, by extension, a company, it is essential to keep track of what is placed where. Knowledge is the key to success, and knowing your network is required to make it secure. Large networks, however, may be hard to keep track of, since employees can connect or unplug devices, making it hard for administrators to stay updated on the network at all times. Objectives. This analysis focuses on the creation of an analysis method for network mapping based on metadata. This analysis method is to be implemented in a tool that automatically maps a network based on specific metadata attributes. The motivation and goal for this study is to create a method that will improve network mapping with regard to identifying misplaced devices, and to achieve a better understanding of the impact misplaced systems can have on a network. Method. The method for analyzing the metadata was to manually inspect the network metadata gathered by Outpost24 AB's proprietary vulnerability scanner. By analyzing this metadata, certain attributes were singled out as necessary for the identification. These attributes were then implemented in a probability function that, based on this information, determines the device type. The results from the probability function are then presented visually as a network graph. A warning algorithm was then run against these results and issued warnings when it found misplaced devices on subnets. Results. The proposed method is deemed to be 30 878 times faster than the previous method, i.e. the manual inspection of the metadata. It is, however, not as accurate, with an identification rate of between 80% and 93% of devices and correct device type identification for 95-98% of the identified devices. This is as opposed to the previous method, i.e. the manual inspection of the metadata, with an 80-93% identification rate and 100% correct device type identification. The proposed method also flagged 48.9% of the subnets as misconfigured. Conclusion. In conclusion, the proposed method proves that it is indeed possible to identify misplaced devices on networks based on metadata analysis. The proposed method is also considerably faster than the previous method, but does need some further work to be as accurate as the previous method and reach a 100% device type identification rate. / Kontext. Placeringen av enheter i nätverk har idag blivit en säkerhetsfråga för de flesta företagen. Eftersom en felplacerad enhet kan äventyra ett helt nätverk, och i förlängning, ett företag så är det essentiellt att ha koll på vad som är placerat var. Kunskap är nyckeln till framgång, och att ha kunskap om sin nätverksstruktur är avgörande för att göra nätverket säkert. Stora nätverk kan dock vara svåra att ha koll på om anställda kan lägga till eller ta bort enheter, och på så sätt göra det svårt för administratören att ständigt hålla sig uppdaterad om vad som finns var. Mål. Den här studien fokuserar på skapandet av en analysmetod för att kartlägga ett nätverk baserat på metadata från nätverket. Analysmetoden ska sedan implementeras i ett verktyg som automatiskt kartlägger nätverket utifrån den metadata som valts ut i analysmetoden. Motivationen och målet med den här studien är att skapa en metod som förbättrar nätverkskartläggning med syftet att identifiera felplacerade enheter, och att uppnå en större förståelse för den inverkan felplacerade enheter kan få för ett nätverk. Metod. Metoden för att analysera metadatan var att för hand leta igenom den metadata som Outpost24 ABs sårbarhetsskanner samlade in när den letade efter sårbarheter i ett nätverk. Genom att analysera metadatan så kunde vi singla ut enskilda bitar som vi ansåg vara nödvändiga för att identifiera enhetens typ. Dessa attribut implementerades sedan i en sannolikhetsfunktion som avgjorde vilken typ en enhet hade, baserat på informationen i metadatan. Resultatet från denna sannolikhetsfunktion presenterades sedan visuellt som en graf. En algoritm som matade ut varningar om den hittade felkonfigurerade subnät kördes sedan mot resultaten från sannolikhetsfunktionen. Resultat. Den i den här rapporten föreslagna metoden är fastställd till att vara cirka 30 878 gånger snabbare än föregående metoder, dvs. att leta igenom metadatan för hand. Dock så är den föreslagna metoden inte lika exakt då den har en identifikationsgrad på 80-93% av enheterna på nätverket, och en korrekt identifikationsgrad på enhetstypen på 95-98% av de identifierade enheterna. Detta till skillnad från den föregående metoden som hade 80-93% respektive 100% identifikationsgrad. Den föreslagna metoden identifierade också 48.9% av alla subnät som felkonfigurerade. Sammanfattning. För att sammanfatta så bevisar den föreslagna metoden att det är möjligt att identifiera felplacerade enheter på ett nätverk utifrån en analys av nätverkets metadata. Den föreslagna metoden är dessutom avsevärt snabbare än föregående metoder, men behöver utvecklas mer för att nå samma identifikationsgrad som föregående metoder. Det här arbetet kan ses som ett proof-of-concept gällande identifikation av enheter baserat på metadata, och behöver därför utvecklas för att nå sin fulla potential.
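The two steps described above (a scoring function that guesses a device type from metadata attributes, and a warning pass over subnets) can be illustrated with a toy sketch. The attribute names, weights and placement policy below are assumptions, not Outpost24's implementation.

```python
# Illustrative only: a toy device-type guess from metadata attributes, plus a
# warning pass that flags device types a subnet's policy does not allow.
DEVICE_SIGNATURES = {
    "printer":     {"open_ports": {9100, 631},    "banner_words": {"jetdirect", "printer"}},
    "workstation": {"open_ports": {135, 445},     "banner_words": {"windows", "workstation"}},
    "server":      {"open_ports": {22, 80, 443},  "banner_words": {"apache", "nginx", "ssh"}},
}

def guess_type(device):
    """Score each known type against the device's metadata and pick the best match."""
    def score(sig):
        port_hits = len(sig["open_ports"] & set(device["open_ports"]))
        word_hits = len(sig["banner_words"] & set(device["banner"].lower().split()))
        return port_hits + 2 * word_hits  # banners weighted higher, an arbitrary choice
    best = max(DEVICE_SIGNATURES, key=lambda t: score(DEVICE_SIGNATURES[t]))
    return best if score(DEVICE_SIGNATURES[best]) > 0 else "unknown"

def misplaced(devices, allowed_types_per_subnet):
    """Warn when a subnet contains a device type its policy does not allow."""
    for dev in devices:
        dtype = guess_type(dev)
        if dtype not in allowed_types_per_subnet.get(dev["subnet"], {dtype}):
            yield dev["ip"], dtype, dev["subnet"]

devices = [
    {"ip": "10.0.1.5", "subnet": "10.0.1.0/24", "open_ports": [9100], "banner": "HP JetDirect printer"},
]
policy = {"10.0.1.0/24": {"workstation"}}  # printers do not belong on this subnet
print(list(misplaced(devices, policy)))
```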
368

Enhancing digital text collections with detailed metadata to improve retrieval

Ball, Liezl Hilde January 2020 (has links)
Digital text collections are increasingly important, as they enable researchers to explore new ways of interacting with texts through the use of technology. Various tools have been developed to facilitate exploring and searching in text collections at a fairly low level of granularity. Ideally, it should be possible to filter the results at a greater level of granularity to retrieve only specific instances in which the researcher is interested. The aim of this study was to investigate to what extent detailed metadata could be used to enhance texts in order to improve retrieval. To do this, the researcher had to identify metadata that could be useful to filter according to and find ways in which these metadata can be applied to or encoded in texts. The researcher also had to evaluate existing tools to determine to what extent current tools support retrieval on a fine-grained level. After identifying useful metadata and reviewing existing tools, the researcher could suggest a metadata framework that could be used to encode texts on a detailed level. Metadata in five different categories were used, namely morphological, syntactic, semantic, functional and bibliographic. A further contribution in this metadata framework was the addition of in-text bibliographic metadata, to use where sections in a text have different properties than those in the main text. The suggested framework had to be tested to determine if retrieval was indeed improved. In order to do so, a selection of texts was encoded with the suggested framework and a prototype was developed to test the retrieval. The prototype receives the encoded texts and stores the information in a database. A graphical user interface was developed to enable searching in the database in an easy and intuitive manner. The prototype demonstrates that it is possible to search for words or phrases with specific properties when detailed metadata are applied to texts. The fine-grained metadata from five different categories enable retrieval on a greater level of granularity and specificity. It is therefore recommended that detailed metadata are used to encode texts in order to improve retrieval in digital text collections. Keywords: metadata, digital humanities, digital text collections, retrieval, encoding / Thesis (DPhil (Information Science))--University of Pretoria, 2020. / Information Science / DPhil (Information Science) / Unrestricted
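A toy sketch of what fine-grained retrieval over encoded text can look like is given below, loosely following the five metadata categories named in the abstract. The tag values and the miniature "corpus" are invented; the thesis's actual encoding scheme and prototype are not reproduced here.

```python
# Illustrative only: token-level metadata in five categories and a fine-grained filter.
tokens = [
    {"word": "loves",  "morph": "verb-3sg", "syntax": "predicate", "semantic": "emotion",
     "functional": "main-text", "biblio": {"author": "Austen", "year": 1813}},
    {"word": "letter", "morph": "noun-sg",  "syntax": "object",    "semantic": "artefact",
     "functional": "quotation", "biblio": {"author": "Austen", "year": 1813}},
]

def filter_tokens(tokens, **criteria):
    """Return tokens matching every requested metadata property (e.g. morph='noun-sg')."""
    def matches(tok):
        return all(tok.get(field) == value for field, value in criteria.items())
    return [tok["word"] for tok in tokens if matches(tok)]

# Retrieve only singular nouns that occur inside quotations.
print(filter_tokens(tokens, morph="noun-sg", functional="quotation"))
```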
369

Správa, vyhledávání a zpřístupňování elektronických vysokoškolských kvalifikačních prací / Management, Retrieval and Access to Electronic Theses and Dissertations

Mach, Jan January 2015 (has links)
The dissertation is devoted to an analysis of current practice and trends in providing repositories of electronic theses and dissertations (ETDs) in terms of their management, searching and dissemination. The first part presents terminology and the current state of access to ETDs in Czech and foreign repositories and includes the results of a survey of the state of access to ETDs in the Czech Republic, which was completed in 2014 by all public universities. In the second part, a metadata standard is presented, particularly the possibility of mapping EVSKP-MS metadata elements to other metadata formats and their utilization within the OAI-PMH protocol. The issue of access to ETDs is dealt with further in terms of metrics for evaluating the usage of distributed ETDs. Searching for ETDs is also described in case studies, as are recommendations for public tenders for a discovery service and for creating an ETD metadata search server and an associated user interface with faceted search. The final part of the thesis focuses on the issue of plagiarism. This incorporates a presentation and analysis of the most important plagiarism detection systems and a case study of the development of the portal Validátor VŠE to provide access to the results of document analysis.
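Since the metadata mapping is exposed through OAI-PMH, a minimal harvesting sketch may help illustrate the protocol. The repository URL below is a placeholder, and resumption tokens and error handling are omitted.

```python
# A minimal OAI-PMH harvesting sketch: requests one page of records in Dublin Core
# (oai_dc), a common mapping target for ETD metadata. The endpoint is a placeholder.
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://repository.example.org/oai"  # placeholder endpoint
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_titles(base_url=BASE_URL):
    """Fetch one page of ListRecords and yield the Dublin Core titles."""
    url = f"{base_url}?verb=ListRecords&metadataPrefix=oai_dc"
    with urllib.request.urlopen(url, timeout=10) as resp:
        tree = ET.parse(resp)
    for record in tree.iter(f"{{{NS['oai']}}}record"):
        title = record.find(f".//{{{NS['dc']}}}title")
        if title is not None:
            yield title.text
```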
370

Ranked Similarity Search of Scientific Datasets: An Information Retrieval Approach

Megler, Veronika Margaret 04 June 2014 (has links)
In the past decade, the amount of scientific data collected and generated by scientists has grown dramatically. This growth has intensified an existing problem: in large archives consisting of datasets stored in many files, formats and locations, how can scientists find data relevant to their research interests? We approach this problem in a new way: by adapting Information Retrieval techniques, developed for searching text documents, into the world of (primarily numeric) scientific data. We propose an approach that uses a blend of automated and curated methods to extract metadata from large repositories of scientific data. We then perform searches over this metadata, returning results ranked by similarity to the search criteria. We present a model of this approach, and describe a specific implementation thereof performed at an ocean-observatory data archive and now running in production. Our prototype implements scanners that extract metadata from datasets that contain different kinds of environmental observations, and a search engine with a candidate similarity measure for comparing a set of search terms to the extracted metadata. We evaluate the utility of the prototype by performing two user studies; these studies show that the approach resonates with users, and that our proposed similarity measure performs well when analyzed using standard Information Retrieval evaluation methods. We performed performance tests to explore how continued archive growth will affect our goal of interactive response, developed and applied techniques that mitigate the effects of that growth, and show that the techniques are effective. Lastly, we describe some of the research needed to extend this initial work into a true "Google for data".
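As a purely illustrative sketch of ranking datasets by similarity of their extracted metadata to search criteria, the snippet below summarizes each dataset by per-variable value ranges and scores their overlap with the query. The variable names, values and the overlap measure are assumptions, not the dissertation's similarity measure.

```python
# Illustrative only: rank dataset metadata summaries by overlap with a range query.
def range_overlap(lo1, hi1, lo2, hi2):
    """Fraction of the query range (lo2, hi2) covered by the dataset range (lo1, hi1)."""
    inter = max(0.0, min(hi1, hi2) - max(lo1, lo2))
    span = hi2 - lo2
    return inter / span if span > 0 else float(lo2 >= lo1 and hi2 <= hi1)

def similarity(dataset_meta, query):
    """Average overlap over the variables mentioned in the query (0 = no match)."""
    scores = [range_overlap(*dataset_meta[var], *rng) for var, rng in query.items() if var in dataset_meta]
    return sum(scores) / len(query) if query else 0.0

catalog = {
    "cruise_2012.nc": {"salinity": (5.0, 32.0), "temperature": (4.0, 18.0)},
    "buoy_07.csv":    {"temperature": (10.0, 25.0)},
}
query = {"temperature": (15.0, 20.0), "salinity": (20.0, 30.0)}
for name in sorted(catalog, key=lambda n: similarity(catalog[n], query), reverse=True):
    print(f"{name}: {similarity(catalog[name], query):.2f}")
```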
