
Enhancing digital text collections with detailed metadata to improve retrieval

Digital text collections are increasingly important, as they enable researchers to explore new ways of interacting with texts through the use of technology. Various tools have been developed to facilitate exploring and searching in text collections at a fairly coarse level of granularity. Ideally, it should be possible to filter the results at a finer level of granularity to retrieve only the specific instances in which the researcher is interested.
The aim of this study was to investigate to what extent detailed metadata could be used to enhance texts in order to improve retrieval. To do this, the researcher had to identify metadata that would be useful as filter criteria and find ways in which these metadata could be applied to or encoded in texts. The researcher also had to evaluate existing tools to determine to what extent they support retrieval on a fine-grained level. After identifying useful metadata and reviewing existing tools, the researcher could suggest a metadata framework for encoding texts on a detailed level. Metadata in five different categories were used, namely morphological, syntactic, semantic, functional and bibliographic. A further contribution of this metadata framework was the addition of in-text bibliographic metadata, for use where sections in a text have properties that differ from those of the main text.
The suggested framework had to be tested to determine if retrieval was indeed improved. In order to do so, a selection of texts was encoded with the suggested framework and a prototype was developed to test the retrieval. The prototype receives the encoded texts and stores the information in a database. A graphical user interface was developed to enable searching in the database in an easy and intuitive manner.
The prototype demonstrates that it is possible to search for words or phrases with specific properties when detailed metadata are applied to texts. The fine-grained metadata from five different categories enable retrieval on a greater level of granularity and specificity. It is therefore recommended that detailed metadata be used to encode texts in order to improve retrieval in digital text collections.
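As an illustration only, the idea of storing encoded texts in a database and then searching for a word with specific properties can be sketched as follows. This is a minimal sketch and not the prototype described in the thesis: the table layout, the function names `add_token` and `search`, the attribute names such as `part_of_speech`, and the sample data are all assumptions made for the example. Only the category names (morphological, bibliographic and so on) come from the abstract.

```python
import sqlite3

# Hypothetical schema: one row per token, plus one row per metadata
# annotation attached to that token. Not the thesis prototype's design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE token (
    id INTEGER PRIMARY KEY,
    text_id TEXT,
    position INTEGER,
    word TEXT
);
CREATE TABLE annotation (
    token_id INTEGER REFERENCES token(id),
    category TEXT,   -- e.g. morphological, syntactic, semantic
    name TEXT,       -- hypothetical attribute name
    value TEXT
);
""")


def add_token(text_id, position, word, annotations):
    """Store a token and its (category, name, value) annotations."""
    cur = conn.execute(
        "INSERT INTO token (text_id, position, word) VALUES (?, ?, ?)",
        (text_id, position, word),
    )
    token_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?, ?, ?)",
        [(token_id, c, n, v) for (c, n, v) in annotations],
    )
    return token_id


def search(word, required):
    """Return (text_id, position) of tokens equal to `word` that carry
    every (category, name, value) annotation listed in `required`."""
    sql = "SELECT t.text_id, t.position FROM token t WHERE t.word = ?"
    params = [word]
    for category, name, value in required:
        sql += (
            " AND EXISTS (SELECT 1 FROM annotation a"
            " WHERE a.token_id = t.id AND a.category = ?"
            " AND a.name = ? AND a.value = ?)"
        )
        params += [category, name, value]
    sql += " ORDER BY t.text_id, t.position"
    return conn.execute(sql, params).fetchall()


# Invented sample data: the same word form with different properties.
add_token("text1", 1, "run", [
    ("morphological", "part_of_speech", "verb"),
    ("bibliographic", "language", "English"),
])
add_token("text2", 4, "run", [
    ("morphological", "part_of_speech", "noun"),
])

# Only the verb instance is retrieved.
print(search("run", [("morphological", "part_of_speech", "verb")]))
```

A plain word search (`search("run", [])`) returns both instances, whereas adding the morphological filter narrows the result to the one instance of interest, which is the kind of fine-grained retrieval the abstract describes.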
Keywords: metadata, digital humanities, digital text collections, retrieval, encoding
Thesis (DPhil (Information Science))--University of Pretoria, 2020.
Information Science / DPhil (Information Science) / Unrestricted

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:up/oai:repository.up.ac.za:2263/79015
Date: January 2020
Creators: Ball, Liezl Hilde
Contributors: Bothma, T.J.D. (Theodorus Jan Daniel), liezl.ball@up.ac.za
Publisher: University of Pretoria
Source Sets: South African National ETD Portal
Language: English
Detected Language: English
Type: Thesis
Rights: © 2019 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
