1

Simple Digital Libraries

Phiri, Lighton 01 August 2013 (has links)
The design of Digital Library Systems (DLSes) has evolved over time, both in sophistication and complexity, to complement the complex nature and sheer size of digital content being curated. However, there is also a growing demand from content curators, with relatively small-size collections, for simpler and more manageable tools and services to manage their content. The reasons for this particular need are driven by the assumption that simplicity and manageability might ultimately translate to lower costs of maintenance of such systems. This research proposes and advocates for a minimalist and simplistic approach to the overall design of DLSes. It is hypothesised that Digital Library (DL) tools and services based on such designs could potentially be easy to use and manage. A meta-analysis of existing DL and non-DL tools was conducted to aid the derivation of design principles for simple DLSes. The design principles were then mapped to design decisions applied to the design of a prototype simple repository. In order to assess the effectiveness of the simple repository design, two real-world case study collections were implemented based on the design. In addition, a developer-oriented study was conducted using one of the case study collections to evaluate the simplicity and ease of use of the prototype system. Furthermore, performance experiments were conducted to establish the extent to which such a simple design approach would scale and also establish comparative advantages to existing designs. In general, the study outlined some possible implications of simplifying DLS design; specifically, the results from the developer-oriented user study indicate that simplicity in the design of the DLS repository sub-layer does not severely impact the interaction between the service sub-layer and the repository sub-layer. Furthermore, the scalability experiments indicate that desirable performance results for small- and medium-sized collections are attainable. The practical implication of the proposed design approach is two-fold: firstly, the minimalist design has the potential to be used to design simple yet easy-to-use tools with comparable features to those exhibited by well-established DL tools; and secondly, the principled design approach has the potential to be applied to the design of non-DL application domains.
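To make the idea of a minimalist repository concrete, the sketch below (Python) stores each digital object as a plain directory holding its bitstream and a small metadata file; the layout, file names and metadata fields are illustrative assumptions, not the prototype actually built in the thesis.

```python
# Minimal sketch of a file-based "simple repository": each object is a
# directory holding its bitstream plus a human-readable metadata file.
# The layout and field names are illustrative assumptions.
import json
import shutil
from pathlib import Path

REPO_ROOT = Path("simple-repo")  # hypothetical repository root

def deposit(object_id: str, source_file: str, metadata: dict) -> Path:
    """Copy a file into the repository and write its metadata alongside it."""
    obj_dir = REPO_ROOT / object_id
    obj_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(source_file, obj_dir / Path(source_file).name)
    (obj_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return obj_dir

def get_metadata(object_id: str) -> dict:
    """Read an object's metadata straight off the file system."""
    return json.loads((REPO_ROOT / object_id / "metadata.json").read_text())

if __name__ == "__main__":
    Path("thesis.pdf").write_bytes(b"placeholder bitstream")  # demo file
    deposit("item-001", "thesis.pdf",
            {"title": "Simple Digital Libraries", "creator": "Phiri, Lighton"})
    print(get_metadata("item-001")["title"])
```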
2

A Hybrid Scavenger Grid Approach to Intranet Search

Nakashole, Ndapandula 01 February 2009 (has links)
According to a 2007 global survey of 178 organisational intranets, 3 out of 5 organisations are not satisfied with their intranet search services. However, as intranet data collections become large, effective full-text intranet search services are needed more than ever before. To provide an effective full-text search service based on current information retrieval algorithms, organisations have to deal with the need for greater computational power. Hardware architectures that can scale to large data collections and can be obtained and maintained at a reasonable cost are needed. Web search engines address scalability and cost-effectiveness by using large-scale centralised cluster architectures. The scalability of cluster architectures is evident in the ability of Web search engines to respond to millions of queries within a few seconds while searching very large data collections. Though more cost-effective than high-end supercomputers, cluster architectures still have relatively high acquisition and maintenance costs. Where information retrieval is not the core business of an organisation, a cluster-based approach may not be economically viable. A hybrid scavenger grid is proposed as an alternative architecture: it consists of a combination of dedicated resources and dynamic resources in the form of idle desktop workstations. From the dedicated resources, the architecture gets predictability and reliability, whereas from the dynamic resources it gets scalability. An experimental search engine was deployed on a hybrid scavenger grid and evaluated. Test results showed that the resources of the grid can be organised to deliver the best performance by using the optimal number of machines and scheduling the optimal combination of tasks that the machines perform. A system efficiency and cost-effectiveness comparison of a grid and a multi-core machine showed that for workloads of modest to large sizes, the grid architecture delivers better throughput per unit cost than the multi-core, at a system efficiency that is comparable to that of the multi-core. The study has shown that a hybrid scavenger grid is a feasible search engine architecture that is cost-effective and scales to medium- to large-scale data collections.
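As a rough illustration of the two comparison metrics the evaluation relies on, the sketch below computes throughput per unit cost and parallel system efficiency; the formulas and the sample numbers are assumptions for illustration only, not figures from the thesis.

```python
# Illustrative calculation of the two comparison metrics mentioned in the
# abstract.  The sample numbers are hypothetical.
def throughput_per_unit_cost(docs_indexed: int, elapsed_hours: float,
                             cost: float) -> float:
    """Documents processed per hour, per unit of money spent."""
    return docs_indexed / elapsed_hours / cost

def system_efficiency(actual_speedup: float, num_workers: int) -> float:
    """Parallel efficiency: achieved speedup relative to the ideal."""
    return actual_speedup / num_workers

if __name__ == "__main__":
    # Hypothetical grid of 16 scavenged desktops indexing one million documents.
    print(throughput_per_unit_cost(1_000_000, elapsed_hours=10.0, cost=500.0))
    print(system_efficiency(actual_speedup=11.2, num_workers=16))
```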
3

Meta-standardisation of Interoperability Protocols

Paihama, Jorgina Kaumbe do Rosário 01 June 2012 (has links)
The current medley of interoperability protocols is potentially problematic. Each protocol is designed by a different group, provides a single service, and has its own syntax and vocabulary. Popular protocols such as RSS are designed with simple and easy-to-understand documentation, which is a key factor in their high adoption levels. But the majority of protocols are complex, making them relatively difficult for programmers to understand and implement. This research proposes a possible new direction for high-level interoperability protocol design. The High-level Interoperability Protocol - Common Framework (HIP-CF) is designed and evaluated as a proof of concept that if interoperability is made simpler, then it can increase adoption levels, making it easier for programmers to understand and implement protocols and therefore leading to more interoperable systems. HIP-CF is not suggested as an alternative to current production protocols. Rather, it is suggested that the design approach taken by HIP-CF can be applied to other protocols, and also that a suite of simpler protocols is a better solution than various simple individual protocols. Evaluation results show that current protocols can be substantially improved upon. These improvements could, and perhaps should, be the result of a deeper analysis of the goals of today’s protocols and also of collaboration amongst the different groups that design high-level interoperability protocols. This research presents a new approach and suggests future experimental research options for the field of high-level interoperability protocol design.
4

Learning to Read Bushman: Automatic Handwriting Recognition for Bushman Languages

Williams, Kyle 01 January 2012 (has links)
The Bleek and Lloyd Collection contains notebooks that document the tradition, language and culture of the Bushman people who lived in South Africa in the late 19th century. Transcriptions of these notebooks would allow for the provision of services such as text-based search and text-to-speech. However, these notebooks are currently only available in the form of digital scans and the manual creation of transcriptions is a costly and time-consuming process. Thus, automatic methods could serve as an alternative approach to creating transcriptions of the text in the notebooks. In order to evaluate the use of automatic methods, a corpus of Bushman texts and their associated transcriptions was created. The creation of this corpus involved: the development of a custom method for encoding the Bushman script, which contains complex diacritics; the creation of a tool for creating and transcribing the texts in the notebooks; and the running of a series of workshops in which the tool was used to create the corpus. The corpus was used to evaluate various techniques for automatically transcribing the texts in order to determine which approaches were best suited to the complex Bushman script. These techniques included the use of Support Vector Machines, Artificial Neural Networks and Hidden Markov Models as machine learning algorithms, which were coupled with different descriptive features. The effect of the texts used for training the machine learning algorithms was also investigated, as well as the use of a statistical language model. It was found that, for Bushman word recognition, the use of a Support Vector Machine with Histograms of Oriented Gradient features resulted in the best performance and, for Bushman text line recognition, Marti & Bunke features resulted in the best performance when used with Hidden Markov Models. The automatic transcription of the Bushman texts proved to be difficult and the performance of the different recognition systems was largely affected by the complexities of the Bushman script. It was also found that, besides having an influence on determining which techniques may be the most appropriate for automatic handwriting recognition, the texts used in an automatic handwriting recognition system also play a large role in determining whether or not automatic recognition should be attempted at all.
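A minimal sketch of the best-performing word-recognition pipeline named in the abstract, Histograms of Oriented Gradients features fed to a Support Vector Machine, is shown below using scikit-image and scikit-learn; the randomly generated images and labels are placeholders for segmented word images from the notebooks.

```python
# HOG features + SVM for word classification.  Random "images" stand in for
# segmented word images, which are not reproduced here.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def hog_features(image: np.ndarray) -> np.ndarray:
    """Describe a grayscale word image as a HOG feature vector."""
    return hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# Placeholder data: 40 fake 64x128 grayscale word images, two word classes.
images = rng.random((40, 64, 128))
labels = np.array([0, 1] * 20)

X = np.array([hog_features(img) for img in images])
classifier = SVC(kernel="rbf").fit(X, labels)

print(classifier.predict(X[:5]))  # predictions for the first five samples
```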
5

Cloud Computing for Digital Libraries

Poulo, Lebeko Bearnard 01 May 2013 (has links)
Information management systems (digital libraries/repositories, learning management systems, content management systems) provide key technologies for the storage, preservation and dissemination of knowledge in its various forms, such as research documents, theses and dissertations, cultural heritage documents and audio files. These systems can make use of cloud computing to achieve high levels of scalability, while making services accessible to all at reasonable infrastructure costs and on-demand. This research aims to develop techniques for building scalable digital information management systems based on efficient and on-demand use of generic grid-based technologies such as cloud computing. In particular, this study explores the use of existing cloud computing resources offered by some popular cloud computing vendors such as Amazon Web Services. This involves making use of Amazon Simple Storage Service (Amazon S3) to store large and increasing volumes of data, Amazon Elastic Compute Cloud (Amazon EC2) to provide the required computational power and Amazon SimpleDB for querying and data indexing on Amazon S3. A proof-of-concept application comprising typical digital library services was developed and deployed in the cloud environment and evaluated for scalability when the demand for more data and services increases. The results from the evaluation show that it is possible to adopt cloud computing for digital libraries in addressing issues of massive data handling and dealing with large numbers of concurrent requests. Existing digital library systems could be migrated and deployed into the cloud.
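A minimal sketch of the storage layer described above, using boto3 to put and get digital-library objects in Amazon S3, follows; the bucket name and key layout are hypothetical, and AWS credentials are assumed to be configured in the environment.

```python
# Store and retrieve a digital-library object in Amazon S3 with boto3.
# The bucket name and key layout are hypothetical.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-digital-library"  # hypothetical bucket

def store_item(item_id: str, data: bytes, metadata: dict) -> None:
    """Upload an object's bitstream; S3 keeps user metadata alongside it."""
    s3.put_object(Bucket=BUCKET, Key=f"items/{item_id}",
                  Body=data, Metadata=metadata)

def fetch_item(item_id: str) -> bytes:
    """Download an object's bitstream back from the bucket."""
    return s3.get_object(Bucket=BUCKET, Key=f"items/{item_id}")["Body"].read()

if __name__ == "__main__":
    store_item("etd-0001", b"%PDF-1.4 placeholder",
               {"title": "Cloud Computing for Digital Libraries"})
    print(len(fetch_item("etd-0001")))
```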
6

Designing an interface to provide new functionality for the post-processing of web-based annotations.

du Toit, Nicola 01 March 2014 (has links)
Systems to annotate online content are becoming increasingly common on the World Wide Web. While much research and development has been done for interfaces that allow users to make and view annotations, few annotation systems provide functionality that extends beyond this and allows users to also manage and process collections of existing annotations. Siyavula Education is a social enterprise that publishes high school Maths and Science textbooks online. The company uses annotations to collate collaborator and volunteer feedback (corrections, opinions, suggestions) about its books at various phases in the book-writing life cycle. Currently, the company captures annotations on PDF versions of its books. The web-based software it uses allows for some filtering and sorting of existing annotations, but the system is limited and not ideal for the company's rather specialised requirements. In an attempt to move away from a proprietary, PDF-based system, Siyavula implemented Annotator (http://okfnlabs.org/annotator/), software which allowed for the annotation of HTML pages. However, this software was not coupled with a back-end interface that would allow users to interact with a database of saved annotations. To enable this kind of interaction, a prototype interface was designed and is presented here. The purpose of the interface was to give users new and improved functionality for querying and manipulating a collection of web-based annotations about Siyavula’s online content. Usability tests demonstrated that the interface was successful at giving users this new and necessary functionality (including filtering, sorting and searching) to process annotations. Once integrated with front-end software (such as Annotator) and issue-tracking software (such as GitHub), the interface could form part of a powerful new tool for the making and management of annotations on the Web.
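The sketch below illustrates the kind of post-processing the interface provides, namely filtering, sorting and searching a collection of annotation records; the record fields follow a generic Annotator-style JSON layout and are assumptions, not Siyavula's actual schema.

```python
# Filtering, sorting and searching a small collection of annotation records.
# Field names are generic assumptions, not Siyavula's schema.
from datetime import datetime

annotations = [
    {"user": "reviewer1", "text": "Equation 3 is wrong", "tags": ["correction"],
     "created": "2014-01-12T09:30:00"},
    {"user": "reviewer2", "text": "Nice worked example", "tags": ["opinion"],
     "created": "2014-02-03T14:05:00"},
]

def filter_by_tag(items, tag):
    return [a for a in items if tag in a.get("tags", [])]

def sort_by_date(items, newest_first=True):
    return sorted(items, key=lambda a: datetime.fromisoformat(a["created"]),
                  reverse=newest_first)

def search_text(items, term):
    return [a for a in items if term.lower() in a["text"].lower()]

print(filter_by_tag(annotations, "correction"))
print(sort_by_date(annotations)[0]["user"])
print(search_text(annotations, "example"))
```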
7

Optimising Information Retrieval from the Web in Low-bandwidth Environments

Balluck, Ashwinkoomarsing 01 June 2007 (has links)
The Internet has the potential to deliver information to Web users who have no other way of getting to those resources. However, information on the Web is scattered without any proper semantics for classifying it, which makes information discovery difficult. Thus, to ease the querying of this huge body of information, developers have built tools, amongst which are search engines and Web directories. However, for these tools to give optimal results, two factors need to be given due importance: the users’ ability to use these tools and the bandwidth available in these environments. Unfortunately, an initial study showed that neither of these factors was present in Mauritius, where low bandwidth prevails. Hence, this study helps us get a better idea of how users use search tools. To achieve this, we designed a survey in which Web users were asked about their skills in using search tools. Then, a jump page using the search boxes of different search engines was developed to provide directed guidance for effective searching in low-bandwidth environments. We then conducted a further evaluation, using a sample of users, to see if there were any changes in the way users access the search tools. The results from this study were then examined. We noticed that the users were initially unaware of the specificities of the different search tools, which prevented efficient use. However, during the survey, they were educated on how to use those tools, and this was fruitful when the further evaluation was performed. Hence, the efficient use of the search tools helped in reducing traffic in low-bandwidth environments.
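As an illustration of how a jump page can hand one query to several search engines, the sketch below builds query URLs directly; the engines and URL patterns listed are assumptions for illustration and are not the jump page developed in the study.

```python
# Build per-engine query URLs for a single query string.  The URL patterns
# are common public ones, listed here as illustrative assumptions.
from urllib.parse import urlencode

ENGINES = {
    "Google":     "https://www.google.com/search?",
    "Bing":       "https://www.bing.com/search?",
    "DuckDuckGo": "https://duckduckgo.com/?",
}

def build_query_urls(query: str) -> dict:
    """Return a search URL for each configured engine."""
    return {name: base + urlencode({"q": query}) for name, base in ENGINES.items()}

for name, url in build_query_urls("low bandwidth information retrieval").items():
    print(f"{name}: {url}")
```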
8

Individual Document Management Techniques: an Explorative Study

Sello, Mpho Constance 01 June 2007 (has links)
Individuals are generating, storing and accessing more information than ever before. The information comes from a variety of sources such as the World Wide Web, email and books. Storage media are becoming larger and cheaper. This makes accumulation of information easy. When information is kept in large volumes, retrieving it becomes a problem unless there is a system in place for managing it. This study examined the techniques that users have devised to make retrieval of their documents easy and timely. A survey of user document management techniques was done through interviews. The uncovered techniques were then used to build an expert system that provides assistance with document management decision-making. The system provides recommendations on file naming and organization, document backup and archiving, as well as suitable storage media. The system poses a series of questions to the user and offers recommendations on the basis of the responses given. The system was evaluated by two categories of users: those who had been interviewed during data collection and those who had not been interviewed. Both categories of users found the recommendations made by the system to be reasonable and indicated that the system was easy to use. Some users thought the system could be of great benefit to people new to computers.
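A minimal sketch of a rule-based recommender in the spirit of the expert system described follows; the questions, answers and rules are invented for illustration and are not the rules elicited from the interviews.

```python
# Toy rule-based recommender: answers to a few questions map to
# document-management recommendations.  Rules are illustrative only.
def recommend(answers: dict) -> list:
    tips = []
    if answers.get("documents_per_month", 0) > 100:
        tips.append("Organise files into dated, topic-named folders.")
    if answers.get("has_backup") is False:
        tips.append("Schedule a weekly backup to external or cloud storage.")
    if answers.get("keeps_old_versions"):
        tips.append("Move superseded versions into a separate archive folder.")
    return tips or ["Current practices look adequate."]

print(recommend({"documents_per_month": 250, "has_backup": False,
                 "keeps_old_versions": True}))
```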
9

An End-to-End Solution for Complex Open Educational Resources

Mohamed Nour, Morwan 01 November 2012 (has links)
Open access and open resources have gained much attention worldwide in the last few years. The interest in sharing information freely via the World Wide Web has grown rapidly in many different fields, and information is now available in many different forms because of the continuous evolution of technology. The main objective of this thesis is to provide content creators and educators with a solution that simplifies the process of depositing into digital repositories. We created a desktop tool named ORchiD (Open educational Resources Depositor) to achieve this goal. The tool uses educational metadata and content packaging standards to create packages, while conforming to a deposit protocol to ingest resources into repositories. A test repository was installed and adapted to handle Open Educational Resources. The proposed solution is centered on the front-end application, which handles the complex objects on the user's desktop. The desktop application allows the user to select and describe his/her resource(s), then creates the package and forwards it to the specified repository using the deposit protocol. The solution proved to be simple for users, but is in need of further improvements, specifically in relation to the metadata standard presented to the user.
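The sketch below illustrates the two steps the tool performs, packaging a resource with its metadata and pushing the package to a repository over HTTP; the manifest layout, endpoint URL and file names are hypothetical stand-ins for the packaging and deposit standards the thesis actually used.

```python
# Bundle a resource plus a JSON manifest into a zip package, then POST it
# to a (hypothetical) repository deposit endpoint.
import json
import zipfile
from pathlib import Path

import requests

def build_package(resource_path: str, metadata: dict, package_path: str) -> str:
    """Zip the resource together with a JSON manifest describing it."""
    with zipfile.ZipFile(package_path, "w") as pkg:
        pkg.write(resource_path)
        pkg.writestr("manifest.json", json.dumps(metadata, indent=2))
    return package_path

def deposit(package_path: str, endpoint: str) -> int:
    """POST the package to a repository deposit endpoint."""
    with open(package_path, "rb") as fh:
        response = requests.post(endpoint, data=fh,
                                 headers={"Content-Type": "application/zip"})
    return response.status_code

if __name__ == "__main__":
    Path("lecture-notes.pdf").write_bytes(b"placeholder resource")  # demo file
    pkg = build_package("lecture-notes.pdf",
                        {"title": "Intro to OER", "creator": "Example Author"},
                        "package.zip")
    # deposit() needs a real repository; the URL below is hypothetical.
    # print(deposit(pkg, "https://repository.example.org/deposit"))
    print(pkg)
```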
10

Improving searchability of automatically transcribed lectures through dynamic language modelling

Marquard, Stephen 01 December 2012 (has links)
Recording university lectures through lecture capture systems is increasingly common. However, a single continuous audio recording is often unhelpful for users, who may wish to navigate quickly to a particular part of a lecture, or locate a specific lecture within a set of recordings. A transcript of the recording can enable faster navigation and searching. Automatic speech recognition (ASR) technologies may be used to create automated transcripts, to avoid the significant time and cost involved in manual transcription. Low accuracy of ASR-generated transcripts may however limit their usefulness. In particular, ASR systems optimized for general speech recognition may not recognize the many technical or discipline-specific words occurring in university lectures. To improve the usefulness of ASR transcripts for the purposes of information retrieval (search) and navigating within recordings, the lexicon and language model used by the ASR engine may be dynamically adapted for the topic of each lecture. A prototype is presented which uses the English Wikipedia as a semantically dense, large language corpus to generate a custom lexicon and language model for each lecture from a small set of keywords. Two strategies for extracting a topic-specific subset of Wikipedia articles are investigated: a naïve crawler which follows all article links from a set of seed articles produced by a Wikipedia search from the initial keywords, and a refinement which follows only links to articles sufficiently similar to the parent article. Pair-wise article similarity is computed from a pre-computed vector space model of Wikipedia article term scores generated using latent semantic indexing. The CMU Sphinx4 ASR engine is used to generate transcripts from thirteen recorded lectures from Open Yale Courses, using the English HUB4 language model as a reference and the two topic-specific language models generated for each lecture from Wikipedia. Three standard metrics – Perplexity, Word Error Rate and Word Correct Rate – are used to evaluate the extent to which the adapted language models improve the searchability of the resulting transcripts, and in particular improve the recognition of specialist words. Ranked Word Correct Rate is proposed as a new metric better aligned with the goals of improving transcript searchability and specialist word recognition. Analysis of recognition performance shows that the language models derived using the similarity-based Wikipedia crawler outperform models created using the naïve crawler, and that transcripts using similarity-based language models have better perplexity and Ranked Word Correct Rate scores than those created using the HUB4 language model, but worse Word Error Rates. It is concluded that English Wikipedia may successfully be used as a language resource for unsupervised topic adaptation of language models to improve recognition performance for better searchability of lecture recording transcripts, although possibly at the expense of other attributes such as readability.
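A minimal sketch of the pair-wise article similarity step follows: TF-IDF term scores reduced with truncated SVD (a standard approximation of latent semantic indexing) and compared with cosine similarity. The toy "articles" stand in for Wikipedia article text; the similarity-based crawler described above would follow a link only when this score exceeds a chosen threshold.

```python
# Pair-wise article similarity via TF-IDF + truncated SVD (LSI-style) and
# cosine similarity.  The toy articles are placeholders for Wikipedia text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

articles = {
    "Hidden Markov model": "statistical model markov process hidden states transitions",
    "Speech recognition":  "automatic speech recognition acoustic model language model decoding",
    "Baroque music":       "music period ornamentation counterpoint composers harpsichord",
}

tfidf = TfidfVectorizer().fit_transform(articles.values())
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
similarity = cosine_similarity(lsi)

names = list(articles)
for i, a in enumerate(names):
    for j, b in enumerate(names):
        if j > i:
            print(f"{a!r} vs {b!r}: {similarity[i, j]:.2f}")
```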
