Global ETD Search

1	Measuring the applicability of Open Data Standards to a single distributed organisation: an application to the COMESA Secretariat Munalula, Themba 01 January 2008 Open data standardization has many known benefits, including the availability of tools for standard encoding formats, interoperability among systems and long-term preservation of data. Mark-up languages and their use on the World Wide Web have further eased data sharing. The Extensible Markup Language (XML), in particular, has succeeded due to its simplicity and ease of use. Its primary purpose is to facilitate the sharing of data across different information systems, particularly systems connected via the Internet. Whether open and standardized or not, organizations generate data daily. Offline exchange of documents and data is undertaken using existing formats that are typically defined by the organizations that generate the data in the documents. With the Internet, the realization of data exchange has directly raised the need for interoperability and comparability. Although standardization is the accepted approach for online data exchange, little is understood about how a specific organization’s data “fits” a given data standard. This dissertation develops data metrics that represent the extent to which data standards can be applied to an organization’s data. The research identified key issues that affect data interoperability or the feasibility of a move towards interoperability. It tested the unwritten rule that organizational setups tend to regard and design data requirements more from internal needs than from interoperability needs. Essentially, by generating metrics that cover a number of data attributes, the research quantified the extent of the gap that exists between organizational data and data standards. Key data attributes, i.e. completeness, concise representation, relevance and complexity, were selected and used as the basis for metric generation. In addition to the attribute-based metrics, hybrid metrics representing a measure of the “goodness of fit” of the source data to standard data were generated. Regarding the completeness attribute, it was found that most Common Market for Eastern and Southern Africa (COMESA) head office data clusters had lower-than-desired metrics, consistent with the gap highlighted above. The same applied to the concise representation attribute: most data clusters had a more concise representation for the COMESA data than for the data standard. The complexity metrics confirmed that the number of data elements is a key determinant in any move towards the adoption of data standards. This was also borne out by the magnitude of the hybrid metrics, which to some extent depended on the complexity metrics. An additional contribution of the research was the inclusion of expert users’ weights for the data elements and the recalculation of all metrics. A comparison with the unweighted metrics yielded a mixed picture. Among the completeness metrics, and for the data retention rate in particular, increases were recorded for data clusters in which greater weight was allocated to mapped elements than to those that were not mapped. The same applied to the relative elements ratio. The complexity metrics showed general declines when user-weighted elements were used in the computation instead of unweighted elements, again because these metrics depend on the number of elements. In the unweighted case the weights were evenly distributed, whereas in the user-weighted case some elements were given lower weights by the expert users, leading to an overall decline in the metric. A number of implications emerged for COMESA. Firstly, COMESA would have to determine the extent to which its source data rely on data sources for which international standards are being promoted. Secondly, an inventory of users and collectors of the data COMESA uses is necessary in order to determine who would benefit from a standards-based information system. Thirdly, and from an organizational perspective, COMESA needs to designate a team to guide the creation of such a standards-based information system. Lastly, there is a need for involvement in the consortia that are responsible for these data standards, which has implications for organizational resources. In totality, this research provided a methodology for determining the feasibility of a move towards standardization and hence makes it possible to answer the critical first-stage questions that such a move raises. H.1 MODELS AND PRINCIPLES E.2 DATA STORAGE REPRESENTATIONS
2	Individual Document Management Techniques: an Explorative Study Sello, Mpho Constance 01 June 2007 Individuals are generating, storing and accessing more information than ever before. The information comes from a variety of sources such as the World Wide Web, email and books. Storage media are becoming larger and cheaper, which makes accumulating information easy. When information is kept in large volumes, retrieving it becomes a problem unless there is a system in place for managing it. This study examined the techniques that users have devised to make retrieval of their documents easy and timely. A survey of user document management techniques was conducted through interviews. The uncovered techniques were then used to build an expert system that assists with document management decision-making. The system provides recommendations on file naming and organization, document backup and archiving, as well as suitable storage media. It poses a series of questions to the user and offers recommendations on the basis of the responses given. The system was evaluated by two categories of users: those who had been interviewed during data collection and those who had not. Both categories of users found the recommendations made by the system to be reasonable and indicated that the system was easy to use. Some users thought the system could be of great benefit to people new to computers. H.4 INFORMATION SYSTEMS APPLICATIONS E.2 DATA STORAGE REPRESENTATIONS E.5 FILES H.3 INFORMATION STORAGE AND RETRIEVAL
3	Transcription of the Bleek and Lloyd Collection using the Bossa Volunteer Thinking Framework Munyaradzi, Ngoni 01 November 2013 The digital Bleek and Lloyd Collection is a rare collection that contains artwork, notebooks and dictionaries of the earliest inhabitants of Southern Africa. Previous attempts have been made to recognize the complex text in the notebooks using machine learning techniques, but due to the complexity of the manuscripts the recognition accuracy was low. In this research, a crowdsourcing-based method is proposed to transcribe the historical handwritten manuscripts, where volunteers transcribe the notebooks online. An online crowdsourcing transcription tool was developed and deployed. Experiments were conducted to determine the quality of transcriptions and the accuracy of the volunteers compared with a gold standard. The results show that volunteers are able to produce reliable transcriptions of high quality. The inter-transcriber agreement is 80% for |Xam text and 95% for English text. When the |Xam text transcriptions produced by the volunteers are compared with the gold standard, the volunteers achieve an average accuracy of 69.69%. Findings show that there is a positive linear correlation between the inter-transcriber agreement and the accuracy of transcriptions. The user survey revealed that volunteers found the transcription process enjoyable, though difficult. Results indicate that volunteer thinking can be used to crowdsource intellectually intensive tasks in digital libraries, such as the transcription of handwritten manuscripts. Volunteer thinking outperforms machine learning techniques at the task of transcribing notebooks from the Bleek and Lloyd Collection. H.4 INFORMATION SYSTEMS APPLICATIONS E.0 GENERAL E.2 DATA STORAGE REPRESENTATIONS H.3 INFORMATION STORAGE AND RETRIEVAL
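The abstract of entry 3 reports inter-transcriber agreement and accuracy against a gold standard but does not say how either was computed. As a rough illustration only, the sketch below assumes both are character-level similarity scores: agreement as the mean similarity over all pairs of volunteer transcriptions, and accuracy as the mean similarity to the gold text. The function names and the use of Python's `difflib` are assumptions made for this example, not the thesis's method.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means identical strings."""
    # autojunk=False switches off difflib's heuristic for long sequences,
    # so every character is always taken into account.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def inter_transcriber_agreement(transcriptions: list[str]) -> float:
    """Mean pairwise similarity between volunteer transcriptions (assumed definition)."""
    pairs = list(combinations(transcriptions, 2))
    if not pairs:
        raise ValueError("need at least two transcriptions")
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def accuracy(transcriptions: list[str], gold: str) -> float:
    """Mean similarity of each transcription to the gold standard (assumed definition)."""
    if not transcriptions:
        raise ValueError("need at least one transcription")
    return sum(similarity(t, gold) for t in transcriptions) / len(transcriptions)
```

With several volunteers per page, computing both scores for each page yields the paired values needed to check for the kind of correlation between agreement and accuracy that the abstract describes.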
4	Unscharfe Suche für Terme geringer Frequenz in einem großen Korpus / Fuzzy Search for Infrequent Terms in a Large Corpus Gerhards, Karl 10 January 2011 Until now, infrequent terms have been neglected in searching in order to save time and memory. With the help of a cascaded index and the algorithms introduced here, such trade-offs are no longer necessary. A fast and efficient method was developed to find all terms in the largest freely available corpus of German-language texts by exact search, part-word search and fuzzy search. The process can be extended to include transliterated passages. In addition, documents that contain the term with a modified spelling can also be found by a fuzzy search. Time and memory requirements are determined and fall considerably below those of common search engines. Search Retrieval Associative memory 54.82 - Text processing 06.74 - Information systems E.2 - DATA STORAGE REPRESENTATIONS I.5.2 - Design Methodology ddc:020 ddc:830
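The three search modes named in entry 4 (exact, part-word and fuzzy) can be illustrated with a small in-memory term index. This is a minimal sketch under stated assumptions, not the cascaded index from the thesis: it assumes fuzzy matching means a bounded Levenshtein distance and uses a character-bigram filter to pick candidates before verifying them. The class `TermIndex` and all of its method names are hypothetical.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def bigrams(term: str) -> set[str]:
    """Distinct character bigrams of the term, padded with '$' at both ends."""
    padded = f"${term}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


class TermIndex:
    """Hypothetical term index supporting exact, part-word and fuzzy lookup."""

    def __init__(self, terms):
        self.terms = set(terms)
        self.postings = defaultdict(set)  # bigram -> terms containing it
        for term in self.terms:
            for gram in bigrams(term):
                self.postings[gram].add(term)

    def exact(self, query: str) -> bool:
        return query in self.terms

    def part_word(self, query: str) -> list[str]:
        """Terms that contain the query as a substring (linear scan)."""
        return sorted(t for t in self.terms if query in t)

    def fuzzy(self, query: str, max_distance: int = 1) -> list[str]:
        """Terms within max_distance edits of the query."""
        grams = bigrams(query)
        # One edit changes at most two bigrams, so a true match must still
        # share at least this many of the query's distinct bigrams.
        needed = len(grams) - 2 * max_distance
        if needed <= 0:
            candidates = self.terms  # filter is useless here: check every term
        else:
            shared = defaultdict(int)
            for gram in grams:
                for term in self.postings.get(gram, ()):
                    shared[term] += 1
            candidates = {t for t, n in shared.items() if n >= needed}
        return sorted(t for t in candidates
                      if edit_distance(query, t) <= max_distance)
```

For example, `TermIndex(["Haus", "Maus", "Hausen"]).fuzzy("Haus")` returns `["Haus", "Maus"]`, because "Hausen" is two edits away. The bigram filter only narrows the candidate set; the final edit-distance check decides what counts as a match.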
