Multilingual access to information using an intermediate language: Proefschrift voorgelegd tot het behalen van de graad van doctor in de Taal- en Letterkunde aan de Universiteit Antwerpen [Thesis submitted to obtain the degree of Doctor in Language and Literature at the University of Antwerp]
Francu, Victoria January 2003
Although information is, in theory, widely available, linguistic barriers can restrict its more general use. This thesis is concerned with the linguistic aspects of information languages and, in particular, with the prospects for enhanced access to information by means of multilingual access facilities. The main aim of the research is thus to demonstrate that information retrieval can be improved by searching with multilingual thesaurus terms based on an intermediate or switching language. Universal classification systems in general can play the role of switching languages, for reasons dealt with in the forthcoming pages. The Universal Decimal Classification (UDC) in particular is the classification system used here as an example of a switching language. The question may arise: why a universal classification system and not another thesaurus? Because the UDC, like most classification systems, uses symbols, it is language independent, and the problems of compatibility between such a thesaurus and other thesauri in different languages are avoided. Another question may still arise: why not, then, assign running numbers to the descriptors in a thesaurus and make a switching language out of the resulting enumerative system? Because of some other characteristics of the UDC: hierarchical structure, terminological richness, consistency and control. One major question to be answered is: can a thesaurus be built on the basis of a classification system in any and all of its parts? To what extent can this question be given an affirmative answer? This depends largely on the attributes of the universal classification system that can be put to good use for this purpose. Examples of different situations will be given and discussed, beginning with those classes of the UDC that are best suited to building a thesaurus structure (classes which are both hierarchical and faceted)...
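The switching-language idea in the abstract above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the miniature thesauri, the function names and the catalogue entries are all hypothetical, and only the top-level notations 51, 53, 54 and 78 are meant to correspond to UDC main classes (mathematics, physics, chemistry, music). Descriptors in each language map to a language-independent notation, so a term in one language can be translated or used for retrieval without any direct bilingual mapping.

```python
# Hypothetical miniature thesauri: each maps a descriptor to a class notation.
# The notation acts as the switching language between the natural languages.
THESAURI = {
    "en": {"mathematics": "51", "physics": "53", "chemistry": "54", "music": "78"},
    "fr": {"mathématiques": "51", "physique": "53", "chimie": "54", "musique": "78"},
    "ro": {"matematică": "51", "fizică": "53", "chimie": "54", "muzică": "78"},
}

# Hypothetical catalogue records: (title, notation). Titles are invented.
CATALOGUE = [
    ("Hypothetical introduction to algebra", "512"),
    ("Hypothetical handbook of music theory", "781"),
    ("Hypothetical survey of physics", "53"),
]


def to_notation(term, lang):
    """Map a descriptor in the given language to its notation (or None)."""
    return THESAURI[lang].get(term.lower())


def translate(term, source, target):
    """Translate a descriptor by going through the notation, not word to word."""
    notation = to_notation(term, source)
    if notation is None:
        return []
    return sorted(t for t, n in THESAURI[target].items() if n == notation)


def search(term, lang, catalogue):
    """Retrieve records whose notation falls under the term's class.

    Prefix matching is a simplification of hierarchical structure:
    a record classed at "512" is found by a query that maps to "51".
    """
    notation = to_notation(term, lang)
    if notation is None:
        return []
    return [title for title, n in catalogue if n.startswith(notation)]


print(translate("music", "en", "fr"))
print(search("mathématiques", "fr", CATALOGUE))
```

The prefix match in `search` is the part that plain running numbers could not provide: because the notation is hierarchical, a query on a broad class also reaches records indexed under its subdivisions.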
The fast evolution of the World Wide Web has made it possible to publish a huge number of linked documents. Each such document represents a valuable piece of information. Linked Data is the term used to describe a method of exposing and connecting such documents. Even though this method is still in an experimental phase, it is already hard to process all existing data sources, and the most obvious solution is to index them. The study addresses the question of how to design an index capable of operating with millions of such entries. It analyses existing projects and describes an index that may fulfill the requirements. The prototype implementation and the test results provided offer additional information about the structure and effectiveness of the index.
Computer-assisted retrospective periodical indexing in musicology: La Chronique Musicale as RIPMxix prototype
Gíslason, Donald Garth January 1985
The music periodical literature of the 19th century has largely remained unavailable to musical scholarship due to a lack of adequate indexing. While several indexing efforts have been attempted in the past century, that proposed by the recently established Répertoire international de la presse musicale du dix-neuvième siècle (RIPMxix) sets itself apart by its comprehensiveness and its use of computer technology. This thesis tests the new system by preparing a prototype RIPMxix Series A catalogue of a major 19th-century French music journal, La Chronique Musicale (1873-1876). The prototype is in five parts: 1) a Title Catalogue, or chronological checklist of the titles, authors and pagination of all sections in the journal; 2) an Iconography Appendix, listing the captions, dimensions and pagination of all iconography in the journal; 3) a List of Variants, giving alternate pagination references in copies of the journal held by selected major institutions; 4) a Keyword Index of important words contained in article titling; and 5) an Author Index. The indexing of La Chronique Musicale was carried out according to the regulations established in the RIPMxix Series A Guidelines, incorporating minor improvements in presentation, and adjudicating certain indexing situations not addressed in them. A data entry system was developed and the typescript catalogue was entered into computer file space. Detailed formatting based on the general design presented in the RIPMxix Series A Guidelines was specified for the Title Catalogue, Iconography Appendix and List of Variants. To produce the remaining portions of the prototype (viz., the Keyword and Author Indexes), design options were studied, specific designs adopted and detailed formatting established.
Production of the prototype involved the development of three computer programmes: a single programme for the Title Catalogue, Iconography Appendix and List of Variants; a separate programme for the Keyword Index; and a third programme for the Author Index. It is concluded that the title-derivative approach taken by the RIPMxix system is a valid one, and suggestions are made for further research. / Arts, Faculty of / Music, School of / Graduate
Martin, Russell Lewis
The last decade has seen an unprecedented flood of material coming into archival repositories. As a result, there is a great need for procedures which provide a high degree of intellectual control over records. One such procedure is the indexing of archival materials. An archival index provides access to a large number of name and subject terms, without being bound by the traditional archival structures dictated by provenance. This process has not traditionally been widely understood by archivists, but it is important to grasp the fundamental principles of archival indexing, as well as the problems and issues that follow. This is especially true in a period when methods of automated information processing have reached new levels of sophistication. This thesis is an exploration of these problems and issues. The place of indexing in a complete system of archival description is established, and the process defended as a valid part of archival retrieval. The thesis also offers guidelines for conducting the actual indexing process, and making several basic decisions faced by archival indexers with regard to the implementation of indexing in an archival descriptive system. In addition, the merits of such alternative methods as controlled-vocabulary and uncontrolled-vocabulary indexing, and coordination of desired terms before and after index creation, are weighed, and the positive and negative aspects of certain recently-developed systems evaluated. The thesis concludes by stating ways in which archivists must re-evaluate the indexing process for it to be used effectively in the future. / Arts, Faculty of / Library, Archival and Information Studies (SLAIS), School of / Graduate
Nwachukwu, Izuchukwu Udochi
Directly mapped caches are an attractive option for processor designers as they combine fast lookup times with reduced complexity and area. However, directly mapped caches are prone to higher miss rates: because each set holds a single block, there are no candidates for replacement on a cache miss, and the data residing in the set has to be evicted to the next-level cache. Another issue that inhibits cache performance is the non-uniformity of accesses exhibited by most applications: some sets are under-utilized while others receive the majority of accesses. This implies that increasing the size of caches may not lead to proportionally improved cache hit rates. Several solutions that address cache non-uniformity have been proposed in the literature over the past decade, and each proposal independently claims the benefit of reduced conflict misses. However, because the published results use different benchmarks and different experimental setups, there is no established frame of reference and the results are not easy to compare. In this work we report a side-by-side comparison of these techniques. Finally, we propose an Adaptive-Partitioned cache for multi-threaded applications. This design limits inter-thread thrashing while dynamically reducing traffic to heavily accessed sets.
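The conflict-miss and non-uniformity behaviour described in this abstract can be reproduced with a very small simulation. The following Python sketch is an illustration only, not the experimental setup of the work: the function name, the cache geometry (4 sets, 16-byte blocks) and the address traces are all assumptions chosen to keep the arithmetic easy to follow.

```python
def simulate_direct_mapped(addresses, num_sets=4, block_size=16):
    """Simulate a direct-mapped cache and return (misses, accesses per set).

    Each set holds exactly one block, so a miss always evicts whatever
    block currently occupies the indexed set: there is no replacement choice.
    """
    tags = [None] * num_sets          # tag of the block resident in each set
    set_accesses = [0] * num_sets     # how often each set is referenced
    misses = 0
    for addr in addresses:
        block = addr // block_size    # block number of the address
        index = block % num_sets      # set selected by the low-order bits
        tag = block // num_sets       # remaining bits identify the block
        set_accesses[index] += 1
        if tags[index] != tag:        # cold miss or conflict miss
            misses += 1
            tags[index] = tag         # evict the resident block
    return misses, set_accesses


# Addresses 0 and 64 fall in blocks 0 and 4, which both map to set 0,
# so they keep evicting each other while sets 1 to 3 stay unused.
conflicting = [0, 64] * 4
# Addresses 0 and 16 fall in blocks 0 and 1, which map to different sets.
friendly = [0, 16] * 4

print(simulate_direct_mapped(conflicting))  # (8, [8, 0, 0, 0])
print(simulate_direct_mapped(friendly))     # (2, [4, 4, 0, 0])
```

In the first trace every access misses even though three quarters of the cache is empty, which is why simply enlarging the cache does not necessarily raise the hit rate in proportion.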
The main factors which prompted the present study were: (1) PRECIS has a linguistic universal feature for computerized subject indexing; (2) the largest Chinese bibliography and index published by the National Central Library of Taiwan still lack subject indexes; (3) both mainland China and Taiwan have created their bibliographic databases based on UNIMARC; and (4) the field 670 of the UNIMARC is reserved for PRECIS. This study has aimed to experiment with PRECIS for indexing Chinese documents, generate Chinese subject indexes using PRECIS, and suggest the use of PRECIS in online retrieval in Chinese bibliographic databases. The last objective is an assumption which was based on the achievement of the first objective.
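'n Ondersoek na persoonlike indekseerstelsels, insluitende gerekenariseerde stelsels, met spesiale verwysing na die indekseringsbehoeftes van individuele akademici in Wes-Kaapland [An investigation into personal indexing systems, including computerised systems, with special reference to the indexing needs of individual academics in the Western Cape]
Bekker, G D January 1989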
Summary in English. / Bibliography: pages 185-195. / The investigation into personal indexing systems consists of (a) a study of the literature and (b) an empirical survey of the indexing needs of academics in the Western Cape. The literature study was used, inter alia, to determine certain "characteristics" of personal indexing systems. Characteristics are defined as those features of personal indexing systems that are generally agreed upon by most authors and users as mandatory to ensure effective utilisation of such systems. These characteristics are later employed to derive models of personal indexing systems that may have practical applications for academics. The empirical study provides conclusive proof that dissatisfaction with the academic library is not a reason for setting up a personal indexing system and that academics have a need for professional help when they start their own indexing system. Journal articles are of utmost importance in all document collections, but books, conference papers, theses and clippings are also important. The number of documents contained in such systems varies between 200 and 48 800, with an average of 2 492,76. According to Soper, scientists tend to keep their documents at the workplace while humanists tend to keep their documents at home. Social scientists fall between these groups and keep some of their documents at the workplace and some at home. For scientists and social scientists Soper's observations were confirmed. Lack of data made it impossible to come to any conclusion in the case of humanists. The main difference between large indexing systems and personal indexing systems is the number of records. The smaller system can be simpler, but it was not possible, with the data available, to state conclusively that a thesaurus is not necessary. Although the advantages of computerised systems were indicated, it is acknowledged that many academics would prefer a manual system.
An index on a computer should provide for variable length fields. The researcher comes to the conclusion that a combination of a classification system and free search terms would be the most effective method to use in subject searches. He suggests that the main classes of the Dewey Decimal Classification Scheme may be used as an outline and that for his specialised field of study the user should devise his own scheme.
Lay, William Michael
No description available.
Antoine, Elizabeth Arockiarani
The requirement to store and manipulate data that represents the location and extent of objects such as roads, cities and rivers (spatial data) led to the evolution of spatial database systems. The domains that led to an increased interest in spatial database systems are earth science, robotics, resource management, urban planning, autonomous navigation and geographic information systems (GIS). To handle spatial data efficiently, spatial database systems require indexing mechanisms that can retrieve spatial objects based on their locations using direct look-ups as opposed to sequential search. Indexing structures designed for relational database systems cannot be used for objects with non-zero size. The fact that there is no total ordering of objects in space makes conventional indexes, such as the B+ tree, incapable of handling spatial data. / Extensive work has been done on spatial indexing, and indexing methods are categorized in terms of their efficiency for the type of spatial objects or the type of queries. Queries in a spatial database system are classified as single-scan and multi-scan queries. Spatial join is the most important multi-scan query in a spatial database system, and the execution time of such queries is superlinear in the number of objects. Among the indexing structures available for spatial join queries, Filter trees perform better than their counterparts, such as Hilbert R-trees. The Filter tree join algorithm outperforms the R-tree join algorithm by reading each block of entities at most once. Filter trees combine recursive partitioning, size separation and space-filling curves to achieve this efficiency. However, for data sets of low join selectivity, the number of blocks processed for Filter trees is excessive compared to the number of blocks that have intersecting entities. / The goal of this work is to provide a method for accelerating spatial join operations by using Spatial Join Bitmap (SJB) indices.
The file organization is based on the concepts introduced in Filter trees. The SJB indices keep track of blocks that have intersecting entities and make the algorithm process only those blocks. We provide algorithms for generating SJB indices dynamically and for maintaining SJB indices when the data sets are updated. Although maintaining SJB indices for updates increases the cost in terms of response time, the cost saving in terms of the join operation is substantial and this makes the overall behaviour of the spatial system very efficient. / We have performed an extensive study using both real and synthetic data sets of various data distributions. The results show that the use of SJB indices produces a substantial speed-up, ranging from 25% to 150% when compared to Filter trees. This method is highly beneficial in a real-world scenario, as the number of times the data set is updated is fairly low when compared to the number of times the join processing is done on the data sets. / The spatial indexing structures can be extended to handle data of higher dimensions, including time. Geometries such as points, lines, areas or volumes whose positions change over time represent moving objects. The need for storing and processing moving object data arises in a wide range of applications, including digital battlefields (battlefield simulators), air-traffic control, and mobile communication systems. The successive locations of the object are gathered as the object moves around the space, and the locations, ordered in time, are interpolated to obtain the movement of the object: this is called the trajectory of the object. R-tree variations, such as the three-dimensional R-trees, TB-trees, FNR trees, STR trees, MON trees and SETI trees, are found to be effective for storing and manipulating past locations of moving objects.
The SETI tree is a combination of the R-tree in the time dimension and the partition based technique in the space dimension, and outperforms the other R-tree indexing structures in handling coordinate based queries. However, SETI increases the computational time when handling trajectory queries that retrieve the whole or part of the trajectories. / We propose a methodology for using the recursive partitioning technique for indexing trajectories, called the Recursively Partitioned Trajectory Index (RPTI). RPTI uses a two-level indexing structure that is similar to the SETI and maintains separate indices for the space and time dimensions. However, the splitting of trajectory segments in SETI, which increases the computational time, does not arise in RPTI. We present the algorithms for constructing the RPTI and the algorithms for updates, which include insertion and deletion. We have conducted an experimental study of this method and have demonstrated that RPTI is better than SETI in handling trajectory queries and is competitive with SETI in handling coordinate based queries. Contrary to the SETI structure, RPTI recursively partitions the space and avoids the splitting of line segments, making it efficient for query processing. / Deletion is often ignored while proposing a trajectory index as a result of the assumption that deleting the trajectory of a moving object is meaningless after the transmitted positions are recorded. However, deletions are necessary when the trajectory of a moving object is no longer useful. We have also shown that deletion of a trajectory can be efficiently done using the RPTI structure. The structure of RPTI can be easily implemented by using any of the existing spatial indexing structures. The only design parameters required are the standard disk page size and maximum level of recursive partitioning. 
However, in SETI, the number of spatial partitions, which is a crucial parameter in any spatial partitioning strategy, is highly dependent on the distribution of data sets.
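The Spatial Join Bitmap idea mentioned earlier in this abstract (record which pairs of blocks contain intersecting entities, then let the join read only those pairs) can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions, not the thesis's algorithm: blocks are plain lists of axis-aligned rectangles rather than Filter tree blocks ordered along a space-filling curve, the bitmap is built by brute force, and all function names are hypothetical.

```python
def intersects(a, b):
    """Axis-aligned rectangles given as (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_bitmap(blocks_r, blocks_s):
    """bitmap[i][j] is True when block i of R and block j of S share
    at least one intersecting pair of entities."""
    return [
        [any(intersects(a, b) for a in br for b in bs) for bs in blocks_s]
        for br in blocks_r
    ]


def bitmap_join(blocks_r, blocks_s, bitmap):
    """Join that skips every block pair whose bit is not set.

    Returns the intersecting entity pairs and the number of block
    pairs that actually had to be processed.
    """
    pairs = []
    block_pairs_read = 0
    for i, br in enumerate(blocks_r):
        for j, bs in enumerate(blocks_s):
            if not bitmap[i][j]:
                continue              # no intersecting entities: skip the pair
            block_pairs_read += 1
            pairs.extend((a, b) for a in br for b in bs if intersects(a, b))
    return pairs, block_pairs_read


# Two data sets of two blocks each; only the first blocks overlap.
blocks_r = [[(0, 0, 1, 1)], [(10, 10, 11, 11)]]
blocks_s = [[(0.5, 0.5, 2, 2)], [(20, 20, 21, 21)]]

bitmap = build_bitmap(blocks_r, blocks_s)
print(bitmap)                                   # [[True, False], [False, False]]
print(bitmap_join(blocks_r, blocks_s, bitmap))  # one pair found, one block pair read
```

With low join selectivity most bits are unset, so the join touches only a small fraction of the block pairs. The cost moves to building and maintaining the bitmap, which matches the trade-off the abstract describes for updates.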
Keywords in the mist: Automated keyword extraction for very large documents and back of the book indexing.
Csomai, Andras 05 1900
This research addresses the problem of automatic keyphrase extraction from large documents and back of the book indexing. The potential benefits of automating this process are far-reaching, from improving information retrieval in digital libraries to saving countless man-hours by helping professional indexers create back of the book indexes. The dissertation introduces a new methodology to evaluate automated systems, which allows for a detailed, comparative analysis of several techniques for keyphrase extraction. We introduce and evaluate both supervised and unsupervised techniques, designed to balance the resource requirements of an automated system and the best achievable performance. Additionally, a number of novel features are proposed, including a statistical informativeness measure based on chi statistics; an encyclopedic feature that taps into the vast knowledge base of Wikipedia to establish the likelihood of a phrase referring to an informative concept; and a linguistic feature based on sophisticated semantic analysis of the text using current theories of discourse comprehension. The resulting keyphrase extraction system is shown to outperform the current state of the art in supervised keyphrase extraction by a large margin. Moreover, a fully automated back of the book indexing system based on the keyphrase extraction system was shown to lead to back of the book indexes closely resembling those created by human experts.
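The abstract does not give the formula behind its informativeness measure, so the following Python sketch only illustrates the general technique it names: a Pearson chi-square statistic over a 2x2 contingency table that compares how often a phrase occurs in the document with how often it occurs in a background corpus. The function names, the table layout and the counts are assumptions made for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0                    # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def informativeness(phrase_in_doc, doc_size, phrase_in_bg, bg_size):
    """Score a phrase by how far its document frequency departs from
    its background frequency (hypothetical helper, counts in tokens)."""
    a = phrase_in_doc                 # phrase occurrences in the document
    b = doc_size - phrase_in_doc      # all other tokens in the document
    c = phrase_in_bg                  # phrase occurrences in the background
    d = bg_size - phrase_in_bg        # all other tokens in the background
    return chi_square_2x2(a, b, c, d)


# A phrase that is far more frequent in the document than in the background
# scores high; one that occurs at the same relative rate scores zero.
specific = informativeness(50, 1000, 5, 100000)
generic = informativeness(5, 1000, 500, 100000)
print(specific > generic, generic)
```

The statistic is symmetric, so a phrase that is strongly under-represented in the document also scores high. A practical system would need an additional check on the direction of the deviation before treating the score as evidence of informativeness.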