31 |
The impact of specificity on the retrieval power of a UDC-based multilingual thesaurus
Francu, Victoria. January 2003
The article describes research carried out on a bibliographic database to show the impact that the specificity of knowledge organising tools may have on information retrieval. For this purpose, two multilingual UDC-based thesauri with different degrees of specificity are considered. Issues of harmonising a classificatory structure with a thesaurus structure are introduced, and significant aspects of information retrieval in a multilingual environment are discussed at length. Aspects of complementarity are discussed, with particular emphasis on the real impact of alternative search facilities on IR. Finally, a number of conclusions arising from the study are formulated.
|
32 |
Indexing and retrieving images in a multilingual world (extended abstract)
Ménard, Elaine. January 2007
The Internet constitutes a vast universe of knowledge and human culture, allowing the dissemination of ideas and information without borders. The Web has also become an important medium for the diffusion of multilingual resources. However, linguistic differences still form a major obstacle to scientific, cultural, and educational exchange. With the ever-increasing size of the Web and the availability of more and more documents in various languages, this problem becomes all the more pervasive. Besides this linguistic diversity, a multitude of databases and collections now contain documents in various formats, which may also adversely affect the retrieval process.
This paper presents the context, the problem statement, and the experiment of a research project that aims to verify the relations between two indexing approaches, (1) traditional image indexing, which recommends the use of controlled vocabularies, and (2) free image indexing using uncontrolled vocabulary, and their respective performance for image retrieval in a multilingual context. The use of controlled or uncontrolled vocabularies raises a number of difficulties for the indexing process, and these difficulties necessarily have consequences at the time of image retrieval. Indexing with controlled or uncontrolled vocabularies is extensively discussed in the literature. However, it is clear that many searchers recognize the advantages of either form of vocabulary according to circumstances (Arsenault, 2006). It appears that the many difficulties associated with free indexing using uncontrolled vocabularies can only be understood via a comparative analysis with controlled vocabulary indexing (Macgregor & McCulloch, 2006).
This research compares image retrieval within two contexts: a monolingual context where the language of the query is the same as the indexing language; and a multilingual context where the language of the query is different from the indexing language. This research will indicate if one of these indexing approaches surpasses the other, in terms of effectiveness, efficiency, and satisfaction of the image searchers. For this research, three data collection methods are used: (1) the analysis of the vocabularies used for image indexing in order to examine the multiplicity of term types applied to images (generic description, identification, and interpretation) and the degree of indexing difficulty due to the subject and the nature of the image; (2) the simulation of the retrieval process with a subset of images indexed according to each indexing approach studied, and finally, (3) the administration of a questionnaire to gather information on searcher satisfaction during and after the retrieval process. The quantification of the retrieval performance of each indexing approach is based on the usability measures recommended by the standard ISO 9241-11, i.e. effectiveness, efficiency, and satisfaction of the user (AFNOR, 1998).
The need to retrieve a particular image from a collection is shared by several user communities including teachers, artists, journalists, scientists, historians, filmmakers and librarians, all over the world. Image collections also have many areas of application: commercial, scientific, educational, and cultural. Until recently, image collections were difficult to access due to limitations in dissemination and duplication procedures. This research underlines the pressing necessity to optimize the methods used for image processing, in order to facilitate the images' retrieval and their dissemination in multilingual environments. The results of this study will offer preliminary information to deepen our understanding of the influence of the vocabulary used in image indexing. In turn, these results can be used to enhance access to digital collections of visual material in multilingual environments.
|
33 |
Representing and Aligning Thesauri for an Integrated Access to Cultural Heritage Resources
Isaac, Antoine; Matthezing, Henk. January 2007
In this paper, we show how Semantic Web techniques can help to solve semantic interoperability issues in the cultural heritage domain. In particular, these techniques can enable integrated access to heterogeneous collections by representing their controlled description vocabularies (e.g. thesauri) in a standardized format: Simple Knowledge Organization System (SKOS). We also present existing automatic alignment procedures that can assist cultural heritage practitioners to connect such vocabularies at the semantic level, building similarity links between the concepts they contain.
|
34 |
Multilingual access to information using an intermediate language: Proefschrift voorgelegd tot het behalen van de graad van doctor in de Taal- en Letterkunde aan de Universiteit Antwerpen
Francu, Victoria. January 2003
Although information is, in theory, widely available, linguistic barriers can restrict its more general use. The linguistic aspects of information languages, and particularly the prospects for enhanced access to information by means of multilingual access facilities, form the substance of this thesis. The main problem of this research is thus to demonstrate that information retrieval can be improved by searching with multilingual thesaurus terms based on an intermediate or switching language. Universal classification systems in general can play the role of switching languages, for reasons dealt with in the forthcoming pages. The Universal Decimal Classification (UDC) in particular is the classification system used as an example of a switching language for our objectives. The question may arise: why a universal classification system and not another thesaurus? Because the UDC, like most classification systems, uses symbols; it is therefore language independent, and the problems of compatibility between such a thesaurus and other thesauri in different languages are avoided. Another question may still arise: why not, then, assign running numbers to the descriptors in a thesaurus and make a switching language out of the resulting enumerative system? Because of some other characteristics of the UDC: hierarchical structure, terminological richness, consistency and control. One big problem to which an answer must be found is: can a thesaurus be built on the basis of a classification system in any and all of its parts? To what extent can this question be given an affirmative answer? This depends largely on the attributes of the universal classification system that can be used favourably for this purpose. Examples of different situations will be given and discussed, beginning with those classes of UDC which are best fitted for building a thesaurus structure out of them (classes which are both hierarchical and faceted)...
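The switching-language idea in this abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the tiny vocabulary, the record list, and the function names `to_notation`, `translate`, and `search` are hypothetical, and real UDC notations (with auxiliaries and punctuation) are more complex than the plain prefix matching assumed here.

```python
# Hypothetical miniature thesaurus keyed by UDC notation (illustrative data only).
THESAURUS = {
    "51": {"en": "mathematics", "fr": "mathématiques", "ro": "matematică"},
    "53": {"en": "physics", "fr": "physique", "ro": "fizică"},
}


def to_notation(term, lang):
    """Return the UDC notation whose descriptor in `lang` matches `term`, or None."""
    for notation, labels in THESAURUS.items():
        if labels.get(lang, "").lower() == term.lower():
            return notation
    return None


def translate(term, source_lang, target_lang):
    """Translate a descriptor by pivoting through the language-independent notation."""
    notation = to_notation(term, source_lang)
    if notation is None:
        return None
    return THESAURUS[notation].get(target_lang)


def search(records, term, lang):
    """Find records whose notation falls under the term's class.

    `records` is a list of (title, notation) pairs. Prefix matching is a
    simplifying assumption that stands in for the UDC hierarchy.
    """
    notation = to_notation(term, lang)
    if notation is None:
        return []
    return [title for title, code in records if code.startswith(notation)]
```

Because the records carry only the notation, a query term in any language covered by the vocabulary reaches the same records, whatever the language of their titles.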
|
35 |
Focused Retrieval
Itakura, Kalista Yuki. January 2010
Traditional information retrieval applications, such as Web search, return atomic units of retrieval, which are generically called "documents". Depending on the application, a document may be a Web page, an email message, a journal article, or any similar object. In contrast to this traditional approach, focused retrieval helps users better pin-point their exact information needs by returning results at the sub-document level. These results may consist of predefined document components (such as pages, sections, and paragraphs) or they may consist of arbitrary passages, comprising any sub-string of a document. If a document is marked up with XML, a focused retrieval system might return individual XML elements or ranges of elements. This thesis proposes and evaluates a number of approaches to focused retrieval, including methods based on XML markup and methods based on arbitrary passages. It considers the best unit of retrieval, explores methods for efficient sub-document retrieval, and evaluates formulae for sub-document scoring. Focused retrieval is also considered in the specific context of the Wikipedia, where methods for automatic vandalism detection and automatic link generation are developed and evaluated.
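As a rough illustration of retrieval at the sub-document level, the sketch below returns the fixed-size token window that contains the most query terms. This is an assumed, deliberately simple scoring rule, not one of the formulae evaluated in the thesis; the function name `best_passage` and the `window` parameter are hypothetical.

```python
def best_passage(text, query, window=8):
    """Return (passage, score) for the token window with the most query-term hits.

    Scoring is a plain count of matching tokens, an illustrative assumption.
    Ties are resolved in favour of the earliest window.
    """
    tokens = text.split()
    terms = {t.lower() for t in query.split()}
    best_start, best_score = 0, -1
    # Slide a window of `window` tokens over the document.
    for start in range(max(1, len(tokens) - window + 1)):
        span = tokens[start:start + window]
        score = sum(1 for tok in span if tok.lower().strip(".,;:") in terms)
        if score > best_score:
            best_start, best_score = start, score
    return " ".join(tokens[best_start:best_start + window]), best_score
```

A system working on XML markup would instead score elements or ranges of elements, but the principle of ranking units smaller than the whole document is the same.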
|
36 |
Inverted Index Partitioning Strategies for a Distributed Search Engine
Patel, Hiren. 17 December 2010
One of the greatest challenges in information retrieval is to develop an intelligent system for user and machine interaction that supports users in their quest for relevant information. The dramatic increase in the amount of Web content gives rise to the need for a large-scale distributed information retrieval system, targeted to support millions of users and terabytes of data. To retrieve information from such a large amount of data in an efficient manner, the index is split among the servers in a distributed information retrieval system. Thus, partitioning the index among these collaborating nodes plays an important role in enhancing the performance of a distributed search engine. The two widely known inverted index partitioning schemes for a distributed information retrieval system are document partitioning and term partitioning. In a document partitioned system, each server hosts a subset of the documents in the collection and executes every query against its local sub-collection. In a term partitioned index, each node is responsible for a subset of the terms in the collection, and serves them to a central node as they are required for query evaluation.
In this thesis, we introduce the Document over Term inverted index distribution scheme, which splits a set of nodes into several groups (sub-clusters) and then performs document partitioning between the groups and term partitioning within each group. As this approach is based on the term and document index partitioning approaches, we also refer to it as a Hybrid Inverted Index. This approach retains the disk access benefits of term partitioning and the benefits of document partitioning: shared computational load, scalability, maintainability, and availability. We also introduce the Document over Document index partitioning scheme, based on the document partitioning approach. In this approach, a set of nodes is split into groups, and documents in the collection are partitioned between groups and also within each group. This strategy retains all the benefits of the document partitioning approach, but reduces the computational load more effectively and uses resources more efficiently.
We compare distributed index approaches experimentally and show that, in terms of efficiency and scalability, document partition based approaches perform significantly better than the others. The Document over Term partitioning offers efficient utilization of search servers and lowers disk access, but suffers from load imbalance. The Document over Document partitioning emerged as the preferred method under high workload.
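The partitioning schemes described in this abstract can be sketched in a few lines. The code below is an illustration under stated assumptions, not the thesis implementation: documents are assigned round-robin by integer id, terms are assigned by a CRC32 hash, "nodes" are plain in-memory dictionaries, and all function names are hypothetical.

```python
from collections import defaultdict
from zlib import crc32


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def document_partition(docs, n_nodes):
    """Document partitioning: each node indexes its own subset of documents."""
    shards = [{} for _ in range(n_nodes)]
    for doc_id, text in docs.items():
        shards[doc_id % n_nodes][doc_id] = text  # assumed round-robin policy
    return [build_inverted_index(shard) for shard in shards]


def term_partition(index, n_nodes):
    """Term partitioning: each node holds full posting lists for some terms."""
    shards = [{} for _ in range(n_nodes)]
    for term, postings in index.items():
        shards[crc32(term.encode()) % n_nodes][term] = postings  # assumed hash policy
    return shards


def document_over_term(docs, n_groups, nodes_per_group):
    """Document partitioning between groups, term partitioning within each group."""
    groups = [{} for _ in range(n_groups)]
    for doc_id, text in docs.items():
        groups[doc_id % n_groups][doc_id] = text
    return [term_partition(build_inverted_index(g), nodes_per_group) for g in groups]


def search(layout, term):
    """Query a Document over Term layout: one node per group owns the term."""
    hits = []
    for group in layout:
        node = group[crc32(term.encode()) % len(group)]
        hits.extend(node.get(term, []))
    return sorted(hits)
```

In this sketch a single-term query touches exactly one node in every group, which reflects the trade-off the abstract describes: work is spread across groups as in document partitioning, while within a group only the node that owns the term reads its posting list.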
|
37 |
A tightness continuum measure of Chinese semantic units, and its application to information retrieval
Xu, Ying. Unknown date
No description available.
|
38 |
PLUS: a system architecture for Personalized Library User Support
Banwell, Linda M. January 1992
No description available.
|
39 |
Arabic root-based clustering: an algorithm for identifying roots based on n-grams and morphological similarity
Al-Fares, Waleed. January 2001
No description available.
|
40 |
The development of a computerised word-of-mouth emulator
Harvey, Clare Frances. January 1994
No description available.
|