81 |
Automated assistance in the formulation of search statements for bibliographic databases
Oakes, Michael Philip, January 1994
No description available.
|
82 |
Navigating hyperspace: assessing usability
Smith, Pauline, January 1994
No description available.
|
83 |
An Approach to Language Modelling for Intelligent Document Retrieval System
Kamma, Aditya, January 2017
No description available.
|
84 |
The metacognitive knowledge of adolescent students during the information search process
Bowler, Leanne, January 2008
No description available.
|
85 |
Detection and management of redundancy for information retrieval
Bernstein, Yaniv, January 2006
The growth of the web, authoring software, and electronic publishing has led to the emergence of a new type of document collection that is decentralised, amorphous, dynamic, and anarchic. In such collections, redundancy is a significant issue. Documents can spread and propagate across such collections without any control or moderation. Redundancy can interfere with the information retrieval process, leading to decreased user amenity in accessing information from these collections, and thus must be effectively managed.
The precise definition of redundancy varies with the application. We restrict ourselves to documents that are co-derivative: those that share a common heritage, and hence contain passages of common text. We explore document fingerprinting, a well-known technique for the detection of co-derivative document pairs. Our new lossless fingerprinting algorithm improves the effectiveness of a range of document fingerprinting approaches. We empirically show that our algorithm can be highly effective at discovering co-derivative document pairs in large collections.
We study the occurrence and management of redundancy in a range of application domains. On the web, we find that document fingerprinting is able to identify widespread redundancy, and that this redundancy has a significant detrimental effect on the quality of search results. Based on user studies, we suggest that redundancy is most appropriately managed as a postprocessing step on the ranked list, and explain how and why this should be done.
In the genomic area of sequence homology search, we explain why the existing techniques for redundancy discovery are increasingly inefficient, and present a critique of the current approaches to redundancy management. We show how document fingerprinting with a modified version of our algorithm provides significant efficiency improvements, and propose a new approach to redundancy management based on wildcards. We demonstrate that our scheme provides the benefits of existing techniques but does not have their deficiencies.
Redundancy in distributed information retrieval systems, where different parts of the collection are searched by autonomous servers, cannot be effectively managed using traditional fingerprinting techniques. We thus propose a new data structure, the grainy hash vector, for redundancy detection and management in this environment. We show in preliminary tests that the grainy hash vector is able to accurately detect a good proportion of redundant document pairs while maintaining low resource usage.
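The chunk-based fingerprinting the abstract describes can be sketched as follows. This is an illustrative "mod-p" selection scheme for finding co-derivative pairs, not Bernstein's lossless algorithm; all function names and parameters are invented for the example:

```python
import hashlib

def shingles(text, k=8):
    """Split text into overlapping k-word chunks (shingles)."""
    words = text.lower().split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]

def fingerprint(text, k=8, modulus=4):
    """Hash each shingle and keep those whose hash is 0 mod `modulus`
    (the classic mod-p selection heuristic; a lossless scheme would
    instead guarantee no co-derived pair is missed)."""
    fp = set()
    for s in shingles(text, k):
        h = int(hashlib.md5(s.encode()).hexdigest(), 16)
        if h % modulus == 0:
            fp.add(h)
    return fp

def resemblance(fp_a, fp_b):
    """Jaccard overlap of two fingerprint sets: a high value suggests
    the documents are co-derivative."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)
```

A broker would fingerprint every document once, then flag pairs whose resemblance exceeds a threshold as candidate co-derivatives.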
|
86 |
Methods for Distributed Information Retrieval
Craswell, Nicholas Eric, January 2001
Published methods for distributed information retrieval generally rely on cooperation from search servers. But most real servers, particularly the tens of thousands available on the Web, are not engineered for such cooperation. This means that the majority of methods proposed, and evaluated in simulated environments of homogeneous cooperating servers, are never applied in practice.
This thesis introduces new methods for server selection and results merging. The methods do not require search servers to cooperate, yet are as effective as the best methods which do. Two large experiments evaluate the new methods against many previously published methods. In contrast to previous experiments they simulate a Web-like environment, where servers employ varied retrieval algorithms and tend not to sub-partition documents from a single source.
The server selection experiment uses pages from 956 real Web servers, three different retrieval systems and TREC ad hoc topics. Results show that a broker using queries to sample servers' documents can perform selection over non-cooperating servers without loss of effectiveness. However, using the same queries to estimate the effectiveness of servers, in order to favour servers with high-quality retrieval systems, did not consistently improve selection effectiveness.
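The sampling-based selection idea can be illustrated with a toy sketch: probe each uncooperative server with queries, pool the returned documents as a surrogate description, and rank servers by how well the description matches the query. The term-count scoring and all names below are assumptions for the example, not the thesis's exact method:

```python
def sample_server(send_query, probe_queries, docs_per_query=4):
    """Query-based sampling: send probe queries to an uncooperative
    server (via the send_query callable) and pool the returned
    documents as a description of its contents."""
    sample = {}
    for q in probe_queries:
        for doc_id, text in send_query(q)[:docs_per_query]:
            sample[doc_id] = text
    return sample

def select_servers(samples, query_terms, top=2):
    """Rank servers by how often the query terms occur in their
    sampled documents (a crude selection signal; real brokers use
    smoothed collection-selection formulae)."""
    scores = {}
    for server, sample in samples.items():
        words = " ".join(sample.values()).lower().split()
        scores[server] = sum(words.count(t) for t in query_terms)
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

The broker would run `sample_server` once per server offline, then call `select_servers` per query.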
The results merging experiment uses documents from five TREC sub-collections, five different retrieval systems and TREC ad hoc topics. Results show that a broker using a reference set of collection statistics, rather than relying on cooperation to collate true statistics, can perform merging without loss of effectiveness. Since application of the reference statistics method requires that the broker download the documents to be merged, experiments were also conducted on effective merging based on partial documents. The new ranking method developed was not highly effective on partial documents, but showed some promise on fully downloaded documents.
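Merging by re-scoring downloaded documents against a fixed reference set of collection statistics can be sketched as below. The tf-idf weighting and all names are illustrative assumptions, not the thesis's exact formula:

```python
import math

def reference_score(doc_terms, query_terms, ref_df, ref_n):
    """Re-score a downloaded document against the query using a fixed
    reference set of statistics (document frequencies ref_df from a
    reference corpus of ref_n documents), so that scores from
    different servers are directly comparable."""
    score = 0.0
    for t in query_terms:
        tf = doc_terms.count(t)
        if tf:
            idf = math.log(1 + ref_n / ref_df.get(t, 1))
            score += (1 + math.log(tf)) * idf
    return score

def merge(results_per_server, query_terms, ref_df, ref_n, top=10):
    """results_per_server: {server: [(doc_id, doc_text), ...]}.
    Download-and-rescore merging: ignore each server's native scores
    and rank all returned documents by the reference score."""
    scored = []
    for server, docs in results_per_server.items():
        for doc_id, text in docs:
            s = reference_score(text.lower().split(), query_terms,
                                ref_df, ref_n)
            scored.append((s, server, doc_id))
    scored.sort(reverse=True)
    return scored[:top]
```

Because every document is scored with the same statistics, no cooperation from the servers is needed; the cost is downloading the documents to be merged.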
Using the new methods, an effective search broker can be built, capable of addressing any given set of available search servers, without their cooperation.
|
87 |
Inhaltsbasierte Bildsuche mittels visueller Merkmale: eine Alternative zur Erschliessung digitaler Bildinformationen [Content-based image search using visual features: an alternative for indexing digital image information]
Volmer, Stephan, January 2006
Also: Darmstadt, Technical University, dissertation, 2006.
|
88 |
Ontology-based code generation for datalogger
Mehalingam, Senthilkumar, January 2006
Thesis (M.S.), State University of New York at Binghamton, Dept. of Computer Science, 2006. Includes bibliographical references.
|
89 |
An Evaluation of Projection Techniques for Document Clustering: Latent Semantic Analysis and Independent Component Analysis
Elsas, Jonathan L., 6 July 2005
Dimensionality reduction in the bag-of-words vector space document representation model has been widely studied for the purposes of improving accuracy and reducing the computational load of document retrieval tasks. These techniques, however, have not been studied to the same degree with regard to document clustering tasks. This study evaluates the effectiveness of two popular dimensionality reduction techniques for clustering, and their effect on discovering accurate and understandable topical groupings of documents. The two techniques studied are Latent Semantic Analysis and Independent Component Analysis, each of which has been shown to be effective in the past for retrieval purposes.
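The LSA side of this comparison can be sketched with a truncated SVD of the term-document matrix; the toy corpus, `numpy` implementation, and function names below are illustrative assumptions (ICA is not shown), not the study's setup:

```python
import numpy as np

def lsa_project(term_doc, k=2):
    """Project a term-document count matrix into a k-dimensional
    latent space via truncated SVD (the core of Latent Semantic
    Analysis). Rows of the result are document coordinates."""
    U, S, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return Vt[:k].T * S[:k]  # shape: docs x k

def cosine(a, b):
    """Cosine similarity between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy corpus: two topical groups (pets vs. finance).
docs = ["cat cat pet", "pet cat", "stock market trade", "market trade"]
vocab = sorted({w for d in docs for w in d.split()})
M = np.array([[d.split().count(w) for d in docs] for w in vocab],
             dtype=float)
Z = lsa_project(M, k=2)
```

Clustering is then run on the rows of `Z` instead of the raw bag-of-words vectors; documents about the same topic end up close together in the latent space.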
|
90 |
Wikipedia-Based Semantic Enhancements for Information Nugget Retrieval
MacKinnon, Ian, January 2008
When the objective of an information retrieval task is to return a nugget rather than a document, the query terms that appear in a document often do not appear in the document's most relevant nugget for the query. In this thesis a new method of query expansion is proposed, based on the Wikipedia link structure surrounding the articles most relevant to the query, selected either automatically or by human assessors. Evaluated with the Nuggeteer automatic scoring software, which we show to have a high correlation with human assessor scores for the ciQA 2006 topics, the expansion yields an increase in F-scores on the TREC Complex Interactive Question Answering task when integrated into an already high-performing baseline system. In addition, the method for finding synonyms using Wikipedia is evaluated on more common synonym-detection tasks.
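The link-structure expansion idea can be illustrated with a toy sketch: collect the outgoing links of the relevant articles and add the most frequently linked targets as new query terms. The data shapes and names below are invented for illustration, not MacKinnon's method:

```python
from collections import Counter

def expand_query(query_terms, relevant_articles, links, max_new=5):
    """Toy link-based expansion: count how often each outgoing link
    target appears across the relevant Wikipedia articles, and append
    the most common targets (not already in the query) as new terms.

    links: {article_title: [linked_article_title, ...]}"""
    counts = Counter()
    for article in relevant_articles:
        for target in links.get(article, []):
            if target.lower() not in query_terms:
                counts[target.lower()] += 1
    return list(query_terms) + [t for t, _ in counts.most_common(max_new)]
```

Targets linked from several relevant articles are favoured, on the intuition that they name concepts central to the query's topic.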
|