381 |
An Application of the NTCIR-WEB Raw-data Archive Dataset for User Experiments
TAKAKU, Masao, EGUSA, Yuka, SAITO, Hitomi, TERAI, Hitoshi, 寺井, 仁 January 2007 (has links) (PDF)
No description available.
|
382 |
Evaluating Information Retrieval Systems With Multiple Non-Expert Assessors
Li, Le January 2013 (has links)
Many current test collections require the use of expert judgments during construction. The true label of each document is given by an expert assessor. However, the cost and effort associated with expert training and judging are typically quite high when there are many documents to judge. One way to address this issue is to have each document judged by multiple non-expert assessors at a lower expense. However, two key factors make this method difficult: the variability across assessors' judging abilities, and the aggregation of the noisy labels into a single consensus label. Much previous work has shown how this method can replace expert labels in relevance evaluation. However, the effects of relevance judgment errors on ranking system evaluation have been less explored.
This thesis mainly investigates how to best evaluate information retrieval systems with noisy labels, where no ground-truth labels are provided and each document may receive multiple noisy labels. Based on our simulation results on two datasets, we find that conservative assessors, who tend to label incoming documents as non-relevant, are preferable. Two important factors affect the overall conservativeness of the consensus labels: the assessors' conservativeness and the relevance standard. This observation provides a guideline on what kind of consensus algorithms or assessors are needed to preserve a high correlation with expert labels in ranking system evaluation. We also systematically investigate how to find consensus labels for documents that are equally likely to be relevant or non-relevant. We investigate a content-based consensus algorithm that links the noisy labels with document content. We compare it against state-of-the-art consensus algorithms and find that, depending on the document collection, this content-based approach may help or hurt the performance.
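As a toy illustration of the consensus idea described above, the sketch below aggregates binary relevance labels from multiple non-expert assessors by majority vote, breaking ties toward non-relevant to mimic a conservative consensus. This stands in for, but is not, the consensus algorithms studied in the thesis; the label data is made up.

```python
from collections import Counter

def consensus_label(labels, conservative=True):
    """Aggregate noisy binary relevance labels (1 = relevant, 0 = non-relevant)
    into a single consensus label by majority vote. With conservative=True,
    ties are resolved as non-relevant, mimicking the conservative behavior
    found to correlate better with expert judgments."""
    counts = Counter(labels)
    if counts[1] > counts[0]:
        return 1
    if counts[1] < counts[0]:
        return 0
    return 0 if conservative else 1

# Three assessors disagree on a document:
print(consensus_label([1, 0, 0]))  # → 0: majority says non-relevant
print(consensus_label([1, 0]))     # → 0: tie broken conservatively
```

A real consensus algorithm would also weight each assessor by an estimate of their judging ability; simple majority voting treats all assessors as equally reliable.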
|
383 |
Query-Based Data Mining for the Web
Poblete Labra, Bárbara 01 October 2009 (has links)
The objective of this thesis is to study different applications of Web query mining for the improvement of search engine ranking, Web information retrieval, and Web site enhancement. The main motivation of this work is to take advantage of the implicit feedback left in the trail of users while navigating through the Web. Throughout this work we seek to demonstrate the value of queries for extracting interesting rules, patterns, and information about the documents they reach. The models created in this doctoral work show that the "wisdom of the crowds" conveyed in queries has many applications that together provide a better understanding of users' needs on the Web. This makes it possible to improve the general interaction of visitors with Web sites and search engines in a straightforward way.
|
384 |
Multi-User File System Search
Buettcher, Stefan January 2007 (has links)
Information retrieval research usually deals with globally visible, static document collections. Practical applications such as file system search and enterprise search, in contrast, have to cope with highly dynamic text collections and have to take user-specific access permissions into account when generating the results for a search query.
The goal of this thesis is to close the gap between information retrieval research and the requirements imposed by these real-life applications. The algorithms and data structures presented in this thesis can be used to implement a file system search engine that is able to react to changes in the file system by updating its index data in real time. File changes (insertions, deletions, or modifications) are reflected by the search results within a few seconds, even under a very high system workload. The search engine has low main memory consumption. By integrating security restrictions into the query processing logic, as opposed to applying them in a postprocessing step, it produces search results that are guaranteed to be consistent with the access permissions defined by the file system.
The techniques proposed in this thesis are evaluated theoretically, based on a Zipfian model of term distribution, and through a large number of experiments involving text collections of non-trivial size, varying between a few gigabytes and a few hundred gigabytes.
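The Zipfian model of term distribution mentioned above can be made concrete in a few lines: under Zipf's law with exponent s, the r-th most frequent term occurs with probability proportional to 1/r^s. The sketch below uses illustrative parameters (the vocabulary size and token count are made up, not taken from the thesis):

```python
def zipf_frequencies(vocab_size, total_tokens, s=1.0):
    """Expected term frequencies under a Zipfian model: the r-th most
    frequent term has probability proportional to 1 / r**s."""
    weights = [1.0 / (r ** s) for r in range(1, vocab_size + 1)]
    norm = sum(weights)
    return [total_tokens * w / norm for w in weights]

freqs = zipf_frequencies(vocab_size=5, total_tokens=1000)
# The top-ranked term dominates; frequency at rank r decays as 1/r.
print([round(f) for f in freqs])  # → [438, 219, 146, 109, 88]
```

Such a model lets index sizes and posting-list lengths be estimated analytically before running experiments on real collections.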
|
385 |
Lexical Affinities and Language Applications
Terra, Egidio January 2004 (has links)
Understanding interactions among words is fundamental for natural language applications. However, many statistical NLP methods still ignore this important characteristic of language. For example, information retrieval models still assume word independence.
This work focuses on the creation of lexical affinity models and their applications to natural language problems. The thesis develops two approaches for computing lexical affinity. In the first, the co-occurrence frequency is calculated by point estimation; the second uses parametric models for co-occurrence distances.
For the point estimation approach, we study several alternative methods for computing the degree of affinity by making use of point estimates for co-occurrence frequency. We propose two new point estimators for co-occurrence and evaluate the measures and the estimation procedures with synonym questions. In our evaluation, synonyms are checked directly by their co-occurrence and also by comparing them indirectly, using other lexical units as supporting evidence.
For the parametric approach, we address the creation of lexical affinity models using two parametric models of co-occurrence distance: an independence model and an affinity model. The independence model is based on the geometric distribution; the affinity model is based on the gamma distribution. Both are fit to the data by maximizing likelihood. Two measures of affinity are derived from these parametric models and applied to the synonym questions, resulting in the best absolute performance on these questions by a method not trained to the task.
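As a rough illustration of the parametric approach, the sketch below fits a geometric model P(d) = (1 − p)^(d−1) · p to observed co-occurrence distances by maximum likelihood (the MLE is p = 1/mean distance) and scores affinity as a log-likelihood difference against a fixed baseline. The distance data and the baseline probability `p_indep` are hypothetical, and the thesis's gamma-based affinity model is not shown; this sketches the general idea, not the thesis's exact measures.

```python
import math

def fit_geometric(distances):
    """MLE for a geometric model of co-occurrence distance:
    P(d) = (1 - p)**(d - 1) * p, with p_hat = 1 / mean(d)."""
    return len(distances) / sum(distances)

def log_likelihood(distances, p):
    """Log-likelihood of the observed distances under the geometric model."""
    return sum(math.log(p) + (d - 1) * math.log(1 - p) for d in distances)

# Hypothetical token distances between two words across a corpus sample:
observed = [1, 2, 2, 3, 5]
p_pair = fit_geometric(observed)  # fitted to this word pair
p_indep = 0.05                    # hypothetical baseline for unrelated words
affinity = log_likelihood(observed, p_pair) - log_likelihood(observed, p_indep)
# A positive score means the pair co-occurs more closely than the baseline predicts.
```

The gamma model would be fit and compared analogously, with its extra shape parameter allowing distances to cluster away from zero.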
We also explore the use of lexical affinity in information retrieval tasks. A new method to score missing terms by using lexical affinities is proposed. In particular, we adapt two probabilistic scoring functions for information retrieval to allow all query terms to be scored. One is a document retrieval method and the other is a passage retrieval method. Our new method, using replacement terms, shows significant improvement over the original methods.
|
387 |
Similarity and Diversity in Information Retrieval
Akinyemi, John 25 April 2012 (has links)
Inter-document similarity is used for clustering, classification, and other purposes within information retrieval. In this thesis, we investigate several aspects of document similarity. In particular, we examine the quality of several measures of inter-document similarity, providing a framework for measuring and comparing their effectiveness. We also explore research related to novelty and diversity in information retrieval, whose goal is to satisfy as many users as possible while minimizing or eliminating duplicate and redundant information from search results. To evaluate the effectiveness of diversity-aware retrieval functions, user query logs and other information captured from user interactions with commercial search engines are mined and analyzed to uncover the various informational aspects underlying queries, known as subtopics. We investigate the suitability of implicit associations between document content as an alternative to subtopic mining, and also explore subtopic mining from document anchor text and anchor links. In addition, we investigate the suitability of inter-document similarity as a measure for diversity-aware retrieval models, with the aim of using measured inter-document similarity as a replacement for diversity-aware evaluation models that rely on subtopic mining. Finally, we investigate the application of document similarity to requirements traceability, presenting a fast algorithm that uncovers associations between versions of frequently edited documents, even in the face of substantial changes.
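As one concrete instantiation of inter-document similarity, the sketch below computes cosine similarity over raw term-frequency vectors. This is only one of many possible measures (the thesis compares several), and the whitespace tokenization here is deliberately naive:

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Inter-document similarity as the cosine of the angle between
    term-frequency vectors built from a naive whitespace tokenization."""
    tf_a, tf_b = Counter(doc_a.split()), Counter(doc_b.split())
    dot = sum(tf_a[t] * tf_b[t] for t in set(tf_a) & set(tf_b))
    norm = (math.sqrt(sum(v * v for v in tf_a.values()))
            * math.sqrt(sum(v * v for v in tf_b.values())))
    return dot / norm if norm else 0.0

print(cosine_similarity("web search ranking", "web search evaluation"))
# ≈ 0.667: two of three terms are shared
```

In a diversity-aware setting, a high score between two retrieved documents would flag the second one as redundant, which is exactly the substitution for subtopic-based evaluation that the thesis investigates.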
|
388 |
A Semantic-based Approach to Web Services Discovery
Tsai, Yu-Huai 13 June 2011 (has links)
Service-oriented Architecture is now an important issue in program development. However, there is not yet an efficient and effective way for developers to obtain appropriate components. Current research mostly focuses on either the textual meaning or the ontological relations of services. In this research we propose a hybrid approach that integrates both types of information. It starts by defining important attributes and their weights for Web service discovery using Multiple Criteria Decision Making. A similarity calculation based on both textual and ontological information is then applied. In the experiment, we collected 103 real-world Web services, and the results show that our approach generally performs better than the existing ones.
|
389 |
Cross-Lingual Question Answering for Corpora with Question-Answer Pairs
Huang, Shiuan-Lung 02 August 2005 (has links)
Question answering from a corpus of question-answer (QA) pairs accepts a user question in a natural language and retrieves relevant QA pairs from the corpus. Most existing question answering techniques are monolingual in nature: the language used for expressing a user question is identical to that of the QA pairs in the corpus. However, with the globalization of business environments and advances in Internet technology, more and more online information and knowledge are stored in question-answer pair format on the Internet or intranets in different languages. To facilitate users' access to these QA-pair documents using natural language queries in such a multilingual environment, there is a pressing need for cross-lingual question answering (CLQA). In response, this study designs a thesaurus-based CLQA technique. We empirically evaluate our proposed technique, using a monolingual question answering technique and a machine-translation-based CLQA technique as performance benchmarks. Our evaluation results show that the proposed CLQA technique achieves satisfactory effectiveness relative to the monolingual question answering technique, and suggest that it significantly outperforms the benchmark machine-translation-based CLQA technique.
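At its core, a dictionary- or thesaurus-based CLQA pipeline translates each query term through a bilingual lexicon and matches the expanded term set against the QA-pair corpus. The sketch below is a toy illustration under made-up data; the lexicon entries and the simple overlap scoring are hypothetical, not the technique evaluated in the thesis:

```python
def translate_query(query_terms, lexicon):
    """Expand each source-language term into all of its target-language
    entries; out-of-vocabulary terms are kept as-is."""
    translated = []
    for term in query_terms:
        translated.extend(lexicon.get(term, [term]))
    return translated

def score_qa_pair(query_terms, qa_text):
    """Toy overlap score between a translated query and one QA pair."""
    words = set(qa_text.lower().split())
    return sum(1 for t in query_terms if t in words)

# Hypothetical Spanish-to-English lexicon:
lexicon = {"pregunta": ["question"], "respuesta": ["answer", "reply"]}
query = translate_query(["pregunta", "respuesta"], lexicon)
print(query)  # → ['question', 'answer', 'reply']
print(score_qa_pair(query, "the question asks X and the answer explains Y"))  # → 2
```

A machine-translation-based benchmark would instead translate the whole question as a sentence before retrieval, which is precisely the alternative the study compares against.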
|
390 |
Design of Document-Driven Decision Support Systems
Liu, Yu-liang 10 February 2006 (has links)
Documents are records of activities and related knowledge generated from organizational operations. With more and more documents being stored in digital form, how to manage this knowledge has become an important issue. The first step in managing knowledge is to activate its content. The documents stored in an organization can not only be retrieved for future reference but also analyzed to assist managers in making the right decisions. It is therefore important for an organization to develop a document-driven DSS that can discover useful knowledge from its large collection of documents.
Most previous research on document management focuses on the indexing, retrieval, and mining of documents; few applications have investigated how this technology can be used to construct knowledge for decision support. The purpose of this research is to propose an approach for developing document-driven DSS. In particular, we propose a methodology that combines ontology, indexing, and information retrieval technology to develop an event schema that serves as a basis for the document-driven DSS. A prototype system has been designed to analyze documents collected from a journal ranking exercise, demonstrating the feasibility of the proposed approach.
|