531 |
Query-Time Optimization Techniques for Structured Queries in Information Retrieval
Cartright, Marc-Allen (01 September 2013)
The use of information retrieval (IR) systems is evolving towards larger, more complicated queries. Both the IR industrial and research communities have generated significant evidence that, in order to keep improving retrieval effectiveness, increases in retrieval model complexity may be unavoidable. From an operational perspective, this translates into an increasing computational cost to generate the final ranked list in response to a query. Therefore, there is growing tension in the trade-off between retrieval effectiveness (the quality of the result list) and efficiency (the speed at which the list is generated). This tension creates a strong need for optimization techniques that improve the efficiency of ranking under these more complex retrieval models.
This thesis presents three new optimization techniques designed to deal with different aspects of structured queries. The first technique involves manipulation of interpolated subqueries, a common structure found across a large number of retrieval models today. We then develop an alternative scoring formulation to make retrieval models more responsive to dynamic pruning techniques. The last technique is delayed execution, which focuses on the class of queries that utilize term dependencies and term conjunction operations. In each case, we empirically show that these optimizations can significantly improve query processing efficiency without negatively impacting retrieval effectiveness.
Additionally, we implement these optimizations in the context of a new retrieval system known as Julien. As opposed to implementing these techniques as one-off solutions hard-wired to specific retrieval models, we treat each technique as a "behavioral" extension to the original system. This allows us to flexibly stack the modifications to use the optimizations in conjunction, increasing efficiency even further. By focusing on the behaviors of the objects involved in the retrieval process instead of on the details of the retrieval algorithm itself, we can recast these techniques to be applied only when the conditions are appropriate. Finally, the modular design of these components illustrates a system design that allows improvements to be implemented without disturbing the existing retrieval infrastructure.
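The following sketch illustrates how interpolated subqueries, dynamic pruning and "behavioral" extensions can fit together. It is not Julien's code, and it is not the thesis's alternative scoring formulation. The classes `Subquery` and `BoundedSubquery` and the function `top_k` are hypothetical. The pruning rule is a generic MaxScore-style bound check, which assumes non-negative weights and scores. The behavior idea shows up as a capability check: pruning is switched on only when every subquery can report an upper bound, and otherwise scoring falls back to exhaustive evaluation.

```python
import heapq
from typing import Dict, List, Tuple


class Subquery:
    """Hypothetical weighted subquery: contributes weight * score(doc)."""

    def __init__(self, weight: float, scores: Dict[int, float]):
        self.weight = weight  # assumed non-negative interpolation weight
        self.scores = scores  # assumed non-negative per-document scores

    def score(self, doc: int) -> float:
        return self.scores.get(doc, 0.0)


class BoundedSubquery(Subquery):
    """Behavior: the subquery can report an upper bound on score(doc)."""

    def upper_bound(self) -> float:
        # Missing documents score 0.0, so the bound is never below zero.
        return max([0.0, *self.scores.values()])


def top_k(docs: List[int], subqueries: List[Subquery], k: int) -> List[Tuple[float, int]]:
    """Rank docs by the sum of weight * score, pruning only if all subqueries are bounded."""
    can_prune = all(isinstance(q, BoundedSubquery) for q in subqueries)
    ordered = list(subqueries)
    suffix = [0.0] * (len(ordered) + 1)  # suffix[j]: best possible gain from ordered[j:]
    if can_prune:
        # Score high-impact subqueries first so the remaining bound shrinks quickly.
        ordered.sort(key=lambda q: q.weight * q.upper_bound(), reverse=True)
        for j in range(len(ordered) - 1, -1, -1):
            suffix[j] = suffix[j + 1] + ordered[j].weight * ordered[j].upper_bound()

    heap: List[Tuple[float, int]] = []  # min-heap holding the current top k
    for d in docs:
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        total = 0.0
        for j, q in enumerate(ordered):
            if can_prune and total + suffix[j] <= threshold:
                break  # even perfect remaining scores cannot enter the top k
            total += q.weight * q.score(d)
        else:  # runs only if the document was not pruned
            if len(heap) < k:
                heapq.heappush(heap, (total, d))
            elif total > heap[0][0]:
                heapq.heapreplace(heap, (total, d))
    return sorted(heap, reverse=True)
```

With valid bounds, the bounded and plain variants return the same top-k list. The bounded variant simply stops scoring a document once the remaining subqueries cannot lift it above the current k-th score.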
|
532 |
Incorporating semantic and syntactic information into document representation for document clustering
Wang, Yong (06 August 2005)
Document clustering is a widely used strategy for information retrieval and text data mining. In traditional document clustering systems, documents are represented as a bag of independent words. In this project, we propose to enrich the representation of a document by incorporating semantic and syntactic information, identified by performing semantic and syntactic analysis on the raw text. A detailed survey of current research in natural language processing, syntactic analysis, and semantic analysis is provided. Our experimental results demonstrate that incorporating semantic and syntactic information improves the performance of our document clustering system on most of our datasets, and that combining both kinds of information yields a statistically significant improvement. Experiments with compound words show that using compound words alone does not improve clustering performance on our datasets. When compound words are combined with the original single words, the combined feature set performs slightly better on most datasets, but this improvement is not statistically significant. To select the best clustering algorithm for our document clustering system, we compare several widely used clustering algorithms. Although the bisecting K-means method has advantages on large datasets, a traditional hierarchical clustering algorithm still achieves the best performance on our small datasets.
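The sketch below shows the bag-of-words baseline and the bisecting K-means algorithm mentioned above, in plain Python. It is an illustrative assumption, not the thesis's system. Documents are raw whitespace-tokenized term counts, with no semantic or syntactic features. Similarity is cosine, and the largest cluster is always the one bisected. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def bag_of_words(text: str) -> Vector:
    """Baseline representation: lowercase whitespace tokens mapped to counts."""
    return {term: float(n) for term, n in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    center: Vector = {}
    for vec in vectors:
        for t, v in vec.items():
            center[t] = center.get(t, 0.0) + v / len(vectors)
    return center


def two_means(docs: List[Vector], iters: int, rng: random.Random) -> List[List[int]]:
    """Split docs into two groups by cosine 2-means; returns positions in docs."""
    first, second = rng.sample(range(len(docs)), 2)
    centers = [docs[first], docs[second]]
    groups: List[List[int]] = [[], []]
    for _ in range(iters):
        groups = [[], []]
        for i, d in enumerate(docs):
            groups[0 if cosine(d, centers[0]) >= cosine(d, centers[1]) else 1].append(i)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller falls back to halving
        centers = [centroid([docs[i] for i in g]) for g in groups]
    return groups


def bisecting_kmeans(docs: List[Vector], k: int, iters: int = 10, seed: int = 0) -> List[List[int]]:
    """Repeatedly bisect the largest cluster until k clusters exist."""
    rng = random.Random(seed)
    clusters: List[List[int]] = [list(range(len(docs)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        target = clusters.pop()  # largest cluster
        if len(target) < 2:
            clusters.append(target)
            break  # nothing left to split
        halves = two_means([docs[i] for i in target], iters, rng)
        if not halves[0] or not halves[1]:
            mid = len(target) // 2
            halves = [list(range(mid)), list(range(mid, len(target)))]
        clusters.extend([target[i] for i in half] for half in halves)
    return clusters
```

On four toy documents ("apple banana apple", "banana apple fruit", "car engine wheel", "engine car road"), `bisecting_kmeans(..., k=2)` separates the two fruit documents from the two car documents. A hierarchical (agglomerative) method would instead merge the closest pair of clusters bottom-up. That is more expensive on large collections but can suit small datasets.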
|
533 |
Hierarchical Geographical Identifiers As An Indexing Technique For Geographic Information Retrieval
Lakey, John Christopher (13 December 2008)
Location plays an ever-increasing role in modern web-based applications. Many of these applications leverage off-the-shelf search engine technology to provide interactive access to large collections of data. Unfortunately, these commodity search engines do not provide special support for location-based indexing and retrieval. Many applications work around this limitation by combining geographic bounding boxes with range queries. We propose an alternative technique based on hierarchical geographic identifiers and hypothesize that it yields faster query evaluation and higher search precision. Our experiment compared the two approaches by executing thousands of unique queries on a dataset with 1.8 million records. In these experiments, our technique yielded large improvements in both query execution time and precision.
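A quadtree key is one way to picture a hierarchical geographic identifier. Each level splits the current cell into four quadrants and appends one digit, so every prefix of a key names an enclosing region. The abstract does not spell out the thesis's actual identifier scheme. The sketch below is a hypothetical quadtree encoding, similar in spirit to geohashes, and the function names are illustrative. It shows why a region lookup can become a single prefix match on a keyword field, instead of two numeric range filters on latitude and longitude.

```python
from typing import Dict, List, Tuple


def quadkey(lat: float, lon: float, depth: int) -> str:
    """Encode a point as a hierarchical identifier with one quadrant digit per level.

    Each level halves the current latitude and longitude ranges, so every
    prefix of the key names an enclosing (coarser) cell.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    digits = []
    for _ in range(depth):
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        quadrant = 0
        if lon >= lon_mid:  # eastern half adds 1
            quadrant += 1
            lon_lo = lon_mid
        else:
            lon_hi = lon_mid
        if lat >= lat_mid:  # northern half adds 2
            quadrant += 2
            lat_lo = lat_mid
        else:
            lat_hi = lat_mid
        digits.append(str(quadrant))
    return "".join(digits)


def index_records(records: Dict[str, Tuple[float, float]], depth: int) -> Dict[str, str]:
    """Map record id to identifier; a search engine would store it as a keyword field."""
    return {rid: quadkey(lat, lon, depth) for rid, (lat, lon) in records.items()}


def prefix_query(index: Dict[str, str], cell: str) -> List[str]:
    """Return all records inside a cell with one prefix match instead of two range filters."""
    return sorted(rid for rid, key in index.items() if key.startswith(cell))
```

A query region that does not align with a single cell must be covered by several prefixes. That is one of the trade-offs against bounding-box range queries.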
|
534 |
The effect of collection homogeneity on term association as a method of request expansion in information retrieval
Elkalifa, Elsuni Sidahmed (January 1991)
No description available.
|
535 |
Sparsification for Topic Modeling and Applications to Information Retrieval
Muoh, Chibuike (30 November 2009)
No description available.
|
536 |
Document Classification using Characteristic Signatures
Mondal, Abhro Jyoti (January 2017)
No description available.
|
537 |
A Personalized Information Environment System for Information Retrieval
Yu, Hongming (02 September 2003)
No description available.
|
538 |
Algorithms and Models for Collaborative Filtering from Large Information Corpora
Strunjas, Svetlana (January 2008)
No description available.
|
539 |
Finding Course Literature: Exposing Overlooked Alternatives and Streamlining Targeted Information Retrieval
Bengtegård, Sebastian; Lundén, Martin (January 2013)
Students at Swedish institutions of higher education need to acquire course literature as a complement to teaching. There is no obvious way for a student to obtain course literature, and information about the literature is presented inconsistently across sources. We therefore develop a support system for searching for course literature. Its aim is to streamline the search process and to expose the student to previously overlooked sources of course literature. The system is evaluated against the established search strategies of 22 students. Compared with the students' own current search strategies, using the support system markedly reduces the number of steps needed. It also reduces the number of services the student uses to acquire course literature. / When a student is studying at a university or a university college in Sweden, he or she needs to acquire course literature as a complement to teaching. This is often taken for granted, but there is currently no equally obvious approach to how students obtain their course literature, and there is no consistent structure for how the information is presented. Therefore, we develop a prototype search tool that helps students locate their course literature, in order to find a more appropriate method of searching for it. First, we wish to streamline the student's path to acquiring course literature, reducing the number of steps they need to take. Second, we wish to expose the student to previously overlooked sources of course literature.
We do this as an experiment intended to show what a possible solution could look like if availability increased and guidelines were introduced on how to present course literature at Swedish universities and university colleges. The system is then evaluated against the search strategies that students currently use to find their course literature.
|
540 |
Generalizability and Reproducibility of Search Engine Online User Studies
Xu, Zijian (11 June 2020)
Research in interactive information retrieval (IR) usually relies on lab or online user studies. A key concern for these studies is the generalizability and reproducibility of their results, especially when only a limited number of participants are involved. The interactive IR community, however, has no commonly agreed guideline on how many participants a study should recruit. We study this fundamental research-protocol issue with simulation-based approaches, examining the generalizability and reproducibility of results for different numbers of participants. Specifically, we use crowdsourced online user studies to collect observations from a relatively large number of participants in a representative interactive IR experimental setting. We then sample smaller sets of participants from the collected observations to simulate the results of smaller-scale online user studies. We empirically analyze the patterns of generalizability and reproducibility for different dependent variables and draw conclusions about the optimal number of participants. Our study contributes to interactive IR research by 1) establishing a methodology for evaluating the generalizability and reproducibility of results, and 2) providing guidelines on the optimal number of participants for search engine user studies. / Master of Science / In information retrieval, researchers usually require human participants to interact with, test, and evaluate a novel system; such experiments are called user studies. However, researchers often conduct these studies with small samples, some with fewer than 20 participants, which casts doubt on their generalizability and reproducibility. Generalizability refers to how reliably results obtained from a relatively small sample in an experimental setting generalize to a larger population.
Reproducibility refers to whether the results from two groups of the same sample size are consistent with each other. To examine the generalizability and reproducibility of online user studies of interactive information retrieval systems, we conducted an online user study with a large sample. In it, we reproduced a well-recognized lab user study by Kelly et al. (2015) in an online environment. We established a simulation-based methodology for evaluating the generalizability and reproducibility of the results and then provided guidelines on the optimal number of participants for search engine user studies.
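The simulation-based idea can be sketched as repeated subsampling from a large participant pool. This is a hypothetical illustration, not the thesis's procedure, and `effect` and `simulate` are illustrative names. Each participant is assumed to contribute one paired observation, for example a rating under each of two systems. Generalizability is approximated as how often a size-n sample reproduces the sign of the full-pool effect. Reproducibility is approximated as how often two disjoint size-n samples agree in sign.

```python
import random
from statistics import mean
from typing import List, Tuple

# Hypothetical data: each participant gives one (condition A, condition B) observation.
Observation = Tuple[float, float]


def effect(sample: List[Observation]) -> float:
    """Mean within-participant difference between conditions A and B."""
    return mean(a - b for a, b in sample)


def simulate(pool: List[Observation], n: int, trials: int, seed: int = 0) -> Tuple[float, float]:
    """Estimate (generalizability, reproducibility) for studies with n participants.

    generalizability: share of size-n samples whose effect has the same sign
        as the full pool, which stands in for the larger population.
    reproducibility: share of pairs of disjoint size-n samples whose effects
        agree in sign with each other.
    """
    if 2 * n > len(pool):
        raise ValueError("pool too small for two disjoint samples of size n")
    rng = random.Random(seed)
    full_sign = effect(pool) > 0
    generalized = reproduced = 0
    for _ in range(trials):
        drawn = rng.sample(pool, 2 * n)  # without replacement, so halves are disjoint
        e1, e2 = effect(drawn[:n]), effect(drawn[n:])
        generalized += (e1 > 0) == full_sign
        reproduced += (e1 > 0) == (e2 > 0)
    return generalized / trials, reproduced / trials
```

To pick a participant count, one could call `simulate` with increasing values of `n` on a real pool and look for where both rates level off. Sign agreement is a deliberately simple stand-in for the dependent variables and statistical tests a real study would use.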
|