Global ETD Search

531	Query-Time Optimization Techniques for Structured Queries in Information Retrieval Cartright, Marc-Allen 01 September 2013 The use of information retrieval (IR) systems is evolving towards larger, more complicated queries. Both the IR industrial and research communities have generated significant evidence indicating that in order to continue improving retrieval effectiveness, increases in retrieval model complexity may be unavoidable. From an operational perspective, this translates into an increasing computational cost to generate the final ranked list in response to a query. Therefore we encounter an increasing tension in the trade-off between retrieval effectiveness (quality of result list) and efficiency (the speed at which the list is generated). This tension creates a strong need for optimization techniques to improve the efficiency of ranking with respect to these more complex retrieval models. This thesis presents three new optimization techniques designed to deal with different aspects of structured queries. The first technique involves manipulation of interpolated subqueries, a common structure found across a large number of retrieval models today. We then develop an alternative scoring formulation to make retrieval models more responsive to dynamic pruning techniques. The last technique is delayed execution, which focuses on the class of queries that utilize term dependencies and term conjunction operations. In each case, we empirically show that these optimizations can significantly improve query processing efficiency without negatively impacting retrieval effectiveness. Additionally, we implement these optimizations in the context of a new retrieval system known as Julien. As opposed to implementing these techniques as one-off solutions hard-wired to specific retrieval models, we treat each technique as a "behavioral" extension to the original system. This allows us to flexibly stack the modifications to use the optimizations in conjunction, increasing efficiency even further. By focusing on the behaviors of the objects involved in the retrieval process instead of on the details of the retrieval algorithm itself, we can recast these techniques to be applied only when the conditions are appropriate. Finally, the modular design of these components illustrates a system design that allows improvements to be implemented without disturbing the existing retrieval infrastructure. algorithms information retrieval optimization search Artificial Intelligence and Robotics Computer Sciences
532	Incorporating semantic and syntactic information into document representation for document clustering Wang, Yong 06 August 2005 Document clustering is a widely used strategy for information retrieval and text data mining. In traditional document clustering systems, documents are represented as a bag of independent words. In this project, we propose to enrich the representation of a document by incorporating semantic and syntactic information. Semantic analysis and syntactic analysis are performed on the raw text to identify this information. A detailed survey of current research in natural language processing, syntactic analysis, and semantic analysis is provided. Our experimental results demonstrate that incorporating semantic and syntactic information can improve the performance of our document clustering system for most of our data sets. A statistically significant improvement can be achieved when we combine both syntactic and semantic information. Our experimental results using compound words show that using only compound words does not improve the clustering performance for our data sets. When the compound words are combined with the original single words, the combined feature set achieves slightly better performance for most data sets, but this improvement is not statistically significant. In order to select the best clustering algorithm for our document clustering system, a comparison of several widely used clustering algorithms is performed. Although the bisecting K-means method has advantages when working with large datasets, a traditional hierarchical clustering algorithm still achieves the best performance for our small datasets. Natural Language Processing Information Retrieval Text Data Mining Document Clustering
533	Hierarchical Geographical Identifiers As An Indexing Technique For Geographic Information Retrieval Lakey, John Christopher 13 December 2008 Location plays an ever-increasing role in modern web-based applications. Many of these applications leverage off-the-shelf search engine technology to provide interactive access to large collections of data. Unfortunately, these commodity search engines do not provide special support for location-based indexing and retrieval. Many applications overcome this constraint by applying geographic bounding boxes in conjunction with range queries. We propose an alternative technique based on geographic identifiers and suggest it will yield faster query evaluation and provide higher search precision. Our experiment compared the two approaches by executing thousands of unique queries on a dataset with 1.8 million records. Based on the quantitative results obtained, our technique yielded drastic improvements in both query execution time and precision. textual index spatial index Location-based search geographic information retrieval
534	The effect of collection homogeneity on term association as a method of request expansion in information retrieval Elkalifa, Elsuni Sidahmed January 1991 No description available. Information Science
535	Sparsification for Topic Modeling and Applications to Information Retrieval Muoh, Chibuike 30 November 2009 No description available. Computer Science PLSA information retrieval clustering topic model
536	Document Classification using Characteristic Signatures Mondal, Abhro Jyoti January 2017 No description available. Computer Science Text classification Template matching Signatures Information retrieval
537	A Personalized Information Environment System for Information Retrieval Yu, Hongming 02 September 2003 No description available. Computer Science personalized working environment HITS information retrieval
538	Algorithms and Models for Collaborative Filtering from Large Information Corpora Strunjas, Svetlana January 2008 No description available. Computer Science collaborative filtering collaborative partitioning clustering information retrieval
539	Finding Course Literature: Exposing Overlooked Alternatives and Streamlining Targeted Information Retrieval Bengtegård, Sebastian, Lundén, Martin January 2013 A student at a Swedish institution of higher education today needs to acquire course literature as a complement to teaching. There is no obvious approach for obtaining that literature, and information about it is presented inconsistently across sources. We therefore develop a support system for searching for course literature, with the aim of streamlining the search process and exposing the student to possibly previously overlooked sources of course literature. The system is evaluated against the established search strategies of 22 students. The results show that using this support system not only markedly reduces the number of steps but also reduces the number of services the student uses to acquire course literature, compared with students' own current search strategies. / When a student is studying at a university or university college in Sweden, he or she is required to acquire course literature as a complement to teaching. This is often taken for granted, but there is currently no equally obvious approach to how the student obtains his or her course literature, and there is a lack of structure in how the information is presented. Therefore, we develop a prototype, a search tool that will help students locate their course literature. We do this to find a more appropriate method for searching for course literature. Firstly, we wish to streamline the student's path to acquiring their course literature, reducing the number of steps they need to take. Secondly, we wish to expose the student to previously overlooked sources of course literature. We do this as an experiment with the ambition of showing what a possible solution could look like if availability increased and guidelines were introduced on how to present course literature at Swedish universities and university colleges. This system is then evaluated in relation to the established search strategies that students currently use to find their course literature. Information Retrieval Course Literature Streamlining Engineering and Technology
540	Generalizability and Reproducibility of Search Engine Online User Studies Xu, Zijian 11 June 2020 Research in interactive information retrieval (IR) usually relies on lab user studies or online ones. A key concern of these studies is the generalizability and reproducibility of the results, especially when the studies involve only a limited number of participants. The interactive IR community, however, does not have a commonly agreed guideline regarding how many participants a study should recruit. We study this fundamental research protocol issue by examining the generalizability and reproducibility of results with respect to different numbers of participants using simulation-based approaches. Specifically, we collect observations from a relatively large number of participants for a representative interactive IR experiment setting from online user studies using crowdsourcing. We sample smaller numbers of participants' results from the collected observations to simulate the results of online user studies at a smaller scale. We empirically analyze the patterns of generalizability and reproducibility regarding different dependent variables and draw conclusions related to the optimal number of participants. Our study contributes to interactive information retrieval research by 1) establishing a methodology for evaluating the generalizability and reproducibility of results, and 2) providing guidelines regarding the optimal number of participants for search engine user studies. / Master of Science / In the domain of information retrieval, researchers usually require human participants to interact with, test, and evaluate a novel system in what is usually called a user study. However, researchers usually perform these studies with small sample sizes, and some recruit fewer than 20 participants, which casts doubt on the generalizability and reproducibility of these studies. Generalizability refers to how reliably the results from a relatively small sample in an experimental setting can be generalized to a larger population. Reproducibility refers to whether the results from two groups of the same sample size are consistent with each other. In order to examine the generalizability and reproducibility of online user studies in interactive information retrieval systems, we conducted an online user study with a large sample size. We reproduced a well-recognized lab user study from Kelly et al. (2015) in an online environment. We established a simulation-based methodology for evaluating the generalizability and reproducibility of the results and then provided guidelines regarding the optimal number of participants for search engine user studies. Interactive information retrieval online user studies generalizability reproducibility
