  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
531

Incorporating semantic and syntactic information into document representation for document clustering

Wang, Yong 06 August 2005
Document clustering is a widely used strategy for information retrieval and text data mining. In traditional document clustering systems, documents are represented as a bag of independent words. In this project, we propose to enrich the representation of a document by incorporating semantic information and syntactic information. Semantic analysis and syntactic analysis are performed on the raw text to identify this information. A detailed survey of current research in natural language processing, syntactic analysis, and semantic analysis is provided. Our experimental results demonstrate that incorporating semantic information and syntactic information can improve the performance of our document clustering system for most of our data sets. A statistically significant improvement can be achieved when we combine both syntactic and semantic information. Our experimental results using compound words show that using only compound words does not improve the clustering performance for our data sets. When the compound words are combined with the original single words, the combined feature set performs slightly better on most data sets, but this improvement is not statistically significant. In order to select the best clustering algorithm for our document clustering system, a comparison of several widely used clustering algorithms is performed. Although the bisecting K-means method has advantages when working with large data sets, a traditional hierarchical clustering algorithm still achieves the best performance for our small data sets.
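
The combined feature set mentioned in the abstract above (single words plus compound words) can be illustrated with a minimal sketch. It rests on a loud assumption: the thesis identifies compound words through syntactic analysis, whereas this sketch simply treats every adjacent word pair as a stand-in compound. The function names `tokenize`, `combined_features`, and `cosine` are hypothetical and do not come from the thesis.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def combined_features(text):
    """Count single words together with adjacent word pairs.

    Assumption: adjacent pairs stand in for the compound words that the
    thesis extracts with syntactic analysis.
    """
    words = tokenize(text)
    pairs = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return Counter(words + pairs)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

A clustering algorithm would then operate on pairwise `cosine` values between documents; which algorithm to use is a separate choice, as the abstract's comparison of bisecting K-means and hierarchical clustering suggests.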
532

Hierarchical Geographical Identifiers As An Indexing Technique For Geographic Information Retrieval

Lakey, John Christopher 13 December 2008
Location plays an ever-increasing role in modern web-based applications. Many of these applications leverage off-the-shelf search engine technology to provide interactive access to large collections of data. Unfortunately, these commodity search engines do not provide special support for location-based indexing and retrieval. Many applications overcome this constraint by applying geographic bounding boxes in conjunction with range queries. We propose an alternative technique based on geographic identifiers and suggest it will yield faster query evaluation and provide higher search precision. Our experiment compared the two approaches by executing thousands of unique queries on a data set with 1.8 million records. Based on the quantitative results obtained, our technique yielded drastic performance improvements in both query execution time and precision.
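
To make the contrast in the abstract above concrete, here is a minimal sketch of the two query styles. The abstract does not describe the identifier scheme the thesis uses, so the quadtree-style key below is purely an assumption chosen for illustration; `quadkey` and `in_bounding_box` are hypothetical names.

```python
def quadkey(lat, lon, depth=8):
    """Build a hierarchical identifier by repeatedly quartering the globe.

    Each digit (0-3) names the quadrant containing the point at one level,
    so a longer key denotes a smaller cell nested inside its prefix cell.
    Assumption: this quadtree key is an illustrative stand-in, not the
    identifier scheme of the thesis.
    """
    south, north, west, east = -90.0, 90.0, -180.0, 180.0
    digits = []
    for _ in range(depth):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        digit = 0
        if lat >= mid_lat:
            digit += 2
            south = mid_lat
        else:
            north = mid_lat
        if lon >= mid_lon:
            digit += 1
            west = mid_lon
        else:
            east = mid_lon
        digits.append(str(digit))
    return "".join(digits)


def in_bounding_box(lat, lon, south, north, west, east):
    """Baseline approach: two numeric range checks per record."""
    return south <= lat <= north and west <= lon <= east
```

With keys like these stored as an indexed text field, "everything in this cell" becomes a single prefix match on the key, which an ordinary term index can answer without evaluating numeric ranges for each record.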
533

The effect of collection homogeneity on term association as a method of request expansion in information retrieval

Elkalifa, Elsuni Sidahmed January 1991
No description available.
534

Sparsification for Topic Modeling and Applications to Information Retrieval

Muoh, Chibuike 30 November 2009
No description available.
535

Document Classification using Characteristic Signatures

Mondal, Abhro Jyoti January 2017
No description available.
536

A PERSONALIZED INFORMATION ENVIRONMENT SYSTEM FOR INFORMATION RETRIEVAL

YU, HONGMING 02 September 2003
No description available.
537

Algorithms and Models for Collaborative Filtering from Large Information Corpora

Strunjas, Svetlana January 2008
No description available.
538

Finding Course Literature: Exposing Overlooked Alternatives and Streamlining Targeted Information Retrieval

Bengtegård, Sebastian, Lundén, Martin January 2013
A student studying at a Swedish higher education institution today needs to acquire course literature as a complement to teaching. There is no obvious approach to how the student acquires this literature, and information about it is presented inconsistently across different sources. We therefore develop a support system for searching for course literature, with the aim of making the search process more efficient and exposing the student to sources of course literature that may previously have been overlooked. The system is evaluated against the established search strategies of 22 students. The results show that using this support system not only markedly reduces the number of steps but also reduces the number of services the student uses to acquire course literature, compared with the students' own current search strategies. / When a student is studying at a university or a university college in Sweden, he or she is required to acquire course literature as a complement to teaching. This is often taken for granted, but there is currently no equally obvious approach to how the student obtains his or her course literature, and there is no consistent structure for how the information is presented. Therefore, we develop a prototype, a search tool that will help students locate their course literature. We do this to find a more appropriate method for searching for course literature. Firstly, we wish to streamline the student's path to acquiring course literature, reducing the number of steps they need to take. Secondly, we wish to expose the student to previously overlooked sources of course literature. We do this as an experiment, with the ambition of showing what a possible solution could look like if availability increased and guidelines were introduced on how to present course literature at Swedish universities and university colleges. This system is then evaluated in relation to the established search strategies that students currently use to find their course literature.
539

Generalizability and Reproducibility of Search Engine Online User Studies

Xu, Zijian 11 June 2020
Research in interactive information retrieval (IR) usually relies on lab or online user studies. A key concern of these studies is the generalizability and reproducibility of the results, especially when the studies involve only a limited number of participants. The interactive IR community, however, does not have a commonly agreed guideline on how many participants should be recruited. We study this fundamental research protocol issue by examining the generalizability and reproducibility of results with respect to different numbers of participants using simulation-based approaches. Specifically, we collect observations from a relatively large number of participants for a representative interactive IR experiment setting from online user studies using crowdsourcing. We sample smaller numbers of participants' results from the collected observations to simulate the results of smaller-scale online user studies. We empirically analyze the patterns of generalizability and reproducibility regarding different dependent variables and draw conclusions related to the optimal number of participants. Our study contributes to interactive information retrieval research by 1) establishing a methodology for evaluating the generalizability and reproducibility of results, and 2) providing guidelines regarding the optimal number of participants for search engine user studies. / Master of Science / In the domain of information retrieval, researchers usually require human participants to interact with, test, and evaluate a novel system; such experiments are called user studies. However, researchers usually perform these studies with small sample sizes, some recruiting fewer than 20 participants, which casts doubt on the generalizability and reproducibility of these studies. Generalizability means how reliably the results from a relatively small sample in an experimental setting can be generalized to a larger population. Reproducibility means whether the results from two groups with the same sample size are consistent with each other. In order to examine the generalizability and reproducibility of online user studies in interactive information retrieval systems, we conducted an online user study with a large sample size. We reproduced a well-recognized lab user study from Kelly et al. (2015) in an online environment. We established a simulation-based methodology for evaluating the generalizability and reproducibility of the results and then provided guidelines regarding the optimal number of participants for search engine user studies.
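
The simulation idea in the abstract above (drawing smaller groups of participants from a large pool and checking whether their results agree) can be sketched as follows. The abstract does not say which statistics the thesis compares, so this sketch assumes one per-participant score difference between two systems and uses the sign of the mean difference as the "result"; the function name `agreement_rates` and all parameters are hypothetical.

```python
import random
from statistics import mean


def agreement_rates(diffs, n, trials=1000, seed=0):
    """Estimate how often small-sample conclusions agree.

    diffs: one score difference (system A minus system B) per participant.
    n: simulated study size; requires len(diffs) >= 2 * n.
    Returns (generalizability, reproducibility) as fractions of trials:
    how often a sample of n matches the full pool's conclusion, and how
    often two disjoint samples of n match each other.
    Assumption: the conclusion is the sign of the mean difference.
    """
    rng = random.Random(seed)
    full_sign = mean(diffs) > 0
    general = reproduce = 0
    for _ in range(trials):
        picked = rng.sample(diffs, 2 * n)  # two disjoint groups of n
        group_a, group_b = picked[:n], picked[n:]
        sign_a = mean(group_a) > 0
        sign_b = mean(group_b) > 0
        general += sign_a == full_sign
        reproduce += sign_a == sign_b
    return general / trials, reproduce / trials
```

Running this for increasing values of `n` gives a rough picture of how agreement grows with study size, which is the kind of pattern the abstract describes analyzing.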
540

Improving Web Search Ranking Using the Internet Archive

Li, Liyan 02 June 2020
Current web search engines retrieve relevant results based only on the latest content of web pages stored in their indices, despite the fact that many web resources update frequently. We explore possible techniques and data sources for improving web search result ranking using the historical content changes of web pages. We compare web pages with previous versions and separately model texts and relevance signals in the newly added, retained, and removed parts. We particularly examine the Internet Archive, the largest web archiving service thus far, for its effectiveness in improving web search performance. We experiment with a few possible retrieval techniques, including language modeling approaches using refined document and query representations built by comparing current web pages to previous versions, and learning-to-rank methods for combining relevance features in different versions of web pages. Experimental results on two large-scale retrieval data sets (ClueWeb09 and ClueWeb12) suggest it is promising to use web page content change history to improve web search performance. However, it is worth mentioning that the actual effectiveness at this moment is affected by the practical coverage of the Internet Archive and the amount of regularly changing resources among the relevant information related to search queries. Our work is the first step towards a promising area combining web search and web archiving, and discloses new opportunities for commercial search engines and web archiving services. / Master of Science / Current web search engines show search results based only on the most recent version of web pages stored in their database, despite the fact that many web resources update frequently. We explore possible techniques and data sources for improving web search result ranking using the historical content changes of web pages. We compare web pages with previous versions and extract the newly added, retained, and removed parts. We examine the Internet Archive in particular, currently the largest web archiving service, for its effectiveness in improving web search performance. We experiment with a few possible retrieval techniques, including language modeling approaches using refined document and query representations built by comparing current web pages to previous versions, and learning-to-rank methods for combining relevance features in different versions of web pages. Experimental results on two large-scale retrieval data sets (ClueWeb09 and ClueWeb12) suggest it is promising to use web page content change history to improve web search performance. However, it is worth mentioning that the actual effectiveness at this point is affected by the practical coverage of the Internet Archive and the amount of ever-changing resources among the relevant information related to search queries. Our work is the first step towards a promising area combining web search and web archiving, and discloses new opportunities for commercial search engines and web archiving services.
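
The comparison of a page with an earlier version, as described in the abstract above, can be sketched at the level of term sets. This is only an illustrative assumption about granularity: the abstract does not say whether the thesis compares terms, sentences, or larger text segments, and the names `tokenize` and `split_by_change` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_by_change(previous, current):
    """Partition terms into added, retained, and removed sets.

    Assumption: versions are compared as sets of terms; the thesis may
    use a different unit of comparison.
    """
    prev_terms = set(tokenize(previous))
    curr_terms = set(tokenize(current))
    return {
        "added": curr_terms - prev_terms,
        "retained": curr_terms & prev_terms,
        "removed": prev_terms - curr_terms,
    }
```

Each of the three sets could then feed its own language model or relevance feature, in the spirit of modeling the added, retained, and removed parts separately.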
