  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Easy to Find: Creating Query-Based Multi-Document Summaries to Enhance Web Search

Qumsiyeh, Rani Majed 15 March 2011
Current web search engines, such as Google, Yahoo!, and Bing, rank the set of documents S retrieved in response to a user query Q and display each document with a title and a snippet, which serves as an abstract of the corresponding document in S. Snippets, however, are not as useful as intended, i.e., in helping search engine users quickly identify results of interest, if any exist, without browsing through the documents in S, since they (i) often include very similar information and (ii) do not capture the main content of the corresponding documents. Moreover, when the information need specified in a search query is ambiguous, it is difficult, if not impossible, for a search engine to identify precisely the set of documents that satisfy the user's intended request. Furthermore, a document title retrieved by web search engines is not always informative and is therefore not always a good indicator of the content of the corresponding document. All these design problems can be solved by our proposed query-based, web informative summarization engine, denoted Q-WISE. Q-WISE clusters the documents in S, which allows users to view separate document collections created according to the specific topic covered in each collection, and generates a concise, comprehensive summary for each collection/cluster of documents. Q-WISE is also equipped with a query suggestion module that guides its users in formulating a keyword query, which facilitates the web search and improves the precision and recall of the search results. Experimental results show that Q-WISE is highly effective and efficient in generating a high-quality summary for each cluster of documents on a specific topic, retrieved in response to a Q-WISE user's query. The empirical study also shows that Q-WISE's clustering algorithm is highly accurate, that the labels generated for the clusters are useful and often reflect the topic of the corresponding clustered documents, and that the performance of the query suggestion module of Q-WISE is comparable to that of commercial web search engines.
2

Search Term Selection and Document Clustering for Query Suggestion

Zhang, Xiaomin 06 1900
To improve a user's query and help the user quickly satisfy his/her information need, most search engines provide query suggestions that are meant to be relevant alternatives to the user's query. This thesis builds on the query suggestion system and evaluation methodology described in Shen Jiang's Master's thesis (2008). Jiang's system constructs query suggestions by searching for lexical aliases of web documents and then applying query search to the lexical aliases. A lexical alias for a web document is a list of terms that return the web document in a top-ranked position. Query search is a search process that finds useful combinations of search terms. The main focus of this thesis is to supply alternatives for the components of Jiang's system. We suggest three term scoring mechanisms and generalize Jiang's lexical alias search into a general search for terms that are useful for constructing good query suggestions. We also replace Jiang's top-down query search with a bottom-up beam search method. We experimentally show that our query suggestion method improves on Jiang's system by 30% for short queries and 90% for long queries using Jiang's evaluation method. In addition, we add new evidence supporting Jiang's conclusion that terms in the user's initial query are important to include in the query suggestions. We also explore the usefulness of document clustering in creating query suggestions. Our experimental results are the opposite of what we expected: query suggestion based on clustering does not perform nearly as well, in terms of the "coverage" scores we are using for evaluation, as our best method that is not based on document clustering.
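As a rough illustration of what a bottom-up beam search over search terms can look like, the sketch below grows candidate queries one term at a time and keeps only the best few at each size. It is not the thesis's implementation: the function name `beam_search_queries`, the toy document collection, and the toy scoring function (relevant matches minus non-relevant matches) are all assumptions made for this example, and the abstract's term scoring mechanisms and "coverage" measure are not reproduced here.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Query = FrozenSet[str]


def beam_search_queries(
    terms: Iterable[str],
    score: Callable[[Query], float],
    beam_width: int = 2,
    max_terms: int = 2,
) -> List[Tuple[Query, float]]:
    """Grow queries bottom-up, keeping the `beam_width` best queries per size."""
    vocab = sorted(set(terms))
    beam: List[Query] = [frozenset()]
    kept: Dict[Query, float] = {}
    for _ in range(max_terms):
        # Extend every query in the beam by one unused term.
        candidates = {q | {t} for q in beam for t in vocab if t not in q}
        if not candidates:
            break
        # Best score first; ties are broken alphabetically for determinism.
        ranked = sorted(candidates, key=lambda q: (-score(q), sorted(q)))
        beam = ranked[:beam_width]
        for q in beam:
            kept[q] = score(q)
    return sorted(kept.items(), key=lambda kv: (-kv[1], sorted(kv[0])))


# Toy collection (hypothetical): each document is a set of terms.
docs = {
    "d1": {"jaguar", "car", "speed"},
    "d2": {"jaguar", "cat", "habitat"},
    "d3": {"jaguar", "car", "dealer"},
    "d4": {"cat", "food"},
}
relevant = {"d1", "d3"}


def toy_score(query: Query) -> float:
    """Hypothetical score: relevant matches minus non-relevant matches."""
    matched = {d for d, doc_terms in docs.items() if query <= doc_terms}
    return len(matched & relevant) - len(matched - relevant)


results = beam_search_queries(
    ["jaguar", "car", "speed", "dealer", "cat"], toy_score
)
for query, value in results:
    print(sorted(query), value)
```

The beam width trades search cost against the risk of pruning a term combination that would only have scored well once more terms were added.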
3

Search Term Selection and Document Clustering for Query Suggestion

Zhang, Xiaomin Unknown Date
No description available.
4

Spelling Correction in a Music Entity Search Engine by Learning from Historical Search Queries / Stavningskorrigering i en sökmotor för musik genom att lära av historiska söksträngar

Movin, Maria January 2018
Query spelling correction is an important component of modern search engines that can help users express their intent and thus improve search quality. In this study, we investigated how accurately a sequence-to-sequence recurrent neural network (RNN) can recognise and correct misspellings in a music search engine when the model is trained on historical search queries. A sequence-to-sequence RNN was chosen as the model in this study since it has achieved state-of-the-art performance on similar tasks, such as machine translation and speech recognition. The findings from the study indicate that the model learns to correct and complete queries with higher accuracy than a baseline model that returns the input query. However, we suggest that more work needs to be done before the model would be good enough for production, in particular on creating a cleaner, less biased training dataset. Nevertheless, our work strengthens the idea that sequence-to-sequence RNNs could be used as a spelling correction system in search engines.
5

Learning representations for Information Retrieval

Sordoni, Alessandro 03 1900
Information retrieval is generally concerned with answering questions such as: Is this document relevant to this query? How similar are two queries or two documents? How can query and document similarity be used to enhance relevance estimation? In order to answer these questions, it is necessary to access computational representations of documents and queries. For example, similarities between documents and queries may correspond to a distance or a divergence defined on the representation space. It is generally assumed that the quality of the representation has a direct impact on the bias with respect to the true similarity, estimated by means of human intervention. Building useful representations for documents and queries has always been central to information retrieval research. The goal of this thesis is to provide new ways of estimating such representations and the relevance relationship between them. We present four articles that have been published in international conferences and one published in an information retrieval evaluation forum. The first two articles can be categorized as feature engineering approaches, which transduce a priori knowledge about the domain into the features of the representation. We present a novel retrieval model that compares favorably to existing models in terms of both theoretical originality and experimental effectiveness. The remaining two articles mark a significant change in our vision and originate from the widespread interest in deep learning research that took place during the time they were written. Therefore, they naturally belong to the category of representation learning approaches, also known as feature learning. Unlike previous approaches, the learning model discovers the most important features for the task at hand on its own, given a considerable amount of labeled data. We propose to model the semantic relationships between documents and queries and between queries themselves. The models presented have also shown improved effectiveness on standard test collections. These last articles are amongst the first applications of representation learning with neural networks for information retrieval. This series of research leads to the following observation: future improvements in information retrieval effectiveness will have to rely on representation learning techniques instead of manually defining the representation space.
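To make the idea of similarity as "a distance or a divergence defined on the representation space" concrete, the sketch below uses the simplest possible representation, raw term counts, and compares a query with two documents using cosine similarity and a smoothed Kullback-Leibler divergence. This is only an illustrative assumption: the function names, the add-alpha smoothing, and the toy texts are made up for this example and are not the models proposed in the thesis.

```python
import math
from collections import Counter
from typing import Iterable


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kl_divergence(
    p_counts: Counter, q_counts: Counter, vocab: Iterable[str], alpha: float = 1.0
) -> float:
    """KL(P || Q) between add-alpha smoothed unigram distributions over vocab."""
    vocab = list(vocab)

    def smoothed(counts: Counter) -> dict:
        total = sum(counts[t] for t in vocab) + alpha * len(vocab)
        return {t: (counts[t] + alpha) / total for t in vocab}

    p, q = smoothed(p_counts), smoothed(q_counts)
    return sum(p[t] * math.log(p[t] / q[t]) for t in vocab)


# Toy texts (hypothetical), represented as bags of words.
query = Counter("neural ranking model".split())
doc_a = Counter("neural ranking model for web search".split())
doc_b = Counter("cooking pasta at home".split())
vocab = set(query) | set(doc_a) | set(doc_b)

print(cosine_similarity(query, doc_a), cosine_similarity(query, doc_b))
print(kl_divergence(query, doc_a, vocab), kl_divergence(query, doc_b, vocab))
```

With these toy texts the on-topic document has the higher cosine similarity and the lower divergence from the query. A learned representation would replace the raw counts while keeping the same kind of comparison.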
