1 |
Text Document Topical Recursive Clustering and Automatic Labeling of a Hierarchy of Document Clusters
Li, Xiaoxiao, Unknown Date
No description available.
|
2 |
Improving Web Search Ranking Using the Internet Archive
Li, Liyan, 02 June 2020
Current web search engines rank results based only on the latest content of the web pages stored in their indices, even though many web resources are updated frequently. We explore possible techniques and data sources for improving web search result ranking using the history of web page content changes. We compare web pages with their previous versions and separately model the text and relevance signals in the newly added, retained, and removed parts. We particularly examine the Internet Archive, the largest web archiving service thus far, for its effectiveness in improving web search performance. We experiment with a few possible retrieval techniques, including language modeling approaches that use refined document and query representations built by comparing current web pages to previous versions, and learning-to-rank methods that combine relevance features from different versions of web pages. Experimental results on two large-scale retrieval datasets (ClueWeb09 and ClueWeb12) suggest that using web page content change history is a promising way to improve web search performance. However, the actual effectiveness at this moment is limited by the practical coverage of the Internet Archive and by the amount of regularly changing resources among the information relevant to search queries. Our work is a first step towards a promising area combining web search and web archiving, and it reveals new opportunities for commercial search engines and web archiving services. / Master of Science / Current web search engines show search results based only on the most recent version of the web pages stored in their databases, even though many web resources are updated frequently. We explore possible techniques and data sources for improving web search result ranking using the history of web page content changes. We compare web pages with their previous versions and extract the newly added, retained, and removed parts. We examine the Internet Archive in particular, currently the largest web archiving service, for its effectiveness in improving web search performance. We experiment with a few possible retrieval techniques, including language modeling approaches that use refined document and query representations built by comparing current web pages to previous versions, and learning-to-rank methods that combine relevance features from different versions of web pages. Experimental results on two large-scale retrieval datasets (ClueWeb09 and ClueWeb12) suggest that using web page content change history is a promising way to improve web search performance. However, the actual effectiveness at this point is limited by the practical coverage of the Internet Archive and by the amount of ever-changing resources among the information relevant to search queries. Our work is a first step towards a promising area combining web search and web archiving, and it reveals new opportunities for commercial search engines and web archiving services.
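As an illustration of the version comparison described in this abstract, the following minimal sketch splits the terms of a page into added, retained, and removed parts. It assumes whitespace tokenization and bag-of-words counts; the function name `split_terms_by_change` is hypothetical, and the thesis may compare versions quite differently.

```python
from collections import Counter


def split_terms_by_change(previous_text: str, current_text: str) -> dict:
    """Split term counts into added, retained, and removed parts.

    Assumption: lowercase whitespace tokenization and bag-of-words
    counts. This is an illustrative simplification, not the thesis's
    actual comparison procedure.
    """
    prev = Counter(previous_text.lower().split())
    curr = Counter(current_text.lower().split())
    return {
        "added": curr - prev,     # occurrences only in the current version
        "retained": curr & prev,  # occurrences present in both versions
        "removed": prev - curr,   # occurrences only in the previous version
    }
```

Each of the three parts could then feed a separate language model or supply separate relevance features to a learning-to-rank model, which is the general direction the abstract describes.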
|
3 |
Platsbaserad sökning : En metod för filtrering och sortering av sökresultat / Location-based search : A method for filtering and sorting of search results
Bouvin, Anita, January 2013
Information searches of various kinds take place daily around the world, and the set of search results is often so large that users have difficulty knowing which results are relevant. The aim of this thesis was to examine how search results can be filtered and sorted using location-based search to make them more relevant to the user. A literature review and interviews made it possible to determine how search results could be filtered and sorted to meet user expectations. The theories and conclusions that emerged were applied in the development of a prototype. The prototype was then tested and evaluated in a user test, and the results show that the filtering and sorting used in the study can make search results more relevant to the user.
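The abstract does not say how the prototype filters and sorts by location. One plausible reading, sketched below purely as an assumption, keeps only results within a radius of the user and sorts them by great-circle distance. The function names and the `lat`/`lon` keys are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def filter_and_sort(results, user_lat, user_lon, max_km):
    """Keep results within max_km of the user, nearest first.

    Assumption: each result is a dict with "lat" and "lon" keys.
    """
    nearby = []
    for result in results:
        distance = haversine_km(user_lat, user_lon,
                                result["lat"], result["lon"])
        if distance <= max_km:
            nearby.append((distance, result))
    nearby.sort(key=lambda pair: pair[0])
    return [result for _, result in nearby]
```

A real system would likely combine distance with textual relevance rather than sorting on distance alone.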
|
4 |
Improving Search Result Clustering By Integrating Semantic Information From Wikipedia
Calli, Cagatay, 01 September 2010
Suffix Tree Clustering (STC) is a search result clustering (SRC) algorithm focused on generating overlapping clusters with meaningful labels in linear time. It demonstrated the feasibility of SRC, but subsequent studies introduced description-first algorithms that generate better labels and achieve higher precision. Still, STC remained the fastest SRC algorithm, and later studies addressed various problems with it. In this thesis, semantic relations between cluster labels and documents are exploited to filter out noisy labels and improve the merging phase of STC. Wikipedia is used to identify these relations, and methods for integrating semantic information into STC are suggested. Semantic features are shown to be effective for the SRC task when used together with term frequency vectors. Furthermore, no SRC studies on Turkish existed until now. In this thesis, a dataset for Turkish is introduced and a number of methods are tested on Turkish.
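For context on the merging phase mentioned above, here is a minimal sketch of the merge step as it is usually described for classic STC: two base clusters are linked when each shares more than half of its documents with the other, and final clusters are the connected components of that graph. This is the baseline step only, not the Wikipedia-based improvement proposed in the thesis. The input format (label mapped to a non-empty set of document ids) and the function name are assumptions.

```python
def merge_base_clusters(base_clusters, threshold=0.5):
    """Merge overlapping base clusters into final clusters.

    base_clusters maps a phrase label to a non-empty set of document
    ids (assumed input format). Two base clusters are linked when the
    shared documents exceed `threshold` of each cluster's size.
    """
    labels = list(base_clusters)
    parent = {label: label for label in labels}

    def find(label):
        # Union-find lookup with path halving.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            docs_a, docs_b = base_clusters[a], base_clusters[b]
            shared = len(docs_a & docs_b)
            if (shared / len(docs_a) > threshold
                    and shared / len(docs_b) > threshold):
                parent[find(a)] = find(b)

    merged = {}
    for label in labels:
        group = merged.setdefault(find(label), {"labels": [], "docs": set()})
        group["labels"].append(label)
        group["docs"] |= base_clusters[label]
    return list(merged.values())
```

The pairwise loop is quadratic in the number of base clusters, which is acceptable for a sketch because STC typically keeps only the top-scoring base clusters before merging.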
|
5 |
Alternative Search : From efficiency to experience
Henriksson, Adam, January 2014
Search engines of today focus on generating search results efficiently and accurately. Yet there is much to be explored in the way people interact with the applications and relate to the content. Individuals are unique, with complex preferences, motives and expectations. It is important not only to be sensitive to these differences but also to accommodate the extremes. Enhancing a search engine relies not only on technological development but also on exploring potential user experiences from broader perspectives, experiences that not only gratify the need for information but also support a diversity of journeys. The aim of the project is to develop an alternative search engine with different functionality, based on new values that reflect contemporary needs. The result, Exposeek, is an experiential prototype supporting exploratory browsing based on principles of distributed infrastructure, transparent computation and serendipitous information. Suggestive queries, legible algorithms and augmented results provide additional insights and present an alternative way to seek and peruse the Web. / Search Engines, Interaction Design
|
6 |
Using clickthrough data to optimize search result ranking : An evaluation of clickthrough data in terms of relevancy and efficiency / Användning av clickthrough data för att optimera rankning av sökresultat : En utvärdering av clickthrough data gällande relevans och effektivitet
Paulsson, Anton, January 2017
Search engines are in constant need of improvement, as the rapid growth of information affects their ability to return highly relevant documents. Search results get lost between result pages, and search algorithms are exploited to gain a higher ranking for documents. This study attempts to reduce both problems and to increase the relevancy of search results by using clickthrough data as an additional layer of weighting. Results from the evaluation indicate that clickthrough data can in fact be used to obtain more relevant search results.
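One way to picture clickthrough data as an additional weighting layer is to blend each document's base relevance score with a smoothed click-through rate. The sketch below is only an assumption about how such a layer could look. The function name, the `click_weight` parameter, and the add-one smoothing are illustrative choices that are not taken from the study.

```python
def rerank_with_clicks(results, clicks, impressions, click_weight=0.3):
    """Re-rank (doc_id, base_score) pairs using click-through rate.

    Assumptions: base scores lie in [0, 1]; clicks and impressions are
    dicts keyed by doc_id; click_weight is a hypothetical mixing factor.
    """
    def combined(doc_id, base_score):
        shown = impressions.get(doc_id, 0)
        # Add-one smoothing keeps rarely shown documents from getting
        # an extreme click-through rate.
        ctr = (clicks.get(doc_id, 0) + 1) / (shown + 2)
        return (1 - click_weight) * base_score + click_weight * ctr

    return sorted(results, key=lambda r: combined(r[0], r[1]), reverse=True)
```

Because the click signal is bounded by `click_weight`, a document cannot climb the ranking on clicks alone, which limits the effect of manipulated clicks in this sketch.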
|
7 |
Användarvänlig sökfunktionalitet och resultatvisning på webben / User Friendly Search Functionality and Search Results on the Web
Fong, Cheng, January 2011
This work investigates how to construct a search function and how best to present search results. The search function is often a central part of a website today and plays an important role in whether a website becomes successful. If information cannot be found easily, the website can lose its users fairly quickly. The presentation of search results can also be vital: logic, design, and layout are important aspects that must not be overlooked. There are thus many factors for a web developer to consider. The report reviews selected best-practice methods from areas such as layout, search fields, search results, and pagination, with the aim of identifying important and useful methods. An analysis of 10 websites and a questionnaire survey were carried out to examine whether the methods are applied in practice and whether they are methods that users ask for. The work resulted in a compilation of important factors recommended for constructing a user-friendly search function and presenting search results.
|
8 |
Interactive Visualization of Search Results of Large Document Sets
Anderson, James D., January 2018
No description available.
|
9 |
Diversified query expansion
Bouchoucha, Arbi, 06 1900
Search Result Diversification (SRD) aims to select diverse documents from the search results in order to cover as many search intents as possible. Existing approaches presuppose that the initial retrieval results contain diverse documents and cover the query aspects well. In practice, however, the initial results often fail to cover some aspects.
In this thesis, we investigate a new approach to SRD by diversifying the query, namely diversified query expansion (DQE). Expansion terms are selected either from a single resource or from multiple resources following the Maximal Marginal Relevance principle. In the first contribution, we propose a new term-level DQE method in which word similarity is determined at the surface (term) level based on the resources.
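The Maximal Marginal Relevance principle mentioned above can be sketched as a greedy loop: at each step, pick the candidate term that best balances relevance to the query against similarity to the terms already chosen. The code below is a generic MMR sketch under assumed inputs (a relevance dict and a pairwise similarity function). It is not the thesis's exact formulation, and the names are hypothetical.

```python
def mmr_select(candidates, relevance, similarity, k, lam=0.5):
    """Greedily select k expansion terms by Maximal Marginal Relevance.

    relevance: dict mapping term -> relevance to the query (assumed given).
    similarity: function (term, term) -> similarity in [0, 1] (assumed given).
    lam: trade-off between relevance (lam) and novelty (1 - lam).
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(term):
            # Redundancy is the highest similarity to any chosen term.
            redundancy = max(
                (similarity(term, chosen) for chosen in selected),
                default=0.0,
            )
            return lam * relevance[term] - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` the loop reduces to plain relevance ranking; lowering `lam` penalizes near-synonyms of already selected terms more strongly.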
When different resources are used for DQE, existing work combines them uniformly, ignoring differences in how much each resource contributes. In practice, the usefulness of a resource changes greatly depending on the query. In the second contribution, we propose a new method of query-level resource weighting for DQE. Our method is based on a set of features that are integrated into a linear regression model, and it generates from each resource a number of expansion candidates proportional to the weight of that resource.
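The proportional allocation described above can be illustrated with a small sketch. It assumes the per-resource weights have already been predicted (for example by the regression model) and uses largest-remainder rounding so that the counts sum exactly to the requested total. The rounding rule and the function name are assumptions that are not taken from the thesis.

```python
def allocate_terms(weights, total_terms):
    """Split total_terms across resources in proportion to their weights.

    weights: dict mapping resource name -> non-negative weight, with a
    positive sum (assumed to come from the query-level weighting model).
    Uses largest-remainder rounding so the counts sum to total_terms.
    """
    total_weight = sum(weights.values())
    exact = {name: total_terms * w / total_weight
             for name, w in weights.items()}
    counts = {name: int(share) for name, share in exact.items()}
    leftover = total_terms - sum(counts.values())
    # Hand the remaining slots to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n],
                          reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts
```

For a query where one resource receives half of the total weight, that resource would supply roughly half of the expansion candidates.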
Existing DQE methods focus on removing redundancy among the selected expansion terms, and no attention has been paid to how well those terms actually cover the query aspects. Consequently, it is not clear how to cope with the semantic relations between terms. To overcome this drawback, our third contribution introduces a novel method for aspect-level DQE that relies on an explicit modeling of query aspects based on embedding. Our method (called latent semantic aspect embedding) is trained in a supervised manner according to the principle that related terms should correspond to the same aspects. This method allows us to select expansion terms at a latent semantic level in order to cover as many aspects of a given query as possible. In addition, this method incorporates several different external resources to suggest potential expansion terms, and it supports several constraints, such as the sparsity constraint.
We evaluate our methods using the ClueWeb09B dataset and three query sets from the TREC Web tracks, and show the usefulness of our proposed approaches compared to state-of-the-art approaches.
|