Global ETD Search

21	Mining Clickthrough Data To Improve Search Engine Results Veilumuthu, Ashok 05 1900 In this thesis, we aim to improve search result quality by utilizing the search intelligence (history of searches) available in the form of click-through data. We address two key issues, namely 1) relevance feedback extraction and fusion, and 2) deciphering search query intentions. Relevance Feedback Extraction and Fusion: Existing search engines depend heavily on the web linkage structure, in the form of hyperlinks, to determine the relevance and importance of documents. But these are collective judgments given by the page authors and hence are prone to collaborative spamming. To overcome spamming attempts and language semantic issues, it is also important to incorporate user feedback on the documents' relevance. Since users can hardly be motivated to give explicit/direct feedback on search quality, it becomes necessary to consider implicit feedback that can be collected from search engine logs. Though a number of implicit feedback measures have been proposed in the literature, we have not been able to identify studies that aggregate such feedback in a meaningful way to get a final ranking of documents. In this thesis, we first evaluate two implicit feedback measures, namely 1) click sequence and 2) time spent on the document, for their content uniqueness. We develop a mathematical programming model to collate the feedback collected from different sessions into a single ranking of documents. We use Kendall's τ rank correlation to determine the uniqueness of the information content present in the individual feedback measures. The experimental evaluation on the top 30 selected queries from actual search log data confirms that these two measures are not in perfect agreement and hence incremental information can potentially be derived from them.
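As an illustration of the agreement check mentioned in this abstract, the sketch below computes Kendall's τ (the tau-a form, assuming no ties) between two rankings of the same documents. The document ids and the two example rankings are hypothetical, and the code is not taken from the thesis.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same documents (no ties).

    Each ranking is a list of document ids, best first.
    """
    if set(rank_a) != set(rank_b) or len(rank_a) < 2:
        raise ValueError("rankings must cover the same two or more documents")
    pos_a = {doc: i for i, doc in enumerate(rank_a)}
    pos_b = {doc: i for i, doc in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is concordant when both rankings order it the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical rankings for one query: by click order and by time spent.
by_click_sequence = ["d1", "d2", "d3", "d4"]
by_time_spent = ["d2", "d1", "d3", "d4"]
tau = kendall_tau(by_click_sequence, by_time_spent)
```

A value of 1 means the two measures order the documents identically and -1 means they are exact reverses; values in between suggest that each measure carries information the other does not.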
Next, we study the feedback fusion problem, in which the user feedback from various sessions needs to be combined meaningfully. Preference aggregation is a classical problem in economics, and we study a variation of it where the rankers, i.e., the feedback measures, possess different expertise. We extend the generalized Mallows' model to model the feedback rankings given in user sessions. We propose single-stage and two-stage aggregation frameworks to combine different feedback rankings into one final ranking by taking their respective expertise into consideration. We show that the complexity of the parameter estimation problem is exponential in the number of documents and queries. We develop two scalable heuristics, namely 1) a greedy algorithm and 2) a weight-based heuristic, that can closely approximate the solution. We also establish the goodness of fit of the model by testing it on actual log data through a log-likelihood ratio test. As an independent evaluation of documents is not available, we conduct experiments on synthetic datasets devised appropriately to examine the various merits of the heuristics. The experimental results confirm the possibility of expertise-oriented aggregation of feedback by producing orderings better than both the best ranker and the equi-weight aggregator. Motivated by this result, we extend the aggregation framework to hold infinite rankings for meta-search applications. The aggregation results on synthetic datasets indicate that the extension is fruitful and scalable. Deciphering Search Query Intentions: The search engine often retrieves a huge list of documents based on their relevance scores for a given query. Such a presentation strategy may work if the submitted query is very specific, homogeneous and unambiguous. But it often happens that the queries posed to the search engine are too short to be specific and hence too ambiguous to clearly identify the exact information need (e.g., "jaguar").
These ambiguous and heterogeneous queries invite results from diverse topics. In such cases, the users may have to sift through the entire list to find the information they need, which can be a difficult task. Such a task can be simplified by organizing the search results under meaningful subtopics, which would help the users move directly to their topic of interest and ignore the rest. We develop a method to determine the various possible intentions of a given short, generic and ambiguous query using information from the click-through data. We propose a two-stage clustering framework to co-cluster the queries and documents into intentions that can readily be presented on demand. For this problem, we adapt spectral bipartite partitioning by extending it to automatically determine the number of clusters hidden in the log data. The algorithm has been tested on selected ambiguous queries, and the results demonstrate the ability of the algorithm to distinguish among the user intentions. Data Mining (Special Computer Methods) Search Engine (Computer Science) Information Search and Retrieval Clickthrough Data Mining Implicit Relevance Feedback Rank Aggregation Query Clustering Intent Clustering Search Engine Log Files Search Engine Query Log Search Engine Log Queries Ranking Models Computer Science
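The idea of fusing session rankings while giving more say to more reliable rankers, described in the abstract above, can be sketched with a weighted Borda count. This is a deliberately simple stand-in and not the generalized Mallows' model or either heuristic from the thesis; the session rankings and the expertise weights below are hypothetical.

```python
from collections import defaultdict


def weighted_borda(rankings, weights):
    """Fuse rankings (best first) into one ordering using per-ranker weights."""
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        n = len(ranking)
        for position, doc in enumerate(ranking):
            # The top document earns n - 1 points, the last earns 0.
            scores[doc] += weight * (n - 1 - position)
    # Sort by descending score; break ties by document id for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical feedback rankings from three sessions for the same query.
sessions = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d3"],
    ["d3", "d2", "d1"],
]
expertise = [0.6, 0.3, 0.1]  # assumed weights, not estimated from data
fused = weighted_borda(sessions, expertise)
equal = weighted_borda(sessions, [1, 1, 1])
```

With these inputs the expertise-weighted result ranks d1 first, while the equal-weight result ranks d2 first, which shows how the weights can change the final ordering.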
22	Decentralized Web Search Haque, Md Rakibul 08 June 2012 Centrally controlled search engines will not be sufficient and reliable for indexing and searching the rapidly growing World Wide Web in the near future. A better solution is to enable the Web to index itself in a decentralized manner. Existing distributed approaches for ranking search results do not provide flexible searching, complete results and ranking with high accuracy. This thesis presents a decentralized Web search mechanism, named DEWS, which enables existing webservers to collaborate with each other to form a distributed index of the Web. DEWS can rank the search results based on query keyword relevance and the relative importance of websites in a distributed manner, preserving a hyperlink overlay on top of a structured P2P overlay. It also supports approximate matching of query keywords using phonetic codes and n-grams along with list decoding of a linear covering code. DEWS supports incremental retrieval of search results in a decentralized manner, which reduces the network bandwidth required for query resolution. It uses an efficient routing mechanism that extends the Plexus routing protocol with a message aggregation technique. DEWS maintains replicas of indexes, which reduces routing hops and makes DEWS robust to webserver failures. The standard LETOR 3.0 dataset was used to validate the DEWS protocol. Simulation results show that the ranking accuracy of DEWS is close to the centralized case, while the network overhead for collaborative search and indexing is logarithmic in network size. The results also show that DEWS is resilient to changes in the available pool of indexing webservers and works efficiently even in the presence of heavy query load. Decentralized search engine P2P webserver ranking pagerank bm25 Computer Science
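To give a feel for the n-gram part of approximate keyword matching mentioned in this abstract, here is a minimal sketch that compares words by the Jaccard similarity of their character bigrams. It leaves out the phonetic codes, the list decoding of a covering code and the P2P routing that DEWS uses; the function names, the padding character and the 0.3 threshold are assumptions made for illustration.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word, padded with '$'."""
    padded = "$" + word.lower() + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=2):
    """Jaccard similarity of the n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def approximate_matches(query, vocabulary, threshold=0.3):
    """Indexed keywords whose similarity to the query meets the threshold."""
    scored = [(ngram_similarity(query, word), word) for word in vocabulary]
    return [word for score, word in sorted(scored, reverse=True)
            if score >= threshold]


# Hypothetical indexed keywords and a misspelled query keyword.
index_terms = ["search", "engine", "ranking"]
hits = approximate_matches("serach", index_terms)
```

Here the misspelled query "serach" still retrieves "search" because the two words share four of their ten distinct bigrams.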
23	Plánovač spojení ve městě / Urban transport planner Pokorný, Tomáš January 2017 Travelling in the city is a part of everyday life for many people. It is sometimes difficult to choose the right combination of walking and public transport, especially in unfamiliar parts of the city. We processed publicly available data and built a search engine for multimodal paths. The search engine was designed to personalise results according to user needs and can be used as a web application or a shared library.
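A multimodal path search of the kind this abstract describes can be sketched as a shortest-path search over a graph whose edges carry a transport mode. The abstract does not say which algorithm the thesis uses, so the Dijkstra search below, the toy network, the absence of timetables and the `walk_factor` personalisation knob are all assumptions.

```python
import heapq


def fastest_route(graph, start, goal, walk_factor=1.0):
    """Dijkstra over mode-labelled edges; returns (minutes, legs) or None.

    walk_factor scales walking time, a crude stand-in for personalisation.
    """
    queue = [(0, start, [])]
    settled = set()
    while queue:
        minutes, stop, legs = heapq.heappop(queue)
        if stop == goal:
            return minutes, legs
        if stop in settled:
            continue
        settled.add(stop)
        for neighbour, mode, cost in graph.get(stop, []):
            if neighbour not in settled:
                if mode == "walk":
                    cost = cost * walk_factor
                heapq.heappush(
                    queue, (minutes + cost, neighbour, legs + [(mode, neighbour)])
                )
    return None


# Hypothetical network: (neighbour, mode, minutes) per edge.
network = {
    "home": [("stop_a", "walk", 5), ("office", "walk", 40), ("stop_c", "walk", 1)],
    "stop_a": [("stop_b", "tram", 10)],
    "stop_b": [("office", "walk", 4)],
    "stop_c": [("office", "bus", 25)],
}
default_route = fastest_route(network, "home", "office")
less_walking = fastest_route(network, "home", "office", walk_factor=3)
```

With the default settings the search picks the tram connection; when walking is penalised threefold it switches to the bus, which involves less walking.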
24	Internetový marketing se zaměřením na SEO a SEM / Internet marketing with focuse on SEO and SEM Hanušová, Kateřina January 2008 The thesis defines and describes the tools of Internet marketing. Among these tools, it describes search engine marketing and website search engine optimization in detail. This information is presented in the context of a real e-shop, with a focus on SEO and SEM tools in real use.
25	Development of a New Client-Server Architecture for Context Aware Mobile Computing Gui, Feng 25 March 2009 This dissertation studies the context-aware application and its proposed algorithms at the client side. The required context-aware infrastructure is discussed in depth to illustrate that such an infrastructure collects the mobile user’s context information, registers service providers, derives the mobile user’s current context, distributes user context among context-aware applications, and provides tailored services. The proposed approach tries to strike a balance between the context server and mobile devices. Context acquisition is centralized at the server to ensure the usability of context information among mobile devices, while context reasoning remains at the application level. Hence, centralized context acquisition and distributed context reasoning are viewed as a better solution overall. The context-aware search application is designed and implemented at the server side. A new algorithm is proposed to take the user context profiles into consideration. By promoting feedback on the dynamics of the system, any prior user selection is now saved for further analysis so that it may help improve the results of a subsequent search. On the basis of these developments at the server side, various solutions are consequently provided at the client side. A proxy software-based component is set up for the purpose of data collection. This research endorses the belief that the proxy at the client side should contain the context reasoning component. Implementation of such a component lends credence to this belief, in that the context applications are able to derive the user context profiles. Furthermore, a context cache scheme is implemented to manage the cache on the client device in order to minimize processing requirements and other resources (bandwidth, CPU cycles, power).
Java and MySQL platforms are used to implement the proposed architecture and to test scenarios derived from users’ daily activities. To meet the practical demands of a testing environment without the heavy cost of establishing such a comprehensive infrastructure, a software simulation using a free Yahoo search API is provided as a means to evaluate the effectiveness of the design approach in a most realistic way. The integration of the Yahoo search engine into the context-aware architecture design shows how a context-aware application can meet user demands for tailored services and products in and around the user’s environment. The test results show that the overall design is highly effective, providing new features and enriching the mobile user’s experience through a broad scope of potential applications. artificial intelligence data cache context awareness mobile computing search engine
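The client-side context cache mentioned in this abstract can be pictured as a small bounded cache that keeps recently used context values on the device. The abstract does not describe the eviction policy, so the least-recently-used policy, the class name and the context keys below are assumptions, and the sketch is in Python rather than the Java used in the dissertation.

```python
from collections import OrderedDict


class ContextCache:
    """Bounded cache of context values with least-recently-used eviction."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        """Return the cached value, or None when the key is not cached."""
        if key not in self._entries:
            return None
        # Reading an entry marks it as most recently used.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        """Store a value and evict the least recently used entry if full."""
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)


# Hypothetical context attributes for one mobile user.
cache = ContextCache(capacity=2)
cache.put("location", "campus")
cache.put("time_of_day", "morning")
cache.get("location")               # touch "location" so it stays cached
cache.put("activity", "commuting")  # evicts "time_of_day"
```

Serving repeated context lookups from such a cache avoids a round trip to the context server, which is the kind of bandwidth and power saving the abstract refers to.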
26	Using clickstream data as implicit feedback in information retrieval systems / Användning av klickströmsdata som implicit återkoppling i informationssökningssystem Johansson, Henrik January 2018 This Master's thesis project aims to investigate whether Wikipedia's clickstream data can be used to improve the retrieval performance of information retrieval systems. The project is conducted under the assumption that a traversal between two articles connects the two articles with regard to content. To extract useful terms from the clickstream data, the data needed to be structured so that, given a Wikipedia article, it is possible to find all of the in-going or out-going article traversals. The project settled on using the clickstream data in an automatic query expansion approach. Two expansion methods were investigated: one expands with full article titles so that the context is preserved, and the other expands with individual terms from the article titles. The structure of the data and the two proposed methods were evaluated using a set of queries and relevance judgments. The results of the evaluation show that the method that expands with individual terms performed better than the full article title expansion method, and that the individual term method managed to increase the MAP by 11.24%. The expansion method was evaluated on two different query collections, and it was found that the proposed expansion method only improves the results where the average recall of the original queries is low. The thesis conclusion is that the clickstream can be used to improve retrieval performance for an information retrieval system. / The goal of this thesis is to investigate whether Wikipedia's clickstream data can be used to improve the search performance of information retrieval systems. The work was carried out under the assumption that a transition between two Wikipedia articles connects the articles' content and is of interest to the user.
To make use of the clickstream data, it must be structured in a usable way so that, given an article, it is possible to see how readers have moved out from or in towards the article. We chose to exploit the dataset through automatic query expansion. Two different methods were developed: the first expands the query with whole article titles, while the second expands it with individual words from an article title. The results of the study show that the word-based expansion method performs better than the method that expands with whole article titles. The word-based expansion method achieved an improvement in the MAP measure of 11.21%. The work also shows that the expansion method only improves performance when the recall of the original query is low. Regarding the structure of the clickstream data, the outgoing structure performed better than the incoming one. The conclusion of the thesis is that this clickstream data is well suited to improving the search performance of an information retrieval system. query expansion search engine elasticsearch clickstream Computer Sciences Datavetenskap (datalogi)
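The two expansion methods compared in this abstract can be sketched as follows. The sketch assumes the clickstream has already been reduced to (source article, target article, click count) tuples and that the query has already been matched to a Wikipedia article; the sample rows, the function names and the `limit` parameter are hypothetical, and the retrieval step itself is not shown.

```python
def build_outgoing(clickstream):
    """Map each source article to the titles reached from it, most clicks first."""
    outgoing = {}
    for source, target, clicks in clickstream:
        outgoing.setdefault(source, []).append((clicks, target))
    return {
        source: [target for _, target in sorted(pairs, reverse=True)]
        for source, pairs in outgoing.items()
    }


def expand_with_titles(query, article, outgoing, limit=2):
    """Append whole article titles as quoted phrases so their context is kept."""
    titles = outgoing.get(article, [])[:limit]
    return query + "".join(f' "{title}"' for title in titles)


def expand_with_terms(query, article, outgoing, limit=2):
    """Append individual title words that the query does not already contain."""
    seen = set(query.lower().split())
    extra = []
    for title in outgoing.get(article, [])[:limit]:
        for term in title.lower().split():
            if term not in seen:
                seen.add(term)
                extra.append(term)
    return " ".join([query] + extra)


# Hypothetical clickstream rows: (source article, target article, clicks).
rows = [
    ("Jaguar", "Jaguar Cars", 120),
    ("Jaguar", "Big cat", 300),
    ("Jaguar", "Panthera", 80),
]
outgoing = build_outgoing(rows)
by_title = expand_with_titles("jaguar", "Jaguar", outgoing)
by_term = expand_with_terms("jaguar", "Jaguar", outgoing)
```

The title-based variant keeps multi-word titles together as phrases, whereas the term-based variant adds each new word on its own, which is the distinction the evaluation in the abstract turns on.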
27	The Psychology of a Web Search Engine Ogbonna, Antoine I. January 2011 No description available. Computer Science Mathematics Search Yoool Search engine Text mining
28	Using the Architectural Tradeoff Analysis Method to Evaluate the Software Architecture of a Semantic Search Engine: A Case Study Chatra Raveesh, Sandeep January 2013 No description available. Computer Science Semantic Hadoop Big data Search engine
29	Matematický vyhledávač / Mathematical Search Engine Mišutka, Jozef January 2013 Mathematics has been used to describe phenomena and problems in many research fields for centuries. The basic elements used in the description are formulae, which express information symbolically. However, searching for mathematical knowledge in digital form using available tools is still cumbersome. We address this issue by presenting the mathematical search engine EgoMath, based on full-text searching, which can search for mathematical formulae and text. We perform an evaluation over a large collection of documents showing that our solution is usable. Our approach can be used with huge document collections by applying one specialised technique. In order to provide a valuable evaluation of the quality, we built an alternative mathematical search engine using the feature extraction technique proposed by Ma et al. We propose important improvements to this solution, achieving interesting results. We perform the first ever cross-evaluation of mathematical search engines based on different algorithms. A comprehensive survey of the existing techniques available, presented in this thesis, completes the picture of mathematical searching.
30	STREAMLINE THE SEARCH ENGINE MARKETING STRATEGY : Generational Driven Search Behavior on Google Nilsson, Rebecca, Alanko, Christa January 2018 The expanded internet usage has resulted in increased activity on web-based search engines. Companies are therefore devoting a large portion of their online marketing budget to Search Engine Marketing (abbreviated SEM) in order to reach potential online consumers searching for products. SEM comprises Search Engine Advertising (SEA) and Search Engine Optimization (SEO), which are two dissimilar marketing tools companies can invest in to reach the desired customer segments. It is therefore of great interest for companies in different product markets to know which SEM strategy to utilize. This leads to the purpose of the thesis, which is to investigate which SEM strategy, SEA or SEO, is the most suitable for companies in different markets. From this purpose, the research problem is derived: How does the search behavior of consumers differ between the two SEM tools, SEO and SEA? Initially, in order to answer the research problem, a theoretical framework was constructed from theories in previous research. To collect primary data, observations of 60 test subjects were performed in accordance with the Experimental Vignette Methodology. The analysis consists of a comparison between the collected data and the theories included in the frame of reference, to identify similarities and differences. The SPSS analysis of the results revealed numerous findings, such as two-way interactions between the degree of involvement and the click rate of SEM, as well as between the choice of either a head or a tail keyword and the degree of involvement. The analysis further revealed a three-way interaction, which suggests that the degree of involvement and the use of either a head or tail keyword affect the choice of SEM.
Additionally, the results show that customers using brands as keywords are more likely to click on an organic link than on a paid ad. However, when the factor age is added to the analysis, the results become insignificant. As the search behavior of customers using search engines is relatively unexplored scientifically, the thesis contributes knowledge useful for companies and marketing agencies, among others. However, due to the ongoing expansion of search engine usage, it is of great interest to conduct further research in the area to reveal additional findings. Search Engine Marketing Search Engine Advertising Search Engine Optimization Search Behavior Generation X Generation Y Business Administration Företagsekonomi
