11.
Effective web crawlers. Ali, Halil (hali@cs.rmit.edu.au). January 2008.
Web crawlers are the component of a search engine that traverses the Web, gathering documents in a local repository for indexing so that they can be ranked by their relevance to user queries. Whenever data is replicated in an autonomously updated environment, there are issues with maintaining up-to-date copies of documents. When documents retrieved by a crawler are subsequently altered on the Web, the effect is an inconsistency in user search results. While the impact depends on the type and volume of change, many existing algorithms do not take the degree of change into consideration, instead using simple measures that treat any change as significant. Furthermore, many crawler evaluation metrics do not consider index freshness or the impact that crawling algorithms have on user results. Most existing work makes assumptions about the change rate of documents on the Web, or relies on the availability of a long history of change. Our work investigates approaches to improving index consistency: detecting meaningful change, measuring the impact of a crawl on collection freshness from a user perspective, developing a framework for evaluating crawler performance, determining the effectiveness of stateless crawl ordering schemes, and proposing and evaluating the effectiveness of a dynamic crawl approach. Our work is concerned specifically with cases where few or no past change statistics are available from which predictions can be made. We analyse different measures of change and introduce a novel approach to measuring the impact of recrawl schemes on search engine users. Our schemes detect important changes that affect user results; other well-known and widely used schemes must retrieve around twice as much data to achieve the same effectiveness. Furthermore, while many studies have assumed that the Web changes according to a model, our experimental results are based on real web documents.
We analyse various stateless crawl ordering schemes that have no past change statistics with which to predict which documents will change, none of which, to our knowledge, has been tested to determine effectiveness in crawling changed documents. We empirically show that the effectiveness of these schemes depends on the topology and dynamics of the domain crawled and that no one static crawl ordering scheme can effectively maintain freshness, motivating our work on dynamic approaches. We present our novel approach to maintaining freshness, which uses the anchor text linking documents to determine the likelihood of a document changing, based on statistics gathered during the current crawl. We show that this scheme is highly effective when combined with existing stateless schemes. When we combine our scheme with PageRank, our approach allows the crawler to improve both freshness and quality of a collection. Our scheme improves freshness regardless of which stateless scheme it is used in conjunction with, since it uses both positive and negative reinforcement to determine which document to retrieve. Finally, we present the design and implementation of Lara, our own distributed crawler, which we used to develop our testbed.
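The dynamic approach described in this abstract can be illustrated with a small sketch. The class name, weights, and update rule below are hypothetical (the thesis's actual algorithm and parameters are not reproduced here); the sketch only shows the general idea: each candidate URL gets a priority that combines a stateless score (e.g. PageRank) with a change-likelihood estimate derived from the anchor text linking to it, updated by positive and negative reinforcement as the crawl observes which fetched documents actually changed.

```python
from collections import defaultdict

class AnchorTextScheduler:
    """Illustrative sketch of anchor-text-driven crawl ordering.

    Each anchor-text term carries a weight estimating how often the
    documents it points to turn out to have changed. Weights are
    reinforced positively when a fetched document has changed and
    negatively when it has not, using only statistics gathered during
    the current crawl (no past change history required).
    """

    def __init__(self, learning_rate=0.1):
        self.lr = learning_rate
        self.term_weight = defaultdict(lambda: 0.5)  # neutral prior

    def change_likelihood(self, anchor_terms):
        # Average the per-term weights of the anchor text for a URL.
        if not anchor_terms:
            return 0.5
        return sum(self.term_weight[t] for t in anchor_terms) / len(anchor_terms)

    def priority(self, stateless_score, anchor_terms):
        # Combine a stateless score (e.g. PageRank) with change likelihood,
        # so the crawler can pursue both quality and freshness.
        return stateless_score * self.change_likelihood(anchor_terms)

    def reinforce(self, anchor_terms, changed):
        # Positive reinforcement if the document had changed, negative if not.
        target = 1.0 if changed else 0.0
        for t in anchor_terms:
            w = self.term_weight[t]
            self.term_weight[t] = w + self.lr * (target - w)

scheduler = AnchorTextScheduler()
scheduler.reinforce(["breaking", "news"], changed=True)
scheduler.reinforce(["archive", "2001"], changed=False)
# URLs linked with terms seen on changing pages are now prioritised.
```

Under equal stateless scores, a URL whose anchor text contains "breaking" now outranks one labelled "archive", which is the behaviour the abstract attributes to its dynamic scheme.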
12.
Search engine optimisation or paid placement systems: user preference. Neethling, Riaan. January 2007.
Thesis (MTech (Information Technology))--Cape Peninsula University of Technology, 2007. / Includes bibliographical references (leaves 98-113). Also available online.
13.
An application of machine learning techniques to interactive, constraint-based search. Harbert, Christopher W.; Shang, Yi. January 2005.
Thesis (M.S.)--University of Missouri-Columbia, 2005. / The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file, viewed December 12, 2006. Includes bibliographical references.
14.
Search algorithms for discovery of Web services. Hicks, Janette M. January 2005.
Thesis (M.S.)--State University of New York at Binghamton, Watson School of Engineering and Applied Science (Computer Science), 2005. / Includes bibliographical references.
15.
Evaluation and comparison of search engines. Mtshontshi, Lindiwe. 2004.
Thesis (MPhil)--Stellenbosch University, 2004. / ENGLISH ABSTRACT: A growing body of studies is developing approaches to evaluate human interaction
with Web search engines. Measuring the information retrieval effectiveness of World
Wide Web search engines is costly because of the human relevance judgements
involved. However, both for business enterprises and people it is important to know
the most effective Web search engine, since such search engines help their users find
a higher number of relevant Web pages with less effort. Furthermore, this information
can be used for several practical purposes. This study does not attempt to describe all
the currently available search engines, but provides a comparison of some, which are
deemed to be among the most useful. It concentrates on search engines and their
characteristics only. The goal is to help a new user get the most useful "hits" when
using the various tools. / AFRIKAANSE OPSOMMING: More and more studies are being done to develop approaches for evaluating human interaction with Web search engines. Measuring how effectively a search engine can retrieve information on the World Wide Web is expensive because of the human relevance judgements involved. It is nevertheless important that the managers of business enterprises and other people know which search engines are the most effective, since such search engines help their users find a higher number of relevant Web pages with less effort. This information can also be used to achieve several practical goals. No attempt is made to describe all the search engines currently available; instead, some of those deemed most useful are compared. The focus is exclusively on search engines and their characteristics. The aim is to help the new user obtain the most useful information by making use of various tools.
16.
Search engine optimisation or paid placement systems: user preference. Neethling, Riaan. January 2007.
Thesis submitted in fulfilment of the requirements for the degree Magister Technologiae in Information Technology in the Faculty of Informatics and Design at the Cape Peninsula University of Technology, 2007. / The objective of this study was to investigate and report on user preference for Search Engine Optimisation (SEO) versus Pay Per Click (PPC) results. This will assist online advertisers in identifying the optimal Search Engine Marketing (SEM) strategy for their specific target market.
Research shows that online advertisers perceive PPC as a more effective SEM strategy than SEO. However, empirical evidence exists that PPC may not be the best strategy for online advertisers, creating confusion for advertisers considering an SEM campaign. Furthermore, not all advertisers have the funds to implement a dual strategy, and as a result must choose between an SEO and a PPC campaign. For online advertisers to choose the most relevant SEM strategy, it is important to understand user perceptions of these strategies.
A quantitative research design was used to collect and analyse data. A questionnaire was designed and hosted on a busy website to ensure maximum exposure. The questionnaire focused on how search engine users perceive SEM and on their click response towards SEO and PPC results respectively. A qualitative method was also used, in the form of an interview with representatives of a leading South African search engine, to verify the results and obtain expert opinions.
The data was analysed and the results interpreted. Results indicated that users' perceived relevancy is split 45% towards PPC results and 55% towards SEO results, regardless of demographic factors. Failing to invest in either one could cause a significant loss of website traffic, indicating that advertisers should invest in both PPC and SEO. Advertisers can invest in a PPC campaign for immediate results, and then implement an SEO campaign over a period of time. The results can further be used to adjust an SEM strategy according to the target market profile of an advertiser, ensuring maximum effectiveness.
17.
The crossover point between keyword rich website text and spamdexing. Zuze, Herbert. January 2011.
Thesis submitted in fulfilment of the requirements for the degree Magister Technologiae in Business Information Systems in the Faculty of Business at the Cape Peninsula University of Technology, 2011. / With over a billion Internet users surfing the Web daily in search of information, buying,
selling and accessing social networks, marketers focus intensively on developing websites
that are appealing to both the searchers and the search engines. Millions of webpages are
submitted each day for indexing to search engines. The success of a search engine lies in its
ability to provide accurate search results. Search engines’ algorithms constantly evaluate
websites and webpages that could violate their respective policies. For this reason some
websites and webpages are subsequently blacklisted from their index.
Websites are increasingly being utilised as marketing tools, which results in major competition
amongst websites. Website developers strive to develop websites of high quality, which are
unique and content rich as this will assist them in obtaining a high ranking from search
engines. By focusing on websites of a high standard, website developers utilise search
engine optimisation (SEO) strategies to earn a high search engine ranking.
From time to time SEO practitioners abuse SEO techniques in order to trick the search
engine algorithms, but the algorithms are programmed to identify and flag these techniques
as spamdexing. Search engines do not clearly explain how they interpret keyword stuffing
(one form of spamdexing) in a webpage. However, they regard spamdexing in many different
ways and do not provide enough detail to clarify what crawlers take into consideration when
interpreting the spamdexing status of a website. Furthermore, search engines differ in the
way that they interpret spamdexing, but offer no clear quantitative evidence for the crossover
point of keyword dense website text to spamdexing. Scholars have indicated different views
in respect of spamdexing, characterised by different keyword density measurements in the
body text of a webpage. This raised several fundamental questions that form the basis of this
research.
This research was carried out using triangulation in order to determine how the scholars,
search engines and SEO practitioners interpret spamdexing. Five websites with varying
keyword densities were designed and submitted to Google, Yahoo! and Bing. Two phases of
the experiment were done and the results were recorded. During both phases almost all of
the webpages, including the one with a 97.3% keyword density, were indexed. This enabled the
research to conclusively rule out keyword stuffing, blacklisting and any form of penalisation
as factors. Designers are urged to concentrate instead on usability and on sound values when
building a website.
The research explored the fundamental contribution of keywords to webpage indexing and
visibility. Whether or not keywords are used at an optimal density, they influence website
ranking and indexing. However, the focus should be on how the end user interprets the
displayed content, rather than on how the search engine reacts to it. Furthermore,
spamdexing is likely to scare away potential clients and end users rather than attract them,
which is why the time spent on spamdexing would be better used to produce quality content.
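The keyword density measurements at the centre of this experiment boil down to a simple calculation: the share of body-text words accounted for by the target keyword. The function below is a minimal sketch of that arithmetic, not the thesis's actual measurement instrument; the sample text is invented.

```python
import re

def keyword_density(body_text, keyword):
    """Percentage of words in the body text that match the keyword."""
    words = re.findall(r"[a-z0-9']+", body_text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w == keyword.lower())
    return 100.0 * hits / len(words)

# A page whose body text is almost nothing but the keyword approaches
# the 97.3% extreme used in the experiment described above.
spammy = "cheap " * 36 + "flights here"
print(round(keyword_density(spammy, "cheap"), 1))  # → 94.7
```

Under this definition a density near 100% means the body text is essentially the keyword repeated, which is the kind of page the experiment submitted for indexing.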
18.
In search of search privacy. Brandi, Wesley Antonio. 22 July 2011.
Search engines have become integral to the way in which we use the Web of today. Not only are they an important real-time source of links to relevant information, but they also serve as a starting point to the Web: a veritable treasure trove of the latest news, satellite images, directions from anywhere to anywhere, local traffic updates and global trends, ranging from the spread of influenza to which celebrity happens to be the most popular at a particular time. The more popular search engines are collecting incredible amounts of information. In addition to indexing significant portions of the Web, they record what hundreds of millions of users around the world are searching for. As more people use a particular search engine, it has the potential to record more information on what is deemed relevant (and in doing so provide better relevance in the future, thereby attracting more users). Unfortunately, the relevance derived from this cycle between the search user and the search engine comes at a cost: privacy. In this work, we take an in-depth look at what privacy means within the context of search. We discuss why the search engine must be considered a threat to search privacy. We then investigate potential solutions and eventually propose our own in a bid to enhance search privacy. / Thesis (PhD)--University of Pretoria, 2011. / Computer Science / unrestricted
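The threat this abstract alludes to is easy to illustrate: a query log keyed by any stable identifier (account, cookie, IP address) links a user to everything they have searched for. The log format and entries below are hypothetical, a minimal sketch of how per-user search histories can be reconstructed from such a log.

```python
from collections import defaultdict

# Hypothetical log entries: (user_identifier, query).
log = [
    ("user-17", "flu symptoms"),
    ("user-17", "clinics near pretoria"),
    ("user-42", "satellite images"),
    ("user-17", "medical aid quotes"),
]

# Grouping by identifier reconstructs a per-user search history, from
# which interests, location and health concerns can be profiled.
profile = defaultdict(list)
for user, query in log:
    profile[user].append(query)

print(profile["user-17"])
# → ['flu symptoms', 'clinics near pretoria', 'medical aid quotes']
```

Even without a login, a persistent cookie or IP address serves the same linking role, which is why the abstract treats the search engine itself as the adversary.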
19.
An Investigation into Code Search Engines: The State of the Art Versus Developer Expectations. Li, Shuangyi. 15 July 2022.
As essential software development tools, code search engines are expected to provide superior accuracy, usability, and performance. However, prior research has neither (1) summarized, categorized, and compared representative code search engines, nor (2) analyzed the actual expectations that developers have for code search engines. This missing knowledge can empower developers to fully benefit from search engines, academic researchers to uncover promising research directions, and industry practitioners to properly marshal their efforts. This thesis fills the aforementioned gaps by drawing a comprehensive picture of code search engines, including their definition, standard processes, existing solutions, common alternatives, and developers' perspectives. We first study the state of the art in code search engines by analyzing academic papers, industry releases, and open-source projects. We then survey more than 100 software developers to ascertain their usage of and preferences for code search engines. Finally, we juxtapose the results of our study and survey to synthesize a call to action for researchers and industry practitioners to better meet the demands software developers make on code search engines. We present the first comprehensive overview of state-of-the-art code search engines by categorizing and comparing them based on their respective search strategies, applicability, and performance. Our user survey revealed a surprising lack of awareness among many developers with regard to code search engines, with a high preference for using general-purpose search engines (e.g., Google) or code repositories (e.g., GitHub) to search for code. Our results also clearly identify typical usage scenarios and sought-after properties of code search engines.
Our findings can guide software developers in selecting the code search engines most suitable for their programming pursuits, suggest new research directions for researchers, and help programming tool builders create effective code search engine solutions. / Master of Science / When developing software, programmers rely on source code search engines to find code snippets related to the programming task at hand. Given their importance for software development, source code search engines have become the focus of numerous research and industry projects. However, researchers and developers remain largely unaware of each other's efforts and expectations. As a consequence, developers struggle to determine which engine would best fit their needs, while researchers remain unaware of what developers expect from search engines. This thesis addresses this problem via a three-pronged approach: (1) it provides a systematic review of the research literature and major engines; (2) it analyzes the results of surveying software developers about their experiences with and expectations for code search engines; (3) it presents actionable insights that can guide future research and industry efforts in code search engines to better meet the needs of software developers.
20.
Google search. Unruh, Miriam; McLean, Cheryl; Tittenberger, Peter; Schor, Dario. 30 May 2006.
After completing this tutorial you will be able to access "Google", conduct a simple search, and interpret the search results.