11.
Effective web crawlers. Ali, Halil (hali@cs.rmit.edu.au). January 2008.
Web crawlers are the component of a search engine that traverses the Web, gathering documents in a local repository for indexing so that they can be ranked by their relevance to user queries. Whenever data is replicated in an autonomously updated environment, there are issues with maintaining up-to-date copies of documents. When documents retrieved by a crawler are subsequently altered on the Web, the effect is an inconsistency in user search results. While the impact depends on the type and volume of change, many existing algorithms do not take the degree of change into consideration, instead using simple measures that treat any change as significant. Furthermore, many crawler evaluation metrics do not consider index freshness or the impact that crawling algorithms have on user results. Most existing work makes assumptions about the change rate of documents on the Web, or relies on the availability of a long history of change. Our work investigates approaches to improving index consistency: detecting meaningful change, measuring the impact of a crawl on collection freshness from a user perspective, developing a framework for evaluating crawler performance, determining the effectiveness of stateless crawl ordering schemes, and proposing and evaluating the effectiveness of a dynamic crawl approach. Our work is concerned specifically with cases where few or no past change statistics are available from which predictions can be made. We analyse different measures of change and introduce a novel approach to measuring the impact of recrawl schemes on search engine users. Our schemes detect important changes that affect user results; other well-known and widely used schemes must retrieve around twice as much data to achieve the same effectiveness. Furthermore, while many studies have assumed that the Web changes according to a model, our experimental results are based on real web documents.
We analyse various stateless crawl ordering schemes that have no past change statistics with which to predict which documents will change, none of which, to our knowledge, has been tested to determine effectiveness in crawling changed documents. We empirically show that the effectiveness of these schemes depends on the topology and dynamics of the domain crawled and that no one static crawl ordering scheme can effectively maintain freshness, motivating our work on dynamic approaches. We present our novel approach to maintaining freshness, which uses the anchor text linking documents to determine the likelihood of a document changing, based on statistics gathered during the current crawl. We show that this scheme is highly effective when combined with existing stateless schemes. When we combine our scheme with PageRank, our approach allows the crawler to improve both freshness and quality of a collection. Our scheme improves freshness regardless of which stateless scheme it is used in conjunction with, since it uses both positive and negative reinforcement to determine which document to retrieve. Finally, we present the design and implementation of Lara, our own distributed crawler, which we used to develop our testbed.
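The dynamic approach described in this abstract can be illustrated with a small sketch. The class name, weights, and update rule below are hypothetical (the thesis's actual algorithm and parameters are not reproduced here); the sketch only shows the general idea: each candidate URL gets a priority that combines a stateless score (e.g. PageRank) with a change-likelihood estimate derived from the anchor text linking to it, updated by positive and negative reinforcement as the crawl observes which fetched documents actually changed.

```python
from collections import defaultdict

class AnchorTextScheduler:
    """Illustrative sketch of anchor-text-driven crawl ordering.

    Each anchor-text term carries a weight estimating how often the
    documents it points to turn out to have changed. Weights are
    reinforced positively when a fetched document has changed and
    negatively when it has not, using only statistics gathered during
    the current crawl (no past change history required).
    """

    def __init__(self, learning_rate=0.1):
        self.lr = learning_rate
        self.term_weight = defaultdict(lambda: 0.5)  # neutral prior

    def change_likelihood(self, anchor_terms):
        # Average the per-term weights of the anchor text for a URL.
        if not anchor_terms:
            return 0.5
        return sum(self.term_weight[t] for t in anchor_terms) / len(anchor_terms)

    def priority(self, stateless_score, anchor_terms):
        # Combine a stateless score (e.g. PageRank) with change likelihood,
        # so the crawler can pursue both quality and freshness.
        return stateless_score * self.change_likelihood(anchor_terms)

    def reinforce(self, anchor_terms, changed):
        # Positive reinforcement if the document had changed, negative if not.
        target = 1.0 if changed else 0.0
        for t in anchor_terms:
            w = self.term_weight[t]
            self.term_weight[t] = w + self.lr * (target - w)

scheduler = AnchorTextScheduler()
scheduler.reinforce(["breaking", "news"], changed=True)
scheduler.reinforce(["archive", "2001"], changed=False)
# URLs linked with terms seen on changing pages are now prioritised.
```

Under equal stateless scores, a URL whose anchor text contains "breaking" now outranks one labelled "archive", which is the behaviour the abstract attributes to its dynamic scheme.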
12.
Search engine optimisation or paid placement systems: user preference. Neethling, Riaan. January 2007.
Thesis (MTech (Information Technology))--Cape Peninsula University of Technology, 2007. / Includes bibliographical references (leaves 98-113). Also available online.
13.
An application of machine learning techniques to interactive, constraint-based search. Harbert, Christopher W.; Shang, Yi. January 2005.
Thesis (M.S.)--University of Missouri-Columbia, 2005. / The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file, viewed December 12, 2006. Includes bibliographical references.
14.
Search algorithms for discovery of Web services. Hicks, Janette M. January 2005.
Thesis (M.S.)--State University of New York at Binghamton, Watson School of Engineering and Applied Science (Computer Science), 2005. / Includes bibliographical references.
15.
Evaluation and comparison of search engines. Mtshontshi, Lindiwe. 2004.
Thesis (MPhil)--Stellenbosch University, 2004. / ENGLISH ABSTRACT: A growing body of studies is developing approaches to evaluate human interaction
with Web search engines. Measuring the information retrieval effectiveness of World
Wide Web search engines is costly because of the human relevance judgements
involved. However, both for business enterprises and people it is important to know
the most effective Web search engine, since such search engines help their users find
a higher number of relevant Web pages with less effort. Furthermore, this information
can be used for several practical purposes. This study does not attempt to describe all
the currently available search engines, but provides a comparison of some, which are
deemed to be among the most useful. It concentrates on search engines and their
characteristics only. The goal is to help a new user get the most useful "hits" when
using the various tools. / AFRIKAANSE OPSOMMING: More and more studies are being done to develop approaches for evaluating human interaction with Web search engines. Measuring how effectively a search engine can retrieve information on the World Wide Web is expensive because of the human relevance judgements involved. It is nevertheless important that the managers of business enterprises and other people know which search engines are the most effective, since such search engines help their users find a higher number of relevant Web pages with less effort. This information can also be used to achieve several practical goals. No attempt is made to describe all the search engines currently available; instead, some of those deemed most useful are compared. The focus is exclusively on search engines and their characteristics. The aim is to help the new user obtain the most useful information by making use of various tools.
16.
Search engine optimisation or paid placement systems: user preference. Neethling, Riaan. January 2007.
Thesis submitted in fulfilment of the requirements for the degree Magister Technologiae in Information Technology in the Faculty of Informatics and Design at the Cape Peninsula University of Technology, 2007. / The objective of this study was to investigate and report on user preference for Search Engine Optimisation (SEO) versus Pay Per Click (PPC) results. This will assist online advertisers in identifying the optimal Search Engine Marketing (SEM) strategy for their specific target market.
Research shows that online advertisers perceive PPC as a more effective SEM strategy than SEO. However, empirical evidence exists that PPC may not be the best strategy for online advertisers, creating confusion for advertisers considering an SEM campaign. Furthermore, not all advertisers have the funds to implement a dual strategy, and as a result must choose between an SEO and a PPC campaign. For online advertisers to choose the most relevant SEM strategy, it is important to understand user perceptions of these strategies.
A quantitative research design was used to collect and analyse data. A questionnaire was designed and hosted on a busy website to ensure maximum exposure. The questionnaire focused on how search engine users perceive SEM and on their click response towards SEO and PPC results respectively. A qualitative method was also used, in the form of an interview with representatives of a leading South African search engine, to verify the results and obtain expert opinions.
The data was analysed and the results interpreted. Results indicated that users' perceived relevancy is split 45% towards PPC results and 55% towards SEO results, regardless of demographic factors. Failing to invest in either one could cause a significant loss of website traffic, indicating that advertisers should invest in both PPC and SEO. Advertisers can invest in a PPC campaign for immediate results, and then implement an SEO campaign over a period of time. The results can further be used to adjust an SEM strategy according to the target market profile of an advertiser, ensuring maximum effectiveness.
17.
The crossover point between keyword rich website text and spamdexing. Zuze, Herbert. January 2011.
Thesis submitted in fulfilment of the requirements for the degree Magister Technologiae in Business Information Systems in the Faculty of Business at the Cape Peninsula University of Technology, 2011. / With over a billion Internet users surfing the Web daily in search of information, buying,
selling and accessing social networks, marketers focus intensively on developing websites
that are appealing to both the searchers and the search engines. Millions of webpages are
submitted each day for indexing to search engines. The success of a search engine lies in its
ability to provide accurate search results. Search engines’ algorithms constantly evaluate
websites and webpages that could violate their respective policies. For this reason some
websites and webpages are subsequently blacklisted from their index.
Websites are increasingly being utilised as marketing tools, which results in major competition
amongst websites. Website developers strive to develop websites of high quality, which are
unique and content rich as this will assist them in obtaining a high ranking from search
engines. By focusing on websites of a high standard, website developers utilise search
engine optimisation (SEO) strategies to earn a high search engine ranking.
From time to time SEO practitioners abuse SEO techniques in order to trick the search
engine algorithms, but the algorithms are programmed to identify and flag these techniques
as spamdexing. Search engines do not clearly explain how they interpret keyword stuffing
(one form of spamdexing) in a webpage. However, they regard spamdexing in many different
ways and do not provide enough detail to clarify what crawlers take into consideration when
interpreting the spamdexing status of a website. Furthermore, search engines differ in the
way that they interpret spamdexing, but offer no clear quantitative evidence for the crossover
point of keyword dense website text to spamdexing. Scholars have indicated different views
in respect of spamdexing, characterised by different keyword density measurements in the
body text of a webpage. This raised several fundamental questions that form the basis of this
research.
This research was carried out using triangulation in order to determine how the scholars,
search engines and SEO practitioners interpret spamdexing. Five websites with varying
keyword densities were designed and submitted to Google, Yahoo! and Bing. Two phases of
the experiment were done and the results were recorded. During both phases almost all of
the webpages, including the one with a 97.3% keyword density, were indexed. This enabled the
research to conclusively rule out keyword stuffing, blacklisting and any form of penalisation
as factors. Designers are urged to concentrate instead on usability and on sound values when
building a website.
The research explored the fundamental contribution of keywords to webpage indexing and
visibility. Whether or not keywords are used at an optimal density, they influence website
ranking and indexing. However, the focus should be on how the end user interprets the
displayed content, rather than on how the search engine reacts to it. Furthermore,
spamdexing is likely to scare away potential clients and end users rather than attract them,
which is why the time spent on spamdexing would be better used to produce quality content.
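The keyword density measurements at the centre of this experiment boil down to a simple calculation: the share of body-text words accounted for by the target keyword. The function below is a minimal sketch of that arithmetic, not the thesis's actual measurement instrument; the sample text is invented.

```python
import re

def keyword_density(body_text, keyword):
    """Percentage of words in the body text that match the keyword."""
    words = re.findall(r"[a-z0-9']+", body_text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w == keyword.lower())
    return 100.0 * hits / len(words)

# A page whose body text is almost nothing but the keyword approaches
# the 97.3% extreme used in the experiment described above.
spammy = "cheap " * 36 + "flights here"
print(round(keyword_density(spammy, "cheap"), 1))  # → 94.7
```

Under this definition a density near 100% means the body text is essentially the keyword repeated, which is the kind of page the experiment submitted for indexing.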
18.
In search of search privacy. Brandi, Wesley Antonio. 22 July 2011.
Search engines have become integral to the way in which we use the Web of today. Not only are they an important real-time source of links to relevant information, but they also serve as a starting point to the Web: a veritable treasure trove of the latest news, satellite images, directions from anywhere to anywhere, local traffic updates and global trends, ranging from the spread of influenza to which celebrity happens to be the most popular at a particular time. The more popular search engines are collecting incredible amounts of information. In addition to indexing significant portions of the Web, they record what hundreds of millions of users around the world are searching for. As more people use a particular search engine, it has the potential to record more information on what is deemed relevant (and in doing so provide better relevance in the future, thereby attracting more users). Unfortunately, the relevance derived from this cycle between the search user and the search engine comes at a cost: privacy. In this work, we take an in-depth look at what privacy means within the context of search. We discuss why the search engine must be considered a threat to search privacy. We then investigate potential solutions and eventually propose our own in a bid to enhance search privacy. / Thesis (PhD)--University of Pretoria, 2011. / Computer Science / unrestricted
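The threat this abstract alludes to is easy to illustrate: a query log keyed by any stable identifier (account, cookie, IP address) links a user to everything they have searched for. The log format and entries below are hypothetical, a minimal sketch of how per-user search histories can be reconstructed from such a log.

```python
from collections import defaultdict

# Hypothetical log entries: (user_identifier, query).
log = [
    ("user-17", "flu symptoms"),
    ("user-17", "clinics near pretoria"),
    ("user-42", "satellite images"),
    ("user-17", "medical aid quotes"),
]

# Grouping by identifier reconstructs a per-user search history, from
# which interests, location and health concerns can be profiled.
profile = defaultdict(list)
for user, query in log:
    profile[user].append(query)

print(profile["user-17"])
# → ['flu symptoms', 'clinics near pretoria', 'medical aid quotes']
```

Even without a login, a persistent cookie or IP address serves the same linking role, which is why the abstract treats the search engine itself as the adversary.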
19.
An Investigation into Code Search Engines: The State of the Art Versus Developer Expectations. Li, Shuangyi. 15 July 2022.
As essential software development tools, code search engines are expected to provide superior accuracy, usability, and performance. However, prior research has neither (1) summarized, categorized, and compared representative code search engines, nor (2) analyzed the actual expectations that developers have for code search engines. This missing knowledge can empower developers to fully benefit from search engines, academic researchers to uncover promising research directions, and industry practitioners to properly marshal their efforts. This thesis fills the aforementioned gaps by drawing a comprehensive picture of code search engines, including their definition, standard processes, existing solutions, common alternatives, and developers' perspectives. We first study the state of the art in code search engines by analyzing academic papers, industry releases, and open-source projects. We then survey more than 100 software developers to ascertain their usage of and preferences for code search engines. Finally, we juxtapose the results of our study and survey to synthesize a call to action for researchers and industry practitioners to better meet the demands software developers make on code search engines. We present the first comprehensive overview of state-of-the-art code search engines by categorizing and comparing them based on their respective search strategies, applicability, and performance. Our user survey revealed a surprising lack of awareness among many developers with regard to code search engines, with a high preference for using general-purpose search engines (e.g., Google) or code repositories (e.g., GitHub) to search for code. Our results also clearly identify typical usage scenarios and sought-after properties of code search engines.
Our findings can guide software developers in selecting the code search engines most suitable for their programming pursuits, suggest new research directions for researchers, and help programming tool builders create effective code search engine solutions. / Master of Science / When developing software, programmers rely on source code search engines to find code snippets related to the programming task at hand. Given their importance for software development, source code search engines have become the focus of numerous research and industry projects. However, researchers and developers remain largely unaware of each other's efforts and expectations. As a consequence, developers struggle to determine which engine would best fit their needs, while researchers remain unaware of what developers expect from search engines. This thesis addresses this problem via a three-pronged approach: (1) it provides a systematic review of the research literature and major engines; (2) it analyzes the results of surveying software developers about their experiences with and expectations for code search engines; (3) it presents actionable insights that can guide future research and industry efforts in code search engines to better meet the needs of software developers.
20.
Google search. Unruh, Miriam; McLean, Cheryl; Tittenberger, Peter; Schor, Dario. 30 May 2006.
After completing this tutorial you will be able to access "Google", conduct a simple search, and interpret the search results.