1

Online evolving clustering approaches to improving web search results

Evans, Anthony D. January 2011
A word's semantic interpretation may vary depending on the context in which it is used. A Web search for documents containing a specific keyword can therefore return a contextually disorganised set of results, which is undesirable for an end user who wants to find relevant information quickly. Existing search engines deliver results quickly but cannot take the contextual meaning of text into account, so the results of search engine queries are typically returned as an unordered list. Some searches, where the term used has a distinct meaning, will return only Web pages that fit that one meaning; however, many search terms have multiple meanings, and this makes locating the set of contextually relevant documents more difficult when a large number of hits are returned. In this study, document clustering is applied as a solution to help an end user find relevant information more efficiently in search results. To cluster documents, it is necessary to compare them and group together those that are similar, based on a similarity distance measure involving the words they share, so that contextual groupings can be inferred [23]. Uniquely in this study, Cosine distance was also applied in situations that normally use Euclidean measures [24]. Conventional clustering methods were found to be inadequate when applied to search engine results in this study. For example, an Internet search engine's results only provide pointers to documents and do not contain document vectors, so the required complete dataset is initially unavailable. Existing algorithms that cluster pre-fetched documents in an off-line mode (BD) are unsuitable. The clustering component needs to be online so that, as each document vector is obtained, it can be processed on the fly to provide immediate results.
In addition to overcoming the limitations of conventional clustering, this study also identifies and tackles practical challenges in clustering Web page documents, such as how to reduce noise (redundant text, adverts, code) effectively without compromising speed by increasing the complexity of pre-processor functions. In summary, this thesis describes the development and implementation of a novel online clustering application that delivers enhanced search engine results in real time. An improvement to currently available algorithms, such as the use of Cosine-based distance measures, was required so that the clustering could be carried out on the output of existing search engines without performance degradation as dimensionality increases. This resulted in the implementation of an efficient, non-iterative online data clustering technique capable of high-dimensionality processing, based on keyword frequency and Potential calculations, to cluster documents contextually. This approach offers real-time clustering of search results without the excessive consumption of computing resources typical of more conventional clustering algorithms, while still being able to adapt to the input of new documents 'on the fly'.
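
As an illustration only, the idea of clustering documents one at a time with a Cosine distance over keyword frequencies can be sketched as below. This is not the method developed in the thesis: the Potential calculations mentioned in the abstract are replaced here by a plain distance threshold, and the names `term_frequencies`, `cosine_distance`, `OnlineClusterer` and the default `threshold=0.7` are all hypothetical choices made for this sketch.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Keyword counts from a crude lower-case, whitespace tokenizer (assumption)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


class OnlineClusterer:
    """Single-pass clustering: each document is processed once, as it arrives.

    Simplification: a fixed distance threshold decides whether a document
    joins its nearest cluster or starts a new one.
    """

    def __init__(self, threshold=0.7):
        self.threshold = threshold  # hypothetical value, not from the thesis
        self.centroids = []         # summed term frequencies per cluster
        self.members = []           # document ids per cluster

    def add(self, doc_id, text):
        vec = term_frequencies(text)
        best, best_dist = None, None
        for i, centroid in enumerate(self.centroids):
            dist = cosine_distance(vec, centroid)
            if best_dist is None or dist < best_dist:
                best, best_dist = i, dist
        if best is not None and best_dist <= self.threshold:
            # Cosine distance ignores vector length, so summing counts
            # is enough to keep a usable cluster prototype.
            self.centroids[best].update(vec)
            self.members[best].append(doc_id)
            return best
        self.centroids.append(vec)
        self.members.append([doc_id])
        return len(self.centroids) - 1


clusterer = OnlineClusterer()
for doc_id, snippet in [
    ("d1", "jaguar car engine speed"),
    ("d2", "jaguar cat jungle animal"),
    ("d3", "car engine speed review"),
    ("d4", "jungle animal cat habitat"),
]:
    clusterer.add(doc_id, snippet)
print(clusterer.members)
```

With these four made-up snippets, the two senses of the ambiguous keyword "jaguar" end up in separate groups without any document being revisited, which is the non-iterative, on-the-fly behaviour the abstract describes.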
2

Search engine bias : the structuration of traffic on the World-Wide Web

Van Couvering, Elizabeth January 2010
Search engines are essential components of the World Wide Web; both commercially and in terms of everyday usage, their importance is hard to overstate. This thesis examines the question of why there is bias in search engine results – bias that invites users to click on links to large websites, commercial websites, websites based in certain countries, and websites written in certain languages. In this thesis, the historical development of the search engine industry is traced. Search engines first emerged as prototypical technological startups emanating from Silicon Valley, followed by the acquisition of search engine companies by major US media corporations and their development into portals. The subsequent development of pay-per-click advertising is central to the current industry structure, an oligarchy of virtually integrated companies managing networks of syndicated advertising and traffic distribution. The study also shows a global landscape in which search production is concentrated in and caters for large global advertising markets, leaving the rest of the world with patchy and uneven search results coverage. The analysis of interviews with senior search engine engineers indicates that issues of quality are addressed in terms of customer service and relevance in their discourse, while the analysis of documents, interviews with search marketers, and participant observation within a search engine marketing firm showed that producers and marketers had complex relationships that combine aspects of collaboration, competition, and indifference. The results of the study offer a basis for the synthesis of insights of the political economy of media and communication and the social studies of technology tradition, emphasising the importance of culture in constructing and maintaining both local structures and wider systems. 
In the case of search engines, the evidence indicates that the culture of the technological entrepreneur is very effective in creating a new megabusiness, but less successful in encouraging a debate on issues of the public good or public responsibility as they relate to the search engine industry.
