1 |
WebCrawler : finding what people want /Pinkerton, Brian. January 2000 (has links)
Thesis (Ph. D.)--University of Washington, 2000. / Vita. Includes bibliographical references (leaves 89-93).
|
2 |
Meta-search and distributed search systems /Shen, Yipeng. January 2002 (has links)
Thesis (Ph. D.)--Hong Kong University of Science and Technology, 2002. / Includes bibliographical references (leaves 138-144). Also available in electronic version. Access restricted to campus users.
|
3 |
Contextualized web search: query-dependent ranking and social media search /Bian, Jiang 29 September 2010 (has links)
Due to the information explosion on the Internet, effective search techniques are required to retrieve the desired information from the Web. Based on extensive analysis of users' search intentions and the varied forms of Web content, we find that both the query and the indexed web content are often associated with various context information, which can provide essential evidence of ranking relevance in Web search. This dissertation seeks to develop new search algorithms and techniques that take advantage of rich context information to improve search quality, and consists of two major parts.
In the first part, we study the context of the query in terms of the differing ranking objectives of different queries. To improve ranking relevance, we propose to incorporate such query context information into the ranking model. Two general approaches are introduced in the remainder of this dissertation. The first incorporates query differences into ranking by introducing query-dependent loss functions; optimizing these yields a better ranking model. We then investigate a second approach, which applies a divide-and-conquer framework for ranking specialization.
The second part of this dissertation investigates how to extract the context of specific Web content and use it to build a more effective search system. This study is based on newly emerging social media content. Unlike traditional Web content, social media content is inherently associated with much new context information, including content semantics and quality, user reputation, and user interactions, all of which provide useful information for acquiring knowledge from social media. In this dissertation, we seek to develop algorithms and techniques for effective knowledge acquisition from collaborative social media environments by using this dynamic context information. We first propose a new general framework for searching social media content, which integrates both content features and user interactions. Then, a semi-supervised framework is proposed to explicitly compute content quality and user reputation, which are incorporated into the search framework to improve search quality. Furthermore, this dissertation investigates techniques for extracting the structured semantics of social media content as new context information, which is essential for content retrieval and organization.
|
4 |
Methods for Distributed Information Retrieval /Craswell, Nicholas Eric, Nick.Craswell@anu.edu.au January 2001 (has links)
Published methods for distributed information retrieval generally rely on cooperation from search servers. But most real servers, particularly the tens of thousands available on the Web, are not engineered for such cooperation. This means that the majority of methods proposed, and evaluated in simulated environments of homogeneous cooperating servers, are never applied in practice.
This thesis introduces new methods for server selection and results merging. The methods do not require search servers to cooperate, yet are as effective as the best methods which do. Two large experiments evaluate the new methods against many previously published methods. In contrast to previous experiments they simulate a Web-like environment, where servers employ varied retrieval algorithms and tend not to sub-partition documents from a single source.
The server selection experiment uses pages from 956 real Web servers, three different retrieval systems and TREC ad hoc topics. Results show that a broker using queries to sample servers' documents can perform selection over non-cooperating servers without loss of effectiveness. However, using the same queries to estimate the effectiveness of servers, in order to favour servers with high quality retrieval systems, did not consistently improve selection effectiveness.
The results merging experiment uses documents from five TREC sub-collections, five different retrieval systems and TREC ad hoc topics. Results show that a broker using a reference set of collection statistics, rather than relying on cooperation to collate true statistics, can perform merging without loss of effectiveness. Since application of the reference statistics method requires that the broker download the documents to be merged, experiments were also conducted on effective merging based on partial documents. The new ranking method developed was not highly effective on partial documents, but showed some promise on fully downloaded documents.
Using the new methods, an effective search broker can be built, capable of addressing any given set of available search servers, without their cooperation.
|
5 |
Investigating usability of search engines in small screen devices : a systems engineering approach /Moulik, Anand 22 February 2006 (has links)
In today's world, desktop computers have become such an integral part of
our lives that it is practically impossible to imagine anything being done without the
aid of computers. As the world becomes more and more fast paced and users feel a
need to have computers on the go, desktop computers have reduced in size without
compromising on performance. The late 90s saw the desktop segment make room
for the laptop and the small screen devices (SSD) segment, which demonstrated
faster growth rates than the desktop segment. The SSD segment, however, had a
growth rate that was nowhere near the combined growth rate of desktop and laptop
computers. Portability of SSD was one factor that stood out among many others to
account for the unprecedented growth rate of the SSD segment that the computer
industry had witnessed. One of the most important, albeit under-represented and
neglected, factors of a product is its usability. Usability, or the ease with which a
product can be used, can be considered to be one of the most important factors in
the success or failure of a product. Determining the usability of small screen devices
presents a bigger challenge, primarily because of the screen size of the SSD. The
process of usability engineering aims to solve some/most of the problems that the
SSD has. To make up for the drawbacks of usability engineering, systems
engineering was used in this thesis, since both disciplines have considerable overlap
in their processes. A growing number of SSD users use the Internet in one form or
another. The Internet has grown rapidly in the last decade, and nearly everyone
using the Internet has come across a search engine at some time or other. Research
has so far been limited to desktop search engines, and not enough has been
done on search engines for small screen devices. This
thesis compares two different search engines on small screen devices to determine
which of the two is better. To do so, it takes a close look at the usability engineering
approach from a systems engineering perspective, revealing several deficiencies,
which may have hitherto gone unnoticed. It also shows a method to integrate several
key Systems Engineering components into the usability engineering approach. / Graduation date: 2006
|
6 |
On improving the relevancy ranking algorithm in web search engine /李莉華, Lee, Lei-wah. January 2000 (has links)
Published or final version / Computer Science and Information Systems / Master / Master of Philosophy
|
7 |
On improving the relevancy ranking algorithm in web search engine /Lee, Lei-wah. January 2000 (has links)
Thesis (M. Phil.)--University of Hong Kong, 2000. / Includes bibliographical references (leaves 78-81).
|
8 |
Mining user preference using SPY voting for search engine personalization /Deng, Lin. January 2006 (has links)
Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2006. / Includes bibliographical references (leaves 68-73). Also available in electronic version.
|
9 |
Exploiting the structure of the web for spidering /Young, Joel D. January 2005 (has links)
Thesis (Ph.D.)--Brown University, 2005. / Vita. Thesis advisor: Thomas L. Dean. Includes bibliographical references (leaves 185-191). Also available online.
|
10 |
Incremental document clustering for web page classification. January 2000 (has links)
by Wong, Wai-Chiu. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. / Includes bibliographical references (leaves 89-94). / Abstracts in English and Chinese. /
Abstract / Acknowledgments /
Chapter 1 --- Introduction / Chapter 1.1 --- Document Clustering / Chapter 1.2 --- DC-tree / Chapter 1.3 --- Feature Extraction / Chapter 1.4 --- Outline of the Thesis /
Chapter 2 --- Related Work / Chapter 2.1 --- Clustering Algorithms / Chapter 2.1.1 --- Partitional Clustering Algorithms / Chapter 2.1.2 --- Hierarchical Clustering Algorithms / Chapter 2.2 --- Document Classification by Examples / Chapter 2.2.1 --- k-NN algorithm - Expert Network (ExpNet) / Chapter 2.2.2 --- Learning Linear Text Classifier / Chapter 2.2.3 --- Generalized Instance Set (GIS) algorithm / Chapter 2.3 --- Document Clustering / Chapter 2.3.1 --- B+-tree-based Document Clustering / Chapter 2.3.2 --- Suffix Tree Clustering / Chapter 2.3.3 --- Association Rule Hypergraph Partitioning Algorithm / Chapter 2.3.4 --- Principal Component Divisive Partitioning / Chapter 2.4 --- Projections for Efficient Document Clustering /
Chapter 3 --- Background / Chapter 3.1 --- Document Preprocessing / Chapter 3.1.1 --- Elimination of Stopwords / Chapter 3.1.2 --- Stemming Technique / Chapter 3.2 --- Problem Modeling / Chapter 3.2.1 --- Basic Concepts / Chapter 3.2.2 --- Vector Model / Chapter 3.3 --- Feature Selection Scheme / Chapter 3.4 --- Similarity Model / Chapter 3.5 --- Evaluation Techniques /
Chapter 4 --- Feature Extraction and Weighting / Chapter 4.1 --- Statistical Analysis of the Words in the Web Domain / Chapter 4.2 --- Zipf's Law / Chapter 4.3 --- Traditional Methods / Chapter 4.4 --- The Proposed Method / Chapter 4.5 --- Experimental Results / Chapter 4.5.1 --- Synthetic Data Generation / Chapter 4.5.2 --- Real Data Source / Chapter 4.5.3 --- Coverage / Chapter 4.5.4 --- Clustering Quality / Chapter 4.5.5 --- Binary Weight vs Numerical Weight /
Chapter 5 --- Web Document Clustering Using DC-tree / Chapter 5.1 --- Document Representation / Chapter 5.2 --- Document Cluster (DC) / Chapter 5.3 --- DC-tree / Chapter 5.3.1 --- Tree Definition / Chapter 5.3.2 --- Insertion / Chapter 5.3.3 --- Node Splitting / Chapter 5.3.4 --- Deletion and Node Merging / Chapter 5.4 --- The Overall Strategy / Chapter 5.4.1 --- Preprocessing / Chapter 5.4.2 --- Building DC-tree / Chapter 5.4.3 --- Identifying the Interesting Clusters / Chapter 5.5 --- Experimental Results / Chapter 5.5.1 --- Alternative Similarity Measurement : Synthetic Data / Chapter 5.5.2 --- DC-tree Characteristics : Synthetic Data / Chapter 5.5.3 --- Compare DC-tree and B+-tree: Synthetic Data / Chapter 5.5.4 --- Compare DC-tree and B+-tree: Real Data / Chapter 5.5.5 --- Varying the Number of Features : Synthetic Data / Chapter 5.5.6 --- Non-Correlated Topic Web Page Collection: Real Data / Chapter 5.5.7 --- Correlated Topic Web Page Collection: Real Data / Chapter 5.5.8 --- Incremental updates on Real Data Set / Chapter 5.5.9 --- Comparison with the other clustering algorithms /
Chapter 6 --- Conclusion /
Appendix / Chapter A --- Stopword List / Chapter B --- Porter's Stemming Algorithm / Chapter C --- Insertion Algorithm / Chapter D --- Node Splitting Algorithm / Chapter E --- Features Extracted in Experiment 4.53 /
Bibliography
|