Global ETD Search

1	WebCrawler : finding what people want / Pinkerton, Brian. January 2000 (has links) Thesis (Ph. D.)--University of Washington, 2000. / Vita. Includes bibliographical references (leaves 89-93). Web search engines.
2	Meta-search and distributed search systems / Shen, Yipeng. January 2002 (has links) Thesis (Ph. D.)--Hong Kong University of Science and Technology, 2002. / Includes bibliographical references (leaves 138-144). Also available in electronic version. Access restricted to campus users. Web search engines.
3	Contextualized web search: query-dependent ranking and social media search Bian, Jiang 29 September 2010 (has links) Due to the information explosion on the Internet, effective information search techniques are required to retrieve the desired information from the Web. Based on much analysis on users' search intention and the variant forms of Web content, we find that both the query and the indexed web content are often associated with various context information, which can provide much essential information to indicate the ranking relevance in Web search. This dissertation seeks to develop new search algorithms and techniques by taking advantage of rich context information to improve search quality and consists of two major parts. In the first part, we study the context of the query in terms of various ranking objectives of different queries. In order to improve the ranking relevance, we propose to incorporate such query context information into the ranking model. Two general approaches will be introduced in the following of this dissertation. The first one proposes to incorporate query difference into ranking by introducing query-dependent loss functions, by optimizing which we can obtain better ranking model. Then, we investigate another approach which applies a divide-and-conquer framework for ranking specialization. The second part of this dissertation investigates how to extract the context of specific Web content and explore them to build more effective search system. This study is based on the new emerging social media content. Unlike traditional Web content, social media content is inherently associated with much new context information, including content semantics and quality, user reputation, and user interactions, all of which provide useful information for acquiring knowledge from social media. In this dissertation, we seek to develop algorithms and techniques for effective knowledge acquisition from collaborative social media environments by using the dynamic context information. We first propose a new general framework for searching social media content, which integrates both the content features and the user interactions. Then, a semi-supervised framework is proposed to explicitly compute content quality and user reputation, which are incorporated into the search framework to improve the search quality. Furthermore, this dissertation also investigates techniques for extracting the structured semantics of social media content as new context information, which is essential for content retrieval and organization. Social media Ranking model Web search Web search engines
4	Methods for Distributed Information Retrieval Craswell, Nicholas Eric, Nick.Craswell@anu.edu.au January 2001 (has links) Published methods for distributed information retrieval generally rely on cooperation from search servers. But most real servers, particularly the tens of thousands available on the Web, are not engineered for such cooperation. This means that the majority of methods proposed, and evaluated in simulated environments of homogeneous cooperating servers, are never applied in practice. ¶ This thesis introduces new methods for server selection and results merging. The methods do not require search servers to cooperate, yet are as effective as the best methods which do. Two large experiments evaluate the new methods against many previously published methods. In contrast to previous experiments they simulate a Web-like environment, where servers employ varied retrieval algorithms and tend not to sub-partition documents from a single source. ¶ The server selection experiment uses pages from 956 real Web servers, three different retrieval systems and TREC ad hoc topics. Results show that a broker using queries to sample servers documents can perform selection over non-cooperating servers without loss of effectiveness. However, using the same queries to estimate the effectiveness of servers, in order to favour servers with high quality retrieval systems, did not consistently improve selection effectiveness. ¶ The results merging experiment uses documents from five TREC sub-collections, five different retrieval systems and TREC ad hoc topics. Results show that a broker using a reference set of collection statistics, rather than relying on cooperation to collate true statistics, can perform merging without loss of effectiveness. Since application of the reference statistics method requires that the broker download the documents to be merged, experiments were also conducted on effective merging based on partial documents. The new ranking method developed was not highly effective on partial documents, but showed some promise on fully downloaded documents. ¶ Using the new methods, an effective search broker can be built, capable of addressing any given set of available search servers, without their cooperation. web search distributed information retrieval
5	Investigating usability of search engines in small screen devices : a systems engineering approach Moulik, Anand 22 February 2006 (has links) In today's world, desktop computers have become such an integral part of our lives that it is practically impossible to imagine anything being done without the aid of computers. As the world becomes more and more fast paced and users feel a need to have computers on the go, desktop computers have reduced in size without compromising on performance. The late 90s saw the desktop segment make room for the laptop and the small screen devices (SSD) segment, which demonstrated faster growth rates than the desktop segment. The SSD segment, however, had a growth rate that was nowhere near the combined growth rate of desktop and laptop computers. Portability of SSD was one factor that stood out among many others to account for the unprecedented growth rate of the SSD segment that the computer industry had witnessed. One of the most important, albeit under-represented and neglected, factors of a product is its usability. Usability, or the ease with which a product can be used, can be considered to be one of the most important factors in the success or failure of product. Determining the usability of small screen devices presents a bigger challenge, primarily because of the screen size of the SSD. The process of usability engineering aims to solve some/most of the problems that the SSD has. To make up for the drawbacks of usability engineering, systems engineering was used in this thesis, since both disciplines have considerable overlap in their processes. A growing number of SSD users use the Internet in one form or the other. The Internet has grown rapidly in the last decade, and nearly everyone using the Internet has come across a search engine sometime or other. Although research has been limited to the area of desktop search engines, there has not been enough research done in the area of search engines for small screen devices. This thesis compares two different search engines on small screen devices to find the better between the two. To do so, it takes a close look at the usability engineering approach from a system engineering perspective revealing several deficiencies, which may have hitherto gone unnoticed. It also shows a method to integrate several key Systems Engineering components into the usability engineering approach. / Graduation date: 2006 Web search engines Pocket computers
6	On improving the relevancy ranking algorithm in web search engine 李莉華, Lee, Lei-wah. January 2000 (has links) published_or_final_version / Computer Science and Information Systems / Master / Master of Philosophy Web search engines. Internet searching.
7	On improving the relevancy ranking algorithm in web search engine / Lee, Lei-wah. January 2000 (has links) Thesis (M. Phil.)--University of Hong Kong, 2000. / Includes bibliographical references (leaves 78-81). Web search engines. Internet searching.
8	Mining user preference using SPY voting for search engine personalization / Deng, Lin. January 2006 (has links) Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2006. / Includes bibliographical references (leaves 68-73). Also available in electronic version. Web search engines Data mining.
9	Exploiting the structure of the web for spidering / Young, Joel D. January 2005 (has links) Thesis (Ph.D.)--Brown University, 2005. / Vita. Thesis advisor: Thomas L. Dean. Includes bibliographical references (leaves 185-191). Also available online. World Wide Web. Search engines.
10	Incremental document clustering for web page classification. January 2000 (has links) by Wong, Wai-Chiu. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. / Includes bibliographical references (leaves 89-94). / Abstracts in English and Chinese. / Abstract --- p.ii / Acknowledgments --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Document Clustering --- p.2 / Chapter 1.2 --- DC-tree --- p.4 / Chapter 1.3 --- Feature Extraction --- p.5 / Chapter 1.4 --- Outline of the Thesis --- p.5 / Chapter 2 --- Related Work --- p.8 / Chapter 2.1 --- Clustering Algorithms --- p.8 / Chapter 2.1.1 --- Partitional Clustering Algorithms --- p.8 / Chapter 2.1.2 --- Hierarchical Clustering Algorithms --- p.10 / Chapter 2.2 --- Document Classification by Examples --- p.11 / Chapter 2.2.1 --- k-NN algorithm - Expert Network (ExpNet) --- p.11 / Chapter 2.2.2 --- Learning Linear Text Classifier --- p.12 / Chapter 2.2.3 --- Generalized Instance Set (GIS) algorithm --- p.12 / Chapter 2.3 --- Document Clustering --- p.13 / Chapter 2.3.1 --- B+-tree-based Document Clustering --- p.13 / Chapter 2.3.2 --- Suffix Tree Clustering --- p.14 / Chapter 2.3.3 --- Association Rule Hypergraph Partitioning Algorithm --- p.15 / Chapter 2.3.4 --- Principal Component Divisive Partitioning --- p.17 / Chapter 2.4 --- Projections for Efficient Document Clustering --- p.18 / Chapter 3 --- Background --- p.21 / Chapter 3.1 --- Document Preprocessing --- p.21 / Chapter 3.1.1 --- Elimination of Stopwords --- p.22 / Chapter 3.1.2 --- Stemming Technique --- p.22 / Chapter 3.2 --- Problem Modeling --- p.23 / Chapter 3.2.1 --- Basic Concepts --- p.23 / Chapter 3.2.2 --- Vector Model --- p.24 / Chapter 3.3 --- Feature Selection Scheme --- p.25 / Chapter 3.4 --- Similarity Model --- p.27 / Chapter 3.5 --- Evaluation Techniques --- p.29 / Chapter 4 --- Feature Extraction and Weighting --- p.31 / Chapter 4.1 --- Statistical Analysis of the Words in the Web Domain --- p.31 / Chapter 4.2 --- Zipf's Law --- p.33 / Chapter 4.3 --- Traditional Methods --- p.36 / Chapter 4.4 --- The Proposed Method --- p.38 / Chapter 4.5 --- Experimental Results --- p.40 / Chapter 4.5.1 --- Synthetic Data Generation --- p.40 / Chapter 4.5.2 --- Real Data Source --- p.41 / Chapter 4.5.3 --- Coverage --- p.41 / Chapter 4.5.4 --- Clustering Quality --- p.43 / Chapter 4.5.5 --- Binary Weight vs Numerical Weight --- p.45 / Chapter 5 --- Web Document Clustering Using DC-tree --- p.48 / Chapter 5.1 --- Document Representation --- p.48 / Chapter 5.2 --- Document Cluster (DC) --- p.49 / Chapter 5.3 --- DC-tree --- p.52 / Chapter 5.3.1 --- Tree Definition --- p.52 / Chapter 5.3.2 --- Insertion --- p.54 / Chapter 5.3.3 --- Node Splitting --- p.55 / Chapter 5.3.4 --- Deletion and Node Merging --- p.56 / Chapter 5.4 --- The Overall Strategy --- p.57 / Chapter 5.4.1 --- Preprocessing --- p.57 / Chapter 5.4.2 --- Building DC-tree --- p.59 / Chapter 5.4.3 --- Identifying the Interesting Clusters --- p.60 / Chapter 5.5 --- Experimental Results --- p.61 / Chapter 5.5.1 --- Alternative Similarity Measurement : Synthetic Data --- p.61 / Chapter 5.5.2 --- DC-tree Characteristics : Synthetic Data --- p.63 / Chapter 5.5.3 --- Compare DC-tree and B+-tree: Synthetic Data --- p.64 / Chapter 5.5.4 --- Compare DC-tree and B+-tree: Real Data --- p.66 / Chapter 5.5.5 --- Varying the Number of Features : Synthetic Data --- p.67 / Chapter 5.5.6 --- Non-Correlated Topic Web Page Collection: Real Data --- p.69 / Chapter 5.5.7 --- Correlated Topic Web Page Collection: Real Data --- p.71 / Chapter 5.5.8 --- Incremental updates on Real Data Set --- p.72 / Chapter 5.5.9 --- Comparison with the other clustering algorithms --- p.73 / Chapter 6 --- Conclusion --- p.75 / Appendix --- p.77 / Chapter A --- Stopword List --- p.77 / Chapter B --- Porter's Stemming Algorithm --- p.81 / Chapter C --- Insertion Algorithm --- p.83 / Chapter D --- Node Splitting Algorithm --- p.85 / Chapter E --- Features Extracted in Experiment 4.53 --- p.87 / Bibliography --- p.88 Web search engines Cluster analysis Discriminant analysis

Search results