Global ETD Search

1	Methods for Distributed Information Retrieval
Craswell, Nicholas Eric (Nick.Craswell@anu.edu.au), January 2001

Published methods for distributed information retrieval generally rely on cooperation from search servers. But most real servers, particularly the tens of thousands available on the Web, are not engineered for such cooperation. As a result, most of the proposed methods, which have been evaluated in simulated environments of homogeneous cooperating servers, are never applied in practice.

This thesis introduces new methods for server selection and results merging. The methods do not require search servers to cooperate, yet are as effective as the best methods that do. Two large experiments evaluate the new methods against many previously published methods. In contrast to previous experiments, they simulate a Web-like environment, in which servers employ varied retrieval algorithms and tend not to sub-partition documents from a single source.

The server selection experiment uses pages from 956 real Web servers, three different retrieval systems, and TREC ad hoc topics. Results show that a broker using queries to sample servers' documents can perform selection over non-cooperating servers without loss of effectiveness. However, using the same queries to estimate the effectiveness of servers, in order to favour servers with high-quality retrieval systems, did not consistently improve selection effectiveness.

The results merging experiment uses documents from five TREC sub-collections, five different retrieval systems, and TREC ad hoc topics. Results show that a broker using a reference set of collection statistics, rather than relying on cooperation to collate true statistics, can perform merging without loss of effectiveness. Since the reference statistics method requires the broker to download the documents to be merged, experiments were also conducted on merging effectively from partial documents. The new ranking method that was developed was not highly effective on partial documents, but showed some promise on fully downloaded documents.

Using the new methods, an effective search broker can be built that can address any given set of available search servers without their cooperation.

Keywords: web search; distributed information retrieval
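
The reference-statistics merging idea can be sketched as follows. This is a minimal illustration, not the thesis's ranking method: it assumes the broker has already downloaded the full text of every result, and it assumes a plain length-normalized tf-idf score whose smoothed idf comes from a reference table of document frequencies (`ref_df`, `ref_n`, and all function names are hypothetical). Because every document is rescored against the same statistics, scores from servers that use different retrieval algorithms become directly comparable.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def reference_score(query, doc_text, ref_df, ref_n):
    """Length-normalized tf-idf score computed from reference collection statistics.

    ref_df maps a term to its document frequency in the reference collection;
    ref_n is the number of documents in that reference collection.
    """
    tokens = tokenize(doc_text)
    if not tokens:
        return 0.0
    tf = Counter(tokens)
    score = 0.0
    for term in set(tokenize(query)):
        # Smoothed idf: terms missing from the reference set get the highest weight.
        idf = math.log((ref_n + 1) / (ref_df.get(term, 0) + 1))
        score += (tf[term] / len(tokens)) * idf
    return score


def merge_with_reference_stats(query, result_lists, ref_df, ref_n):
    """Merge per-server result lists into a single ranking.

    result_lists maps a server name to a list of (doc_id, full_text) pairs.
    Server-reported scores are ignored; every downloaded document is rescored
    with the same reference statistics.
    """
    scores = {}
    for docs in result_lists.values():
        for doc_id, text in docs:
            if doc_id not in scores:  # the same document may come from several servers
                scores[doc_id] = reference_score(query, text, ref_df, ref_n)
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    ref_df = {"distributed": 50, "retrieval": 400, "broker": 5}  # hypothetical reference statistics
    results = {
        "serverA": [("a1", "retrieval retrieval retrieval models"),
                    ("a2", "broker design for distributed search")],
        "serverB": [("b1", "distributed retrieval with a broker")],
    }
    print(merge_with_reference_stats("distributed retrieval broker", results, ref_df, 1000))
    # ['b1', 'a2', 'a1']
```

The rare term "broker" dominates the ranking because its reference idf is high, regardless of which server returned the document.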

2	Federated Text Retrieval from Independent Collections
Shokouhi, Milad (milads@microsoft.com), January 2008

Federated information retrieval is a technique for searching multiple text collections simultaneously. Queries are submitted to the subset of collections that are most likely to return relevant answers, and the results returned by the selected collections are merged into a single list. Federated search is preferred over centralized search in many environments. For example, commercial search engines such as Google cannot index uncrawlable hidden web collections, whereas federated information retrieval systems can search the contents of hidden web collections without crawling them. In enterprise environments, where each organization maintains an independent search engine, federated search techniques can provide parallel search over multiple collections.

There are three major challenges in federated search. First, for each query, the subset of collections most likely to return relevant documents must be selected; this is the collection selection problem. Second, to select suitable collections, a federated information retrieval system must acquire some knowledge about the contents of each collection; this is the collection representation problem. Third, the results returned from the selected collections must be merged before final presentation to the user; this is the result merging problem. In this thesis, we propose new approaches for each of these problems. Our methods for collection representation, collection selection, and result merging outperform state-of-the-art techniques in most cases. We also propose novel methods for estimating the number of documents in collections and for pruning unnecessary information from collection representation sets.

Although management of document duplication has been cited as one of the major problems in federated search, prior research in this area often assumes that collections are free of overlap. We investigate the effectiveness of federated search on overlapping collections and propose new methods for maximizing the number of distinct relevant documents in the final merged results. In summary, this thesis makes several new contributions to the field of federated information retrieval, including practical solutions to some historically unsolved problems in federated search, such as document duplication management. We test our techniques on multiple testbeds that simulate both hidden web and enterprise search environments.

Keywords: Federated search; distributed information retrieval; federated text retrieval; metasearch
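
The abstract does not describe how its collection-size estimators work. As background only, a classic starting point for estimating the number of documents in an uncooperative collection is capture-recapture: draw two samples of document IDs (for example, through query-based sampling) and use their overlap. The sketch below implements the Lincoln-Petersen estimator and Chapman's bias-corrected variant; it is not presented as the thesis's method, and the function names and sample data are hypothetical. Both estimators assume every document is equally likely to be sampled, which query-based samples generally do not satisfy, so the estimates should be treated as rough.

```python
def lincoln_petersen(sample_a, sample_b):
    """Lincoln-Petersen estimate of collection size: |A| * |B| / |A ∩ B|."""
    a, b = set(sample_a), set(sample_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("samples share no documents; the estimate is undefined")
    return len(a) * len(b) / overlap


def chapman(sample_a, sample_b):
    """Chapman's bias-corrected variant, defined even when the samples do not overlap."""
    a, b = set(sample_a), set(sample_b)
    overlap = len(a & b)
    return (len(a) + 1) * (len(b) + 1) / (overlap + 1) - 1


if __name__ == "__main__":
    # Hypothetical document IDs returned by two rounds of sampling queries.
    first = [f"doc{i}" for i in range(0, 100)]
    second = [f"doc{i}" for i in range(50, 150)]
    print(lincoln_petersen(first, second))   # 200.0
    print(round(chapman(first, second), 2))  # 199.02
```

Duplicate IDs within a sample are collapsed by the `set` conversion, since only distinct documents count as "captured".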

3	A Hopfield-Tank Neural Network Approach to Solving the Mobile Agent Planning Problem
Wang, Jin-Fu, 27 June 2006

Mobile agent planning (MAP) is increasingly viewed as an important technique for information retrieval systems that provide location-aware services at minimum cost in mobile computing environments. Although the Hopfield-Tank neural network has been proposed for solving the traveling salesperson problem, little attention has been paid to time constraints on resource validity when optimizing the cost of a mobile agent. We therefore hypothesized that a Hopfield-Tank neural network can be used to solve the MAP problem. To test this hypothesis, we modify the Hopfield-Tank neural network and design a new energy function that not only copes with the dynamic temporal features of the computing environment, in particular server performance and network latency when scheduling mobile agents, but also satisfies location-based constraints, such as the requirement that the routing sequence start and end at the home site of the traveling mobile agent. In addition, the energy function is reformulated as a Lyapunov function to guarantee convergence to a stable state and the existence of a valid solution. The connection weights between neurons and the activation function of the state variables in the dynamic network are devised to search for valid solutions. Moreover, an objective function is derived to estimate the completion time of valid solutions and predict the optimal routing path. A simulation study was conducted to evaluate the proposed model and algorithm for different time variables and various coefficient values of the energy function. The experimental results quantitatively demonstrate the computational power and speed of the proposed model: it rapidly produces solutions very close to the minimum costs of the location-based, time-constrained distributed MAP problem. The spatio-temporal technique proposed in this work is an innovative approach that provides knowledge applicable to solving optimization problems more effectively.

Keywords: Spatio-temporal optimization problem; Mobile agent planning; Hopfield-Tank neural network; Dynamic environment; Distributed information retrieval
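
For readers unfamiliar with the approach, the sketch below evaluates the classic Hopfield-Tank energy for the traveling salesperson problem on a state matrix `V`, where `V[x][i]` is 1 when site `x` is visited at position `i`. It is not the thesis's energy function: the temporal terms for server performance and network latency are omitted, the weights `A`, `B`, `C`, `D` are illustrative, and the `home` penalty (weight `H`) is a hypothetical stand-in for the constraint that the route start and end at the agent's home site. The network dynamics are also not simulated; the code only scores states. With `D = 1`, a valid tour's energy equals its closed-tour length, so lower energy means a cheaper route.

```python
def route_to_state(route, n):
    """Permutation-matrix state: V[site][position] = 1 if the site is visited at that position."""
    V = [[0] * n for _ in range(n)]
    for position, site in enumerate(route):
        V[site][position] = 1
    return V


def hopfield_tank_energy(V, d, A=500.0, B=500.0, C=200.0, D=1.0, home=None, H=500.0):
    """Classic Hopfield-Tank TSP energy plus an optional (hypothetical) home-site penalty.

    V is an n x n state matrix (sites x positions); d is an n x n distance matrix.
    """
    n = len(V)
    # A term: penalize a site occupying more than one position.
    rows = sum(V[x][i] * V[x][j]
               for x in range(n) for i in range(n) for j in range(n) if j != i)
    # B term: penalize a position holding more than one site.
    cols = sum(V[x][i] * V[y][i]
               for i in range(n) for x in range(n) for y in range(n) if y != x)
    # C term: exactly n neurons should be active.
    count = (sum(map(sum, V)) - n) ** 2
    # D term: distance to neighbouring positions; positions wrap around,
    # so the route is a closed tour and each edge is counted twice.
    dist = sum(d[x][y] * V[x][i] * (V[y][(i + 1) % n] + V[y][(i - 1) % n])
               for x in range(n) for y in range(n) if y != x for i in range(n))
    energy = A / 2 * rows + B / 2 * cols + C / 2 * count + D / 2 * dist
    if home is not None:
        # Hypothetical penalty: the home site should occupy position 0,
        # so the closed tour starts and ends there.
        energy += H / 2 * (1 - V[home][0]) ** 2
    return energy


if __name__ == "__main__":
    d = [[0, 1, 2, 1],
         [1, 0, 1, 2],
         [2, 1, 0, 1],
         [1, 2, 1, 0]]
    print(hopfield_tank_energy(route_to_state([0, 1, 2, 3], 4), d, home=0))  # 4.0
    print(hopfield_tank_energy(route_to_state([0, 2, 1, 3], 4), d, home=0))  # 6.0
    print(hopfield_tank_energy(route_to_state([1, 0, 2, 3], 4), d, home=0))  # 256.0
```

The third route has the same length as the second but is heavily penalized because the home site is not in the first position.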

4	Peer to peer English/Chinese cross-language information retrieval
Lu, Chengye, January 2008

Peer-to-peer systems are widely used on the Internet. However, most peer-to-peer information systems still lack some important features, for example cross-language information retrieval (IR) and collection selection and fusion. Cross-language IR is a state-of-the-art research area in the IR research community, but it has not yet been used in any real-world IR system. Cross-language IR allows a user to issue a query in one language and receive documents in other languages. In a typical peer-to-peer environment, users come from multiple countries, so their collections are in multiple languages. Cross-language IR can help users find documents more easily. For example, many Chinese researchers search for research papers in both Chinese and English; with cross-language IR, they can issue one query in Chinese and retrieve documents in both languages. The out-of-vocabulary (OOV) problem is one of the key research areas in cross-language information retrieval. In recent years, web mining has been shown to be an effective approach to solving this problem. However, extracting multiword lexical units (MLUs) from web content and selecting the correct translations from the extracted candidate MLUs remain two difficult problems in web-mining-based automated translation approaches. Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval. In uncooperative environments, query-based sampling and normalized-score-based merging strategies are well-known approaches to these problems. However, such approaches consider only the content of the remote database, not the retrieval performance of the remote search engine.

This thesis presents research on building a peer-to-peer IR system with cross-language IR and advanced collection profiling techniques for result fusion. In particular, it first presents a new Chinese term measurement and a new Chinese MLU extraction process that works well on small corpora. An approach for selecting MLUs more accurately is also presented. Next, the thesis proposes a collection profiling strategy that can discover not only collection content but also the retrieval performance of the remote search engine. Based on collection profiling, a web-based query classification method and two collection fusion approaches are developed. Our experiments show that the proposed strategies are effective in merging results in uncooperative peer-to-peer environments. Here, an uncooperative environment is one in which each peer is autonomous: peers share documents but not collection statistics. This is the typical peer-to-peer IR environment. Finally, all of these approaches are combined to build a secure peer-to-peer multilingual IR system that cooperates through X.509 certificates and email.
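
Normalized-score merging, cited in the abstract as a well-known approach for uncooperative environments, can be sketched as follows. Each peer's raw scores are min-max normalized to [0, 1] so that scores from different engines share a scale; an optional per-peer weight then scales them, as a hypothetical stand-in for the retrieval-performance profile the thesis describes (the thesis's own fusion approaches are not reproduced here). A document returned by several peers keeps its highest weighted score (CombMAX-style); summing instead (CombSUM) is a common alternative. All names and data are illustrative.

```python
def min_max(scores):
    """Rescale one engine's raw scores to [0, 1]; if all scores are equal, each maps to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_normalized(results, peer_weight=None):
    """Fuse result lists from several peers into a single ranking.

    results maps a peer name to {doc_id: raw_score}.
    peer_weight optionally maps a peer name to a quality weight in [0, 1]
    (hypothetical; unlisted peers default to 1.0).
    """
    fused = {}
    for peer, scores in results.items():
        if not scores:
            continue
        weight = 1.0 if peer_weight is None else peer_weight.get(peer, 1.0)
        for doc, s in min_max(scores).items():
            # Duplicates across peers keep their best weighted score.
            fused[doc] = max(fused.get(doc, 0.0), weight * s)
    # Highest fused score first; ties broken by document ID for a stable order.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


if __name__ == "__main__":
    results = {
        "peerA": {"a1": 12.0, "a2": 8.0, "a3": 2.0},
        "peerB": {"b1": 0.9, "b2": 0.1},
    }
    print(merge_normalized(results))
    # ['a1', 'b1', 'a2', 'a3', 'b2']
    print(merge_normalized(results, {"peerA": 0.5, "peerB": 1.0}))
    # ['b1', 'a1', 'a2', 'a3', 'b2']
```

Down-weighting a peer judged to have a weaker search engine moves its top result below the other peer's top result, which plain normalization alone cannot do.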
