Global ETD Search

Return to search

Arquitetura e implementação de um sistema distribuído e recuperação de informação / Architecture and implementation of a distributed information retrieval system

A busca por documentos relevantes ao usuário é um problema que se torna mais custoso conforme as bases de conhecimento crescem em seu ritmo acelerado. Este problema passou a resolvido por sistemas distribuídos, devido a sua escalabilidade e tolerância a falhas. O desenvolvimento de sistemas voltados a estas enormes bases de conhecimento -- e a maior de todas, a Internet -- é uma indústria que movimenta bilhões de dólares por ano no mundo inteiro e criou gigantes. Neste trabalho, são apresentadas e discutidas estruturas de dados e arquiteturas distribuídas que tratem o problema de indexar e buscar grandes coleções de documentos em sistemas distribuídos, alcançando grande desempenho e escalabilidade. Serão também discutidos alguns dos grandes sistemas de busca da atualidade, como o Google e o Apache Solr, além do planejamento de uma grande aplicação com protótipo em desenvolvimento. Um projeto próprio de sistema de busca distribuído foi implementado, baseado no Lucene, com idéias coletadas noutros trabalhos e outras novas. Em nossos experimentos, o sistema distribuído desenvolvido neste trabalho superou o Apache Solr com um vazão 37,4\\% superior e mostrou números muito superiores a soluções não-distribuídas em hardware de custo muito superior ao nosso cluster. / The search for relevant documents for the final user is a problem that becomes more expensive as the databases grown faster. The solution was brought by distributed systems, because of its scalability and fail tolerance. The development of systems focused on enormous databases -- including the World Wide Web -- is an industry that involves billions of dollars in the world and had created giants. In this work, will be presented and discussed data structures and distributed architectures related to the indexes and searching in great document collections in distributed systems, reaching high performance and scalability. We will also discuss some of the biggest search engines, such as Google e Apache Solr, and the planning of an application with a developing prototype. At last, a new project of a distributed searching system will be presented and implemented, based on Lucene, with ideas from other works and new ideas of our own. On our tests, the system developed in this work had throughput 37.4\\% higher than Apache Solr and revealed higher performance than non-distributed solutions in a hardware more expensive than our cluster.

http://www.teses.usp.br/teses/disponiveis/45/45134/tde-12072010-110036/

arquivo invertido

distributed systems

information retrieval

inverted file

recuperação de informação

sistemas distribuídos

Identifer	oai:union.ndltd.org:usp.br/oai:teses.usp.br:tde-12072010-110036
Date	09 June 2010
Creators	Augusto, Luiz Daniel Creao
Contributors	Lago, Alair Pereira do
Publisher	Biblioteca Digitais de Teses e Dissertações da USP
Source Sets	Universidade de São Paulo
Language	Portuguese
Detected Language	English
Type	Dissertação de Mestrado
Format	application/pdf
Rights	Liberar o conteúdo para acesso público.

Page generated in 0.0021 seconds

Arquitetura e implementação de um sistema distribuído e recuperação de informação / Architecture and implementation of a distributed information retrieval system

Description

Links & Downloads

Tags

Additional Fields