The subject of this thesis is searching in unstructured peer-to-peer systems.
Such systems have been used for a variety of different applications, including
file-sharing, content distribution and video streaming. These applications have been very popular; they contribute to a large percentage of today's Internet traffic and their users typically number in the millions.
By searching, we refer to the process of locating content stored by peers.
Searching in unstructured peer-to-peer systems poses a challenge because of high churn:
both the topology and the content stored by peers can change quickly as peers arrive and depart, while the network formed under this churn process can be arbitrary at any point in time. As a result, a search mechanism must operate without any a priori assumptions on this dynamic topology.
Ideally,
a search mechanism should be scalable: as, typically, peers have limited bandwidth, the traffic generated by queries should not grow significantly as the peer population increases.
Moreover, a search mechanism should also be reliable: if certain content is in the system, searching should locate it with reasonable guarantees. These two goals can be conflicting, as generating more queries increases a mechanism's reliability but decreases its scalability. Hence, a fundamental question regarding searching in unstructured systems is whether a mechanism can exhibit both properties, despite the network's dynamic and arbitrary nature.
In this thesis, we show this is indeed the case, by proposing a novel mechanism that is both scalable and reliable.
This is shown under a mathematical model that captures the evolution of both network and content in an unstructured system, but is also verified through simulations. To the best of our knowledge, this is the first provably scalable and reliable search mechanism for unstructured peer-to-peer systems.
In addition to the above problem, we also consider a hybrid peer-to-peer system, in which the peer-to-peer network co-exists with a central server. The purpose of this hybrid architecture is to reduce the server's traffic by delegating
part of it to its clients ---\emph{i.e.}, the peers:
a peer wishing to retrieve certain content first propagates a query over the peer-to-peer network, and downloads the content from the server only if the query fails. This hybrid architecture can be used to partially decentralize a content distribution server, a search engine, an online encyclopedia, etc.
The trade-off between scalability and reliability translates, in the hybrid case, to a trade-off between the peer and the server traffic loads. We propose a search mechanism under which both loads remain bounded as the peer population grows. This is surprising, and has an important implication: one can construct hybrid peer-to-peer systems that can handle traffic generated by a large (unbounded) peer population, even when both the server and peer bandwidth capacities are limited. Again, this is proved under a model capturing the hybrid system's dynamic nature and verified through simulations. To the best of our knowledge, our work is the first to show that hybrid systems with such properties exist.
Identifer | oai:union.ndltd.org:TORONTO/oai:tspace.library.utoronto.ca:1807/19196 |
Date | 01 March 2010 |
Creators | Ioannidis, Efstratios |
Contributors | Marbach, Peter |
Source Sets | University of Toronto |
Language | en_ca |
Detected Language | English |
Type | Thesis |
Page generated in 0.0016 seconds