Designing a resilient routing infrastructure for peer-to-peer networks
Peer-to-Peer (P2P) networks have enabled a new generation of large scale distributed applications. Unlike the traditional client-server model, in a P2P network, all peers in the network both contribute to and receive services from the network. Due to their decentralized and self-organizing nature, P2P networks enable tens of thousands (potentially millions) of Internet machines to form virtual communities and share the vast resources aggregated from the participating machines. In this dissertation, we address a fundamental problem in designing P2P networks: How to construct and maintain a resilient infrastructure to provide reliable, scalable, and efficient routing service for millions of Internet nodes without central service and administration? The absence of central administration, the large number of nodes involved, and the high rate of node dynamics pose great challenges to the design of a resilient routing infrastructure for P2P networks. Our work tackles the above challenges and has successfully addressed the following problems: (1) How to design protocols to maintain “consistency” of routing tables to ensure successful routing? (2) How to reason about correctness of these protocols? (3) How to evaluate the system’s ability to sustain high rates of node dynamics and how to improve this ability? In particular, we have designed a suite of protocols that construct and maintain a resilient routing infrastructure. To base the protocol design on a sound foundation, we have introduced a theoretical foundation, called C-set trees, to guide protocol design and correctness reasoning. Based on the theoretical foundation, we have designed a join protocol and developed rigorous correctness proofs for the protocol. We have also designed an efficient failure recovery protocol, which has been demonstrated by extensive simulations to perform perfect recovery even when 50% of network nodes fail. Both the join protocol and the failure recovery protocol have been integrated into a single framework following a module composition approach. Furthermore, we have conducted extensive simulation experiments to study behaviors of the designed system under different rates of node dynamics (churn experiments). We find our system to be effective, efficient, and provide reliable and scalable routing service for an average node lifetime as low as 8.3 minutes (the median lifetime measured for two deployed P2P networks, Napster and Gnutella, was 60 minutes). Based on our system design, we have implemented a prototype system, named Silk, as the routing component of a shared infrastructure for P2P networks and other large-scale distributed applications.