Peer To Peer Network in Distributed Systems


Peer-to-Peer & Incentives

Presented By:
Rahul Batra (MT13049)
Juhi Pruthi (MT13097)
Kritika Anand (MT13039)
Pooja Aggarwal (MT13072)
Pooja Gupta (MT13011)
Outline
P2P
 Introduction
 Incentives
 Features
 DHT
 Structured overlay networks
 Unstructured overlay networks
– Centralized Directory based P2P systems
– Pure P2P systems
– Hybrid P2P systems
 Napster
 Gnutella
 Kazaa
 BitTorrent
 Chord Protocol
 CAN
Introduction
What is P2P?
 A peer-to-peer network is a large network of nodes, called peers, that agree to cooperate in order to achieve a particular task.

What is a P2P overlay network?
 Any overlay network constructed by Internet peers at the application layer, on top of the IP network.
P2P Vs Client/Server

Peer-to-Peer (P2P) Model:
 Decentralized form of networking architecture.
 Has an "everyone pulls their own weight" sort of relationship.
 All peers in the network can request resources as well as grant them.
 Used in P2P file-sharing programs like Napster.

Client/Server Model:
 Centralized form of networking architecture.
 Has a "make a request and it will be granted" sort of relationship; clients make requests to the server to access resources.
 Clients request resources, which are provided by the server.
 Examples include email and the HTTP protocol.
P2P : Pros & Cons

Advantages:
 Adding more members to the system increases its resources and thus its throughput. Such networks also scale better, as adding members increases efficiency.
 Very robust, as there is no single point of failure.
 Operation and setup are easier and cheaper, as machines are independent.

Disadvantages:
 P2P networks have high bandwidth consumption rates, due to multiple requests and responses taking place at the same time from different peers.
 Lack of security, as anyone can send and receive data from anybody.
Incentives in P2P
 Peer-to-peer systems rely on cooperation among self-interested users.
 Users have natural disincentives to cooperate.
 As a result, each user's attempt to maximize her own utility effectively lowers the overall utility of the system.
 Avoiding this requires incentives for cooperation.
 A game-theoretic approach is adopted to address this problem; in particular, a Prisoner's Dilemma model is used.
Challenges in P2P
 Large populations: a file-sharing system such as Gnutella or KaZaA can exceed 100,000 simultaneous users.
 Asymmetry of interest: asymmetric transactions in P2P systems create the possibility for asymmetry of interest.
 Zero-cost identity: P2P systems allow peers to continuously switch identities (i.e., whitewash).
Model
 Social Dilemma:
– Universal cooperation should result in optimal overall utility.
– Yet each peer can benefit by exploiting the cooperation of others.
 Asymmetric Transactions:
– A peer wants service from another peer,
– but cannot provide the service that the second peer wants in return.
 Untraceable Defections:
– Defections cannot be traced back to the defector.
 Dynamic Population:
– Peers change their behavior,
– enter and leave the system as they please,
– and can change their strategy at any time.
Prisoner’s Dilemma
 One of the earliest “games” developed in game theory.
 Method of studying the issues of conflict vs. cooperation
between individuals.
 Each game consists of two players who can defect or
cooperate.
 Depending on how each acts, the players receive a payoff.
 The players use a strategy to decide how to act.
Prisoner’s Dilemma

 Payoff matrix for the Generalized Prisoner's Dilemma, with T > R > P > S (Temptation > Reward > Punishment > Sucker's payoff).
 In each game, one player is the client and one player is the server.
 The client always cooperates.
 The server chooses whether to cooperate or defect, based on previous experience with the client.
 The client cannot trace defections.
Population Dynamics
 Entities take independent and continuous actions that
change the composition of the population.
• In each round of the game, each player plays one game as
client and one game as server
• At the end of each round, players can:
– Mutate
– Learn
– Turnover
– Stay the same
Population Dynamics
 Let s be the running average per-round performance of a player's current strategy, and age the number of rounds she has been using that strategy.
 A strategy's rating is computed from s and age.
 At the end of a round, a player switches to the highest-rated strategy with a probability proportional to the difference in score between her current strategy and the highest-rated strategy.
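The probabilistic switching rule above can be sketched as follows; the `scale` factor that turns a score difference into a probability is an assumption for illustration, not something the slides specify.

```python
import random

def maybe_switch(current_strategy, current_score, best_strategy, best_score, scale=1.0):
    """Switch to the highest-rated strategy with probability proportional
    to the score difference. `scale` normalizes the difference into a
    probability and is an illustrative assumption."""
    diff = max(0.0, best_score - current_score)
    p = min(1.0, diff / scale)
    return best_strategy if random.random() < p else current_strategy
```

A player already using the best strategy never switches (the difference, and hence the probability, is zero), while a player far behind switches almost surely.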
Strategy
• A strategy consists of:
– Decision function
– Private or shared history
– Server selection mechanism
– Stranger policy
• Examples of Strategies:
– 100% Cooperate
– 100% Defect
• We want to strike a balance between the two:
– Cooperate with good peers
– Defect on bad peers
Decision Function
• Maps from a history of a player's actions to a decision whether to cooperate with or defect on that player.
• For peer i:
– p_i is a measure of the services it has provided
– c_i is a measure of the services it has consumed
• i's generosity: g(i) = p_i / c_i
– Discriminates against peers that have consumed more than they have provided.
• Normalized generosity: g_j(i) = g(i) / g(j)
– A server j rates peers relative to its own generosity.
• Reciprocative decision function:
– Cooperate with i with probability min(g_j(i), 1)
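The reciprocative decision function above can be sketched in a few lines; treating a zero consumption count as 1 to avoid division by zero is an assumption for illustration, not part of the slides.

```python
import random

def reciprocative(p_i, c_i, p_j, c_j):
    """Server j's reciprocative decision: cooperate with client i with
    probability min(g_j(i), 1), where g(x) = provided/consumed.
    Zero consumption counts are treated as 1 (illustrative assumption)."""
    g_i = p_i / (c_i or 1)
    g_j = p_j / (c_j or 1)
    if g_j == 0:          # a server that has given nothing trusts everyone
        return True
    return random.random() < min(g_i / g_j, 1.0)
```

A client that has provided more than it consumed (relative to the server) is always served; a pure consumer is always refused.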
Discriminating Server Selection
 Cooperation requires familiarity between entities.
 Large populations and high turnover make such familiarity less likely.
 The reciprocative decision function can scale to large populations by using private history and discriminating server selection.
 Every player holds lists of both past donors and past recipients, and selects a server from one of these lists at random, with equal probability.
Shared History
 Shared history allows the contribution of a player to be noticed by the entire population.
 It results in a higher level of cooperation than private history.
 In Figure 1, if everyone keeps only private history, no one will provide service, because B does not know that A has served C, and so on.
Shared History
 The cost of shared history is a distributed infrastructure (e.g., distributed hash table-based storage) to store the history.
 It addresses the challenges of:
1. Large population
2. Asymmetry of interest
 But it is vulnerable to collusion.
Shared History Attack: Collusion
• A group of peers lies about transactions:
– Positive collusion: a bad peer claims another bad peer cooperated.
– Negative collusion: a bad peer claims that a good peer defected.
• This breaks objective ("trust everyone") reputation.
• A subjective reputation mechanism is needed.
Shared History Attack: Collusion
Maxflow-based Subjective Reputation:
 In the example in Figure 1, C can falsely claim that A served him, thus deceiving B into providing service.
 A maxflow-based algorithm promotes cooperation despite collusion among up to 1/3 of the population.
 Basic idea: B only believes C's claims to the extent that C has already provided service to B, directly or through chains of real service.
 The cost is its O(V^3) running time, where V is the number of nodes in the system.
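A minimal sketch of the maxflow computation behind this idea, using textbook Ford-Fulkerson with BFS. The graph representation (a dict mapping `(donor, recipient)` edges to amounts of service) and the edge direction are illustrative assumptions; the point is that colluders who only "serve" each other generate no flow toward outsiders, so their claims carry no weight.

```python
from collections import defaultdict, deque

def max_flow(credit, src, dst):
    """Max flow from src to dst in a service-credit graph.
    credit maps (donor, recipient) -> amount of service provided."""
    cap = defaultdict(int)
    for (u, v), c in credit.items():
        cap[(u, v)] += c
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = {src: None}
        q = deque([src])
        while q and dst not in parent:
            u = q.popleft()
            for (a, b), c in list(cap.items()):
                if a == u and c > 0 and b not in parent:
                    parent[b] = u
                    q.append(b)
        if dst not in parent:
            return flow
        # collect the path, find its bottleneck, and push flow
        path, v = [], dst
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[(u, v)] for u, v in path)
        for u, v in path:
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
        flow += bottleneck
```

Here B would cap the reputation it grants C at `max_flow(credit, C, B)`, so C's inflated claims are bounded by service that really reached B.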
Adaptive Stranger Policy
Zero-cost identities:
– allow non-cooperating peers to escape the consequences of not cooperating,
– and thereby destroy cooperation in the system.
 In practice, peers serve all strangers with constant probability.
 Whitewashing can be nearly eliminated from the system using an adaptive stranger policy.
Adaptive Stranger Policy
 Let ps and cs be the number of services that strangers have provided and consumed, respectively.
 A player using "Stranger Adaptive" helps a stranger with probability r = ps/cs.
 After each encounter, update:
r = (ps+1)/cs, if the stranger provided service
r = ps/(cs+1), if the stranger consumed service
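The update rules above can be sketched as a small class; starting both counters at 1 so that r is defined before any encounters is an illustrative assumption, not something the slides specify.

```python
import random

class StrangerAdaptive:
    """Stranger Adaptive policy sketch: help a stranger with probability
    r = ps/cs, updating ps or cs after each encounter. Initial counter
    values of 1 are an illustrative assumption."""

    def __init__(self):
        self.ps = 1  # services strangers have provided
        self.cs = 1  # services strangers have consumed

    def should_help(self):
        return random.random() < min(self.ps / self.cs, 1.0)

    def record(self, stranger_provided):
        if stranger_provided:
            self.ps += 1  # r -> (ps + 1) / cs
        else:
            self.cs += 1  # r -> ps / (cs + 1)
```

As whitewashers consume without providing, cs grows and r falls, so strangers are served less and less often.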
Traitors
 Traitors are players who acquire high reputation scores by cooperating for a while, and then turn into defectors before leaving the system:
– users who turn deliberately to gain a higher score, or
– cooperators whose identities have been stolen and exploited by defectors.
 Long-term history exacerbates this problem by allowing peers with many previous transactions to exploit that history over many new transactions.
 Short-term history prevents traitors from disrupting cooperation.
P2P Features
 Efficient use of resources
 Scalability
– Consumers of resources also donate resources
 Reliability
– No single point of failure
– Redundant data sources
 Ease of deployment and administration
– Nodes are self-organized
– No need to deploy servers to satisfy demand
– Built-in fault tolerance, replication, and load balancing
What is DHT?

 A DHT provides an information lookup service for P2P applications, similar to a hash table.
 (key, value) pairs are stored in the DHT.
 Nodes form an overlay network.
 Each node maintains a list of neighbors in its routing table.
 Any participating node can efficiently retrieve the value associated with a given key.
Classification
 Structured overlay networks
– Based on Distributed Hash Tables (DHTs)
– Assign keys to data items and organize the peers into a graph that maps each data key to a peer.
 Unstructured overlay networks
– Organize peers in a random graph, in a flat or hierarchical manner.
Structured Overlay Networks
 Based on node IDs generated using Distributed Hash Tables (DHTs).
 Enable efficient discovery of data items using the given keys.
 Objects are stored in the network as key-value pairs.
 Examples: Content Addressable Network (CAN), Chord, Pastry.
Unstructured Overlay Networks
 Composed of peers joining the network under loose rules, without any prior knowledge of the topology.
 The network uses flooding or random walks to send queries across the overlay.
 When a peer receives a flood query, it sends a list of all content matching the query to the originating peer.
 Examples: Gnutella, KaZaA, BitTorrent
Unstructured P2P File Sharing Systems
1. Centralized Directory based P2P systems
– All peers are connected to a central entity.
– Peers establish connections between each other on demand to exchange data.
– The central entity is necessary to provide the service.
– The central entity is some kind of index/group database and lookup/routing table.
– Examples: Napster, BitTorrent (tracker)
Unstructured P2P File Sharing Systems

2. Pure P2P systems
– Any terminal entity can be removed without loss of functionality.
– No central entities are employed in the overlay.
– Peers establish connections between each other randomly.
– Examples: Gnutella
Unstructured P2P File Sharing Systems
3. Hybrid P2P systems
– Main characteristic, compared to pure P2P: introduction of another dynamic hierarchical layer.
– An election process selects and assigns Group Leaders.
– Group Leaders have high degree (degree >> 20).
– Leaf nodes connect to one or more Group Leaders (degree < 7).
– Examples: KaZaA
Framework
 Common Primitives:
 Join: how to begin participating
 Publish: how to advertise a file
 Search: how to find a file
 Fetch: how to retrieve a file

             | Central    | Flood    | Supernode Flood
Whole File   | Napster    | Gnutella |
Chunk Based  | BitTorrent |          | KaZaA
Napster: Overview
• Centralized Database:
– Join: on startup, the client contacts the central server.
– Publish: the client reports its list of files to the central server.
– Search: the client queries the central server and gets the address of a peer that stores the requested file.
– Fetch: the client gets the file directly from that peer.

[Figure: peers such as Alice and Bob connect to a centralized directory server; queries go to the server, file transfers happen peer-to-peer.]
Napster: Working

[Figure: publish — a peer at 123.2.21.23 tells the server "I have these files!" via insert(X, 123.2.21.23).]
Napster: Working

[Figure: query — a peer asks "Where is file A?"; the server replies with search(A) → 123.2.0.18, and the peer fetches the file directly from 123.2.0.18.]
Napster: Pros & Cons
• Pros:
– Simple
– Search scope is O(1)

• Cons:
– Server maintains a lot of state
– Performance bottlenecks
– Single point of failure
Gnutella: Overview
• Query Flooding:
– Join: on startup, the client contacts a few other nodes; these become its "neighbors" (Ping-Pong protocol).
– Publish: no need.
– Search: send queries to neighbors, who ask their neighbors, and so on...
– Fetch: get the file directly from the peer.
 Features:
– No central server.
– Constrained broadcast: every peer forwards packets it receives to all of its peers.
– Lifetime of packets is limited by a time-to-live (TTL).
– Packets have unique IDs to detect loops.
Gnutella: Working

[Figure: a query "Where is file A?" floods through neighbors; peers holding file A send "I have file A." replies back along the query path.]
Protocol Message Types
Type     | Description                                    | Information
Ping     | Announces availability                         | None
Pong     | Response to a Ping                             | IP address and port no. of responding peer; number and total KB of files shared
Query    | Search request                                 | Search criteria
QueryHit | Returned by peers that have the requested file | IP address, port no. and network bandwidth of responding peer; number of results and result sets
Communication Model
Gnutella: Pros & Cons
• Pros:
– Fully decentralized
– Search cost distributed
– Simple, robust, and scalable

• Cons:
– Search scope is O(N)
– Search latency is high
KaZaA: Overview
• “Smart” Query Flooding:
– Join: on startup, the client contacts a group leader.
– Publish: the client sends its list of files to its group leader.
– Search: the client sends a query to its group leader; group leaders flood queries amongst themselves.
– Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers.
 Features:
– Each peer is either a group leader or assigned to a group leader:
– TCP connection between a peer and its group leader.
– TCP connections between some pairs of group leaders.
– A group leader tracks the content of all its children.
KaZaA: Overview

[Figure: overlay of ordinary peers and group-leader peers, with neighboring relationships in the overlay network.]
KaZaA: Working
• Each file has a hash and a descriptor.
• A client sends a keyword query to its group leader.
• The group leader responds with matches:
– each match contains metadata, the file hash, and an IP address.
• If the group leader forwards the query to other group leaders, they respond with matches too.
• The client then selects files for downloading:
– HTTP requests, using the hash as identifier, are sent to peers holding the desired file.
• Group leader selection is time-based:
– a node that has been working reliably for a long time is a good candidate.
KaZaA: Working

[Figure: publish — peer 123.2.21.23 tells its group leader "I have X!" via insert(X, 123.2.21.23).]
KaZaA: Working

[Figure: query — "Where is file A?" goes to the group leader; search(A) is forwarded between group leaders, and replies point to peers 123.2.22.50 and 123.2.0.18.]
KaZaA: Pros & Cons
• Pros:
– Tries to take node heterogeneity into account:
• Bandwidth
• Host computational resources
• Host availability
• Cons:
- No real guarantees on search scope or search time
- Limitations on simultaneous uploads
- Request queuing
- Incentive priorities
- Parallel downloading
What is BitTorrent?

 A distributed file-sharing system
– Enables faster sharing of large files.
– Insanely popular, generating 35-70% of Internet traffic.
 Uses P2P file swarming
– A technique also used by many download managers (like IDM).
• Divides a large file into many pieces:
– Replicates different pieces on different peers.
– Allows simultaneous exchange of pieces.
BitTorrent

 An alternative to downloading from a single machine.
 What does a peer do on receiving a piece?
– Not only downloads,
– but also uploads.
 Works well even in networks with low bandwidth.
.torrent File
 Contains all metadata related to a torrent.

[Figure: a .torrent file packages the file NAME, LENGTH, piece-hashing info, and the URL of the tracker.]
.torrent file

 Pieces are typically downloaded non-sequentially.
– Fixed-size chunks, ~256 KB each.
– The .torrent contains the size and SHA-1 hash of each piece.
 A file is shared among two types of peers:
– Seeders
– Leechers
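The per-piece SHA-1 hashing described above can be sketched with the standard library; note that real .torrent files concatenate the raw 20-byte digests rather than keeping a list of hex strings, so this is a simplified illustration.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # ~256 KB fixed-size chunks

def piece_hashes(data, piece_size=PIECE_SIZE):
    """Split a file's bytes into fixed-size pieces and compute the SHA-1
    hash of each piece (hex form, for readability in this sketch)."""
    return [hashlib.sha1(data[i:i + piece_size]).hexdigest()
            for i in range(0, len(data), piece_size)]
```

A downloader recomputes each piece's hash on arrival and compares it against the .torrent's list, so corrupt or forged pieces are detected per piece rather than per file.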
Torrent Trackers

 A BitTorrent tracker is a server responsible for helping peers find each other using the BitTorrent protocol.
 Trackers are of two types:
– Public/Open: usable by anyone by adding the tracker's address to an existing torrent (e.g., The Pirate Bay).
– Private: provides restricted access by requiring users to register.
Operation of Trackers

 Peers communicate with the tracker to:
– initiate downloads,
– inform it periodically of their presence,
– provide network performance statistics.
• The tracker selects peers for downloading:
– it returns a random set of peers, including their IP addresses,
– so the new peer knows whom to contact for data.
BitTorrent: Overview

[Figure: a swarm of leechers and a seeder, coordinated by a tracker.]
Sharing Pieces

[Figure: an initial seeder holds pieces 1-8; leechers exchange the pieces among themselves and eventually become seeders.]
The Beauty of BitTorrent
 More leechers = more replicas of pieces.
 More replicas = faster downloads.
– Multiple, redundant sources for each piece.
 Even while downloading, leechers take load off the seed(s).
– Great for content distribution.
– Cost is shared among the swarm.
BitTorrent: Piece Selection
 The file is transferred as chunks, so selecting pieces to download in a good order is very important for performance.
 A poor piece-selection algorithm can leave peers with:
– little to share with each other,
– limiting the scalability of the system.
• Problem: eventually nobody has the rare chunks
– e.g., the chunks at the end of the file,
– limiting the ability to complete a download.
Piece Selection Policies
 Strict Priority
– First Priority
• Rarest First
– General rules
• Random First Piece
– Special case, at the beginning
• Endgame Mode
– Special case
Download Phases
 Bootstrap (0% downloaded): random selection
– Initially, you have no pieces to trade.
– Essentially, beg for free pieces at random.
 Steady state: rarest piece first
– Ensures that common pieces are saved for last.
 Endgame (approaching 100%)
– Simultaneously request final pieces from multiple peers.
– Cancel connections to slow peers.
– Ensures that the final pieces arrive quickly.
BitTorrent: Rarest Chunk First
• Which chunks to request first?
– The chunk with the fewest available copies
– I.e., the rarest chunk first
• Benefits to the peer
– Avoid starvation when some peers depart
• Benefits to the system
– Avoid starvation across all peers wanting a file
– Balance load by equalizing # of copies of chunks
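Rarest-first selection over the peers' advertised bitfields can be sketched as follows; representing each peer's bitfield as a set of piece indices is an illustrative simplification.

```python
def rarest_first(needed, peer_bitfields):
    """Among the pieces we still need, pick the one held by the fewest
    peers (ties broken by lowest index); return None if no connected
    peer has any needed piece."""
    counts = {p: sum(p in bf for bf in peer_bitfields) for p in needed}
    available = [p for p, c in counts.items() if c > 0]
    if not available:
        return None
    return min(available, key=lambda p: (counts[p], p))
```

Always requesting the least-replicated piece equalizes the number of copies of each chunk across the swarm, which is exactly the load-balancing benefit described above.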
Upload and Download Control
 How does each peer decide who to trade with?

 Incentive mechanism
– Based on tit-for-tat, game theory
– “If you give a piece to me, I’ll give a piece to you”
– “If you screw me over, you get nothing”
– Two mechanisms: choking and optimistic unchoke
Choking Algorithm
• Choking is a temporary refusal to upload.
• Each peer unchokes a fixed number of peers (default: 4).
• Reasons for choking:
– Avoid free riders
– Avoid network congestion
– Contribute to "useful" peers
Optimistic Unchoke
• A BitTorrent peer has a single "optimistic unchoke", which it uploads to regardless of the current download rate from it. This peer rotates every 30 s.
• Reasons:
– To discover whether currently unused connections are better than the ones being used.
– To provide minimal service to new peers.
Bit-Torrent: Preventing Free-Riding

• A peer has limited upload bandwidth
– and must share it among multiple peers.
 Prioritizing upload bandwidth:
– Favor neighbors that are uploading at the highest rate.
 Rewarding the top four neighbors:
– Measure download bit rates from each neighbor.
– Reciprocate by sending to the top four peers.
– Recompute and reallocate every 10 seconds.
• Optimistic unchoking:
– Randomly try a new neighbor every 30 seconds,
– so a new neighbor has a chance to become a better partner.
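The top-four-plus-optimistic-unchoke policy described above can be sketched in a few lines; the input is a map from peer to its measured download rate, and in a real client this would be recomputed on the 10 s / 30 s timers rather than per call.

```python
import random

def choose_unchoked(download_rates, regular_slots=4):
    """Unchoke the top `regular_slots` peers by measured download rate
    (the tit-for-tat reward), plus one randomly chosen 'optimistic
    unchoke' from the remaining peers."""
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(ranked[:regular_slots])
    rest = ranked[regular_slots:]
    if rest:
        unchoked.add(random.choice(rest))  # optimistic unchoke
    return unchoked
```

Free riders upload nothing, so they never rank in the top four and only ever receive data through the rotating optimistic slot.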
DHTs: Main Idea

[Figure: nodes N1...N9 form an overlay; a publisher stores key = H(audio data) with value = {artist, album title, track title}, and a client later issues Lookup(H(audio data)).]
Structure Of DHT
• key = hash(filename), e.g., using SHA-1
• put(key, data)
• get(key) → data
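The put/get interface above can be sketched with a toy consistent-hashing DHT, where each key lives at its successor node on a hash ring. The node names and the 16-bit ID space are illustrative assumptions; a real DHT routes between nodes instead of consulting a global node list.

```python
import hashlib
from bisect import bisect_left

class ToyDHT:
    """Minimal DHT sketch: each key is stored at its successor, the first
    node whose ID is >= hash(key) on the ring (wrapping around)."""

    def __init__(self, node_names, m=16):
        self.m = m
        self.nodes = sorted((self._h(name), name) for name in node_names)
        self.store = {name: {} for name in node_names}

    def _h(self, s):
        return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (2 ** self.m)

    def _owner(self, key):
        ids = [node_id for node_id, _ in self.nodes]
        idx = bisect_left(ids, self._h(key)) % len(self.nodes)
        return self.nodes[idx][1]

    def put(self, key, value):
        self.store[self._owner(key)][key] = value

    def get(self, key):
        return self.store[self._owner(key)].get(key)
```

Because every node applies the same hash-and-successor rule, any participant can locate the node responsible for a key without a central directory.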
Properties of DHT

 Autonomy and decentralization
 Fault tolerance
 Scalability
– Each node connects to O(log n) other nodes.
 Different strategies:
– Chord: constructing a distributed hash table
– CAN: routing in a d-dimensional space
– Many more...
Problems Of P2P
• Main task: given a key, map the key onto a node.
• This addresses the following peer-to-peer problems:
– Decentralization
– Availability
• Solution: a scalable peer-to-peer lookup protocol.
Chord Protocol
• Each node and key is assigned an m-bit identifier, using SHA-1 as the base hash function.
• Identifiers are ordered on an identifier circle modulo 2^m.
• Identifiers are represented as a circle of numbers from 0 to 2^m - 1.
• The identifier circle is called the Chord ring.
• successor(k)
 the first node clockwise from k
Identifier Circle

[Figure: an identifier circle consisting of 10 nodes storing 5 keys.]
Chord Basic Lookup

[Figure: node 8 performs a lookup for key 54. Node 8 invokes find_successor for key 54, which eventually returns the key's successor, node 56. The query visits every node on the circle between nodes 8 and 56.]
Chord Finger Table
• The lookup scheme above uses a number of messages linear in the number of nodes.
• Finger table:
– Each node n maintains a routing table with up to m entries.
– The i-th entry in the table at node n contains successor((n + 2^(i-1)) mod 2^m).
– The first finger of n is the immediate successor of n on the circle.
• With finger tables, the number of nodes that must be contacted to find a successor in an N-node network is O(log N).
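The finger-table rule above can be sketched directly; this version uses a global list of node IDs for clarity, whereas real Chord nodes build their fingers via lookups.

```python
def finger_table(n, node_ids, m):
    """Build node n's finger table: entry i (1-based) is
    successor((n + 2**(i-1)) mod 2**m), the first node clockwise
    from that point on the identifier circle."""
    ring = sorted(node_ids)

    def successor(k):
        for node_id in ring:
            if node_id >= k:
                return node_id
        return ring[0]  # wrap around the ring

    return [successor((n + 2 ** (i - 1)) % 2 ** m)
            for i in range(1, m + 1)]
```

For the 10-node ring used in the figures, node 8's finger targets are 9, 10, 12, 16, 24, and 40, whose successors are 14, 14, 14, 21, 32, and 42 — each finger roughly halves the remaining distance to any key, giving the O(log N) lookup.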
Variables for Node

[Figure: the variables maintained by each Chord node.]

Lookup Using Finger Table

[Figure: (a) finger table entries for node 8; (b) the path of a query for key 54 starting at node 8.]
Node Join
• Stabilization protocol:
– Ensures that each node's successor pointer is up to date.
– Each node runs this protocol periodically in the background.
– Updates Chord's finger tables and successor pointers.
• Operations: Join(), Create(), Stabilize(), Fix_Fingers(), Check_Predecessor()
Node Join

[Figure: node 26 joins the system between nodes 21 and 32; the arcs represent the successor relationship.
(a) Initial state: node 21 points to node 32.
(b) Node 26 finds its successor (i.e., node 32) and points to it.
(c) Node 26 copies all keys less than 26 from node 32.
(d) The stabilize procedure updates the successor of node 21 to node 26.]
CAN
• CAN is a distributed system that maps keys onto values.
• Keys are hashed into a d-dimensional space.
• Interface:
– insert(key, value)
– retrieve(key)
• The entire space is partitioned amongst all the nodes.
• Each CAN node stores a chunk (called a zone) of the entire hash table.
• A node also stores information about its adjacent zones.
Overview

[Figure: state of the system at time t — a 2-dimensional coordinate space partitioned into zones, one per peer, with resources placed at points; a key is mapped to a point (x, y).]
Design Of CAN

• Routing

• Construction

• Maintenance
Routing in CAN
• A node maintains a CAN routing table that holds:
– the IP address of each neighbour,
– the coordinate zone of each neighbour.
• In a d-dimensional coordinate space, two nodes are neighbours if their coordinate spans overlap along d-1 dimensions and abut along exactly one dimension.
• Storing a (key, value) pair: hash the key to a point P and route the pair to the node whose zone contains P.
• Retrieving the data corresponding to key k: hash k to the same point P and route the request to the node owning P, which returns the value.
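The neighbour condition above can be checked directly; representing a zone as a list of (lo, hi) spans, one per dimension, is an illustrative assumption about the data layout.

```python
def are_neighbors(zone_a, zone_b):
    """CAN neighbour test: zones are lists of (lo, hi) spans, one per
    dimension. Neighbours overlap in d-1 dimensions and abut (touch at
    an endpoint) along exactly one."""
    overlaps = abuts = 0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(zone_a, zone_b):
        if a_hi == b_lo or b_hi == a_lo:
            abuts += 1
        elif a_lo < b_hi and b_lo < a_hi:
            overlaps += 1
        else:
            return False  # disjoint in this dimension
    return abuts == 1 and overlaps == len(zone_a) - 1
```

Note that zones touching only at a corner abut in two dimensions and so are correctly rejected; greedy routing then forwards each message to the neighbour whose zone is closest to the target point.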
CAN Construction

[Figure: a 2-D space partitioned into 7 CAN zones.]
CAN Construction: Node Operations (e.g., Insertion)

1. First, find a bootstrap node.
2. Randomly choose a point in the CAN plane and route the new node from the bootstrap node to the chosen location.
3. The new node arrives at the destination zone covering that point. The destination zone is split into two zones, each controlled by one node (old and new).
4. Update the neighborhood zone routing information.
Maintenance
 Zone takeover is used when a node fails or leaves.
 At discrete time intervals t, send your neighbor table to your neighbors to show that you are alive.
 If a neighbor does not report being alive within time t, take over its zone.
 Zone reassignment may then be needed.