BEP: 45
Title: Multiple-address operation for the BitTorrent DHT
Version: 9c5c1dd1b372016e05af84fb34fccac6752ef54a
Last-Modified: Thu Jul 21 10:45:38 2016 -0400
Author: The 8472 <the8472.bep@infinite-source.de>
Status: Draft
Type: Informational
Content-Type: text/x-rst
Created: 26-Jun-2015
Post-History:

Multiple-address operation for the BitTorrent DHT

This BEP provides implementation advice for clients that wish to operate a DHT node (BEP 5) on a host that has multiple globally routable unicast addresses.

It is based in large part on practical experience operating such a node with more than 256 IPv4 and IPv6 addresses.

Nomenclature

socket address: in this document, a socket bound to a distinct, specific address and port; used interchangeably with an <IP:Port> pair and the DHT state/behavior associated with it.

node: a DHT peer, which may use one or more socket addresses. Remote nodes are assumed to use a single socket address each, so node and socket address are used interchangeably when referring to them.

Motivation

A node operator might choose to run a DHT node on multiple addresses for the purpose of source-specific multi-path routing [1] or for research purposes to sample a larger fraction of DHT traffic.

Node Behavior

In general any DHT node using multiple socket addresses should appear to the outside world as separate, independent DHT nodes. Exceptions to this rule will be outlined in a later section.

This spares other implementations from having to deal with any idiosyncrasies that might otherwise arise from a single node sending from multiple socket addresses.

The easiest approach to achieve this is to simply run multiple independent DHT nodes, each bound to a separate address and, if needed, perform separate announces through each of them.

Mandatory

The most important constraints are as follows:

Each socket address must have a distinct Node ID.

Rationale: To an external observer, a node ID shared across several socket addresses would look like a single node constantly changing its IP (e.g. an unreliable domestic internet connection or a spoofing attack) or its port (pathological NAT implementations). Observers would therefore deem such entries unreliable and evict them from their routing tables.

Node IDs should not only be distinct but also vastly separated by XOR distance, i.e. differ in some of their high-order bits. This is most easily achieved by generating a random node ID for each socket address. Alternatively, an implementation can derive new IDs from a base ID by incrementing the base ID in reverse bit order, i.e. by flipping the high-order bits first. Care should be taken to avoid counter reuse across multiple active sockets.

Rationale: Failure of a multi-homed node whose IDs only differ in low-order bits could take a small slice of the keyspace with it into oblivion, thus violating redundancy of the DHT.
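
For illustration, a minimal Python sketch of the reverse-bit-order derivation described above, assuming 160-bit node IDs (the helper names are hypothetical, not part of any specification):

    import os

    def bit_reverse(x, width):
        """Reverse the bit order of x within a fixed bit width."""
        return int(format(x, "0{}b".format(width))[::-1], 2)

    def derive_node_ids(count, id_len=20):
        """Derive `count` node IDs from one random base ID by
        incrementing it in reverse bit order; successive IDs flip
        high-order bits first and thus end up far apart by XOR
        distance."""
        width = id_len * 8
        base = bit_reverse(int.from_bytes(os.urandom(id_len), "big"), width)
        ids = []
        for counter in range(count):  # one distinct counter per active socket
            rev = (base + counter) % (1 << width)
            ids.append(bit_reverse(rev, width).to_bytes(id_len, "big"))
        return ids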

Responses must be sent from the same socket address on which the corresponding request was received.

Write tokens must only be sent from the same socket address on which they were received.

Implementations should avoid querying the same targets from multiple sockets at once, as this may cause unnecessary load spikes for remote nodes; it becomes more important as the number of socket addresses increases. Temporally spacing lookups to the same target keys, as sketched below, can accomplish this.
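
One possible way to space such lookups, assuming a hypothetical Lookup object carrying a target key and a start() method:

    import time
    from collections import deque

    MIN_GAP = 30.0  # assumed minimum spacing between lookups for one target key

    def run_pending_lookups(pending, last_started):
        """Start queued lookups, deferring any whose target key was
        looked up too recently, typically from another socket
        address."""
        deferred = deque()
        while pending:
            lookup = pending.popleft()
            now = time.monotonic()
            if now - last_started.get(lookup.target, float("-inf")) < MIN_GAP:
                deferred.append(lookup)  # retry on a later scheduler pass
                continue
            last_started[lookup.target] = now
            lookup.start()  # sends queries from this lookup's own socket
        pending.extend(deferred)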

Implementation Guidance

Determining Connectivity

Implementations may want to apply a heuristic based on responses and incoming unsolicited requests to determine whether individual socket addresses are reachable. This information can be used to ignore timeouts which occur on non-connected socket addresses for the purpose of routing table maintenance. In other words, routing table entries should not be penalized for timeouts on a link known to be down when they are still reachable over other links.
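
Such a heuristic can be quite simple, as in the following sketch; the names and the 15-minute window are illustrative assumptions, not requirements:

    import time

    REACHABLE_WINDOW = 15 * 60  # assumed: inbound traffic within 15 min = link up

    class SocketState:
        """Per-socket-address connectivity heuristic."""

        def __init__(self):
            self.last_inbound = float("-inf")

        def note_inbound(self):
            """Call on every response or unsolicited incoming query."""
            self.last_inbound = time.monotonic()

        def reachable(self):
            return time.monotonic() - self.last_inbound < REACHABLE_WINDOW

    def note_timeout(entry, socket_state):
        """Only penalize a routing table entry for a timeout when it
        occurred on a socket address believed to be connected."""
        if socket_state.reachable():
            entry.failed_queries += 1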

Additionally, nodes may preferentially schedule data lookups on known-good socket addresses while performing maintenance pings on all socket addresses - including bad ones - in the hope that they eventually become reachable again.

Shared Routing Table

BEP 5 describes a fixed-size routing table of 160 buckets which are ordered relative to the node's own ID. Since a multi-address node has multiple IDs, it will have to use one such routing table for each of its socket addresses.

Alternatively, a multi-address node may implement a modified version of the variable-size, tree-like routing table ordered by the natural distance of the buckets, as described in the original Kademlia paper [2]. Careful study of the paper, especially of the bucket splitting algorithms, is recommended since they are more complex than the naive approach of BEP 5. The only modification necessary to make it work in a multi-address environment is to consider multiple home buckets - one for each local node ID - on which split operations may be performed, instead of a single one.
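
The resulting split rule might look like this sketch, where bucket.covers() is an assumed helper that tests whether a bucket's keyspace range contains a given ID:

    def may_split(bucket, local_ids):
        """Kademlia-style split rule generalized to multiple home
        buckets: a full bucket may be split if its keyspace range
        covers any of the local node IDs (one per socket address)."""
        return any(bucket.covers(node_id) for node_id in local_ids)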

A node may use the shared routing table to

  • return better node lists to incoming queries

  • spread traffic caused by active routing table maintenance over multiple sockets

  • slightly accelerate lookups by having a finer-grained routing table due to increased bucket splitting

  • verify that remote nodes' IDs are observer-independent, i.e. that they present the same ID to everyone. Remote nodes that present different IDs depending on who contacts them are either buggy or executing an attack that attempts to insert them into the local routing table.

    When a valid response (verified through its transaction ID) is received from a remote node whose socket address is found in the routing table but whose ID does not match the entry there, that entry may be evicted immediately, as sketched below.
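
A sketch of that eviction check, assuming the response has already been matched against an outstanding transaction ID and hypothetical routing table accessors:

    def on_verified_response(response, routing_table):
        """Evict a routing table entry when a verified response shows
        the remote node presenting a different ID than the one stored
        for its socket address."""
        entry = routing_table.find_by_socket_address(response.source)
        if entry is not None and entry.node_id != response.node_id:
            routing_table.evict(entry)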

Maintenance lookups which target a local ID or home bucket - e.g. bootstrap lookups - should be performed on the socket address corresponding to that particular target ID. This is necessary to make the presence of that socket address known to its immediate neighbors in the keyspace.

Other types of maintenance traffic can be distributed freely among socket addresses.
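
Combining both rules, socket selection for maintenance traffic might be sketched as follows (the mapping and socket objects are assumptions):

    import random

    def pick_maintenance_socket(target_id, sockets_by_local_id, all_sockets):
        """Lookups targeting one of the local IDs (e.g. bootstrap
        lookups) must use the socket that owns that ID so keyspace
        neighbors learn of it; other maintenance traffic can be
        spread freely."""
        sock = sockets_by_local_id.get(target_id)
        return sock if sock is not None else random.choice(all_sockets)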

Shared Storage

It makes little difference whether storage for incoming announce_peer, put and similar write requests is shared among multiple socket addresses or not. It is unlikely that remote nodes will query for a specific key on anything but the socket address whose ID is closest to that key.

Multi-address announce_peer

Since remote nodes may restrict the validity of write tokens to the IP address to which they were issued, a node cannot simply announce to other nodes from multiple socket addresses. It has to perform new lookups to obtain fresh tokens for each of them.

But it can reuse the set of responding nodes from a lookup on one socket address as the candidate set for a lookup on another socket address, which may significantly accelerate the latter.

As mentioned earlier, individual lookups should be performed with some delay between each to avoid hammering the same nodes from multiple sockets.
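
Putting these pieces together, a multi-socket announce might look like the following sketch; lookup(), announce_peer() and the 10-second spacing are illustrative assumptions:

    import time

    ANNOUNCE_SPACING = 10.0  # assumed delay between per-socket lookups

    def announce_on_all_sockets(info_hash, sockets):
        """Announce from every socket address. Each socket needs its
        own lookup because write tokens are bound to the requesting
        IP, but the responders found by one lookup seed the next."""
        candidates = []
        for sock in sockets:
            # lookup() is assumed to return the closest responding
            # nodes, each carrying the write token it issued to this
            # socket address.
            closest = sock.lookup(info_hash, seed_nodes=candidates)
            for node in closest:
                sock.announce_peer(info_hash, node, node.token)
            candidates = closest
            time.sleep(ANNOUNCE_SPACING)  # avoid hammering the same nodes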

Of course, the announced addresses should be consistent with the ones on which the underlying BitTorrent client can accept connections. The easiest way to achieve that is to have the BitTorrent client listen for peer connections on the unspecified (wildcard) address.

Other write operations

Other types of write operations, such as a BEP 44 put, may not benefit from being performed on more than one socket address at a given time.

It is therefore sufficient to schedule them on a randomly selected (active) socket address each time they are to be performed.
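
For example, assuming per-socket reachable() and perform_put() helpers:

    import random

    def schedule_put(put_request, sockets):
        """Perform a BEP 44 put (or similar write) from a single,
        randomly chosen socket address that is currently believed to
        be reachable."""
        active = [s for s in sockets if s.reachable()] or sockets
        random.choice(active).perform_put(put_request)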

User Guidance

The traffic a DHT node receives is roughly proportional to its number of socket addresses and also grows with uptime, especially once the node reaches 24 hours of uptime.

At some point various stateful network components (e.g. firewalls, NATs) may become overwhelmed by the UDP traffic flowing to potentially millions of different remote addresses, leading to packet drops.

To prevent these issues, such components should be put into a stateless configuration, at least for traffic flowing from and to the local ports used by the DHT node.

For firewalls this means exempting those ports from connection tracking; for NATs, additional static forwarding rules may be required.

References

[1]"Source-specific routing", Boutier, Chroboczek (http://arxiv.org/pdf/1403.0445.pdf)
[2]"Kademlia: A peer-to-peer information system based on the xor metric", Maymounkov, Mazieres (http://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf)