Network Working Group                                         T. Arcieri
Internet-Draft                                                 A. Mysore
Expires: May 4, 2008                                           G. Pahlke
                                                       ClickCaster, Inc.
                                                              J. Sanders
                                                            T. Stapleton
                                                  University of Colorado
                                                           November 2007


Peer Distributed Transfer Protocol Specification
draft-distribustream-pdtp-rfcs-01

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on May 4, 2008.

Copyright Notice

Copyright © The IETF Trust (2007).

Abstract

Peer Distributed Transfer Protocol is a high performance peer-to-peer file distribution system that provides streaming downloads of files originating from a central server. The files are then shared over a peer network, allowing the aggregate bandwidth of the network to scale with the number of clients.



Table of Contents

1.  Introduction
2.  Message Format
3.  Server to Client Messages
    3.1.  Overview
    3.2.  TELLINFO
    3.3.  TRANSFER
    3.4.  TELLVERIFY
    3.5.  HASHVERIFY
    3.6.  PROTOCOLERROR
4.  Client to Server Messages
    4.1.  Overview
    4.2.  REGISTER
    4.3.  REQUEST and UNREQUEST
    4.4.  PROVIDE and UNPROVIDE
    4.5.  ASKINFO
    4.6.  ASKVERIFY
    4.7.  COMPLETED
5.  Client to Client Communication
    5.1.  Overview
6.  References
Appendix A.  Glossary
§  Authors' Addresses
§  Intellectual Property and Copyright Statements





1.  Introduction

This document describes version 2 of PDTP, an enhancement of PDTP version 1 [1]. PDTP currently operates only on IPv4 networks, though support for IPv6 networks is planned.

PDTP is based on an extensible set of asynchronous messages. Each message is composed of a type and a sequence of key-value pairs. The possible data types are described in Appendix A. Due to the structure of PDTP, protocol messages can be grouped into two distinct categories: server to client messages (Section 3) and client to server messages (Section 4).




2.  Message Format

All client to server (Section 4) and server to client (Section 3) communication uses a lightweight form of asynchronous messaging over TCP, with JavaScript Object Notation (JSON) [3] providing the underlying data serialization format. Messages are length-prefix framed with a JSON message body. The IANA-approved TCP port for all PDTP client/server communication is 6086.

Each frame consists of a 16-bit unsigned integer in network byte order representing the length of the message body, followed by the message body itself. Because the length prefix is 16-bit, the maximum allowed length of the message body is 65,535 bytes, with a maximum frame size of 65,537 bytes including the header. Messages are sent in succession over a persistent TCP connection.
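
The framing rule above can be illustrated with a short Python sketch. The function names (encode_frame, decode_frames) are illustrative only and are not defined by this specification; the sketch shows one possible way to write and read the 16-bit length prefix, not a required implementation.

```python
import struct

MAX_BODY = 0xFFFF  # largest length a 16-bit unsigned prefix can express


def encode_frame(body: bytes) -> bytes:
    """Prefix a message body with its length as a big-endian uint16."""
    if len(body) > MAX_BODY:
        raise ValueError("message body exceeds 65,535 bytes")
    return struct.pack("!H", len(body)) + body


def decode_frames(buffer: bytes):
    """Split a receive buffer into complete bodies plus leftover bytes."""
    bodies = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from("!H", buffer, offset)
        if len(buffer) - offset - 2 < length:
            break  # the body has not been fully received yet
        start = offset + 2
        bodies.append(buffer[start:start + length])
        offset = start + length
    return bodies, buffer[offset:]
```

Because messages are sent in succession over a persistent connection, a receiver would typically append newly read bytes to the leftover buffer and call decode_frames again.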

Message bodies consist of JSON messages in the following format:

["type", {"arg1": "value1", "arg2": "value2", ..., "argN": "valueN"}]

Each message is composed of an outer JSON array with two members. The first member is a string which represents the message type. The second member is a JSON object which contains a message-specific collection of arguments. Unless otherwise specified, all members are represented as JSON strings.

To improve the readability of the protocol on the wire, it is recommended that CRLF be appended to the end of the JSON message. This is ignored as whitespace by JSON parsers, but meaningful to humans who may be reading wire dumps. However, this is not an explicit requirement of the protocol, nor should its presence be required by any implementation.
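
As an informal illustration of the message body format, the following Python sketch builds and parses the two-member JSON array. The helper names and the choice of UTF-8 as the byte encoding are assumptions of this example; the trailing CRLF is optional, as described above.

```python
import json


def encode_message(msg_type: str, args: dict, crlf: bool = True) -> bytes:
    """Serialize ["type", {arguments}] and optionally append CRLF."""
    body = json.dumps([msg_type, args])
    if crlf:
        body += "\r\n"  # recommended for readability, never required
    return body.encode("utf-8")  # UTF-8 is an assumption of this sketch


def decode_message(body: bytes):
    """Parse a message body into (type, arguments), with or without CRLF."""
    parsed = json.loads(body.decode("utf-8"))  # trailing whitespace is ignored
    if (not isinstance(parsed, list) or len(parsed) != 2
            or not isinstance(parsed[0], str)
            or not isinstance(parsed[1], dict)):
        raise ValueError("expected a two-member array: [type, {arguments}]")
    return parsed[0], parsed[1]
```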




3.  Server to Client Messages




3.1.  Overview

These messages are sent from the server to the client to respond to information requests by the client and manage the file transfer process.




3.2.  TELLINFO

tell_info url [size] [chunkSize] [streaming]

The tell_info datagram is the expected response to the ask_info datagram, and contains information that the client can use to determine its chunk request policy and the manner in which a file is handled. Since the server only arranges for chunks to be sent when they are explicitly requested, a client must know how many chunks are in a file so that it knows how many to request. To allow the client to determine the number of chunks in a file, the size of each chunk and the total size of the file are sent. The number of chunks is then 'size' divided by 'chunkSize', rounded up to the nearest integer. The 'chunkSize' could also be used to determine a policy on caching or memory management, and the 'size' could be used to handle an incomplete final chunk.

Fields:

url: String

A unique file identifier. The server uses this field to specify which file it is sending information about.

size: Integer (Optional)

The exact size in bytes of the file. Since the last chunk in the file might not be completely filled, it is necessary to know the total file size as well as the chunk size and the chunk count.

chunkSize: Integer (Optional)

The size in bytes of each chunk in the file.

streaming: Boolean (Optional)

If true, the file is assumed to be a streaming media file and chunks near the beginning will be sent first. If false, chunks from all over the file will have equal priority. Default is false.
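
The chunk arithmetic described in this section can be sketched as follows. The function names are illustrative; the sketch assumes both 'size' and 'chunkSize' were present in the tell_info message.

```python
def chunk_count(size: int, chunk_size: int) -> int:
    """Number of chunks: size divided by chunk_size, rounded up."""
    if size < 0 or chunk_size <= 0:
        raise ValueError("size must be >= 0 and chunk_size must be > 0")
    return (size + chunk_size - 1) // chunk_size


def last_chunk_length(size: int, chunk_size: int) -> int:
    """Length in bytes of the final, possibly incomplete, chunk."""
    if size < 0 or chunk_size <= 0:
        raise ValueError("size must be >= 0 and chunk_size must be > 0")
    if size == 0:
        return 0
    remainder = size % chunk_size
    return remainder or chunk_size
```

For example, a 1000-byte file with a 256-byte chunk size has four chunks, the last of which holds 232 bytes.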




3.3.  TRANSFER

transfer peer port method url range peer_id

The server controls all data flowing over the network. It uses the transfer message to initiate a chunk transfer between two peers. The transfer message is always sent to the peer that should initiate the connection to the other peer.

When a client receives a transfer message, it should connect to the specified peer, carry out the transfer as specified in Section 5 (Client to Client Communication), and reply to the server with a completed (COMPLETED) message upon success.

Fields:

peer: String

This is the network address of the peer with which the transfer should take place.

port: Integer

The port on the peer to connect to.

method: Enumeration

The HTTP method to be used for this transfer. Possible values are GET and PUT, with the same semantics as in the HTTP protocol. That is, if method is GET, this client is receiving data from a peer, and if method is PUT, this client is sending data to a peer.

url: String

This is a unique file identifier for the file to transfer.

range: Range

This is the byte range being transferred. This, combined with the url, provides a unique identifier for the data to transfer.

peer_id: String

The unique id of the peer in this transfer.
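
A client might turn the arguments of a transfer message into a typed record before acting on it. The sketch below is one hedged way to do so: the Transfer class and helper names are hypothetical, the range value is kept opaque because its wire encoding is not defined in this section, and peer_id is normalized to a string purely for convenience.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Transfer:
    peer: str
    port: int
    method: str
    url: str
    range: Any     # kept opaque: the Range encoding is not defined here
    peer_id: str   # normalized to a string (assumption of this sketch)


def parse_transfer(args: dict) -> Transfer:
    """Validate the arguments of a transfer message."""
    method = args["method"]
    if method not in ("GET", "PUT"):
        raise ValueError("method must be GET or PUT")
    return Transfer(
        peer=args["peer"],
        port=int(args["port"]),
        method=method,
        url=args["url"],
        range=args["range"],
        peer_id=str(args["peer_id"]),
    )


def is_receiving(transfer: Transfer) -> bool:
    """GET means this client receives data; PUT means it sends data."""
    return transfer.method == "GET"
```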




3.4.  TELLVERIFY

tell_verify peer url range peer_id authorized

The tell_verify datagram is sent to the client in response to an ask_verify (ASKVERIFY) message to inform it of the authorization status of a transfer.

Fields:

peer: IP Address

The address of the connecting peer.

url: String

A unique file identifier.

range: Range

The byte range to verify.

peer_id: String

The unique identifier of the peer.

authorized: Boolean

A boolean value specifying whether or not the transfer is authorized.




3.5.  HASHVERIFY

hash_verify url range hash_ok

The hash_verify message is sent to a client after a completed message with a hash has been received. The value of hash_ok denotes whether or not the hash sent in the completed message matches the expected hash of the chunk containing the specified range.

Fields:

url: String

The url of the file being transferred.

range: Range

The byte range of the file, which this message refers to.

hash_ok: Boolean

If true, the hash sent in the completed message matched the expected hash of the byte range it referred to.




3.6.  PROTOCOLERROR

protocol_error message

The protocol_error message is sent to a client whenever a protocol error has occurred. The message field is meant to be read by a human to determine what went wrong. In most cases, the error is due to a programming error and is fatal. One notable exception to this is the case when a client generates a Peer Id that is not unique.

Fields:

message: String

An error message. This error message is not meant to be parsed programmatically, but rather to be logged and read as an aid to debugging.




4.  Client to Server Messages




4.1.  Overview

These messages are sent from the client to the server to request files, inform the server of completed transfers and handle files the client can provide to the network.




4.2.  REGISTER

register client_id listen_port

The register message is the first message a client sends to the server. The client must send this message to alert the server of its presence so that it may be included in the server's actions.

Fields:

client_id: String

A unique identifier, which the client generates. The server will identify this client on the network based on this identifier. The identifier must be a string less than 4 KB in length. It is the client's responsibility to generate an identifier that is unique. The server should respond with a protocol_error (PROTOCOLERROR) message if the identifier is too long or not unique.

listen_port: Integer

The port on which the client will listen for incoming connections from other peers. The server will pass this information along to other clients wishing to connect to this client for peer to peer chunk transfers.
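
The following sketch shows one way a client could generate an identifier and build the register message. Using a random UUID for uniqueness, and measuring the 4 KB limit in UTF-8 bytes, are assumptions of this example rather than requirements of the protocol.

```python
import uuid

MAX_CLIENT_ID_BYTES = 4096  # "less than 4 KB", measured here in UTF-8 bytes


def new_client_id() -> str:
    """Generate an identifier that is very unlikely to collide."""
    return uuid.uuid4().hex


def register_message(client_id: str, listen_port: int) -> list:
    """Build the register message as a [type, arguments] pair."""
    if len(client_id.encode("utf-8")) >= MAX_CLIENT_ID_BYTES:
        raise ValueError("client_id must be shorter than 4 KB")
    if not 0 < listen_port <= 65535:
        raise ValueError("listen_port must be a valid TCP port")
    return ["register", {"client_id": client_id, "listen_port": listen_port}]
```

If the server replies with a protocol_error because the identifier is not unique, a client could generate a fresh identifier and register again.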




4.3.  REQUEST and UNREQUEST

request url [range]
unrequest url [range]

The request and unrequest messages are used by the client to indicate what data it needs to receive. A request datagram indicates that the client needs the specified bytes of the object; an unrequest datagram indicates that the client no longer needs the specified bytes. The completed and provide messages may also be used to cancel a request, though they carry additional semantics.

If the range of a request or unrequest message isn't specified, it is assumed to include all bytes in the file.

The server is expected to continuously arrange for the data in the client's request set to be delivered to the client through peer to peer transfer channels. A request set is informally the set of chunks that a client wants. Formally, we may say that a given unique chunk C with the URL L and the id K is in a client's request set if and only if:

In short, requests are standing and additive; unrequests are transient and subtractive. The server is expected to handle any possible combination of requests and unrequests that clients can send. (For example, a client may request a URL in its entirety, and then later unrequest certain parts of the URL. This is useful if, say, a client needs to complete a partial download, repair a damaged file, or optimize its network usage in response to user actions.)

Fields:

url: String

The object being requested. The server may discard requests for URLs it does not understand.

range: Range (Optional)

The range of bytes being requested, inclusive. If unspecified, it is taken to be the complete range of bytes in the file.
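
One informal reading of "requests are additive, unrequests are subtractive" is that a server keeps, per client and URL, a set of inclusive byte ranges. The sketch below illustrates that reading only; it is not the formal definition of a request set, and the function names and the list-of-tuples representation are assumptions of this example.

```python
def add_range(ranges, start, end):
    """Merge the inclusive range [start, end] into a list of ranges."""
    merged = []
    for s, e in sorted(ranges + [(start, end)]):
        if merged and s <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def remove_range(ranges, start, end):
    """Subtract the inclusive range [start, end] from a list of ranges."""
    result = []
    for s, e in ranges:
        if e < start or s > end:
            result.append((s, e))  # untouched by the unrequest
            continue
        if s < start:
            result.append((s, start - 1))  # part before the removed span
        if e > end:
            result.append((end + 1, e))    # part after the removed span
    return result
```

Under this reading, requesting bytes 0 to 999 and then unrequesting bytes 100 to 199 leaves the ranges 0 to 99 and 200 to 999 outstanding; a later request may add any of the removed bytes back.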




4.4.  PROVIDE and UNPROVIDE

provide url [range]
unprovide url [range]

The provide and unprovide datagrams are used by a client to tell the server which chunks of a file the client is able to provide to peers. Each client in the system can have a cache of file chunks that it has already downloaded and, depending on the client's individual capabilities, this cache may or may not be empty on startup. If a client has data in a cache on startup, the provide message can be used to inform the server. If the cache is empty on startup, the provide message is never needed because the server is expected to keep track of each chunk that has been transferred. On the other hand, if a client evicts one or more chunks from its cache, it should immediately send an unprovide message. Although there is no firm requirement on a client to send appropriate unprovide datagrams, it is in the client's best interest to do so, as it could lose standing in the network if it were asked to send a chunk it did not have.

A client can send as many provide messages as necessary to inform the server of its entire chunk cache. These messages should be interpreted by the server additively. That is, if a client first sends a message specifying that it provides byte range A to B and then another specifying that it provides byte range C to D, the server should conclude that it has byte range A to B and byte range C to D. Furthermore, a client can send provide and unprovide messages for multiple files, as specified by the url field.

If the range in a provide or unprovide message isn't specified, it is taken to include the entire range of bytes in the file specified by the url field.

Fields:

url: String

A unique file identifier. The client uses this field to specify a file in its cache.

range: Range (Optional)

The range of relevant bytes, inclusive. If unspecified, it is taken to be the complete range of bytes in the file.
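
A client with a non-empty cache on startup could announce it with a series of provide messages, as in the sketch below. The cache layout (a mapping from URL to a list of inclusive byte ranges, or None for a whole file) and the format_range helper are hypothetical; in particular, the "first-last" string is only a placeholder, since the wire encoding of the Range type is not defined in this section.

```python
def format_range(first: int, last: int) -> str:
    # Placeholder encoding only; the real Range format is not defined here.
    return f"{first}-{last}"


def provide_messages(cache: dict) -> list:
    """Build one provide message per cached range, or per whole file."""
    messages = []
    for url, ranges in cache.items():
        if ranges is None:
            # No range: the entire file is provided.
            messages.append(["provide", {"url": url}])
            continue
        for first, last in ranges:
            messages.append(
                ["provide", {"url": url, "range": format_range(first, last)}]
            )
    return messages
```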




4.5.  ASKINFO

ask_info url

The ask_info datagram is sent to the server when a client wants information about a file. Specifically, the client needs information about the file type, the file size, the number of chunks, and the size of each chunk in order to successfully receive a file. This information is used to determine when a file is finished transferring and how to handle a file, among other things.

It is expected that the server will always respond to an ask_info datagram with a tell_info (TELLINFO) datagram.

Fields:

url: String

A unique file identifier. The client uses this field to specify which file it would like information about.




4.6.  ASKVERIFY

ask_verify peer url range peer_id

The ask_verify datagram is sent to the server when a client wants to know whether a transfer is authorized. This is sent upon receipt of a put or get from a peer.

It is expected that the server will always respond to an ask_verify datagram with a tell_verify (TELLVERIFY) datagram.

Fields:

peer: String

The network address of the connecting peer.

url: String

A unique file identifier. The client uses this field to specify which file it would like information about.

range: Range

The range of bytes in the file referred to by the url for which we are asking verification.

peer_id: String

The unique identifier of the peer.
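
Because messaging is asynchronous, a client needs some way to match an incoming tell_verify to the ask_verify that prompted it. The sketch below assumes, without confirmation from this specification, that the server echoes the same peer, url, range, and peer_id values in its reply and that authorized arrives as a JSON boolean. The class and function names are illustrative.

```python
import json


def verify_key(args: dict) -> str:
    """Derive a lookup key from the fields shared by both messages."""
    return json.dumps(
        [args["peer"], args["url"], args["range"], args["peer_id"]],
        sort_keys=True,
    )


class PendingVerifications:
    """Tracks ask_verify messages that still await a tell_verify."""

    def __init__(self):
        self._pending = {}

    def ask(self, args: dict, on_result) -> list:
        """Record a callback and return the ask_verify message to send."""
        self._pending[verify_key(args)] = on_result
        return ["ask_verify", dict(args)]

    def tell(self, args: dict) -> bool:
        """Handle a tell_verify; return False if nothing was waiting."""
        callback = self._pending.pop(verify_key(args), None)
        if callback is None:
            return False
        callback(args["authorized"] is True)
        return True
```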




4.7.  COMPLETED

completed peer url range peer_id [hash]

The completed message is used by the client to indicate that a transfer has completed in either success or failure. Upon success, a completed message will include a hash of the chunk that it has received. The lack of a hash field denotes transfer failure. The server may use this information to inform its network optimization, so the appropriate use of completed messages is highly recommended. (Specifically, it is likely that the server will initiate another transfer after it has been informed that one has completed.)

Fields:

peer: String

The network address of the peer associated with this transfer.

url: String

The url of the transferred chunk.

range: Range

The byte range of the completed transfer.

peer_id: String

A unique identifier of the peer associated with the transfer.

hash: String (Optional)

A hash of the transferred chunk. Its absence denotes transfer failure.
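
The success and failure forms of the completed message can be sketched as follows. This section does not name a hash algorithm, so the use of SHA-256 with hexadecimal encoding is purely an assumption of this example, as is the function name.

```python
import hashlib


def completed_message(peer, url, byte_range, peer_id, data=None) -> list:
    """Build a completed message; omit the hash to report failure."""
    args = {"peer": peer, "url": url, "range": byte_range, "peer_id": peer_id}
    if data is not None:
        # SHA-256 is an assumption; the hash algorithm is not specified here.
        args["hash"] = hashlib.sha256(data).hexdigest()
    return ["completed", args]
```

Calling the function with the received bytes reports success, while calling it without data produces a message that lacks the hash field and therefore reports failure.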




5.  Client to Client Communication




5.1.  Overview

All client to client communication is done using the HTTP 1.1 [2] protocol. Each client is both an HTTP client and server. During a transfer, the connecting peer for that particular transfer acts as the client and the listening peer acts as the server. Full HTTP functionality may be implemented where an implementation finds it desirable, but PDTP strictly requires only the subset of HTTP that includes the GET and PUT requests and the full range of responses.

The listening client is truly an HTTP server and the connecting client is truly an HTTP client. By this we mean simply that not only the syntax but also the semantics of the GET and PUT requests and any responses received are identical to that of the HTTP protocol.

A connecting peer will include the following information in the request sent to a listening peer:

Upon receiving a request from a connecting peer, a listening peer processes the request in an appropriate manner and sends an appropriate response. The listening client may choose to ask the server for verification of this transfer using the ask_verify (ASKVERIFY) message. If verification fails, a 403 Forbidden response should be sent. Similarly, if the file cannot be found or the range is unsatisfiable, a 404 Not Found or 416 Requested Range Not Satisfiable response should be sent, respectively. Upon success, either 206 Partial Content, 200 OK, or 201 Created should be sent, depending on the situation.
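
The response selection described above can be sketched as a small decision function. The order of the checks and the mapping of the three success codes to situations (201 for a PUT, 206 for a partial GET, 200 for a whole-file GET) follow ordinary HTTP usage and are assumptions of this example; the text above leaves the choice to the implementation.

```python
def response_status(method: str, authorized: bool, file_known: bool,
                    range_ok: bool, partial: bool) -> int:
    """Pick an HTTP status code for a request from a connecting peer."""
    if not authorized:
        return 403  # Forbidden: verification failed
    if not file_known:
        return 404  # Not Found
    if not range_ok:
        return 416  # Requested Range Not Satisfiable
    if method == "PUT":
        return 201  # Created: data was accepted from the peer
    return 206 if partial else 200  # Partial Content or OK for a GET
```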




6.  References

[1] Arcieri, A., “Peer Distributed Transfer Protocol,” November 2005.
[2] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol - HTTP/1.1,” RFC 2616, June 1999.
[3] Crockford, D., “The application/json Media Type for JavaScript Object Notation (JSON),” RFC 4627, July 2006.



Appendix A.  Glossary




Authors' Addresses

  Tony Arcieri
  ClickCaster, Inc.
Email:  tony@clickcaster.com
  
  Ashvin Mysore
  ClickCaster, Inc.
Email:  ashvin@clickcaster.com
  
  Galen Pahlke
  ClickCaster, Inc.
Email:  galen@clickcaster.com
  
  James Sanders
  University of Colorado
Email:  sanderjd@gmail.com
  
  Tom Stapleton
  University of Colorado
Email:  tstapleton@gmail.com



Full Copyright Statement

Intellectual Property

Acknowledgment