Decode contents of bencoded .torrent files

Recently I logged into my old, long-forgotten phpclasses.org account and noticed that the BDecode class I wrote back in 2006 is, to my surprise, still getting downloads. The code has not aged well, so I decided to rewrite it from scratch and post it here, hoping that someone out there still needs to decode bencoded data.

The class supports all versions of PHP5 and does not depend on any external libraries.

What is Bencode?

Here is what Wikipedia has to say about it: "Bencode (pronounced like B encode) is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data.

It supports four different types of values: byte strings, integers, lists, and dictionaries (associative arrays). Bencoding is most commonly used in torrent files. These metadata files are simply bencoded dictionaries.

While less efficient than a pure binary encoding, bencoding is simple and (because numbers are encoded in decimal notation) is unaffected by endianness, which is important for a cross-platform application like BitTorrent. It is also fairly flexible, as long as applications ignore unexpected dictionary keys, so that new ones can be added without creating incompatibilities."
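
To make the four value types concrete, here is a minimal sketch of what each one looks like once encoded. The function name bencode_sketch is hypothetical and is not part of the BDecode class, which only decodes. It assumes that PHP arrays with keys 0..n-1 stand for lists and all other arrays stand for dictionaries.

```php
<?php
// Hypothetical helper for illustration only; it is not part of the BDecode class.
function bencode_sketch($value)
{
    if (is_int($value)) {
        // integers: "i" + decimal digits + "e"
        return 'i' . $value . 'e';
    }
    if (is_string($value)) {
        // byte strings: length in bytes + ":" + raw bytes
        return strlen($value) . ':' . $value;
    }
    if (is_array($value)) {
        // assumption: arrays with keys 0..n-1 (or no keys at all) are lists
        $isList = $value === array()
            || array_keys($value) === range(0, count($value) - 1);
        if ($isList) {
            $out = 'l';
            foreach ($value as $item) {
                $out .= bencode_sketch($item);
            }
            return $out . 'e';
        }
        // everything else is a dictionary; keys are sorted as raw strings
        ksort($value, SORT_STRING);
        $out = 'd';
        foreach ($value as $key => $item) {
            $out .= bencode_sketch((string) $key) . bencode_sketch($item);
        }
        return $out . 'e';
    }
    throw new InvalidArgumentException('Unsupported value type');
}

echo bencode_sketch(42), "\n";                                        // i42e
echo bencode_sketch('spam'), "\n";                                    // 4:spam
echo bencode_sketch(array('spam', 42)), "\n";                         // l4:spami42ee
echo bencode_sketch(array('cow' => 'moo', 'spam' => 'eggs')), "\n";   // d3:cow3:moo4:spam4:eggse
```

Because strings carry their length up front, they can hold arbitrary binary data, including the characters e and :, without confusing a parser.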

Here are some useful links for those who want to learn more about the protocol:

Code

This class reads the contents of bencoded files and returns them as multi-dimensional arrays. You can load the contents of local or remote files (the class uses file_get_contents to fetch the data) or pass a bencoded string to it directly.

Here is the source code:

<?php

class BDecode
{

    // entry types
    const TYPE_END  = 0;
    const TYPE_DICT = 1;
    const TYPE_INT  = 2;
    const TYPE_LIST = 3;
    const TYPE_STR  = 4;

    // delimiter specifications (regular expression fragments)
    const DELIM_END  = 'e';
    const DELIM_DICT = 'd';
    const DELIM_INT  = 'i';
    const DELIM_LIST = 'l';
    const DELIM_STR  = '[0-9]';

    // miscellaneous constants
    const SEPARATOR = ':';


    /**
     * Contents of the bencoded file
     *
     * @var string
     */
    protected $_contents;

    /**
     * Pointer to current position in the contents
     *
     * @var int
     */
    protected $_pointer = 0;


    /**
     * Class constructor
     * Allows instantiating directly with a file name
     *
     * @param string $path Path to a bencoded file
     * @throws Exception
     */
    public function __construct($path = null)
    {
        // load file from the path if a file name specified
        if ($path !== null) {
            $this->loadFile($path);
        }
    }


    /**
     * Load contents from a bencoded string
     *
     * @param string $source Bencoded string
     * @throws Exception
     * @return void
     */
    public function load($source)
    {
        if (!is_string($source)) {
            throw new Exception('Source must be a string');
        }

        if ($source === '') {
            throw new Exception('Source can not be empty');
        }

        $this->_contents = $source;
    }


    /**
     * Load contents of a bencoded file from a local path or URL
     *
     * @param string $path Path to the bencoded file
     * @throws Exception
     * @return void
     */
    public function loadFile($path)
    {
        // realpath() is not used here because it returns false for URLs;
        // file_get_contents() handles both local paths and URLs
        $contents = @file_get_contents($path);
        if ($contents === false) {
            throw new Exception('Unable to read file: ' . $path);
        }

        $this->load($contents);
    }


    /**
     * Process from the current position
     *
     * @throws Exception
     * @return mixed
     */
    protected function _process()
    {
        // get type of the element the pointer is currently pointing at
        $type = $this->_getCurrentType();

        // process current entry
        switch ($type) {
            case self::TYPE_DICT:
                return $this->_processDictionary();

            case self::TYPE_INT:
                return $this->_processInteger();

            case self::TYPE_LIST:
                return $this->_processList();

            case self::TYPE_STR:
                return $this->_processString();

            default:
                // an end delimiter is only valid inside a list or dictionary
                throw new Exception('Unexpected end delimiter at position ' . $this->_pointer);
        }
    }


    /**
     * Get type of the value at the current pointer
     *
     * @throws Exception
     * @return int
     */
    protected function _getCurrentType()
    {
        // make sure the pointer has not run past the end of the contents
        if ($this->_pointer >= strlen($this->_contents)) {
            throw new Exception('Unexpected end of data');
        }

        // get current element starting delimiter
        $current = substr($this->_contents, $this->_pointer, 1);

        // type => pattern map
        $map = array(
            self::TYPE_END  => self::DELIM_END,
            self::TYPE_DICT => self::DELIM_DICT,
            self::TYPE_INT  => self::DELIM_INT,
            self::TYPE_LIST => self::DELIM_LIST,
            self::TYPE_STR  => self::DELIM_STR,
        );

        // attempt to determine the type
        foreach ($map as $type => $pattern) {
            if (preg_match('/^' . $pattern . '$/', $current)) {
                return $type;
            }
        }

        throw new Exception('Invalid type delimiter encountered: ' . $current);
    }


    /**
     * Process a dictionary entry
     *
     * @throws Exception
     * @return array
     */
    protected function _processDictionary()
    {
        $output = array();

        // move pointer to the beginning of the first entry
        $this->_pointer++;

        // a while loop (not do-while) so that an empty dictionary "de" works
        while (($type = $this->_getCurrentType()) !== self::TYPE_END) {
            if ($type !== self::TYPE_STR) {
                throw new Exception('Dictionary keys must be strings');
            }

            // get key of the current entry
            $key = $this->_processString();

            // get value of the current entry
            $output[$key] = $this->_process();
        }

        // move pointer to the beginning of the next element
        $this->_pointer++;

        return $output;
    }


    /**
     * Process an integer entry
     *
     * @throws Exception
     * @return int
     */
    protected function _processInteger()
    {
        // move pointer to the start of the value
        $this->_pointer++;

        // get position of the closing delimiter
        $endPos = strpos($this->_contents, self::DELIM_END, $this->_pointer);
        if ($endPos === false) {
            throw new Exception('Integer end delimiter not found');
        }

        // extract and validate the value
        $value = substr($this->_contents, $this->_pointer, $endPos - $this->_pointer);
        if (!preg_match('/^-?[0-9]+\z/', $value)) {
            throw new Exception('Invalid integer value: ' . $value);
        }

        // move pointer to the beginning of the next element
        $this->_pointer = $endPos + 1;

        return (int) $value;
    }


    /**
     * Process a list entry
     *
     * @throws Exception
     * @return array
     */
    protected function _processList()
    {
        $output = array();

        // move pointer to the beginning of the first entry
        $this->_pointer++;

        // a while loop (not do-while) so that an empty list "le" works
        while ($this->_getCurrentType() !== self::TYPE_END) {
            $output[] = $this->_process();
        }

        // move pointer to the beginning of the next element
        $this->_pointer++;

        return $output;
    }


    /**
     * Process a string entry
     *
     * @throws Exception
     * @return string
     */
    protected function _processString()
    {
        // get index of the separator
        $separatorPos = strpos($this->_contents, self::SEPARATOR, $this->_pointer);
        if ($separatorPos === false) {
            throw new Exception('String separator not found');
        }

        // extract and validate length of the string
        $length = substr($this->_contents, $this->_pointer, $separatorPos - $this->_pointer);
        if (!ctype_digit($length)) {
            throw new Exception('Invalid string length: ' . $length);
        }
        $length = (int) $length;

        // make sure the declared length does not run past the end of the data
        if ($separatorPos + 1 + $length > strlen($this->_contents)) {
            throw new Exception('String length exceeds available data');
        }

        // extract value of the string
        $value = $length > 0 ? substr($this->_contents, $separatorPos + 1, $length) : '';

        // move pointer to the beginning of the next element
        $this->_pointer = $separatorPos + $length + 1;

        return $value;
    }


    /**
     * Decode contents into a structure
     *
     * @throws Exception
     * @return mixed Decoded value (an array for .torrent files)
     */
    public function toArray()
    {
        // skip if there's nothing to process
        if (empty($this->_contents)) {
            throw new Exception('Contents not loaded');
        }

        // reset the pointer
        $this->_pointer = 0;

        // parse contents into an array and return
        return $this->_process();
    }


    /**
     * Get the original contents of the file
     *
     * @return string
     */
    public function __toString()
    {
        return (string) $this->_contents;
    }


    /**
     * Decode bencoded contents
     *
     * @param string $contents Bencoded string
     * @throws Exception
     * @return mixed
     */
    public static function decode($contents)
    {
        $bencode = new self();
        $bencode->load($contents);
        return $bencode->toArray();
    }


    /**
     * Load and decode contents of a bencoded file
     *
     * @param string $path Path to a bencoded file
     * @throws Exception
     * @return mixed
     */
    public static function decodeFile($path)
    {
        $bencode = new self($path);
        return $bencode->toArray();
    }

}

Usage

There are several ways to use the class.

1. Instantiate directly with a file name to parse:

    $bdecode = new BDecode('/path/to/file');
    $struct = $bdecode->toArray();

2. Instantiate the class but load the file only when needed:

    $bdecode = new BDecode();
    $bdecode->loadFile('/path/to/file');
    $struct = $bdecode->toArray();

3. Instantiate the class and load bencoded contents afterwards:

    $bdecode = new BDecode();
    $contents = file_get_contents('/path/to/file');
    $bdecode->load($contents);
    $struct = $bdecode->toArray();

There are also two static shortcut methods, one for each approach (loading from a file or from a string).

You can use the following method if you have a file to load but don't want to go through the hassle of instantiating the class manually:

    $struct = BDecode::decodeFile('/path/to/file');

If you have the contents already, you can use the decode() method:

    $contents = file_get_contents('/path/to/file');
    $struct = BDecode::decode($contents);

As you can see, everything is very simple and there is no prior knowledge of the protocol required! Even though this class might not be as useful as it was a few years ago, I still hope that it will help someone out there.
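
Once decoded, a torrent is just a nested PHP array, so reading it takes only ordinary array access. The sketch below assumes the standard metainfo layout (an info dictionary holding name plus either length for a single file or a files list with length and path entries). The function name torrent_files_sketch and the sample data are hypothetical: the array is hand-written to resemble what toArray() might return, and it omits fields such as pieces for brevity.

```php
<?php
// Hypothetical helper, not part of BDecode: lists the files described by a
// decoded .torrent structure as "path => size in bytes".
function torrent_files_sketch(array $struct)
{
    if (!isset($struct['info']) || !is_array($struct['info'])) {
        throw new Exception('Missing "info" dictionary');
    }

    $info  = $struct['info'];
    $files = array();

    if (isset($info['files'])) {
        // multi-file torrent: "name" is the top-level directory and each
        // entry stores its path as a list of path segments
        foreach ($info['files'] as $file) {
            $path = $info['name'] . '/' . implode('/', $file['path']);
            $files[$path] = $file['length'];
        }
    } else {
        // single-file torrent: "name" is the file name itself
        $files[$info['name']] = $info['length'];
    }

    return $files;
}

// Hand-written sample shaped like a decoded multi-file torrent (assumption,
// not real output); in practice it would come from BDecode::decodeFile()
$struct = array(
    'announce' => 'http://tracker.example.com/announce',
    'info'     => array(
        'name'         => 'album',
        'piece length' => 262144,
        'files'        => array(
            array('length' => 100, 'path' => array('cd1', 'track1.mp3')),
            array('length' => 200, 'path' => array('cover.jpg')),
        ),
    ),
);

print_r(torrent_files_sketch($struct));
```

Keeping helpers like this outside the decoder leaves the class focused on parsing, while torrent-specific logic can live in a separate wrapper.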

As always, leave comments and suggestions below!

Comments
1
Hi, how do I get the info hash? As I understand it, I need to use the info array to generate the sha1, but sha1 only accepts a string as a parameter. Please clarify.

Thanks
blacklizard, June 19th 2012, 20:38
2
If I understand correctly, you need to decode the sha1 hashes from the info array?

This class doesn't decode the data, it only parses the dictionary into a multi-dimensional array. You could write your own class, extending this one and adding additional functionality to get/set torrent file properties. Something like $torrent->getFiles(), $torrent->setAnnounceUrl($url).

I have been planning to write a class like that and post it here but unfortunately haven't had the time to do that.
Andris, June 26th 2012, 11:15
3
Nice class! Thanks!
Sam, November 29th 2012, 10:26