Be cautious about Content-Length HTTP headers

The Content-Length HTTP header tells the server or client the size of the request or response body being transferred. When writing our own HTTP clients or servers, we often want to send a string in the body. The easiest way to compute Content-Length in that case is to take the string's length, but that can lead to strange bugs.

Let’s see what the HTTP/1.1 RFC 2616 says about Content-Length:

The Content-Length entity-header field indicates the size of the entity-body, in decimal number of OCTETs, sent to the recipient or, in the case of the HEAD method, the size of the entity-body that would have been sent had the request been a GET.

So what counts is the number of octets (8-bit bytes) sent in the entity body. String lengths go wrong for strings sent as UTF-8, where many Unicode characters need more than one byte. Strings in languages other than English, or strings containing emoji, will therefore produce a Content-Length that is too small, and our code breaks because the receiver reads fewer bytes than were sent.

For example in Node.js:

'€'.length             // returns 1, wrong
Buffer.byteLength('€') // returns 3, correct

The official Node.js docs for the HTTP module show the string length being used, but that is not really a safe thing to do. We should always use Buffer.byteLength, or the corresponding method in other languages, to count the number of bytes.
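To make the difference concrete, here is a minimal sketch that compares the two counts for a few strings, assuming the body is sent as UTF-8. The helper name `lengths` is made up for illustration; only `String.prototype.length` and `Buffer.byteLength` are real APIs.

```javascript
// Hypothetical helper: compare the string length with the UTF-8 byte length.
function lengths(str) {
  return {
    chars: str.length,                     // UTF-16 code units, not bytes
    bytes: Buffer.byteLength(str, 'utf8')  // octets that go on the wire
  };
}

console.log(lengths('Haha'));   // { chars: 4, bytes: 4 }
console.log(lengths('€'));      // { chars: 1, bytes: 3 }
console.log(lengths('Haha🔫')); // { chars: 6, bytes: 8 }
```

The two numbers agree only for plain ASCII text, which is why the bug tends to stay hidden until non-ASCII input shows up.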

A reduced test case that demonstrates how this can lead to wrong data being received, and to errors, is given below.

var http = require('http');

var TEST_STRING = 'Haha🔫';

var server = http.createServer(function(req, res) {
  var data = '';
  req.on('data', function(chunk) {
    data += chunk;
  });
  req.on('end', function() {
    console.log('Received string:', data);
    console.log('Actual String:', TEST_STRING);
    res.end();
  });
});

server.listen(3001, 'localhost', function() {
  var req = http.request({
    port: 3001,
    headers: {
      'Content-Length': TEST_STRING.length // Should use Buffer.byteLength here
    }
  }, function(res) {
    server.close();
  });
  req.on('error', function(e) {
    console.log('Problem with request:', e.message);
    server.close();
  });
  req.end(TEST_STRING);
});

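The above sample sends the string Haha🔫, which is 8 bytes in UTF-8, but announces a Content-Length of 6 (the string length). The server reads only the announced number of bytes, which cuts the emoji off mid-character, so it ends up with a garbage value instead of the original string.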
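For comparison, a corrected version of the test case might look like the sketch below. It is not the only way to fix it and makes a few assumptions that are not in the original sample: it listens on port 0 (any free port) on 127.0.0.1, sends a POST, collects the raw chunks before decoding, and wraps the result in a promise named `received` so it can be inspected.

```javascript
var http = require('http');

var TEST_STRING = 'Haha🔫';

// Resolves with the body as decoded by the server.
var received = new Promise(function(resolve, reject) {
  var server = http.createServer(function(req, res) {
    var chunks = [];
    req.on('data', function(chunk) {
      chunks.push(chunk); // keep raw Buffers and decode once at the end
    });
    req.on('end', function() {
      res.end();
      resolve(Buffer.concat(chunks).toString('utf8'));
    });
  });

  // Port 0 asks the OS for any free port (an assumption of this sketch).
  server.listen(0, '127.0.0.1', function() {
    var req = http.request({
      host: '127.0.0.1',
      port: server.address().port,
      method: 'POST',
      agent: false, // one-off connection instead of the shared agent
      headers: {
        // Count bytes, not characters: 8, not 6
        'Content-Length': Buffer.byteLength(TEST_STRING, 'utf8')
      }
    }, function(res) {
      res.resume(); // drain the empty response
      server.close();
    });
    req.on('error', function(e) {
      server.close();
      reject(e);
    });
    req.end(TEST_STRING);
  });
});

received.then(function(body) {
  console.log('Received string:', body);
  console.log('Matches original:', body === TEST_STRING);
});
```

With the byte count in the header, the server reads the whole body and the decoded string should match what was sent.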