HTTP Response Splitting in Node.js - Root Cause Analysis

Author: Amit Klein

Today we’re publishing an advisory describing how Node.js’s HTTP Response Splitting protection can be bypassed, allowing HTTP Response Splitting attacks to (still) be mounted against some Node.js applications. Of course, the Node.js team was notified in advance and quickly came up with a fix, which is incorporated in new Node.js releases published in coordination with our advisory.

The full advisory is here, and I really recommend reading it. In this blog post I want to concentrate on the root cause of this specific vulnerability.

Early in its life, Node.js was apparently vulnerable to a straightforward version of HTTP Response Splitting. Then, on November 19th, 2012, Bert Belder from the Node.js team noticed this and deployed a fix (https://github.com/nodejs/node/commit/3c293ba27250f1885efa8d8db8e75d3ea033c206), which was incorporated in Node.js v0.8.20 (https://nodejs.org/en/blog/release/v0.8.20/, released February 15th, 2013). The nature of the fix was simple: each HTTP header value was searched for CR/LF characters, and any such characters found were discarded. Since HTTP Response Splitting is all about injecting CR/LF via user data (the injected CR/LF sequences are used to “break out” of the intended HTTP response header and craft additional response headers, or even entire HTTP responses), this solution makes a lot of sense.
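The idea behind the 2012 fix can be illustrated with a short TypeScript sketch. It is not the code from the commit: `sanitizeHeaderValue` is a hypothetical name, and the regular expression is just one way to express “search for CR/LF and discard them”.

```typescript
// Hypothetical sketch of the 2012 mitigation idea (not the commit's actual code):
// search a header value for CR (U+000D) and LF (U+000A) and discard them.
function sanitizeHeaderValue(value: string): string {
  // Operates on the JavaScript (Unicode) string, before serialization to bytes.
  return value.replace(/[\r\n]/g, "");
}

// A value that tries to inject an extra header via a raw CR LF.
const injected = "/en\r\nSet-Cookie: session=attacker";
console.log(JSON.stringify(sanitizeHeaderValue(injected))); // "/enSet-Cookie: session=attacker"
```

Note that the check runs on the JavaScript string, before it is serialized to bytes, so it is only as strong as the assumption that serialization cannot turn any other character into a CR or LF byte.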

Indeed, this approach takes care of almost the entire problem. Almost. There is one scenario in which it doesn’t cut it, not because the approach is wrong, but because of an anomaly in how Node.js is implemented. As you know, an HTTP response is made of a status line (the first response line), then the HTTP response headers, and finally the HTTP response body. Node.js allows the application developer to name the encoding scheme of the data he/she sends in the HTTP response body. Specifically, the problem lies in how Unicode data is serialized. Keep in mind that JavaScript strings (JavaScript being the language in which Node.js applications, and much of Node.js itself, are written) are made of Unicode characters (more precisely, UTF-16 code units), unlike, e.g., C/C++, whose native strings are byte-oriented.

Normally, UTF-8 encoding is used to serialize HTTP response header (Unicode) strings. This is a good choice, as UTF-8 behaves well from a security standpoint: the only Unicode characters that map to ASCII bytes (ordinal value < 128) are the ASCII characters themselves, because every byte of a multi-byte UTF-8 sequence has its most significant bit set. In other words, for a byte-oriented protocol such as HTTP, which relies on ASCII characters (e.g. Carriage Return, CR, and Line Feed, LF) to delimit protocol headers, UTF-8 is a safe way of encoding Unicode characters. You can never “break out” of an HTTP header using any character other than the intended CR and LF. So in this case, testing the header value for CR and LF (in their Unicode form, before serialization) correctly prevents HTTP Response Splitting, because a non-CR/LF Unicode character will never turn into a CR/LF byte when serialized as UTF-8.
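The UTF-8 property above can be checked with a small sketch. The helpers `utf8EncodeBmp` and `hex` are hypothetical and written out by hand for clarity (real Node.js code would simply use `Buffer.from(s, "utf8")`); to keep it short, the encoder assumes characters in the Basic Multilingual Plane only.

```typescript
// Minimal UTF-8 encoder for illustration; assumes BMP input (no surrogate pairs).
function utf8EncodeBmp(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const cp = s.charCodeAt(i);
    if (cp < 0x80) {
      out.push(cp); // ASCII: the only case that yields a byte below 0x80
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)); // 110xxxxx 10xxxxxx
    } else {
      // 1110xxxx 10xxxxxx 10xxxxxx
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    }
  }
  return out;
}

// Format bytes as lowercase two-digit hex, separated by spaces.
function hex(bytes: number[]): string {
  return bytes.map((b) => ("0" + b.toString(16)).slice(-2)).join(" ");
}

console.log(hex(utf8EncodeBmp("\r\n")));         // 0d 0a
console.log(hex(utf8EncodeBmp("\u010D\u010A"))); // c4 8d c4 8a

// Exhaustively check that no non-ASCII BMP character (surrogates excluded)
// produces a byte below 0x80, so none can become CR (0x0D) or LF (0x0A).
let safe = true;
for (let cp = 0x80; cp <= 0xffff; cp++) {
  if (cp >= 0xd800 && cp <= 0xdfff) continue;
  if (utf8EncodeBmp(String.fromCharCode(cp)).some((b) => b < 0x80)) safe = false;
}
console.log(safe); // true
```

The characters U+010D and U+010A, which play the central role in this attack, become the harmless byte pairs `c4 8d` and `c4 8a` under UTF-8.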

The problem arises when the application developer specifies “ascii” or “binary” encoding for the HTTP response data (body). This is because Node.js’s implementation normally serializes the headers the first time data is provided for the HTTP response body, using the encoding specified for that body (probably without the application developer being aware of this subtlety). And the “ascii” and “binary” encodings implemented by Node.js simply “fold” Unicode characters into bytes by taking the least significant 8 bits of the Unicode code point as the serialized byte (another way of looking at it is that “ascii”/“binary” encoding means taking the code point modulo 256). So with this encoding, a seemingly innocent Unicode character U+010D (“LATIN SMALL LETTER C WITH CARON”, č) is folded into the byte 0x0D, which is CR. Likewise, U+010A (“LATIN CAPITAL LETTER C WITH DOT ABOVE”, Ċ) is folded into the byte 0x0A (LF). Using these two characters (U+010D, U+010A), it is thus possible to bypass the HTTP Response Splitting protection implemented in Node.js and produce HTTP responses that contain “unexpected” CR/LF characters.
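The folding can be modeled in a few lines. `foldToBytes` and `hasCrLf` are hypothetical helpers: the former applies the “keep the least significant 8 bits” rule described above to each UTF-16 code unit, and the latter stands in for the header check. This is a model of the behavior as described, not Node.js source code.

```typescript
// Model of the "ascii"/"binary" folding described in the text: each UTF-16 code
// unit is reduced to its least significant 8 bits (code unit modulo 256).
function foldToBytes(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    out.push(s.charCodeAt(i) & 0xff);
  }
  return out;
}

// Stand-in for the CR/LF header check, applied to the Unicode string.
function hasCrLf(s: string): boolean {
  return /[\r\n]/.test(s);
}

console.log(foldToBytes("\u010D\u010A").join(" ")); // 13 10

const value = "/en\u010D\u010ASet-Cookie: x=1"; // č and Ċ in place of CR and LF
console.log(hasCrLf(value)); // false: the header check lets the value through
const bytes = foldToBytes(value);
console.log(bytes.indexOf(0x0d), bytes.indexOf(0x0a)); // 3 4
```

The check and the serialization disagree about what the string contains: the check sees č and Ċ, while the wire sees CR and LF.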

Frankly, this is quite easy to miss. After all, the HTTP Response Splitting prevention code is there, it’s solid and tested, and it’s right in the critical code path, so there’s no reason to suspect that something is wrong. The major difference introduced by the “ascii”/“binary” encoding can also easily escape a reviewer (the names “ascii” and “binary” don’t necessarily hint at the “folding” behavior). And the vulnerable scenario is, to be honest, quite rare.

But things got complicated later on. On August 15th, 2013, two changes were committed by another Node.js developer: https://github.com/nodejs/node/commit/ce3d18412c9cbd8259f1dac84f83a039436adf91 and https://github.com/nodejs/node/commit/da93d6adfb0abfcaac26e1509748edca0db8c003. Both were incorporated into v0.11.6 (and onward), and the former was also backported to v0.10.17 (and onward). Both v0.11.6 and v0.10.17 were released on August 21st, 2013. Together, these two changes gave rise to a new, much more common vulnerable scenario.

The two commits caused HTTP responses with an empty body to be explicitly serialized as ‘ascii’ (a day later, another commit changed this to ‘binary’: https://github.com/nodejs/node/commit/1f9f86349410f0a008f8d0df9aa66aed60f7a8e9; there are no relevant differences between ‘ascii’ and ‘binary’ for the purpose of this blog post), instead of implicitly as UTF-8, as was the case prior to these changes. So all of a sudden, HTTP responses with an empty body became vulnerable to the attack (provided, of course, that the HTTP request can influence at least one response header). This is not a subtle change. There is a very common situation in which HTTP responses with an empty body are used, often with request data written into their headers: HTTP 3xx redirection, of course. A quite common practice is for an application developer to take a request URL parameter and embed it in a redirection URL. For example, one can ask the user (via an HTML form) for his/her language preference, and redirect the user to the language-specific page based on the answer. So starting in late 2013, the problem became more serious, all due to a modification in a totally different area of the code!
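The redirect scenario can be simulated end to end without a server. Everything here is hypothetical: the `lang` parameter value, the `/en` redirect target, and the injected `Set-Cookie` header are made up for illustration, and the serialization step simply applies the modulo-256 folding described above to the response head rather than calling Node.js’s `http` module.

```typescript
// Hypothetical simulation of a vulnerable redirect (no real HTTP server involved).

// Step 1: the attacker supplies U+010D (č) and U+010A (Ċ) percent-encoded as
// UTF-8 (%C4%8D and %C4%8A) in the value of a "lang" query parameter.
const rawParam = "en%C4%8D%C4%8ASet-Cookie:%20session%3Dattacker";

// Step 2: the application decodes the parameter (decodeURIComponent uses UTF-8).
const lang = decodeURIComponent(rawParam);

// Step 3: the redirect target is built from user input; the Unicode-level
// CR/LF check finds nothing, because č and Ċ are not CR or LF.
const locationHeader = "/" + lang;
console.log(/[\r\n]/.test(locationHeader)); // false

// Step 4: the empty-body response head is serialized by keeping only the low
// 8 bits of each UTF-16 code unit ("ascii"/"binary" folding), modeled here
// as a string of byte-valued characters.
const head = "HTTP/1.1 302 Found\r\nLocation: " + locationHeader + "\r\n\r\n";
let wire = "";
for (let i = 0; i < head.length; i++) {
  wire += String.fromCharCode(head.charCodeAt(i) & 0xff);
}

// The folded bytes now contain a CR LF pair, splitting the header in two.
const lines = wire.split("\r\n");
console.log(lines[1]); // Location: /en
console.log(lines[2]); // Set-Cookie: session=attacker
```

A client reading these bytes sees a separate `Set-Cookie` header that the application never intended to send; a longer payload could go further and inject additional response content.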

This is an example of what a determined attacker (that was me in this case…) with enough resources (time, access to source code) can do: find a subtlety that was missed and understand how to exploit it. This ties back to my earlier post, Why Software Security is Hard (http://blog.safebreach.com/2016/01/06/why-software-security-is-hard/). In that post I explained that an attacker can orchestrate an attack out of less probable events. Back to the case at hand: feeding the Node.js application characters that are folded into CR/LF is very unlikely when the application is consumed by non-malicious users. On top of that, an attacker needs to craft the even less likely HTTP Response Splitting payload in order to actually split the HTTP response stream in a useful manner. Finally, the attack surface expanded significantly with the late 2013 modifications, illustrating how difficult it is to maintain the security of a software product even when security awareness and best practices are followed.
