FaustCTF’17: Doodle Writeup
Although we were a bit late to the party of exploitation, I nevertheless liked the idea behind doodle’s vulnerability, so I decided to do a write-up (even if there is not first-blood money coming…)
In FAUSTCTF, there was a service dubbed Doodle, which provided the same basic features as its namesake. You could create a poll and have people vote for different options. To secure this, an access token was required for each poll. Whenever a poll was created or updated, you would be presented with a new token, but yet the old token still worked. All this could already be deduced without even looking at the source.
The Source Code
While many services in a CTF are written in “main-stream” languages such as Python, PHP, or C, this was a slightly different situation. FAUST took it on themselves to annoy the participants with Smalltalk ;-) Nevertheless, let’s try to understand what is happening here. First, we have a look at the interesting case from above, i.e., the fact that multiple access tokens work at the same time. The source for displaying a token (in
poll.st) sheds some light on this:
Here, we observe that a new
Token is created, and the
tokenvalue is set to carry the poll id (which basically is a UUID). Next,
cryptography is called for the token, and finally it is serialized before being inserted into the HTML body. So, let’s try and understand what is happening in
cryptography first. To that end, we open up
token.st and find the following code:
Here, we find a number of things. First, the algorithm used is TwoFish in CBC mode. The initialization vector (IV) is generated randomly each time. Once the encryption is done, a SHA1 HMAC is derived from the resulting ciphertext, but importantly does not include the IV. Let’s now look at the source for the serialize and deserialize functionality.
Even without an meaningful understanding of the language itself, we can see that when serializing the token, it merely is the base64 representation of the calculated HMAC, the IV, and the resulting ciphertext. Similarly, when deserializing, HMAC, IV, and ciphertext are “unpacked” accordingly.
Cipher Block Chaining
Cipher Block Chaining or CBC for short is a crypto mode which is meant to ensure that even if the same exact plain text is sent multiple times with the same key, the resulting crypt texts will look completely different. In a nutshell, CBC works as follows:
CIPHER_n = encrypt(PLAIN_n ^ CIPHER_n-1, key)
That is, we take the resulting ciphertext of the previous block and XOR it on the current plain text block. The result is then encrypted. Since naturally for the first cipher block, no previous encrypted block exists, an initilization vector is required. Hence, by changing the IV, the actual message being encrypted in the first step is changed. This changes the encrypted output, which subsequently changes all following blocks (since each block depends on the result of the previous cipher block). Decryption works in the same fashion:
PLAIN_n = decrypt(CIPHER_N, key) ^ CIPHER_n-1
For more information on CBC see the Wiki page.
Let’s sum up what we have learned so far:
- Each poll has a poll id. More precisely it is a SandstoneDB identifier (e.g.,
- The poll id is encrypted using TwoFish CBC, with a random IV.
- The encrypted text is secured with an HMAC, but not the IV.
- The token consists of the HMAC, the IV, and the ciphertext.
While I did not look deeper into how the poll ids were generated, an interesting fact is the following: the first entry in our database had the ID
4E5B17E0-D631-11D2-84B3-FA1DE458B368, the second one was
2E6F9860-D632-11D2-84B3-FA1DE458B368. One thing becomes quite clear here: only the first two “blocks” are changed. More important, the blocks are 8 and 4 bytes long, with an added dash in the middle. Hence, only the first 13 characters of the IDs are different, the rest is exactly the same (and based somehow on the current time).
Thinking back to how CBC works, we now see the vulnerability. While the encrypted text can not be mangled as it is secured by the HMAC, we are free to fiddle around with the IV. The IV is used to unXOR the first 16 byte block after the decryption step. Let’s assume that the gameserver created a poll with its ID starting with
4E5B..., and we created the second poll, getting the ID
2E6F.... We can learn the poll id for a given poll and we know our token, which encrypts our ID
For the sake of simplicity, let’s only focus on the first byte of our IV. We assume that is an “A” (0x41 in hex). Considering our token
2E6F..., the decryption looks something like this:
PLAIN_1 = decrypt(CIPHER_1, key) ^ 'A....'
PLAIN_1 = decrypt(CIPHER_1, key) ^ 'A' = '2'
If we now want to change the ‘2’ into a ‘4’, we merely have to change the first byte of the IV, such that it is
'A' ^ '2' ^ '4' = 'G'. This allows us to produce a plaintext of our choosing after the XORing is done. This process can be applied in the same fashion to each of the first 16 bytes. Moreover, since we only modify the IV, the HMAC still works perfectly fine and no other plaintext block is changed. The horrible Python code exploiting this is shown below:
Patching the flaw
While we did not actually manage to patch the issue in the CTF (simply because we did not compile the updated
token.st file, since it was not in the Makefile and we did not notice..), there are several ways of addressing the issue. The dirty option we pursued, which merely would have held up exploits, was to change the order of the IV and HMAC in the serialized token. Assuming that nobody is specifically looking only at us, that would have sufficed. There are however three better ways of doing it.
- Make the IV part of the HMAC calculation.
- Make the IV static (similar to the TwoFish and HMAC keys).
- Remove CBC altogether, since it really has no use here ;)
I commend FAUST for taking the effort to program a service in Smalltalk which had this subtle bug in a lot of other code ;-) We were a little late to join the party of exploiting teams, also because I screwed up the exploit a bit in the first try. When trying to extract the token from the newly created poll, I calculated the required number of chars: 20 bytes HMAC, 16 bytes IV, and 48 bytes encrypted message (poll ids are only 36 bytes, but all messages must be padded to a multiple of 16 bytes). My snafu in that case was that I did not account for the base64 encoding: my regular expression was looking for 84 bytes rather than the required 112. It took me almost an hour to figure out my issue and I felt really stupid afterwards. Nevertheless, we managed to get 320 of the 367 successfully submitted Doodle flags with this exploit.
As my colleague Oliver already mentioned, the CTF ran quite nicely. They had a good mix of different services and I am happy we finished 3rd overall (see final scoreboard)