Saarsec

saarsec

Schwenk and pwn

PlaidCTF 2020 - golf.so

Or how to use Z3 to create tiny ELFs.

PlaidCTF a few week ago featured a nice codegolf-style challenged named golf.so. The task was simple:

Upload a 64-bit ELF shared object of size at most 1024 bytes. It should spawn a shell (execute execve("/bin/sh", ["/bin/sh"], ...)) when used like

LD_PRELOAD=<upload> /bin/true

This was accompanied by a nice scoreboard of the smallest submissions received so far. Although we did not solve the challenge during the CTF, I still thought it was neat, and, luckily, the challenge website was up for a few more days after the CTF was over.

Naive Beginnings

As others have noted, using gcc to compile such a file from C does not work:

__attribute__((constructor)) void init(void){
        execve("/bin/sh");
}

even when compiled with gcc -fPIC -shared -Os -o first.so first.c -nostdlib results in a binary of over 14 kilobytes. It was thus time to get creative.

Building Small ELFs by Hand

There are some great posts on creating small ELF executables. However, executables and shared objects are not loaded in the same way. Nonetheless, there are still some great takeaways we can learn from that post:

Armed with that knowledge we thus set out to create the smallest 64-bit ELF shared object possible: a single ELF header. Thankfully, wikipedia lists all header formats in great detail.

We ended up with the following:

e_ident[EI_MAG]        = 7f 45 4c 46 // magic number
e_ident[EI_CLASS]      = 02 // 64-bit
e_ident[EI_DATA]       = 01 // little endian
e_ident[EI_VERSION]    = 01 // current version
e_ident[EI_OSABI]      = 00 // system ABI
e_ident[EI_ABIVERSION] = 00 // ABI version
e_ident[EI_PAD]        = 00 00 00 00 00 00 00 // 7b padding

e_type                 = 03 00 // ET_DYN = shared object
e_machine              = 3e 00 // amd64
e_version              = 01 00 00 00 // "current" ELF version
e_entry                = 00 00 00 00 00 00 00 00 // entry point for executables
e_phoff                = 00 00 00 00 00 00 00 00 // offset to program header table
e_shoff                = 00 00 00 00 00 00 00 00 // offset to section header table
e_flags                = 00 00 00 00 // "flags"?
e_ehsize               = 40 00 // size if the file header (64 bytes for 64-bit)
e_phentsize            = 38 00 // size of a program header (56 bytes for 64-bit)
e_phnum                = 00 00 // no program headers
e_shentsize            = 40 00 // size of a section header (64 bytes for 64-bit)
e_shnum                = 00 00 // no section headers
e_shstrndx             = 00 00 // index of section header that contains section names

All that is now left to do is modify this file until it is accepted by the loader and can spawn a shell. Luckily, the loader itself is quite helpful at that task, e.g., invoking it with the header-only object from above results in

$ LD_PRELOAD=./minimal.so /bin/true                                                      
ERROR: ld.so: object './minimal.so' from LD_PRELOAD cannot be preloaded (object file has no loadable segments): ignored.

This lets us know that we need to add a loadable segment (and thus a program header).

First however, we need a nice way to be able to add sections that can be automatically overlapped.

Letting Z3 do the dirty work

Looking at the header again, we can see that we have three types of fields:

This is something that can be modelled nicely using z3’s array logic.

elf = z3.Array('elf', z3.BitVecSort(8), z3.BitVecSort(8))

Creates a new Z3-Array (named elf) of 8-bit values (i.e. bytes), where the indices are also 8-bit values (by the time we attempted to solve the task there were already solutions below 256 bytes, so 8-bit indices should suffice).

We can then create a new solver instance and add the constraints from above like so:

s = z3.SolverFor('QF_ABV')
s.add(elf[0] == 0x7f)
s.add(elf[1] == 0x45)
s.add(elf[2] == 0x4c)
s.add(elf[3] == 0x46)

What makes z3 arrays really powerful for this task is the fact, that array indices themselves can also be symbolic variables. For example, if we want to ensure that e_phoff (offset 0x20 in the elf header) points to some byte that contains the value 1, we could create a new symbolic variable phoff (as another 8-bit variable) and write

phoff = z3.BitVec('phoff', 8)
s.add(elf[0x20] == phoff)
s.add(elf[phoff] == 0x1)

Keeping track of used bytes

To be able to constrain not only single bytes, but also multi-byte fields (such as the various 8-byte pointers), we can abuse python’s slice notation. At the same time we can also introduce a symbolic variable to keep track of the required filesize:

class Golfer:
    def __init__(self):
        self.s = z3.SolverFor('QF_ABV')
        self.elf = z3.Array('elf', z3.BitVecSort(8), z3.BitVecSort(8))
        self.maxsize = z3.BitVec('maxsize', 8)

    def add(self, *args):
        self.s.add(*args)    

    def __getitem__(self, key):
        if isinstance(key, slice):
            return z3.Concat(*(self[key.start + i] for i in reversed(range(0, key.stop, key.step or 1))))
        else:
            self.add(z3.Or(self.elf[key] == 0, z3.ULT(key, self.maxsize)))
            return self.elf[key]

Remember that trailing zero-bytes can be omitted. Therefore we can add a constraint that maxsize only has to be larger than the currently accessed byte if the byte itself is non-zero.

We also abuse the slice notation to mean <start>:<length> rather than <start>:<end> because it makes handling symbolic offsets easier.

With this we can write the minimal header-only object from above as

    def file_header(self):
        self.add(self[0x0:4] == int.from_bytes(b'\x7fELF', byteorder='little'))
        self.add(self[0x4] == 2)  # 64-bit
        self.add(self[0x5] == 1)  # little-endian
        self.add(self[0x6] == 1)  # "current" ELF version
        self.add(self[0x7] == 0)  # System-V-ABI
        self.add(self[0x8] == 0)  # EI_ABIVERSION
        self.add(self[0x9:7] == 0)  # EI_PAD
        self.add(self[e_type:2] == 3)  # ET_DYN
        self.add(self[e_machine:2] == 0x3e)  # AMD64
        self.add(self[e_version:4] == 1)  # "current" ELF version (again?)
        self.add(self[e_phoff:8] == z3.ZeroExt(64 - 8, self.phoff))
        self.add(self[e_phentsize:2] == self.phentsize)
        self.add(self[e_phnum:2] == self.phnum)

Collecting Constraints

With this setup we could then “simply” modify our ELF (e.g. add a section or constrain another value) and see at which point the loader would complain. While the loader itself is already quite verbose, what helped at this stage was:

After spending quite a lot of time in a modify-crash-debug loop, here are the constraints we ended up with:

First, we need one loadable segment (type PT_LOAD). This segment must be marked RWX, must be aligned to page boundaries, must fit in the file, and must include our shellcode.

Second, we need a dynamic segment (type PT_DYNAMIC). The virtual address of this segment must point to an array of dynamic tags (ELF64_Dyn structs), its memorysize must fit the tag array, and its filesize must be non-zero.

Finally, we need four dynamic tags:

We can add these as constraints like so:

    def write_segments(self):
        # first PT_LOAD segment
        self.add(self[self.phoff + p_type:4] == 1)  # PT_LOAD (loadable segment)
        self.add(self[self.phoff + p_flags:4] & 0b111 == 0b111)  # ensure RWX
        self.add(self[self.phoff + p_offset:8] == 0)  # load everything (=always aligned!)
        self.add(self[self.phoff + p_vaddr:8] == 0)  # otherwise _dl_map_segments shits its pants
        self.add((self[self.phoff + p_offset:8] - self[self.phoff + p_vaddr:8]) & 0xfff == 0)  # align
        self.add(self[self.phoff + p_filesz:8] == z3.ZeroExt(64 - 8, self.maxsize))  # load at most all of this file
        self.add(
            self[self.phoff + p_memsz:8] == self[self.phoff + p_filesz:8])  # make virtual size at least physical size
        self.add(self[self.phoff + p_align:8] == 0x1000)  # page aligned

        # second PT_DYNAMIC segment
        self.add(self[self.phoff + self.phentsize + p_type:4] == 2)  # PT_DYNAMIC (dynamic segment)
        self.add(self[self.phoff + self.phentsize + p_vaddr:8] == z3.ZeroExt(64 - 8, self.ld))
        self.add(self.ld != 0) # segment pointer must be non-zero
        self.add(self[self.phoff + self.phentsize + p_memsz:8] == self.ldnum * self.dyn_sz)
        self.add(self[self.phoff + self.phentsize + p_filesz:8] != 0)  # filesz must be != 0

    def write_ld(self):
        self.add(z3.ULT(self.strtab, self.maxsize))  # ensure it's at least within our file
        self.add(z3.ULT(self.symtab, self.maxsize))  # ensure it's at least within our file
        tags = [
            (DT_INIT, z3.ZeroExt(64 - 8, self.init)),
            (DT_STRTAB, z3.ZeroExt(64 - 8, self.strtab)),
            (DT_SYMTAB, z3.ZeroExt(64 - 8, self.symtab))
        ]

        for i, (k, v) in enumerate(tags):
            tag_offset = self.ld + (i * 0x10)
            self.add(self[tag_offset + 0x0:8] == k)
            self.add(self[tag_offset + 0x8:8] == v)

        self.add(self[self.ld + len(tags) * 0x10: 8] == DT_NULL)

Shellcode

We now have a 64-bit ELF shared object that can be successfully loaded and even jumps to an instruction under our control (at symbolic location init). All that’s now left is to spawn a shell (with execve("/bin/sh", ["/bin/sh"], ...)).

The shortest code we could come up with was

mov rdi, 0x68732f6e69622f
push 0
push rdi
mov rsi, rsp
xor eax, eax
mov al, 0x3b
syscall

Which we could add to our ELF as follows:

    def write_shellcode(self):
        shellcode = '''
            mov rdi, 0x68732f6e69622f;
            push 0;
            push rdi;
            mov rsi, rsp;
            xor eax, eax;
            mov al, 0x3b;
            syscall;
            '''.strip()

        code = asm.asm(shellcode)
        for j, b in enumerate(code):
            self.add(self[self.init + j] == b)

However, for optimal golfing we would rather not have our shellcode as one giant, continuous blob, but rather as small chunks.

The first “optimization” in that regard is placing the /bin/sh\0 constant somewhere else in memory and using rip-relative addressing (lea rdi, [rip+<something>]). For this we just need to create another symbolic address for the /bin/sh\0 string and then make sure we fix the offset our lea instruction accordingly.

As a second step we can also insert jump instructions between our shellcode instructions to jump from one blob to another. Here, we can again leverage z3 to create a constraint that the after an instruction we either place the next shellcode instruction or a relative jump to the next instruction.

Together, we end up with the following:

    def write_shellcode(self):
        # place /bin/sh into memory
        for i, x in enumerate(b'/bin/sh\0'):
            self.add(self[self.binsh + i] == x)

        shellcode = '''
                lea rdi, [rip+0x0c0c0c0c];
                push 0;
                push rdi;
                mov rsi, rsp;
                xor eax, eax;
                mov al, 0x3b;
                syscall;
                '''.strip().splitlines()

        # create array of instruction locations
        # first instruction must be at `init`, later instructions can be wherever
        ins_loc = [self.init] + [z3.BitVec('ins_%02d' % (j + 1), 8) for j in range(len(shellcode) - 1)]

        for j, ins in enumerate(shellcode):
            code = asm.asm(ins)
            loc = ins_loc[j]

            fixed = False
            for i, x in enumerate(code):
                if x != 0x0c:
                    self.add(self[loc + i] == x)
                else:
                    if not fixed:
                        self.add(self[loc + i:4] == z3.SignExt(32 - 8, self.binsh - loc - (i + 4)))
                        fixed = True

            # if there is a next instruction, make sure it 
            # follows immediately *or* is jumped to
            if j + 1 < len(ins_loc):
                loc_after = loc + len(code)
                next_loc = ins_loc[j + 1]
                self.add(z3.Or(
                    self[loc_after:2] == z3.Concat(next_loc - (loc_after + 2), z3.BitVecVal(0xeb, 8)),  # have a jump
                    loc_after == next_loc  # be the next instruction
                ))

We used 0x0c as a marker byte to fix-up the lea instruction, and hardcoded the 0xeb for relative jumps.

Computing the smallest ELF

With all constraints set up we now only needed a way to minimize the total filesize. However, as we already had a symbolic variable for that (maxsize), we could simply check whether z3 could find a satisfying solution with maxsize < X and decrease X until this was no longer true.

With this we ended up with a 163 byte solution, which was good enough to give us both flags (albeit a few days too late to score them):

00000000  7f 45 4c 46  02 01 01 00  00 00 00 00  00 00 00 00  │·ELF│····│····│····│
00000010  03 00 3e 00  01 00 00 00  0f 05 01 00  00 00 0f 05  │··>·│····│····│····│
00000020  1a 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  │····│····│····│····│
00000030  00 00 31 c0  eb 20 38 00  02 00 a3 00  00 00 00 00  │··1·│· 8·│····│····│
00000040  00 00 a3 00  00 00 00 00  00 00 00 10  00 00 00 00  │····│····│····│····│
00000050  00 00 02 00  00 00 b0 3b  eb be 2f 62  69 6e 2f 73  │····│···;│··/b│in/s│
00000060  68 00 82 00  00 00 00 00  00 00 48 8d  3d e9 ff ff  │h···│····│··H·│=···│
00000070  ff 6a 00 57  48 89 e6 eb  b9 40 30 00  00 00 00 00  │·j·W│H···│·@0·│····│
00000080  00 00 0c 00  00 00 00 00  00 00 6a 00  00 00 00 00  │····│····│··j·│····│
00000090  00 00 05 00  00 00 00 00  00 00 20 00  00 00 00 00  │····│····│·· ·│····│
000000a0  00 00 06                                            │···│

Full script

Here is the full script for completeness:

import z3
from pwnlib import *
from pwnlib.util import fiddling

context.context(arch='amd64', os='linux')

# elf header
e_ident = 0x0
e_type = 0x10
e_machine = 0x12
e_version = 0x14
e_entry = 0x18
e_phoff = 0x20
e_shoff = 0x28
e_flags = 0x30
e_ehsize = 0x34
e_phentsize = 0x36
e_phnum = 0x38
e_shentsize = 0x3a
e_shnum = 0x3c
e_shstrndx = 0x3e

# program header
p_type = 0x00
p_flags = 0x04
p_offset = 0x08
p_vaddr = 0x10
p_paddr = 0x18
p_filesz = 0x20
p_memsz = 0x28
p_align = 0x30

DT_NULL = 0
DT_STRTAB = 0x5
DT_SYMTAB = 0x6
DT_INIT = 0xc


class Golfer:
    def __init__(self):
        self.s = z3.SolverFor('QF_ABV')
        # there are 136 byte entries => 8 bit indices suffice
        self.ELF = z3.Array('elf', z3.BitVecSort(8), z3.BitVecSort(8))
        self.maxsize = z3.BitVec('maxsize', 8)
        self.phoff = z3.BitVec('phoff', 8)
        self.shoff = z3.BitVec('shoff', 8)
        self.flags = z3.BitVec('flag', 4 * 8)
        self.phentsize = 0x38  # 64-bit program header
        self.phnum = 2
        self.shentsize = 0x40  # 64-bit section header
        self.shnum = 0
        self.shstrndx = z3.BitVec('shstrndx', 8)
        self.init = z3.BitVec('init', 8)
        self.code_size = z3.BitVec('codesize', 8)
        self.ld = z3.BitVec('ld', 8)
        self.ldnum = 3
        self.dyn_sz = 16
        self.strtab = z3.BitVec('strtab', 8)
        self.symtab = z3.BitVec('symtab', 8)
        self.binsh = z3.BitVec('binsh', 8)

        self.file_header()
        self.write_segments()
        self.write_ld()
        self.write_shellcode()

    def add(self, *args):
        self.s.add(*args)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return z3.Concat(*(self[key.start + i] for i in reversed(range(0, key.stop, key.step or 1))))
        else:
            self.add(z3.Or(self.ELF[key] == 0, z3.ULT(key, self.maxsize)))
            return self.ELF[key]

    def file_header(self):
        self.add(self[0x0:4] == int.from_bytes(b'\x7fELF', byteorder='little'))
        self.add(self[0x4] == 2)  # 64-bit
        self.add(self[0x5] == 1)  # little-endian
        self.add(self[0x6] == 1)  # "current" ELF version
        self.add(self[0x7] == 0)  # System-V-ABI
        self.add(self[0x8] == 0)  # EI_ABIVERSION
        self.add(self[0x9:7] == 0)  # EI_PAD
        self.add(self[e_type:2] == 3)  # ET_DYN
        self.add(self[e_machine:2] == 0x3e)  # AMD64
        self.add(self[e_version:4] == 1)  # "current" ELF version (again?)
        self.add(self[e_phoff:8] == z3.ZeroExt(64 - 8, self.phoff))
        self.add(self[e_phentsize:2] == self.phentsize)
        self.add(self[e_phnum:2] == self.phnum)

    def write_segments(self):
        # first PT_LOAD segment
        self.add(self[self.phoff + p_type:4] == 1)  # PT_LOAD (loadable segment)
        self.add(self[self.phoff + p_flags:4] & 0b111 == 0b111)  # ensure RWX
        self.add(self[self.phoff + p_offset:8] == 0)  # load everything (=always aligned!)
        self.add(self[self.phoff + p_vaddr:8] == 0)  # otherwise _dl_map_segments shits its pants
        self.add((self[self.phoff + p_offset:8] - self[self.phoff + p_vaddr:8]) & 0xfff == 0)  # align
        self.add(self[self.phoff + p_filesz:8] == z3.ZeroExt(64 - 8, self.maxsize))  # load at most all of this file
        self.add(
            self[self.phoff + p_memsz:8] == self[self.phoff + p_filesz:8])  # make virtual size at least physical size
        self.add(self[self.phoff + p_align:8] == 0x1000)  # page aligned

        # second PT_DYNAMIC segment
        self.add(self[self.phoff + self.phentsize + p_type:4] == 2)  # PT_DYNAMIC (dynamic segment)
        self.add(self[self.phoff + self.phentsize + p_vaddr:8] == z3.ZeroExt(64 - 8, self.ld))
        self.add(self.ld != 0) # segment pointer must be non-zero
        self.add(self[self.phoff + self.phentsize + p_memsz:8] == self.ldnum * self.dyn_sz)
        self.add(self[self.phoff + self.phentsize + p_filesz:8] != 0)  # filesz must be != 0

    def write_ld(self):
        self.add(z3.ULT(self.strtab, self.maxsize))  # ensure it's at least within our file
        self.add(z3.ULT(self.symtab, self.maxsize))  # ensure it's at least within our file
        tags = [
            (DT_INIT, z3.ZeroExt(64 - 8, self.init)),
            (DT_STRTAB, z3.ZeroExt(64 - 8, self.strtab)),
            (DT_SYMTAB, z3.ZeroExt(64 - 8, self.symtab))
        ]

        for i, (k, v) in enumerate(tags):
            tag_offset = self.ld + (i * 0x10)
            self.add(self[tag_offset + 0x0:8] == k)
            self.add(self[tag_offset + 0x8:8] == v)

        self.add(self[self.ld + len(tags) * 0x10: 8] == DT_NULL)

    def write_shellcode(self):
        # place /bin/sh into memory
        for i, x in enumerate(b'/bin/sh\0'):
            self.add(self[self.binsh + i] == x)

        shellcode = '''
                lea rdi, [rip+0x0c0c0c0c];
                push 0;
                push rdi;
                mov rsi, rsp;
                xor eax, eax;
                mov al, 0x3b;
                syscall;
                '''.strip().splitlines()

        # create array of instruction locations
        # first instruction must be at `init`, later instructions can be wherever
        ins_loc = [self.init] + [z3.BitVec('ins_%02d' % (j + 1), 8) for j in range(len(shellcode) - 1)]

        for j, ins in enumerate(shellcode):
            code = asm.asm(ins)
            loc = ins_loc[j]

            fixed = False
            for i, x in enumerate(code):
                if x != 0x0c:
                    self.add(self[loc + i] == x)
                else:
                    if not fixed:
                        self.add(self[loc + i:4] == z3.SignExt(32 - 8, self.binsh - loc - (i + 4)))
                        fixed = True

            # if there is a next instruction, make sure it 
            # follows immediately *or* is jumped to
            if j + 1 < len(ins_loc):
                loc_after = loc + len(code)
                next_loc = ins_loc[j + 1]
                self.add(z3.Or(
                    self[loc_after:2] == z3.Concat(next_loc - (loc_after + 2), z3.BitVecVal(0xeb, 8)),  # have a jump
                    loc_after == next_loc  # be the next instruction
                ))

    def get_output(self, concrete_max_size=255):
        self.s.push()
        self.s.add(self.maxsize == concrete_max_size)
        if self.s.check() != z3.sat:
            self.s.pop()
            raise ValueError()
        self.s.pop()
        m = self.s.model()
        info = {}
        info['maxsize'] = m.eval(self.maxsize).as_long()
        info['phoff'] = m.eval(self.phoff).as_long()
        info['init'] = m.eval(self.init).as_long()
        info['ld'] = m.eval(self.ld).as_long()
        info['strtab'] = m.eval(self.strtab).as_long()
        info['symtab'] = m.eval(self.symtab).as_long()
        info['binsh'] = m.eval(self.binsh).as_long()

        o = bytes(m.eval(self.ELF[i]).as_long() for i in range(concrete_max_size))
        return o, info


def main():
    g = Golfer()
    lo = 1
    hi = 256
    elf = None
    info = None
    while lo < hi:
        m = (lo + hi) // 2
        try:
            elf, info = g.get_output(m)
            hi = m
        except ValueError:
            lo = m + 1
    if elf:
        print(f"Got {lo} bytes:")
        print('maxsize = %d' % info['maxsize'])
        print('phoff @ 0x%02x' % info['phoff'])
        print('init @ 0x%02x' % info['init'])
        print('ld @ 0x%02x' % info['ld'])
        print('strtab @ 0x%02x' % info['strtab'])
        print('symtab @ 0x%02x' % info['symtab'])
        print('binsh @ 0x%02x' % info['binsh'])
        print(fiddling.hexdump(elf))
        with open('gen.so', 'wb') as f:
            f.write(elf)
    else:
        print("I'm sorry")


if __name__ == '__main__':
    main()

Conclusion

Super fun golfing challenge that I now regret to not have tackled during the CTF! Also, I’m still super curious how others managed to get 136 byte solutions…