PlaidCTF 2020 - golf.so
14 May 2020 by jkrupp
Or how to use Z3 to create tiny ELFs.
PlaidCTF a few week ago featured a nice codegolf-style challenged named golf.so. The task was simple:
Upload a 64-bit ELF shared object of size at most 1024 bytes. It should spawn a shell (execute
execve("/bin/sh", ["/bin/sh"], ...)
) when used likeLD_PRELOAD=<upload> /bin/true
This was accompanied by a nice scoreboard of the smallest submissions received so far. Although we did not solve the challenge during the CTF, I still thought it was neat, and, luckily, the challenge website was up for a few more days after the CTF was over.
Naive Beginnings
As others have noted, using gcc to compile such a file from C
does not work:
__attribute__((constructor)) void init(void){
execve("/bin/sh");
}
even when compiled with gcc -fPIC -shared -Os -o first.so first.c -nostdlib
results in a binary of over 14 kilobytes.
It was thus time to get creative.
Building Small ELFs by Hand
There are some great posts on creating small ELF executables. However, executables and shared objects are not loaded in the same way. Nonetheless, there are still some great takeaways we can learn from that post:
- An ELF file must start with an ELF file header (duh.)
- Program and Section headers can be placed anywhere in the file and may even overlap other parts.
- Trailing zero bytes can be omitted.
Armed with that knowledge we thus set out to create the smallest 64-bit ELF shared object possible: a single ELF header. Thankfully, wikipedia lists all header formats in great detail.
We ended up with the following:
e_ident[EI_MAG] = 7f 45 4c 46 // magic number
e_ident[EI_CLASS] = 02 // 64-bit
e_ident[EI_DATA] = 01 // little endian
e_ident[EI_VERSION] = 01 // current version
e_ident[EI_OSABI] = 00 // system ABI
e_ident[EI_ABIVERSION] = 00 // ABI version
e_ident[EI_PAD] = 00 00 00 00 00 00 00 // 7b padding
e_type = 03 00 // ET_DYN = shared object
e_machine = 3e 00 // amd64
e_version = 01 00 00 00 // "current" ELF version
e_entry = 00 00 00 00 00 00 00 00 // entry point for executables
e_phoff = 00 00 00 00 00 00 00 00 // offset to program header table
e_shoff = 00 00 00 00 00 00 00 00 // offset to section header table
e_flags = 00 00 00 00 // "flags"?
e_ehsize = 40 00 // size if the file header (64 bytes for 64-bit)
e_phentsize = 38 00 // size of a program header (56 bytes for 64-bit)
e_phnum = 00 00 // no program headers
e_shentsize = 40 00 // size of a section header (64 bytes for 64-bit)
e_shnum = 00 00 // no section headers
e_shstrndx = 00 00 // index of section header that contains section names
All that is now left to do is modify this file until it is accepted by the loader and can spawn a shell. Luckily, the loader itself is quite helpful at that task, e.g., invoking it with the header-only object from above results in
$ LD_PRELOAD=./minimal.so /bin/true
ERROR: ld.so: object './minimal.so' from LD_PRELOAD cannot be preloaded (object file has no loadable segments): ignored.
This lets us know that we need to add a loadable segment (and thus a program header).
First however, we need a nice way to be able to add sections that can be automatically overlapped.
Letting Z3 do the dirty work
Looking at the header again, we can see that we have three types of fields:
- Fields that always have a fixed value, such as the magic number at the beginning
- Fields that depend on other structures, such as
e_phoff
- Fields that may have any value, such as
e_entry
This is something that can be modelled nicely using z3’s array logic.
elf = z3.Array('elf', z3.BitVecSort(8), z3.BitVecSort(8))
Creates a new Z3-Array (named elf
) of 8-bit values (i.e. bytes), where the indices are also 8-bit values (by the time we attempted to solve the task there were already solutions below 256 bytes, so 8-bit indices should suffice).
We can then create a new solver instance and add the constraints from above like so:
s = z3.SolverFor('QF_ABV')
s.add(elf[0] == 0x7f)
s.add(elf[1] == 0x45)
s.add(elf[2] == 0x4c)
s.add(elf[3] == 0x46)
What makes z3 arrays really powerful for this task is the fact, that array indices themselves can also be symbolic variables.
For example, if we want to ensure that e_phoff
(offset 0x20
in the elf header) points to some byte that contains the value 1
, we could create a new symbolic variable phoff
(as another 8-bit variable) and write
phoff = z3.BitVec('phoff', 8)
s.add(elf[0x20] == phoff)
s.add(elf[phoff] == 0x1)
Keeping track of used bytes
To be able to constrain not only single bytes, but also multi-byte fields (such as the various 8-byte pointers), we can abuse python’s slice notation. At the same time we can also introduce a symbolic variable to keep track of the required filesize:
class Golfer:
def __init__(self):
self.s = z3.SolverFor('QF_ABV')
self.elf = z3.Array('elf', z3.BitVecSort(8), z3.BitVecSort(8))
self.maxsize = z3.BitVec('maxsize', 8)
def add(self, *args):
self.s.add(*args)
def __getitem__(self, key):
if isinstance(key, slice):
return z3.Concat(*(self[key.start + i] for i in reversed(range(0, key.stop, key.step or 1))))
else:
self.add(z3.Or(self.elf[key] == 0, z3.ULT(key, self.maxsize)))
return self.elf[key]
Remember that trailing zero-bytes can be omitted.
Therefore we can add a constraint that maxsize
only has to be larger than the currently accessed byte if the byte itself is non-zero.
We also abuse the slice notation to mean <start>:<length>
rather than <start>:<end>
because it makes handling symbolic offsets easier.
With this we can write the minimal header-only object from above as
def file_header(self):
self.add(self[0x0:4] == int.from_bytes(b'\x7fELF', byteorder='little'))
self.add(self[0x4] == 2) # 64-bit
self.add(self[0x5] == 1) # little-endian
self.add(self[0x6] == 1) # "current" ELF version
self.add(self[0x7] == 0) # System-V-ABI
self.add(self[0x8] == 0) # EI_ABIVERSION
self.add(self[0x9:7] == 0) # EI_PAD
self.add(self[e_type:2] == 3) # ET_DYN
self.add(self[e_machine:2] == 0x3e) # AMD64
self.add(self[e_version:4] == 1) # "current" ELF version (again?)
self.add(self[e_phoff:8] == z3.ZeroExt(64 - 8, self.phoff))
self.add(self[e_phentsize:2] == self.phentsize)
self.add(self[e_phnum:2] == self.phnum)
Collecting Constraints
With this setup we could then “simply” modify our ELF (e.g. add a section or constrain another value) and see at which point the loader would complain. While the loader itself is already quite verbose, what helped at this stage was:
- adding
LD_DEBUG=all
increases output - obtaining a debug-build of the loader (archlinux users:
asp export glibc
, then modify thePKGBUILD
to includeoptions=(debug !strip)
, rebuild withmakepkg
, done) coredumpctl <pid> gdb
can automagically load a coredump into gdb
After spending quite a lot of time in a modify-crash-debug loop, here are the constraints we ended up with:
First, we need one loadable segment (type PT_LOAD
).
This segment must be marked RWX
, must be aligned to page boundaries, must fit in the file, and must include our shellcode.
Second, we need a dynamic segment (type PT_DYNAMIC
).
The virtual address of this segment must point to an array of dynamic tags (ELF64_Dyn
structs), its memorysize must fit the tag array, and its filesize must be non-zero.
Finally, we need four dynamic tags:
- a
DT_INIT
tag, that contains the address of our shellcode - a
DT_STRTAB
tag, that must point inside our file - a
DT_SYMTAB
tag, that must point inside out file - a final
DT_NULL
tag to mark the end of the array
We can add these as constraints like so:
def write_segments(self):
# first PT_LOAD segment
self.add(self[self.phoff + p_type:4] == 1) # PT_LOAD (loadable segment)
self.add(self[self.phoff + p_flags:4] & 0b111 == 0b111) # ensure RWX
self.add(self[self.phoff + p_offset:8] == 0) # load everything (=always aligned!)
self.add(self[self.phoff + p_vaddr:8] == 0) # otherwise _dl_map_segments shits its pants
self.add((self[self.phoff + p_offset:8] - self[self.phoff + p_vaddr:8]) & 0xfff == 0) # align
self.add(self[self.phoff + p_filesz:8] == z3.ZeroExt(64 - 8, self.maxsize)) # load at most all of this file
self.add(
self[self.phoff + p_memsz:8] == self[self.phoff + p_filesz:8]) # make virtual size at least physical size
self.add(self[self.phoff + p_align:8] == 0x1000) # page aligned
# second PT_DYNAMIC segment
self.add(self[self.phoff + self.phentsize + p_type:4] == 2) # PT_DYNAMIC (dynamic segment)
self.add(self[self.phoff + self.phentsize + p_vaddr:8] == z3.ZeroExt(64 - 8, self.ld))
self.add(self.ld != 0) # segment pointer must be non-zero
self.add(self[self.phoff + self.phentsize + p_memsz:8] == self.ldnum * self.dyn_sz)
self.add(self[self.phoff + self.phentsize + p_filesz:8] != 0) # filesz must be != 0
def write_ld(self):
self.add(z3.ULT(self.strtab, self.maxsize)) # ensure it's at least within our file
self.add(z3.ULT(self.symtab, self.maxsize)) # ensure it's at least within our file
tags = [
(DT_INIT, z3.ZeroExt(64 - 8, self.init)),
(DT_STRTAB, z3.ZeroExt(64 - 8, self.strtab)),
(DT_SYMTAB, z3.ZeroExt(64 - 8, self.symtab))
]
for i, (k, v) in enumerate(tags):
tag_offset = self.ld + (i * 0x10)
self.add(self[tag_offset + 0x0:8] == k)
self.add(self[tag_offset + 0x8:8] == v)
self.add(self[self.ld + len(tags) * 0x10: 8] == DT_NULL)
Shellcode
We now have a 64-bit ELF shared object that can be successfully loaded and even jumps to an instruction under our control (at symbolic location init
).
All that’s now left is to spawn a shell (with execve("/bin/sh", ["/bin/sh"], ...)
).
The shortest code we could come up with was
mov rdi, 0x68732f6e69622f
push 0
push rdi
mov rsi, rsp
xor eax, eax
mov al, 0x3b
syscall
Which we could add to our ELF as follows:
def write_shellcode(self):
shellcode = '''
mov rdi, 0x68732f6e69622f;
push 0;
push rdi;
mov rsi, rsp;
xor eax, eax;
mov al, 0x3b;
syscall;
'''.strip()
code = asm.asm(shellcode)
for j, b in enumerate(code):
self.add(self[self.init + j] == b)
However, for optimal golfing we would rather not have our shellcode as one giant, continuous blob, but rather as small chunks.
The first “optimization” in that regard is placing the /bin/sh\0
constant somewhere else in memory and using rip
-relative addressing (lea rdi, [rip+<something>]
).
For this we just need to create another symbolic address for the /bin/sh\0
string and then make sure we fix the offset our lea
instruction accordingly.
As a second step we can also insert jump instructions between our shellcode instructions to jump from one blob to another. Here, we can again leverage z3 to create a constraint that the after an instruction we either place the next shellcode instruction or a relative jump to the next instruction.
Together, we end up with the following:
def write_shellcode(self):
# place /bin/sh into memory
for i, x in enumerate(b'/bin/sh\0'):
self.add(self[self.binsh + i] == x)
shellcode = '''
lea rdi, [rip+0x0c0c0c0c];
push 0;
push rdi;
mov rsi, rsp;
xor eax, eax;
mov al, 0x3b;
syscall;
'''.strip().splitlines()
# create array of instruction locations
# first instruction must be at `init`, later instructions can be wherever
ins_loc = [self.init] + [z3.BitVec('ins_%02d' % (j + 1), 8) for j in range(len(shellcode) - 1)]
for j, ins in enumerate(shellcode):
code = asm.asm(ins)
loc = ins_loc[j]
fixed = False
for i, x in enumerate(code):
if x != 0x0c:
self.add(self[loc + i] == x)
else:
if not fixed:
self.add(self[loc + i:4] == z3.SignExt(32 - 8, self.binsh - loc - (i + 4)))
fixed = True
# if there is a next instruction, make sure it
# follows immediately *or* is jumped to
if j + 1 < len(ins_loc):
loc_after = loc + len(code)
next_loc = ins_loc[j + 1]
self.add(z3.Or(
self[loc_after:2] == z3.Concat(next_loc - (loc_after + 2), z3.BitVecVal(0xeb, 8)), # have a jump
loc_after == next_loc # be the next instruction
))
We used 0x0c
as a marker byte to fix-up the lea
instruction, and hardcoded the 0xeb
for relative jumps.
Computing the smallest ELF
With all constraints set up we now only needed a way to minimize the total filesize.
However, as we already had a symbolic variable for that (maxsize
), we could simply check whether z3 could find a satisfying solution with maxsize < X
and decrease X
until this was no longer true.
With this we ended up with a 163 byte solution, which was good enough to give us both flags (albeit a few days too late to score them):
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 │·ELF│····│····│····│
00000010 03 00 3e 00 01 00 00 00 0f 05 01 00 00 00 0f 05 │··>·│····│····│····│
00000020 1a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 │····│····│····│····│
00000030 00 00 31 c0 eb 20 38 00 02 00 a3 00 00 00 00 00 │··1·│· 8·│····│····│
00000040 00 00 a3 00 00 00 00 00 00 00 00 10 00 00 00 00 │····│····│····│····│
00000050 00 00 02 00 00 00 b0 3b eb be 2f 62 69 6e 2f 73 │····│···;│··/b│in/s│
00000060 68 00 82 00 00 00 00 00 00 00 48 8d 3d e9 ff ff │h···│····│··H·│=···│
00000070 ff 6a 00 57 48 89 e6 eb b9 40 30 00 00 00 00 00 │·j·W│H···│·@0·│····│
00000080 00 00 0c 00 00 00 00 00 00 00 6a 00 00 00 00 00 │····│····│··j·│····│
00000090 00 00 05 00 00 00 00 00 00 00 20 00 00 00 00 00 │····│····│·· ·│····│
000000a0 00 00 06 │···│
Full script
Here is the full script for completeness:
import z3
from pwnlib import *
from pwnlib.util import fiddling
context.context(arch='amd64', os='linux')
# elf header
e_ident = 0x0
e_type = 0x10
e_machine = 0x12
e_version = 0x14
e_entry = 0x18
e_phoff = 0x20
e_shoff = 0x28
e_flags = 0x30
e_ehsize = 0x34
e_phentsize = 0x36
e_phnum = 0x38
e_shentsize = 0x3a
e_shnum = 0x3c
e_shstrndx = 0x3e
# program header
p_type = 0x00
p_flags = 0x04
p_offset = 0x08
p_vaddr = 0x10
p_paddr = 0x18
p_filesz = 0x20
p_memsz = 0x28
p_align = 0x30
DT_NULL = 0
DT_STRTAB = 0x5
DT_SYMTAB = 0x6
DT_INIT = 0xc
class Golfer:
def __init__(self):
self.s = z3.SolverFor('QF_ABV')
# there are 136 byte entries => 8 bit indices suffice
self.ELF = z3.Array('elf', z3.BitVecSort(8), z3.BitVecSort(8))
self.maxsize = z3.BitVec('maxsize', 8)
self.phoff = z3.BitVec('phoff', 8)
self.shoff = z3.BitVec('shoff', 8)
self.flags = z3.BitVec('flag', 4 * 8)
self.phentsize = 0x38 # 64-bit program header
self.phnum = 2
self.shentsize = 0x40 # 64-bit section header
self.shnum = 0
self.shstrndx = z3.BitVec('shstrndx', 8)
self.init = z3.BitVec('init', 8)
self.code_size = z3.BitVec('codesize', 8)
self.ld = z3.BitVec('ld', 8)
self.ldnum = 3
self.dyn_sz = 16
self.strtab = z3.BitVec('strtab', 8)
self.symtab = z3.BitVec('symtab', 8)
self.binsh = z3.BitVec('binsh', 8)
self.file_header()
self.write_segments()
self.write_ld()
self.write_shellcode()
def add(self, *args):
self.s.add(*args)
def __getitem__(self, key):
if isinstance(key, slice):
return z3.Concat(*(self[key.start + i] for i in reversed(range(0, key.stop, key.step or 1))))
else:
self.add(z3.Or(self.ELF[key] == 0, z3.ULT(key, self.maxsize)))
return self.ELF[key]
def file_header(self):
self.add(self[0x0:4] == int.from_bytes(b'\x7fELF', byteorder='little'))
self.add(self[0x4] == 2) # 64-bit
self.add(self[0x5] == 1) # little-endian
self.add(self[0x6] == 1) # "current" ELF version
self.add(self[0x7] == 0) # System-V-ABI
self.add(self[0x8] == 0) # EI_ABIVERSION
self.add(self[0x9:7] == 0) # EI_PAD
self.add(self[e_type:2] == 3) # ET_DYN
self.add(self[e_machine:2] == 0x3e) # AMD64
self.add(self[e_version:4] == 1) # "current" ELF version (again?)
self.add(self[e_phoff:8] == z3.ZeroExt(64 - 8, self.phoff))
self.add(self[e_phentsize:2] == self.phentsize)
self.add(self[e_phnum:2] == self.phnum)
def write_segments(self):
# first PT_LOAD segment
self.add(self[self.phoff + p_type:4] == 1) # PT_LOAD (loadable segment)
self.add(self[self.phoff + p_flags:4] & 0b111 == 0b111) # ensure RWX
self.add(self[self.phoff + p_offset:8] == 0) # load everything (=always aligned!)
self.add(self[self.phoff + p_vaddr:8] == 0) # otherwise _dl_map_segments shits its pants
self.add((self[self.phoff + p_offset:8] - self[self.phoff + p_vaddr:8]) & 0xfff == 0) # align
self.add(self[self.phoff + p_filesz:8] == z3.ZeroExt(64 - 8, self.maxsize)) # load at most all of this file
self.add(
self[self.phoff + p_memsz:8] == self[self.phoff + p_filesz:8]) # make virtual size at least physical size
self.add(self[self.phoff + p_align:8] == 0x1000) # page aligned
# second PT_DYNAMIC segment
self.add(self[self.phoff + self.phentsize + p_type:4] == 2) # PT_DYNAMIC (dynamic segment)
self.add(self[self.phoff + self.phentsize + p_vaddr:8] == z3.ZeroExt(64 - 8, self.ld))
self.add(self.ld != 0) # segment pointer must be non-zero
self.add(self[self.phoff + self.phentsize + p_memsz:8] == self.ldnum * self.dyn_sz)
self.add(self[self.phoff + self.phentsize + p_filesz:8] != 0) # filesz must be != 0
def write_ld(self):
self.add(z3.ULT(self.strtab, self.maxsize)) # ensure it's at least within our file
self.add(z3.ULT(self.symtab, self.maxsize)) # ensure it's at least within our file
tags = [
(DT_INIT, z3.ZeroExt(64 - 8, self.init)),
(DT_STRTAB, z3.ZeroExt(64 - 8, self.strtab)),
(DT_SYMTAB, z3.ZeroExt(64 - 8, self.symtab))
]
for i, (k, v) in enumerate(tags):
tag_offset = self.ld + (i * 0x10)
self.add(self[tag_offset + 0x0:8] == k)
self.add(self[tag_offset + 0x8:8] == v)
self.add(self[self.ld + len(tags) * 0x10: 8] == DT_NULL)
def write_shellcode(self):
# place /bin/sh into memory
for i, x in enumerate(b'/bin/sh\0'):
self.add(self[self.binsh + i] == x)
shellcode = '''
lea rdi, [rip+0x0c0c0c0c];
push 0;
push rdi;
mov rsi, rsp;
xor eax, eax;
mov al, 0x3b;
syscall;
'''.strip().splitlines()
# create array of instruction locations
# first instruction must be at `init`, later instructions can be wherever
ins_loc = [self.init] + [z3.BitVec('ins_%02d' % (j + 1), 8) for j in range(len(shellcode) - 1)]
for j, ins in enumerate(shellcode):
code = asm.asm(ins)
loc = ins_loc[j]
fixed = False
for i, x in enumerate(code):
if x != 0x0c:
self.add(self[loc + i] == x)
else:
if not fixed:
self.add(self[loc + i:4] == z3.SignExt(32 - 8, self.binsh - loc - (i + 4)))
fixed = True
# if there is a next instruction, make sure it
# follows immediately *or* is jumped to
if j + 1 < len(ins_loc):
loc_after = loc + len(code)
next_loc = ins_loc[j + 1]
self.add(z3.Or(
self[loc_after:2] == z3.Concat(next_loc - (loc_after + 2), z3.BitVecVal(0xeb, 8)), # have a jump
loc_after == next_loc # be the next instruction
))
def get_output(self, concrete_max_size=255):
self.s.push()
self.s.add(self.maxsize == concrete_max_size)
if self.s.check() != z3.sat:
self.s.pop()
raise ValueError()
self.s.pop()
m = self.s.model()
info = {}
info['maxsize'] = m.eval(self.maxsize).as_long()
info['phoff'] = m.eval(self.phoff).as_long()
info['init'] = m.eval(self.init).as_long()
info['ld'] = m.eval(self.ld).as_long()
info['strtab'] = m.eval(self.strtab).as_long()
info['symtab'] = m.eval(self.symtab).as_long()
info['binsh'] = m.eval(self.binsh).as_long()
o = bytes(m.eval(self.ELF[i]).as_long() for i in range(concrete_max_size))
return o, info
def main():
g = Golfer()
lo = 1
hi = 256
elf = None
info = None
while lo < hi:
m = (lo + hi) // 2
try:
elf, info = g.get_output(m)
hi = m
except ValueError:
lo = m + 1
if elf:
print(f"Got {lo} bytes:")
print('maxsize = %d' % info['maxsize'])
print('phoff @ 0x%02x' % info['phoff'])
print('init @ 0x%02x' % info['init'])
print('ld @ 0x%02x' % info['ld'])
print('strtab @ 0x%02x' % info['strtab'])
print('symtab @ 0x%02x' % info['symtab'])
print('binsh @ 0x%02x' % info['binsh'])
print(fiddling.hexdump(elf))
with open('gen.so', 'wb') as f:
f.write(elf)
else:
print("I'm sorry")
if __name__ == '__main__':
main()
Conclusion
Super fun golfing challenge that I now regret to not have tackled during the CTF! Also, I’m still super curious how others managed to get 136 byte solutions…