Writeup by leraton for RSA Secure Dev 1/3

Table of contents

Introduction

We are asked to write an assembly code that compute the RSA signature of a message.

According to the documentation, we know that:

a message will be placed in register R5.
a prime number $p$ will be placed in register R6.
a prime number $q$ will be placed in register R7.
a number $i_q$ will be placed in register R8: $i_q = q^{-1} \pmod {p}$.
a number $dP$ will be placed in register R9: $dP = e^{-1} \pmod {p - 1}$.
a number $dQ$ will be placed in register RA: $dQ = e^{-1} \pmod {q - 1}$.
a number $e$ will be placed in register RB.
a number $d$ will be placed in register RC: $d = e^{-1} \pmod {(p-1)\times(q-1)}$.

A basic implementation

The RSA signature C of a message M can be computed with the following formula:

$C = M^d \pmod {n}$, with $n=p\times q$

This formula is then be implemented in assembly:

MUL R1, R6, R7  ; Store n = p*q in R1
MOV RD, R1      ; Move n in RD for modulo
POW R0, R5      ; R0 = R5**RC%RD = M**d%n
STP

This code can be compiled:

$ python assembly.py RSA1.asm
4ff1001D03501400

And tested on the machine:

$ nc localhost 4000
Enter your bytecode in hexadecimal:
>>> 4ff1001D03501400
Which flag do you want to grab?
  0. Quit.
  1. Easy flag   - check for code correctness and performances.
  2. Medium flag - check resistance against several fault attacks, d not given.
  3. Hard flag   - check resistance against more fault attacks, not e and not d given.
>>> 1
[+] Testing correctness...
[+] Correct!
[+] Testing performances against the reference solution...
[*] Reference performances: 1394304.00 ns
[*] User performance:       3258956.04 ns
[*] Ratio:                     2.34
[!] Error: too bad performances
Please check your inputs.

As we can see, the code is correct however it is too slow…

A quicker algorithm

After some search, I found a more efficient implementation that utilise the Chinese Remainder Theorem (CRT).
Fortunately, this algorithm utilize already defined variables:

$m_1 = c^{\:dP} \pmod{p}$
$m_2 = c^{\:dQ} \pmod{q}$
$h = i_q \times (m_1 - m_2) \pmod{p}$
$m = m_2 + h \times q$

This algorithm can then be rewriten in assembly

; m2 = c**dQ mod q
MOV RD, R7
MOV RC, RA
POW R2, R5
; m1 = c**dP mod p
MOV RD, R6
MOV RC, R9
POW R1, R5  
;  h = iq * (m1 - m2) mod p
SUB R3, R1, R2
MOV R4, R8
MUL R3, R3, R4
MOD R3, R3
; h*q
MUL R3, R3, R7
; m = m2 + h*q
MOV R0, R2
ADD R0, R0, R3

STP

Compiled

$ python assembly.py RSA_CRT.asm
007D00AC0352006D009C03514c8b00844f1b02334fdb00204ac01400

And tested on the machine:

$ nc localhost 4000
Enter your bytecode in hexadecimal:
>>> 007D00AC0352006D009C03514c8b00844f1b02334fdb00204ac01400
Which flag do you want to grab?
  0. Quit.
  1. Easy flag   - check for code correctness and performances.
  2. Medium flag - check resistance against several fault attacks, d not given.
  3. Hard flag   - check resistance against more fault attacks, not e and not d given.
>>> 1
[+] Testing correctness...
[+] Correct!
[+] Testing performances against the reference solution...
[*] Reference performances: 1389010.49 ns
[*] User performance:       1138902.10 ns
[*] Ratio:                     0.82
[+] Congrats! Here is the easy flag: FCSC{06de1084d295f2a016f79485f2a47744bfb9ed3e30e9312121665559df9447de}

And we’ve got the flag !