Rouge Saumon
The “Rouge Saumon” challenge is about making sense of a radio dump that has already been converted to binary data for us. It was presented in the Misc category of the France CyberSecurity Challenge (FCSC) 2023.
Challenge details
You have intercepted a communication sent by a satellite from deep space, and have decoded its bitstream. Curious, you decide to investigate the content of this communication. You know that up there, cosmic rays can disrupt the signal: you therefore expect the communication protocol to be at least robust and resilient to noise and frame desynchronization!
We are given a file, sat_data.bin, containing the data dump.
Solution
First, let’s look at the data dump:
>>> with open('sat_data.bin', 'rb') as f:
...     data = f.read()
...
>>> data[:2000]
b'HDR\x00\x05\x00\x00\x00\xff\x00\x00\x00la n\x1fn \xabempus aliquam, nun\xa6 turpis ullamcorper ni
bh, in tempus sapien eros vitae lig\x03la. Pellentesque rhoncus nunc et augue. I\xe3teger id felis.
Curabitur aliquet pellentesque diam. Integer quis metus vitae elit\xe0l\x0b\xb8\xbf\x84\x8a~q\xa9!
\xf6*\xd6\xd7[\x94\xc3\x10@)\x9051\xbe\x0e`D8V\xe7\x9eL|HDR\x00\x03\x00\x00\x00\xff\x00\x00\x00raes
ent blandit odio^eu enim. Pellentesque sed dui ut augue blandit sodales. Vestibulum ante ipsum prim
is in fauc\xb4bus orci luctus et ultrices posuere cu\x0cilia Curae; Aliquam nibh. Mauris ac ma\xc6r
is sed pede pellen\x83o\x08\xeb\xae\xd1H\xab\xef=\xe61I\x80x\xde\xb5\xc8\xe6<\xf9\x81z\xb3\xc9!\xe7
\x10\xab\x99\x1a\x93HDR\x00\x04\x00\x00\x00\xff\x00\x00\x00tesque fermentum. Maecenas adipiscing an
te no( diam \x19odales hendrerit. Ut velit mauris, egestas sed, gravida nec\x14 o\xf5nare ut, mi. A
enean ut orci vel massa suscipit pu\xecvinar. Nulla soll\xdecitudiA. Fus\xa2e varius, lig\xa0d8\x03
\xba\xc5:*\xcb\x89:\x08\xa9UY9\xc1\xc8)K\xb1\x87S\xf2!\xeba\xa4\xc8\x04\x19\xe6\xbfHDR\x00\x02\x00
\x00\x00\xff\x00\x00\x00. D\x8cis)sempe\x87. Duis arcu massa, scelerisque5vitae, consfquat in, preti
um a, enim. Pellentesque co!gue. Ut in risu\x06 volutpat libe\xb7o ph\x9fxetra tempor. Cras vestibu
lum bibendum augue. Praesent egestab\xe6leo in pede. P\xcf\xcc\xee\xb9\xd2\xdd%+GwbZ\x85\x9f\x02h
\xb1\x84\x86\xed\xe6\x9cg\xc5\x14\xf5\xb5<y39VHDR\x00\x01\x00\x00\x00\xff\x00\x00\x00 sed, dolor. Cra
s elementum ultrices \xcdism. Maec2nas ligula 9as\xa2a, varius a, semper congue, \x93uismod non, li
. Proin porttitor, orci nec nonummy molestie, enim est elei(end \xb6i, non f\xc3rmentum diam nisl s
it amet erat\xa8L\xe5V\xa9\xbb\x85\x08>\xc5?\xb8\x86\xa2\xac\x0f\x05\xb5B\x01\x01\xda\x07Yn\xc6l
\xba\xcd\x07_\x8bHDR\x00\x00\x00\x00\x00\xff\x00\x00\x00The-Red-Salmon-Code-Corrects-LOREM-and-BMP_ima
ge(32 bytes)/BEGIN==>Lorem ipsum do\x0for sit amet, co\'sNctetur adipiscing elit. Sed non risu\xc9.
ruspendi\xb3se lectus torto[, dignissim sit amet\xe9 adipis\xfbing nec,O\x16ltri\xebies\x85\xf3
\x1d\x83\x07\xa5\xed\xd3\xb8?6r\xb3\xb1(\x85\xaa\x1a\xda\x1b\xf1,~\xb3\xc8\xebW\xab\xd7<a\xc0HDR\x00
\x06\x00\x00\x00\xff\x00\x00\x00obo\xabtis egestas. Lorem ipsum d\xd3lor sit amet, consectetuer adip
\x12scing elit.\xe6Morbi vel erat non mauris convallis vehicula. Nulla et sapien. mnteger Uortor tel
6us, aliquam\x1cfaucibus, convall\xa5s id, congue \xe7u, quam.\xb9M\x1d\xdd\xae\xe5nT\xf1\xa1\x9a
\x91h">\xa1\xb3A\x8c\xc8\x1c\xd0\x18\xd7\x0eFZ59.\x13\x1c"nHDR\x00\x07\x00\x00\x00\xff\x00\x00\x00aur
is ullam\x1borper kelis vitae e\x91at. Proin feu\xc9iat, augue non elementum posuere, \\et\x99s pur
us iaculis lectus, et tristique ligula justo vitae magna. AlLquam\xb8convallis sollicitudin purus.
Praesent aliquam, '
Okay, that was a long blob, sorry. But you might already see some structure: some parts are almost perfectly readable (Latin) text, while others are just meaningless bytes.
Also, the string HDR\x00 shows up repeatedly, every 255 bytes.
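A quick way to check that observation is to list every offset where HDR\x00 appears and look at the distances between consecutive hits. This is a small sketch; magic_offsets and gaps are hypothetical helper names, and a header hit by noise would simply be missed (showing up as a gap of 510 instead of 255).

```python
def magic_offsets(data: bytes, magic: bytes = b'HDR\x00') -> list[int]:
    """Return every offset at which `magic` occurs in `data`."""
    offsets = []
    pos = data.find(magic)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(magic, pos + 1)
    return offsets


def gaps(offsets: list[int]) -> list[int]:
    """Distinct distances between consecutive offsets, smallest first."""
    return sorted({b - a for a, b in zip(offsets, offsets[1:])})


# Synthetic example: five 255-byte chunks, each starting with the magic.
sample = b''.join(b'HDR\x00' + bytes([i]) * 251 for i in range(5))
print(magic_offsets(sample))        # [0, 255, 510, 765, 1020]
print(gaps(magic_offsets(sample)))  # [255]
```

On the real file, the same two calls on data (read as above) should report 255 as the dominant gap.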
The challenge description says that “you therefore expect the communication protocol to be at least robust and resilient to noise”, so I suspect some kind of Error Correction Code is in place.
So I start Googling things like HDR Error Correcting Code, HDR red salmon error correction code and red salmon error correcting code (note: Rouge Saumon is French and translates to Red Salmon in English).
While the HDR stuff does not give promising results, I see that this acronym is used elsewhere to mean High Data Rate. More importantly, Google is smart and points me towards the Reed-Solomon error correction technique, which (by luck) was close enough to my red salmon ecc search.
However, if you look closely at the dump above, this code is also (sort of) mentioned in the “readable” text portion: “The-Red-Salmon-Code-Corrects-LOREM-and-BMP_image(32 bytes)”.
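So we are dealing with a Reed-Solomon error-correcting code with 32 added bytes, and the frames are probably 255 bytes long, matching the spacing of the HDR\x00 markers (255 bytes is also the maximum codeword length for Reed-Solomon over bytes, which leaves 223 data bytes per frame). Luckily, someone already did the job of implementing Reed-Solomon decoding in Python: the reedsolo package.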
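To see what those 32 added bytes buy, here is a minimal pure-Python sketch of the RS(255, 223) encoding side plus a syndrome check. A Reed-Solomon code with 32 parity bytes can correct up to 16 corrupted bytes per codeword; the syndrome check below only detects corruption (all syndromes are zero exactly when a frame is a valid codeword). The field parameters (primitive polynomial 0x11d, generator 2, first consecutive root alpha^0) are assumptions: I believe they match reedsolo's defaults, but nothing in the challenge states them. Actual error correction (Berlekamp-Massey, Forney) is left to reedsolo.

```python
# Assumed parameters: GF(2^8) with primitive polynomial 0x11d, generator 2,
# roots alpha^0 .. alpha^31 (fcr = 0), 32 parity bytes.
PRIM = 0x11D
NSYM = 32

# Exp/log tables for GF(2^8); GF_EXP is doubled so log sums need no modulo.
GF_EXP = [0] * 512
GF_LOG = [0] * 256
x = 1
for i in range(255):
    GF_EXP[i] = x
    GF_LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= PRIM
for i in range(255, 512):
    GF_EXP[i] = GF_EXP[i - 255]


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def poly_mul(p: list[int], q: list[int]) -> list[int]:
    # Polynomials are coefficient lists, highest degree first; addition is XOR.
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= gf_mul(a, b)
    return r


def generator_poly(nsym: int) -> list[int]:
    # g(x) = (x - a^0)(x - a^1)...(x - a^(nsym-1)); minus is XOR in GF(2^8).
    g = [1]
    for i in range(nsym):
        g = poly_mul(g, [1, GF_EXP[i]])
    return g


def rs_encode(msg: bytes, nsym: int = NSYM) -> bytes:
    # Systematic encoding: append the remainder of msg(x) * x^nsym mod g(x).
    gen = generator_poly(nsym)
    buf = list(msg) + [0] * nsym
    for i in range(len(msg)):
        coef = buf[i]
        if coef != 0:
            for j in range(1, len(gen)):
                buf[i + j] ^= gf_mul(gen[j], coef)
    return bytes(msg) + bytes(buf[len(msg):])


def poly_eval(p: list[int], x: int) -> int:
    # Horner's method in GF(2^8).
    y = p[0]
    for c in p[1:]:
        y = gf_mul(y, x) ^ c
    return y


def syndromes(codeword: bytes, nsym: int = NSYM) -> list[int]:
    # S_i = c(a^i); all zero if and only if codeword is divisible by g(x).
    return [poly_eval(list(codeword), GF_EXP[i]) for i in range(nsym)]


def looks_intact(codeword: bytes, nsym: int = NSYM) -> bool:
    return not any(syndromes(codeword, nsym))
```

If the assumed parameters do match, calling looks_intact(raw_data[i:i+255]) for each frame offset should separate the frames that went through untouched from those hit by noise.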
Let’s try to decode the first few frames:
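from reedsolo import RSCodec, ReedSolomonError
with open('sat_data.bin', 'rb') as f:
    raw_data = f.read()
rsc = RSCodec(32)  # 32 error-correction bytes per 255-byte codeword
frames = []
for i in range(0, len(raw_data), 255):
    try:
        # decode() returns (message, message + ecc, errata positions); keep the message
        frames.append(bytes(rsc.decode(raw_data[i:i+255])[0]))
    except ReedSolomonError:
        print(f'frame at offset {i} has too many errors to correct')
for frame in frames[:5]:
    print(frame)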
Which outputs:
b'HDR\x00\x05\x00\x00\x00\xff\x00\x00\x00la non tempus aliquam, nunc turpis ullamcorper nibh, in tempus sapien eros vitae ligula. Pellentesque rhoncus nunc et augue. Integer id felis. Curabitur aliquet pellentesque diam. Integer quis metus vitae elit l'
b'HDR\x00\x03\x00\x00\x00\xff\x00\x00\x00raesent blandit odio eu enim. Pellentesque sed dui ut augue blandit sodales. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Aliquam nibh. Mauris ac mauris sed pede pellen'
b'HDR\x00\x04\x00\x00\x00\xff\x00\x00\x00tesque fermentum. Maecenas adipiscing ante non diam sodales hendrerit. Ut velit mauris, egestas sed, gravida nec, ornare ut, mi. Aenean ut orci vel massa suscipit pulvinar. Nulla sollicitudin. Fusce varius, ligu'
b'HDR\x00\x02\x00\x00\x00\xff\x00\x00\x00. Duis semper. Duis arcu massa, scelerisque vitae, consequat in, pretium a, enim. Pellentesque congue. Ut in risus volutpat libero pharetra tempor. Cras vestibulum bibendum augue. Praesent egestas leo in pede. P'
b'HDR\x00\x01\x00\x00\x00\xff\x00\x00\x00 sed, dolor. Cras elementum ultrices diam. Maecenas ligula massa, varius a, semper congue, euismod non, mi. Proin porttitor, orci nec nonummy molestie, enim est eleifend mi, non fermentum diam nisl sit amet erat'
Hehe, looks like a good first step in the right direction! Each text frame is perfectly readable on its own, but the frames do not line up when read in file order. But remember the challenge description? “you therefore expect the communication protocol to be at least robust and resilient to […] frame desynchronization!”
Every frame starts with the HDR magic bytes, followed by 9 more bytes.
If we go through every frame, we can see that not much changes between their headers, except the 2nd and 3rd bytes after HDR, which roughly increase from one frame to the next. One reasonable guess would be that this is the ID of the frame, and this guess is confirmed by the readable text, which suddenly makes perfect (Latin) sense when the frames are ordered this way. Another reasonable guess would be that the \xff byte represents a length. And one meta-guess would be that the full format of the header is actually:
HDR\x00 (4 bytes) | Frame ID (4 bytes, little-endian) | Frame length (4 bytes, little-endian)
At this point, I would like to underline that while I used the word guess extensively in the previous paragraph, these are all reasonable assumptions which one might even call observations. I don’t mean guess in the guessy CTF challenge way.
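Under that header layout, parsing a decoded frame and putting the payloads back in order takes a few lines with the struct module. This is a minimal sketch assuming exactly the 12-byte layout guessed above; parse_frame and reassemble are hypothetical helper names, missing frame IDs are not detected, and the length field is returned but not used for slicing (in the frames shown above it is always 0xff).

```python
import struct

# Assumed header layout: 4-byte magic, frame ID, frame length,
# little-endian with no padding, so 12 bytes in total.
HEADER = struct.Struct('<4sII')
MAGIC = b'HDR\x00'


def parse_frame(frame: bytes) -> tuple[int, int, bytes]:
    """Split a decoded frame into (frame ID, length field, payload)."""
    magic, frame_id, length = HEADER.unpack_from(frame)
    if magic != MAGIC:
        raise ValueError(f'unexpected magic {magic!r}')
    return frame_id, length, frame[HEADER.size:]


def reassemble(frames: list[bytes]) -> bytes:
    """Order frame payloads by frame ID and concatenate them."""
    by_id = {}
    for frame in frames:
        frame_id, _length, payload = parse_frame(frame)
        by_id[frame_id] = payload  # if an ID repeats, the last copy wins (assumption)
    return b''.join(by_id[k] for k in sorted(by_id))
```

For example, the start of the first decoded frame, b'HDR\x00\x05\x00\x00\x00\xff\x00\x00\x00la non', parses as frame ID 5 with a length field of 255.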
Putting the frames back together and dumping the extracted content, we get some Latin text, then a bunch of ASCII zeroes (to pad a frame), and then random-looking bytes. According to binwalk, these bytes represent a Bitmap (BMP) image, which we can extract using the same tool.
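binwalk is what I actually used. For readers without it, here is a minimal sketch of the same idea for this one format: look for the 'BM' signature and trust the file-size field of the BMP header to know where the image ends. carve_bmp is a hypothetical helper, and the sanity checks (zero reserved bytes, a known DIB header size) are my own heuristic, needed because the string BMP_image in the text also starts with 'BM'.

```python
import struct
from typing import Optional

# Usual DIB header sizes (BITMAPCOREHEADER, BITMAPINFOHEADER and later variants).
DIB_HEADER_SIZES = {12, 40, 52, 56, 108, 124}


def carve_bmp(blob: bytes) -> Optional[bytes]:
    """Return the first plausible BMP file embedded in blob, or None."""
    start = blob.find(b'BM')
    while start != -1:
        if start + 18 <= len(blob):
            # File header after 'BM': file size, 4 reserved bytes, pixel data offset;
            # the DIB header that follows starts with its own size.
            size, reserved, pixel_offset, dib_size = struct.unpack_from('<IIII', blob, start + 2)
            if (reserved == 0 and dib_size in DIB_HEADER_SIZES
                    and 14 + dib_size <= pixel_offset <= size <= len(blob) - start):
                return blob[start:start + size]
        # 'BM' also shows up in ordinary text, so keep searching.
        start = blob.find(b'BM', start + 1)
    return None
```

Applied to the reassembled stream, the returned bytes can be written to a .bmp file and opened in any image viewer.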
We do as demanded, and the final flag is FCSC{8a65bcd7ad3f1dcec4f22106b636f86ec98a4e14fdb9b5b8de87fafdab11386f}