Resources
- https://en.wikipedia.org/wiki/Data_Plane_Development_Kit - Wikipedia entry for DPDK.
- https://core.dpdk.org/doc/ - DPDK documentation.
Solution
The flag is sent over the network in an ICMP packet to an unregistered MAC address. Run babydpdk to register a DPDK userspace driver with that MAC address and process the packets. Debug the executable with GDB to dump the flag when an ICMP packet is received.
$ gdb --args /home/user/babydpdk -l 0 --no-huge --allow 0000:01:10.0
(gdb) b *0x00005555555564ba
(gdb) r
(gdb) x/s $rdi+0x12a
0x103f9d16a: "FCSC{50fcce15afa5b2692f1ab91e8803d817f3dc72ec9e140da03605a4123b952862}"
Detailed solution
First let’s extract the archive and find what we are working with.
$ tar xvf babydpdk.tar.xz && cd generated && tree
.
├── babydpdk
├── babydpdk.c
├── buildroot
│   ├── bzImage
│   └── rootfs.ext2
├── docker-compose.yml
├── Dockerfile
├── flag
├── qemu
│   ├── bios-256k.bin
│   ├── efi-e1000.rom
│   ├── kvmvapic.bin
│   ├── linuxboot_dma.bin
│   └── qemu-system-x86_64
└── run.sh
3 directories, 13 files
To understand the challenge setup, we take a look at the Docker files first.
$ cat docker-compose.yml
services:
  baby-dpdk:
    build: .
    ports:
      - "4000:4000"
    environment:
      - FLAG=FCSC{deadbeef}
$ cat Dockerfile
[...]
WORKDIR /app
EXPOSE 4000
USER ctf
CMD ["socat", "tcp-l:4000,reuseaddr,fork", "EXEC:\"/app/run.sh\",pty,stderr"]
The Docker container runs /app/run.sh, which we can interact with by connecting to port 4000. The run.sh script contains only the following command, which starts a full system emulation with QEMU. It boots a Linux kernel image with the rootfs.ext2 filesystem mounted.
There is also an unusually long list of hardware-related arguments. I don’t know enough about networking to explain every detail, but I’m guessing that it is set up this way to enable specific network features required for DPDK to work correctly.
- -device pcie-root-port,slot=1,id=pcie.1: adds a PCIe port to the system.
- -device igb,bus=pcie.1,netdev=n,multifunction=on: connects an Intel IGB Network Interface Card to the PCIe port.
- -device intel-iommu,intremap=on,caching-mode=on: adds a virtual Intel IOMMU to the system, with interrupt remapping and IOMMU page caching enabled.
Finally, -netdev stream,id=n,addr.type=fd,addr.str=%d configures file descriptor based networking using the “stream” backend. This allows sending network packets to the machine via a file descriptor. The %d placeholder is replaced at runtime before the QEMU command is executed.
#!/bin/bash
/app/flag /app/qemu-system-x86_64 \
-M q35 \
-m 256m \
-cpu Westmere \
-bios /app/bios-256k.bin \
-kernel bzImage \
-snapshot \
-drive file=rootfs.ext2,if=virtio,format=raw \
-append "rootwait root=/dev/vda console=tty1 console=ttyS0 intel_iommu=on no_timer_check" \
-nodefaults \
-device pcie-root-port,slot=1,id=pcie.1 \
-device igb,bus=pcie.1,netdev=n,multifunction=on \
-netdev stream,id=n,addr.type=fd,addr.str=%d \
-vga none \
-nographic \
-serial stdio \
-monitor none \
-device intel-iommu,intremap=on,caching-mode=on
However, notice that /app/qemu-system-x86_64 is not launched directly: it is wrapped by /app/flag. Before starting the machine, we can open the flag executable in Ghidra or IDA to understand what it is doing.
undefined8 main(int argc,char **argv) {
__pid_t fd;
int sv;
undefined4 local_10;
int r;
r = socketpair(1,1,0,&sv);
if (r != 0) {
err(1,"socketpair failed");
}
fd = fork();
if (fd == -1) {
err(1,"fork failed");
}
else if (fd != 0) {
fixup_fd(argv + 1,local_10);
execv(argv[1],argv + 1);
err(1,"execv failed");
return 0;
}
interact(sv);
return 0;
}
The program forks into two processes.
- The first process calls fixup_fd to replace %d with the correct file descriptor (“fix fd for wiring packets to qemu instance”), then runs the command given by the command line arguments (in this case /app/qemu-system-x86_64 [...]).
- The other process runs the interact function, which loops forever and calls send_flag each time the one-second poll on the socket times out.
void interact(int fd) {
ssize_t n;
undefined1 buffer [1504];
pollfd pollfd;
int r;
do {
while( true ) {
while( true ) {
pollfd.events = 1;
pollfd.revents = 0;
pollfd.fd = fd;
r = poll(&pollfd,1,1000);
if (r != 0) break;
send_flag(fd);
}
if (r != 1) break;
n = read(fd,buffer,1504);
r = (int)n;
}
perror("poll failed");
} while( true );
}
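The body of fixup_fd is not shown in this write-up, so the following is only a minimal sketch of what such a placeholder substitution could look like. The name fixup_fd_sketch and its return value are hypothetical; the only thing taken from the challenge is the idea of replacing %d in the QEMU arguments with a file descriptor number before calling execv.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical re-implementation of the fixup_fd step (not the original code):
 * every argument containing "%d" is rebuilt with the descriptor number in
 * place of the placeholder. Returns the number of rewritten arguments, or -1
 * on allocation failure.
 */
int fixup_fd_sketch(char **argv, int fd)
{
    int replaced = 0;

    for (size_t i = 0; argv[i] != NULL; i++) {
        char *marker = strstr(argv[i], "%d");
        if (marker == NULL)
            continue;

        /* Room for the original text plus the decimal digits of an int. */
        size_t size = strlen(argv[i]) + 16;
        char *fixed = malloc(size);
        if (fixed == NULL)
            return -1;

        /* Copy the prefix, the descriptor number, then the text after "%d". */
        snprintf(fixed, size, "%.*s%d%s",
                 (int)(marker - argv[i]), argv[i], fd, marker + 2);
        argv[i] = fixed; /* never freed: execv() replaces the process image */
        replaced++;
    }
    return replaced;
}
```

With the run.sh command line, a call such as fixup_fd_sketch(argv + 1, 7) would turn addr.str=%d into addr.str=7, so that QEMU attaches its stream netdev to one end of the socket pair.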
After renaming some variables in the send_flag function, we understand that it simply crafts a network packet containing the flag and sends it to the QEMU machine. More precisely:
- The packet is an ICMP echo request containing the flag in the data field.
- The ICMP packet is encapsulated in an IPv4 header, specifying source address 198.0.0.254 and destination address 198.0.0.1.
- The IPv4 packet is encapsulated in an Ethernet header, specifying destination address 02:de:c0:ed:00:01.
void send_flag(undefined4 param_1) {
in_addr_t src_addr;
in_addr_t dst_addr;
size_t flag_len;
uint32_t total_len;
iovec iovecs [5];
byte icmp_payload [8];
byte ip_payload [20];
byte eth_payload [14];
char *flag;
uint16_t id_ip;
long n;
dst_addr = inet_addr("198.0.0.1");
src_addr = inet_addr("198.0.0.254");
flag = getenv("FLAG");
if (flag == (char *)0x0) {
flag = "FCSC{ceci est un faux flag}";
}
iovecs[0].iov_base = &total_len;
iovecs[0].iov_len = 4;
iovecs[1].iov_base = eth_payload;
iovecs[1].iov_len = 14;
// Destination MAC address 02:de:c0:ed:00:01
eth_payload[0] = 2;
eth_payload[1] = 0xde;
eth_payload[2] = 0xc0;
eth_payload[3] = 0xed;
eth_payload[4] = 0;
eth_payload[5] = 1;
memset(eth_payload + 6,0,6); // Source MAC address 00:00:00:00:00:00
eth_payload._12_2_ = htons(0x800); // Type: IPV4
iovecs[2].iov_base = ip_payload;
iovecs[2].iov_len = 20;
ip_payload[0] = 0x45; // Version: 4
ip_payload[1] = 0;
flag_len = strlen(flag);
ip_payload._2_2_ = htons((short)flag_len + 0x1c);
id_ip = id_ip.0;
id_ip.0 = id_ip.0 + 1;
ip_payload._4_2_ = htons(id_ip); // Unique identification
ip_payload[6] = 0;
ip_payload[7] = 0;
ip_payload[8] = 1;
ip_payload[9] = 1; // Protocol: ICMP (1)
iovecs[3].iov_base = icmp_payload;
iovecs[3].iov_len = 8;
icmp_payload[0] = 8; // Type: 8 (Echo (ping) request)
icmp_payload[1] = 0;
icmp_payload[4] = 0;
icmp_payload[5] = 0;
icmp_payload[6] = 0;
icmp_payload[7] = 0;
icmp_payload[2] = 0;
icmp_payload[3] = 0;
// ICMP data
iovecs[4].iov_base = flag;
ip_payload._12_4_ = src_addr;
ip_payload._16_4_ = dst_addr;
iovecs[4].iov_len = strlen(flag);
n = iovecs[4].iov_len + 42;
total_len = htonl((uint32_t)n);
do_writev(param_1,iovecs,5,n + iovecs[0].iov_len);
return;
}
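To make the byte layout easier to follow, here is a compact sketch that builds the same kind of buffer in one piece instead of five iovecs. The function name build_flag_frame and the flat output buffer are assumptions made for illustration. The field values (4-byte big-endian length prefix, 14 + 20 + 8 = 42 bytes of headers, addresses, TTL and protocol) are read from the decompiled listing above, and the checksum fields are simply left at zero here.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

enum { ETH_LEN = 14, IP_LEN = 20, ICMP_LEN = 8,
       HDRS_LEN = ETH_LEN + IP_LEN + ICMP_LEN /* 42 */ };

/*
 * Illustrative sketch (hypothetical helper, not the challenge code).
 * Writes [length prefix][Ethernet][IPv4][ICMP][payload] into `out` and
 * returns the number of bytes written, or 0 if `cap` is too small.
 */
size_t build_flag_frame(uint8_t *out, size_t cap, const char *payload)
{
    size_t payload_len = strlen(payload);
    size_t frame_len = HDRS_LEN + payload_len;
    if (cap < 4 + frame_len)
        return 0;

    memset(out, 0, 4 + HDRS_LEN);

    /* Length prefix: size of the Ethernet frame, in network byte order. */
    uint32_t prefix = htonl((uint32_t)frame_len);
    memcpy(out, &prefix, 4);

    /* Ethernet: destination 02:de:c0:ed:00:01, zero source, type IPv4. */
    uint8_t *eth = out + 4;
    static const uint8_t dst_mac[6] = {0x02, 0xde, 0xc0, 0xed, 0x00, 0x01};
    memcpy(eth, dst_mac, 6);
    eth[12] = 0x08;
    eth[13] = 0x00;

    /* IPv4: version 4, IHL 5, total length = 20 + 8 + payload (0x1c + len). */
    uint8_t *ip = eth + ETH_LEN;
    ip[0] = 0x45;
    uint16_t total = htons((uint16_t)(IP_LEN + ICMP_LEN + payload_len));
    memcpy(ip + 2, &total, 2);
    ip[8] = 1; /* TTL */
    ip[9] = 1; /* protocol: ICMP */
    in_addr_t src = inet_addr("198.0.0.254");
    in_addr_t dst = inet_addr("198.0.0.1");
    memcpy(ip + 12, &src, 4);
    memcpy(ip + 16, &dst, 4);

    /* ICMP: type 8 (echo request), code 0, remaining header fields zero. */
    uint8_t *icmp = ip + IP_LEN;
    icmp[0] = 8;

    /* ICMP data: the flag itself, without a terminating NUL. */
    memcpy(icmp + ICMP_LEN, payload, payload_len);
    return 4 + frame_len;
}
```

For a 10-byte payload this yields a 52-byte frame preceded by the prefix 00 00 00 34, which matches the n = iov_len + 42 computation in send_flag.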
Now that we understand the setup, the goal is to find a way to obtain the flag sent over the network. Let’s connect to the QEMU machine (nc localhost 4000) and check the network setup. There is no network interface configured to receive the packet on MAC address 02:de:c0:ed:00:01. Running tcpdump will therefore not show anything, because the network packets sent to the machine are simply not processed at all.
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host proto kernel_lo
valid_lft forever preferred_lft forever
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/sit 0.0.0.0 brd 0.0.0.0
In the home directory on the filesystem, we find two files: an ELF executable babydpdk and a bash script that runs it with /home/user/babydpdk -l 0 --no-huge --allow 0000:01:10.0. When we run the script, we get the following output.
EAL: Detected CPU lcores: 1
EAL: Detected NUMA nodes: 1
EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /tmp/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
EAL: Using IOMMU type 1 (Type 1)
Received and processed an ICMP echo request
Received and processed an ICMP echo request
Received and processed an ICMP echo request
[...]
It looks like the program is able to receive and process ICMP echo requests containing the flag.
DPDK stands for Data Plane Development Kit, “an open source software project providing a set of data plane libraries and network interface controller polling-mode drivers for offloading TCP packet processing from the operating system kernel to processes running in user space”. This matches what we just observed: the program seems to handle network packet processing from userspace, without root privileges.
The command line arguments are -l 0 to make the first core available for DPDK, --no-huge to use anonymous memory instead of hugepages, and --allow 0000:01:10.0 to specify the PCI device to probe. Running lspci, we find that this ID corresponds to “Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)”.
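The value passed to --allow uses the usual domain:bus:device.function notation for PCI addresses, with every field written in hexadecimal. As a small illustration, the sketch below decodes such a string; the struct pci_bdf type and the parse_pci_bdf helper are hypothetical names introduced here and are not part of DPDK.

```c
#include <stdio.h>

/* Hypothetical container for a decoded PCI address. */
struct pci_bdf {
    unsigned domain;
    unsigned bus;
    unsigned device;
    unsigned function;
};

/*
 * Parse "DDDD:BB:DD.F" (hexadecimal fields) into `out`.
 * Returns 0 on success and -1 if the string does not match the format
 * or has trailing characters.
 */
int parse_pci_bdf(const char *text, struct pci_bdf *out)
{
    char trailing;
    int n = sscanf(text, "%4x:%2x:%2x.%1x%c",
                   &out->domain, &out->bus, &out->device, &out->function,
                   &trailing);
    /* Exactly four conversions means all fields matched and nothing follows. */
    return n == 4 ? 0 : -1;
}
```

Applied to 0000:01:10.0, this gives domain 0, bus 1, device 0x10 and function 0.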
At this point there are easy ways to access the content of the ICMP packets without trying to understand how this is working, but we are also given the C code of babydpdk, so let’s read it before continuing. I added a few comments to clarify the code.
#define NUM_MBUFS 64
#define MBUF_CACHE_SIZE 32
volatile int quit = 0;
static void on_sig(int sig) {
quit = 1;
}
/* Initialization of Environment Abstraction Layer (EAL). 8< */
int main(int argc, char **argv) {
int ret;
unsigned lcore_id;
struct rte_mempool *mbuf_pool;
unsigned nb_ports;
uint16_t portid;
// Initialize the Run Time Environment.
// This includes memory setup, command line arguments processing, enumeration of available PCI devices, etc.
ret = rte_eal_init(argc, argv);
if (ret < 0) {
rte_panic("Cannot init EAL\n");
}
argc -= ret;
argv += ret;
// Add a signal handler to stop the program on SIGINT or SIGTERM.
signal(SIGINT, on_sig);
signal(SIGTERM, on_sig);
// Ensure exactly one Ethernet device is available.
nb_ports = rte_eth_dev_count_avail();
if (nb_ports != 1) {
rte_panic("Must use only one single port\n");
}
// Initialize a memory pool for packet processing.
mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS*nb_ports,
MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
if (mbuf_pool == NULL) {
rte_panic("Cannot create mbuf pool\n");
}
// Call port_init for the found ethdev port.
RTE_ETH_FOREACH_DEV(portid) {
if (port_init(portid, mbuf_pool) != 0) {
rte_panic("Cannot initialize port id=%d\n", portid);
}
}
// Ensure the system only has one logical core.
if (rte_lcore_count() > 1) {
rte_panic("Must use only one single logical core\n");
}
// Starts packet processing on the logical core.
lcore_main(0);
// Wait until the logical core finishes its job.
rte_eal_mp_wait_lcore();
/* clean up the EAL */
rte_eal_cleanup();
return 0;
}
The port_init function is the most important part of the DPDK setup. It reads information from the ethdev port, configures the targeted Ethernet device, sets up the queue sizes for incoming and outgoing packets and assigns the MAC address 02:de:c0:ed:00:01 to the device. Finally, it starts the device, which enables packet reception and transmission.
The MAC address does correspond to the destination address that the flag executable is sending network packets to.
#define RX_RING_ENTRIES 32
#define TX_RING_ENTRIES 32
static inline int port_init(uint16_t portid, struct rte_mempool *mbuf_pool) {
struct rte_eth_dev_info infos;
struct rte_eth_conf port_conf;
const uint16_t rx_rings = 1;
const uint16_t tx_rings = 1;
int ret;
uint16_t nb_rxd = RX_RING_ENTRIES;
uint16_t nb_txd = TX_RING_ENTRIES;
uint16_t qid;
// MAC address 02:de:c0:ed:00:01
const struct rte_ether_addr self_mac = { .addr_bytes = {0x02, 0xde, 0xc0, 0xed, 0, 1}};
if (!rte_eth_dev_is_valid_port(portid)) return -1;
memset(&port_conf, 0, sizeof(port_conf));
// Get information about the ethdev port.
ret = rte_eth_dev_info_get(portid, &infos);
if (ret != 0) {
printf("Error during getting device port id=%d infos\n", portid);
return ret;
}
// Configure the Ethernet device.
ret = rte_eth_dev_configure(portid, rx_rings, tx_rings, &port_conf);
if (ret != 0) {
return ret;
}
// Adjust the queue size for sending and receiving Ethernet packets.
ret = rte_eth_dev_adjust_nb_rx_tx_desc(portid, &nb_rxd, &nb_txd);
if (ret != 0) {
printf("Error during adjusting device port id=%d number of rx and tx descriptors\n", portid);
return ret;
}
// Allocate and set up a receive queue for the Ethernet device.
for (qid = 0; qid < rx_rings; qid++) {
struct rte_eth_rxconf *rxconf;
rxconf = &infos.default_rxconf;
rxconf->offloads = port_conf.rxmode.offloads;
ret = rte_eth_rx_queue_setup(portid, qid, nb_rxd, rte_eth_dev_socket_id(portid), rxconf, mbuf_pool);
if (ret < 0) {
return ret;
}
}
// Allocate and set up a transmit queue for the Ethernet device.
for (qid = 0; qid < tx_rings; qid++) {
struct rte_eth_txconf *txconf;
txconf = &infos.default_txconf;
txconf->offloads = port_conf.txmode.offloads;
ret = rte_eth_tx_queue_setup(portid, qid, nb_txd, rte_eth_dev_socket_id(portid), txconf);
if (ret < 0) {
return ret;
}
}
// Set the default MAC address of the Ethernet device (02:de:c0:ed:00:01).
ret = rte_eth_dev_default_mac_addr_set(portid, &self_mac);
if (ret != 0) {
printf("Could not set mac address\n");
return ret;
}
// Start the Ethernet device now that it is fully configured.
ret = rte_eth_dev_start(portid);
if (ret < 0) {
return ret;
}
return 0;
}
Now that the device is configured and running, lcore_main is called to start processing and transmitting packets with the Ethernet device. The loop reads incoming network packets and replies to ICMP echo requests.
static int lcore_main(__rte_unused void *arg) {
unsigned lcore_id = rte_lcore_id();
struct rte_mbuf *m = NULL;
unsigned nb_pkts = 0;
struct rte_ether_hdr *eth_h;
struct rte_ether_addr eth_addr;
struct rte_icmp_hdr *icmp_h;
struct rte_ipv4_hdr *ip_h;
uint32_t ip_addr;
uint32_t cksum;
uint16_t nb_replies;
uint16_t nb_tx;
// Infinite loop until receiving SIGINT or SIGTERM.
while (!quit) {
usleep(125000);
// Check if we received Ethernet packets.
nb_pkts = rte_eth_rx_burst(0, 0, &m, 1);
if (likely(nb_pkts) == 0) continue;
// This macro points to the start of the data in the mbuf (internal packet structure).
eth_h = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
int l2_len = sizeof(struct rte_ether_hdr);
ip_h = (struct rte_ipv4_hdr *) ((char *)eth_h + l2_len);
// Check if the packet is an ICMP echo request.
icmp_h = (struct rte_icmp_hdr *) ((char *)ip_h + sizeof(struct rte_ipv4_hdr));
if (! ((ip_h->next_proto_id == IPPROTO_ICMP) &&
(icmp_h->icmp_type == RTE_IP_ICMP_ECHO_REQUEST) &&
(icmp_h->icmp_code == 0))) {
rte_pktmbuf_free(m);
continue;
}
// Swap ethernet source and destination.
rte_ether_addr_copy(&eth_h->src_addr, &eth_addr);
rte_ether_addr_copy(&eth_h->dst_addr, &eth_h->src_addr);
rte_ether_addr_copy(&eth_addr, &eth_h->dst_addr);
// Swap ipv4 source and destination.
ip_addr = ip_h->src_addr;
ip_h->src_addr = ip_h->dst_addr;
ip_h->dst_addr = ip_addr;
// Change icmp type to echo reply.
icmp_h->icmp_type = RTE_IP_ICMP_ECHO_REPLY;
cksum = ~icmp_h->icmp_cksum & 0xffff;
cksum += ~htons(RTE_IP_ICMP_ECHO_REQUEST << 8) & 0xffff;
cksum += htons(RTE_IP_ICMP_ECHO_REPLY << 8);
cksum = (cksum & 0xffff) + (cksum >> 16);
cksum = (cksum & 0xffff) + (cksum >> 16);
icmp_h->icmp_cksum = ~cksum;
// Send a burst of output packets.
rte_eth_tx_burst(0, 0, &m, 1);
// Free the packet mbuf back into its original mempool.
rte_pktmbuf_free(m);
printf("Received and processed an ICMP echo request\n");
fflush(stdout);
}
return 0;
}
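The checksum lines in lcore_main are an incremental update: instead of summing the whole ICMP message again, the old type/code word is subtracted from the existing checksum and the new one is added, in one's-complement arithmetic. The sketch below shows the same idea on host-order 16-bit words and compares it with a full recomputation. The helper names are mine, and working on host-order words is a simplification of this sketch (the original operates on network-order values with htons, which gives the same result because the one's-complement sum is byte-order independent).

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Full Internet checksum over `len` bytes, read as big-endian 16-bit words.
 * An odd trailing byte is padded with a zero byte. Intended for packet-sized
 * inputs, where the 32-bit accumulator cannot overflow.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (i < len)
        sum += (uint32_t)(data[i] << 8);
    return (uint16_t)~fold16(sum);
}

/*
 * Incremental update when one 16-bit word changes from old_word to new_word:
 *   new_cksum = ~(~old_cksum + ~old_word + new_word)
 */
uint16_t checksum_update(uint16_t old_cksum, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~old_cksum;

    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~fold16(sum);
}
```

For an echo request turned into an echo reply, the changed word is the type/code pair, so the call is checksum_update(cksum, 0x0800, 0x0000), where 0x0800 is type 8 with code 0 and 0x0000 is type 0 with code 0.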
Enough reading, now let’s find how to obtain the flag. We could recompile babydpdk from the source code and add a print statement during the ICMP packet processing to print the packet content. This is probably the cleanest way to solve the challenge. We could also patch the babydpdk ELF executable to do the same thing.
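As a rough idea of what the recompilation approach would need, the sketch below locates the ICMP data inside a raw Ethernet frame. It is a hypothetical helper written for this write-up, not code from the challenge. Unlike babydpdk, which assumes a 20-byte IPv4 header, it reads the header length from the IHL field.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return a pointer to the data of an ICMP echo request
 * carried in an Ethernet/IPv4 frame, and store its length in *payload_len.
 * Returns NULL if the frame is too short or is not an ICMP echo request.
 */
const uint8_t *icmp_echo_payload(const uint8_t *frame, size_t len,
                                 size_t *payload_len)
{
    const size_t eth_len = 14;
    const size_t icmp_len = 8;

    if (len < eth_len + 20 + icmp_len)
        return NULL;
    /* EtherType must be IPv4 (0x0800). */
    if (frame[12] != 0x08 || frame[13] != 0x00)
        return NULL;

    const uint8_t *ip = frame + eth_len;
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4; /* IPv4 header length in bytes */
    if ((ip[0] >> 4) != 4 || ihl < 20 || len < eth_len + ihl + icmp_len)
        return NULL;
    /* Protocol must be ICMP (1). */
    if (ip[9] != 1)
        return NULL;

    const uint8_t *icmp = ip + ihl;
    /* Type 8, code 0: echo request. */
    if (icmp[0] != 8 || icmp[1] != 0)
        return NULL;

    *payload_len = len - (eth_len + ihl + icmp_len);
    return icmp + icmp_len;
}
```

Inside lcore_main, the frame pointer and length would come from rte_pktmbuf_mtod(m, const uint8_t *) and rte_pktmbuf_data_len(m), and the result could be printed with printf("%.*s\n", (int)payload_len, payload).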
But since babydpdk is executed by our Linux user (and not root), and since the packet processing with DPDK runs in userspace, the easiest method is probably to read the process memory. We are also lucky that GDB is installed on the QEMU machine, which makes it even easier: we can simply debug the program, break when an ICMP packet is received, and read the flag in memory.
$ gdb --args /home/user/babydpdk -l 0 --no-huge --allow 0000:01:10.0
(gdb) # Add a breakpoint in lcore_main, after an ICMP packet is received
(gdb) b *0x00005555555564ba
Breakpoint 1 at 0x5555555564ba
(gdb) r
Thread 1 "babydpdk" hit Breakpoint 1, 0x00005555555564ba in main ()
(gdb) # Read the ICMP packet payload
(gdb) x/s $rdi+0x12a
0x103f9d16a: "FCSC{50fcce15afa5b2692f1ab91e8803d817f3dc72ec9e140da03605a4123b952862}"
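One plausible reading of the 0x12a offset, assuming that $rdi holds the rte_mbuf pointer at the breakpoint and that the binary uses DPDK's default layout: the packet data starts after the mbuf structure and its headroom, and the flag starts after the Ethernet, IPv4 and ICMP headers. The two 128-byte constants below are the usual DPDK defaults and were not checked against the challenge binary; the pool is created with a private area size of 0 in main, so nothing else sits in between.

```c
#include <stddef.h>

/* Assumed DPDK defaults, not values extracted from the challenge binary. */
enum {
    MBUF_STRUCT_SIZE = 128, /* sizeof(struct rte_mbuf): two 64-byte cache lines */
    MBUF_HEADROOM    = 128, /* default RTE_PKTMBUF_HEADROOM */
    ETH_HDR_LEN      = 14,  /* Ethernet header */
    IPV4_HDR_LEN     = 20,  /* IPv4 header without options */
    ICMP_HDR_LEN     = 8    /* ICMP echo header */
};

/* Offset of the ICMP data (the flag) from the start of the mbuf. */
size_t flag_offset_in_mbuf(void)
{
    return MBUF_STRUCT_SIZE + MBUF_HEADROOM
         + ETH_HDR_LEN + IPV4_HDR_LEN + ICMP_HDR_LEN;
}
```

Under these assumptions the offset is 128 + 128 + 42 = 298, which is 0x12a, and the dumped address 0x103f9d16a minus 0x12a gives 0x103f9d040, a 64-byte aligned address as one would expect for an mbuf.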
Note that it would still be possible without GDB, for example using ptrace to access the process registers and memory.