A Brief Introduction to Seccomp and How to Bypass It

Posted on 2022-11-08 In CTF

Introduction

I have solved several seccomp-related challenges in the past few CTFs. To deepen my understanding of seccomp, I decided to take some time and learn how it works.

All the code used in this post can be downloaded here: seccomp.zip

Seccomp

So what is seccomp? "Seccomp is a computer security facility in the Linux kernel.[1]"

Basically, seccomp creates a sandbox that limits which syscalls a process can make. Using seccomp, we can build an environment that allows or disallows specific syscalls.

The first version of seccomp was released in 2005. At that time, seccomp had only one mode: strict mode. In this mode, a process may use only four syscalls: read, write, exit and sigreturn. If it makes any other syscall, the kernel terminates it immediately with SIGKILL. (This is not very useful because it is so restrictive!)

Then, in 2012, the second version of seccomp was introduced. It added a new mode called filter mode, in which the user specifies which syscalls are allowed by supplying a program for the BPF (Berkeley Packet Filter) virtual machine.

seccomp.h

/* Valid values for seccomp.mode and prctl(PR_SET_SECCOMP, <mode>) */
#define SECCOMP_MODE_DISABLED	0 /* seccomp is not in use. */
#define SECCOMP_MODE_STRICT	1 /* uses hard-coded filter. */
#define SECCOMP_MODE_FILTER	2 /* uses user-supplied filter. */

Seccomp strict mode

Let's start with the basic mode of seccomp: strict mode.

To enable it, we use a syscall called prctl: prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT).
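To see strict mode in action without killing the program you are running, the call can be made in a forked child. The block below is a minimal sketch, not one of the files in seccomp.zip: strict_mode_demo is a name I made up, and it returns -1 when strict mode cannot be entered (for example, when the process already runs under a seccomp filter, as in many containers).

```c
#include <linux/seccomp.h>   /* SECCOMP_MODE_STRICT */
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child, puts it into strict mode and lets it
 * attempt a syscall outside the allowed set. Returns the signal that
 * terminated the child, or -1 if strict mode could not be entered. */
int strict_mode_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(2);                      /* strict mode unavailable */

        /* write is one of the four allowed syscalls */
        static const char msg[] = "write() still works in strict mode\n";
        ssize_t n = write(STDOUT_FILENO, msg, sizeof(msg) - 1);
        (void)n;

        syscall(SYS_getpid);               /* not allowed: the kernel kills us here */
        syscall(SYS_exit, 0);              /* not reached */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

On a system where strict mode can be entered, I would expect strict_mode_demo() to return SIGKILL, since that is how the kernel ends a process that breaks the strict-mode rules.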

Seccomp BPF (filter mode)

As we said, filter mode lets the user supply a BPF filter that restricts the use of certain syscalls. So before going further with seccomp, let's talk about BPF first.

BPF was initially used at the data link layer, allowing users to run small programs quickly inside the kernel, for example to filter data packets. Later, BPF was also used in other places.

Seccomp filter mode also uses a BPF virtual machine, this time for syscall filtering.

To use it, we write a small program and load it into the kernel. Note that loading is irreversible: once a BPF filter is loaded into the kernel, you cannot modify or remove it.

A BPF program is described by struct sock_fprog. It has a len field holding the total number of instructions and a pointer to an array of instructions.

struct sock_fprog {	/* Required for SO_ATTACH_FILTER. */
	unsigned short		len;	/* Number of filter blocks */
	struct sock_filter __user *filter;
};

Each instruction is a struct sock_filter. Different instruction codes perform different operations.

struct sock_filter {	/* Filter block */
	__u16	code;   /* Actual filter code */
	__u8	jt;	/* Jump true */
	__u8	jf;	/* Jump false */
	__u32	k;      /* Generic multiuse field */
};

sock_fprog and sock_filter are defined in filter.h. A detailed explanation can also be found in the header file.

BPF instructions include load, store, arithmetic, jump (conditional and unconditional), and return.

All of these instructions can be found in bpf_common.h.

When writing a BPF program, we can use two macros, BPF_STMT and BPF_JUMP, to simplify our code.

filter.h

#ifndef BPF_STMT
#define BPF_STMT(code, k) { (unsigned short)(code), 0, 0, k }
#endif
#ifndef BPF_JUMP
#define BPF_JUMP(code, k, jt, jf) { (unsigned short)(code), jt, jf, k }
#endif

BPF_STMT is used for instructions that manipulate the register or return a value. It takes two parameters: the first is the BPF instruction code and the second is the value k.

// load nr (the syscall number) into the internal register
BPF_STMT(BPF_LD | BPF_W | BPF_ABS | BPF_K,(offsetof(struct seccomp_data, nr)))

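BPF_LD loads a value from seccomp_data into the internal register (the accumulator), BPF_W makes it a 32-bit word, and BPF_ABS means k is an absolute offset into the data. BPF_K is defined as 0, so OR-ing it into a load changes nothing; it matters for jump, arithmetic and return instructions, where it marks k as a constant operand.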

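BPF_STMT(BPF_RET | BPF_K,SECCOMP_RET_KILL)

This simply returns SECCOMP_RET_KILL, which kills the calling thread (and so ends a single-threaded process).

BPF_JUMP is used to build the control flow of our mini program. It takes four arguments: the instruction code, the value k to compare against, the number of instructions to skip if the comparison is true (jt), and the number to skip if it is false (jf).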

BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K ,__NR_execve , 0, 2)

This statement means: if the value in the internal register equals __NR_execve, skip jt=0 instructions (that is, fall through to the next instruction); otherwise, skip the next jf=2 instructions.
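To make the jt/jf offsets concrete, here is a toy user-space evaluator that walks a filter the same way. It is only a sketch for tracing jumps by hand, not the kernel's implementation: run_filter, demo and verdict_for_syscall are names I made up, and only the handful of opcodes used in this post are handled.

```c
#include <linux/filter.h>    /* struct sock_filter, BPF_* opcodes, BPF_STMT/BPF_JUMP */
#include <linux/seccomp.h>   /* struct seccomp_data, SECCOMP_RET_* */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>     /* __NR_execve */

/* Filter built around the jump above; instruction 3 is never reached. */
static const struct sock_filter demo[] = {
    /* 0 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* 1 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_execve, 0, 2),
    /* 2 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* 3 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE),
    /* 4 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Toy evaluator (hypothetical helper). Anything it does not understand
 * is treated as KILL. */
uint32_t run_filter(const struct sock_filter *f, size_t len,
                    const struct seccomp_data *d)
{
    uint32_t a = 0;                                   /* the internal register */
    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *i = &f[pc];
        switch (i->code) {
        case BPF_LD | BPF_W | BPF_ABS:                /* a = 32-bit word at offset k */
            if (i->k % 4 != 0 || i->k > sizeof(*d) - sizeof(a))
                return SECCOMP_RET_KILL;
            memcpy(&a, (const char *)d + i->k, sizeof(a));
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:               /* offsets count from the next instruction */
            pc += (a == i->k) ? i->jt : i->jf;
            break;
        case BPF_JMP | BPF_JGE | BPF_K:
            pc += (a >= i->k) ? i->jt : i->jf;
            break;
        case BPF_RET | BPF_K:
            return i->k;
        default:
            return SECCOMP_RET_KILL;
        }
    }
    return SECCOMP_RET_KILL;                          /* fell off the end */
}

/* Verdict of the demo filter for a given syscall number. */
uint32_t verdict_for_syscall(int nr)
{
    struct seccomp_data d;
    memset(&d, 0, sizeof(d));
    d.nr = nr;
    return run_filter(demo, sizeof(demo) / sizeof(demo[0]), &d);
}
```

With this, verdict_for_syscall(__NR_execve) falls through to instruction 2 and yields SECCOMP_RET_KILL, while any other number skips two instructions and lands on SECCOMP_RET_ALLOW.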

After building the structs for seccomp, we load our program into the kernel with prctl(PR_SET_SECCOMP,SECCOMP_MODE_FILTER,&prog).

One thing to mention: we must first set PR_SET_NO_NEW_PRIVS to 1 with prctl(PR_SET_NO_NEW_PRIVS,1,0,0,0); otherwise, unless the process has CAP_SYS_ADMIN, switching to filter mode fails.

Let's look at an example that tries to print out the flag for us.

seccomp_filter.c

// https://man7.org/linux/man-pages/man2/seccomp.2.html
#include <linux/seccomp.h>  /* Definition of SECCOMP_* constants */
#include <linux/filter.h>   /* Definition of struct sock_fprog */
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stddef.h>
#include <fcntl.h>

// demo of filter mode: the open syscall is not allowed
int main() {
    char msg[] = "hello world1!\n";
    write(1,msg,14); // fd 1 is stdout
    printf("current mode %d, change to filter mode now...\n", prctl(PR_GET_SECCOMP));

    // must set PR_SET_NO_NEW_PRIVS to 1, otherwise SECCOMP_MODE_FILTER will fail.
    // https://man7.org/linux/man-pages/man2/seccomp.2.html
    prctl(PR_SET_NO_NEW_PRIVS,1,0,0,0);

    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,(offsetof(struct seccomp_data, nr))), // load the syscall number
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,__NR_open,0,2), // if equal, skip 0 instructions (fall through to kill), else skip 2 instructions (land on allow)
        BPF_STMT(BPF_RET | BPF_K,SECCOMP_RET_KILL), // return kill
        BPF_STMT(BPF_RET | BPF_K,SECCOMP_RET_TRACE), // return trace (never reached: no jump lands here)
        BPF_STMT(BPF_RET | BPF_K,SECCOMP_RET_ALLOW), // return allow
    };
    // 
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
        .filter = filter,                                         
    };

    printf("set mode return %d\n",prctl(PR_SET_SECCOMP,SECCOMP_MODE_FILTER,&prog));
    printf("seccomp change to mode filter\n");

    // open should not work
    int fd = syscall(__NR_open,"flag.txt",O_RDONLY);
    
    // the rest of the program won't execute
    char flag[23];
    read(fd,&flag,23);
    write(1,&flag,23);
}

Output:

hello world1!
current mode 0, change to filter mode now...
set mode return 0
seccomp change to mode filter
Bad system call

We can see that seccomp successfully stopped the open syscall from executing!

We can also validate our seccomp filter using a tool called seccomp-tools.

17:06:17 $ seccomp-tools dump ./seccomp_filter
hello world1!
current mode 0, change to filter mode now...
 line  CODE  JT   JF      K
=================================
 0000: 0x20 0x00 0x00 0x00000000  A = sys_number
 0001: 0x15 0x00 0x02 0x00000002  if (A != open) goto 0004
 0002: 0x06 0x00 0x00 0x00000000  return KILL
 0003: 0x06 0x00 0x00 0x7ff00000  return TRACE
 0004: 0x06 0x00 0x00 0x7fff0000  return ALLOW

Common strategies in CTF

Now let's try to bypass our seccomp restriction.

Seccomp with no x32 ABI check

Here is our code from the previous example (see seccomp_filter_no_x32_abi_check.c).

Is there any way to bypass it? The answer is yes. Because the filter only matches the x86-64 syscall number of open, x32 ABI syscalls are not restricted!

We can still use x32 ABI syscalls

Therefore, if we set the x32 syscall flag bit (__X32_SYSCALL_BIT, 0x40000000) in our syscall number, the number no longer equals __NR_open and we can use our system call again!

We replace the syscall number of open with __NR_open|__X32_SYSCALL_BIT.

seccomp_filter_no_x32_abi_check_bypass.c

// https://github.com/torvalds/linux/blob/master/arch/x86/entry/syscalls/syscall_64.tbl
// only works for common syscalls
// using x32 ABI to bypass :)
// 0x2 | 0x40000000
int fd = syscall(__NR_open|__X32_SYSCALL_BIT,"flag.txt",O_RDONLY);
printf("why open works????????????????\n");

Run our program again, and we get the flag!

get_flag
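Why the filter misses this call comes down to plain arithmetic on the syscall number. The sketch below spells that out; the constants are the x86-64 values written out by hand so it compiles anywhere, and filter_kills and x32_open_nr are illustrative names, not anything from the kernel headers. Keep in mind that the real call only succeeds on a kernel built with x32 ABI support; otherwise I would expect it to fail with ENOSYS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values for x86-64 Linux, written out by hand for this sketch. */
#define X86_64_NR_OPEN   2u            /* __NR_open */
#define X32_SYSCALL_BIT  0x40000000u   /* __X32_SYSCALL_BIT */

/* The only comparison the filter from the previous example performs. */
bool filter_kills(uint32_t nr)
{
    return nr == X86_64_NR_OPEN;
}

/* The syscall number used by the bypass: same table entry, x32 bit set. */
uint32_t x32_open_nr(void)
{
    return X86_64_NR_OPEN | X32_SYSCALL_BIT;   /* 0x40000002 */
}
```

filter_kills(2) is true, but filter_kills(x32_open_nr()) is false, so the filter returns ALLOW even though the kernel still dispatches the call to open.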

Seccomp with no arch check

To prevent callers from using x32 ABI syscalls, we add another restriction: check whether the syscall number is greater than or equal to 0x40000000.

BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K,__X32_SYSCALL_BIT,1,0),

seccomp_filter_no_arch_check.c

Output of seccomp-tools

17:48:35 $ seccomp-tools dump ./seccomp_filter_no_arch_check
 line  CODE  JT   JF      K
=================================
 0000: 0x20 0x00 0x00 0x00000000  A = sys_number
 0001: 0x35 0x01 0x00 0x40000000  if (A >= 0x40000000) goto 0003
 0002: 0x15 0x00 0x02 0x00000002  if (A != open) goto 0005
 0003: 0x06 0x00 0x00 0x00000000  return KILL
 0004: 0x06 0x00 0x00 0x7ff00000  return TRACE
 0005: 0x06 0x00 0x00 0x7fff0000  return ALLOW

Now the previous method won't work, because if our syscall number is greater than or equal to 0x40000000, the filter returns KILL and the program stops executing.

But in this example, our rule doesn't check the architecture, so i386 syscalls slip through. We can use retf to switch to 32-bit mode and then make i386 syscalls, where open has a different number (5 instead of 2).

Here is an outline of the shellcode we can use to switch to i386 code execution:

XOR the return address with 0x2300000000, so the upper 32 bits hold 0x23 (the 32-bit code segment selector)
push the resulting value onto the stack
execute retf
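The first step above is just packing two 32-bit values into one 8-byte stack slot. As I understand it, retf with its default operand size pops the instruction pointer first and the code segment selector second, so the low half is the target address and the high half is 0x23. A small sketch of that arithmetic, where far_return_frame and USER32_CS are illustrative names of my own:

```c
#include <stdint.h>

#define USER32_CS 0x23u   /* 32-bit user code segment selector on x86-64 Linux */

/* Value to push before retf: the low 32 bits become the new instruction
 * pointer, the high 32 bits hold the code segment selector.
 * The target must live below 4 GiB, so it is taken as a 32-bit address. */
uint64_t far_return_frame(uint32_t target)
{
    return ((uint64_t)USER32_CS << 32) ^ target;   /* target ^ 0x2300000000 */
}
```

For example, far_return_frame(0x00401000) gives 0x2300401000. The stack pointer also has to fit in 32 bits once the CPU is running 32-bit code, which is why exploits of this kind usually move the stack into low memory first.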

get_flag_2

The full exploit can be found in seccomp_filter_no_arch_check_bypass.c.

Result

17:57:59 $ ./seccomp_filter_no_arch_check_bypass
hello world1!
current mode 0, change to filter mode now...
set mode return 0
seccomp change to mode filter
now you can't use open lol
flag{here_is_the_flag}
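The gap used in this section is normally closed by checking seccomp_data.arch before looking at the syscall number. Below is a minimal sketch of such a filter. It is not part of seccomp.zip, and hardened, install_hardened_filter, X86_64_NR_OPEN and X32_BIT are names I made up for illustration (the two constants are the x86-64 values written out by hand).

```c
#include <linux/audit.h>     /* AUDIT_ARCH_X86_64 */
#include <linux/filter.h>    /* struct sock_filter, struct sock_fprog */
#include <linux/seccomp.h>   /* struct seccomp_data, SECCOMP_* */
#include <stddef.h>
#include <sys/prctl.h>

#define X86_64_NR_OPEN  2            /* __NR_open on x86-64 */
#define X32_BIT         0x40000000   /* __X32_SYSCALL_BIT */

static struct sock_filter hardened[] = {
    /* 0 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* 1 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 0, 3), /* not x86-64: goto 5 */
    /* 2 */ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* 3 */ BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, X32_BIT, 1, 0),           /* x32 call: goto 5 */
    /* 4 */ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, X86_64_NR_OPEN, 0, 1),    /* open: 5, else 6 */
    /* 5 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* 6 */ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Loads the filter into the calling process; returns 0 on success.
 * As with any seccomp filter, this cannot be undone afterwards. */
int install_hardened_filter(void)
{
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(hardened) / sizeof(hardened[0])),
        .filter = hardened,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

With the architecture checked first, a syscall made through int 0x80 after the retf trick reports a different arch value and is killed at instruction 1, whatever its number is.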

Other

TBD