A Brief Introduction to Seccomp and How to Bypass It
Introduction
I have solved several seccomp-related challenges in the past few CTFs. To deepen my understanding of seccomp, I decided to take some time and learn how seccomp works.
All the code used in this post can be downloaded from here: seccomp.zip
Seccomp
So what is seccomp? "Seccomp is a computer security facility in the Linux kernel." [1]
Basically, seccomp creates a sandbox that limits a process's ability to make syscalls. Using seccomp, we can create an environment that allows or disallows specific syscalls.
The first version of seccomp was released in 2005. At that time, seccomp had only one mode: strict mode. In this mode, a process is allowed to use only four syscalls: read, write, exit, and sigreturn. If it makes any other syscall, the kernel kills it immediately with SIGKILL. (This is not very useful because it is too restrictive!)
Then, in 2012, the second version of seccomp was introduced, adding a new mode called filter mode. In this mode, the user specifies which syscalls are allowed by loading a filter program that runs in the BPF (Berkeley Packet Filter) virtual machine.
1 | /* Valid values for seccomp.mode and prctl(PR_SET_SECCOMP, <mode>) */ |
Seccomp strict mode
Let's start with the basic mode of seccomp: strict mode.
To do that, we are going to use a syscall called prctl (see the prctl man page for more details).
1 | #include <sys/prctl.h> |
prctl is a syscall that lets you manipulate various aspects of the calling thread or process. There are a lot of options we can use. In the case of seccomp, we only need to focus on PR_GET_SECCOMP and PR_SET_SECCOMP.
To get the current seccomp mode, we can simply use prctl(PR_GET_SECCOMP).
To change the current seccomp mode, we need to use prctl(PR_SET_SECCOMP, mode, arg1...).
Here is an example program that uses seccomp strict mode:
1 | // https://man7.org/linux/man-pages/man2/seccomp.2.html |
Output:
1 | 17:01:07 $ ./seccomp_strict |
As we can see, after we switch to strict mode, we can only use read, write, exit, and sigreturn. If we try to use any other syscall, such as prctl, the kernel kills the program immediately.
Seccomp BPF (filter mode)
As we said, filter mode lets the user install a BPF filter to limit the use of certain syscalls. So before talking about seccomp filters, let's talk about BPF first.
BPF was originally designed for the data link layer: it lets the user run small programs inside the kernel, for example to filter network packets efficiently. Later, BPF was also used in other places.
Seccomp filter mode also uses a BPF virtual machine for syscall filtering.
To do that, we write a small BPF program and load it into the kernel. Note that loading is irreversible: once you have loaded a BPF filter into the kernel, you cannot modify or remove it.
A BPF program is described by struct sock_fprog. This struct has a len field holding the total number of instructions and a pointer to the array of instructions.
1 | struct sock_fprog { /* Required for SO_ATTACH_FILTER. */ |
Each instruction is described by struct sock_filter. Different instruction codes perform different operations.
1 | struct sock_filter { /* Filter block */ |
sock_fprog and sock_filter are defined in filter.h. A detailed explanation can also be found in the header file.
BPF instructions include basic load, store, arithmetic, jump (conditional/unconditional), and return operations.
All of these instructions can be found in bpf_common.h.
When writing a BPF program, we can use two macros (BPF_STMT and BPF_JUMP) to simplify our code.
1 | #ifndef BPF_STMT |
BPF_STMT is used to manipulate the register and to return a value. It takes two parameters: the first is the BPF instruction code and the second is the value (k).
1 | // find load nr (syscall number) to internal register |
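BPF_LD means load a value into the internal register (the accumulator); for seccomp, the data being read is struct seccomp_data. BPF_ABS means the value is read at an absolute offset into that data. BPF_K means the operand is the constant k stored in the instruction itself.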
1 | BPF_STMT(BPF_RET | BPF_K,SECCOMP_RET_KILL) |
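This simply returns SECCOMP_RET_KILL, which makes the kernel kill the thread that made the syscall (in a single-threaded program, that ends the process).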
BPF_JUMP is used to design the control flow of our mini program. It takes four arguments: the instruction code, the value (k), the number of instructions to skip if the condition is true (jt), and the number of instructions to skip if it is false (jf).
1 | BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K ,__NR_execve , 0, 2) |
This statement means: if the value in the internal register is equal to __NR_execve, skip the next jt=0 instructions (that is, simply continue with the next instruction). If they are not equal, skip the next jf=2 instructions.
After we create the struct for our filter, we can load the program into the kernel using prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog).
One thing to mention is that we must first set PR_SET_NO_NEW_PRIVS to 1 (unless the process has CAP_SYS_ADMIN); otherwise loading the filter fails: prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
Let's look at an example that tries to print out the flag for us.
seccomp_filter.c
1 | // https://man7.org/linux/man-pages/man2/seccomp.2.html |
Output:
1 | hello world1! |
We can see that seccomp successfully stops the open syscall from executing!
We can also validate our seccomp filter using a tool called seccomp-tools:
1 | 17:06:17 $ seccomp-tools dump ./seccomp_filter |
Common strategies in CTF
Now let's try to bypass our seccomp restrictions.
Seccomp with no x32 ABI check
Here is our code from the previous example (see seccomp_filter_no_x32_abi_check.c).
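Is there any way we can bypass that? The answer is yes. Our filter compares the syscall number against the exact x64 value, so the same syscalls made through the x32 ABI are not restricted!
x32 ABI syscalls use the same numbers with an extra flag bit (__X32_SYSCALL_BIT) set, so on a kernel built with x32 support we can still use them.
Therefore, if we set the x32 syscall flag bit in our syscall number, we can use our system call again!
We replace the syscall number of open with __NR_open|__X32_SYSCALL_BIT.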
seccomp_filter_no_x32_abi_check_bypass.c
1 | // https://github.com/torvalds/linux/blob/master/arch/x86/entry/syscalls/syscall_64.tbl |
Executing our program again, we get the flag!
Seccomp with no arch check
To prevent the user from making x32 ABI calls, we add another restriction: check whether the syscall number is greater than or equal to 0x40000000 (__X32_SYSCALL_BIT).
1 | BPF_JUMP(BPF_JMP | BPF_JGE| BPF_K,__X32_SYSCALL_BIT,1,0), |
seccomp_filter_no_arch_check.c
Output of seccomp-tools:
1 | 17:48:35 $ seccomp-tools dump ./seccomp_filter_no_arch_check |
Now the previous method won't work, because if our syscall number is greater than or equal to 0x40000000, the filter returns KILL and the program stops executing.
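But in this example, our rule doesn't check which architecture the syscall belongs to, so i386 syscalls are not covered. We can use retf to switch to 32-bit mode and then make i386 syscalls, whose numbers differ from the x86-64 ones (for example, open is 5 on i386 but 2 on x86-64).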
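Here is the shellcode we can use to switch to i386 code execution:
- xor the return address with 0x2300000000, which places the 32-bit code segment selector (0x23) in the upper half
- push that value onto the stack
- use retf, which pops both the new instruction pointer and the new code segment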
The full exploit can be found in seccomp_filter_no_arch_check_bypass.c
Result
1 | 17:57:59 $ ./seccomp_filter_no_arch_check_bypass |
Other
TBD