Saturday, September 19, 2015
BSides Augusta Talk
Here's the talk:
I also released the module and API scripts I wrote for the talk.
I cannot say enough about the talent and the quality of the technical content in the BSides Augusta talks. This is easily a "Top 5" defensive security event. I seriously have no idea how I managed to sneak into this speaker lineup. Definitely going back next year.
Tuesday, August 20, 2013
BSides Detroit Presentation
I have teased the BSides Detroit organizers that they ought to rename their conference to ASides Detroit because, unlike other BSides events, it doesn't coincide with another security conference, and also because it has the best content and activities of any security conference in Detroit. If you're in Michigan or the Great Lakes region at all, I recommend making plans to attend next year. I'll be there.
Also, here are some other upcoming security-related events taking place in Michigan:
- GrrCON (Sep 12-13, Grand Rapids)
- mi4n6 meeting (Sep 19, Livonia)
- Michigan Cyber Summit (Oct 25, Novi)
Saturday, October 13, 2012
GrrCON 2012 Forensics Challenge Walkthrough
This is a walk-through of the GrrCON 2012 Forensics Challenge that was designed by Jack Crook (@jackcr). Special thanks to Jack for making it so much fun and challenging!
- You can read about the challenge here.
- You can download the challenge files from the links here.
- You can watch Jack's MiSec presentation on the challenge here.
1. How was the attack delivered?
Open out.pcap in Wireshark, find the first TCP session, and follow the TCP stream.
Oh, look! A file that ends in *.doc.exe, that can't be good! Note the "MZ" file magic number and "This program cannot be run in DOS mode" text -- sure signs that this is a Win32 executable file.
Answer: HTTP download of http://66.32.119.38/tigers/BrandonInge/Diagnostics/swing-mechanics.doc.exe
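That eyeball check for a Win32 EXE is easy to automate. Here's a minimal Python sketch of the same heuristic; the sample bytes are made up for illustration, not carved from the actual pcap:

```python
def looks_like_pe(payload: bytes) -> bool:
    """Heuristic check for a Win32 PE file: 'MZ' magic at offset 0
    plus the classic DOS stub message somewhere near the start."""
    return payload.startswith(b"MZ") and \
        b"This program cannot be run in DOS mode" in payload

# Hypothetical carved HTTP response body (not the real swing-mechanics.doc.exe)
sample = b"MZ\x90\x00" + b"\x00" * 60 + \
    b"This program cannot be run in DOS mode.\r\n"
print(looks_like_pe(sample))  # True
```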
2. What time was the attack delivered?
In Wireshark, find the HTTP GET request ACK packet from the stream we just looked at. In the Frame section of the packet, locate the timestamp.
Answer: Apr 27, 2012 22:00:59
3. What was the name of the file that dropped the backdoor?
Looking at the same HTTP GET request in Wireshark, what was the name of the file from the URL?
Answer: swing-mechanics.doc.exe
4. What is the IP address of the C2 server?
In Wireshark, clear the current TCP stream filter, and browse through the packets. As the HTTP session with the malware dropper ends, we see a new outbound connection to TCP port 443 from our victim. The destination address is the command & control (C2) server for our back door.
Answer: 221.54.197.32
5. What type of backdoor is installed?
Run foremost to extract the Win32 EXE file we found in the first question:
foremost -t exe out.pcap
Foremost will create an output directory with a subdirectory named 'exe' that should contain our backdoor. Now upload the file to VirusTotal. You should see that VirusTotal has already scanned this file. When I did it on 10/6, the last scan date was 9/28, the first day of GrrCON. :)
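If you're curious what foremost is doing under the hood, the core idea is scanning the raw bytes for known file headers and carving from there. A grossly simplified Python sketch of just the header-scanning step (foremost does far more validation than this):

```python
def carve_mz_offsets(blob: bytes):
    """Return every offset where an 'MZ' header candidate begins.
    Real carvers validate the PE structure; this is just the idea."""
    offsets, start = [], 0
    while (i := blob.find(b"MZ", start)) != -1:
        offsets.append(i)
        start = i + 2
    return offsets

# Fabricated blob with two 'MZ' candidates embedded in junk
blob = b"junk" + b"MZ\x90\x00fake-exe" + b"more junk" + b"MZhdr"
print(carve_mz_offsets(blob))  # [4, 25]
```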
Answer: Poison Ivy
6. What is the mutex the backdoor is using?
This is the first answer to the challenge you have to work hard for. In order to do this the right way, you must use the memory dump to identify which process initiated the connection to the C2 server, then use its PID to find the base address and memory range, then use that to match any mutexes for that range. (You can cheat here and Google search Poison Ivy mutexes and see if any of them are present in the mutantscan output, too, but as I said, that's cheating. :))
First, find the process that's connecting to the C2 from question #5.
vol.py -f memdump.img connscan |grep 221.54.197.32
We see it's the process at PID 1096. So now we need to find out what process it is and, more importantly, its base address and memory range.
vol.py -f memdump.img psscan |grep 1096
Uh-oh, that's explorer.exe, isn't it? Process injecting basterds! The base address of our pwned process is 0x0214a020. Any mutexes we find in that range are of interest to us.
vol.py -f memdump.img -s mutantscan |grep 0x0214
There happens to be a funny-looking mutex close to our base address. It's not a guarantee that this is the answer we're looking for, but a quick Google search for "Poison Ivy mutex" validates the finding.
Answer: ")!VoqA.I4"
7. Where is the backdoor placed on the filesystem?
To answer this question, we'll examine the SleuthKit file (compromised.timeline) that Jack was kind enough to include in this challenge. We'll start by looking around the time that the backdoor was downloaded (4/27/12, 22:00:59) in question #2 and working forward. Also, we'll want to look for any files that are the same size as the one we extracted from the pcap file with foremost (8,192 bytes) in question #5.
(Note: If you got stuck here at GrrCON because you assumed that the filesystem time and the packet capture time were perfectly in sync, you learned the most valuable lesson there is in DFIR. There is always drift in timestamps between sources. Unfortunately, you learned it the hard way.)
At 21:59:20 on 4/27/12, we find the prefetch temp file for the swing-mechanics.doc.exe file, and immediately after it, another file that matches the size of that file from our pcap being written to c:\windows\system32\svchosts.exe.
(Note: There is a svchost.exe file in %systemroot%\system32 on WinXP and Win7, but there is no svchosts.exe, another clue that this is not legit.)
Answer: C:\WINDOWS\system32\svchosts.exe
8. What process name and process id is the backdoor running in?
Now, if you got this far, but cheated at question #6 instead of doing the work, you may have run the volatility pslist module, seen svchost.exe, and given the wrong answer. Oops! Cheaters never prosper. We already know that the right answer is explorer.exe and its PID is 1096.
Answer: explorer.exe 1096
9. What additional tools do you believe were placed on the machine?
Back to where we left off in question #7. Keep working forward in the SleuthKit timeline, and...
That's weird. Prefetch files indicate a program launch. Somebody ran net.exe, ipconfig.exe, and ping.exe. Likely our attacker testing network connectivity. ;-) But wait! There's more!
One thing DFIR will do to you is make you something of an expert on the names of files and folders that live within C:\WINDOWS, and, well, this doesn't look right for a lot of reasons. And when we see a file named sysmon.exe (which normally is in \WINDOWS\system32) created in this folder, and it's the same size as our binary from question #5 (8,192 bytes), we know we're looking at more bad stuff. So in addition to a backup copy of our backdoor, there is a text file of some kind (f.txt) and four additional executables: g.exe, p.exe, r.exe, and w.exe.
You can also find the handles to these files in memory with volatility as well:
vol.py -f memdump.img filescan |grep -i svchosts.exe
vol.py -f memdump.img filescan |grep -i systems
Answer: g.exe, p.exe, r.exe, w.exe, and sysmon.exe
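Spotting near-miss names like svchosts.exe is also scriptable. A hedged sketch using Python's difflib; the known-binaries list here is a tiny, made-up subset, not an authoritative whitelist:

```python
from difflib import SequenceMatcher

# A small illustrative subset of legitimate Windows binary names
KNOWN = {"svchost.exe", "lsass.exe", "services.exe", "explorer.exe"}

def lookalike(name: str, threshold: float = 0.85) -> bool:
    """Flag names that closely resemble, but don't exactly match,
    a known system binary name."""
    name = name.lower()
    if name in KNOWN:
        return False
    return any(SequenceMatcher(None, name, k).ratio() >= threshold
               for k in KNOWN)

print(lookalike("svchosts.exe"))  # True
print(lookalike("svchost.exe"))   # False
```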
10. What directory was created to place the newly dropped tools?
Answer: C:\WINDOWS\system32\systems
11. How did the attacker escalate privileges?
We can assume from the work we did in #9 and #10 that those binaries aren't copies of calc.exe, so likely one of them was used for privilege escalation.
It would be great if we could extract them from the memory image, but since we didn't see them when we used the volatility psscan module, our chances aren't very good. Maybe we can find the command syntax that was used and get an idea of which tool was used to do what?
vol.py -f memdump.img cmdscan
Well, that looks like an FTP command mixed in with Jack creating the memory dump we're analyzing, which is interesting, but not what we're looking for. Yet.
(Note: There's a good point to be made here about how by gathering the evidence, evidence was also destroyed. In the process of copying down and running mdd, Jack also overwrote most of the cmd.exe history that we are interested in for this question.)
So, no easy win to be had here. Maybe we can find what we're looking for in strings. We'll use strings and the volatility strings module to pull all of the strings of 5 or more characters out of our memory image and see if we can find anything in there. By looking at the SleuthKit timeline file, we see that w.exe was the first of the suspicious binaries to be launched, so we'll look for that in particular.
strings -n 5 -t o memdump.img >strings.txt
vol.py -f memdump.img strings -s strings.txt >vol-strings.txt
grep w\.exe vol-strings.txt
Bingo! Those command line arguments are the username, domain name, NTLM hash, and program to run for a pass the hash attack tool of some sort. So now our attacker is running cmd.exe as Administrator. Possibly for the whole COMPANY-A domain.
If this is your network, this is where you excuse yourself to put on clean shorts.
Answer: pass-the-hash attack
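If you want to see roughly what strings -n 5 is doing under the hood, here's a rough Python equivalent of the printable-run extraction; the memory fragment below is fabricated for illustration, not pulled from memdump.img:

```python
import re

def extract_strings(blob: bytes, min_len: int = 5):
    """Rough equivalent of `strings -n 5 -t o`: yield (offset, text)
    for each run of printable ASCII of at least min_len bytes."""
    for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, blob):
        yield m.start(), m.group().decode("ascii")

# Fabricated memory fragment; the real PtH arguments came from memdump.img
blob = b"\x00\x01w.exe admin company-a 0123456789abcdef cmd.exe\x00\xff\x02hi\x00"
for off, s in extract_strings(blob):
    print(off, s)
```

The short "hi" run is dropped because it falls below the 5-character minimum, the same way strings -n 5 would drop it.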
12. What level of privileges did the attacker obtain?
See above.
Answer: Administrator
13. How was lateral movement performed?
Once we understand that the attacker has become COMPANY-A\Administrator on what is probably an Active Directory domain controller, they can go wherever they want. Let's look for hostnames that aren't dc01.company-a.com and see if they did anything interesting there.
grep -i \.company-a\.com vol-strings.txt
In addition to our own hostname, we see the following:
dc01.company-a.com
res-lab02.company-a.com
If we look back to when we ran the volatility connscan plugin for question #6, we saw a bunch of NetBIOS and LDAP connections to 172.16.150.10, which is dc01.company-a.com. That's pretty much a tell-tale sign that dc01 is the COMPANY-A domain controller. Which means our attacker is in fact a domain admin, and can pivot freely onto dc01 and anything else he wants. Maybe he'll do something like map a drive later. Who knows?
Answer: Credential re-use as Administrator
14. What was the first sign of lateral movement?
Now that I think of it, was there anything from 172.16.150.10 in the pcap file?
ssldump -n -r out.pcap
You can do this in Wireshark, too, but one trick I wanted to show off is ssldump's ability to summarize all of the TCP sessions in a pcap file. Oh, and it looks like dc01 is also phoning home to the C2 server. Hope you packed two pair of clean shorts.
Answer: Well, I saw the C2 traffic from dc01 in the pcap file before any of the other evidence, so that's my answer.
(Note: I think the login to the domain controller as Administrator from a workstation, which came first, should also be caught by the security ops team if they are monitoring the security EventLog on the domain controllers. (Which they should be!))
15. What documents were exfiltrated?
For this one, we had to wander around in the vol-strings.txt file we made to put the pieces together:
less vol-strings.txt
So I admit, this is a bit of a guess, but all of this looks suspicious to me. Here we have a set of files that look like the kinds of things we would want to exfiltrate. Then not that far away, we have a net use mapping a drive to dc01, then making a local directory named "1" (if you look around some more, you discover it's C:\WINDOWS\system32\systems\1). After that, the files from the shared drive are copied to that folder, and that folder is compressed in a rar file and password protected. Then an FTP command is made. That looks like data exfiltration to me.
Answer: confidential1.pdf, confidential2.pdf, confidential3.pdf
(Note: You will see later that I'm close, but managed to miss about half of the files.)
16. How and where were the documents exfiltrated?
Answer: FTP to 66.32.119.38
17. What additional steps did the attacker take to maintain access?
Unless I missed something (which is actually quite likely), we already talked about this in question #14.
Answer: Installed Poison Ivy RAT on dc01
18. How long did the attacker have access to the network?
So, there are two possible answers Jack is looking for. There's the cynical defeatist answer, "Clearly as long as he wanted." Or there's the specific answer whereby we look at the time from the start of the first C2 connection to the end of the data exfiltration. For that, we fire up Wireshark:
The first screen is the first packet of the first C2 connection, when our attacker actually got control of the first victim system. The second screen has the filter tcp.flags.fin == 1 applied to prove a point. The last packets in the pcap file are ACKs for the C2 connections to both 172.16.150.20 and 172.16.150.10 (res-lab01 and dc01 respectively). The FTP connections complete at 22:13:26, but the C2 goes on, likely past the end of the file.
Answer: 12 minutes, 21 seconds (or indefinitely)
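The duration arithmetic is simple enough to sanity-check in Python. The timestamps here are assumptions consistent with the answer above, not authoritative values from the pcap:

```python
from datetime import datetime

# Assumed first C2 packet and last relevant packet times (illustrative only)
first_c2 = datetime(2012, 4, 27, 22, 1, 5)
last_seen = datetime(2012, 4, 27, 22, 13, 26)

delta = last_seen - first_c2
print(delta)  # 0:12:21
```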
19. What is the secret code inside the exfiltrated documents?
To get at the secret documents, we need to recreate our own copy of the encrypted rar file used for exfiltration, then decrypt it, extract the files, and view them. Fortunately, we already know everything we need to get this done.
First, extract the rar file from the pcap.
foremost -t rar out.pcap
Now, extract the files. We'll need the password, but we caught that in the volatility strings we sifted through for question #15.
unrar x -pqwerty 00002134.rar
Note the bottom of the screenshot there. Jack's got an evil sense of humor. Those aren't even PDFs! But that is in fact an OpenOffice file.
Answer: 76bca1417cb12d09e74d3bd4fe3388e9
20. What is the password for the backdoor?
Whilst reading the article we Googled for about Poison Ivy mutexes for question #6, we also learned that Poison Ivy doesn't typically use packers or cryptors, and that the C2 server and password are coded in the binary file. I wonder if they'll stick out like a sore thumb...
strings 00000002.exe
We totally recognize that IP address as the C2 server from question #4. You'll never guess Jack's favorite baseball team.
Answer: tigers
If you made it this far, thanks for reading. Hope you liked it. Oh, and your reward for reading the whole thing: In case you didn't catch it, this challenge is essentially a mirror of the attack on RSA that led to the theft of their token seed data files in 2011. When I made that realization, it gave me lulz. Jack, you're the man! :)
One more bonus! Even though it wasn't one of the challenge answers, here's the spear-phish that started the whole thing:
Brandon,
I have been watching you swing for the last few weeks. I believe I have come up with a major break through in your mechanics. If these adjustments are made I believe you will be back up in the bigs and batting .280 within no time. Please review the attached document before our next hitting session. Here
Regards,
Lloyd McClendon
Thursday, May 20, 2010
The SIEM Market Discussion Continues
Rocky, Paul:
The ClueTrain Manifesto calls markets "conversations", so here goes.....
I think you're falling into the trap of "conventional wisdom". First off, the basic assumption that the world falls neatly into the SIEM categorization is just plain false. I stand by LogLogic's model....it all starts with log management as the crucial piece, without that key use cases like network forensics are not even possible. Second, the notion that dropping the price is bad is just plain weird. Is LogLogic dropping the price to sell more? Sure we are. Are we dropping the price to take market share? Sure we are. Are we seeing a great response? Sure we are. Since when is saving people money a bad thing?
And we're always interested in a podcast. :)
Bill Roth, EVP
LogLogic
Hi Bill,
Thanks for the comment! And thanks for participating in the dialogue. I think it's awesome that LogLogic is out front and engaging on its business decisions. Very refreshing!
As to your point about log management being that crucial initial component of a SIEM implementation, I agree completely. Log management has also developed as its own market segment as well, independent of SIEM. But I don't need to tell you that. :-)
On the topic of LogLogic's decision to discount its SIEM product, I didn't mean - and I don't believe Rocky did either - that charging less for SIEM is bad, or even a bad business move.
That said, I do believe that for some significant portion of potential customers log management is a commodity technology. However, from my own experience and from everything I've seen to date, SIEM is not a commodity technology, and I'm not convinced it will be. As such, I don't see price as a strong competitive differentiator in the SIEM market.
Following the recent recession, where IT capital budgets still haven't caught up to the (hopefully sustained) economic upturn, I imagine the feedback on LogLogic's price cut has been positive, and that you'll see some SIEM sales where you wouldn't have but for the discount. But in the mid- to long-term, I have my doubts as to whether there is any meaningful gain in market share to be had for LogLogic - or any SIEM vendor for that matter - simply by competing on price with other SIEM vendors.
Let's be frank, if price were a big piece of why companies choose a particular SIEM, Cisco MARS would have the lion's share of the market and ArcSight would be folding. Instead, it's the other way around.
Twitter Killed the Blog Star
But every once in a while, a Twitter exchange becomes so interesting that, despite the compressed and fleeting nature of Twitter, it turns into something worthy of framing. The other night, Rocky DeStefano of Visible Risk and I had an exchange on SIEM that I thought the wider world might find interesting. The background to the conversation is this post from Rocky's blog about the recent announcement from LogLogic that they were discounting their SIEM product, and then this responding blog post from LogLogic.
rockyd
The LogLogic response ->> http://bit.ly/bAQSZO to my discounting SIEM Post ( http://bit.ly/aiW3kB )
8:47 PM May 18th via TweetDeck
rockyd
I need to noodle on the LogLogic response more. I appreciate the conversation, I think I may see the opposite end of the customer spectrum.
9:02 PM May 18th via TweetDeck
pmelson
@rockyd I think you nailed the issue. If you *NEED* SIEM, you won't compromise features/functionality for capital cost savings.
9:06 PM May 18th via TweetDeck
pmelson
@rockyd If Cisco couldn't make "Free SIEM With Purchase" work, it's not ever going to work.
9:07 PM May 18th via TweetDeck
rockyd
@pmelson let's be honest how could they possible respond any differently than they did? time for a podcast on the subject ?
9:50 PM May 18th via TweetDeck
pmelson
@rockyd They could just fess up. "We're shipping log management appliances, but SIEM isn't moving. So we put it on clearance sale." :-)
9:52 PM May 18th via TweetDeck
pmelson
@rockyd I think with Gartner's SIEM MQ being released, we're about to see another round of SIEM casualties as VC pulls out.
9:54 PM May 18th via TweetDeck
rockyd
@pmelson There has to be quickening soon, there is way too much of the same thing in the market.
9:57 PM May 18th via TweetDeck
pmelson
@rockyd Right. I've been thinking about the key SIEM differentiators and I've only got three.
10:00 PM May 18th via TweetDeck
rockyd
@pmelson which three?
10:06 PM May 18th via TweetDeck
rockyd
@pmelson Like - Sources, Scalability, Analytical Usage, Correlation / Statistical Evaluation, and getting Intelligent information out?
10:08 PM May 18th via TweetDeck
pmelson
@rockyd 1) performance/scalability 2) UI and drill-down 3) supported sources.
10:07 PM May 18th via TweetDeck
rockyd
@pmelson there are some others like context of Host, Vuln, Registry, Applications and Users that lead you towards more advanced usage
10:09 PM May 18th via TweetDeck
pmelson
@rockyd OK, so asset data model(s) makes 4, pre-defined content is 5? That's still not a lot.
10:15 PM May 18th via TweetDeck
rockyd
@pmelson each is several years of development and refinement with customers.
10:32 PM May 18th via TweetDeck
rockyd
@pmelson this comes down to a compliance check box sale versus a security team needing to integrate a tool into their process.
10:35 PM May 18th via TweetDeck
pmelson
@rockyd Agree. But a handful of differentiators == a handful of potential market leaders. Time to thin the herd. Again.
10:42 PM May 18th via TweetDeck
rockyd
@pmelson now I see where you're headed. BTW I think you'll see 3 more acqusitions by end of year.
10:45 PM May 18th via TweetDeck
rockyd
I was thinking about creating a "vegas odds" website for SIEM Quickending and donate some portion of the funds to HFC.
10:47 PM May 18th via TweetDeck
pmelson
@rockyd A SIEM futures market? Very DARPA!
10:49 PM May 18th via TweetDeck
So there, for your parsing and edification, some thoughts on the SIEM product space, the recent Gartner MQ for SIEM, and the near-term ramifications of Gartner's paper on the market.
Also, if you aren't already, you should be reading Rocky's blog, especially if you're interested in SIEM and security ops. Rocky's a guru in this space, and in addition to his blog he has already put together some great podcasts since launching his latest venture, Visible Risk.
Wednesday, April 14, 2010
Snort Signatures for New Koobface Variant
alert tcp $HOME_NET any -> $EXTERNAL_NET !80 (msg: "LOCAL .exe file download on port other than 80"; flow:established; content: "GET"; depth:4; content: ".exe"; nocase; classtype:misc-activity; sid:9000160; rev:1;)
And these are designed to catch the bot HTTP checkins we've seen so far. This is likely to be more of a whack-a-mole effort as we've already seen the checkin URL format change once.
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"LOCAL Koobface action=fbgen checkin"; flow:to_server,established; content:"POST"; content:"/.sys/?action=fbgen"; nocase; classtype:trojan-activity; sid:9000220; rev:1;)
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"LOCAL Koobface go.js checkin"; flow:to_server,established; content:"POST"; content:"/go.js?"; nocase; classtype:trojan-activity; sid:9000221; rev:1;)
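If you want to sanity-check what those two checkin rules match before deploying them, the content logic boils down to this. A rough Python approximation of the matching, not a substitute for testing in Snort itself:

```python
def koobface_checkin(payload: bytes) -> bool:
    """Approximate the two Snort checkin rules above: a POST whose
    payload contains either checkin URL pattern (case-insensitive)."""
    p = payload.lower()
    return b"post" in p and (b"/.sys/?action=fbgen" in p or b"/go.js?" in p)

print(koobface_checkin(b"POST /.sys/?action=fbgen HTTP/1.1"))  # True
print(koobface_checkin(b"GET /index.html HTTP/1.1"))           # False
```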
Friday, January 22, 2010
Security Metrics and Data Visualization
One set of statistics that may be of interest to the general Internet public is the volume of malware cases that we have worked over the past two years.
There are a couple of things worth pointing out in this graph. The first, and perhaps most obvious one, is that there is a drop-off in malware related cases in 2009. Surely, that can't be right? It is, but it's due to implementing some new security technologies in December of 2008. In fact, those countermeasures reduced the number of malware cases we handled in 2009 by roughly 65% compared to 2008. I want to say two things about this. First, this demonstrates the effectiveness of the preventative countermeasures that we employed and confirms the value of those countermeasures. Notice that I'm not saying that it proves ROI. But the bottom line is that it was worth it. The second thing I want to point out about that decline, however, is that it's just a decline. It did not eliminate the problem. In fact, in 2009 we saw malware chip away at other defenses that were highly effective only two years before. And I suspect that, if we do nothing else about it, that those levels will begin to rise in 2010 and regain the same level of frequency we saw in 2008 if not higher. There's a hint of that in the graph towards the end of 2009.
The next thing I want to point out about this graph is the peak frequency. It is consistent. Every three months, there is a spike in malware incidents in our environment. I would love to see statistics from other companies or the Internet at large to see if this is an Internet-wide pattern. I suspect that it is. Despite the new countermeasures, despite the decrease in order of magnitude, the spikes occur like clockwork every third month. That leads me to believe two things. First, I believe that this pattern is driven externally since it didn't deviate, even when our environment changed significantly. Second, I believe that this is no accident. The vendors that produce malware/botnet "kits" are responsible for introducing most of the new exploits and anti-detection capabilities that we see on a regular basis. Their stuff is used more widely than custom malware as well. Therefore, this leads me to believe that there is one large group responsible for the majority of the malware in the wild, and they're on a 90-day release cycle. I've got no intelligence data to support this, but I have a hard time believing that this pattern repeats itself, without exception, for two years straight out of pure coincidence.
Bottom line, this is the kind of useful information that trend analysis can give you, and why metrics are worth gathering and analyzing.
Monday, December 28, 2009
Malware Analysis Toolkit for 2010
Platform
- VMWare Workstation
- The "vulnerable stuff:"
- Windows XP
- Internet Explorer 7/8
- Firefox
- Acrobat Reader
- Flash Player
- Cygwin
- Perl
- Python
- Hex Editor Neo
- HashCalc
- IZArc
- SysAnalyzer / iDEFENSE MAP
- GMER / catchme
- Multipot
- OSAM
- HijackThis
- Startup Control Panel
- HookExplorer
- Sysinternals Suite
- ProcL
- sniff_hit
- Wireshark (run on "Host OS" outside VM)
Binary Tools
- Mandiant Red Curtain
- OllyDbg 1.10
- Various OllyDbg plugins
- PEiD
- RDG Packer Detector
- pefile / packerid.py
- ImportREC
- SpiderMonkey (Didier Stevens mod)
- ieget.sh script
- crap2shellcode.pl
- Console2 Firefox plugin
- NoScript Firefox plugin
Web Sites as Tools
Wednesday, November 18, 2009
ArcSight Logger VS Splunk
If you want to hear me talk about my experience with Logger 4.0 through the beta process and beyond, you can check out the video case study I did for ArcSight. In short, Logger is good at what it does, and Logger 4.0 is fast. Ridiculously fast.
But that's not what I want to talk about. I want to talk about the question that's on everyone's mind: ArcSight Logger vs. Splunk?
Comparing features, there's not a strong advantage in either camp. Everybody's got built-in collection based on file and syslog. Everybody's got a web interface with pretty graphs. The main way Logger excels here is in its ability to natively front-end data aggregation for ArcSight's ESM SIEM product. But if you've already got ESM, you're going to buy Logger anyway. So that leaves price and performance as the remaining differentiators.
Splunk can compete on price, especially for more specialized use cases where Logger needs the ArcSight Connector software to pick up data (i.e. Windows EventLog via WMI, or database rows via JDBC). And if you don't care about performance, implying that your needs are modest, Splunk may be cheaper for you for even the straightforward use cases because of the different licensing model that scales downward. So for smaller businesses, Splunk scales down.
For larger businesses, Logger scales up. For example, if you need to add storage capacity to your existing Logger install, and you didn't buy the SAN-attached model, you just buy another Logger appliance. You then 'peer' the Logger appliances, split or migrate log flows, and continue to run search & reporting out of the same appliance you've been using, across all peer data stores. With Splunk? You buy and implement more hardware on your own. And pay for more licenses.
My thinking on performance? Logger 4.0 is a Splunk killer, plain and simple. To analogize using cars, Splunk is a Ford Taurus for log search. It gets you down the road, it's reliable, you can pick the entry model up cheap, and by now you know what you're getting. Logger 4.0, however, is a Zonda F with a Volvo price tag.
To bring the comparison to a fine point, I'd like to share a little story with you. It's kind of gossipy, but that makes it fun.
When ArcSight debuted Logger 4.0 and announced its GA release at their Protect conference last fall, they did a live shoot-out of a Logger 7200 running 4.0 with a vanilla install of Splunk 4 on comparable hardware and the same Linux distro (CentOS) that Logger is based on. They performed a simple keyword search in Splunk across 2 million events, which took just over 12 minutes to complete. That's not awful. But that same search against the same data set ran in about 3 seconds on Logger 4.
This would be an interesting end to an otherwise pretty boring story if it weren't for what happened next. Vendors other than ArcSight - partners, integrators, consultants, etc. - participate in their conference both as speakers and on the partner floor. One of these vendors, an integrator of both ArcSight and Splunk products, privately called ArcSight out for the demo. His theory was that a properly-tuned Splunk install would perform much better. Now, it's a little nuts (and perhaps a little more dangerous) to be an invited vendor at a conference and accuse the conference organizer of cooking a demo. But what happened next is even crazier. ArcSight wheeled the gear up to this guy's room and told him that if he could produce a better result during the conference that they would make an announcement to that effect.
Not one to shy away from a technical challenge, this 15-year infosec veteran skipped meals, free beer, presentations, more free beer, and a lot of sleep to tweak the Splunk box to get better performance out of it. That's dedication. There's no doubt in my mind that he wanted to win. Badly. I heard from him personally at the close of the conference that not only did he not make significant headway, but that all of his results were worse than the original 12 minute search time.
You weren't there, you're just reading about it on some dude's blog, so the impact isn't the same. But that was all the convincing I needed.
But if you need more convincing: we stuffed six months of raw syslog from various flavors of UNIX and Linux (3TB) into Logger 4 during the beta. I could keyword search the entire data set in 14 seconds. Regex searches were significantly worse. They took 32 seconds.
Monday, November 9, 2009
Reversing JavaScript Shellcode: A Step By Step How-To
For this walk-through, I'll start with JavaScript that has already been extracted from a PDF file and de-obfuscated. So this isn't step 1 of fully reversing a PDF exploit, but for the first several steps, check out Part 2 of this slide deck.
What you'll need:
- A safe place to play with exploits (I'll be using an image in VMWare Workstation.)
- JavaScript debugger (I highly recommend and will be using Didier Stevens' modified SpiderMonkey.)
- Perl
- The crap2shellcode.pl script, which you'll find further down in this post
- A C compiler and your favorite binary debugger
I'll be using one of the example Adobe Acrobat exploits from the aforementioned slides for this example. You can grab it from milw0rm.
Step 1 - Converting from UTF-encoded characters to ASCII
Most JavaScript shellcode is encoded as either UTF-8 or UTF-16 characters. It would be easy enough to write a tool to convert from any one of these formats to the typical \x-ed UTF-8 format that we're used to seeing shellcode in. But because of the diversity of encoding and obfuscation showing up in JavaScript exploits today, it's more reliable to use JavaScript to decode the shellcode.
For this task, you need a JavaScript debugger. Didier Stevens' SpiderMonkey mod is a great choice. Start by preparing the shellcode text for passing to the debugger. In this case, drop the rest of the exploit, and then wrap the unescape function in an eval function:
Now run this code through SpiderMonkey. SpiderMonkey will create two log files for the eval command, the one with our ASCII shellcode is eval.001.log.
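For the record, the %uXXXX encoding itself is simple to decode by hand; the reason to prefer SpiderMonkey is that real-world exploits layer obfuscation on top of it that a naive decoder will miss. Here's a Python sketch of what unescape() does with %u escapes (my own approximation, not SpiderMonkey's code):

```python
import re

def js_unescape(s: str) -> bytes:
    """Decode JavaScript-style escapes the way unescape() does:
    %uXXXX is a 16-bit code unit (emitted little-endian here),
    %XX is a single byte, anything else passes through."""
    out = bytearray()
    for m in re.finditer(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})|(.)", s, re.S):
        if m.group(1):
            out += int(m.group(1), 16).to_bytes(2, "little")
        elif m.group(2):
            out.append(int(m.group(2), 16))
        else:
            out += m.group(3).encode("latin-1")
    return bytes(out)

print(js_unescape("%u9090%u9090").hex())  # 90909090
```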

Step 2 - crap2shellcode.pl
This is why I wrote this script: to take an ASCII dump of some shellcode and automate making it debugger-friendly.
---cut---
#!/usr/bin/perl
#
# crap2shellcode - 11/9/2009 Paul Melson
#
# This script takes stdin from some ascii dump of shellcode
# (i.e. unescape-ed JavaScript sploit) and converts it to
# hex and outputs it in a simple C source file for debugging.
#
# gcc -g3 -o dummy dummy.c
# gdb ./dummy
# (gdb) display /50i shellcode
# (gdb) break main
# (gdb) run
#
use strict;
use warnings;

# Print the C preamble once, then stream the converted shellcode.
print "#include <stdio.h>\n";
print "#include <stdlib.h>\n\n";
print "static char shellcode[] = \"";
while (my $crap = <STDIN>) {
    chomp $crap;                      # don't encode the trailing newline
    my $hex = unpack('H*', $crap);    # ASCII -> hex string
    for (my $i = 0; $i < length $hex; $i += 4) {
        my $a = substr $hex, $i, 2;
        my $b = substr $hex, $i + 2, 2;
        print "\\x$b\\x$a";           # swap each byte pair back into order
    }
}
print "\";\n\n";
print "int main(int argc, char *argv[])\n";
print "{\n";
print "  void (*code)() = (void *)shellcode;\n";
print "  code();\n";
print "  exit(0);\n";
print "}\n";
--paste--
The output of passing eval.001.log through crap2shellcode.pl is a C program that makes debugging the shellcode easy.

Step 3 - View the shellcode/assembly in a debugger
First we have to build it. Since we know that this shellcode is a Linux bindshell, the logical choice for where and how to build is Linux with gcc. Similarly, we can use gdb to dump the shellcode. For Win32 shellcode, we would probably pick Visual Studio Express and OllyDbg. Just about any Windows C compiler and debugger will work fine, though.
To build the C code we generated in step 2 with gcc, use the following:
gcc -g3 shellcode.c -o shellcode
The '-g3' flag builds the binary with labels for function stack tracing. This is necessary for debugging the binary. Or at least it makes it a whole lot easier.
Now open the binary in gdb, display the shellcode as 50 instructions, set a breakpoint at main(), and run it.
$ gdb ./shellcode
(gdb) display /50i shellcode
(gdb) break main
(gdb) run

Sunday, October 18, 2009
Two-For-One Talk: Malware Analysis for Everyone
Wednesday, September 23, 2009
Queries: Excel vs. ArcSight
Anyway, I've got a story about how cool queries are. And about how much of an Excel badass I am. And also about how queries are still better. Last month, I got a request from one of our architects who was running down an issue related to client VPN activity. Specifically, he wanted to know how many remote VPN users we had over time for a particular morning. Since we feed those logs to ESM, I was a logical person to ask for the information.
So I pulled up the relevant events in an active channel and realized that I wasn't going to be able to work this one out just sorting columns. So, without thinking, I exported the events and pulled them up in Excel. So here's the Excel badass part:
If you want to copy it, here it is:
=SUM(IF(FREQUENCY(MATCH(A2:A3653,A2:A3653,0),MATCH(A2:A3653,A2:A3653,0))>0,1))
So A is the column that usernames are in. This formula uses the MATCH function to create a list of usernames and then the FREQUENCY function to count the unique values in the match lists. You need two MATCH lists to make FREQUENCY happy because it requires two arguments, hence the redundancy. It took about an hour for me to put it together; most of that was spent finding the row numbers that corresponded to the time segment borders.
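For comparison, the same unique-users-per-hour logic is a few lines of Python. The rows below are made-up sample data, not the actual VPN logs:

```python
from collections import defaultdict

# Hypothetical export rows: (hour, username), like the Excel export's columns
rows = [
    (8, "alice"), (8, "bob"), (8, "alice"),
    (9, "alice"), (9, "carol"),
]

# Sets deduplicate usernames automatically, per hour bucket
unique_per_hour = defaultdict(set)
for hour, user in rows:
    unique_per_hour[hour].add(user)

counts = {h: len(users) for h, users in sorted(unique_per_hour.items())}
print(counts)  # {8: 2, 9: 2}
```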
But as I finished it up and sent it off to the requesting architect, I thought, there must be an easier way. And of course there is. So here's how you do the same thing in ESM using queries:
So, it's just EndTime with the hour function applied, and TargetUserName with the count function applied, and the Unique box (DISTINCT for the Oracle DBAs playing at home) checked. And then on the Conditions tab you create your filter to select only the events you want to query against. That's it.
Once the query is created, just run the Report Wizard and go. All told, it's about 90 seconds to do the same thing with a query and report that took an hour in Excel.