Popular Science published a list of The Worst Jobs in Science.  Sixth on the list of ten is working at MSRC.  According to the article, this means being a Microsoft Security handler is worse than:
Studying whale feces (hey, look, this one ate krill.)
Forensic entomologist (the maggots on that decomposing body are how old?)
Olympic drug tester (IN THE CUP! IN THE CUP!)
Gravity research subject (OK, doing the weightless-in-a-plane thing sounds fun)
So can we get somebody from MSRC to respond?
Let's ask!
Thursday, June 28, 2007
Tuesday, June 26, 2007
ISN Funny
I <3 The Onion. But I especially love it when they show up mixed in with the "real news."
http://www.infosecnews.org/pipermail/isn/2007-June/014889.html
Friday, June 22, 2007
Data Theft: When To Worry
Data breaches as a result of laptop theft have gotten a lot of press over the past couple of years.  There have been dozens of these incidents, and there's even a Hall of Shame. Despite this, there has never been a publicly disclosed link between a laptop theft where personal data was stolen and the appearance of some or all of that data on the black market.
My theory is that laptop theft, like most theft, is a crime of opportunity. Just about anyone can steal a laptop from an airport, office, or car and sell it on eBay or at a pawn shop. Or keep it. And they can make a few hundred dollars doing it. The theft itself takes a few seconds. To target a laptop with valuable data on it would require a lot more reconnaissance and planning. Hours, days, even months to identify the one that had the right data and find a time when it was unprotected enough to steal. The truth is, there are a whole lot easier and less risky ways to steal identity dumps than targeting a laptop. This is no excuse to not encrypt laptop hard drives. But it is much ado about very little.
However, when you see a story like this, it's time to worry. This wasn't lost in the mail or displaced by Iron Mountain. It was stolen from a car. And it's a DLT tape. The black market value of a used backup tape? Less than the CD in the car's stereo. So it may be a crime of opportunity, but backup tapes imply valuable data. Why back it up otherwise?
And what would a story about stolen data be without the excruciating attempts to downplay the breach? The governor's office says that it is unlikely that the information has been accessed because it requires special hardware and software. As if that wasn't bad enough, it turns out that the reason the tape was in an intern's car in the first place is that it was part of a standard security procedure. Now, off-site backups at a 22-yr-old's apartment may or may not be better than no off-site storage at all. But this is downright irresponsible. Somebody was willing to break into a car to get a tape with hundreds of thousands of dumps on it, worth potentially millions of dollars on the black market. What would have happened if that intern had been there when the tape was stolen?
So if you haven't contracted with an off-site storage company or aren't already using your corporate locations to do off-site storage, please think about it. I'd like to tell you that I think this story is uniquely asinine, but the truth is that I've personally made this recommendation to at least a half-dozen clients over the past 5 years because their off-site practice put an employee at risk.
Malware Season Epilogue
I found something interesting while catching up on my reading.  If you've been reading anything malware-related, you already know that MPack is the big deal this week.  I was reading Vicente Martinez's paper on MPack (PDF Link) and noticed that it uses JavaScript obfuscation methods very similar to those used by the malware I've been writing about this week.
There were significant differences between what I found and what Vicente describes, so I doubt that what I found was created with MPack. But I do think it's worth watching web traffic for the presence of a JavaScript function named dF(), since it can be tied to malware delivery. So here you go:
alert tcp $EXTERNAL_NET 80 -> $HOME_NET any (msg:"LOCAL Possible obfuscated JavaScript dropper MPack"; content:"<script>"; nocase; content:"unescape"; nocase; content:"|64462827|"; classtype:trojan-activity; sid:9000130; rev:1;)
Added Note: The third string, |64462827| is hex for dF('
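For anyone who wants to sanity-check that rule without a sensor handy, here is a rough Python sketch of the same three content matches. It only approximates what Snort does (no stream reassembly, no port or direction checks), and the function name and sample pages are made up for illustration.

```python
# The rule's third content match, written the way Snort writes hex bytes.
DF_CALL = bytes.fromhex("64462827")  # decodes to b"dF('"


def looks_like_df_dropper(body: bytes) -> bool:
    """Approximate the rule: '<script>' and 'unescape' (nocase) plus a case-sensitive dF('."""
    lowered = body.lower()
    return (
        b"<script>" in lowered      # content:"<script>"; nocase;
        and b"unescape" in lowered  # content:"unescape"; nocase;
        and DF_CALL in body         # content:"|64462827|"; stays case-sensitive
    )


# Made-up sample pages, not the real malware.
suspicious = b"<SCRIPT>document.write(unescape('%3Cscript%3E'));dF('abc')</script>"
benign = b"<script>var s = unescape('%20');</script>"

print(DF_CALL.decode("ascii"))
print(looks_like_df_dropper(suspicious))
print(looks_like_df_dropper(benign))
```

The benign page shows why the third string matters: plenty of legitimate pages use script tags and unescape, so the dF(' match is what keeps this from firing on everything.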
Wednesday, June 20, 2007
Malware Season Pt. 2
When we last left our heroes, they had de-obfuscated some JavaScript and downloaded a malware binary file named 'bin.exe'.
As you might have guessed, this binary was packed in order to make detecting its contents more difficult. I ran it through PEiD to determine what packer was used:

At this point, I don't even bother trying to unpack it. Instead, I try to load it in the GenOEP and ScanEP PEiD plugins and then I try to open it in OllyDbg. They all fail. Now I start to fear that I'm doomed to repeat past frustrations. But, what the hell, I'll try and unpack it anyway:

That was lucky. It can't be this easy. You may have noticed that the file in that screen shot is svhost32.exe, not bin.exe. This is because I was playing with it, trying to get it to run in SysAnalyzer. Since the VBScript dropper saves the file as svhost32.exe, I thought that might be worth a try. Anyway, to make sure there aren't more layers of packing going on here, I take another whack at it with PEiD:

I didn't see that coming, but I'm not looking a gift horse in the mouth. So now we should be able to do stuff like launch it in SysAnalyzer or OllyDbg. Sure enough, it runs from SysAnalyzer and we get the goodies:


Once it's running with SysAnalyzer, we can get the scoop. It uses AutoItv3 to download itself again as svhost.exe, modifies a mess of registry keys to run at start up as well as hijack Explorer and IE startup pages, presumably to drive up ad hits for the distributor.
A quick Google search, and we have a name for it: Sohanad.  So it's not new malware, really, just slightly modified from the original so as to get by more AV scanners.  I wonder how many:

The packed executable that we downloaded is detected by 13/31 AV products used by VirusTotal. Just for kicks, what happens if we try the unpacked file from earlier:
Ugh.  Only 5/31 detect it now, when it's not obfuscated.  The irony is overwhelming.  Quick, somebody in the AV R&D field write a paper on using un-obfuscated code as a means of bypassing AV detection.  This is hot!
Lastly, I contacted McAfee for an EXTRA.DAT file for both the packed and unpacked binaries and notified SANS ISC of the hacked web site with the dropper as well as the site hosting the binary.
I'd like to say, if you do run into sites hosting malware, the handlers at ISC are a great resource for coordinating response and clean-up. In this case, the hacked site was cleaned up and the malicious site was taken down within a day of my contacting ISC. They contacted the responsible parties and got it done. Doing this by yourself is hard and annoying work, and I am grateful to the ISC folks that they're willing to let us offload this stuff to them. So when you're out and about at conferences this summer and you see any of the ISC handlers, remember to thank them and maybe buy them a beer or something.
Malware Season Pt. 1
Like I said last week, I'm going to write up last week's experience taking some "0h-day" malware (read: undetected by IDS or AV) from log finding back to analysis of the dropper and binary. This is a 2-parter; the first part covers everything from discovery through the dropper to getting a copy of the binary.
If you're aware of what's been going on in the malware arena for the past few years, and how it has visibly worsened over the last 6-9 months, then you know that you can't rely on your AV vendor to catch it all. (Remember all of that 'defense-in-depth' stuff from the Information Assurance "awakening" 4-5 years ago? Yeah, this is where it should be saving your bacon.) So one thing I've taken to doing is looking at firewall logs for outbound web requests that end in ".exe". I found one that was a request for "http://IP:PORT/bin.exe". It would be nice if the FQDN were captured here, but it's not. And as such, that file is just out of reach.
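A minimal sketch of that kind of log sweep, assuming a made-up log format where each line simply contains the requested URL somewhere in it. The field layout, addresses, and function name are all hypothetical; a real firewall export would need its own parsing.

```python
import re
from urllib.parse import urlsplit

# Assumption: the requested URL appears verbatim somewhere in each log line.
URL_RE = re.compile(r"https?://\S+")


def exe_downloads(lines):
    """Yield (line, url) for every logged URL whose path ends in .exe."""
    for line in lines:
        for url in URL_RE.findall(line):
            # Check the path only, so '?name=setup.exe' query strings don't match.
            if urlsplit(url).path.lower().endswith(".exe"):
                yield line, url


# Invented sample entries using documentation address ranges.
sample_log = [
    "2007-06-13T10:02:11 ALLOW 10.1.2.3 GET http://198.51.100.7:8080/bin.exe",
    "2007-06-13T10:02:15 ALLOW 10.1.2.3 GET http://www.example.com/index.html",
    "2007-06-13T10:03:40 ALLOW 10.1.2.9 GET http://www.example.com/dl?name=setup.exe",
]

hits = [url for _, url in exe_downloads(sample_log)]
print(hits)
```

Matching on the path rather than the whole URL is a judgment call. It cuts noise from download scripts, but it would also miss a dropper served that way, so loosen it if you would rather over-collect.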
Using ArcSight, it was pretty easy to give that download some context by pulling up all of the Internet traffic to and from that workstation for 30 minutes before and 30 minutes after. After digging around, I found that one of the web sites the user had visited had a single line of JavaScript on each page, containing an unescape() call that looked glaringly suspicious.

In addition to the unescape call, there's also another function named dF() that has a whole mess of obfuscated code in it. The first step is to find out what the unescape function will actually do. I used Rhino JavaScript shell:

The unescape creates a second set of script tags and defines the dF function. Now I can define dF in Rhino's shell and see what that does:

Oops. I needed to replace document.write with print in order to actually see output of the dF function:

There, that should work:

Now I've got a human-readable VBScript dropper with a URL. Grabbing the binary malware at this point is as easy as using wget. Despite the fact that the malware URL can easily be read, there's still obfuscation at work here. Check out the a1,a2,a3,a4 variables and then str1 = a1&a2&a3&a4 used to hide the string "Adodb.Stream." I wonder why. Like I said, both the JavaScript payload and the binary went undetected by the workstation's antivirus. Our IDS fired on both, but they were fairly generic detects - one for the JavaScript unescape function and another for the packed executable download.
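The two tricks in this walk-through can be sketched in a few lines of Python. This is not the actual sample: the encoded string, the way "Adodb.Stream" is split, and the dropper line are all invented for illustration. Also note that urllib's unquote only covers plain %XX sequences, while JavaScript's unescape() additionally handles %uXXXX, so treat it as a rough stand-in.

```python
from urllib.parse import unquote

# Made-up stand-in for the injected line, not the real sample.
encoded = "%3Cscript%3Efunction%20dF%28s%29%7B%7D%3C%2Fscript%3E"

# Rough equivalent of JavaScript unescape() for plain %XX sequences.
decoded = unquote(encoded)
print(decoded)

# The dropper's trick of building a telltale string from pieces, as in
# str1 = a1&a2&a3&a4. The split points here are guesses for illustration.
a1, a2, a3, a4 = "Ado", "db.", "Str", "eam"
str1 = a1 + a2 + a3 + a4
print(str1)

# A naive signature looking for the literal string misses the split-up source.
dropper_source = 'a1="Ado":a2="db.":a3="Str":a4="eam":str1=a1&a2&a3&a4'
print("Adodb.Stream" in dropper_source)
```

One plausible reason for the split is exactly what the last line shows: a scanner that greps script source for the literal "Adodb.Stream" never sees it, because the string only exists once the script runs.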
Tomorrow I'll write up the basic analysis of the binary that I did along with some info on the type of external follow-up.
Thursday, June 14, 2007
SonicWall's Bargain Buy
My wife is pretty cool.  She's one of the new generation of self-employed eBay moms.  But even before she started an eBay biz, she was a bargain shopper.  One of the best.  And I don't mean, "Look what I got for 35% off at Macy's!"  I mean, "Look what I found for 99% off on the clearance rack!"  Seriously - if bargains are ancient treasure, I married Indiana Jones.
Anyway, somebody over at SonicWall put together a deal that would make my wife very proud. Check it out: SonicWall bought Aventail this week for $25M cash. I don't write much about the firewall / VPN market, but if you know me you know that I've been doing firewall stuff for the past decade, and I still keep an eye on it. SonicWall is really only a player in the SMB/SOHO market. Their enterprise stuff isn't even on the radar where Cisco, Juniper, and Check Point are concerned. But the weird thing about Aventail is that every time I talk to a client or a colleague about SSL VPN, Aventail keeps coming up. They're a small vendor, but they have a slick product, and they do well in trade press comparos. So the fact that they were purchased comes as no surprise. And with this deal, SonicWall just bought their way back into the market for practically pennies. Nice.
Plus, it's about time we saw an acquisition that made sense. It seems like the last 9mos have been nothing but mismatched purchases.
Wednesday, June 13, 2007
Malware Hunting for Lazy Idiots
What I want to write about right now is a couple of neat new tricks I figured out while tracking down some malware today.  It's JavaScript de-obfuscation and packed executables that won't run in SysAnalyzer.  Fun!  Especially since this time I was successful, front-to-back.  I think I may still post a detailed description.  But not right now.  I haven't got time.  Yards don't mow themselves.
But what I did want to mention was that Niels Provos has released a new version of SpyBye. The cool part is that it built on my Mac with zero difficulty. Ironically, it builds on OpenBSD, but dumps core complaining about an unrecognized symbol from libevent. What's cool about the new SpyBye is the '-x' switch that lets you run it as an interactive proxy from your browser. This makes analysis easier to do since you can let the browser step through the scripts and iframes and all that crap that sucks up your time when trying to do it manually. What's not that cool about SpyBye are the good_patterns/bad_patterns files, and how limited and basic the content is. For example, none of the aforementioned malware or its droppers set off SpyBye. Moreover, the exploit still works on vulnerable browsers. I've been playing with adding regex patterns to the bad_patterns file to get it to detect all of the known-bad stuff I already have Snort signatures for. Once that's done maybe I'll post them up here or e-mail them to Niels or something.
The reason I love the idea of SpyBye, HoneyC, or any other honeyclient-ish tool is that they all imply an easy way to find new malware that your IDS and AV don't already stop. It sounds so much easier than rifling through proxy logs or tcpdump payloads looking for "stuff that don't look right." But I always have a little bit of buyer's (OK, compiler's?) remorse when I get something like SpyBye up and running. Sure, it works the way it was written to work, but it doesn't magically find lots of cool zero-day browser sploits in the wild. Even when I know they're there. I guess finding the "really cool" malware is still hard and requires some luck. Which sucks, because like the title says, I would much rather be a lazy idiot.
SIM Sizing - RAM
Sorry for the pregnant pause on this topic.  We went camping this weekend and my brain, well, rebooted.  Good for me and my blood pressure.  Bad for anyone waiting for the final episode of this  gripping trilogy.
Anyhoo, there are basically two places you need to plan for RAM expansion in your SIM. First, again, the database. Not much of a surprise. There are two drivers here. One is log frequency, since a high event rate will cause the database to hold INSERTs in RAM while waiting for disk. This problem will be obvious, as your performance will suffer noticeably. The other is standard stuff like the number of simultaneous users or the number of simultaneous reports that run. Also, reports or table views (ArcSight calls these 'active channels') that look back through weeks or months' worth of events will definitely burn RAM on the database server.
The other component that can run into RAM problems is the point on your SIM where correlation rules run. The reason correlation rules eat memory is that they are based on matching 2 or more events over a window of time. This is not unlike the state tracking table on your firewall.
Let's say you have a rule that looks for a firewall allow message and an IDS alert that have the same source IP address. Then let's say that that rule has a 2 minute window because you have time sync issues. Your SIM is going to track every firewall allow message and every IDS alert that match your filter for 2 minutes, comparing each new IDS event to each firewall event in the table as well as comparing each new firewall event to each IDS event in the table. Now imagine someone fat-fingers that rule or you have really bad time sync issues and you set that window to 20 minutes. I think you get the idea.
The solution to keeping this from spiraling out of control is simply:
1) Write correlation rule filters as simply and precisely as possible.
2) NTP! (If you haven't already. This is important for your SIM to function well for so many reasons.)
3) Since you have time sync figured out now, use small windows in your correlation rules.
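To put rough numbers on the window-size point, here is a toy correlator in Python. It is only a sketch of the general idea, not how ArcSight or any other SIM actually implements rules, and the class name, event kinds, and event rates are all assumptions picked for illustration.

```python
from collections import deque


class WindowCorrelator:
    """Toy rule: firewall allow + IDS alert sharing a source IP within a window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        # One state table per event type, oldest event first.
        self.tables = {"fw_allow": deque(), "ids_alert": deque()}

    def _expire(self, now):
        # Drop events that have aged out of the window.
        for table in self.tables.values():
            while table and now - table[0][0] > self.window:
                table.popleft()

    def add(self, kind, timestamp, src_ip):
        """Store the event and return how many events of the other type it matches."""
        self._expire(timestamp)
        other = "ids_alert" if kind == "fw_allow" else "fw_allow"
        matches = sum(1 for _, ip in self.tables[other] if ip == src_ip)
        self.tables[kind].append((timestamp, src_ip))
        return matches

    def tracked(self):
        """Total number of events currently held in memory."""
        return sum(len(table) for table in self.tables.values())


def peak_state(window_seconds, duration=3600, allows_per_second=10):
    """Feed a steady stream of firewall allows and report the largest table size."""
    correlator = WindowCorrelator(window_seconds)
    peak = 0
    for second in range(duration):
        for i in range(allows_per_second):
            correlator.add("fw_allow", second, f"10.0.{i}.1")
        peak = max(peak, correlator.tracked())
    return peak


print(peak_state(120))   # 2-minute window
print(peak_state(1200))  # 20-minute window
```

At the same event rate, the 20-minute window holds roughly ten times as many events in memory as the 2-minute one, and every new event of the other type has to be compared against all of them.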
Thursday, June 7, 2007
June GRSec Announced
GRSec will be getting together over the summer as promised.  Note the move to Tuesday night.  I've heard from several people that Wednesdays and Thursdays don't work.  So we're looking forward to some new faces.  Hope to see you there!
Wednesday, June 6, 2007
TJX CEO Apology
Apparently TJX CEO Carol Meyrowitz (who, it bears mentioning, was not CEO at the time that the breach occurred) apologized for The Biggest Data Breach Ever at a shareholder meeting earlier this week.  Unfortunately the entirety of the meeting is not online yet, but boston.com quotes her as saying,
"But we had locks."
I'm forced to assume that this is a metaphor, not meant to be taken literally. You know, cuz shareholders are dumb and don't understand words like "authentication" or "encryption." So reading between the lines here, Meyrowitz is contradicting what a well-respected Gartner source has said about the lack of wireless security.
I hope Ms. Meyrowitz isn't offended if I don't take her word for it. Either way, it's what she apparently didn't say that bothers me. There seems to have been no talk of what steps TJX has taken or how much they've invested in improving IT security at TJX in order to reduce the risk of a second breach. I'm no stock trader, but uh...
Recommendation: Strong Sell
Tuesday, June 5, 2007
SIM Sizing - CPU (+ Performance Tuning)
There are a couple of places across a traditional SIM that are more susceptible to performance degradation than others. Here's the short list:
The Database
There are basically two things that can beat up your database. The first is drowning it with INSERTs of new log data. While this will manifest as poor performance and high CPU utilization, the problem is most likely disk array write performance. The answer to that problem is most likely expensive. Sorry.
The second thing that can kill database performance is event searches. This can happen in reporting or table views or pattern discovery or even in charts and graphs. (It can also happen in correlation rule filters - more on that below.) Think of it this way: whatever means you are using to search events, especially historic events (double-especially if there's compression in the mix here), has to be translated into some horrid, fugly SELECT statement, probably with multiple JOINs. Use these in rules, graphs, or regularly scheduled reports and you can drown your database server in work to the point that the stuff you're actively doing is unusably slow. The answer here is a combination of giving lots of CPU to your database servers and writing smart search/filter statements.
Correlation Rules
Correlation rules are the heart & soul of SIM technology. For more on what they do, check out my old ISSA 'Intro to SIM' preso deck (PDF Link). There are a number of things that can screw you here, and I already mentioned the first one above. Writing filters that are too complex or simple filters that are too vague will come back to haunt you.
Like an IDS, you will need to tune the correlation rules that your SIM ships with. A lot of this will be about eliminating noise and false positives, just like IDS. But also like IDS, some of the tuning will be performance-related. In addition to the filters you write, you will also want to think about things like the number of events to match on, time frame (how long to wait for event2 after event1 occurs), etc. A cool thing that ArcSight includes is a real-time graph of partial rule matches. In the example below, you can see there are two rules there that need tweaking and will probably free up measurable memory and CPU cycles once I do.

One last tip on rules and performance: If your rule creates a new meta-event, make certain that the new event does not match its own correlation filter. Trust me on this one. It's worth the extra time to double-check before turning it on.
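Here's a minimal sketch, in Python, of the knobs mentioned above: the number of events to match on, the time frame, and a guard against a rule matching its own meta-events. This is not how ArcSight implements rules; the ThresholdRule class, the event dictionary fields, and the is_meta flag are all assumptions I made up to illustrate the idea.

```python
from collections import defaultdict, deque


class ThresholdRule:
    """Fire a meta-event when `threshold` matching events from one source
    arrive within `window` seconds. Hypothetical, for illustration only."""

    def __init__(self, name, match_name, threshold, window):
        self.name = name
        self.match_name = match_name
        self.threshold = threshold
        self.window = window
        # source -> timestamps of partial matches still being held in memory
        self._seen = defaultdict(deque)

    def matches(self, event):
        # Guard: never match events that a correlation rule generated,
        # otherwise a rule can feed itself in a loop.
        if event.get("is_meta"):
            return False
        return event["name"] == self.match_name

    def feed(self, event):
        """Process one event; return a meta-event dict if the rule fires."""
        if not self.matches(event):
            return None
        timestamps = self._seen[event["src"]]
        timestamps.append(event["ts"])
        # Drop partial matches that have aged out of the time frame.
        while timestamps and event["ts"] - timestamps[0] > self.window:
            timestamps.popleft()
        if len(timestamps) >= self.threshold:
            timestamps.clear()
            return {"name": self.name, "src": event["src"],
                    "ts": event["ts"], "is_meta": True}
        return None

    def partial_matches(self):
        """Partial matches currently held; this is what costs memory."""
        return sum(len(q) for q in self._seen.values())


if __name__ == "__main__":
    rule = ThresholdRule("brute_force", "login_failure", threshold=3, window=60)
    for ts in (0, 10, 20):
        fired = rule.feed({"name": "login_failure", "src": "10.0.0.7", "ts": ts})
    print(fired)
    print(rule.feed(fired))  # the meta-event is ignored by its own rule
```

A longer window or a vaguer filter means more partial matches held per source, which is exactly the memory and CPU you get back when you tighten a rule.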
Log Agents/Parsers/Handlers
The final place where CPU load can grow quickly is your log collection points. Somewhere between the log source and the database is code that your SIM uses to convert the log from its original format to a standard format for insertion into the database. The frequency with which log entries hit this code can have an impact on performance. This is where all of the regex matching, sorting, asset category searching, severity calculation, and so on occurs. For well-defined log formats and sources (like firewalls), this tends not to be that intense a process since there is very little diversity to handle. But for UNIX servers with a variety of services running, there is the potential for serious friction as these parsers try to figure out what the actual log source is and what the message means.
If you have something like a UNIX server farm that generates thousands of events per second and you want to push it through your SIM, you will need to spread this load out or buy big hardware to handle it.
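To give a feel for why diverse sources cost more, here's a small Python sketch of what a parser roughly has to do. The PARSERS table, the severity numbers, the normalized field names, and the sample log lines are all invented for illustration and are not any vendor's actual parser. Every incoming line is tried against each pattern in turn until one matches, so the more services you have logging through one collection point, the more regex work each line costs.

```python
import re

# (source, pattern, normalized event name, severity) - all hypothetical
PARSERS = [
    ("sshd",
     re.compile(r"sshd\[\d+\]: Failed password for (?:invalid user )?"
                r"(?P<user>\S+) from (?P<src>\S+)"),
     "login_failure", 5),
    ("sudo",
     re.compile(r"sudo: +(?P<user>\S+) : .*COMMAND=(?P<command>.+)$"),
     "privileged_command", 3),
]


def normalize(line):
    """Convert one raw log line into a standard event dictionary."""
    for source, pattern, name, severity in PARSERS:
        match = pattern.search(line)
        if match:
            event = {"source": source, "name": name, "severity": severity}
            event.update(match.groupdict())
            return event
    # Nothing matched: every pattern was tried, and the line is still unknown.
    return {"source": "unknown", "name": "unparsed", "severity": 0, "raw": line}


if __name__ == "__main__":
    print(normalize(
        "Jun 22 10:01:02 web01 sshd[2211]: Failed password for "
        "invalid user bob from 10.0.0.7 port 4022 ssh2"
    ))
```

Note that the worst case is the line nothing recognizes, since it pays for every pattern before falling through.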
Reader Mail
Anonymous writes,
"Do you use their Log Management product Logger as well? If so (or if not), what do you see as the differenc [sic] between ESM and Logger?"
We're working on bringing in a demo unit of the ArcSight Logger appliance, but do not have it deployed today. I've read the cut sheets (PDF Link) and sat through the Webex for Logger, so I can sort of answer your question.
Think of ArcSight Logger like Snare plus syslog-ng plus a Google search appliance. You can shovel common event streams into it either using native means (like syslog) or using the ArcSight Connector agents. Logger can then feed events based on boolean filters (like those used in the main ESM product) "upstream" to the ESM product.
Unlike ArcSight ESM, Logger only lets you search & sort events. There's none of the visualization or correlation that ESM has. There's also none of the reporting, case management, asset data, patterns, and so on and so forth.
We're looking at Logger to possibly fulfill two roles. The first is that I'd like to reduce some of the redundancy we currently have with software agents. Having one box grab data from multiple sources and feed it to the ESM Manager server would simplify things on my end a lot. It would also eliminate one of the few remaining points where data loss could occur during downtime of any of our ESM component servers. The second is that I'd like to give our operations and engineering teams easy access to log data without having to deploy and support ESM Consoles for all of them. As I point out in my next post on SIM sizing, one of the places you are likely to encounter performance problems is the database and people searching for events. Offloading some of that through Logger will hopefully save us from having to spend more on database server hardware down the road.
I hope that answers your question.