Today I learned an interesting lesson about Snort and performance. If you happen to read the snort-users mailing list, you may know a little bit about my problem.
It turns out that changes made in the Snort PCRE (Perl-compatible regex) library between 2.6.0 and 2.6.1, combined with some old rules that I had written, caused Snort to hog CPU time like, well, you know. So, here are two lessons learned from today.
The problem with my rules was that they didn't use flow directives, which take advantage of the stream4 preprocessor's TCP connection state tracking; they went straight into the pcre pattern. This caused Snort to try regex matching the payloads of all packets that matched the addresses and ports in the rule header:
alert tcp $HOME_NET any -> $EXTERNAL_NET 6660:6670
Since $HOME_NET contains web servers and so on, this turns out to be a lot of traffic, not just outbound IRC traffic. Adding the following to the rule:
...makes use of stream4 state tracking and dials in the rule to match only IRC traffic, which on a good day is zero packets. Additionally, since my rule was looking for specific client-to-client traffic (bot commands), I added the following:
...ahead of the pcre expression in the old rule, which dials the rule in even more: the regex is only invoked when the packet matches statefully on the rule header and starts with PRIVMSG (or any case variation thereof). These rules are now more accurate and less likely to create false positive alerts, but more importantly, the Snort process that was eating 80-100% of the CPU time available to it is now down in the 5-10% range. This is important because lower CPU utilization translates directly into lower dropped packet rates.
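As an illustration only, a rule tightened along these lines might look something like the sketch below. It is not the actual rule from this sensor: the msg text, the sid, and the pcre pattern are made-up placeholders, and the sketch assumes that the two additions are a flow option (to_server,established) and a case-insensitive content match pinned to the first seven bytes of the payload.

```
# Hypothetical example. msg, pcre pattern, and sid are placeholders.
# flow:    only evaluate packets in an established session, client to server
# content: cheap fixed-string check; depth:7 pins "PRIVMSG" to the payload start
# pcre:    the expensive regex, reached only if the checks above pass
alert tcp $HOME_NET any -> $EXTERNAL_NET 6660:6670 (msg:"LOCAL possible IRC bot command"; flow:to_server,established; content:"PRIVMSG"; nocase; depth:7; pcre:"/^PRIVMSG\s+\S+\s+:!\w+/i"; sid:1000001; rev:1;)
```

The point of the fixed-string content match is that the regex only ever runs on the small set of packets that already look like a PRIVMSG line in an established outbound session.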
Which is a good segue into my second lesson learned today. I have been using Snort's perfmonitor preprocessor to dump sensor performance statistics, and it turns out to be very useful for dealing with exactly these issues. If you are using perfmonitor, I would also recommend Andreas Ostling's pmgraph tool. It creates MRTG-like graphs from your perfmonitor output and makes it easy to spot trends and problems in your sensor's performance. This made it very easy for me to identify the problem in the first place, and to be certain that the changes I made corrected the issue, not only with respect to CPU utilization, which can be easily checked with top, but also with respect to dropped packets, which only Snort tracks. Besides perfmonitor, the only way to get that data is to 'kill -HUP [snort's pid]' and read syslog. And I have found that data to be unreliable in the past.
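For reference, perfmonitor is enabled with a preprocessor line in snort.conf. Here is a minimal sketch; the interval, file path, and packet count are arbitrary example values, not the settings used on this sensor.

```
# Example values only: adjust the interval, path, and pktcnt for your sensor.
# time   = seconds between stats lines
# file   = where the comma-separated stats are written
# pktcnt = packets to process between checks of the time interval
preprocessor perfmonitor: time 300 file /var/log/snort/snort.stats pktcnt 10000
```

The stats file written this way is the perfmonitor output that a graphing tool such as pmgraph reads.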
Additionally, if you are on Snort 2.6.1.x or later and you use perfmonitor, you have probably noticed that there's a new field in the perfmonitor output. Andreas hasn't updated pmgraph to deal with this new field yet, but I have a diff file I can send you for it. It's small and I would post it here, but it's not handy at the moment.
Finally, a big thank you to Jason, Joel, and Adam at Sourcefire for being so helpful with this issue. Yes, I'm a paying Sourcefire customer, but they probably didn't know that and it didn't matter to them. That's awesome.