Thursday, May 31, 2007

SIM Sizing

This is another topic I'm lifting from LinkedIn Answers. How do you properly size hardware for a SIM implementation and/or growth? I won't lie, I don't have the answer. I have been caught off guard by storage and performance issues in the SIM environment I work with. It's surprising how seemingly little things can have a huge impact on log volume and how that cascades into performance impact across your SIM.

When sizing any application, you are defining requirements for disk storage, disk performance, RAM, and CPU. So I will break this topic up into four posts, one dealing with each specific issue.

Disk Storage
This is probably the single hardest issue to tackle. It is most affected by aggregate event volume, which will absolutely change at least once over the course of a year. It is also affected by your retention policy: you'll have to decide how long data stays in the SIM's tables before it shuffles off this mortal coil.

If you don't work with a SIM already, you may wonder why folks who work with SIMs talk about 'events' instead of 'log entries.' Aside from the vague philosophical issues around how many log entries make an event or how many events are documented in a log entry, the answer is simple - your SIM generates new events. This becomes a new sizing problem: "If I throw 1M firewall log entries and 500K EventLog entries at the SIM on a daily basis, how many new events will the SIM generate?" So now you're starting to get an idea of why SIM sizing is tricky. You can't just get by on `wc -l /var/log/messages` when trying to figure out how much storage you need.
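To make the shape of that problem concrete, here's a minimal back-of-the-envelope sketch. Only the 1M firewall and 500K EventLog daily counts come from the example above. The per-entry byte sizes, the ratio of SIM-generated events to incoming entries, and the 90-day retention period are made-up placeholders, and the function names are just for illustration. It also ignores indexes and other database overhead, so treat the result as a floor, not a forecast.

```python
# Rough SIM storage estimate. Every byte size and the generated-event
# ratio below is a hypothetical placeholder; measure your own environment.

def daily_bytes(sources, generated_ratio, generated_event_bytes):
    """Estimate bytes stored per day.

    sources: list of (entries_per_day, avg_bytes_per_entry) tuples
    generated_ratio: new events the SIM creates per incoming entry (assumed)
    generated_event_bytes: average size of a SIM-generated event (assumed)
    """
    incoming_entries = sum(count for count, _ in sources)
    incoming_bytes = sum(count * size for count, size in sources)
    generated_bytes = incoming_entries * generated_ratio * generated_event_bytes
    return incoming_bytes + generated_bytes


def retained_bytes(per_day, retention_days):
    """Storage needed to hold retention_days worth of events."""
    return per_day * retention_days


sources = [
    (1_000_000, 300),  # firewall entries/day, assumed 300 bytes each
    (500_000, 600),    # EventLog entries/day, assumed 600 bytes each
]
per_day = daily_bytes(sources, generated_ratio=0.02, generated_event_bytes=1_000)
total = retained_bytes(per_day, retention_days=90)
print(f"{per_day / 1e6:.0f} MB/day, {total / 1e9:.1f} GB for 90 days")
```

The unknown that matters most is `generated_ratio`. Until you have run real traffic through your SIM, you are guessing at it, which is exactly why counting lines in a log file isn't enough.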

Also, your log sources are going to change. Yes, you will probably fall in love with your SIM and want to put everything in it. But even if you don't, your logs will change on you. Software updates to your servers or firmware upgrades to your embedded devices can and will change what data is logged and how. A recent real-life example is when we replaced our old content filtering solution with a new one that uses the firewall for enforcement. This didn't increase the number of firewall events we received at all. But it added all sorts of data about URLs and users to the traffic that was subject to content filtering. This doubled the byte count of about 35% of all firewall events inserted into the database. Surprise!
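The arithmetic behind that kind of surprise is simple enough to sketch. The function below is hypothetical, and it assumes the affected events were about the same average size as every other firewall event before the change, which may not hold in practice.

```python
def volume_growth_factor(affected_fraction, size_multiplier):
    """Growth in total byte volume when a fraction of events changes size.

    Assumes the affected events were average-sized before the change.
    """
    unchanged = 1 - affected_fraction
    changed = affected_fraction * size_multiplier
    return unchanged + changed


# 35% of firewall events doubling in size, as in the example above
factor = volume_growth_factor(affected_fraction=0.35, size_multiplier=2.0)
print(f"firewall byte volume grows by about {(factor - 1) * 100:.0f}%")
```

Under that assumption, the event count stays flat while the firewall events take up roughly a third more space, so a storage plan based on event counts alone would miss the change completely.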

My advice is to plan to expand. We're talking SAN-attached storage, volumes that can be resized online, and so on. Also plan to monitor table stats on a regular basis. You want to know that you need to expand before you run out of space.
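As one way to turn those table stats into an early warning, here's a minimal sketch that fits a straight line through the first and last size samples and projects when the volume fills up. The sample values, the capacity, and the function name are all hypothetical. How you collect the samples depends on your SIM's database, and real growth is rarely this linear.

```python
def days_until_full(samples, capacity_bytes):
    """Linear forecast from (day_number, bytes_used) samples.

    Uses only the first and last samples, so it needs at least two
    samples taken on different days. Returns None if usage is flat
    or shrinking.
    """
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    growth_per_day = (last_used - first_used) / (last_day - first_day)
    if growth_per_day <= 0:
        return None
    return (capacity_bytes - last_used) / growth_per_day


# Hypothetical weekly samples of total table size, in bytes
samples = [
    (0, 400_000_000_000),
    (7, 404_200_000_000),
    (14, 408_400_000_000),
]
remaining = days_until_full(samples, capacity_bytes=500_000_000_000)
if remaining is not None:
    print(f"about {remaining:.0f} days of headroom at the current growth rate")
```

Rerun it every time you collect a sample. A sudden drop in the number is your cue that a log source has changed on you.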
