Tuesday, June 9, 2009

From The Inbox

Anonymous writes,

Hi Paul, I am one of those who, as you say, found your blog by googling ArcSight, trying to do some recon on the product for my employer. (I think I see that the most recent posts here are from 2007 so who knows if you or anybody will be seeing my question.) I'm trying to find out, can Arcsight's data be queried programmatically; i.e. is it stored in a relational database, hopefully SQL Server or Oracle, or if not, is there an API or ADO.NET provider that can allow it to be queried, preferably with SQL? Thanks for any info anyone reading can provide.


ArcSight ESM uses Oracle 10g for its back-end database. At one point, and this may still be true, DB2 was also supported. You can query the database directly, and the schema is pretty straightforward. The table ARC_EVENT_DATA is where most of the event data lives, for example. But depending on your use case, that might not be the best way to get data out of ESM.

Also, since you didn't specify, it may be worth mentioning that the same is not true of the ArcSight Logger platform, which is flat storage. Instead of querying the log store directly, Logger can be configured to forward events based on source, type, etc. to another destination, if you need them in real-time. There is a PostegreSQL database on Logger, but it's my understanding that it supports the reports engine, and doesn't store the raw or CEF events in any comprehensive way.

The interesting thing is that the storage technology behind Logger 3.0, because of its performance and relative "cheapness" may become the data store for ESM down the road. It would only make sense, since you could handle MUCH higher event rates with less disk and no Oracle license fee. If it can be done while maintaining the stability and feature set that the Oracle-based data store has, it's a walk-off home run for ArcSight.

2 comments:

Unknown said...

Hi, sorry but the schema structure is not as straight-forward as you seem to assume. Especially when you have to perform queries mixing events and resources. I've created a script giving you the amount of lag you have per device (by computing differences between the various timestamps : agentReceiptTime / managerReceiptTime / endTime) and the query is 20 lines long...

PaulM said...

I will grant you that. As soon as you get into querying multiple tables beyond just event data, like assets and cases, then the queries get hairy.