This past week, we upgraded our production ArcSight environment from 3.5 SP2 to 4.0 SP1. We've been running ArcSight 4.0 in our test environment since August 2007, but as with all things "test" there are always a few things that you won't discover in a test environment.
This is the first part of a 2-part series on the upgrade experience. I want to describe some technical challenges that we experienced that you won't find in the upgrade documentation in hopes that someone else finds this helpful.
Before I start talking about what went wrong, I should say that the process was seemingly painless aside from what I describe here. I say seemingly, because I didn't actually do it. But Tim, who did the 2-part upgrade and redesign over the course of three weeks, was still smiling on Thursday following the upgrade. He seems to have emerged from the gauntlet unscathed. I think this is because Tim's a bad-ass and also because ArcSight dramatically improved the upgrade process in 4.0.
So the first problem we ran into has to do with some changes to the Jetty code in the ArcSight manager between 3.5 and 4.0. Here's the error we got:
[2008-02-06 13:50:55,727][INFO ] [default.com.arcsight.server.Jetty311ServletContainer] [initialize] Key Store: [JKS] /opt/arcsight/manager/config/jetty/keystore
com.arcsight.common.InitializationException: Exception initializing 'com.arcsight.server.Jetty311ServletContainer': The keystore may not contain more than 1 entry. Please remove excessive entries.
    at com.arcsight.server.Jetty311ServletContainer.initialize(Jetty311ServletContainer.java:288)
The keystore file was the same one we'd been using since 3.0. We generated our own CA key pair and certificate from OpenSSL and signed the certificate for the keystore file with it. We then added the CA certificate to the cacerts file that is used by all of the other components. While going through this process, I added the CA cert to the keystore file for posterity.
The CA cert being present in the keystore file is what caused the error. Editing the keystore with 'keytoolgui' to remove the CA cert, leaving only the manager's own entry, was all it took to get the manager back up and running.
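If you want to check a keystore for extra entries before you upgrade, something like the following could do it. This is only a sketch using the standard java.security.KeyStore API, not anything from ArcSight. The class name KeystoreEntryCheck is made up, the default path is simply the one from the log message above, and I'm assuming the file is a plain JKS keystore whose password you pass as the second argument.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of ArcSight): lists what is inside a JKS keystore.
public class KeystoreEntryCheck {

    // Returns one line per entry, e.g. "mykey (key entry)" or "myca (trusted certificate)".
    static List<String> describeEntries(KeyStore ks) throws GeneralSecurityException {
        List<String> lines = new ArrayList<>();
        for (String alias : Collections.list(ks.aliases())) {
            String kind;
            if (ks.isKeyEntry(alias)) {
                kind = "key entry";
            } else if (ks.isCertificateEntry(alias)) {
                kind = "trusted certificate";
            } else {
                kind = "other";
            }
            lines.add(alias + " (" + kind + ")");
        }
        return lines;
    }

    public static void main(String[] args) throws IOException, GeneralSecurityException {
        // Assumption: default path taken from the error message; override with args[0].
        String path = args.length > 0 ? args[0] : "/opt/arcsight/manager/config/jetty/keystore";
        // A null password skips the integrity check but still lets us list the entries.
        char[] password = args.length > 1 ? args[1].toCharArray() : null;

        KeyStore ks = KeyStore.getInstance("JKS");
        try (InputStream in = new FileInputStream(path)) {
            ks.load(in, password);
        }

        List<String> entries = describeEntries(ks);
        entries.forEach(System.out::println);
        if (entries.size() > 1) {
            System.out.println("More than 1 entry: the 4.0 manager would likely refuse this keystore.");
        }
    }
}
```

The sketch only reads the keystore. The JDK's own keytool can show the same information with its -list option and remove an entry with -delete, but either way, work on a backup copy of the file first.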
The other issues that we ran into occurred post-upgrade and had to do with upgraded content. The first one isn't really an issue at all; rather, ArcSight has tightened some of its logic around Active Channels. Specifically, Active Channels that were created using one time stamp and configured to sort on another time stamp will now give an error. For example, if you created a channel with EndTime set for "Use as time stamp" and then, on the Sort tab, set a sort on ManagerReceiptTime, the channel will fail with an error in 4.0. In 3.5 this would merely hurt the channel's performance, and the channel would eventually load. This is a good change, but it may mean having to edit some of your old (and presumably really slow) Active Channels before they work again.
The second issue we saw was around filters. ArcSight has made some major improvements to Filter objects, and I'll talk more about that in the upcoming post. However, one drawback seems to be a bug in the parsing/escaping enhancements in 4.0. Filters that use a string match containing square brackets ( [ or ] ) will return null sets. There is no error; the only symptom is null results. In our case, all of the brackets appeared in filters matching on the Name field (gotta love undocumented syslog formats). The solution is to revise your filters to not use the brackets.
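If you have a lot of filters, it may be easier to audit them in bulk than to wait for empty results to show up. Here is a rough sketch of that idea. It assumes you have already collected each filter's name and its match string into a map by whatever means you prefer; it does not read anything from ArcSight itself. The class name, the filter names, and the match strings are all invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical audit helper: flags filters whose match string contains [ or ].
public class BracketFilterAudit {

    // True if the string contains either square bracket character.
    static boolean hasSquareBracket(String s) {
        return s.indexOf('[') >= 0 || s.indexOf(']') >= 0;
    }

    // Returns the names of filters whose match string contains a square bracket,
    // in the iteration order of the input map.
    static List<String> affectedFilters(Map<String, String> matchStringByFilterName) {
        return matchStringByFilterName.entrySet().stream()
                .filter(e -> hasSquareBracket(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data; substitute your own filter names and match strings.
        Map<String, String> filters = new LinkedHashMap<>();
        filters.put("Example syslog filter", "[daemon] connection refused");
        filters.put("Example login filter", "login failed");

        for (String name : affectedFilters(filters)) {
            System.out.println("Needs revision (contains [ or ]): " + name);
        }
    }
}
```

With the sample data above, only the first filter is reported, since its match string is the only one that contains brackets.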