Paul Melson's Blog: Useless Statistics Returns!

Wednesday, March 26, 2008


Breaking News: Information Security is still not a science and vendors still suck at statistics.

What? You already knew that? Well, somebody forgot to tell WhiteHat. You'd think they might learn from their competitor's mistakes.

I'll save you 4 of the 5 minutes it takes to read the whole thing by summarizing WhiteHat's press-release-posing-as-a-study for you. They collected vulnerability statistics from an automated scanning tool that they sell (and give away demo use of during their sales cycle). From that, they generated numbers on what percentage of sites had findings, which types of findings were most common, and which verticals some of the sites' owners are in. Then they let their marketing folks make silly claims, built on wild speculation, from inherently flawed data. Anyway, I guess this isn't the first one WhiteHat has put out there; they've been doing it quarterly for a year. But this is the first time I've had a sales guy forward one to me. Can't wait for that follow-up call.

So what's wrong with WhiteHat's "study"? First, they collected data using an automated tool. Anybody who does pen testing knows that automated tools generate false positives. And in my experience (which does not include WhiteHat's product, but does include most of the big-name products in the web app scanner space), tests for things like XSS, CSRF, and blind SQL injection are, by their nature, prone to a high rate of false positives. It's no coincidence that XSS and CSRF top the list of vulnerabilities found in their study.
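To put a rough number on the false-positive problem: when a scanner flags clean sites some of the time, the share of sites reported as vulnerable is a blend of real and bogus findings. The sketch below is purely illustrative; the prevalence, detection rate, and false-positive rate are made-up parameters, not figures from WhiteHat's report or any product. The correction function uses the standard Rogan-Gladen estimator, which only works if you actually know your tool's error rates.

```python
def observed_positive_rate(true_prevalence: float,
                           sensitivity: float,
                           false_positive_rate: float) -> float:
    """Fraction of sites a scanner reports as vulnerable.

    All three inputs are hypothetical knobs for illustration:
    true_prevalence     -- share of sites that really have the flaw
    sensitivity         -- chance the scanner catches a real flaw
    false_positive_rate -- chance the scanner flags a clean site anyway
    """
    true_hits = true_prevalence * sensitivity
    false_hits = (1.0 - true_prevalence) * false_positive_rate
    return true_hits + false_hits


def corrected_prevalence(observed: float,
                         sensitivity: float,
                         false_positive_rate: float) -> float:
    """Back out the true prevalence from an observed rate (Rogan-Gladen),
    clamped to [0, 1]."""
    specificity = 1.0 - false_positive_rate
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("scanner is no better than chance; cannot correct")
    estimate = (observed + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hypothetical: 40% of sites truly have the flaw, the scanner catches
    # 90% of those, and it wrongly flags 30% of the clean ones.
    reported = observed_positive_rate(0.40, 0.90, 0.30)
    print(f"reported:  {reported:.0%}")
    print(f"corrected: {corrected_prevalence(reported, 0.90, 0.30):.0%}")
```

With those made-up inputs it reports 54% of sites as vulnerable when the true figure is 40%. If a vendor doesn't disclose how false positives were handled, you can't tell which of those two numbers you're looking at.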

Second, their data is perhaps even more skewed by the fact that they let customers demo their product during their sales cycle. And if you want to demonstrate value during pre-sales, you want to show the customer how the product works on a target where you know there will be results. Enter WebGoat, Hacme Bank, and the like. These are a sales engineer's best friends when doing customer demos, because there's nothing worse than scanning the client's web app only to come up with no results. It doesn't show off the product's full capabilities, and it pretty much guarantees that the customer won't buy. Of course, what these do to the "study" is artificially drive the number of findings up. Way up.
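A quick back-of-the-envelope sketch shows how much a few deliberately vulnerable demo targets can move a headline number. The site counts and rates below are hypothetical, and `demo_rate=1.0` assumes every demo target produces findings, which is the whole reason an SE would pick one.

```python
def pooled_finding_rate(real_sites: int, real_rate: float,
                        demo_sites: int, demo_rate: float = 1.0) -> float:
    """Share of scanned sites with findings when intentionally vulnerable
    demo targets (WebGoat, Hacme Bank, etc.) are mixed into the sample.

    real_rate -- hypothetical finding rate on genuine customer sites
    demo_rate -- finding rate on demo targets (assumed 1.0: they always hit)
    """
    total = real_sites + demo_sites
    if total == 0:
        raise ValueError("no sites in sample")
    flagged = real_sites * real_rate + demo_sites * demo_rate
    return flagged / total


if __name__ == "__main__":
    # Hypothetical sample: 800 real sites at 60%, plus 200 demo targets.
    print(f"real sites only: {pooled_finding_rate(800, 0.60, 0):.0%}")
    print(f"with demos:      {pooled_finding_rate(800, 0.60, 200):.0%}")
```

In this made-up case, one demo target for every four real sites turns 60% into 68%, and nothing in the published number would tell you that happened.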

Finally, and perhaps best of all, when Acunetix did this exact same thing last year, it turned into a giant, embarrassing mess. Mostly for Joel Snyder at NetworkWorld. The real killer for me is that I know that Jeremiah Grossman, WhiteHat's CTO and a smart guy, was around for that whole thing.

Oh, well. Maybe we'll luck up and Joel Snyder will give us a repeat performance as well.

But just like last time, the real loser is the infosec practitioner. This kind of "research" muddies the waters. It lacks rigor, or even basic data sampling and normalization methodology. Hell, they don't even bother to acknowledge the potential skew inherent in their data set. It's not that WhiteHat's number is way off. In fact, I'd say it's probably pretty reasonable. But if they (or infosec as a professional practice) want to be taken seriously, then they (and we) need to do something more than run a report from their tool for customer=* and hand it to marketing to pass around to the trade press.

2 comments:

Jeremiah Grossman said...

Hi Paul,

I just thought I should take a moment and clarify.

The statistics report is based on data collected by our Sentinel Service, a combination of automated scanning and custom testing performed by humans. False positives are all but entirely eliminated by the operations team. The quality of the results is equal to or greater than that of any professional VA firm, and our methodology is described in the report.

I would agree that statistics based ONLY on raw scanner output would be lacking in value, but that's not what we've done.

CSRF does NOT top our list, because no one can yet find it effectively with a scanner. That is also covered in the report.

Secondly, no "test" websites of any kind are in the data. The vast majority are real, live production websites, supplemented in some cases by staging and intranet websites. When customers demo the service, it's done on their real websites (production or staging), nothing faked.

Of course, we'll see if smokin' Joel Snyder has anything to say on the matter.

March 30, 2008 at 7:38 PM

PaulM said...

Hi Jeremiah,

Thanks for the response. If we agree that rigorous data collection is required for this kind of study to be meaningful, and that WhiteHat does in fact address these issues in a straightforward manner, then I would encourage your company to include that in future studies.

I fully understand why you cannot publish the data set in its entirety. However, by publishing your data collection and analysis methods and disclosing how you reduce false positives, you serve not only WhiteHat's image but also the larger dialog about how research is carried out in the infosec industry as a whole. So many vendors are so bad about publishing anecdotes, hype, and FUD. Any time you do it right, stand up and say so. Raise the bar for your peers and competitors. Your customers (and their customers) will ultimately thank you.

PaulM

April 1, 2008 at 1:03 PM