Monday, November 18, 2013

The pitfalls in analyzing data

The following extract warns, in a light-hearted manner, of the dangers and pitfalls of analyzing data:

"One of the good things about open data is that the average person can verify the kind of bovine ordure that often passes for insight and inference on TV news channels. The bad thing, however, is that with more data comes the potential for a whole new wave of fallacious analyses.

For instance, has crime gone up or has crime reporting gone up? Let’s say, hypothetically, that the jail occupancy numbers from 1953 to 2012 for the State of Andhra Pradesh show a steadily rising trend with a sudden drop in the 2000s, followed by a steady rise again. You can interpret this data in many ways. 

The opposition could say that this is symptomatic of continuously deteriorating governance. The Police could say that this is proof that they are getting better at catching criminals over time. The chap in charge of prisons in the State could say that it’s indicative of his department’s commitment to increasing jail capacity all the time. The government in power during that sudden drop in the 2000s could claim that it had a Sherlockspalli Holmesreddy whose magic wand pulled the inexorable crime rate line down. The opposition then could argue that it had nothing to do with better policing but the choice to migrate government computers from MS Office to Open Office, a move that resulted in improper use of spreadsheet software thus resulting in the alleged drop in crime. I could argue that the trend correlates directly to the quality of biriyani served in prisons and that the drop in 2000 is due to a change in caterer. And finally, someone with some common sense might even ask if jail occupancy, crime reporting and crime are different things altogether."
Taken from a great piece by Krish Ashok in The Hindu.