“Data without a good model is numerical drivel.”

Eli Rabett, Who You Gonna Trust, Models or Data?:

Paul Krugman makes a useful point at his already established blog

It’s not the reliance on data; numbers can be good, and can even be revelatory. But data never tell a story on their own. They need to be viewed through the lens of some kind of model, and it’s very important to do your best to get a good model. And that usually means turning to experts in whatever field you’re addressing.

because, if nothing else there are things about the data that they know that you do not.  Now Krugman goes on but Eli would like to pause here and, as he did at the NYTimes and discuss how data is not always right.

Data without a good model is numerical drivel. Statistical analysis without a theoretical basis is simply too unrestrained and can be bent to any will. A major disaster of the last years have been the rise of freakonomics and “scientific forecasting” driven by “Other Hand for Hire Experts”

When data and theory disagree, it can as well be the data as the theory. The disagreement is a sign that both need work but if the theory is working for a whole lot of other stuff including things like conservation of energy as well as other data sets, start working on the data first.

Eli’s comment that “Data without a good model is numerical drivel” reminds me of John Tukey’s observation:

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.

(That’s not something that “data scientists” or aficionados of Big Data seem inclined to appreciate, but so be it.)

See also Andrew Gelman’s post, A statistical graphics course and statistical graphics advice. If you don’t have a good model, then visualizing the data may help you develop one.