MATLAB!

I’m now the proud owner of a MATLAB Home license:

MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications. The language, tools, and built-in math functions enable you to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages, such as C/C++ or Java.

Why did I do it? 1) I want to be able to tinker at home; 2) the home-use license is only $150; 3) IDL, my primary programming language for nearly 20 years now, doesn’t offer home-use licenses (commercial licenses for MATLAB and IDL each run about $2k); and 4) MATLAB has become the lingua franca of technical computing.

IDL is great for algorithm development, and I like it better than MATLAB for image display (in the head-to-head comparisons I’ve done, custom algorithms run 2-3x faster in IDL than in MATLAB), but outside of a few niche areas hardly anyone knows it. I’m finding that’s increasingly an issue at work. At a minimum I need to be fluent in both.

When I first started using MATLAB at work, the analogy I made was switching from French to Spanish as your primary language. (I speak neither, but the analogy sounded appropriate.) The structure of my code would be fine, but there would be tons of grammatical errors. (Row-major vs. column-major conventions still give me fits, as do matrix indices running 1 to n instead of 0 to n-1. Oh, and the difference between what you get back from their respective 2D FFT functions? It took me two weeks to discover that the difference in convention mattered for what I was doing and then to get the MATLAB-IDL mapping right.)
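For the record, here is the index and FFT mapping I eventually settled on, written out as a few lines of MATLAB with the explanation in comments. These are my own notes rather than anything official, so check them against the documentation for your versions of both languages.

A = magic(8);                     % stand-in for a 2-D image

% Indexing: MATLAB runs 1..n with A(row, col); IDL runs 0..n-1 with a[col, row],
% so moving an array between the two generally needs a transpose in addition to
% the off-by-one shift: a[i, j] corresponds to A(j+1, i+1).

% FFT conventions: MATLAB's fft2 applies no scale factor on the forward
% transform (the 1/N lives in ifft2), while IDL's FFT applies 1/N going forward.
% So the closest MATLAB analogue of IDL's forward FFT(a) is:
F = fft2(A) / numel(A);

% Both conventions put the DC term in the first element; fftshift moves it to
% the center for display.
Fc = fftshift(F);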

Continue reading

Scaling factor for thin plate smoothing spline?

I have a problem that may benefit from the application of a thin plate smoothing spline. I need to interpolate values on a 2-d grid where the data are sparse. The thin plate kernel is the basis for the spline solution. In 2-d the solution corresponds to

(1)   \begin{equation*} \min_{\beta, \theta} \left( \sum_{i=1}^n{ \left[ y_i - f_i( \beta, \theta) \right]^2} + \lambda \int_{u,v} \left[ \left( \frac{\partial^2f}{\partial u^2} \right)^2 + 2 \left( \frac{\partial^2f}{\partial u \partial v} \right)^2 + \left( \frac{\partial^2f}{\partial v^2} \right)^2 \right] \mathrm{d}u \, \mathrm{d}v \right) \end{equation*}

where y_i = y(u_i,v_i) are the data values and f_i( \beta, \theta) = f(u_i,v_i;\beta, \theta) is the model function. Once the maximum likelihood parameter values have been determined, f(\beta, \theta) is used to generate y values at all spatial positions (u,v).

Long story short, in the thin plate model the integral in Eq.(1) reduces to a sum over reproducing kernels

(2)   \begin{equation*} k\left( \mathbf{x}; \mathbf{x}_i \right) = \| \mathbf{x}- \mathbf{x}_i \|^2 \log \left(  \| \mathbf{x} - \mathbf{x}_i \| \right) \end{equation*}

where \mathbf{x} = \left[ u,v \right] are spatial coordinates, \mathbf{x}_i are control points and \| \mathbf{x} - \mathbf{x}_i \| is the Euclidean norm of \mathbf{x} - \mathbf{x}_i.

Referring back to Eq.(1), the minimization process determines the weight coefficient for each of the kernel functions as well as a DC offset and the coefficients for two linear terms in f(u,v). For a given \lambda, the number of parameter values to be estimated is equal to the number of control points plus three. Generalized Cross-Validation (GCV) is then used to determine the optimum value of \lambda.
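To make that parameterization concrete, the fitted surface has the form f(\mathbf{x}) = \beta_0 + \beta_1 u + \beta_2 v + \sum_i \theta_i \, k(\mathbf{x}; \mathbf{x}_i), and the kernel weights and affine coefficients come out of a single augmented linear system. Here is a minimal MATLAB sketch of that step; it is my own illustration (the function names are made up, the placement of \lambda varies between references, and the GCV search over \lambda is omitted):

function [w, a] = tps_fit(u, v, y, lambda)
% Thin plate smoothing spline with kernel k(r) = r^2 log(r), using the data
% points themselves as control points.  Returns kernel weights w and affine
% coefficients a = [offset; u slope; v slope].
    u = u(:);  v = v(:);  y = y(:);
    n = numel(y);
    r = sqrt(bsxfun(@minus, u, u').^2 + bsxfun(@minus, v, v').^2);
    K = tps_kernel(r);
    P = [ones(n,1), u, v];                       % DC offset + two linear terms
    A = [K + lambda*eye(n), P; P', zeros(3)];
    sol = A \ [y; zeros(3,1)];
    w = sol(1:n);
    a = sol(n+1:end);
end

function k = tps_kernel(r)
% r^2 * log(r), with the removable singularity at r = 0 set to 0
    k = zeros(size(r));
    nz = r > 0;
    k(nz) = r(nz).^2 .* log(r(nz));
end

Evaluating the surface at a new point is then just tps_kernel of its distances to the control points times w, plus [1, u, v]*a.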

The thin plate formulation, i.e., the reproducing kernel element of it, is nice because it lets you determine the parameter values with the same matrix math you’d use for maximum likelihood estimation. I get the general form of the solution. (Follow the link above if you want the details.) What I’m stuck on is that there’s no scaling parameter in Eq.(2). More specifically, why isn’t the kernel function

(3)   \begin{equation*} k\left( \mathbf{x}; \mathbf{x}_i, s \right) = \frac{ \| \mathbf{x}- \mathbf{x}_i \|^2 }{s^2} \log \left(  \frac{ \| \mathbf{x} - \mathbf{x}_i \| }{s} \right) \end{equation*}

Continue reading

“Data without a good model is numerical drivel.”

Eli Rabett, Who You Gonna Trust, Models or Data?:

Paul Krugman makes a useful point at his already established blog

It’s not the reliance on data; numbers can be good, and can even be revelatory. But data never tell a story on their own. They need to be viewed through the lens of some kind of model, and it’s very important to do your best to get a good model. And that usually means turning to experts in whatever field you’re addressing.

because, if nothing else there are things about the data that they know that you do not.  Now Krugman goes on but Eli would like to pause here and, as he did at the NYTimes and discuss how data is not always right.

Data without a good model is numerical drivel. Statistical analysis without a theoretical basis is simply too unrestrained and can be bent to any will. A major disaster of the last years have been the rise of freakonomics and “scientific forecasting” driven by “Other Hand for Hire Experts”

When data and theory disagree, it can as well be the data as the theory. The disagreement is a sign that both need work but if the theory is working for a whole lot of other stuff including things like conservation of energy as well as other data sets, start working on the data first.

Eli’s comment that “Data without a good model is numerical drivel” reminds me of John Tukey’s observation:

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.

(That’s not something that “data scientists” or aficionados of Big Data seem inclined to appreciate but so be it.)

Continue reading

Visualization of k-means clustering and LOESS

Rafa Irizarry has created GIFs to illustrate k-means clustering and LOESS regression.  See them at Simply Statistics:

There are no accompanying descriptions of k-means or LOESS, so if you don’t know what those methods are, do a little background reading before checking out his GIFs.
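If it helps to see what the k-means animation is doing, here is a bare-bones sketch of Lloyd’s algorithm in MATLAB. It’s illustrative only (in practice you’d reach for something like the Statistics Toolbox kmeans function), and the data are made up:

rng(1);
X = [randn(50,2); bsxfun(@plus, randn(50,2), [4 4])];   % two well-separated blobs
k = 2;
C = X(randperm(size(X,1), k), :);                       % random initial centers
for it = 1:20
    % Assignment step: each point goes to its nearest center
    d = zeros(size(X,1), k);
    for j = 1:k
        d(:,j) = sum(bsxfun(@minus, X, C(j,:)).^2, 2);
    end
    [~, idx] = min(d, [], 2);
    % Update step: each center moves to the mean of its assigned points
    for j = 1:k
        C(j,:) = mean(X(idx == j, :), 1);
    end
end
scatter(X(:,1), X(:,2), 20, idx, 'filled')              % color points by cluster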


A scientific idea ready for retirement

Related to the question, “What scientific idea is ready for retirement?” at Edge.org, Andrew Gelman reports this email exchange:

Patrick Steigler: So, what would [your] opinion be on the scientific idea that is ready for retirement?

Gelman: I guess it’s a good thing they didn’t ask me!

Steigler: How about the retirement of p value < .05 for significance?

Gelman: Yeah, but that’s too easy. It’s like if someone asks for a pop music recommendation and you say The Beatles.

Steigler: True, but the idea could still use some drum beating about since it does not seem to be as fashionable as The Beatles yet (but hopefully some day!).

I couldn’t agree more.

Most important reads of 2013

In the spirit of the Must Read section of my Weekly Digests, I looked through my posts and reading list from 2013 and found what I thought were the most important reads.  My main selection criterion was that the piece motivated thinking about bigger issues.   With that in mind, there was one important subject where I didn’t note an exemplar article:  domestic surveillance by the NSA.  Recently there’s been a fair amount of reporting on the NSA being slapped down in the courts – see, e.g., this – but nothing I’ve read has struck me as a “must read” piece which captures the significance of the whole.   That noted, here’s my list of most important reads and listens from 2013 (in no particular order):

Crickets chirping

Andrew Gelman had an interesting post the other day, “Whither the ‘bet on the sparsity principle’ in a nonsparse world?”  An excerpt:

Rob Tibshirani writes:

Hastie et al. (2001) coined the informal “Bet on Sparsity” principle. The l1 methods assume that the truth is sparse, in some basis. If the assumption holds true, then the parameters can be efficiently estimated using l1 penalties. If the assumption does not hold—so that the truth is dense—then no method will be able to recover the underlying model without a large amount of data per parameter.

I’ve earlier expressed my full and sincere appreciation for Hastie and Tibshirani’s work in this area.

Now I’d like to briefly comment on the above snippet. The question is, how do we think about the “bet on sparsity” principle in a world where the truth is dense? I’m thinking here of social science, where no effects are clean and no coefficient is zero…, where every contrast is meaningful—but some of these contrasts might be lost in the noise with any realistic size of data.

I think there is a way out here, which is that in a dense setting we are not actually interested in “recovering the underlying model.” The underlying model, such as it is, is a continuous mix of effects. If there’s no discrete thing to recover, there’s no reason to worry that we can’t recover it!

I had a comment which, while a bit tangential, I thought was valuable. A lightly edited version:

LASSO (Tibshirani’s regression method, which incorporates a sparsity-promoting regularization term) is presented in the context of a linear mixing model. It’s formulated for the scenario where the weight coefficients of the basis functions are sparse. When we ask “Is the world (non-)sparse?”, to what extent are we asking “Is the world (non-)sparse in the context of a linear subspace model?” Linear subspace models are extremely useful, but they’re not universal. Suppose your data are described by a low-dimensional manifold which, when viewed globally, isn’t well accounted for by a subspace model. Would you call your data sparse or non-sparse?

And I provided two references:
C.M. Bachmann et al., “Improved Manifold Representations of Hyperspectral Imagery.”
S.T. Roweis and L.K. Saul, “Nonlinear Dimensionality Reduction by Locally Linear Embedding,” Science, vol. 290, pp. 2323-2326, 22 Dec 2000.
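Since LASSO is the jumping-off point here, a bare-bones illustration of the sparse linear mixing setting it is built for may be useful. This is my own sketch, solved with a hand-rolled soft-thresholding iteration (ISTA) rather than any particular toolbox, and the sizes and penalty weight are chosen by eye for the toy problem:

% Sparse linear mixing model: y = X*beta + noise, with most of beta zero
rng(0);
n = 100;  p = 400;  k = 5;                 % n samples, p basis functions, k active
X = randn(n, p);
beta_true = zeros(p, 1);
beta_true(randperm(p, k)) = randn(k, 1);
y = X*beta_true + 0.05*randn(n, 1);

% l1-penalized least squares: min_b 0.5*||y - X*b||^2 + lambda*||b||_1,
% solved by iterative soft-thresholding
lambda = 2;
L = norm(X)^2;                             % Lipschitz constant of the gradient
soft = @(z, t) sign(z) .* max(abs(z) - t, 0);
beta = zeros(p, 1);
for it = 1:500
    beta = soft(beta + X'*(y - X*beta)/L, lambda/L);
end
nnz(beta)                                  % most coefficients are driven exactly to zero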

Continue reading