Generalized ridge regression
Assume that we are in the standard supervised learning setting, where we have a response vector $y \in \mathbb{R}^n$ and a design matrix $X \in \mathbb{R}^{n \times p}$. Ridge regression is a commonly used regularization method which looks for the $\hat{\beta}$ that minimizes the sum of the RSS and a penalty term:

$$\hat{\beta} = \underset{\beta}{\text{argmin}} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2,$$

where $\lambda \geq 0$ is a hyperparameter. In generalized ridge regression, we solve a more complicated problem:

$$\hat{\beta} = \underset{\beta}{\text{argmin}} \; (y - X\beta)^\top W (y - X\beta) + \beta^\top \Omega \beta,$$

where (i) $W \in \mathbb{R}^{n \times n}$ is a (usually diagonal) weight matrix that gives different observations different weights, and (ii) $\Omega \in \mathbb{R}^{p \times p}$ is a penalty matrix. (We've absorbed the hyperparameter $\lambda$ into $\Omega$.)
(Note: Sometimes when people refer to "generalized ridge regression", they mean the problem above but with $W = I$ and $\Omega = V \Lambda V^\top$, where $V$ is from the singular value decomposition $X = UDV^\top$ and $\Lambda$ is some positive definite diagonal matrix.)
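In this special case, the solution has a convenient closed form in the SVD coordinates: $\hat{\beta} = V(D^2 + \Lambda)^{-1} D U^\top y$. Here is a small numerical sketch checking this against the direct solution; the data and the diagonal of $\Lambda$ are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Thin SVD: X = U diag(d) V^T.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# A positive definite diagonal Lambda (arbitrary illustrative values).
lam = np.array([1.0, 2.0, 3.0])
Omega = V @ np.diag(lam) @ V.T

# Direct generalized ridge solution with W = I.
beta_direct = np.linalg.solve(X.T @ X + Omega, X.T @ y)

# Closed form in SVD coordinates: beta = V (D^2 + Lambda)^{-1} D U^T y.
beta_svd = V @ ((d / (d**2 + lam)) * (U.T @ y))

print(np.allclose(beta_direct, beta_svd))  # True
```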
The generalized ridge regression problem can be solved in the same way as ridge regression. Taking the derivative of the objective with respect to $\beta$ and setting it equal to zero, we obtain

$$-2X^\top W(y - X\beta) + 2\Omega\beta = 0 \quad \Leftrightarrow \quad (X^\top W X + \Omega)\beta = X^\top W y.$$

If $X^\top W X + \Omega$ is invertible, then we can solve for $\hat{\beta}$:

$$\hat{\beta} = (X^\top W X + \Omega)^{-1} X^\top W y.$$
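As a quick numerical check of this closed form (a sketch with simulated data), note that with $W = I$ and $\Omega = \lambda I$ it reduces to the ordinary ridge regression solution:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=n)

def generalized_ridge(X, y, W, Omega):
    # Solve the normal equations (X^T W X + Omega) beta = X^T W y.
    return np.linalg.solve(X.T @ W @ X + Omega, X.T @ W @ y)

# Special case: W = I and Omega = lam * I recovers ordinary ridge.
lam = 2.0
beta_gen = generalized_ridge(X, y, np.eye(n), lam * np.eye(p))
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.allclose(beta_gen, beta_ridge))  # True
```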
Smoothing splines

Given a set of observations $(x_1, y_1), \ldots, (x_n, y_n)$, a smoothing spline is the function $\hat{f}$ which is the solution to

$$\hat{f} = \underset{f}{\text{argmin}} \; \sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \int f''(t)^2 \, dt,$$

where $\lambda \geq 0$ is a smoothing hyperparameter, and the argmin is taken over an appropriate Sobolev space for which the second term above is well-defined.
When $\lambda = 0$, $\hat{f}$ can be any function that interpolates the data. When $\lambda = \infty$, we must have $f''(t) = 0$ for (almost) all $t$, meaning that $\hat{f}$ must be the least squares linear fit. As $\lambda$ increases from $0$ to $\infty$, the functions $\hat{f}$ go from very rough to very smooth.
It turns out that for $0 < \lambda < \infty$, we can show that the smoothing spline is a natural cubic spline with knots at the unique values of the $x_i$ (see Section 5.2.1 of The Elements of Statistical Learning).
How does generalized ridge regression relate to smoothing splines?
Since a smoothing spline is a natural cubic spline with knots at the unique values of the $x_i$, we can write the solution as a linear combination of the natural spline basis functions:

$$f(x) = \sum_{j=1}^n N_j(x) \theta_j,$$

where $N_1, \ldots, N_n$ are the natural spline basis functions.

Writing $\mathbf{N} \in \mathbb{R}^{n \times n}$ with $\mathbf{N}_{ij} = N_j(x_i)$, and $\Omega_N \in \mathbb{R}^{n \times n}$ with $(\Omega_N)_{jk} = \int N_j''(t) N_k''(t) \, dt$, the minimization problem over $f$ now becomes a minimization problem over $\theta = (\theta_1, \ldots, \theta_n)^\top$:

$$\hat{\theta} = \underset{\theta}{\text{argmin}} \; (y - \mathbf{N}\theta)^\top (y - \mathbf{N}\theta) + \lambda \theta^\top \Omega_N \theta,$$

which is a generalized ridge regression problem! Hence, we can solve for $\hat{\theta}$:

$$\hat{\theta} = (\mathbf{N}^\top \mathbf{N} + \lambda \Omega_N)^{-1} \mathbf{N}^\top y.$$
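To illustrate this structure without constructing the natural cubic spline basis (whose penalty matrix takes more work to build), here is a hedged sketch using a cubic polynomial basis on $[0, 1]$, for which the entries $\int h_j''(t) h_k''(t) \, dt$ can be computed analytically. The basis differs from the natural spline basis, but the generalized ridge form of the solution is identical:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, size=40))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=40)

# Basis matrix B with B_ij = h_j(x_i) for the cubic polynomial basis
# h_1(t) = 1, h_2(t) = t, h_3(t) = t^2, h_4(t) = t^3 on [0, 1].
B = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Penalty matrix with entries int_0^1 h_j''(t) h_k''(t) dt; since
# h_3'' = 2 and h_4'' = 6t, only the lower-right 2x2 block is nonzero.
Omega = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 6.0],
    [0.0, 0.0, 6.0, 12.0],
])

# Generalized ridge solution: theta = (B^T B + lam * Omega)^{-1} B^T y.
lam = 0.01
theta_hat = np.linalg.solve(B.T @ B + lam * Omega, B.T @ y)
fitted = B @ theta_hat
```

As a sanity check on the limiting behavior described above: the penalty's null space is spanned by $\{1, t\}$, so sending $\lambda \to \infty$ drives the quadratic and cubic coefficients to zero, leaving the least squares linear fit.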
The one remaining loose end is to show that $\mathbf{N}^\top \mathbf{N} + \lambda \Omega_N$ is invertible. Slide 2 of reference 2 is a one-slide proof that this matrix is positive definite, and hence it must be invertible.
References:
1. van Wieringen, W. Ridge regression.
2. Hansen, N. R. (2009). Statistics Learning.