Day 17 Smoothing Splines

17.1 Announcements

  • Comments on Assignment 4
  • Assignment 5 is due this Friday

17.2 Non-parametric tools

  • Minimize/relax assumptions
  • No free lunches!
    • Interpretability
    • bias-variance tradeoff

17.2.1 Splines

  • Splines are special cases of non-parametric tools.
  • Introduced in the 1960s (Schoenberg, 1964)
  • They provide a flexible way to model the variability in the data when the functional form (roughly, “the shape”) is unknown

Polynomials:

  • good for local approximation
  • bad for global approximation
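
To make the local-vs-global point concrete, here is a small illustrative sketch (our own toy data, not from the notes): the same cubic that tracks half a cycle of a sine wave closely cannot follow several cycles at once.

```r
set.seed(1)
x <- seq(0, 3, length.out = 300)
f <- sin(2 * pi * x)                # three full oscillations
y <- f + rnorm(length(x), 0, 0.1)

m_global <- lm(y ~ poly(x, 3))                     # one cubic for all of x
m_local  <- lm(y ~ poly(x, 3), subset = x <= 0.5)  # same cubic on half a cycle

# Locally the cubic tracks the curve well; globally it cannot follow
# three cycles (a cubic has at most two turning points)
sd(residuals(m_local))   # close to the noise level (0.1)
sd(residuals(m_global))  # much larger: lack of fit dominates
```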

We can represent the data with the equation \[y_i = \beta_0 + g(x_i) + \varepsilon_i, \\ \varepsilon_i \sim N(0, \sigma^2),\] where

  • \(g(x_i) = \sum_{j=1}^k B_j^m(x_i) \beta_j\), where the \(B_j^m\) are spline basis functions of order \(m\) and \(k\) is the basis dimension.

  • B-splines

    • minimize \(\sum_{i=1}^n \left\{ y_i - (\mathbf{x}_i'\boldsymbol{\beta}_x + \mathbf{B}'(\mathbf{x}_i) \boldsymbol{b}) \right\}^2\)
  • Penalized splines

    • low rank smoothers using a B-spline basis
    • minimize \(\sum_{i=1}^n \left\{ y_i - (\mathbf{x}_i'\boldsymbol{\beta}_x + \mathbf{B}'(\mathbf{x}_i) \boldsymbol{b}) \right\}^2 + \lambda \boldsymbol{b}'\mathbf{D}\boldsymbol{b}\)
  • Cyclic splines
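
Both ideas can be sketched in a few lines of base R (the basis `B`, the difference matrix `D`, and `lambda = 10` are our illustrative choices, not values from the notes): the fit is a linear combination of B-spline basis functions, and adding a quadratic penalty on the coefficients (here a second-difference penalty, as in P-splines) shrinks the coefficient sequence toward smoothness.

```r
library(splines)
set.seed(42)
x <- seq(0, 1, length.out = 100)
y <- sin(2 * pi * x) + rnorm(length(x), 0, 0.2)

B <- bs(x, df = 20, intercept = TRUE)       # cubic B-spline basis, 20 columns
D <- diff(diag(ncol(B)), differences = 2)   # second-order difference matrix

# Penalized least squares: minimize ||y - B b||^2 + lambda * b' (D'D) b,
# solved via the normal equations (B'B + lambda D'D) b = B'y
fit_pen <- function(lambda) {
  solve(crossprod(B) + lambda * crossprod(D), crossprod(B, y))
}
b_ols <- fit_pen(0)    # unpenalized B-spline fit
b_pen <- fit_pen(10)   # penalized fit

# The penalty reduces the 'wiggliness' of the coefficient sequence
sum((D %*% b_pen)^2) < sum((D %*% b_ols)^2)
```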

17.3 Thin-plate regression splines

  • Origin of the name “thin-plate”
  • radial basis functions
  • support multiple predictor variables (unlike most other bases)
  • avoid the problem of knot placement
  • relatively low computational cost, though it can become a concern for large data sets (scaling \(O(k^3)\))
  • basis functions and basis function dimension
  • See Chapter 5 in Wood (2017)
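
In mgcv (assumed here; the notes cite Wood, 2017), the basis dimension is set through `k` in `s()`, while the penalty decides how much of that flexibility is actually used. A hedged sketch on the toy data used later in these notes:

```r
library(mgcv)
x <- seq(1, 50, by = 1)
set.seed(42)
y <- 15 + 15 * sin(sqrt(x * 0.15 - x * 0.006)) + rnorm(length(x), 0, 1.5)
d <- data.frame(x, y)

# k sets the basis dimension (an upper bound on flexibility);
# the REML-estimated penalty then selects the effective degrees
# of freedom (EDF), which stays below k - 1
m_k5  <- gam(y ~ s(x, bs = "tp", k = 5),  method = "REML", data = d)
m_k20 <- gam(y ~ s(x, bs = "tp", k = 20), method = "REML", data = d)

summary(m_k5)$edf   # EDF of the smooth, at most 4
summary(m_k20)$edf  # EDF of the smooth, at most 19
```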

In thin-plate regression, we can represent the data with the equation \[y_i = \beta_0 + g(\mathbf{x}_i) + \varepsilon_i, \\ \varepsilon_i \sim N(0, \sigma^2),\] where \(\mathbf{x}_i\) is a \(d\)-vector of predictors. Thin-plate smoothers estimate \(g(\cdot)\) by finding the \(\hat{f}(\cdot)\) that minimizes \[||\mathbf{y} - \mathbf{f}||^2 + \lambda J_{md}(f),\] where:

  • \(\mathbf{y}\) is the vector of observations,
  • \(\mathbf{f} = [f(\mathbf{x}_1), f(\mathbf{x}_2), \dots, f(\mathbf{x}_n)]^T\),
  • \(J_{md}(f)\) is a penalty functional measuring the ‘wiggliness’ of \(f\),
  • \(\lambda\) is a smoothing parameter controlling the tradeoff between fit to the data and smoothness of \(f\).
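
The role of \(\lambda\) can be seen by fixing the smoothing parameter by hand rather than estimating it. A sketch using mgcv's `sp` argument on the toy data from these notes (the values `1e-6` and `1e6` are arbitrary extremes, chosen only for illustration):

```r
library(mgcv)
x <- seq(1, 50, by = 1)
set.seed(42)
y <- 15 + 15 * sin(sqrt(x * 0.15 - x * 0.006)) + rnorm(length(x), 0, 1.5)
d <- data.frame(x, y)

# Fix the smoothing parameter instead of estimating it,
# to see the fit/smoothness tradeoff directly
m_rough  <- gam(y ~ s(x, bs = "tp", k = 20), sp = 1e-6, data = d)
m_smooth <- gam(y ~ s(x, bs = "tp", k = 20), sp = 1e6,  data = d)

# Roughness of the fitted curve via squared second differences
rough <- function(m) sum(diff(fitted(m), differences = 2)^2)
rough(m_rough) > rough(m_smooth)                         # small lambda: wigglier fit...
sum(residuals(m_rough)^2) < sum(residuals(m_smooth)^2)   # ...but closer to the data
```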

Figure 17.1: From Wood (2007)

library(mgcv)    # gam(), s(), predict(..., type = "lpmatrix")
library(ggplot2) # plotting
library(dplyr)   # %>%
library(tidyr)   # pivot_longer()

x <- seq(1, 50, by = 1)
set.seed(42)
y <- 15 + 15 * sin(sqrt(x * 0.15 - x * 0.006)) + rnorm(length(x), 0, 1.5)
plot(x, y)

splines_df <- data.frame(x, y)
m_spline_tp <- gam(y ~ s(x, bs = "tp", k = 7), method = "REML", data = splines_df)

splines_df$bs_spline <- predict(m_spline_tp, type = "response")
splines_df$bs_spline_se <- predict(m_spline_tp, type = "response", se.fit = TRUE)$se.fit

knots <- m_spline_tp$smooth[[1]]$knots

beta_smooth <- coef(m_spline_tp)[grep("s\\(x\\)", names(coef(m_spline_tp)))]
beta_smooth
##    s(x).1    s(x).2    s(x).3    s(x).4    s(x).5    s(x).6 
## -7.076170  9.538457 -4.898365  7.277189 18.206627  6.552186

Xp <- predict(m_spline_tp, type = "lpmatrix")  # each column = basis function

# Remove intercept column for plotting
Xp_smooth <- Xp[, grep("s\\(x\\)", colnames(Xp))]

# Plot basis functions
matplot(x, Xp_smooth, type = "l", lty = 1, col = rainbow(ncol(Xp_smooth)),
        main = "Spline Basis Functions", xlab = "x", ylab = "Basis value")

# Add contribution from each basis function
# multiply basis functions by their contribution 
y_basis <- sweep(Xp_smooth, 2, beta_smooth, "*")
matplot(x, y_basis, type = "l", lty = 1, col = rainbow(ncol(Xp_smooth)),
        main = "Contribution of Each Basis Function", xlab = "x", ylab = "Contribution")

df_basis <- as.data.frame(cbind(x, Xp_smooth)) %>% 
  pivot_longer(cols = `s(x).1`:`s(x).6`)

splines_df %>% 
  ggplot(aes(x, y))+  
  geom_line(aes(y = 18 + value*10, 
                group = name, 
                color = name),
            show.legend = FALSE,
            data = df_basis, 
            alpha = .6)+
  theme_classic()+
  theme(panel.border = element_blank(), 
        panel.grid = element_blank())+
  geom_vline(aes(xintercept = knots), data = data.frame(knots), linetype = 2)+
  geom_line(aes(y = bs_spline+(bs_spline_se*1.96)), linetype = 2)+
  geom_line(aes(y = bs_spline-(bs_spline_se*1.96)), linetype = 2)+
  geom_line(aes(y = bs_spline))+
  coord_cartesian(xlim = c(0, 50))+
  labs(x = "Columns (x1)", y = "y or f(x1)")+
  geom_point()

matplot(x, rowSums(y_basis), type = "l", lty = 1, col = rainbow(ncol(Xp_smooth)),
        main = "Sum of the contributions of all basis functions", xlab = "x", ylab = "f(x)")
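
As a sanity check on the decomposition above, the linear-predictor matrix times the coefficient vector reproduces the fitted values exactly under the identity link. A self-contained sketch that refits the same model:

```r
library(mgcv)
x <- seq(1, 50, by = 1)
set.seed(42)
y <- 15 + 15 * sin(sqrt(x * 0.15 - x * 0.006)) + rnorm(length(x), 0, 1.5)
d <- data.frame(x, y)
m <- gam(y ~ s(x, bs = "tp", k = 7), method = "REML", data = d)

# lpmatrix %*% coefficients = linear predictor = fitted values (identity link),
# i.e. intercept + the summed contributions of all basis functions
Xp <- predict(m, type = "lpmatrix")
all.equal(as.numeric(Xp %*% coef(m)), as.numeric(fitted(m)))
```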

17.4 Final comments

  • Generalized additive models
  • Choosing parametric vs. semi-parametric
  • Choosing types of splines
    • B-splines
    • P-splines
    • Thin-plate splines
  • Model selection
  • G-side vs. R-side discussion

17.5 Resources

  • Wood, S.N. (2017). Generalized Additive Models. Chapman and Hall/CRC. [link]
  • Ruppert, D. (2004). Nonparametric Regression and Splines. In: Statistics and Finance. Springer Texts in Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6876-0_13