Day 6 Linear Mixed Models

September 15th

6.1 Announcements

  • Linear mixed models workshop
  • Assignment 3 is posted. See Google docs here.
  • Assignment 2 grades will be posted this evening.
  • Project proposal due Sept 24 (next week)
    • Schedule an appointment
    • Talk to your classmates to find a partner
  • What’s that wooden cup I drink from?
Lionel Messi drinking mate after winning the World Cup in 2022.

6.2 Mixed Models Review

  • Distributions of the data
  • Linear predictor
  • Models for some of the parameters (e.g., \(u \sim N(0, \sigma^2_u)\)).
  • Mixed models are hierarchical models
  • Sharing information between groups
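To make the pieces above concrete, here is a minimal simulation from a linear mixed model \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon}\), with the random effects drawn from their assumed normal distribution. This is an illustrative sketch in Python (numpy); all dimensions, coefficients, and variances are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(7)

# Tiny simulation from y = X beta + Z u + e,
# with u ~ N(0, sigma2_u I) and e ~ N(0, sigma2_e I); all values illustrative
n_groups, n_per = 4, 5
n = n_groups * n_per

X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed-effects design
beta = np.array([10.0, 2.0])

# Z maps each observation to its group's random intercept
group = np.repeat(np.arange(n_groups), n_per)
Z = np.zeros((n, n_groups))
Z[np.arange(n), group] = 1.0

sigma2_u, sigma2_e = 4.0, 1.0
u = rng.normal(0.0, np.sqrt(sigma2_u), size=n_groups)  # random effects
e = rng.normal(0.0, np.sqrt(sigma2_e), size=n)         # residuals

y = X @ beta + Z @ u + e
print(y.shape)  # (20,)
```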

6.3 Estimation of parameters in Mixed Models

6.3.1 Analysis of variance (ANOVA)

  • ANOVA can be considered a special case of the linear model.
  • Each row of the ANOVA table corresponds to a source of variation and to the variance of a corresponding set of regression coefficients.
ANOVA table for a split-plot design:

| Source | df | Sum of Squares (partial / Type III) | Mean Square | Expected Mean Square |
|---|---|---|---|---|
| Block | \(r-1\) | \(SS_b\) | \(\frac{SS_b}{df_b}\) | \(\sigma^2_{\varepsilon}+c\sigma^2_w+ac\sigma^2_b\) |
| A | \(a-1\) | \(SS_A\) | \(\frac{SS_A}{df_A}\) | \(\sigma^2_{\varepsilon}+c\sigma^2_w+\phi^2(\alpha)\) |
| Error (whole plot) | \((r-1)(a-1)\) | \(SS_w\) | \(\frac{SS_w}{df_w}\) | \(\sigma^2_{\varepsilon}+c\sigma^2_w\) |
| C | \(c-1\) | \(SS_C\) | \(\frac{SS_C}{df_C}\) | \(\sigma^2_{\varepsilon}+\phi^2(\gamma)\) |
| A × C | \((a-1)(c-1)\) | \(SS_{AC}\) | \(\frac{SS_{AC}}{df_{AC}}\) | \(\sigma^2_{\varepsilon}+\phi^2(\alpha\gamma)\) |
| Error (split plot) | \(a(r-1)(c-1)\) | \(SS_e\) | \(\frac{SS_e}{df_e}\) | \(\sigma^2_{\varepsilon}\) |
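The expected-mean-square column is what lets method-of-moments estimation (Section 6.3.3) solve for the variance components. A small illustrative calculation in Python, using hypothetical mean-square values and the error rows of the table above:

```python
# Method-of-moments recovery of variance components from expected mean
# squares; the mean-square values below are hypothetical, for illustration
c = 3        # levels of the split-plot factor C (illustrative)
MS_w = 14.0  # whole-plot error mean square (hypothetical)
MS_e = 5.0   # split-plot error mean square (hypothetical)

# From the EMS column: E[MS_e] = sigma2_eps, E[MS_w] = sigma2_eps + c*sigma2_w
sigma2_eps = MS_e
sigma2_w = max((MS_w - MS_e) / c, 0.0)  # truncate negative estimates at zero

print(sigma2_eps, sigma2_w)  # 5.0 3.0
```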

6.3.2 Estimation of fixed effects

Fixed effects are normally estimated using least squares (\(\hat{\boldsymbol{\beta}}_{LSE}\)) or maximum likelihood estimation (\(\hat{\boldsymbol{\beta}}_{MLE}\)). Under the assumption of a normal distribution, \(\hat{\boldsymbol{\beta}}_{LSE}=\hat{\boldsymbol{\beta}}_{MLE} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\).

Least squares estimation (LSE) makes no distributional assumptions; it simply minimizes the squared differences between the fitted values and the observations:

\[\Vert\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}\Vert^2 = (\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})^T(\mathbf{y}-\mathbf{X}\boldsymbol{\hat{\beta}}) = \mathbf{y}^T\mathbf{y} - 2\mathbf{y}^T\mathbf{X}\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\beta}}^T \mathbf{X}^T \mathbf{X} \hat{\boldsymbol{\beta}}\]

Setting the first derivative with respect to \(\hat{\boldsymbol{\beta}}\) to zero yields the minimizer:

\[\underset{\boldsymbol{\hat{\beta}}}{\mathrm{argmin}}\, \Vert\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}\Vert^2 = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\]

Because LSE makes no distributional assumptions, there is no variance parameter associated with the model, and we do not get uncertainty estimates.
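The closed-form solution \((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\) can be checked numerically. A minimal sketch in Python (numpy), with purely illustrative simulated data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative simulated data: intercept plus one covariate
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([2.0, -1.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Closed-form least squares: solve the normal equations X^T X beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against numpy's built-in least squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```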

Maximum likelihood estimation (MLE) does make distributional assumptions.

\[\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\hat{\beta}}} {\mathrm{argmax}}\, \mathcal{L}_n(\boldsymbol{\hat{\beta}};\mathbf{y})\]

Computationally, we normally minimize the negative log-likelihood:

\[\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\hat{\beta}}} {\mathrm{argmin}}\, - \ln \mathcal{L}_n(\boldsymbol{\hat{\beta}};\mathbf{y})\]
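Under normal errors, the minimizer of the negative log-likelihood coincides with the least squares solution. A sketch in Python (numpy) that checks this numerically, with illustrative simulated data and \(\sigma^2\) fixed at 1 for simplicity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative simulated data
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def neg_log_lik(beta, sigma2=1.0):
    """Negative Gaussian log-likelihood for y ~ N(X beta, sigma2 * I)."""
    resid = y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma2) + 0.5 * resid @ resid / sigma2

beta_mle = np.linalg.solve(X.T @ X, X.T @ y)  # closed form under normality

# Any perturbation away from the closed form increases the NLL
worse = neg_log_lik(beta_mle + np.array([0.1, -0.1]))
print(neg_log_lik(beta_mle) < worse)  # True
```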

Fixed effects are often called BLUEs:

  • Best: minimum variance
  • Linear
  • Unbiased
  • Estimator

6.3.3 Estimation of variance components

Variance components may be estimated with a range of different methods.

  • Maximum likelihood estimation of variance components yields downwardly biased estimates in small samples (which includes most experimental data): \(\ell_{ML}(\boldsymbol{\sigma}; \boldsymbol{\beta}, \mathbf{y}) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log(\vert\mathbf{V}(\boldsymbol{\sigma})\vert) - \frac{1}{2}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^T[\mathbf{V}(\boldsymbol{\sigma})]^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})\)

  • Restricted maximum likelihood (REML) avoids MLE’s downward bias in small samples and is thus the default in most mixed-effects software.

  • In REML, the likelihood is maximized after accounting for the model’s fixed effects.

  • In REML, \(\ell_{REML}(\boldsymbol{\sigma};\mathbf{y}) = -\frac{n-p}{2}\log(2\pi) - \frac{1}{2}\log(\vert\mathbf{V}(\boldsymbol{\sigma})\vert) - \frac{1}{2}\log\left(\vert\mathbf{X}^T[\mathbf{V}(\boldsymbol{\sigma})]^{-1}\mathbf{X}\vert\right) - \frac{1}{2}\mathbf{r}^T[\mathbf{V}(\boldsymbol{\sigma})]^{-1}\mathbf{r}\), where \(p = \mathrm{rank}(\mathbf{X})\) and \(\mathbf{r} = \mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}_{ML}\).

    • Start with initial values for \(\boldsymbol{\sigma}\), \(\tilde{\boldsymbol{\sigma}}\).
    • Compute \(\mathbf{G}(\tilde{\boldsymbol{\sigma}})\) and \(\mathbf{R}(\tilde{\boldsymbol{\sigma}})\).
    • Obtain \(\boldsymbol{\beta}\) and \(\mathbf{b}\).
    • Update \(\tilde{\boldsymbol{\sigma}}\).
    • Repeat until convergence.
  • Variance component estimates may be \(<0\); in that case they are set to zero.

  • Method of moments (MoM) is derived from ANOVA tables that include the expected mean squares.

    • Use the observed mean squares and the expected mean squares (EMS) from the ANOVA table to solve for the different variance components.
    • e.g., \(\tilde{\sigma}^2_{w} = \frac{MS_w - MS_e}{c}\) (see the table above). However, if \(\tilde{\sigma}^2_{w} < 0\), set \(\hat{\sigma}^2_{w} = 0\).
  • Variance estimates equal to zero can cause issues for downstream inference.
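The ML downward bias and the REML correction can be seen in the simplest possible case: estimating a single variance when the mean is the only fixed effect, where ML divides the sum of squares by \(n\) and REML by \(n - p = n - 1\). A simulation sketch in Python (numpy); all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2_true = 4.0
n = 5           # a small sample, where the bias is most visible
n_sims = 20000

# Simulate many small samples and estimate the variance in each
y = rng.normal(0.0, np.sqrt(sigma2_true), size=(n_sims, n))
ss = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

ml_est = ss / n          # ML: ignores that the mean (a fixed effect) was estimated
reml_est = ss / (n - 1)  # REML: adjusts the df for the p = 1 fixed effect

print(ml_est.mean())    # near 4 * (n-1)/n = 3.2, i.e., biased downward
print(reml_est.mean())  # near the true value 4.0
```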

6.3.4 Estimating random effects

Random effects arise from a multivariate normal distribution:

\[\mathbf{u} \sim MVN (\boldsymbol{0}, \mathbf{G}),\] where \(\mathbf{G} = \begin{bmatrix} \sigma^2_b \mathbf{I} & 0 \\ 0& \sigma^2_w \mathbf{I} \end{bmatrix}\).

  • Random effects are shrunk.
  • Shrinkage depends on magnitude of variance estimate.
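In the balanced one-way random-intercept case, the BLUP of a group effect is the group-mean deviation multiplied by the shrinkage factor \(\sigma^2_u / (\sigma^2_u + \sigma^2_e/n_i)\): groups are pulled toward zero more strongly when the between-group variance is small relative to the residual variance. A numeric sketch in Python (numpy); every value is hypothetical:

```python
import numpy as np

# Hypothetical variance components and group size (balanced design)
sigma2_u = 2.0   # between-group (random intercept) variance
sigma2_e = 8.0   # residual variance
n_i = 4          # observations per group

# Hypothetical group means minus the grand mean
group_mean_dev = np.array([3.0, -1.0, 0.5])

# BLUP shrinkage factor: sigma2_u / (sigma2_u + sigma2_e / n_i)
shrink = sigma2_u / (sigma2_u + sigma2_e / n_i)
blup = shrink * group_mean_dev

print(shrink)  # 0.5: each deviation is pulled halfway toward zero
print(blup)
```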

Random effects are often called BLUPs:

  • Best: minimum variance,
  • Linear
  • Unbiased
  • Predictor

6.3.5 Degrees of freedom in mixed models

  • Denominator degrees of freedom may not be exactly derivable from ANOVA tables.
  • As the covariance structure grows more complex, the degrees of freedom diverge further from those in the ANOVA table.
  • Degrees of freedom are therefore approximated in mixed models for bias correction (e.g., the Satterthwaite or Kenward-Roger approximations).

6.4 R demo

Download R code