Day 6 Linear Mixed Models
September 15th
6.1 Announcements
- Linear mixed models workshop
- Assignment 3 is posted. See Google docs here.
- Assignment 2 grades will be posted this evening.
- Project proposal due Sept 24 (next week)
- Schedule an appointment
- Talk to your classmates to find a partner
6.2 Mixed Models Review
- Distributions of the data
- Linear predictor
- Models for some of the parameters (e.g., \(u \sim N(0, \sigma^2_u)\)).
- Mixed models are hierarchical models
- Sharing information between groups
6.3 Estimation of parameters in Mixed Models
6.3.1 Analysis of variance (ANOVA)
- ANOVA can be considered a special case of the linear model.
- Each row of the ANOVA table corresponds to a source of variation and to the variance of a corresponding set of regression coefficients.
| Source | df | Sum of Squares (Partial or type III) | Mean Square | Expected Mean Square |
|---|---|---|---|---|
| Block | \[r-1\] | \[SS_b\] | \[\frac{SS_b}{df_b}\] | \[\sigma^2_{\varepsilon}+c\sigma^2_w+ac\sigma^2_b\] |
| A | \[a-1\] | \[SS_A\] | \[\frac{SS_A}{df_A}\] | \[\sigma^2_{\varepsilon}+c\sigma^2_w+\phi^2(\alpha)\] |
| Error(whole plot) | \[(r-1)(a-1)\] | \[SS_w\] | \[\frac{SS_w}{df_w}\] | \[\sigma^2_{\varepsilon}+c\sigma^2_w\] |
| C | \[c-1\] | \[SS_C\] | \[\frac{SS_C}{df_C}\] | \[\sigma^2_{\varepsilon}+\phi^2(\gamma)\] |
| A x C | \[(a-1)(c-1)\] | \[SS_{AC}\] | \[\frac{SS_{AC}}{df_{AC}}\] | \[\sigma^2_{\varepsilon}+\phi^2(\alpha \gamma)\] |
| Error(split plot) | \[a(r-1)(c-1)\] | \[SS_e\] | \[\frac{SS_e}{df_e}\] | \[\sigma^2_{\varepsilon}\] |
- See Gelman (2005).
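As a quick sanity check on the table above, the degrees of freedom across the rows should partition the total corrected df, \(rac - 1\). A minimal sketch (the values of \(r\), \(a\), \(c\) are arbitrary):

```python
# Degrees of freedom for each row of the split-plot ANOVA table above,
# as functions of r blocks, a whole-plot levels, and c split-plot levels.
def split_plot_df(r, a, c):
    return {
        "Block": r - 1,
        "A": a - 1,
        "Error(whole plot)": (r - 1) * (a - 1),
        "C": c - 1,
        "A x C": (a - 1) * (c - 1),
        "Error(split plot)": a * (r - 1) * (c - 1),
    }

df = split_plot_df(r=4, a=3, c=2)
# The rows partition the total corrected df: the sum equals r*a*c - 1.
assert sum(df.values()) == 4 * 3 * 2 - 1
```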
6.3.2 Estimation of fixed effects
Fixed effects are normally estimated using least squares (\(\hat{\boldsymbol{\beta}}_{LSE}\)) or maximum likelihood estimation (\(\hat{\boldsymbol{\beta}}_{MLE}\)). Under the assumption of a normal distribution, \(\hat{\boldsymbol{\beta}}_{LSE}=\hat{\boldsymbol{\beta}}_{MLE} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\).
Least squares estimation (LSE) makes no distributional assumptions; it simply minimizes the sum of squared differences between the fitted values and the observations.
\[\Vert\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}\Vert^2 = (\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})^T(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{y}^T\mathbf{y} - 2\mathbf{y}^T\mathbf{X}\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\beta}}^T \mathbf{X}^T \mathbf{X} \hat{\boldsymbol{\beta}}\]
Setting the first derivative with respect to \(\hat{\boldsymbol{\beta}}\) to zero gives
\[\underset{\hat{\boldsymbol{\beta}}}{\mathrm{argmin}}\, \Vert\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}\Vert^2 = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\]
Because LSE does not make distributional assumptions, there’s no variance parameter associated to our model, and we don’t get any uncertainty estimates.
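A minimal numerical sketch of the closed-form LSE above, using simulated data (the design matrix and coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept + 2 covariates
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.3, size=n)

# Normal-equations solution: (X^T X)^{-1} X^T y
beta_lse = np.linalg.solve(X.T @ X, X.T @ y)

# Agrees with the numerically stabler least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_lse, beta_lstsq)
```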
Maximum likelihood estimation (MLE) does make distributional assumptions.
\[\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\hat{\beta}}} {\mathrm{argmax}}\, \mathcal{L}_n(\boldsymbol{\hat{\beta}};\mathbf{y})\]
Computationally, we normally minimize the negative log-likelihood:
\[\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\hat{\beta}}} {\mathrm{argmin}}\, - \ln \mathcal{L}_n(\boldsymbol{\hat{\beta}};\mathbf{y})\]
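Under a normal likelihood, minimizing the negative log-likelihood over \(\boldsymbol{\beta}\) reproduces the least-squares solution. A sketch using a general-purpose optimizer (the data are simulated, and the error SD is treated as known for the illustration):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(scale=0.4, size=n)
sigma = 0.4  # assumed known here, purely for illustration

def neg_log_lik(beta):
    # -ln L(beta; y) for y_i ~ N(x_i^T beta, sigma^2), dropping terms constant in beta
    resid = y - X @ beta
    return 0.5 * np.sum(resid**2) / sigma**2

fit = minimize(neg_log_lik, x0=np.zeros(2))

# The MLE coincides with the normal-equations (LSE) solution
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(fit.x, beta_ols, atol=1e-4)
```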
Fixed-effect estimates are often called BLUEs:
- Best: minimum variance
- Linear
- Unbiased
- Estimator
6.3.3 Estimation of variance components
Variance components may be estimated with a range of different methods.
Maximum likelihood estimation of variance components gives downwardly biased estimates in small samples (which includes most experimental data). \(\ell_{ML}(\boldsymbol{\sigma}; \boldsymbol{\beta}, \mathbf{y}) = - (\frac{n}{2}) \log(2\pi)-(\frac{1}{2}) \log ( \vert \mathbf{V}(\boldsymbol\sigma) \vert ) - (\frac{1}{2}) (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^T[\mathbf{V}(\boldsymbol\sigma)]^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})\)
Restricted maximum likelihood (REML) avoids MLE’s downward bias in small samples and is therefore the default in most mixed-model software.
In REML, the likelihood is maximized after accounting for the model’s fixed effects.
In REML, \(\ell_{REML}(\boldsymbol{\sigma};\mathbf{y}) = - (\frac{n-p}{2}) \log (2\pi) - (\frac{1}{2}) \log ( \vert \mathbf{V}(\boldsymbol\sigma) \vert ) - (\frac{1}{2})\log \left( \vert \mathbf{X}^T[\mathbf{V}(\boldsymbol\sigma)]^{-1}\mathbf{X} \vert \right) - (\frac{1}{2})\mathbf{r}^T[\mathbf{V}(\boldsymbol\sigma)]^{-1}\mathbf{r}\), where \(p = \mathrm{rank}(\mathbf{X})\) and \(\mathbf{r} = \mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}_{ML}\).
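The ML bias is easiest to see in the simplest case, an intercept-only model: the ML variance estimate divides by \(n\), while the REML estimate divides by \(n - p = n - 1\) and is unbiased. A minimal numerical sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(loc=5.0, scale=2.0, size=10)  # small sample, true variance 4

n = y.size
resid = y - y.mean()
var_ml = np.sum(resid**2) / n          # ML estimate: biased downward
var_reml = np.sum(resid**2) / (n - 1)  # REML estimate: unbiased (n - p, with p = 1)

# ML is always smaller than REML, by the factor (n - 1)/n
assert var_ml < var_reml
assert np.isclose(var_ml, var_reml * (n - 1) / n)
```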
The variance parameters are estimated iteratively:
- Start with initial values for \(\boldsymbol{\sigma}\), \(\tilde{\boldsymbol{\sigma}}\).
- Compute \(\mathbf{G}(\tilde{\boldsymbol{\sigma}})\) and \(\mathbf{R}(\tilde{\boldsymbol{\sigma}})\).
- Obtain \(\boldsymbol{\beta}\) and \(\mathbf{b}\).
- Update \(\tilde{\boldsymbol{\sigma}}\).
- Repeat until convergence.
Estimated variance components may come out \(<0\); in that case they are set to zero.
Method of moments (MoM) is derived from ANOVA tables that include the expected mean squares.
- Use the observed mean squares and the expected mean squares (EMS) from the ANOVA table to solve for the different variance components.
- e.g., \(\tilde{\sigma}^2_{w} = \frac{MS_w - MS_e}{c}\), since \(E[MS_w] - E[MS_e] = c\sigma^2_w\) (see table above). However, if \(\tilde{\sigma}^2_{w} < 0\), set \(\hat{\sigma}^2_{w} = 0\).
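A sketch of the method of moments in the simplest random-effects layout, a balanced one-way design with \(b\) groups of \(n\) observations, where \(E[MS_{between}] = \sigma^2_\varepsilon + n\sigma^2_b\) and \(E[MS_{within}] = \sigma^2_\varepsilon\) (the design and variance values are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
b, n = 8, 5                      # 8 groups, 5 observations each
sigma2_b, sigma2_e = 2.0, 1.0    # true variance components
u = rng.normal(scale=np.sqrt(sigma2_b), size=b)
y = u[:, None] + rng.normal(scale=np.sqrt(sigma2_e), size=(b, n))

group_means = y.mean(axis=1)
grand_mean = y.mean()
ms_between = n * np.sum((group_means - grand_mean)**2) / (b - 1)
ms_within = np.sum((y - group_means[:, None])**2) / (b * (n - 1))

# Equate observed mean squares to their expectations and solve
sigma2_e_hat = ms_within
sigma2_b_hat = max((ms_between - ms_within) / n, 0.0)  # truncate negative estimates at zero
```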
Issues with variance estimates equal to zero
- Type I error inflation
- Frey et al. (2024)
6.3.4 Estimating random effects
Random effects arise from a multivariate normal distribution
\[\mathbf{u} \sim MVN (\boldsymbol{0}, \mathbf{G}),\] where \(\mathbf{G} = \begin{bmatrix} \sigma^2_b \mathbf{I} & 0 \\ 0& \sigma^2_w \mathbf{I} \end{bmatrix}\).
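A sketch of drawing random effects from this distribution, with the block-diagonal \(\mathbf{G}\) built from the two variance components (the dimensions and variance values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n_b, n_w = 4, 12                 # numbers of block and whole-plot effects
sigma2_b, sigma2_w = 3.0, 1.5    # variance components

# G = blockdiag(sigma2_b * I, sigma2_w * I)
G = np.block([
    [sigma2_b * np.eye(n_b), np.zeros((n_b, n_w))],
    [np.zeros((n_w, n_b)), sigma2_w * np.eye(n_w)],
])

# One draw of u ~ MVN(0, G)
u = rng.multivariate_normal(mean=np.zeros(n_b + n_w), cov=G)
assert u.shape == (n_b + n_w,)
```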
- Random-effect predictions are shrunk toward zero.
- The amount of shrinkage depends on the magnitude of the variance estimate relative to the residual variance.
Random-effect predictions are often called BLUPs:
- Best: minimum variance
- Linear
- Unbiased
- Predictor
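In the balanced one-way case, the BLUP of a group effect shrinks the raw group-mean deviation toward zero by the factor \(\lambda = \sigma^2_b / (\sigma^2_b + \sigma^2_e / n)\). A sketch with made-up values (the function `blup` and all numbers are illustrative):

```python
import numpy as np

# Shrinkage of group-mean deviations in a balanced one-way random-effects model:
# BLUP of u_j is lambda * (ybar_j - mu_hat), with lambda = sigma2_b / (sigma2_b + sigma2_e / n)
def blup(group_means, grand_mean, sigma2_b, sigma2_e, n):
    lam = sigma2_b / (sigma2_b + sigma2_e / n)  # shrinkage factor in [0, 1)
    return lam * (group_means - grand_mean)

group_means = np.array([4.0, 6.0, 5.5])
u_hat = blup(group_means, grand_mean=5.0, sigma2_b=1.0, sigma2_e=2.0, n=4)

# Larger between-group variance -> lambda closer to 1 -> less shrinkage
u_hat_big = blup(group_means, grand_mean=5.0, sigma2_b=100.0, sigma2_e=2.0, n=4)
assert np.all(np.abs(u_hat) <= np.abs(group_means - 5.0))
assert np.all(np.abs(u_hat_big) >= np.abs(u_hat))
```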
6.3.5 Degrees of freedom in mixed models
- Denominator degrees of freedom cannot, in general, be read directly from ANOVA tables.
- The more complex the covariance structure, the more the denominator df diverge from those in the ANOVA table.
- In mixed models, degrees of freedom are therefore approximated for bias correction (e.g., Satterthwaite or Kenward-Roger).