Day 6 Linear Mixed Models
September 15th
6.1 Announcements
- Linear mixed models workshop
- Assignment 3 is posted. See Google docs here.
- Assignment 2 grades will be posted this evening.
- Project proposal due Sept 24 (next week)
- Schedule an appointment
- Talk to your classmates to find a partner
6.2 Mixed Models Review
- Distributions of the data
- Linear predictor
- Models for some of the parameters (e.g., \(u \sim N(0, \sigma^2_u)\)).
- Mixed models are hierarchical models
- Sharing information between groups
6.3 Estimation of parameters in Mixed Models
6.3.1 Analysis of variance (ANOVA)
- ANOVA can be considered a special case of the linear model.
- Each row of the ANOVA table corresponds to a source of variation and to the variance of a corresponding set of regression coefficients.
| Source | df | Sum of Squares (Partial or type III) | Mean Square | Expected Mean Square |
|---|---|---|---|---|
| Block | \[r-1\] | \[SS_b\] | \[\frac{SS_b}{df_b}\] | \[\sigma^2_{\varepsilon}+c\sigma^2_w+ac\sigma^2_b\] |
| A | \[a-1\] | \[SS_A\] | \[\frac{SS_A}{df_A}\] | \[\sigma^2_{\varepsilon}+c\sigma^2_w+\phi^2(\alpha)\] |
| Error(whole plot) | \[(r-1)(a-1)\] | \[SS_w\] | \[\frac{SS_w}{df_w}\] | \[\sigma^2_{\varepsilon}+c\sigma^2_w\] |
| C | \[c-1\] | \[SS_C\] | \[\frac{SS_C}{df_C}\] | \[\sigma^2_{\varepsilon}+\phi^2(\gamma)\] |
| A x C | \[(a-1)(c-1)\] | \[SS_{AC}\] | \[\frac{SS_{AC}}{df_{AC}}\] | \[\sigma^2_{\varepsilon}+\phi^2(\alpha \gamma)\] |
| Error(split plot) | \[a(r-1)(c-1)\] | \[SS_e\] | \[\frac{SS_e}{df_e}\] | \[\sigma^2_{\varepsilon}\] |
- See Gelman (2005).
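As a quick sanity check on the table above, the degrees of freedom across the rows should partition the total corrected df, \(rac - 1\). A minimal sketch (the values of \(r\), \(a\), \(c\) are arbitrary):

```python
# Degrees of freedom for each row of the split-plot ANOVA table above,
# as functions of r blocks, a whole-plot levels, and c split-plot levels.
def split_plot_df(r, a, c):
    return {
        "Block": r - 1,
        "A": a - 1,
        "Error(whole plot)": (r - 1) * (a - 1),
        "C": c - 1,
        "A x C": (a - 1) * (c - 1),
        "Error(split plot)": a * (r - 1) * (c - 1),
    }

df = split_plot_df(r=4, a=3, c=2)
# The rows partition the total corrected df: the sum equals r*a*c - 1.
assert sum(df.values()) == 4 * 3 * 2 - 1
```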
6.3.2 Estimation of fixed effects
Fixed effects are normally estimated using least squares (\(\hat{\boldsymbol{\beta}}_{LSE}\)) or maximum likelihood estimation (\(\hat{\boldsymbol{\beta}}_{MLE}\)). Under the assumption of a normal distribution, \(\hat{\boldsymbol{\beta}}_{LSE}=\hat{\boldsymbol{\beta}}_{MLE} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\).
Least squares estimation (LSE) makes no distributional assumptions; it simply minimizes the sum of squared differences between the fitted values and the observations.
\[\Vert\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}\Vert^2 = (\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})^T(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{y}^T\mathbf{y} - 2\mathbf{y}^T\mathbf{X}\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\beta}}^T \mathbf{X}^T \mathbf{X} \hat{\boldsymbol{\beta}}\]
Setting the first derivative with respect to \(\hat{\boldsymbol{\beta}}\) to zero gives
\[\underset{\hat{\boldsymbol{\beta}}}{\mathrm{argmin}}\, \Vert\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}\Vert^2 = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\]
Because LSE does not make distributional assumptions, there’s no variance parameter associated to our model, and we don’t get any uncertainty estimates.
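A minimal numerical sketch of the closed-form LSE above, using simulated data (the design matrix and coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept + 2 covariates
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.3, size=n)

# Normal-equations solution: (X^T X)^{-1} X^T y
beta_lse = np.linalg.solve(X.T @ X, X.T @ y)

# Agrees with the numerically stabler least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_lse, beta_lstsq)
```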
Maximum likelihood estimation (MLE) does make distributional assumptions.
\[\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\hat{\beta}}} {\mathrm{argmax}}\, \mathcal{L}_n(\boldsymbol{\hat{\beta}};\mathbf{y})\]
Computationally, we normally minimize the negative log-likelihood:
\[\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\hat{\beta}}} {\mathrm{argmin}}\, - \ln \mathcal{L}_n(\boldsymbol{\hat{\beta}};\mathbf{y})\]
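Under a normal likelihood, minimizing the negative log-likelihood over \(\boldsymbol{\beta}\) reproduces the least-squares solution. A sketch using a general-purpose optimizer (the data are simulated, and the error SD is treated as known for the illustration):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(scale=0.4, size=n)
sigma = 0.4  # assumed known here, purely for illustration

def neg_log_lik(beta):
    # -ln L(beta; y) for y_i ~ N(x_i^T beta, sigma^2), dropping terms constant in beta
    resid = y - X @ beta
    return 0.5 * np.sum(resid**2) / sigma**2

fit = minimize(neg_log_lik, x0=np.zeros(2))

# The MLE coincides with the normal-equations (LSE) solution
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(fit.x, beta_ols, atol=1e-4)
```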
Fixed-effect estimates are often called BLUEs:
- Best: minimum variance
- Linear
- Unbiased
- Estimator
6.3.3 Estimation of variance components
Variance components may be estimated with a range of different methods.
Maximum likelihood estimation of variance components gives downwardly biased estimates in small samples (which includes most experimental data). \(\ell_{ML}(\boldsymbol{\sigma}; \boldsymbol{\beta}, \mathbf{y}) = - (\frac{n}{2}) \log(2\pi)-(\frac{1}{2}) \log ( \vert \mathbf{V}(\boldsymbol\sigma) \vert ) - (\frac{1}{2}) (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^T[\mathbf{V}(\boldsymbol\sigma)]^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})\)
Restricted maximum likelihood (REML) avoids MLE’s downward bias in small samples and is therefore the default in most mixed-model software.
In REML, the likelihood is maximized after accounting for the model’s fixed effects.
In REML, \(\ell_{REML}(\boldsymbol{\sigma};\mathbf{y}) = - (\frac{n-p}{2}) \log (2\pi) - (\frac{1}{2}) \log ( \vert \mathbf{V}(\boldsymbol\sigma) \vert ) - (\frac{1}{2})\log \left( \vert \mathbf{X}^T[\mathbf{V}(\boldsymbol\sigma)]^{-1}\mathbf{X} \vert \right) - (\frac{1}{2})\mathbf{r}^T[\mathbf{V}(\boldsymbol\sigma)]^{-1}\mathbf{r}\), where \(p = \mathrm{rank}(\mathbf{X})\) and \(\mathbf{r} = \mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}_{ML}\).
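The ML bias is easiest to see in the simplest case, an intercept-only model: the ML variance estimate divides by \(n\), while the REML estimate divides by \(n - p = n - 1\) and is unbiased. A minimal numerical sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(loc=5.0, scale=2.0, size=10)  # small sample, true variance 4

n = y.size
resid = y - y.mean()
var_ml = np.sum(resid**2) / n          # ML estimate: biased downward
var_reml = np.sum(resid**2) / (n - 1)  # REML estimate: unbiased (n - p, with p = 1)

# ML is always smaller than REML, by the factor (n - 1)/n
assert var_ml < var_reml
assert np.isclose(var_ml, var_reml * (n - 1) / n)
```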
The variance parameters are estimated iteratively:
- Start with initial values for \(\boldsymbol{\sigma}\), \(\tilde{\boldsymbol{\sigma}}\).
- Compute \(\mathbf{G}(\tilde{\boldsymbol{\sigma}})\) and \(\mathbf{R}(\tilde{\boldsymbol{\sigma}})\).
- Obtain \(\boldsymbol{\beta}\) and \(\mathbf{b}\).
- Update \(\tilde{\boldsymbol{\sigma}}\).
- Repeat until convergence.
Estimated variance components may come out \(<0\); in that case they are set to zero.
Method of moments (MoM) is derived from ANOVA tables that include the expected mean squares.
- Use the observed mean squares and the expected mean squares (EMS) from the ANOVA table to solve for the different variance components.
- e.g., \(\tilde{\sigma}^2_{w} = \frac{MS_w - MS_e}{c}\), since \(E[MS_w] - E[MS_e] = c\sigma^2_w\) (see table above). However, if \(\tilde{\sigma}^2_{w} < 0\), set \(\hat{\sigma}^2_{w} = 0\).
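A sketch of the method of moments in the simplest random-effects layout, a balanced one-way design with \(b\) groups of \(n\) observations, where \(E[MS_{between}] = \sigma^2_\varepsilon + n\sigma^2_b\) and \(E[MS_{within}] = \sigma^2_\varepsilon\) (the design and variance values are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
b, n = 8, 5                      # 8 groups, 5 observations each
sigma2_b, sigma2_e = 2.0, 1.0    # true variance components
u = rng.normal(scale=np.sqrt(sigma2_b), size=b)
y = u[:, None] + rng.normal(scale=np.sqrt(sigma2_e), size=(b, n))

group_means = y.mean(axis=1)
grand_mean = y.mean()
ms_between = n * np.sum((group_means - grand_mean)**2) / (b - 1)
ms_within = np.sum((y - group_means[:, None])**2) / (b * (n - 1))

# Equate observed mean squares to their expectations and solve
sigma2_e_hat = ms_within
sigma2_b_hat = max((ms_between - ms_within) / n, 0.0)  # truncate negative estimates at zero
```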
Issues with variance estimates equal to zero
- Type I error inflation
- Frey et al. (2024)
6.3.4 Estimating random effects
Random effects arise from a multivariate normal distribution
\[\mathbf{u} \sim MVN (\boldsymbol{0}, \mathbf{G}),\] where \(\mathbf{G} = \begin{bmatrix} \sigma^2_b \mathbf{I} & 0 \\ 0& \sigma^2_w \mathbf{I} \end{bmatrix}\).
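A sketch of drawing random effects from this distribution, with the block-diagonal \(\mathbf{G}\) built from the two variance components (the dimensions and variance values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n_b, n_w = 4, 12                 # numbers of block and whole-plot effects
sigma2_b, sigma2_w = 3.0, 1.5    # variance components

# G = blockdiag(sigma2_b * I, sigma2_w * I)
G = np.block([
    [sigma2_b * np.eye(n_b), np.zeros((n_b, n_w))],
    [np.zeros((n_w, n_b)), sigma2_w * np.eye(n_w)],
])

# One draw of u ~ MVN(0, G)
u = rng.multivariate_normal(mean=np.zeros(n_b + n_w), cov=G)
assert u.shape == (n_b + n_w,)
```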
- Random-effect predictions are shrunk toward zero.
- The amount of shrinkage depends on the magnitude of the variance estimate relative to the residual variance.
Random-effect predictions are often called BLUPs:
- Best: minimum variance
- Linear
- Unbiased
- Predictor
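In the balanced one-way case, the BLUP of a group effect shrinks the raw group-mean deviation toward zero by the factor \(\lambda = \sigma^2_b / (\sigma^2_b + \sigma^2_e / n)\). A sketch with made-up values (the function `blup` and all numbers are illustrative):

```python
import numpy as np

# Shrinkage of group-mean deviations in a balanced one-way random-effects model:
# BLUP of u_j is lambda * (ybar_j - mu_hat), with lambda = sigma2_b / (sigma2_b + sigma2_e / n)
def blup(group_means, grand_mean, sigma2_b, sigma2_e, n):
    lam = sigma2_b / (sigma2_b + sigma2_e / n)  # shrinkage factor in [0, 1)
    return lam * (group_means - grand_mean)

group_means = np.array([4.0, 6.0, 5.5])
u_hat = blup(group_means, grand_mean=5.0, sigma2_b=1.0, sigma2_e=2.0, n=4)

# Larger between-group variance -> lambda closer to 1 -> less shrinkage
u_hat_big = blup(group_means, grand_mean=5.0, sigma2_b=100.0, sigma2_e=2.0, n=4)
assert np.all(np.abs(u_hat) <= np.abs(group_means - 5.0))
assert np.all(np.abs(u_hat_big) >= np.abs(u_hat))
```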
6.3.5 Degrees of freedom in mixed models
- Denominator degrees of freedom cannot, in general, be read directly from ANOVA tables.
- The more complex the covariance structure, the more the denominator df diverge from those in the ANOVA table.
- In mixed models, degrees of freedom are therefore approximated for bias correction (e.g., Satterthwaite or Kenward-Roger).