Appendix C: Decision Trees for Model Selection
This appendix collects all the decision frameworks introduced throughout the book into a single reference. Each diagram is designed to be consulted when starting a new analysis, before opening R, to help identify the appropriate model, check the relevant assumptions, and choose the right error term or distribution family for the data at hand.
C.1 Choosing the Right ANOVA Design
The first question in any analysis is whether the data structure matches the intended model. This tree guides the choice of design based on the number of factors, the relationship between experimental units, and the nature of the response.
flowchart TD
A[Start:\nHow is your data structured?] --> B{How many\nfactors?}
B -- One --> C{Independent\nobservations?}
B -- Two or more --> D{Balanced\ncells?}
B -- Repeated or\nnested --> E{Same unit\nmeasured twice?}
C -- Yes --> F[One-way ANOVA\nChapter 3]
C -- No --> G{Clustering\nor nesting?}
G -- Nesting --> H[Nested ANOVA\nChapter 7]
G -- Clustering --> I[Linear Mixed\nModel\nChapter 9]
D -- Yes --> J[Two-way\nfactorial ANOVA\nChapter 5]
D -- No --> K[Two-way ANOVA\nType III SS\nChapter 5]
E -- Yes --> L{One or two\nlevel factors?}
E -- No --> M{Hard-to-change\nfactor?}
L -- One --> N[Repeated Measures\nANOVA\nChapter 6]
L -- Two --> O[Mixed Repeated\nMeasures\nChapter 6]
M -- Yes --> P[Split-Plot\nDesign\nChapter 7]
M -- No --> Q[Factorial ANOVA\nwith blocking\nChapter 5]
style A fill:#555555,color:#ffffff
style F fill:#81B29A,color:#ffffff
style H fill:#81B29A,color:#ffffff
style I fill:#81B29A,color:#ffffff
style J fill:#81B29A,color:#ffffff
style K fill:#81B29A,color:#ffffff
style N fill:#81B29A,color:#ffffff
style O fill:#81B29A,color:#ffffff
style P fill:#81B29A,color:#ffffff
style Q fill:#81B29A,color:#ffffff
style B fill:#F6AE2D,color:#ffffff
style C fill:#F6AE2D,color:#ffffff
style D fill:#F6AE2D,color:#ffffff
style E fill:#F6AE2D,color:#ffffff
style G fill:#F6AE2D,color:#ffffff
style L fill:#F6AE2D,color:#ffffff
style M fill:#F6AE2D,color:#ffffff
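The "Balanced cells?" question in the C.1 tree can be answered by cross-tabulating the factors before fitting anything. The following is a minimal base R sketch, not the book's own code: it uses the built-in `ToothGrowth` data purely for illustration, and `is_balanced()` is a hypothetical helper introduced here.

```r
# ToothGrowth ships with R; dose is stored as numeric, so convert it to a factor.
tg <- transform(ToothGrowth, dose = factor(dose))

# Cell counts for the two-factor layout.
cell_counts <- table(tg$supp, tg$dose)
print(cell_counts)

# Hypothetical helper: TRUE when every cell is non-empty and all cells
# contain the same number of observations.
is_balanced <- function(counts) {
  all(counts > 0) && length(unique(as.vector(counts))) == 1
}

balanced <- is_balanced(cell_counts)

if (balanced) {
  # Balanced design: the standard two-way factorial ANOVA applies.
  fit <- aov(len ~ supp * dose, data = tg)
  print(summary(fit))
} else {
  message("Unbalanced cells: consider Type III SS via car::Anova() (Chapter 5)")
}
```

If the check returns `FALSE`, the tree points to the Type III route in Chapter 5 rather than the plain `aov()` summary.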
C.2 Checking ANOVA Assumptions
Once the design is chosen, the assumptions must be checked before interpreting results. This tree follows the hierarchy of assumptions established in Chapter 2: independence first, homoscedasticity second, normality third.
flowchart TD
A[Fit the model\nextract residuals] --> B{Are observations\nindependent?}
B -- No --> C[Identify the\nclustering structure]
B -- Yes --> D{Homoscedasticity:\nLevene test +\nresidual boxplot}
C --> E[Mixed model\nor aggregate to\ntrue replicates\nChapter 9]
D -- Violated --> F{Equal group\nsizes?}
D -- OK --> G{Normality:\nQ-Q plot +\nShapiro-Wilk}
F -- Yes --> H[Mild violation:\nproceed with\ncaution]
F -- No --> I[Welch ANOVA\nor transform\nChapter 10]
G -- Outliers --> J[Investigate\noutliers:\ncorrect or report]
G -- Heavy tails --> K[Kruskal-Wallis\nor robust ANOVA\nChapter 10]
G -- OK --> L[Proceed with\nstandard ANOVA]
style A fill:#555555,color:#ffffff
style E fill:#81B29A,color:#ffffff
style H fill:#F6AE2D,color:#ffffff
style I fill:#81B29A,color:#ffffff
style J fill:#F6AE2D,color:#ffffff
style K fill:#81B29A,color:#ffffff
style L fill:#81B29A,color:#ffffff
style B fill:#2E86AB,color:#ffffff
style D fill:#2E86AB,color:#ffffff
style F fill:#2E86AB,color:#ffffff
style G fill:#2E86AB,color:#ffffff
style C fill:#E84855,color:#ffffff
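The order of checks in the C.2 tree can be sketched in base R as follows. This is an illustration under stated assumptions rather than the book's code: it uses the built-in `PlantGrowth` data, and it replaces `car::leveneTest()` with a hand-written median-centred Levene (Brown-Forsythe) test so that no extra package is needed.

```r
# PlantGrowth ships with R: plant weights under a control and two treatments.
fit <- aov(weight ~ group, data = PlantGrowth)
res <- residuals(fit)

# 1. Independence is a question about the design, not about the residuals:
#    check how units were sampled before running any test.

# 2. Homoscedasticity: Levene's test written as a one-way ANOVA on the
#    absolute deviations from each group median (Brown-Forsythe variant).
abs_dev  <- with(PlantGrowth, abs(weight - ave(weight, group, FUN = median)))
levene   <- anova(lm(abs_dev ~ group, data = PlantGrowth))
levene_p <- levene[["Pr(>F)"]][1]
boxplot(res ~ PlantGrowth$group, xlab = "Group", ylab = "Residual")

# 3. Normality of residuals: Q-Q plot first, Shapiro-Wilk as a supplement.
qqnorm(res)
qqline(res)
shapiro_p <- shapiro.test(res)$p.value

print(c(levene_p = levene_p, shapiro_p = shapiro_p))
```

The plots carry more information than the p-values; the tests are best read as a supplement to the graphical checks.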
C.3 When Assumptions Fail: Choosing a Remedy
When one or more assumptions are violated and cannot be fixed by a simple remedy, this tree guides the choice of alternative analysis. It extends the diagram in Section 10.4 to cover the full range of methods in the book.
flowchart TD
A[Assumption\nviolation detected] --> B{What type\nof response?}
B -- Continuous\npositive --> C{Multiplicative\nvariation?}
B -- Count data --> D{Overdispersed?}
B -- Proportion\nor binary --> E{Overdispersed\nor clustered?}
B -- Continuous\nunbounded --> F{Outliers or\nheavy tails?}
B -- Ordinal --> G[Cumulative link\nmixed model\nChapter 11]
C -- Yes --> H[Log transform\nor Gamma GLM\nChapter 10-11]
C -- No --> I[Square root\nor robust ANOVA\nChapter 10]
D -- Yes --> J[Negative\nBinomial GLMM\nChapter 11]
D -- No --> K[Poisson GLMM\nChapter 11]
E -- Yes --> L[Beta-Binomial\nor Binomial GLMM\nChapter 11]
E -- No --> M[Binomial GLM\nChapter 10]
F -- Outliers --> N[Robust ANOVA\nWRS2\nChapter 10]
F -- Heavy tails --> O[Permutation test\nor Kruskal-Wallis\nChapter 10]
F -- Neither --> P[Proceed with\nANOVA]
style A fill:#555555,color:#ffffff
style G fill:#81B29A,color:#ffffff
style H fill:#81B29A,color:#ffffff
style I fill:#81B29A,color:#ffffff
style J fill:#81B29A,color:#ffffff
style K fill:#81B29A,color:#ffffff
style L fill:#81B29A,color:#ffffff
style M fill:#81B29A,color:#ffffff
style N fill:#81B29A,color:#ffffff
style O fill:#81B29A,color:#ffffff
style P fill:#81B29A,color:#ffffff
style B fill:#F6AE2D,color:#ffffff
style C fill:#2E86AB,color:#ffffff
style D fill:#2E86AB,color:#ffffff
style E fill:#2E86AB,color:#ffffff
style F fill:#2E86AB,color:#ffffff
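Three of the simpler remedies named in the trees above have base R equivalents, sketched below on the built-in `PlantGrowth` data. The data set is chosen only for convenience and does not necessarily violate any assumption; the robust route through `WRS2` and the GLMM routes need the packages listed in Section C.8 and are not shown.

```r
# Welch's one-way ANOVA: drops the equal-variance assumption.
welch <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = FALSE)

# Kruskal-Wallis rank test: an option when the residuals are heavy-tailed.
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

# Log transform: for a strictly positive response with multiplicative variation.
stopifnot(all(PlantGrowth$weight > 0))
log_fit <- aov(log(weight) ~ group, data = PlantGrowth)

print(welch)
print(kw)
print(summary(log_fit))
```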
C.4 Choosing a Multiple Comparison Procedure
After a significant ANOVA F test, the choice of post-hoc procedure depends on whether comparisons were planned in advance and what type of error rate control is appropriate for the scientific context.
flowchart TD
A[Significant\nANOVA] --> B{Comparisons\nplanned in advance?}
B -- Yes --> C{How many\ncontrasts?}
B -- No --> D{Goal of\ncomparisons?}
C -- Few\nk-1 or fewer --> E[Planned contrasts\nno correction\nChapter 4]
C -- Many --> F[Holm or\nBenjamini-Hochberg\nChapter 4]
D -- All pairwise --> G{Equal group\nsizes?}
D -- vs control only --> H[Dunnett\nChapter 4]
D -- Exploratory\nmany groups --> I[Benjamini-Hochberg\nFDR control\nChapter 4]
G -- Yes --> J[Tukey HSD\nChapter 4]
G -- No --> K[Tukey-Kramer\nor Holm\nChapter 4]
style A fill:#555555,color:#ffffff
style E fill:#81B29A,color:#ffffff
style F fill:#81B29A,color:#ffffff
style H fill:#81B29A,color:#ffffff
style I fill:#81B29A,color:#ffffff
style J fill:#81B29A,color:#ffffff
style K fill:#81B29A,color:#ffffff
style B fill:#F6AE2D,color:#ffffff
style C fill:#2E86AB,color:#ffffff
style D fill:#2E86AB,color:#ffffff
style G fill:#2E86AB,color:#ffffff
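The main branches of the C.4 tree map onto base R calls as sketched below. The example again uses the built-in `PlantGrowth` data, and the vector `raw_p` is made up purely to show how `p.adjust()` behaves; Dunnett comparisons need an add-on package and are omitted.

```r
fit <- aov(weight ~ group, data = PlantGrowth)

# All pairwise comparisons with family-wise error control.
tukey <- TukeyHSD(fit, "group")
print(tukey)

# Pairwise t tests with Holm (family-wise) or Benjamini-Hochberg (FDR) adjustment.
holm <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                        p.adjust.method = "holm")
bh   <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                        p.adjust.method = "BH")
print(holm$p.value)
print(bh$p.value)

# p.adjust() works on any vector of raw p-values (illustrative values only).
raw_p  <- c(0.004, 0.020, 0.030, 0.450)
holm_p <- p.adjust(raw_p, method = "holm")
bh_p   <- p.adjust(raw_p, method = "BH")
print(rbind(raw = raw_p, holm = holm_p, BH = bh_p))
```

Holm-adjusted p-values are never smaller than the Benjamini-Hochberg ones for the same input, which reflects the stricter error rate that Holm controls.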
C.5 Choosing Between Fixed and Random Effects
This tree addresses the distinction between fixed and random effects introduced in Chapter 2 and developed in Chapters 7 and 9. The decision has direct consequences for the model fitted, the inference drawn, and the scope of conclusions.
flowchart TD
A[Factor to\nclassify] --> B{Are these levels the\nspecific entities\nof interest?}
B -- Yes --> C{Do conclusions\napply only to\nthese levels?}
B -- No --> D{Are levels a\nrandom sample from\na population?}
C -- Yes --> E[Fixed effect\nEstimate group means\nChapter 2]
C -- No --> F[Reconsider:\nare some levels\nrepresentative?]
D -- Yes --> G[Random effect\nEstimate variance\ncomponent\nChapter 9]
D -- Unclear --> H{Would you want\nconclusions to\ngeneralise?}
H -- Yes --> G
H -- No --> E
F --> I{Mix of fixed\nand random?}
I -- Yes --> J[Mixed model\nwith both\nChapter 9]
I -- No --> E
style A fill:#555555,color:#ffffff
style E fill:#81B29A,color:#ffffff
style G fill:#81B29A,color:#ffffff
style J fill:#81B29A,color:#ffffff
style B fill:#F6AE2D,color:#ffffff
style C fill:#F6AE2D,color:#ffffff
style D fill:#F6AE2D,color:#ffffff
style H fill:#F6AE2D,color:#ffffff
style F fill:#2E86AB,color:#ffffff
style I fill:#2E86AB,color:#ffffff
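To make "estimate group means" versus "estimate a variance component" concrete, the sketch below simulates a balanced one-way layout and computes both summaries in base R. Everything here is an assumption for illustration: the data are simulated, the names are hypothetical, and the variance component is obtained with the classical ANOVA (method of moments) estimator rather than `lmer()`, which is the function used in Chapter 9.

```r
set.seed(1)
n_batches <- 8   # levels of the factor (assumed sampled from a population)
n_per     <- 5   # observations per level (balanced)

batch        <- factor(rep(seq_len(n_batches), each = n_per))
batch_effect <- rnorm(n_batches, sd = 2)
y <- 10 + batch_effect[as.integer(batch)] + rnorm(n_batches * n_per, sd = 1)

# Fixed-effect view: one mean per level, conclusions limited to these levels.
group_means <- tapply(y, batch, mean)

# Random-effect view: a single between-level variance, estimated from the
# ANOVA mean squares as (MS_between - MS_within) / n_per.
tab        <- anova(lm(y ~ batch))
ms_between <- tab[["Mean Sq"]][1]
ms_within  <- tab[["Mean Sq"]][2]

sigma2_within  <- ms_within
sigma2_between <- max(0, (ms_between - ms_within) / n_per)  # truncate at zero
icc <- sigma2_between / (sigma2_between + sigma2_within)

print(round(group_means, 2))
print(c(sigma2_between = sigma2_between, sigma2_within = sigma2_within, icc = icc))
```

The moment estimator is only valid for balanced data; with unequal group sizes or more than one random factor, a mixed model fit is the appropriate tool.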
C.6 GLMM Distribution and Link Function Selection
When the response is non-normal, this tree guides the choice of distribution family and link function for a generalised linear mixed model.
flowchart TD
A[What is the\nnature of your\nresponse?] --> B{Count data\nnon-negative\ninteger?}
A --> C{Proportion or\nbinary?}
A --> D{Positive\ncontinuous?}
A --> E{Ordered\ncategories?}
A --> F{Continuous\nnormal-ish?}
B -- Yes --> G{Overdispersed?\nratio >> 1}
G -- Yes --> H{Excess\nzeros?}
G -- No --> I[Poisson GLMM\nlog link\nChapter 11]
H -- Yes --> J[Zero-inflated\nNegative Binomial\nglmmTMB\nChapter 11]
H -- No --> K[Negative Binomial\nGLMM\nglmmTMB\nChapter 11]
C -- Binary 0/1 --> L[Binomial GLMM\nlogit link\nChapter 11]
C -- Proportion k/n --> M{Overdispersed?}
M -- Yes --> N[Beta-Binomial\nglmmTMB\nChapter 11]
M -- No --> L
D -- Constant CV --> O[Gamma GLMM\nlog link\nChapter 11]
D -- Log-normal --> P[Gaussian GLMM\nlog link\nor log-transform\nChapter 10-11]
E --> Q[Cumulative Link\nMixed Model\nordinal\nChapter 11]
F --> R[Linear Mixed\nModel\nlme4\nChapter 9]
style A fill:#555555,color:#ffffff
style I fill:#81B29A,color:#ffffff
style J fill:#81B29A,color:#ffffff
style K fill:#81B29A,color:#ffffff
style L fill:#81B29A,color:#ffffff
style N fill:#81B29A,color:#ffffff
style O fill:#81B29A,color:#ffffff
style P fill:#81B29A,color:#ffffff
style Q fill:#81B29A,color:#ffffff
style R fill:#81B29A,color:#ffffff
style B fill:#F6AE2D,color:#ffffff
style C fill:#F6AE2D,color:#ffffff
style D fill:#F6AE2D,color:#ffffff
style E fill:#F6AE2D,color:#ffffff
style F fill:#F6AE2D,color:#ffffff
style G fill:#2E86AB,color:#ffffff
style H fill:#2E86AB,color:#ffffff
style M fill:#2E86AB,color:#ffffff
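The "Overdispersed?" and "Excess zeros?" questions in the C.6 tree can be screened with a few lines of base R. The sketch below fits a plain Poisson GLM without random effects to the built-in `InsectSprays` counts, so it is a simplified stand-in for the GLMM diagnostics of Chapter 11, not a replacement for them.

```r
# InsectSprays ships with R: insect counts under six spray treatments.
pois_fit <- glm(count ~ spray, family = poisson(link = "log"),
                data = InsectSprays)

# Pearson dispersion ratio: sum of squared Pearson residuals over the
# residual degrees of freedom. Values well above 1 suggest overdispersion.
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) /
  df.residual(pois_fit)

# Observed share of zeros versus the share expected under the fitted
# Poisson means; a large excess hints at zero inflation.
obs_zero <- mean(InsectSprays$count == 0)
exp_zero <- mean(dpois(0, lambda = fitted(pois_fit)))

print(c(dispersion = dispersion, obs_zero = obs_zero, exp_zero = exp_zero))
```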
C.7 Model Selection Strategy for GLMMs
This tree operationalises the two-stage model selection strategy described in Section 11.6, establishing the random effects structure first, then selecting fixed effects by model comparison.
flowchart TD
A[Start:\nSpecify maximal\nrandom effects\nfrom design] --> B{Are all random\neffects justified\nby design?}
B -- Yes --> C[Fit maximal\nrandom effects\nmodel with REML]
B -- No --> D[Remove unjustified\nrandom effects\nfrom model]
D --> C
C --> E{Check random\neffects variance\ncomponents}
E -- Near zero --> F[Consider\nsimplifying\nrandom structure]
E -- Reasonable --> G[Fix random\neffects structure]
G --> H[Refit with ML\nfor fixed effects\ncomparison]
H --> I{Compare candidate\nfixed effect\nmodels by AIC}
I -- ΔAIC > 2 --> J[Select lower\nAIC model]
I -- ΔAIC ≤ 2 --> K[Models equivalent:\nconsider averaging\nor simpler model]
J --> L[Refit winning\nmodel with REML\nfor final estimates]
K --> L
L --> M{Check DHARMa\ndiagnostics}
M -- Problems --> N[Reconsider\ndistribution\nor structure]
M -- Clean --> O[Report results]
style A fill:#555555,color:#ffffff
style C fill:#2E86AB,color:#ffffff
style G fill:#2E86AB,color:#ffffff
style H fill:#2E86AB,color:#ffffff
style J fill:#81B29A,color:#ffffff
style K fill:#F6AE2D,color:#ffffff
style L fill:#81B29A,color:#ffffff
style O fill:#81B29A,color:#ffffff
style N fill:#E84855,color:#ffffff
style B fill:#F6AE2D,color:#ffffff
style E fill:#F6AE2D,color:#ffffff
style I fill:#F6AE2D,color:#ffffff
style M fill:#F6AE2D,color:#ffffff
style D fill:#E84855,color:#ffffff
style F fill:#E84855,color:#ffffff
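The ΔAIC comparison step of the C.7 tree can be sketched in base R as follows. This is a deliberately reduced illustration: the two candidates are ordinary Poisson GLMs fitted by maximum likelihood to the built-in `warpbreaks` data, so the random effects stage and the REML refits are not represented, and the candidate models are assumptions chosen only to show the mechanics.

```r
# Two candidate fixed-effect structures, both fitted by maximum likelihood.
m_add <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
m_int <- glm(breaks ~ wool * tension, family = poisson, data = warpbreaks)

# AIC table with differences from the best (lowest AIC) model.
aic_tab       <- AIC(m_add, m_int)
aic_tab$delta <- aic_tab$AIC - min(aic_tab$AIC)
aic_tab       <- aic_tab[order(aic_tab$delta), ]
print(aic_tab)

# Rule of thumb from the tree: a gap above 2 favours the lower-AIC model;
# otherwise treat the candidates as equivalent and prefer the simpler one.
clear_winner <- aic_tab$delta[2] > 2
if (clear_winner) {
  message("Select: ", rownames(aic_tab)[1])
} else {
  message("Models roughly equivalent: consider the simpler model or averaging")
}
```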
C.8 Quick Reference: R Functions by Task
The following table maps each analytical task to the primary R function used in this book, the package it comes from, and the chapter where it is introduced.
| Task | Function | Package | Chapter |
|---|---|---|---|
| One-way ANOVA | aov() | base R | 3 |
| Two-way ANOVA (Type III) | Anova() | car | 5 |
| Repeated measures ANOVA | aov() with Error() | base R | 6 |
| Sphericity test | summary(Anova(...), multivariate=FALSE) | car | 6 |
| Linear mixed model | lmer() | lme4 | 9 |
| P-values for LMM | anova(..., ddf="Kenward-Roger") | lmerTest | 9 |
| Poisson GLMM | glmer(..., family=poisson) | lme4 | 11 |
| Negative binomial GLMM | glmmTMB(..., family=nbinom2) | glmmTMB | 11 |
| Zero-inflated GLMM | glmmTMB(..., ziformula=~1) | glmmTMB | 11 |
| Binomial GLMM | glmer(..., family=binomial) | lme4 | 11 |
| Gamma GLMM | glmmTMB(..., family=Gamma) | glmmTMB | 11 |
| Ordinal mixed model | clmm() | ordinal | 11 |
| GLMM diagnostics | simulateResiduals() | DHARMa | 11 |
| Post-hoc comparisons | emmeans() | emmeans | 4 |
| Tukey HSD | TukeyHSD() | base R | 4 |
| Planned contrasts | glht() | multcomp | 4 |
| Effect sizes | omega_squared() | effectsize | 3 |
| Mixed model R² | r.squaredGLMM() | MuMIn | 9 |
| Levene’s test | leveneTest() | car | 2 |
| Kruskal-Wallis | kruskal.test() | base R | 10 |
| Friedman test | friedman.test() | base R | 10 |
| Robust ANOVA | t1way() | WRS2 | 10 |
| Analytical power | pwr.anova.test() | pwr | 3 |
| Simulation power | powerSim() | simr | 12 |
| Power curve | powerCurve() | simr | 12 |
| Model comparison | anova() | base R | 8 |
| AIC comparison | AIC() | base R | 11 |
| Model selection table | model.sel() | MuMIn | 11 |