4  ANOVA-related Tests for \(k\) Continuous Population Means

Learning Objectives

By the end of this chapter, you will be able to:

  • Explain how analysis of variance (ANOVA) partitions total variation into between-group and within-group components.
  • Interpret the meaning of the \(F\)-statistic and its role in testing mean differences across groups.
  • Describe the difference between main effects and interaction effects in a two-way ANOVA.
  • Identify potential interaction patterns via appropriate plotting when conducting a two-way ANOVA.
  • Conduct one-way and two-way ANOVA in a reproducible manner using the hypothesis testing workflow via R and Python.
  • Identify the assumptions underlying one-way and two-way ANOVA, including independence, normality, and homoscedasticity.
  • Evaluate model diagnostics to assess whether ANOVA assumptions are met.

It is time to expand our hypothesis testing framework from comparing two population means to a generalized approach for \(k\) population means by exploring analysis of variance (ANOVA). ANOVA is a foundational test in frequentist statistics designed to compare means across multiple populations under specific distributional assumptions, which we will examine in this chapter. Originally formalized by Ronald A. Fisher (Fisher 1925), ANOVA extends the concept of the two-sample \(t\)-test by partitioning the total variability of our outcome of interest into components attributable to main effects, interactions, and random error. This partitioning enables us to determine whether observed differences in groups are improbable to appear under the null hypothesis, which states that all groups share the same population mean.

mindmap
  root((Frequentist
  Hypothesis 
  Testings
  ))
    Simulation Based<br/>Tests
    Classical<br/>Tests
      (Chapter 1: <br/>Tests for One<br/>Continuous<br/>Population Mean)
      (Chapter 2: <br/>Tests for Two<br/>Continuous<br/>Population Means)
      (Chapter 3: <br/>ANOVA-related <br/>Tests for<br/>k Continuous<br/>Population Means)
        {{Unbounded<br/>Responses}}
          One<br/>Factor type<br/>Feature
            )One way<br/>ANOVA(
          Two<br/>Factor type<br/>Features
            )Two way<br/>ANOVA(

Figure 4.1: A specific hypothesis testing mind map outlining the techniques explored in this chapter, which include ANOVA-related tests for \(k\) population means.

Heads-up on some historical background in statistics!

While Ronald A. Fisher (1890–1962) is rightly recognized as a significant figure in modern statistics for developing key concepts such as ANOVA, it is imperative to recognize the broader context of his legacy. Beyond his pioneering contributions to statistics, Fisher was also a prominent advocate of eugenics, a movement that promoted pseudoscientific ideas about human heredity and social hierarchy. His involvement with the eugenics community is well documented (MacKenzie 1981). These views have been called out by certain members of the scientific community (Tarran 2020). That said, we (the authors of this mini-book) consider that students and scholars need to distinguish Fisher’s statistical insights from the unethical ideologies he supported, while also acknowledging the social responsibility to critique the historical misuse of science.

Fisher’s statistical work remains central to frequentist inference. However, educators and researchers must confront the historical connection between statistical innovation and eugenic thinking in early 20th-century science (Tabery and Sarkar 2015). By explicitly addressing this context, we can acknowledge the rigour of Fisher’s methodological contributions while rejecting the discriminatory worldviews he promoted. This approach aligns with contemporary efforts to ensure that teaching statistics is grounded not only in subject matter accuracy but also in ethical reflection and historical accountability (Kennedy-Shaffer 2024).

This chapter builds on the concepts introduced in the earlier works by Chapter 2 and Chapter 3, which concentrated on hypothesis testing for one and two groups, respectively. While those chapters primarily addressed pairwise comparisons, we will now shift our focus multiple comparisons:

As a side note, the workflow guiding this chapter, as referenced in Figure 1.1, will provide a structured approach for applying both one-way and two-way ANOVA to the same A/B/n dataset (see Section 4.1).

Heads-up on the importance of ANOVA in data science!

In modern data science, ANOVA remains significant, particularly in A/B/n testing (the generalization of A/B testing to multiple treatments). This approach allows for the simultaneous evaluation of various treatment variations or designs. ANOVA is valued for its interpretability and its flexibility in handling both balanced designs (where there is an equal number of replicates for each experimental treatment) and unbalanced designs (where there are unequal numbers of replicates). These strengths have solidified its role in both academic research and practical applications.

Image by Manfred Steger via Pixabay.

4.1 The ANOVA Dataset

The simulated dataset discussed in this chapter relates to an experimental context known as A/B/n testing, which is an expansion of traditional A/B testing as previously discussed. In a typical A/B testing, participants are randomly assigned to one of two strategies, referred to as treatments: treatment \(A\) (the control treatment) and treatment \(B\) (the experimental treatment). This approach enables us to infer causation concerning the following inquiry:

Will changing from treatment \(A\) to treatment \(B\) cause my outcome of interest, \(Y\), to increase (or decrease, if that is the case)?

In an ANOVA setting, the outcome must be continuous. The primary goal of A/B testing is to determine whether there is a statistically significant difference between treatments \(A\) and \(B\) concerning the outcome. Additionally, because we are conducting a proper randomized experiment, we can infer causation (which goes further than mere association), as treatment randomization enables us to get rid of the effect of further confounders.

Heads-up on confounding!

In causal inference, confounding refers to the mixing of effects between the outcome of interest, denoted as \(Y\), the randomized factor \(X\) (which is a two-level factor in A/B testing, corresponding to treatment \(A\) and treatment \(B\)), and a third, uncontrollable factor known as the confounder \(C\). This confounder is associated with the factor \(X\) and independently affects the outcome \(Y\), as shown by Figure 4.2.

Figure 4.2: Diagram depicting confounding (Rodríguez-Arelis et al. 2024).

When conducting experiments with more than two treatments in A/B testing, the experiment is referred to as A/B/n testing. In this case, the “n” does not indicate the sample size; rather, it simply represents any number of additional treatments beyond treatments \(A\) and \(B\). With this clarification, we can now proceed with our simulated dataset.

Image by Pabitra Kaity via Pixabay.
Table 4.1: First 100 rows of our A/B/n simulated data.

4.2 One-way ANOVA

4.2.1 Study Design

4.2.2 Data Collection and Wrangling

4.2.3 Exploratory Data Analysis

4.2.4 Testing Settings

4.2.5 Hypothesis Definitions

4.2.6 Test Flavour and Components

4.2.7 Inferential Conclusions

4.2.8 Storytelling

4.3 Two-way ANOVA

4.3.1 Study Design

4.3.2 Data Collection and Wrangling

4.3.3 Exploratory Data Analysis

4.3.4 Testing Settings

4.3.5 Hypothesis Definitions

4.3.6 Test Flavour and Components

4.3.7 Inferential Conclusions

4.3.8 Storytelling

4.4 Chapter Summary