Chi-square test

The chi-square test is the standard method for analyzing categorical data. It comes in two forms: the test of independence (are two categorical variables associated?) and the goodness-of-fit test (does a categorical variable follow a specific distribution?). Both use the same test statistic, with the chi-square distribution as the reference distribution.

The chi-square statistic

The core idea is to compare the observed frequencies \(O_i\) with the frequencies \(E_i\) expected under \(H_0\), summing over all categories or cells:

\[\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\]

Large values of \(\chi^2\) indicate that the observed data are far from what \(H_0\) predicts. Under \(H_0\), and provided the expected frequencies are not too small, this statistic approximately follows a \(\chi^2\) distribution whose degrees of freedom depend on the type of test.
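As a minimal sketch of the formula, the statistic can be computed in R from any pair of observed and expected count vectors. The helper name chi_sq_stat and the counts below are illustrative assumptions, not part of any package or of the examples in this article.

```r
# Hypothetical helper: chi-square statistic from observed and expected counts
chi_sq_stat <- function(observed, expected) {
  stopifnot(length(observed) == length(expected), all(expected > 0))
  sum((observed - expected)^2 / expected)
}

# Made-up counts for three categories (60 observations in total)
observed <- c(25, 15, 20)
expected <- c(20, 20, 20)

chi_sq_stat(observed, expected)  # 25/20 + 25/20 + 0 = 2.5
```

When the observed counts match the expected counts exactly, the statistic is 0; it grows as the two diverge.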

Test of independence

Used to test whether two categorical variables are associated in a contingency table.

Hypotheses: \(H_0\): the two variables are independent. \(H_1\): there is an association.

Expected frequencies for each cell \((i, j)\):

\[E_{ij} = \frac{\text{row total}_i \times \text{column total}_j}{\text{grand total}}\]

Degrees of freedom: \(df = (r-1)(c-1)\), where \(r\) = number of rows and \(c\) = number of columns.
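The expected-frequency formula is an outer product of the row and column totals divided by the grand total, which makes it short to sketch in R. The helper name expected_counts and the 2 x 3 table are assumptions made up for illustration.

```r
# Hypothetical helper: expected counts under independence for an r x c table
expected_counts <- function(tab) {
  outer(rowSums(tab), colSums(tab)) / sum(tab)
}

# Made-up 2 x 3 table of counts
tab <- matrix(c(10, 20, 30,
                20, 20, 20), nrow = 2, byrow = TRUE)

expected <- expected_counts(tab)
expected                                   # each row: 15, 20, 25

# Degrees of freedom: (r - 1)(c - 1) = (2 - 1) * (3 - 1) = 2
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
df
```

The expected table always has the same row and column totals as the observed one; only the way counts are spread across cells differs.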

Gender and product preference

A survey of 200 people records gender (Male/Female) and product preference (Like/Dislike):

          Like   Dislike   Total
Male        60        40     100
Female      30        70     100
Total       90       110     200

Expected frequencies under independence:

\[E_{11} = \frac{100 \times 90}{200} = 45, \quad E_{12} = \frac{100 \times 110}{200} = 55\]
\[E_{21} = \frac{100 \times 90}{200} = 45, \quad E_{22} = \frac{100 \times 110}{200} = 55\]

Test statistic:

\[\chi^2 = \frac{(60-45)^2}{45} + \frac{(40-55)^2}{55} + \frac{(30-45)^2}{45} + \frac{(70-55)^2}{55}\]
\[= 5.000 + 4.091 + 5.000 + 4.091 = 18.182\]

\(df = (2-1)(2-1) = 1\). Critical value at \(\alpha=0.05\): \(\chi^2_{0.05,1} = 3.841\).

Since \(18.182 > 3.841\), reject \(H_0\). There is a significant association between gender and product preference.
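The same calculation can be reproduced step by step in R, which also gives the critical value and an exact p-value without looking them up in a table. This is a sketch using only base R; the variable names are arbitrary.

```r
# Survey table: rows = gender (Male, Female), columns = preference (Like, Dislike)
observed <- matrix(c(60, 40,
                     30, 70), nrow = 2, byrow = TRUE)

expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
chi_sq   <- sum((observed - expected)^2 / expected)       # about 18.18
df       <- (nrow(observed) - 1) * (ncol(observed) - 1)   # 1

critical <- qchisq(0.95, df)                        # about 3.841 at alpha = 0.05
p_value  <- pchisq(chi_sq, df, lower.tail = FALSE)  # upper-tail probability

chi_sq > critical   # TRUE, so H0 is rejected
p_value
```

Comparing the statistic with the critical value and comparing the p-value with \(\alpha\) always lead to the same decision.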

Heatmap of observed vs expected frequencies for the gender and product preference contingency table

The heatmap shows each cell’s contribution to \(\chi^2\): red cells deviate most from independence.

Effect size: Cramér’s V

The p-value tells you whether the association is significant, not how strong it is. For contingency tables, use Cramér’s V:

\[V = \sqrt{\frac{\chi^2}{n \times \min(r-1, c-1)}}\]

\(V\) ranges from 0 (no association) to 1 (perfect association). For a \(2 \times 2\) table it equals the absolute value of the phi coefficient.

For the example: \(V = \sqrt{18.182 / (200 \times 1)} = \sqrt{0.0909} \approx 0.30\). On the rough scale given below, this sits right at the boundary between a moderate and a strong association.
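A short sketch of this calculation in base R follows. The helper name cramers_v is hypothetical; the second half checks the claim that, for a \(2 \times 2\) table, \(V\) equals the absolute value of the phi coefficient.

```r
# Hypothetical helper: Cramér's V from the chi-square statistic and table size
cramers_v <- function(chi_sq, n, n_rows, n_cols) {
  sqrt(chi_sq / (n * min(n_rows - 1, n_cols - 1)))
}

v <- cramers_v(chi_sq = 18.182, n = 200, n_rows = 2, n_cols = 2)
round(v, 2)   # 0.3

# Phi coefficient computed directly from the 2 x 2 survey table
tab <- matrix(c(60, 40,
                30, 70), nrow = 2, byrow = TRUE)
phi <- (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))
abs(phi)      # matches v up to rounding of the chi-square statistic
```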

⚠️ Statistical significance is not the same as strong association

With large samples, even trivial associations produce significant chi-square statistics. A survey of 10,000 people might find \(\chi^2 = 5.2\) (\(p = 0.023\)) for an association with \(V = 0.02\): statistically significant but practically negligible.

Always report Cramér’s V alongside the p-value. As a rough guide: \(V < 0.1\) is weak, \(0.1 \leq V < 0.3\) is moderate, \(V \geq 0.3\) is strong.
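The effect of sample size can be seen by scaling one table up while keeping its proportions fixed. The sketch below uses a made-up table (not data from this article): the chi-square statistic grows in proportion to \(n\) and the p-value shrinks, while Cramér's V stays the same.

```r
# Made-up 2 x 2 table with a very weak association (52% vs 48%)
base_table <- matrix(c(52, 48,
                       48, 52), nrow = 2, byrow = TRUE)

# Multiply every cell by 1, 10, and 100: same proportions, larger samples
results <- do.call(rbind, lapply(c(1, 10, 100), function(scale) {
  tab  <- base_table * scale
  test <- chisq.test(tab, correct = FALSE)
  chi  <- unname(test$statistic)
  data.frame(
    n         = sum(tab),
    chi_sq    = chi,
    p_value   = test$p.value,
    cramers_v = sqrt(chi / (sum(tab) * min(dim(tab) - 1)))
  )
}))

print(results)   # cramers_v is 0.04 in every row
```

With 200 observations the association is far from significant; with 20,000 it is highly significant, yet its strength has not changed.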

Goodness-of-fit test

Used to test whether a single categorical variable follows a specific distribution.

Hypotheses: \(H_0\): the variable follows the specified distribution. \(H_1\): it does not.

Degrees of freedom: \(df = k - 1\), where \(k\) is the number of categories and the expected probabilities are fully specified in advance rather than estimated from the data.

Is a die fair?

A die is rolled 120 times. If fair, each face should appear 20 times. Observed counts:

Face 1 2 3 4 5 6
Observed 18 22 25 17 21 17
Expected 20 20 20 20 20 20

\[\chi^2 = \frac{(18-20)^2}{20} + \frac{(22-20)^2}{20} + \frac{(25-20)^2}{20} + \frac{(17-20)^2}{20} + \frac{(21-20)^2}{20} + \frac{(17-20)^2}{20}\]
\[= 0.2 + 0.2 + 1.25 + 0.45 + 0.05 + 0.45 = 2.6\]

\(df = 6 - 1 = 5\). Critical value at \(\alpha=0.05\): \(\chi^2_{0.05,5} = 11.07\).

Since \(2.6 < 11.07\), fail to reject \(H_0\). No significant evidence that the die is unfair.
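The die example can be checked by hand in base R. This sketch mirrors the steps above and adds the p-value; the variable names are arbitrary.

```r
observed <- c(18, 22, 25, 17, 21, 17)   # counts for faces 1 to 6
k        <- length(observed)
expected <- rep(sum(observed) / k, k)   # 120 / 6 = 20 per face under H0

chi_sq   <- sum((observed - expected)^2 / expected)  # 2.6
df       <- k - 1                                    # 5
critical <- qchisq(0.95, df)                         # about 11.07
p_value  <- pchisq(chi_sq, df, lower.tail = FALSE)

chi_sq > critical   # FALSE, so H0 is not rejected
p_value
```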

Bar chart comparing observed and expected frequencies for the die fairness goodness-of-fit test

Assumptions

Both chi-square tests require:

  • Independence: observations are not paired or clustered.
  • Expected frequencies \(\geq 5\): each cell in the table should have an expected frequency of at least 5. If this is violated, merge categories or use Fisher’s exact test (for \(2 \times 2\) tables).
  • Counts, not percentages: the test statistic must be computed from raw counts, not proportions.
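The expected-frequency rule is easy to check before trusting a result, since chisq.test() returns the expected counts in its $expected component. The helper name expected_ok and the sparse table are assumptions for illustration only.

```r
# Hypothetical helper: TRUE when every expected count meets the rule of thumb
expected_ok <- function(tab, minimum = 5) {
  # suppressWarnings(): chisq.test() itself warns when expected counts are small
  expected <- suppressWarnings(chisq.test(tab, correct = FALSE))$expected
  all(expected >= minimum)
}

# Survey table from the example: expected counts are 45 and 55
survey <- matrix(c(60, 40,
                   30, 70), nrow = 2, byrow = TRUE)
expected_ok(survey)   # TRUE

# Made-up sparse table: the smallest expected count is 4 * 4 / 9, below 5
sparse <- matrix(c(3, 1,
                   1, 4), nrow = 2, byrow = TRUE)
expected_ok(sparse)   # FALSE
```

Note that the check applies to the expected counts, not the observed ones: a table can contain small observed cells and still satisfy the rule.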

⚠️ Chi-square tests do not identify which cells drive the association

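A significant result in a large contingency table tells you there is an association somewhere, but not where. Use residuals to identify which cells deviate most from independence. The Pearson residual for cell \((i, j)\) is:

\[r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}\]

The squared Pearson residuals sum to \(\chi^2\), so cells with \(|r_{ij}| > 2\) are contributing heavily to the overall statistic. In R, chisq.test(table_data)$residuals returns these Pearson residuals, and chisq.test(table_data)$stdres returns the standardized (adjusted) residuals, which also account for the row and column totals and are approximately standard normal under \(H_0\).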
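The residual formula above can be checked against R's output. The sketch below uses the survey table from the independence example and confirms that the hand-computed values match the $residuals component of chisq.test().

```r
table_data <- matrix(c(60, 40,
                       30, 70), nrow = 2, byrow = TRUE)
test <- chisq.test(table_data, correct = FALSE)

# Residuals from the formula: (O - E) / sqrt(E)
pearson <- (test$observed - test$expected) / sqrt(test$expected)
round(pearson, 2)

# Same values as the component returned by chisq.test()
all.equal(pearson, test$residuals, check.attributes = FALSE)

sum(pearson^2)   # equals the chi-square statistic, about 18.18

# Cells whose residual exceeds 2 in absolute value
which(abs(pearson) > 2, arr.ind = TRUE)

# Residuals additionally adjusted for the row and column totals
test$stdres
```

In a \(2 \times 2\) table every cell deviates by the same absolute amount from its expected count, so residuals are most informative for larger tables.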

Running the tests in R

Both tests are available in base R. For the test of independence, pass the contingency table to chisq.test(). For goodness-of-fit, pass the observed counts and, optionally, the expected probabilities via the p argument (equal probabilities are the default). The correct = FALSE argument disables the Yates continuity correction, which is rarely needed with large samples but is applied by default for \(2 \times 2\) tables.

# Test of independence
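# matrix() fills column by column by default; byrow = TRUE keeps the layout of the table above
table_data <- matrix(c(60, 40, 30, 70), nrow = 2, byrow = TRUE,
                     dimnames = list(Gender = c("Male", "Female"),
                                     Preference = c("Like", "Dislike")))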
chisq.test(table_data, correct = FALSE)

# Cramér's V via vcd::assocstats() (rstatix::cramer_v() is an alternative)
library(vcd)
assocstats(table_data)

# Goodness-of-fit
observed <- c(18, 22, 25, 17, 21, 17)
chisq.test(observed)  # tests against uniform by default

The output includes the test statistic, degrees of freedom, and p-value. Pearson residuals are accessible via chisq.test(table_data)$residuals and standardized residuals via chisq.test(table_data)$stdres; both help identify which cells drive a significant result.
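The printed output is convenient for reading, but the individual values can also be pulled out of the returned object for reporting or further calculation. A brief sketch, using the same data as above:

```r
table_data <- matrix(c(60, 40,
                       30, 70), nrow = 2, byrow = TRUE)
result <- chisq.test(table_data, correct = FALSE)

unname(result$statistic)   # chi-square statistic, about 18.18
unname(result$parameter)   # degrees of freedom: 1
result$p.value             # p-value
result$expected            # expected counts under H0

# Goodness-of-fit with the null probabilities spelled out via the p argument
observed <- c(18, 22, 25, 17, 21, 17)
gof <- chisq.test(observed, p = rep(1 / 6, 6))
unname(gof$statistic)      # 2.6
unname(gof$parameter)      # degrees of freedom: 5
```

Passing p explicitly gives the same result as the default here, but it is the way to test against a non-uniform distribution.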

💡 Choosing between chi-square and Fisher's exact test

For \(2 \times 2\) tables with small samples (any expected frequency \(< 5\)), use Fisher’s exact test (fisher.test() in R). It computes the exact p-value from the hypergeometric distribution and does not require large samples. For larger tables, chi-square is the standard approach; fisher.test() also accepts larger tables, but the exact computation becomes expensive as the counts grow.
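To illustrate where the exact p-value comes from, the sketch below runs fisher.test() on a made-up small table (not data from this article) and reproduces the two-sided p-value from hypergeometric probabilities with dhyper(). The small tolerance factor is there only to guard against floating-point ties.

```r
# Made-up 2 x 2 table with small counts
small_table <- matrix(c(3, 1,
                        1, 4), nrow = 2, byrow = TRUE)

fisher_p <- fisher.test(small_table)$p.value

# With all margins fixed, the top-left count follows a hypergeometric distribution
row1 <- sum(small_table[1, ])    # 4
row2 <- sum(small_table[2, ])    # 5
col1 <- sum(small_table[, 1])    # 4

support <- max(0, col1 - row2):min(row1, col1)           # possible top-left counts
probs   <- dhyper(support, m = row1, n = row2, k = col1)
p_obs   <- dhyper(small_table[1, 1], m = row1, n = row2, k = col1)

# Two-sided p-value: total probability of tables no more likely than the observed one
manual_p <- sum(probs[probs <= p_obs * (1 + 1e-7)])

c(fisher = fisher_p, manual = manual_p)   # both equal 26/126
```

Because the p-value is a sum of exact probabilities, no minimum expected frequency is needed.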