Chi-Square Test

 

What is the Chi-Square Test?

The Chi-Square (χ²) test is a non-parametric statistical test used to determine if there is a significant association between two categorical variables or if a categorical variable follows a specific distribution.


✅ Types of Chi-Square Tests

1. Chi-Square Test for Independence

Used to determine if two categorical variables are independent or related.

Example: Is there a relationship between gender and preference for a product?


2. Chi-Square Goodness-of-Fit Test

Used to determine whether a single categorical variable follows a hypothesized distribution.

Example: Does a die roll show numbers uniformly?


Assumptions of the Chi-Square Test

  • Data are frequencies or counts (not percentages or means).

  • Categories are mutually exclusive.

  • Observations are independent.

  • Expected frequency in each cell should be ≥ 5 (if any expected count is below 5, consider Fisher's exact test instead).


General Formula

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

Where:

  • $O_i$ = Observed frequency

  • $E_i$ = Expected frequency

  • The sum is over all categories or cells
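
The formula translates almost directly into code. The following is a minimal sketch in plain Python; the function name `chi_square_statistic` is only an illustrative choice (it is not taken from any library), and the toy counts are made up for the demonstration.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over all cells of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Toy counts chosen only for illustration (not from the examples in this post).
statistic = chi_square_statistic([18, 22], [20, 20])
print(statistic)
```

Each cell contributes its squared deviation scaled by the expected count, so cells with a small expected count weigh more heavily for the same absolute deviation.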


📊 1. Chi-Square Test for Independence – Example

Question: Is there an association between gender and ice cream preference?

Ice Cream    Male   Female   Total
Vanilla      20     30       50
Chocolate    25     25       50
Total        45     55       100

Step 1: Hypotheses

  • H₀: Gender and ice cream preference are independent.

  • H₁: Gender and ice cream preference are associated.


Step 2: Calculate Expected Frequencies

$$E_{ij} = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}}$$

Ice Cream    Male (Expected)          Female (Expected)
Vanilla      (50 × 45) / 100 = 22.5   (50 × 55) / 100 = 27.5
Chocolate    (50 × 45) / 100 = 22.5   (50 × 55) / 100 = 27.5
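
The expected counts can also be computed from the observed table. This is a small sketch in plain Python; the helper name `expected_frequencies` and the row/column ordering (rows: Vanilla, Chocolate; columns: Male, Female) are assumptions made for this illustration.

```python
def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Vanilla, Chocolate; columns: Male, Female (observed counts from the example).
observed = [[20, 30], [25, 25]]
expected = expected_frequencies(observed)
print(expected)  # [[22.5, 27.5], [22.5, 27.5]]
```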

Step 3: Apply Chi-Square Formula

$$
\begin{aligned}
\chi^2 &= \frac{(20 - 22.5)^2}{22.5} + \frac{(30 - 27.5)^2}{27.5} + \frac{(25 - 22.5)^2}{22.5} + \frac{(25 - 27.5)^2}{27.5} \\
&= \frac{6.25}{22.5} + \frac{6.25}{27.5} + \frac{6.25}{22.5} + \frac{6.25}{27.5} \\
&= 0.278 + 0.227 + 0.278 + 0.227 = 1.01
\end{aligned}
$$

Step 4: Degrees of Freedom

$$df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$$

Critical value at α = 0.05 and df = 1 ≈ 3.841
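
As a cross-check on the statistic, the degrees of freedom, and the comparison with the critical value, here is a sketch using only Python's standard library. It relies on the fact that, for df = 1 only, the upper-tail probability of the chi-square distribution equals erfc(√(x/2)); the variable names are illustrative.

```python
import math

# Observed and expected counts (rows: Vanilla, Chocolate; columns: Male, Female).
observed = [[20, 30], [25, 25]]
expected = [[22.5, 27.5], [22.5, 27.5]]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

rows, cols = len(observed), len(observed[0])
df = (rows - 1) * (cols - 1)

# Valid for df = 1 only: P(X >= x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 2), df)  # 1.01 1
print(chi2 < 3.841)        # True: the statistic is below the critical value
print(p_value > 0.05)      # True: the p-value leads to the same decision
```

Comparing the statistic with the critical value and comparing the p-value with α are two views of the same decision rule.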


Step 5: Conclusion

Since χ² = 1.01 < 3.841 → Fail to reject H₀

Conclusion: At the 5% significance level, the data show no significant association between gender and ice cream preference.


📊 2. Chi-Square Goodness-of-Fit – Example

Question: A die is rolled 60 times. Observed frequencies:
[10, 11, 8, 9, 12, 10]. Is the die fair?

Step 1: Hypotheses

  • H₀: Die is fair → all faces have equal probability.

  • H₁: Die is not fair.


Step 2: Expected Frequency

If fair:

$$E_i = \frac{60}{6} = 10 \text{ for each face}$$

Step 3: Calculate χ²

$$
\begin{aligned}
\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} &= \frac{(10-10)^2}{10} + \frac{(11-10)^2}{10} + \frac{(8-10)^2}{10} + \frac{(9-10)^2}{10} + \frac{(12-10)^2}{10} + \frac{(10-10)^2}{10} \\
&= 0 + 0.1 + 0.4 + 0.1 + 0.4 + 0 = 1.0
\end{aligned}
$$
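
The same arithmetic can be reproduced in a few lines of plain Python. This is only a sketch; the variable names are illustrative.

```python
# Observed counts for faces 1 to 6 over 60 rolls.
observed = [10, 11, 8, 9, 12, 10]

n_rolls = sum(observed)
expected_per_face = n_rolls / len(observed)  # 60 / 6 = 10.0 under a fair die

chi2 = sum((o - expected_per_face) ** 2 / expected_per_face for o in observed)
print(round(chi2, 2))  # 1.0
```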

Step 4: Degrees of Freedom

$$df = k - 1 = 6 - 1 = 5$$

Critical value at α = 0.05, df = 5 ≈ 11.07
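
Instead of looking up a critical value, one can compute the p-value directly. The sketch below uses only Python's `math` module and a standard recurrence for the chi-square upper-tail probability with integer degrees of freedom; the function name `chi2_upper_tail` is a hypothetical helper written for this post, not a library function.

```python
import math


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square variable with positive integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half_x = x / 2
    if df % 2 == 0:
        tail, k = math.exp(-half_x), 2             # exact tail for df = 2
    else:
        tail, k = math.erfc(math.sqrt(half_x)), 1  # exact tail for df = 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        tail += half_x ** (k / 2) * math.exp(-half_x) / math.gamma(k / 2 + 1)
        k += 2
    return tail


# Die example: chi-square statistic 1.0 with df = 5.
p_value = chi2_upper_tail(1.0, 5)
print(p_value > 0.05)  # True, so H0 is not rejected at the 5% level
```

Evaluating the same function at the tabulated critical values (3.841 for df = 1, 11.07 for df = 5) should give a tail probability close to 0.05, which is a handy sanity check.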


Step 5: Conclusion

Since χ² = 1.0 < 11.07 → Fail to reject H₀

Conclusion: At the 5% significance level, there is no evidence that the die is unfair.


📌 Summary Table

Test Type               Use Case                                        df
Goodness-of-Fit         One categorical variable vs. expected values    k − 1
Test for Independence   Relationship between 2 categorical variables    (r − 1)(c − 1)
