Tuesday, March 31, 2015

Make a chi2 test in R

Hello

Today, we will see how to run a chi2 test in R.

Chi2 memo

Pearson's chi-squared test (or chi2 test) helps us to test whether:
- one variable follows a theoretical distribution (example: given the results of 1000 throws of a die, is it a fair die or a loaded one?).
- two random variables are independent (example: someone's eye color and their shoe size).
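
For the first case (goodness of fit), here is a minimal sketch. The counts in throws are made up for the illustration and do not come from a real experiment; the p argument of chisq.test() gives the probabilities expected from a fair die.

```r
# Hypothetical counts of faces 1 to 6 over 1000 throws (invented numbers)
throws <- c(160, 172, 158, 175, 165, 170)

# Null hypothesis: every face has probability 1/6
gof <- chisq.test(throws, p = rep(1/6, 6))

gof$expected   # counts expected under the null hypothesis (1000/6 per face)
gof$parameter  # degrees of freedom: 6 faces - 1 = 5
gof$p.value    # a large p-value means no evidence against a fair die
```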

You can find more information here : Chi2 by Stattrek

How to make a chi2 in R

I will use a data frame for my chi2 test. I want to know whether two variables (marital status and education) are independent or not.

First, I display the contingency table of the two variables:

table(dataset$marital,dataset$educ)

I obtain :


            -9th 9-11th High School -College +College
  Married    270    343         522      703      843
  Widowed    110     84         110      101       61
  Divorced    34     80         144      209      103
  Separated   42     46          43       51       22
  Single      48    138         244      450      308
  Couple      46     90         103      142       59

I can see that 843 people are married and in the "+College" education category. I also see that 110 people are widowed and stopped their education at high school.
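
The data frame itself is not included in this post, so here is a sketch that rebuilds the same table by typing in the counts shown above. The name counts is my own choice for this illustration. addmargins() adds the row and column totals, and prop.table() gives the share of each education level within a marital status.

```r
# Counts copied from the table above (rows: marital status, columns: education)
counts <- as.table(matrix(
  c(270, 343, 522, 703, 843,
    110,  84, 110, 101,  61,
     34,  80, 144, 209, 103,
     42,  46,  43,  51,  22,
     48, 138, 244, 450, 308,
     46,  90, 103, 142,  59),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    marital = c("Married", "Widowed", "Divorced", "Separated", "Single", "Couple"),
    educ    = c("-9th", "9-11th", "High School", "-College", "+College")
  )
))

addmargins(counts)                        # table with row and column totals
round(prop.table(counts, margin = 1), 3)  # row proportions (each row sums to 1)
```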

Now, we will test whether these two variables are independent (this is the null hypothesis) with the chi2 test:

chi2<-chisq.test(dataset$marital,dataset$educ)
chi2

I obtain :

Pearson's Chi-squared test

data:  dataset$marital and dataset$educ
X-squared = 390.2901, df = 20, p-value < 2.2e-16

I can see that my p-value is far below 0.05 (p-value < 2.2e-16), so I can reject the null hypothesis of independence. My two variables are not independent!
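
As a sanity check, the same test can be run directly on a matrix of counts instead of the two columns of the data frame. This is only a sketch: the matrix counts is retyped from the table above and the name res is my own. The degrees of freedom should be (6 - 1) * (5 - 1) = 20, and it is worth looking at the expected counts, since the chi2 approximation is usually considered reliable when they are not too small (a common rule of thumb is at least 5).

```r
# Counts retyped from the contingency table (rows: marital, columns: educ)
counts <- matrix(
  c(270, 343, 522, 703, 843,
    110,  84, 110, 101,  61,
     34,  80, 144, 209, 103,
     42,  46,  43,  51,  22,
     48, 138, 244, 450, 308,
     46,  90, 103, 142,  59),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    marital = c("Married", "Widowed", "Divorced", "Separated", "Single", "Couple"),
    educ    = c("-9th", "9-11th", "High School", "-College", "+College")
  )
)

res <- chisq.test(counts)

res$statistic                       # X-squared, about 390.29 as in the output above
res$parameter                       # degrees of freedom
(nrow(counts) - 1) * (ncol(counts) - 1)  # same value computed by hand: 20
res$p.value                         # exact p-value (printed above as < 2.2e-16)
min(res$expected)                   # smallest expected count, to compare with the rule of thumb
```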

How can I know which categories have an excess or a deficit? With the residuals:

chi2$residuals

I obtain :


               dataset$educ
dataset$marital         -9th       9-11th  High School     -College     +College
  Married          0.2617848  -1.76781487  -1.74227653    -3.432675    6.4889432
  Widowed          9.3892736    2.2735136    1.2208028   -3.2281972  -5.19370145
  Divorced        -2.9930002   -0.0251501    2.2137140    2.9820762  -3.37361722
  Separated        4.8436372    3.2263128    0.0204511   -1.2662691  -4.09296860
  Single          -6.4278820   -2.2586597   -0.3564617    5.0699334   0.52792194
  Couple           0.3616861    3.5671765    1.0965409    0.9328732  -4.91334127


This command shows me that in the category "married and +College education" (residual 6.49) I have more people than I expected under the null hypothesis. A positive residual means an excess, a negative residual means a deficit.


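I can also get the sign of each excess or deficit from the raw differences between observed and expected counts (these are not scaled, so their sizes are not directly comparable between cells):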

chi2$observed-chi2$expected


For information, the residuals returned by chi2$residuals are Pearson residuals:
residuals = (observed - expected)/sqrt(expected)
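
To make this formula concrete, here is a sketch that recomputes the residuals by hand and compares them with the ones returned by chisq.test(). The matrix counts is retyped from the contingency table above, and the names res, expected and pearson are my own choices for the illustration. Under independence, the expected count of a cell is (row total * column total) / grand total.

```r
# Counts retyped from the contingency table (rows: marital, columns: educ)
counts <- matrix(
  c(270, 343, 522, 703, 843,
    110,  84, 110, 101,  61,
     34,  80, 144, 209, 103,
     42,  46,  43,  51,  22,
     48, 138, 244, 450, 308,
     46,  90, 103, 142,  59),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    marital = c("Married", "Widowed", "Divorced", "Separated", "Single", "Couple"),
    educ    = c("-9th", "9-11th", "High School", "-College", "+College")
  )
)

res <- chisq.test(counts)

# Expected counts under independence, computed by hand
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson residuals from the formula above
pearson <- (counts - expected) / sqrt(expected)

# TRUE if the hand-made residuals match the ones from chisq.test()
all.equal(as.numeric(pearson), as.numeric(res$residuals))

# The chi2 statistic is the sum of the squared Pearson residuals
sum(pearson^2)

# Standardized residuals, also returned by chisq.test(), take the
# row and column totals into account
round(res$stdres, 2)
```

Standardized residuals are roughly comparable to a standard normal distribution under the null hypothesis, so values well beyond about 2 in absolute value are often read as the cells that contribute most to the dependence. This is a rule of thumb, not a formal test.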

PS: All statistics were computed on the demographics dataset from NHANES 2011-2012 (only people over 20 years old).

