Tuesday, March 31, 2015

Make a chi2 test in R

Hello

Today, we will see how to make a chi2 test in R.

Chi2 memo

Pearson's chi-squared test (chi2 test) helps us test whether:
- one variable fits a theoretical distribution (example: the results of 1000 throws of a die; is it a fair die or a loaded one?).
- two random variables are independent (example: someone's eye color and their shoe size).

You can find more information here: Chi2 by Stattrek
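As a small illustration of the first case, here is a sketch of a goodness-of-fit test on simulated die throws. The throws are simulated (they are not real data), and the names throws, observed and fit are only chosen for this example.

```r
# Simulate 1000 throws of a fair six-sided die (illustrative data only)
set.seed(1)
throws <- sample(1:6, size = 1000, replace = TRUE)

# Count how many times each face appeared
observed <- table(factor(throws, levels = 1:6))

# Goodness-of-fit test: null hypothesis = every face has probability 1/6
fit <- chisq.test(observed, p = rep(1/6, 6))
fit

# The p-value alone
fit$p.value
```

A small p-value here would suggest that the die does not follow the fair distribution.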

How to make a chi2 in R

I will use a data frame for my chi2 test. I want to know if two variables (marital status and education) are independent or not.

First, I will display the contingency table of the two variables:

table(dataset$marital, dataset$educ)

I get:


            -9th 9-11th High School -College +College
  Married    270    343         522      703      843
  Widowed    110     84         110      101       61
  Divorced    34     80         144      209      103
  Separated   42     46          43       51       22
  Single      48    138         244      450      308
  Couple      46     90         103      142       59

I can see that 843 people are married and fall in the "+College" education category. I also see that 110 people are widowed and stopped their education at the "High School" level.
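If you do not have this dataset at hand, the same kind of table can be tried on a tiny made-up data frame. The data frame toy and its values below are invented for illustration; only table() and addmargins() from base R are used.

```r
# A tiny made-up data frame with two categorical variables (not NHANES data)
toy <- data.frame(
  marital = c("Married", "Single", "Married", "Widowed", "Single", "Married"),
  educ    = c("High School", "+College", "+College", "-9th", "High School", "+College")
)

# Cross-tabulate the two variables: rows = marital status, columns = education
tab <- table(toy$marital, toy$educ)
tab

# Add row and column totals to the table
addmargins(tab)
```

Each cell of tab counts the rows of toy that share the same pair of categories.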

Now, we will test whether these two variables are independent (the null hypothesis) with the chi2 test:

chi2 <- chisq.test(dataset$marital, dataset$educ)
chi2

I get:

Pearson's Chi-squared test

data:  dataset$marital and dataset$educ
X-squared = 390.2901, df = 20, p-value < 2.2e-16

I can see that my p-value is less than 0.05 (p-value < 2.2e-16), so I can reject the null hypothesis at the 5% level. My two variables are not independent!
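To reproduce this test without the raw data, one option is to type the counts from the table above into a matrix and pass that matrix to chisq.test(). The object names counts and res are my own; the counts are copied from the table above.

```r
# Counts copied from the contingency table above (rows = marital status)
counts <- matrix(
  c(270, 343, 522, 703, 843,
    110,  84, 110, 101,  61,
     34,  80, 144, 209, 103,
     42,  46,  43,  51,  22,
     48, 138, 244, 450, 308,
     46,  90, 103, 142,  59),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    marital = c("Married", "Widowed", "Divorced", "Separated", "Single", "Couple"),
    educ    = c("-9th", "9-11th", "High School", "-College", "+College")
  )
)

# chisq.test() also accepts a matrix of counts
res <- chisq.test(counts)

res$statistic        # the X-squared value
res$parameter        # degrees of freedom: (6 - 1) * (5 - 1)
res$p.value          # the p-value as a number
res$p.value < 0.05   # TRUE means we reject independence at the 5% level
```

Storing the result in an object makes it easy to reuse its components later, instead of reading them from the printed output.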

How can I know which categories have an excess or a deficit? With the residuals:

chi2$residuals

I get:


               dataset$educ
dataset$marital        -9th      9-11th High School    -College    +College
  Married         0.2617848 -1.76781487 -1.74227653   -3.432675   6.4889432
  Widowed         9.3892736   2.2735136   1.2208028  -3.2281972 -5.19370145
  Divorced       -2.9930002  -0.0251501   2.2137140   2.9820762 -3.37361722
  Separated       4.8436372   3.2263128   0.0204511  -1.2662691 -4.09296860
  Single         -6.4278820  -2.2586597  -0.3564617   5.0699334  0.52792194
  Couple          0.3616861   3.5671765   1.0965409   0.9328732 -4.91334127


These residuals show that the "Married and +College" cell (residual 6.49) contains more people than expected under the null hypothesis. A positive residual means an excess, a negative one a deficit.

I can also get this information from the raw differences between observed and expected counts:

chi2$observed - chi2$expected

For information:
residuals = (observed - expected) / sqrt(expected)
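The formula can be checked by hand. Below is a minimal sketch on a small made-up table (the matrix m and its values are invented for this example) that compares the hand-computed residuals with the ones stored by chisq.test().

```r
# A small made-up 2 x 3 table of counts (illustrative values only)
m <- matrix(c(30, 20, 10,
              15, 25, 40), nrow = 2, byrow = TRUE)

test <- chisq.test(m)

# Pearson residuals computed by hand from the formula
by_hand <- (test$observed - test$expected) / sqrt(test$expected)

# Compare with the residuals stored in the test result
all.equal(by_hand, test$residuals)

# The chi2 statistic is the sum of the squared residuals
sum(by_hand^2)
test$statistic
```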

PS: All statistics were computed on the demographics dataset from NHANES 2011-2012 (only people over 20 years old).


Tuesday, March 24, 2015

Start a new project with RStudio

Hello,


Today, we will see how to start a new project in RStudio.

First, what is a project?

A project in RStudio is a work environment. When you open a project:

  • A new R session is started.
  • The project's .Rprofile, .RData and .Rhistory files are loaded if they exist. The last two help you find your objects and your command history when you re-open the project later.
  • The working directory is set to the project directory (what you would do with setwd() and check with getwd() in R without RStudio).
  • All your RStudio settings for this project are restored.
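Once a project is open, a quick way to check the working directory from the console could look like the sketch below. The file name data/my_data.csv is a made-up example, and the names project_dir and data_path are only chosen for this illustration.

```r
# Show the current working directory (the project folder when a project is open)
project_dir <- getwd()
project_dir

# List any RStudio project files (*.Rproj) found in that directory
list.files(project_dir, pattern = "\\.Rproj$")

# Build a path inside the working directory ("data/my_data.csv" is hypothetical)
data_path <- file.path(project_dir, "data", "my_data.csv")
data_path
```

Because the working directory follows the project, relative paths keep working when you switch between projects.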

These projects are really useful if you work on several datasets or studies at the same time!


Create a new project in RStudio

  • Open RStudio.
  • Go to "File" (top left), then "New Project".
  • Choose "Empty project".
  • Choose whether you want to use an existing directory or create a new one.
  • Select the existing directory, or enter the name of the new one and where it will be created. Then click on "Create project".
  • Done! RStudio will open your new work environment for this project.
  • Now you can create a new R script to start programming in R. To do that, use "File, New File, R Script" or the icon on the top left.

Finally, this is your new environment:


See you soon for a new article about RStudio! In the next one, we will see how to customize RStudio.

Karine

Download R and RStudio

Hello everyone,


This is my first article about learning R for statistics! This blog is for beginners in R, but also for people who want to learn more advanced skills with this software!

OK, let's go: we will see where to download R and then RStudio.

For R:
- Official website: R project
- Download R: choose one of these links. You will then get download links for Windows, Linux and Mac.

For RStudio:
Then we will use RStudio. It is an integrated development environment (IDE) for R, and it is more user-friendly than the original R environment. Plus, very cool debugging tools were recently introduced (since version 0.98 from April 2014).
- Download RStudio (open source solution).

See you soon for the next article!

Karine