Select Page

Chi-Squared testing

Null hypothesis:  The proportion of people who choose the same hotel again will be the same for beach combers and windsurfers.

To start off, I constructed my data.frame in a very roundabout manner:

>Beachcomber <- c(163,64,227)
>Windsurfer <- c(154,108,262)
>Choose_again <- c("Yes", "No", "Total")
>dat <- data.frame(Choose_again, Beachcomber, Windsurfer)
>dat$Total <- dat$Beachcomber + dat$Windsurfer
>dat <- dat[,-1]
>rownames(dat) <- c("Yes", "no","total")


      Beachcomber Windsurfer Total
Yes           163        154   317
no             64        108   172
total         227        262   489

Next, I run the chi-test and save it in an object:

> res

Pearson's Chi-squared test with Yates' continuity correction

data: dat[1:2, 1:2]
X-squared = 8.4903, df = 1, p-value = 0.0035

I was stuck on that for a while, trying to figure out exactly which parts of the data.frame needed to be included/excluded.  I did notice that this way of doing it didn’t give me the error that other combinations did though.  I assume it’s because the function implemented Yates’ continuity correction, which is automatically implemented for 2×2 tables.  The p-value is quite small, so at a threshold of p=.05, the null hypothesis would be rejected and we’d say that it appears that the proportion of people who choose to return to their hotel is different for each group.


gg <- data.frame(x = seq(0,20,.1))
gg$y <- dchisq(gg$x, 1)
ggplot(gg) + 
 geom_path(aes(x,y)) + 
 geom_ribbon(data=gg[gg$x>qchisq(.05,1,lower.tail=FALSE),], aes(x,ymin=0, ymax=y), fill="red")+
 geom_vline(xintercept = res$statistic, color = "blue")+
 labs( x = "x", y = "dchisq(gg$x, 1)")+
 geom_text(aes(x=8, label="x^2", y=0.25), colour="blue", angle=90)