Null hypothesis: The proportion of people who choose the same hotel again will be the same for beach combers and windsurfers.
To start off, I constructed my data.frame in a very roundabout manner:
>Beachcomber <- c(163,64,227) >Windsurfer <- c(154,108,262) >Choose_again <- c("Yes", "No", "Total") >dat <- data.frame(Choose_again, Beachcomber, Windsurfer) >dat$Total <- dat$Beachcomber + dat$Windsurfer >dat <- dat[,-1] >rownames(dat) <- c("Yes", "no","total") >dat Beachcomber Windsurfer Total Yes 163 154 317 no 64 108 172 total 227 262 489
Next, I run the chi-test and save it in an object:
>res<-chisq.test(dat[1:2,1:2]) > res Pearson's Chi-squared test with Yates' continuity correction data: dat[1:2, 1:2] X-squared = 8.4903, df = 1, p-value = 0.0035
I was stuck on that for a while, trying to figure out exactly which parts of the data.frame needed to be included/excluded. I did notice that this way of doing it didn’t give me the error that other combinations did though. I assume it’s because the function implemented Yates’ continuity correction, which is automatically implemented for 2×2 tables. The p-value is quite small, so at a threshold of p=.05, the null hypothesis would be rejected and we’d say that it appears that the proportion of people who choose to return to their hotel is different for each group.
gg <- data.frame(x = seq(0,20,.1)) gg$y <- dchisq(gg$x, 1) ggplot(gg) + geom_path(aes(x,y)) + geom_ribbon(data=gg[gg$x>qchisq(.05,1,lower.tail=FALSE),], aes(x,ymin=0, ymax=y), fill="red")+ geom_vline(xintercept = res$statistic, color = "blue")+ labs( x = "x", y = "dchisq(gg$x, 1)")+ geom_text(aes(x=8, label="x^2", y=0.25), colour="blue", angle=90)