Homework 3 is assigned today, and due Friday, July 16 at 11:59pm.
Project proposals returned with feedback. Final write-ups due next Tuesday, July 20.
Project presentation order will be assigned randomly.
No lab this Friday- use this time to work on Homework 3 and/or project proposal.
Let’s look at data relating student test scores in various subjects to factors such as parent background and test preparation. The data are from Kaggle. We’ll use the following variables in the analysis:
math
- score on math test (between 0 and 100)parent_educ
- highest level of parental education: associate’s degree, bachelor’s degree, high school, master’s degree, some college, some high schoolscores <- read.csv("data/student_performance.csv")
Visualize the relationship between math score and parent education. Describe any patterns or relationships you see.
Write the ANOVA hypotheses to test the relationship between parent education and math test score.
Check the assumptions of ANOVA by (1) normality and (2) homoscedastic variance. Assuming the observations are independent, should we move forward with the ANOVA testing?
Regardless of what you concluded in Exercise 3, perform the hypothesis test using the ANOVA model and a significance level of \(\alpha = 0.05\). State the value of the test statistic, as well the distribution it follows, assuming \(H_{0}\) true.
Should you proceed with any more testing? Explain your decision.
If we were to proceed with testing and conducted all pairwise tests, we would conduct 15 t-tests. What would be the Bonferroni-correct significance level we would use for each test?
Pick one pairwise hypothesis and perform a t-test using the significance level from Exercise 6. Hint: use the t_test()
function, filtering for the two levels of parent education that you are testing between.