https://infer.netlify.app/articles/observed_stat_examples.html
library(tidyverse)
library(infer)
asheville <- read_csv("data/asheville.csv")
Suppose you are interested in whether the mean price per guest per night is actually less than $80. Conduct a hypothesis test to assess this claim.
Hypotheses
\(H_0\): The mean price per guest per night is $80
\(H_a\): The mean price per guest per night is less than $80
\(H_0: \mu = 80\)
\(H_a: \mu < 80\)
Simulate null distribution
set.seed(1234)
null_dist <- asheville %>%
  specify(response = ppg) %>%
  hypothesize(null = "point", mu = 80) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")
# observed sample mean, used as the observed statistic below
mean_ppg <- asheville %>%
  summarise(mean_ppg = mean(ppg)) %>%
  pull()
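As a rough base R sketch of what the bootstrap step does under a point null for a mean: the sample is shifted so that its mean equals the null value, then resampled with replacement, and the mean of each resample is recorded. The `ppg` vector below is made-up data standing in for the real column, and `ppg_shifted` and `null_means` are names chosen here for illustration only.

```r
# Hypothetical prices per guest per night (NOT the asheville data)
set.seed(1234)
ppg <- c(42, 55, 61, 70, 75, 83, 90, 96, 110, 128)

null_mu <- 80

# Shift the sample so its mean equals the null value (80)
ppg_shifted <- ppg - mean(ppg) + null_mu

# Resample with replacement 1000 times, recording the mean of each resample
null_means <- replicate(1000, mean(sample(ppg_shifted, replace = TRUE)))

mean(ppg_shifted)  # equals 80 by construction
mean(null_means)   # centered near 80
```

Because the resamples come from the shifted data, the simulated means are centered at the null value rather than at the observed sample mean.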
Visualize null distribution using ggplot
ggplot(data = null_dist, aes(x = stat)) +
  geom_histogram(alpha = 0.8, bins = 15) +
  geom_vline(xintercept = mean_ppg, color = "red")
Visualize null distribution using infer
visualize(null_dist) +
  shade_p_value(obs_stat = mean_ppg, direction = "less")
Calculate p-value
p_val <- null_dist %>%
  filter(stat <= mean_ppg) %>%
  summarise(p_value = n() / nrow(null_dist)) %>%
  pull(p_value)
p_val
## [1] 0.336
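The p-value computed above is the share of simulated statistics that are at or below the observed statistic. A minimal sketch of the same arithmetic on made-up numbers (the `null_stats` and `obs_stat` values are invented for illustration and are not from the Asheville data):

```r
# Made-up simulated statistics and observed statistic, for illustration only
null_stats <- c(76, 78, 79, 80, 80, 81, 82, 83, 84, 85)
obs_stat <- 79

# Left-tailed p-value: proportion of simulated statistics <= observed statistic
p_val_toy <- mean(null_stats <= obs_stat)
p_val_toy
```

Three of the ten simulated values (76, 78, 79) are at or below 79, so the result is 0.3. The infer package also provides `get_p_value()`, which takes `obs_stat` and `direction` arguments in the same way as `shade_p_value()`.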
Conclusion
The p-value of 0.336 is greater than \(\alpha = 0.05\), so we fail to reject the null hypothesis. The data do not provide sufficient evidence that the mean price per guest per night is less than $80.
Clone the ae-12 repo on GitHub and start a new project in RStudio. Be sure to configure git in the RStudio console so you can push your results back up to GitHub.
library(usethis)
use_git_config(user.name = "github username", user.email = "your email")
Suppose you are interested in whether at least half of the Airbnb listings in Asheville are more than $50 per guest per night. What would be your null and alternative hypotheses?
Simulate the null distribution to test your hypotheses. You can use 1000 reps for the in-class exercise.
set.seed(1234)
null_dist_opt1 <- asheville #%>%
# specify(response = _____) %>%
# hypothesize(null = _____, ____ = _____) %>%
# generate(reps = _____, type = _____) %>%
# calculate(stat = ______ )
# create an indicator for listings priced under $50 per guest per night
asheville <- asheville %>%
  mutate(less_50_ppg = if_else(ppg < 50, "Yes", "No"))
set.seed(1234)
null_dist_opt2 <- asheville # %>%
# specify(response = ______, ____ = ____) %>%
# hypothesize(null = _____, p = _____) %>%
# generate(reps = ______, type = _______) %>%
# calculate(stat = ______)
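The two templates above approach the same question in two ways: through the median price, or through the proportion of listings on one side of $50. A small sketch on invented prices shows how the two summaries relate. The `prices` vector and the `over_50` flag are hypothetical names, and the flag here is coded in the "more than $50" direction purely for illustration.

```r
# Invented prices per guest per night, for illustration only
prices <- c(30, 45, 55, 60, 90)

# Option 1 view: the median price
median(prices)

# Option 2 view: recode each price as a Yes/No flag, then take the proportion of "Yes"
over_50 <- ifelse(prices > 50, "Yes", "No")
mean(over_50 == "Yes")
```

Here the median is 55 and three of the five prices exceed $50, a proportion of 0.6. When more than half of the values lie above a cutoff, the median must lie above that cutoff as well, which is why either statistic can be used to frame the test.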
What was your p-value? What decision do you make with respect to your hypotheses, and what conclusion do you make in the context of the research problem?
#calculate observed statistic
obs_med <- asheville # %>%
# finish the code
# null_dist_opt1 %>%
# filter( ______ ) %>%
# summarise(p_val = _______)
#calculate observed statistic
obs_prop <- asheville # %>%
# finish the code
# null_dist_opt2 %>%
# filter( ______ ) %>%
# summarise(p_val = _______)
Suppose you are interested in whether the proportion of listings with a price per guest per night greater than $50 is 0.5. How would your null and alternative hypotheses change in this case? Carry out the appropriate hypothesis test, and report your p-value, decision, and conclusion in context of the research problem.
\(H_0\):
\(H_a\):
set.seed(1234)