Announcements

Resource for full examples in infer

https://infer.netlify.app/articles/observed_stat_examples.html

From last class

library(tidyverse)
library(infer)
asheville <- read_csv("data/asheville.csv")

Suppose you are interested in whether the mean price per guest per night is actually less than $80. Conduct a hypothesis test to assess this claim.

Hypotheses

\(H_0\): The mean price per guest per night is $80

\(H_a\): The mean price per guest per night is less than $80

\(H_0: \mu = 80\)

\(H_a: \mu < 80\)

Simulate null distribution

set.seed(1234)
null_dist <- asheville %>%
  specify(response = ppg) %>%
  hypothesize(null = "point", mu = 80) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")
mean_ppg <- asheville %>%
  summarise(mean_ppg = mean(ppg)) %>%
  pull()

Visualize Null distribution using ggplot

ggplot(data = null_dist, aes(x = stat)) +
  geom_histogram(alpha = 0.8, bins = 15) + 
  geom_vline(xintercept = mean_ppg, color = "red")

Visualize null distribution using infer

visualize(null_dist) +
  shade_p_value(obs_stat = mean_ppg, direction = "less")

Calculate p-value

p_val <- null_dist %>%
  filter(stat <= mean_ppg) %>%
  summarise(p_value = n() / nrow(null_dist)) %>%
  pull(p_value)
p_val
## [1] 0.336

Conclusion

The p-value of 0.336 is greater than \(\alpha = 0.05\), so we fail to reject the null hypothesis. The data do not provide sufficient evidence that the mean price per night is less than $80.

Clone a repo + start a new project

Clone the ae-12 repo on GitHub and start a new project in RStudio. Be sure to configure git in the RStudio console, so you can so you can push your results back up to GitHub.

library(usethis)
use_git_config(user.name= "github username", user.email="your email")

Exercise 1

Suppose you are interested in whether at least half of the Airbnb listings in Asheville are more than $50 per guest per night. What would be your null and alternative hypotheses?

Option 1

  • \(H_0\): The median price per guest per night is $50
  • \(H_a\): The median price per guest per night is greater than $50

Option 2

  • \(H_0\): The proportion of Airbnbs less than $50 ppg is 0.5
  • \(H_a\): The proportion of Airbnbs less than $50 ppg is less than 0.5

Exercise 2

Simulate the null distribution to test your hypotheses. You can use 1000 reps for the in-class exercise.

Option 1

set.seed(1234)
null_dist_opt1 <- asheville #%>%
  # specify(response = _____) %>%
  # hypothesize(null = _____, ____ = _____) %>%
  # generate(reps = _____, type = _____) %>%
  # calculate(stat = ______ )

Option 2

#create variable to track price
asheville <- asheville %>%
  mutate(less_50_ppg = if_else(ppg < 50, "Yes", "No"))
set.seed(1234)
null_dist_opt2 <- asheville # %>%
#   specify(response = ______, ____ = ____) %>%
#   hypothesize(null = _____, p = _____) %>%
#   generate(reps = ______, type = _______) %>%
#   calculate(stat = ______)

Exercise 3

What was your p-value? What decision do you make with respect to your hypotheses, and what conclusion do you make in the context of the research problem?

Option 1

#calculate observed statistic
obs_med <- asheville # %>%
 # finish the code
# null_dist_opt1 %>%
#   filter( ______ ) %>%
#   summarise(p_val = _______)

Option 2

#calculate observed statistic
obs_prop <- asheville # %>%
 # finish the code
# null_dist_opt2 %>%
#   filter( ______ ) %>%
#   summarise(p_val = _______)

Exercise 4

Suppose you are interested in whether the proportion of listings with a price per guest per night greater than $50 is 0.5. How would your null and alternative hypotheses change in this case? Carry out the appropriate hypothesis test, and report your p-value, decision, and conclusion in context of the research problem.

\(H_0\):

\(H_a\):

set.seed(1234)