The goal of today’s lab is to practice creating bootstrap confidence intervals and visualizing bootstrap distributions.
When dealing with randomness (as is often the case in statistical simulation), it is important to specify which pseudo-random draw you used in your analysis, so that you or someone else can reproduce the exact numbers you initially report. The set.seed() function in R allows you to ensure that all of your analysis relies on a specific pseudo-random draw.
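As a minimal sketch of what a seed does, the snippet below draws the same random sample twice. The seed value 20 and the names first_draw and second_draw are arbitrary choices made for illustration, not values prescribed by the lab.

```r
# Fix the pseudo-random draw; 20 is an arbitrary, hypothetical seed value.
set.seed(20)
first_draw <- sample(1:5, size = 10, replace = TRUE)

# Resetting the same seed reproduces exactly the same draw.
set.seed(20)
second_draw <- sample(1:5, size = 10, replace = TRUE)

identical(first_draw, second_draw)  # TRUE
```

Without the second call to set.seed(), the two draws would generally differ, which is why the seed should be set before any code that relies on randomness.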
Often, we rely on specific parameter values throughout our analysis, and at a later point, we may want to replace them. To minimize the need to change your code later, you can assign each parameter value to a name and use the name (rather than the hard-coded value) downstream. Then, to update your code at a later point, you only need to change the value in one place. Here, we are assigning the number of reps to a variable called num_reps.
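The idea can be sketched as follows. The value 1000 and the toy_scores vector are assumptions made for illustration; only the name num_reps comes from the lab.

```r
# Number of bootstrap replicates; 1000 is an assumed value, not one fixed by the lab.
num_reps <- 1000

set.seed(20)  # hypothetical seed
toy_scores <- c(1, 2, 3, 3, 4, 4, 5, 5, 5, 4)  # made-up data

# num_reps is referenced here instead of a hard-coded 1000.
boot_means <- replicate(num_reps, mean(sample(toy_scores, replace = TRUE)))

length(boot_means)
```

Changing the first assignment is then enough to rerun everything downstream with a different number of replicates.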
The data for today’s lab may be found by cloning your lab-05-durham- repository available in the GitHub course organization.
Today’s data comes from the City of Durham’s annual Resident Satisfaction Survey for 2018 (more information accessible here). In particular, the durham_survey dataset contains data from 608 Durham residents on the survey questions below. Assume that the data are representative of Durham residents and may be generalized to the wider population of all city residents.
Any variable starting with quality refers to the perceived quality of the listed variable, with 1 being “highly dissatisfied,” 3 being “neutral,” and 5 being “highly satisfied.” A value of 9 indicates that the subject responded with N/A.
You may load in the data with the following code, where ____ should be replaced by a meaningful name of your choosing:
Write all R code according to the style guidelines discussed in class.
Hint: be careful with how missing values are coded in this survey. As well, don’t forget to set a seed in order to ensure reproducibility!
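To see why the coding of missing values matters, here is a small sketch on made-up responses. The vector toy_quality is hypothetical and is not part of durham_survey. It uses base R only, so the approach may differ from the functions used in class.

```r
# Made-up responses on the 1-5 scale, with 9 standing in for "N/A".
toy_quality <- c(4, 5, 9, 3, 9, 2, 5)

# Treating 9 as a real score inflates the mean above the top of the scale.
mean(toy_quality)

# Recode 9 as missing, then drop missing values when summarizing.
toy_clean <- replace(toy_quality, toy_quality == 9, NA)
mean(toy_clean, na.rm = TRUE)  # 3.8
```

The same recoding has to happen before resampling as well, otherwise every bootstrap statistic inherits the distortion.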
Provide a point estimate of the mean satisfaction with the fire department (quality_fire) among Durham residents in 2018.
Construct a 95% bootstrap interval for the mean satisfaction with the fire department among Durham residents in 2018. Use at least 1,000 bootstrap samples. Make sure your interval is reproducible.
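For intuition, the percentile bootstrap can be sketched in base R on made-up data. This is not necessarily the workflow used in class, and the seed, num_reps value, and toy_scores vector are all assumptions made for illustration.

```r
set.seed(20)      # hypothetical seed, so the interval is reproducible
num_reps <- 1000  # assumed number of bootstrap samples

toy_scores <- c(5, 4, 4, 3, 5, 2, 4, 5, 3, 4, 5, 4)  # made-up scores

# Resample with replacement (same size as the original sample)
# and record the mean of each resample.
boot_means <- replicate(num_reps, mean(sample(toy_scores, replace = TRUE)))

# Percentile method: the middle 95% of the bootstrap distribution.
ci_95 <- quantile(boot_means, probs = c(0.025, 0.975))
ci_95
```

The two quantiles cut off 2.5% of the bootstrap means in each tail; a different confidence level only changes the probs argument.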
Visualize the bootstrap distribution and your confidence interval from Exercise 2. Interpret the confidence interval you constructed.
Provide a point estimate of the proportion of the respondents in the survey who were satisfied (score of 4 or 5) with the quality of parks and recreation (quality_parks_rec) in Durham.
Hint: see if you can reuse parts of code used in previous exercises.
Construct a 99% bootstrap interval for the proportion of respondents in the survey who were satisfied with the quality of parks and recreation in Durham. Make sure your interval is reproducible.
Visualize the bootstrap distribution and your confidence interval from Exercise 5. Interpret the confidence interval you constructed.
Provide a point estimate of the correlation between the perceived quality of bike paths (quality_bike_path) and the perceived quality of pedestrian paths (quality_ped_path).
Hint: If either of the two scores is missing, then that observation cannot be used to calculate the correlation.
Hint: To simulate the correlation between two variables, use specify(var1 ~ var2). Remember that correlation is still a numerical quantity, so that should help you choose the type of simulation you want to perform.
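A sketch of the specify(var1 ~ var2) pattern, assuming the infer package is the one intended by the hint. The data frame toy_paths and its columns bike and ped are made up for illustration, as are the seed and the num_reps value.

```r
library(infer)

set.seed(20)      # hypothetical seed
num_reps <- 1000  # assumed number of bootstrap samples

# Made-up paired scores; no missing values in this toy example.
toy_paths <- data.frame(
  bike = c(1, 2, 2, 3, 3, 4, 4, 5, 5, 1, 2, 4),
  ped  = c(2, 2, 3, 3, 4, 4, 5, 5, 4, 1, 3, 5)
)

# Resample whole rows so each bike score stays paired with its ped score,
# then compute the correlation within each resample.
boot_cor <- toy_paths %>%
  specify(ped ~ bike) %>%
  generate(reps = num_reps, type = "bootstrap") %>%
  calculate(stat = "correlation")

# Percentile interval from the bootstrap distribution.
get_ci(boot_cor, level = 0.95, type = "percentile")
```

Because correlation is symmetric, swapping the two variables in the formula should not change the statistic.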
Construct a 95% bootstrap interval for the correlation between survey responses for perceived quality of bike paths and pedestrian paths. Make sure your interval is reproducible.
Construct a 99% bootstrap interval using the bootstrap distribution from Exercise 8.
How does the 99% bootstrap interval compare to the 95% bootstrap interval calculated in the previous exercise?
In general, how does the bootstrap interval change when the confidence level increases?
Knit to PDF to create a PDF document. Knit and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.
Please upload only your PDF document to Gradescope. Associate the “Overall” graded section with the first page of your PDF, and mark the page(s) on which the answer to each exercise appears. If any answer spans multiple pages, then mark all of those pages.