Announcements

No class next Monday, July 5. Enjoy your long weekend!

Monday office hours are cancelled, and rescheduled to Tuesday, July 6 from 9:00-10:30am.

Project proposal due in a week from tomorrow (July 9)

  • Write the proposal in the proposal.Rmd file in the project repo, and upload your data into the data folder. See the project description for more information.

  • Proposal should include:
    • a description of the data
    • defintion of the variables you intend to explore
    • EDA
    • at least two questions that you are interested in answering using the data

Exercises

The Pioneer Valley Planning Commission collected data in Florence, MA for 90 days from April 5 to November 15, 2005 using a laser sensor, with breaks in the laser beam recording when a rail-trail user passed the data collection station. The dataset is called RailTrail, and we will be examining the following variables:

library(mosaicData)
data(RailTrail)

Exercise 1

Fit a linear model using the daily high temperature to predict the estimated number of trail users. Calculate the \(R^2\) and \(Adj. R^2\) for the model.

Exercise 2

Fit a linear model using the daily high temperature and whether it is the fall season to predict the estimated number of trail users. Calculate the \(R^2\) and \(Adj. R^2\) for the model.

Exercise 3

Which model has the higher \(R^2\)? Which model has the higher \(Adj. R^2\)? Which model is a better fit for the data?

Exercise 4

Let’s use the model from Exercise 1. Display the output for the model from Exercise 1 with the 90% confidence interval for the model coefficients.

Interpret the coefficient of hightemp in the context of the data.

Exercise 5

Interpret the 90% confidence interval for hightemp in the context of the data.

Exercise 6

Does the intercept have a reasonable interpretation? If so, interpret the intercept in the context of the data. Otherwise, explain why not.

Exercise 7

Conduct a hypothesis test to assess if there is a linear relationship the high temperature and number of trail users. State the null and alternative hypotheses, the test statistic, p-value, and conclusion in the context of the data.

Exercise 8

Is the confidence interval consistent with the conclusion from the hypothesis test in the previous exercise? Briefly explain why or why not.

Exercise 9

Let’s check the model conditions. Create the diagnostic plots use them to determine if the four model conditions are met.