Go to the course organization on GitHub.
In addition to your private individual repositories, you should now see a repo with the prefix hw-03-life-expectancy-. Go to that repository.
Clone the repo and start a new project in RStudio.
Don’t forget to configure git:
In Lab 08, we considered simple linear regression models where we estimated the average `life_expectancy` of each country using each one of the following variables: the number of years of `schooling`, `adult_mortality`, and average BMI category (`BMI_cat`). In this homework, we will now fit multiple linear regression models for `life_expectancy`. The data were modified from this Kaggle dataset, and are available in Sakai. Upload the data just like you did in Lab 08. Once you have done so, you can run the following code to load the packages and data to get started.
Variable name | Description |
---|---|
country | Country |
year | Year (2000-2015) |
status | Developed or Developing country status |
life_expectancy | Life expectancy in years |
adult_mortality | Adult mortality rates of both sexes (probability of dying between 15 and 60 years per 1000 population) |
infant_deaths | Number of infant deaths per 1000 population |
alcohol | Alcohol, recorded per capita (15+) consumption (in litres of pure alcohol) |
percentage_expenditure | Expenditure on health as a percentage of Gross Domestic Product per capita (%) |
hepB | Hepatitis B immunization coverage among 1-year-olds (%) |
measles | Number of reported cases of measles per 1000 population |
BMI | Average Body Mass Index (BMI) of entire population |
under_five_deaths | Number of under-five deaths per 1000 population |
polio | Polio immunization coverage among 1-year-olds (%) |
total expenditure | General government expenditure on health as a percentage of total government expenditure (%) |
diphtheria | Diphtheria tetanus toxoid and pertussis immunization coverage among 1-year-olds (%) |
HIV_AIDS | Deaths per 1000 live births due to HIV/AIDS (0-4 years) |
GDP | Gross Domestic Product per capita (in USD) |
population | Population of the country |
thinness_10_19 | Prevalence of thinness among children and adolescents aged 10 to 19 (%) |
thinness_5_9 | Prevalence of thinness among children aged 5 to 9 (%) |
income_composition | Human Development Index in terms of income composition of resources (index ranging from 0 to 1) |
schooling | Number of years of schooling |
`BMI_cat`: the category that each BMI falls into, where BMI \(<\) 18.5 is “underweight”, 18.5 \(\leq\) BMI \(<\) 25 is “normal”, 25 \(\leq\) BMI \(<\) 30 is “overweight”, and BMI \(\geq 30\) is “obese”.

Once you have created this variable, create a new dataset that retains only the following variables: `country`, `life_expectancy`, `schooling`, `adult_mortality`, `BMI_cat`, and `status`. Then omit all NAs. This is the dataset you will use for the remainder of the exercises.
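
As a rough illustration of this data-preparation step, the sketch below uses base R on a small made-up data frame (`toy`, which is not the course data). It assumes `cut()` with `right = FALSE` is an acceptable way to build the categories; the course may intend a different tool, such as `dplyr::case_when()`.

```r
# Made-up values for illustration only; not the course data.
toy <- data.frame(
  country         = c("A", "B", "C", "D", "E"),
  life_expectancy = c(60.1, 72.4, 79.8, 75.0, NA),
  schooling       = c(8.2, 12.5, 16.1, 13.0, 10.4),
  adult_mortality = c(310, 150, 70, 120, 200),
  BMI             = c(17.9, 18.5, 25.0, 30.0, 22.3),
  status          = c("Developing", "Developing", "Developed",
                      "Developed", "Developing")
)

# right = FALSE makes each interval closed on the left:
# (-Inf, 18.5), [18.5, 25), [25, 30), [30, Inf)
toy$BMI_cat <- cut(
  toy$BMI,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("underweight", "normal", "overweight", "obese"),
  right  = FALSE
)

# Keep only the variables needed later, then drop rows containing any NA.
keep <- c("country", "life_expectancy", "schooling",
          "adult_mortality", "BMI_cat", "status")
toy_small <- na.omit(toy[, keep])
toy_small
```

The boundary values 18.5, 25, and 30 are included in the toy data on purpose, so it is easy to check that each one lands in the upper category, as the definition requires.
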
Visualize the relationship between `schooling`, `life_expectancy`, and the `BMI_cat`
of each country. Describe what you see.

Fit a linear main effects model to predict average life expectancy using the following variables: `schooling`, `status`, `BMI_cat`, `adult_mortality`. Write out the equation of the fitted model, and interpret all the coefficients.
Obtain and interpret the \(R^2\) for the main effects model.
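
The fitting step and the \(R^2\) lookup can be sketched in base R. The data frame `toy` below is simulated purely for illustration (the seed, sample size, and generating coefficients are arbitrary assumptions), so its output says nothing about the real data.

```r
# Simulated stand-in for the cleaned dataset; all values are assumptions.
set.seed(1)
n    <- 200
cats <- c("underweight", "normal", "overweight", "obese")
toy  <- data.frame(
  schooling       = runif(n, 5, 18),
  adult_mortality = runif(n, 50, 400),
  status          = factor(rep(c("Developed", "Developing"), each = n / 2)),
  BMI_cat         = factor(rep(cats, length.out = n), levels = cats)
)
toy$life_expectancy <- 55 + 1.2 * toy$schooling -
  0.03 * toy$adult_mortality + rnorm(n, sd = 3)

# Main effects model: one slope per numeric predictor,
# one indicator per non-baseline level of each factor.
m_main <- lm(life_expectancy ~ schooling + status + BMI_cat + adult_mortality,
             data = toy)
coef(m_main)

# Proportion of variability in life_expectancy explained by the model.
r2 <- summary(m_main)$r.squared
r2
```

Each `BMI_cat...` coefficient is a difference from the baseline level, which is `underweight` here only because it is the first factor level in `toy`. The same holds for `statusDeveloping` relative to `Developed`.
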
Fit a linear model to predict average life expectancy using the same main effects as above, but now with the addition of an interaction between `BMI_cat` and `schooling`. Write out the equation of the fitted model.

Write the regression equations for each level of `BMI_cat`.
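
One way to see how the per-level equations arise is to add each level's offsets to the baseline intercept and slope. The sketch below does this in base R on simulated data (`toy` and everything in it are assumptions for illustration). The intercepts shown apply to the baseline `status` level, and the `adult_mortality` term is the same in every equation.

```r
# Simulated stand-in for the cleaned dataset; all values are assumptions.
set.seed(1)
n    <- 200
cats <- c("underweight", "normal", "overweight", "obese")
toy  <- data.frame(
  schooling       = runif(n, 5, 18),
  adult_mortality = runif(n, 50, 400),
  status          = factor(rep(c("Developed", "Developing"), each = n / 2)),
  BMI_cat         = factor(rep(cats, length.out = n), levels = cats)
)
toy$life_expectancy <- 55 + 1.2 * toy$schooling -
  0.03 * toy$adult_mortality + rnorm(n, sd = 3)

# schooling * BMI_cat expands to schooling + BMI_cat + schooling:BMI_cat.
m_int <- lm(life_expectancy ~ schooling * BMI_cat + status + adult_mortality,
            data = toy)
b <- coef(m_int)

# The baseline level gets no offset; every other level adds its own
# intercept shift and its own change in the schooling slope.
int_shift   <- c(0, unname(b[paste0("BMI_cat", cats[-1])]))
slope_shift <- c(0, unname(b[paste0("schooling:BMI_cat", cats[-1])]))

eqs <- data.frame(
  BMI_cat         = cats,
  intercept       = b[["(Intercept)"]] + int_shift,
  schooling_slope = b[["schooling"]] + slope_shift
)
eqs
```
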
Compare the adjusted \(R^2\) for both models. Which model do you prefer and why?
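
A minimal base R sketch of this comparison follows, again on simulated data (`toy` is an assumption, not the course data). Because the simulated response has no true interaction, the result here is not a guide to what the real data will show.

```r
# Simulated stand-in for the cleaned dataset; all values are assumptions.
set.seed(1)
n    <- 200
cats <- c("underweight", "normal", "overweight", "obese")
toy  <- data.frame(
  schooling       = runif(n, 5, 18),
  adult_mortality = runif(n, 50, 400),
  status          = factor(rep(c("Developed", "Developing"), each = n / 2)),
  BMI_cat         = factor(rep(cats, length.out = n), levels = cats)
)
toy$life_expectancy <- 55 + 1.2 * toy$schooling -
  0.03 * toy$adult_mortality + rnorm(n, sd = 3)

m_main <- lm(life_expectancy ~ schooling + status + BMI_cat + adult_mortality,
             data = toy)
m_int  <- lm(life_expectancy ~ schooling * BMI_cat + status + adult_mortality,
             data = toy)

# Adjusted R^2 penalizes extra terms, so it can fall when a term adds little.
adj <- c(main        = summary(m_main)$adj.r.squared,
         interaction = summary(m_int)$adj.r.squared)
adj
```

Unadjusted \(R^2\) can never decrease when terms are added to a nested model, which is why the adjusted version is the one to compare here.
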
Use the model you ultimately selected in Exercise 7 for the remainder of this section. Examine if the conditions to perform inference on the regression coefficients are satisfied.
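
Residual plots are a common way to check these conditions. The sketch below uses base R graphics on simulated data (`toy` is an assumption) and takes the main effects model as a stand-in for whichever model you selected.

```r
# Simulated stand-in for the cleaned dataset; all values are assumptions.
set.seed(1)
n    <- 200
cats <- c("underweight", "normal", "overweight", "obese")
toy  <- data.frame(
  schooling       = runif(n, 5, 18),
  adult_mortality = runif(n, 50, 400),
  status          = factor(rep(c("Developed", "Developing"), each = n / 2)),
  BMI_cat         = factor(rep(cats, length.out = n), levels = cats)
)
toy$life_expectancy <- 55 + 1.2 * toy$schooling -
  0.03 * toy$adult_mortality + rnorm(n, sd = 3)

m   <- lm(life_expectancy ~ schooling + status + BMI_cat + adult_mortality,
          data = toy)
res <- resid(m)
fit <- fitted(m)

# Linearity and constant variance: look for no pattern and even spread.
plot(fit, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: points should follow the reference line reasonably closely.
qqnorm(res)
qqline(res)
```

Independence cannot be read off these plots. It has to be argued from how the data were collected, for example whether the same country appears in several years.
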
Conduct a hypothesis test evaluating whether or not `adult_mortality` is associated with `life_expectancy`.
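
For a single coefficient, the test of \(H_0: \beta_{\text{adult\_mortality}} = 0\) against \(H_a: \beta_{\text{adult\_mortality}} \neq 0\) is the t-test reported in the model summary. The sketch below pulls it out in base R, using simulated data (`toy` is an assumption) and the main effects model as a stand-in for the selected model.

```r
# Simulated stand-in for the cleaned dataset; all values are assumptions.
set.seed(1)
n    <- 200
cats <- c("underweight", "normal", "overweight", "obese")
toy  <- data.frame(
  schooling       = runif(n, 5, 18),
  adult_mortality = runif(n, 50, 400),
  status          = factor(rep(c("Developed", "Developing"), each = n / 2)),
  BMI_cat         = factor(rep(cats, length.out = n), levels = cats)
)
toy$life_expectancy <- 55 + 1.2 * toy$schooling -
  0.03 * toy$adult_mortality + rnorm(n, sd = 3)

m <- lm(life_expectancy ~ schooling + status + BMI_cat + adult_mortality,
        data = toy)

# Row of the coefficient table: estimate, standard error, t statistic, p-value.
tab <- coef(summary(m))
tab["adult_mortality", ]

# A 95% confidence interval for the same coefficient.
confint(m, "adult_mortality", level = 0.95)
```

The t statistic is the estimate divided by its standard error, and the p-value is two-sided with the model's residual degrees of freedom.
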
Compute the mean adult mortality for each BMI category. Replace the dashes in the following code with the mean adult mortality for each respective `BMI_cat`. This code creates a new data frame to predict average life expectancy for hypothetical countries with the corresponding values for the explanatory variables.

Hint: Remember, we use the `augment()` function to predict from a fitted model. You can use `augment(<model>, newdata = <data_frame_for_prediction>)` to obtain predictions.
What are the predicted life expectancies for these hypothetical countries?
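
The prediction workflow can be sketched as follows. This version swaps in base R's `predict()` for `augment()` so that it runs without extra packages. The data are simulated, and the `schooling` and `status` values given to the hypothetical countries are placeholders chosen for illustration, not the values from the assignment's code.

```r
# Simulated stand-in for the cleaned dataset; all values are assumptions.
set.seed(1)
n    <- 200
cats <- c("underweight", "normal", "overweight", "obese")
toy  <- data.frame(
  schooling       = runif(n, 5, 18),
  adult_mortality = runif(n, 50, 400),
  status          = factor(rep(c("Developed", "Developing"), each = n / 2)),
  BMI_cat         = factor(rep(cats, length.out = n), levels = cats)
)
toy$life_expectancy <- 55 + 1.2 * toy$schooling -
  0.03 * toy$adult_mortality + rnorm(n, sd = 3)

m <- lm(life_expectancy ~ schooling + status + BMI_cat + adult_mortality,
        data = toy)

# Mean adult mortality within each BMI category, in factor-level order.
mort_means <- tapply(toy$adult_mortality, toy$BMI_cat, mean)

# One hypothetical country per BMI category (placeholder schooling/status).
new_countries <- data.frame(
  schooling       = 12,
  status          = factor("Developing", levels = levels(toy$status)),
  BMI_cat         = factor(cats, levels = cats),
  adult_mortality = as.numeric(mort_means)
)

pred <- predict(m, newdata = new_countries)
cbind(new_countries, predicted_life_expectancy = pred)
```

With the broom package loaded, `augment(m, newdata = new_countries)` returns the same predictions in a `.fitted` column.
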