The goal of today’s lab is to practice statistical inference using both simulation procedures and the Central Limit Theorem. The data for today’s lab are in your repository, which you can clone from the class GitHub repository. Use the lecture notes, readings, and application exercises to help you complete the lab. You can also use this chart on simulation-based inference to help you determine the appropriate sampling scheme.
The dataset is adapted from Little et al. (2007), and contains voice measurements from individuals both with and without Parkinson’s Disease (PD), a progressive neurological disorder that affects the motor system. The aim of Little et al.’s study was to examine whether they could diagnose PD by examining the spectral (sound-wave) properties of patients’ voices.
147 measurements were taken from patients with PD, and 48 measurements were taken from healthy controls. For the purposes of this lab, you may assume that measurements are representative of the underlying populations (PD vs. healthy).
The variables in the dataset are as follows:
clip: ID of the recording number
jitter: a measure of variation in fundamental frequency
shimmer: a measure of variation in amplitude
hnr: a ratio of total components vs. noise in the voice recording
status: PD vs. Healthy
avg.f.q: 1, 2, or 3, corresponding to average vocal fundamental frequency (1 = low, 2 = mid, 3 = high)
You may load in the data with the following code, where ____ should be replaced by a meaningful name of your choosing:
To write \(\mu\), type in $\mu$ in the narrative. To write \(\alpha\), type in $\alpha$ in your narrative. To write \(\neq\), type in $\neq$ in your narrative.
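These symbols combine into full hypothesis statements. As an illustration only, a two-sided pair of hypotheses about a mean could be typed as follows, where \(\mu_0\) is a generic placeholder for a null value and not a number taken from the lab:

```latex
$H_0: \mu = \mu_0$

$H_a: \mu \neq \mu_0$
```

Subscripts are written with an underscore, so H_0 renders as \(H_0\).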
Is there enough evidence to suggest that the mean HNR in the voice recordings of the healthy patients is significantly different from 25 at the \(\alpha\) = 0.05 significance level? Conduct this hypothesis test using a simulation method.
Write out the null and alternative hypotheses for this question in both words and symbols.
Display a visualization of your simulated null distribution, and describe the values that would cause you to reject your null hypothesis (called the rejection region). Does our observed sample mean lie in this rejection region?
What is your p-value, decision, and conclusion in context of the research question?
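The workflow behind the last three exercises can be sketched as below. This is a minimal sketch rather than a solution: it assumes the infer package is installed and R 4.1 or later (for the native pipe), and it runs on simulated toy data, not the lab's measurements. The names toy, mu0, cutoffs, null_plot, and p_val are chosen here for illustration.

```r
library(infer)

set.seed(1)

# Toy data, simulated for illustration only (NOT the lab's measurements).
toy <- data.frame(hnr = rnorm(48, mean = 24, sd = 3))
mu0 <- 25      # null value stated in the research question
alpha <- 0.05  # significance level

# Observed sample mean.
obs_mean <- toy |>
  specify(response = hnr) |>
  calculate(stat = "mean")

# Bootstrap null distribution of the sample mean, centered at mu0.
null_dist <- toy |>
  specify(response = hnr) |>
  hypothesize(null = "point", mu = mu0) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "mean")

# Two-sided rejection region: simulated means below the lower cutoff
# or above the upper cutoff.
cutoffs <- quantile(null_dist$stat, probs = c(alpha / 2, 1 - alpha / 2))
cutoffs

# Null distribution, shading the values at least as extreme as the observed mean.
null_plot <- visualize(null_dist) +
  shade_p_value(obs_stat = obs_mean, direction = "two-sided")
null_plot

# Two-sided p-value.
p_val <- get_p_value(null_dist, obs_stat = obs_mean, direction = "two-sided")
p_val
```

Comparing the observed mean to cutoffs and comparing p_val to alpha are two routes to the same decision; with real data, replace toy with the data frame you loaded and filter to the group of interest first.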
Given your conclusion in Exercise 3, which type of error could you possibly have made? What would making such an error mean in the context of the research question?
Researchers suspect that patients with PD are less able to control their vocal muscles, and thus may have a different HNR (tonal component to noise ratio) compared to healthy volunteers. Thus, they are interested in whether the mean HNR in voice recordings among patients with PD is statistically significantly different from 24.7 at the 0.05 significance level. Conduct this hypothesis test using the Central Limit Theorem.
Hint: Be careful about which distribution you use to answer this question.
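One way to read the hint: when the population standard deviation is unknown and the sample standard deviation is used in its place, the standardized sample mean is usually compared to a t distribution rather than the standard normal. A sketch of the statistic under that assumption is given below, with \(\bar{x}\), \(s\), \(n\), and \(\mu_0\) denoting the sample mean, sample standard deviation, sample size, and null value (notation chosen here, not taken from the lab):

```latex
$$T = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \;\sim\; t_{n-1} \quad \text{(approximately, under } H_0\text{)}$$
```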
What is the distribution of the test statistic under the null hypothesis, the test statistic itself, the p-value, decision, and conclusion in context of the research question?
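The pieces asked for here can be computed by hand and checked against base R's t.test(). The sketch below assumes a one-sample, two-sided t-test is the appropriate CLT-based procedure and uses simulated toy values, not the lab's measurements; hnr_pd, mu0, t_stat, and p_val are illustrative names.

```r
set.seed(2)

# Toy HNR values, simulated for illustration only (NOT the lab's measurements).
hnr_pd <- rnorm(147, mean = 24, sd = 4)
mu0 <- 24.7   # null value stated in the research question
alpha <- 0.05

n <- length(hnr_pd)

# Test statistic: distance of the sample mean from mu0 in standard-error units.
t_stat <- (mean(hnr_pd) - mu0) / (sd(hnr_pd) / sqrt(n))

# Two-sided p-value from a t distribution with n - 1 degrees of freedom.
p_val <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# Decision rule at level alpha.
reject_h0 <- p_val < alpha

# Cross-check with the built-in one-sample t-test.
check <- t.test(hnr_pd, mu = mu0, alternative = "two.sided")

c(t_stat = t_stat, p_val = p_val)
check
```

The hand computation and t.test() should agree to numerical precision, which makes the built-in function a convenient check on the formula.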
Given your conclusion in Exercise 6, which type of error could you possibly have made? What would making such an error mean in the context of the research question?
Would you expect a 95% confidence interval computed using the same data to contain 24.7 or not? Explain.
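The link between a two-sided test and a confidence interval can be explored numerically. The sketch below uses a made-up sample and made-up candidate null values (none of them from the lab) and assumes the interval and the test are both t-based and use the same \(\alpha\); it tabulates, for each candidate, whether it falls inside the interval and whether the test fails to reject it.

```r
set.seed(3)

# Made-up sample, for illustration only.
x <- rnorm(60, mean = 10, sd = 2)
alpha <- 0.05

# t-based confidence interval at level 1 - alpha.
ci <- t.test(x, conf.level = 1 - alpha)$conf.int

# Hypothetical candidate null values to compare against the interval.
candidates <- c(9, 9.5, 10, 10.5, 11)

comparison <- data.frame(
  mu0 = candidates,
  inside_ci = candidates >= ci[1] & candidates <= ci[2],
  fail_to_reject = sapply(
    candidates,
    function(m) t.test(x, mu = m)$p.value >= alpha
  )
)
comparison
```

The two logical columns match row by row, which is the pattern to keep in mind when explaining your answer.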
Suppose we are now interested in testing whether a correlation exists between voice jitter and voice shimmer among healthy volunteers. Test whether the correlation between these two values is non-zero at the \(\alpha\) = 0.01 level. Conduct this hypothesis test using a simulation method.
As an aside, correlation is given in symbols by \(\rho\).
Hint: Refer to Lab 05 for what to specify(). Use hypothesize(null = "independence"). The type of simulated data we will generate() depends on two quantities/variables; consult the chart.
Display a visualization of your simulated null distribution, and describe the values that would cause you to reject your null hypothesis. Does the observed sample correlation lie in this rejection region?
What is your p-value, decision, and conclusion in context of the research question?
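A permutation test for a correlation follows the same pipeline shape as the hint describes. The sketch below is one possible arrangement, assuming the infer package is installed and R 4.1 or later; the jitter and shimmer values are simulated toy numbers, not the lab's measurements, and the choice of type = "permute" is an assumption to confirm against the chart.

```r
library(infer)

set.seed(4)

# Toy data, simulated for illustration only (NOT the lab's measurements).
n <- 48
shimmer <- rnorm(n, mean = 0.03, sd = 0.01)
jitter <- 0.05 * shimmer + rnorm(n, mean = 0, sd = 0.002)
toy <- data.frame(jitter = jitter, shimmer = shimmer)
alpha <- 0.01

# Observed sample correlation.
obs_cor <- toy |>
  specify(jitter ~ shimmer) |>
  calculate(stat = "correlation")

# Null distribution: shuffle one variable to break any association.
null_dist <- toy |>
  specify(jitter ~ shimmer) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "correlation")

# Two-sided rejection region at level alpha.
cutoffs <- quantile(null_dist$stat, probs = c(alpha / 2, 1 - alpha / 2))
cutoffs

# Null distribution with the observed correlation marked.
null_plot <- visualize(null_dist) +
  shade_p_value(obs_stat = obs_cor, direction = "two-sided")
null_plot

# Two-sided p-value.
p_val <- get_p_value(null_dist, obs_stat = obs_cor, direction = "two-sided")
p_val
```

If the observed correlation is more extreme than every simulated value, infer reports a p-value of 0 with a warning; in that case it is more accurate to report the p-value as smaller than 1 / reps.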
What is the probability you’ve made a Type 1 error? If you cannot tell for sure, explain why. Similarly, what is the probability you’ve made a Type 2 error? If you cannot tell for sure, explain why.
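A small simulation can help in thinking about why the two error rates behave differently. The sketch below is not the correlation test above: for speed it swaps in a one-sample t-test on simulated normal data, and the sample size, effect sizes, and function name rejection_rate are all made up for illustration. It estimates how often the test rejects when the null is true and how often it fails to reject when the null is false by a small or a large amount.

```r
set.seed(5)
alpha <- 0.01

# Share of simulated samples in which a two-sided one-sample t-test
# rejects H0: mu = 0. `true_mean` is the population mean used to
# simulate the data, which is unknown in a real study.
rejection_rate <- function(true_mean, n = 30, reps = 2000) {
  p_values <- replicate(
    reps,
    t.test(rnorm(n, mean = true_mean), mu = 0)$p.value
  )
  mean(p_values < alpha)
}

# H0 true: every rejection is a Type 1 error.
type1_rate <- rejection_rate(true_mean = 0)

# H0 false: every failure to reject is a Type 2 error.
type2_small <- 1 - rejection_rate(true_mean = 0.2)  # small departure from H0
type2_large <- 1 - rejection_rate(true_mean = 0.8)  # large departure from H0

c(type1 = type1_rate, type2_small = type2_small, type2_large = type2_large)
```

The estimated Type 1 rate should sit near alpha regardless of the data, while the Type 2 rate changes with the true mean, a quantity that is fixed by the simulation here but unknown in practice.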
Knit to PDF to create a PDF document. Knit and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.
Please only upload your PDF document to Gradescope. Associate the “Overall” graded section with the first page of your PDF, and mark where each answer is to the exercises. If any answer spans multiple pages, then mark all pages.