class: center, middle, inverse, title-slide

# Conditional Probability
### Becky Tang
### 05.27.2021

---
layout: true

<div class="my-footer">
<span>
<a href="http://datasciencebox.org" target="_blank">datasciencebox.org</a>
</span>
</div>

---

## Conditional probability

The probability an event will occur *given* that another event has already occurred is a .vocab[conditional probability].

The conditional probability of event `\(A\)` given event `\(B\)` is:

.instructions[
`$$P(A | B) = \frac{P(A \text{ and } B)}{P(B)}$$`
]

---

## Conditional probabilities

.instructions[
`$$P(A | B) = \frac{P(A \text{ and } B)}{P(B)}$$`
]

Examples come up all the time in the real world:

- *Given* that it rained yesterday, what is the probability that it will rain today?
- *Given* that a mammogram comes back positive, what is the probability that a woman has breast cancer?
- *Given* that I've already watched six episodes of How I Met Your Mother tonight, what is the probability that I'll get any work done this evening?

---

## Coffee and mortality

<img src="img/06/coffee.png" width="700" style="display: block; margin: auto;" />

.midi[
| | Did not die| Died|
|:--------------------------|-----------:|----:|
|Does not drink coffee | 5438| 1039|
|Drinks coffee occasionally | 29712| 4440|
|Drinks coffee regularly | 24934| 3601|
]

---

## Three probabilities

.midi[
| | Did not die| Died|
|:--------------------------|-----------:|----:|
|Does not drink coffee | 5438| 1039|
|Drinks coffee occasionally | 29712| 4440|
|Drinks coffee regularly | 24934| 3601|
]

<br>

.question[
.midi[
Define events *A* = died and *B* = non-coffee drinker.
Calculate the following for a randomly selected person in the cohort:]

- .vocab[Marginal probability]: `\(P(A)\)`, `\(P(B)\)`
- .vocab[Joint probability]: `\(P(A \text{ and } B)\)`
- .vocab[Conditional probability]: `\(P(A | B)\)`, `\(P(B | A)\)`
]

---
class: center, middle

# Independence

---

## The multiplicative rule

We can write the definition of conditional probability as:

.instructions[
`$$P(A | B) = \frac{P(A \text{ and } B)}{P(B)}$$`
]

--

.instructions[
Multiplying both sides of the equation above by `\(P(B)\)`, we get...
`$$P(B) \times P(A | B) = P(A \text{ and } B)$$`
]

.center[
**What does the multiplicative rule mean in plain English?**
]

---

## Defining independence

Events `\(A\)` and `\(B\)` are said to be .vocab[independent] when

`$$P(A | B) = P(A) \hspace{10mm} \textbf{OR} \hspace{10mm} P(B | A) = P(B)$$`

<br>

In other words, knowing that one event has occurred doesn't cause us to "adjust" the probability we assign to another event.

---

## Checking independence

We can use the multiplicative rule to see if two events are independent.

.instructions[
If events `\(A\)` and `\(B\)` are independent, then
`$$P(A \text{ and } B) = P(A) \times P(B)$$`
]

---

## Independent vs. disjoint events

Since for two independent events `\(P(A|B) = P(A)\)` and `\(P(B|A) = P(B)\)`, knowing that one event has occurred tells us nothing more about the probability of the other occurring.

--

For two disjoint events `\(A\)` and `\(B\)`, knowing that one has occurred tells us that the other definitely has not occurred: `\(P(A \text{ and } B) = 0\)`.

--

.instructions[
.center[
Disjoint events are **<u>not</u>** independent!
]
]

---

## Checking independence

| | Did not die| Died|
|:--------------------------|-----------:|----:|
|Does not drink coffee | 5438| 1039|
|Drinks coffee occasionally | 29712| 4440|
|Drinks coffee regularly | 24934| 3601|

<br>

.question[
Are dying and abstaining from coffee independent events? How might we check?
]

---
class: center, middle

# Bayes' Rule

---

## An example

In an introductory statistics course, 50% of students were first years, 30% were sophomores, and 20% were upperclassmen.

80% of the first years didn’t get enough sleep, 40% of the sophomores didn’t get enough sleep, and 10% of the upperclassmen didn’t get enough sleep.

.question[
What is the probability that a randomly selected student in this class didn’t get enough sleep?
]

---

## Bayes' Rule

As we saw before, the two conditional probabilities `\(P(A | B)\)` and `\(P(B | A)\)` are not the same. But are they related in some way?

--

Yes, they are related through .vocab[Bayes' rule]:

.instructions[
**Bayes' rule:**
`$$\begin{align}P(A | B) &= \frac{P(A \text{ and } B)}{P(B)}\\[10pt] &= \frac{P(B | A)P(A)}{P(B)} \end{align}$$`
]

---

## Bayes' Rule (continued)

Putting together a few rules of probability...

`$$\begin{align}P(A | B) &= \frac{P(A \text{ and } B)}{P(B)}\\[10pt] &= \frac{P(B | A)P(A)}{P(B)}\\[15pt] &= \frac{P(B | A)P(A)}{P(B | A)P(A) + P(B | A^c)P(A^c)}\end{align}$$`

Let's look at an example to see how this works.

---
class: center, middle

# Diagnostic Testing

---

## Definitions

Suppose we're interested in the performance of a diagnostic test. Let `\(D\)` be the event that a patient has the disease, and let `\(T\)` be the event that the test is positive for that disease.

- .vocab[Prevalence]: `\(P(D)\)`
- .vocab[Sensitivity]: `\(P(T | D)\)`
- .vocab[Specificity]: `\(P(T^c | D^c)\)`
- .vocab[Positive predictive value]: `\(P(D | T)\)`
- .vocab[Negative predictive value]: `\(P(D^c | T^c)\)`

.question[
What do these probabilities mean in plain English?
]

---

## Rapid self-administered HIV tests

.pull-left[
From the FDA package insert for the Oraquick ADVANCE Rapid HIV-1/2 Antibody Test,

- Sensitivity, `\(P(T | D)\)`, is 99.3%
- Specificity, `\(P(T^c | D^c)\)`, is 99.8%

From CDC statistics in 2016, 14.3/100,000 Americans aged 13 or older are HIV+.
]

.pull-right[
<img src="img/07/oraquick.png" width="400" style="display: block; margin: auto;" />
]

.question[
Suppose a randomly selected American aged 13+ has a positive test result. What is the probability they have HIV?
]

---

## Using Bayes' Rule

`\begin{align*} P(D | T) &= \frac{P(D \text{ and } T)}{P(T)}\\ &= \frac{P(T | D)P(D)}{P(T)}\\[5pt] &= \frac{P(T | D)P(D)}{P(T | D)P(D) + P(T | D^c)P(D^c)}\\[5pt] &= \frac{P(T | D)P(D)}{P(T | D)P(D) + (1 - P(T^c | D^c))(1 - P(D))} \end{align*}`

<br>

.question[**What does all of this mean? Let's take a look!**]

---

## Work through example

---

## A discussion

Think about the following questions:

- Is this calculation surprising?
- What is the explanation?
- Was this calculation actually reasonable to perform?
- What if we tested in a different population, such as high-risk individuals?
- What if we were to test a random individual in a country where the prevalence of HIV is approximately 25%?
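
---

## A sketch of the calculation

The Bayes' rule expansion of `\(P(D | T)\)` can be checked numerically. The following is a minimal Python sketch, not part of the original slides, that plugs in the sensitivity, specificity, and prevalence quoted for the Oraquick test. The function name `positive_predictive_value` and its parameter names are hypothetical, chosen only for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return P(D | T) via Bayes' rule.

    sensitivity = P(T | D), specificity = P(T^c | D^c), prevalence = P(D).
    (Hypothetical helper, written for this illustration.)
    """
    # Numerator: P(T | D) P(D), the probability of a true positive.
    true_positive = sensitivity * prevalence
    # P(T | D^c) P(D^c) = (1 - specificity)(1 - prevalence): a false positive.
    false_positive = (1 - specificity) * (1 - prevalence)
    # Denominator is P(T), the total probability of a positive test.
    return true_positive / (true_positive + false_positive)


# Values quoted in the slides: 99.3% sensitivity, 99.8% specificity,
# and 14.3 per 100,000 Americans aged 13 or older.
ppv_general = positive_predictive_value(0.993, 0.998, 14.3 / 100_000)

# Last discussion question: a population with about 25% prevalence.
ppv_high = positive_predictive_value(0.993, 0.998, 0.25)

print(f"PPV at 14.3 per 100,000: {ppv_general:.3f}")  # about 0.066
print(f"PPV at 25% prevalence:   {ppv_high:.3f}")     # about 0.994
```

Under these inputs, a positive result corresponds to roughly a 6.6% chance of having HIV at the lower prevalence and over 99% at a prevalence of 25%, which bears on the discussion questions about testing in different populations.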