class: center, middle, inverse, title-slide

# Data science ethics
### Becky Tang
### 06.03.2021

---

layout: true

<div class="my-footer">
<span>
<a href="http://datasciencebox.org" target="_blank">datasciencebox.org</a>
</span>
</div>

---

## Topics

- Misrepresenting data
- Privacy
- Algorithmic bias

---

class: middle, center

## Misrepresenting data

---

.question[
What is the difference between these two pictures? Which presents a better way to represent these data?
]

<br>

<img src="img/09/axis-start-at-0.png" width="80%" style="display: block; margin: auto;" />

.footnote[Ingraham, C. (2019) ["You’ve been reading charts wrong. Here’s how a pro does it."](https://www.washingtonpost.com/business/2019/10/14/youve-been-reading-charts-wrong-heres-how-pro-does-it/), The Washington Post, 14 Oct.]

---

.question[
Do you recognize this map? What does it show?
]

<img src="img/09/election-2016-county.png" width="650" style="display: block; margin: auto;" />

.footnote[Gamio, L. (2016) ["Election maps are telling you big lies about small things"](https://www.washingtonpost.com/graphics/politics/2016-election/how-election-maps-lie/), The Washington Post, 1 Nov.]

---

<img src="img/09/cairo-what-matters.png" width="900" style="display: block; margin: auto;" />

.footnote[Credit: Alberto Cairo, [Visual Trumpery talk](https://visualtrumperytour.wordpress.com/).]

---

.question[
Is this visualization telling the complete story? What's missing?
]

<img src="img/09/diminishing-return.jpg" width="50%" style="display: block; margin: auto;" />

.footnote[Credit: [Statistics How To](https://www.statisticshowto.com/probability-and-statistics/descriptive-statistics/misleading-graphs/)]

---

.question[
...smoking cigarettes can help you live longer?
]

<img src="img/09/cigarettes1.png" width="50%" style="display: block; margin: auto;" />

.footnote[[How Charts Lie: Getting Smarter About Visual Information](https://wwnorton.com/books/9781324001560)]

---

> Missing information

<img src="img/09/cigarettes2.png" width="50%" style="display: block; margin: auto;" />

.footnote[[How Charts Lie: Getting Smarter About Visual Information](https://wwnorton.com/books/9781324001560)]

---

> Simpson's paradox

<img src="img/09/cigarettes3.png" width="50%" style="display: block; margin: auto;" />

.footnote[[How Charts Lie: Getting Smarter About Visual Information](https://wwnorton.com/books/9781324001560)]

---

> Individual level

<img src="img/09/cigarettes4.png" width="70%" style="display: block; margin: auto;" />

.footnote[[How Charts Lie: Getting Smarter About Visual Information](https://wwnorton.com/books/9781324001560)]

---

class: middle, center

## Privacy

---

## OkCupid data breach

- In 2016, researchers published data on 70,000 OkCupid users, including usernames, political leanings, drug usage, and intimate sexual details.

> "Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form."
>
> Researchers Emil Kirkegaard and Julius Daugbjerg Bjerrekær

- Although the researchers did not release the real names and pictures of the OkCupid users, critics noted that their identities could easily be uncovered from the details provided, such as the usernames.

<br>
<small>[*OkCupid Study Reveals the Perils of Big-Data Science*](https://www.wired.com/2016/05/okcupid-study-reveals-perils-big-data-science/)</small>
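
---

## Re-identification: one join away

A minimal sketch of how a username can defeat "no real names" anonymization. Every dataset, name, and value below is invented for illustration:

```r
library(dplyr)

# Hypothetical "scrubbed" research data: no real names, but a
# quasi-identifier (the username) remains
survey <- tibble(
  username = c("stargazer_91", "bluejay22"),
  age      = c(29, 34),
  drug_use = c("sometimes", "never")
)

# Hypothetical profiles scraped from another, public site
profiles <- tibble(
  username  = c("stargazer_91", "bluejay22"),
  real_name = c("A. Hansen", "B. Jensen")
)

# A single join re-attaches identities to the sensitive answers
inner_join(survey, profiles, by = "username")
```

Any field shared with a public source (a username, an age, a zip code) can serve as the join key.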

---

class: middle

.question[
When analyzing data individuals willingly shared publicly on a given platform (e.g. social media data), how do you make sure you don't violate their reasonable expectations of privacy?
]

<img src="img/09/okcupid-tweet.png" width="70%" style="display: block; margin: auto;" />

---

## Facebook & Cambridge Analytica

<img src="img/09/facebook-cambridge-analytica-explained.jpg" width="60%" style="display: block; margin: auto;" />

<br>

[How Cambridge Analytica turned Facebook 'likes' into a lucrative political tool](https://www.theguardian.com/technology/2018/mar/17/facebook-cambridge-analytica-kogan-data-algorithm)

---

class: middle, center

## Algorithmic bias

---

class: middle, center

## The Hathaway Effect

---

<img src="img/09/hathaway.png" width="50%" style="display: block; margin: auto;" />

.footnote[["Does Anne Hathaway News Drive Berkshire Hathaway's Stock?"](https://www.theatlantic.com/technology/archive/2011/03/does-anne-hathaway-news-drive-berkshire-hathaways-stock/72661/)]

---

## The Hathaway Effect

- **Oct. 3, 2008** - Rachel Getting Married opens: .vocab[BRK.A up .44%]
- **Jan. 5, 2009** - Bride Wars opens: .vocab[BRK.A up 2.61%]
- **Feb. 8, 2010** - Valentine’s Day opens: .vocab[BRK.A up 1.01%]
- **March 5, 2010** - Alice in Wonderland opens: .vocab[BRK.A up .74%]
- **Nov. 24, 2010** - Love and Other Drugs opens: .vocab[BRK.A up 1.62%]
- **Nov. 29, 2010** - Anne announced as co-host of the Oscars: .vocab[BRK.A up .25%]

.footnote[[The Hathaway Effect: How Anne Gives Warren Buffett a Rise](https://www.huffpost.com/entry/the-hathaway-effect-how-a_b_830041)]

---

## Amazon's experimental hiring algorithm

- Used AI to give job candidates scores ranging from one to five stars, much like shoppers rate products on Amazon, some of the people said
- The company realized its new system was not rating candidates for software developer jobs and other technical posts in a gender-neutral way
- Amazon’s system taught itself that male candidates were preferable

> Gender bias was not the only issue. Problems with the data that underpinned the models’ judgments meant that unqualified candidates were often recommended for all manner of jobs, the people said.

.footnote[Dastin, J. (2018) [Amazon scraps secret AI recruiting tool that showed bias against women](https://reut.rs/2Od9fPr), Reuters, 10 Oct.]

---

## Bias in health care risk algorithm

- Algorithm targets patients for “high-risk care management” programs, which seek to improve the care of patients with complex health needs by providing additional resources
- Such programs are considered effective at improving outcomes and satisfaction while reducing costs, but are themselves costly, so the goal is to identify the patients who would benefit the most
- Algorithm’s designers used previous patients’ health care spending as a proxy for medical need

--

What happened: the algorithm tended to assign lower risk scores to black patients

Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). [Dissecting racial bias in an algorithm used to manage the health of populations](https://science.sciencemag.org/content/366/6464/447/tab-figures-data), Science.

---

## Bias in health care risk algorithm

<img src="img/09/illness_expenditure.png" width="50%" style="display: block; margin: auto;" />

> Issue: health care spending does not equal need.
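
---

## Bias in health care risk algorithm

A minimal simulation of the proxy problem (all numbers below are invented): if one group generates less spending at the same level of illness, a score trained to predict spending will under-select that group for the program.

```r
set.seed(1)
n <- 10000
group   <- sample(c("A", "B"), n, replace = TRUE)
illness <- rnorm(n, mean = 5)                  # true medical need
# Assumed mechanism: group B faces barriers to care, so the same
# illness generates less spending
spending <- ifelse(group == "B", 0.7, 1) * illness + rnorm(n, sd = 0.5)

# Suppose the program enrolls the top 10% by each candidate target
top_by_spending <- spending > quantile(spending, 0.9)
top_by_illness  <- illness  > quantile(illness,  0.9)

mean(group[top_by_spending] == "B")  # well below 0.5: under-selected
mean(group[top_by_illness]  == "B")  # about 0.5
```

The score never uses `group` directly; the bias enters entirely through the choice of target variable.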

---

## Bias in health care risk algorithm

.pull-left[
<img src="img/09/risk_expenditure.png" width="70%" style="display: block; margin: auto;" />
]
.pull-right[
<img src="img/09/risk_illness.png" width="70%" style="display: block; margin: auto;" />
]

> Even though black patients tend to have more severe medical conditions, the algorithm is built to predict health care costs rather than illness

---

## Bias in algorithms used for sentencing

--

<img src="img/09/propublica-criminal-sentencing.png" width="70%" style="display: block; margin: auto;" />

There’s software used across the country to predict future criminal activity. And it's biased...

[ProPublica, May 23, 2016](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing)

---

class: middle

> “Although these measures were crafted with the best of intentions, I am concerned that they inadvertently undermine our efforts to ensure individualized and equal justice,” he said, adding, “they may exacerbate unwarranted and unjust disparities that are already far too common in our criminal justice system and in our society.”
>
> Then U.S. Attorney General Eric Holder (2014)

---

## ProPublica analysis

.vocab[Data:] Risk scores assigned to more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014, plus whether they were charged with new crimes over the next two years

---

## ProPublica analysis

.vocab[Results:]

- 20% of those predicted to commit violent crimes actually did
- The algorithm had higher accuracy (61%) when the full range of crimes was taken into account (e.g. misdemeanors)
- The algorithm was more likely to falsely flag African American defendants as higher risk, at almost twice the rate of Caucasian defendants

<img src="img/09/propublica-results.png" width="90%" style="display: block; margin: auto;" />
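
---

## ProPublica analysis

The "almost twice the rate" claim is about false positive rates: among defendants who did **not** reoffend, what share was still labeled high risk? A minimal sketch, with hypothetical counts chosen to mirror the roughly two-to-one disparity ProPublica reported:

```r
# FPR = FP / (FP + TN), computed separately within each group
fpr <- function(fp, tn) fp / (fp + tn)

fpr(fp = 450, tn = 550)  # group 1: 0.45 flagged despite no new crime
fpr(fp = 230, tn = 770)  # group 2: 0.23, roughly half the rate
```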

---

class: middle, center

Read more at [propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing).

---

class: center, middle

## Further study on data science ethics

---

## Further reading

.pull-left[
<img src="img/09/ethics-data-science.jpg" width="60%" style="display: block; margin: auto;" />
]
.pull-right[
[Ethics and Data Science](https://www.amazon.com/Ethics-Data-Science-Mike-Loukides-ebook/dp/B07GTC8ZN7) by Mike Loukides, Hilary Mason, and DJ Patil (free Kindle download)
]

---

## Further reading

.pull-left[
<img src="img/09/how-charts-lie.jpg" width="60%" style="display: block; margin: auto;" />
]
.pull-right[
[How Charts Lie: Getting Smarter About Visual Information](https://wwnorton.com/books/9781324001560) by Alberto Cairo
]

---

## Further reading

.pull-left[
<img src="img/09/weapons-of-math-destruction.jpg" width="60%" style="display: block; margin: auto;" />
]
.pull-right[
[Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy](https://en.wikipedia.org/wiki/Weapons_of_Math_Destruction) by Cathy O'Neil
]

---

## Further watching

.center[
<iframe width="560" height="315" src="https://www.youtube.com/embed/MfThopD7L1Y" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

Predictive Policing: Bias In, Bias Out by Kristian Lum
]

---

## Parting thoughts

- At some point during your data science journey you will learn tools that can be used unethically
- You might also be tempted to use your knowledge in a way that is ethically questionable, whether because of business goals, the pursuit of further knowledge, or because your boss told you to

.question[
How do you train yourself to make the right decisions (or reduce the likelihood of accidentally making the wrong decisions) at those points?
]

---

## Do good with data

- Data Science for Social Good:
  - [at the University of Chicago](http://www.dssgfellowship.org/)
  - [at the Alan Turing Institute](https://www.turing.ac.uk/collaborate-turing/data-science-social-good)
- [DataKind](https://www.datakind.org/): brings high-impact organizations together with leading data scientists to use data science in the service of humanity
- [Pledge to promote data values & practices](https://datapractices.org/manifesto/)