Discussion guidelines

Data representation

Misleading data visualizations [1]

What baby boomers think

  • What is the graph trying to show?
  • Why is this graph misleading?
  • How can you improve this graph?


Brexit

  • What is the graph trying to show?
  • Why is this graph misleading?
  • How can you improve this graph?

Spurious correlation [2]

  • What is the graph trying to show?
  • Why is this graph misleading?

More reading on data visualization

Discussion questions

Web scraping

A researcher is interested in the relationship between weather and sentiment (the positivity or negativity of posts) on Twitter. They want to scrape data from https://www.wunderground.com and join it to tweets posted in the same geographic area at a particular time. One complication is that Weather Underground limits the number of data points that can be downloaded for free through its API (application programming interface). The researcher sets up six free accounts so that they can collect the data they want in a shorter time frame. [3]

  • What ethical considerations might be violated by this approach to data scraping?
  • What can the researcher do to collect the data in an ethical way?

Posting data from social media

A data analyst received permission to post a data set that was scraped from a social media site. The full data set included name, screen name, email address, geographic location, IP (Internet Protocol) address, demographic profiles, and preferences for relationships. The analyst removes name and email address from the data set in an effort to de-identify it. [4]
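To make the scenario concrete, the following is a minimal Python sketch of the analyst's step, followed by one possible check on what remains. Everything in it is an assumption made for illustration: the records are invented, the column names only loosely mirror the fields listed above, and `drop_fields` and `unique_row_count` are hypothetical helpers, not part of any library or of the source text.

```python
from collections import Counter

# Hypothetical records; every value here is invented for illustration.
records = [
    {"name": "A. Example", "email": "a@example.org", "screen_name": "alpha01",
     "location": "Springfield", "ip_address": "192.0.2.10", "age_group": "25-34"},
    {"name": "B. Example", "email": "b@example.org", "screen_name": "bravo02",
     "location": "Springfield", "ip_address": "192.0.2.11", "age_group": "25-34"},
    {"name": "C. Example", "email": "c@example.org", "screen_name": "charlie03",
     "location": "Shelbyville", "ip_address": "192.0.2.12", "age_group": "35-44"},
]

DROPPED = {"name", "email"}  # the two fields the analyst removes


def drop_fields(rows, fields):
    """Return copies of the rows without the given fields."""
    return [{k: v for k, v in row.items() if k not in fields} for row in rows]


def unique_row_count(rows, columns):
    """Count rows whose combination of values in `columns` appears exactly once."""
    combos = Counter(tuple(row[c] for c in columns) for row in rows)
    return sum(1 for row in rows if combos[tuple(row[c] for c in columns)] == 1)


released = drop_fields(records, DROPPED)

# How many released rows can still be singled out by the remaining columns?
print(unique_row_count(released, ["screen_name"]))
print(unique_row_count(released, ["location", "age_group"]))
```

Counting the rows that stay unique on the remaining columns is only one rough way to probe re-identification risk. It may still be a useful starting point when discussing whether removing two fields is enough.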

Developing algorithms

A company uses a machine learning algorithm to determine which job advertisements to display to users searching for technology jobs. Based on past results, the algorithm tends to display lower-paying jobs to women than to men (after controlling for characteristics other than gender). [5]

Case study: Optimizing Schools

Princeton Dialogues on AI and Ethics


  1. Source: https://humansofdata.atlan.com/2019/02/dos-donts-data-visualization/

  2. Source: https://www.tylervigen.com/spurious-correlations

  3. Source: Modern Data Science with R, 2nd Edition

  4. Source: Modern Data Science with R, 2nd Edition

  5. Source: Modern Data Science with R, 2nd Edition