class: center, middle, inverse, title-slide # Welcome to Intro to Data Science ### Becky Tang ### 05.12.21 --- class: center, middle # Welcome! --- ## What is Data Science? *"Data science is a concept to unify statistics, data analysis, machine learning and their related methods in order to understand and analyze actual phenomena with data. It employs techniques and theories drawn from many fields within the context of <font class="vocab">mathematics, statistics, information science, and computer science</font>."* .pull-right[ [-Wikipedia](https://en.wikipedia.org/wiki/Data_science) ] We will be leveraging the programming language R to introduce ourselves to data science. --- ## Necessary background We assume ZERO background in data science, statistics, and coding. We will help you get started with learning statistics, but this course will *not* be your typical intro stats course. There is a large emphasis on computing, so all we need is an openness to learn! (and a laptop) That being said, there is a lot of material in this course! *Please make sure you are enrolled in both Summer Sessions I and II, along with corresponding labs*. --- class: regular ## Instructor [Becky Tang](https://beckytang.rbind.io/) <i class="material-icons">mail_outline</i> [becky.tang@duke.edu](mailto:becky,tang@duke.edu)<br> <i class="material-icons">calendar_today</i> M 1:30-3:00pm; W 10:30am-12:00pm -- .pull-left[ <img src="img/00/eBird.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ <img src="img/00/restio.jpg" width="100%" style="display: block; margin: auto;" /> ] --- ## Teaching Assistants <small> [Marc Brooks](https://www.linkedin.com/in/marc-g-brooks) <i class="material-icons">mail_outline</i> [marc.brooks@duke.edu](mailto:marc.brooks@duke.edu)<br> <i class="material-icons">calendar_today</i> TBD </small> --- ## Where to find information - Course website: [https://sta199-summer2021.github.io/website/](https://sta199-summer2021.github.io/website/) - GitHub (assignments): [https://github.com/sta199-summer2021](https://github.com/sta199-summer2021) --- ## Course Objectives - Learn to explore, visualize, and analyze data in a reproducible and shareable manner - Gain experience in data wrangling and munging, exploratory data analysis, predictive modeling, and data visualization - Work on problems and case studies inspired by and based on real-world questions and data - Learn to effectively communicate results through written assignments and final project presentation --- class: middle, center ## Examples of Data Science --- ## Billboard Hot 100 Analytics Analysis: <small>[https://towardsdatascience.com/billboard-hot-100-analytics-using-data-to-understand-the-shift-in-popular-music-in-the-last-60-ac3919d39b49](https://towardsdatascience.com/billboard-hot-100-analytics-using-data-to-understand-the-shift-in-popular-music-in-the-last-60-ac3919d39b49)</small> <br><br> [GitHub repository with data and R scripts](https://github.com/RosebudAnwuri/TheArtandScienceofData/tree/master/The%20Making%20of%20Great%20Music) --- ## Atlas of Redistricting [https://projects.fivethirtyeight.com/redistricting-maps/north-carolina/](https://projects.fivethirtyeight.com/redistricting-maps/north-carolina/) --- ## A Year as Told by Fitbit [https://livefreeordichotomize.com/2017/12/27/a-year-as-told-by-fitbit/](https://livefreeordichotomize.com/2017/12/27/a-year-as-told-by-fitbit/) --- ## TED Talk [https://www.youtube.com/watch?v=hVimVzgtD6w](https://www.youtube.com/watch?v=hVimVzgtD6w) --- class: center, middle ## Your Turn! --- ## Create a GitHub account <small> <small> .instructions[ Go to https://github.com/, and create an account (unless you already have one). After you create your account, click [here](https://forms.gle/mmRshneG5QyjTzty9) and enter your GitHub username. ] Tips for creating a username from [Happy Git with R](http://happygitwithr.com/github-acct.html#username-advice). - Incorporate your actual name! - Reuse your username from other contexts if you can, e.g., Twitter or Slack. - Pick a username you will be comfortable revealing to your future boss. - Shorter is better than longer. - Be as unique as possible in as few characters as possible. - Make it timeless. Don’t highlight your current university, employer, or place of residence. - Avoid words laden with special meaning in programming, like `NA`. </small> .instructions[ Raise your hand if you have any questions. ] --- ## Time for an analysis! - Let's take a look at the [UN Votes](https://sta199-summer2021.github.io/website/appex/00-unvotes.html) analysis --- ## Discussion Share with the class! 1. Start by introducing yourself! Name, year, major/ academic interest, favorite hobby. 2. Consider the plot in Part 1. Describe how the voting behaviors of the five countries have changed over time. 3. Consider the plot in Part 2. - On which issues have the three countries voted most similarly in recent years? - On which issues have they voted most differently in recent years? - Has this changed over time? --- class: middle, center ## Course Policies --- ## Class Meetings -- <font class="vocab">Lecture</font> - Focus on concepts behind data analysis - Interactive lecture that includes examples and hands-on exercises - Bring fully-charged laptop to every lecture - Please let me know as soon as possible if you do not have access to a laptop -- <font class="vocab">Lab</font> - Focus on computing using R `tidyverse` syntax - Apply concepts from lecture to case study scenarios - Bring fully-charged laptop to every lab --- ## Textbooks - [OpenIntro Statistics, 4th Edition](https://www.openintro.org/stat/textbook.php?stat_book=os) - Free PDF available online. Hard copy available for purchase. - Assigned readings about statistical content - [R for Data Science](http://r4ds.had.co.nz/) - Free online version. Hard copy available for purchase. - Assigned readings and resource for R coding using `tidyverse` syntax. --- ## Activities & Assessments - <font class="vocab">Homework</font>: Three individual assignments combining conceptual and computational skills. -- - <font class="vocab">Labs</font>: Assignments focusing on computational skills. Start labs on Friday and due Tuesday. *Lowest score will be dropped.* -- - <font class="vocab">Exams</font>: Two take-home exams. -- - <font class="vocab">Final Project</font>: Project presented during the class on **Wednesday-Thursday, July 21-22 9:00-10:15am**. You must complete the project and present in class to pass the course. .instructions[ Let me know if you are unable to attend the project presentations due to differing time zones. ] --- ## Activities & Assessments -- - <font class="vocab">Application Exercises</font>: Exercises usually started in class and when assigned, due by the next class. Graded based on a good-faith effort has been made in attempting all parts. Successful on-time completion of at least 90% will result in full points for that AE; anything lower than that will be assigned points accordingly. *Lowest two scores will be dropped.* -- - <font class="vocab">Data Visualization Examples</font>: Share a visualization that you find interesting. Around one presentation per week. --- ## Grade Calculation <small> | Component | Weight | |---------------|--------| | Homework | 17.5%| | Labs | 15% | | Exam 1 | 17.5% | | Exam 2 | 17.5% | | Final Project | 17.5% | | Participation & Application Exercises | 10% | | Data Visualization Examples | 5%| -- - If you have a cumulative numerical average of 90 - 100, you are guaranteed at least an A-, 80 - 89 at least a B-, and 70 - 79 at least a C-. The exact ranges for letter grades will be determined after Exam 2. - You are expected to attend lectures and labs. Excessive absences can impact your final course grade. </small> --- ## Excused Absences - Students who miss a class due to a scheduled varsity trip, religious holiday, or short-term illness should fill out the respective form. - These excused absences do not excuse you from assigned work. -- - If you have a personal or family emergency or chronic health condition that affects your ability to participate in class, please contact your academic dean’s office. -- - Exam dates cannot be changed and no make-up exams will be given. --- ## Late Work & Regrade Requests - Homework assignments: - Late but within 12 hours of deadline: no penalty - After the 12 hours, there is a 20% penalty for each day the assignment is late - Late work will not be accepted for the take-home exams or final project. - Regrade requests must be submitted within one week of when the assignment is returned using the link posted in the course syllabus --- ## Academic Honesty All work for this class should be done in accordance with the Duke Community Standard. > To uphold the Duke Community Standard: > - I will not lie, cheat, or steal in my academic endeavors; > - I will conduct myself honorably in all my endeavors; and > - I will act if the Standard is compromised. Any violations will automatically result in a grade of 0 on the assignment and will be reported to [Office of Student Conduct](https://studentaffairs.duke.edu/conduct) for further action. --- ## Reusing Code - Unless explicitly stated otherwise, you may make use of online resources (e.g. StackOverflow) for coding examples on assignments. If you directly use code from an outside source (or use it as inspiration), you must or explicitly cite where you obtained the code. Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism. - On individual assignments, you may discuss the assignment with one another; however, you may not directly share code or write up with other students. - On team assignments, you may not directly share code or write up with another team. Unauthorized sharing of the code or write up will be considered a violation for all students involved. --- ## Where to find help - **If you have a question during lecture or lab, feel free to ask it!** There are likely other students with the same question, so by asking you will create a learning opportunity for everyone. -- - **Office Hours**: A lot of questions are most effectively answered in-person, so office hours are a valuable resource. Please use them! -- - **Piazza**: Outside of class and office hours, any general questions about course content or assignments should be posted on Piazza since there are likely other students with the same questions. If you weren't invited to join our Piazza, sign up using [this link](piazza.com/duke/summer2021/sta199l101l1su21). --- ## Academic Resource Center Sometimes you may need help with the class that is beyond what can be provided by the teaching team. In that instance, I encourage you to visit the Academic Resource Center. <br><br> .small[ The [Academic Resource Center (ARC)](https://arc.duke.edu) offers free services to all students during their undergraduate careers at Duke. Services include Learning Consultations, Peer Tutoring and Study Groups, ADHD/LD Coaching, Outreach Workshops, and more. Because learning is a process unique to every individual, they work with each student to discover and develop their own academic strategy for success at Duke. Contact the ARC to schedule an appointment. Undergraduates in any year, studying any discipline can benefit! ] --- ## Accessibility Please contact the [Student Disability Access Office (SDAO)](https://access.duke.edu) if there is an element of the course that is not accessible to you. There you can engage in a confidential conversation about the process for requesting reasonable accommodations. <br><br> Please note that accommodations are not provided retroactively, so please contact them as soon as possible. More information can be found online at [access.duke.edu](https://access.duke.edu). --- ### Inclusion In this course, we will strive to create a learning environment that is welcoming to all students and that is in alignment with [Duke’s Commitment to Diversity and Inclusion](https://provost.duke.edu/initiatives/commitment-to-diversity-and-inclusion). If there is any aspect of the class that is not welcoming or accessible to you, please let me know immediately. <br><br> Additionally, if you are experiencing something outside of class that is affecting your performance in the course, please feel free to talk with me and/or your academic dean. --- class: center, middle ## Questions? --- ## Announcements - Fill out the **Getting To Know You Survey on Sakai** - due Sunday, 5/16 at 11:59pm - Please make sure you have access to Sakai and Gradescope - I will have abbreviated office hours immediately after class today - Usual office hours start next week, including TA office hours (see Sakai for Zoom links)