This course surveys analytical tools and concepts in data science, with the goal of equipping students with an understanding of the best practices used by professional data scientists and analysts at top companies in technology, finance, and media. The course begins with an overview of fundamentals in data handling and exploratory data analysis, followed by an introduction to core concepts in statistical modeling and machine learning, and concludes with a brief introduction to advanced concepts in data science.
Students will work with a wide variety of real-world data sets throughout the course to gain hands-on experience. Emphasis will be placed on frequent practice through writing and reviewing code each week. In addition, students will be assigned, and expected to discuss, short readings ranging from academic reviews of popular topics in analytics to data science and engineering blog posts from companies such as Airbnb, Spotify, and Facebook. Tasks and readings will aim to demystify the work of data teams in the real world and familiarize students with the concepts and resources needed to secure and succeed in analytical roles.
Important Dates
- Project Teams Formed, October 1.
- Project Proposals Due via Email, October 19.
- Midterm Exam (45 min), Monday, October 22.
- First Project Update, November 5.
- Second Project Update, November 26.
- Projects Due, Finals Period.
Recommended Texts and Materials
- Required Text: Data Science from Scratch, Joel Grus. 1st Edition, April 2015 (O'Reilly). Available online.
- Additional required readings and videos will be made available to students in advance of each week's assignments. All will be available online at no cost.
- In addition to the required materials, students may find the following resources helpful in supplementing course materials:
- Recommended Text: Python for Data Analysis, Wes McKinney. 2nd Edition, October 2017 (O'Reilly). Available online.
- Recommended Text: Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani and Jerome Friedman. 2nd Edition, 2009 (Springer). Available free online.
The CUNY Policy on Academic Integrity governs behavior in this class. Academic dishonesty is prohibited in the City University of New York and is punishable by penalties, including failing grades, suspension, and expulsion.
Schedule
Week 1: August 27
- Course Intro: What is Data Science and Why Does It Matter?
Week 2: September 5 (Wednesday)
- Data Exploration 1: How to Get Data
- Assignment 2: Due Monday, September 17 before class (6:30pm EST).
Week 3: September 17
- Data Exploration 2: Processing and Cleaning Data
- Assignment 3: Due Monday, September 24 before class (6:30pm EST).
Week 4: September 24
- Data Exploration 3: Statistics and Stories We Tell Ourselves
- Assignment 4: Due Monday, October 1 before class (6:30pm EST).
Week 5: October 1
- Models 1: Intro to Regression and Classification
- Assignment 5: Due Monday, October 15 before class (6:30pm EST).
Week 6: October 15
- Models 2: Regression and Classification, Part 2
Week 7: October 22
- Midterm (answer key)
- Applications in Classification
Week 8:
Week 9: November 5
- ML 1: Trees, Bias vs. Variance Tradeoffs
- Assignment 6: Due Monday, November 12 before class (6:30pm EST).
Week 10: November 12
- ML 2: Performance Evaluation and Ensemble Models
Week 11: November 19
- ML 3: NLP, Text as Data, and Bayes Rule
Week 12: November 26
- ML 4: Unsupervised Learning
Week 13: December 3
Week 14: December 10
- Life in Data: Career Tips and Ethics