City College, Fall 2019

Intro to Data Science

Course Intro: What is Data Science and Why Does It Matter?

September 5, 2019

What is Data Science?
Is that helpful?
Who are Data Scientists?
About Me

About Tech in Residence

About this Course
Official Course Objectives
  1. Explain the key steps in a data science project.
  2. Apply Python to load, clean, and process data sets.
  3. Identify key elements of and patterns in a data set using computational analysis and statistical methods.
  4. Explain and visualize empirical findings using Python and other resources.
  5. Explain fundamental principles of machine learning.
  6. Apply predictive algorithms to a data set.
  7. Work effectively in a team dedicated to analyzing data.
Why Take this Course?
  1. Careers in data are abundant, lucrative, and rewarding.
  2. Learn how to detect BS.
  3. Be a more informed person.
Resources: Coding

Resources: DataCamp
Resources: Notebooks

Resources: Class Communications
Course Page
How to Get Help
Project 30%
HW Assignments 30%
Midterm Exam 30%
Attendance, Quizzes, and Class Participation 10%

A large share of the course grade (30%) will be a group project due in advance of the last class on December 9. Students will be expected to work on the project during the second half of the class and will be required to present their progress over the course of the semester. Grades will be assigned on the basis of overall project quality, demonstration of core principles taught in the class, and individual contributions to the group's effort. More details on the project will be discussed in the second week of class.

Assignments and Exams
  • Assignments. This class includes frequent assignments to encourage mastery of basic concepts and check comprehension, predominantly through DataCamp. All assignments and quizzes will be graded on a 10-point scale. All quizzes will be announced in advance of class.
    • Assignments not turned in by the set deadline are eligible to be completed for half credit by the final class on December 9. Exceptions will be granted only as mandated by CUNY policy.
  • Exam. A short midterm exam will be held in November and will focus on broad concepts the course has surveyed thus far. The format will mimic the style of questions frequently asked in interviews for data-related roles.
Texts and Materials
  • Required Text: Data Science from Scratch, Joel Grus. 2nd Edition, May 2019 (O'Reilly). Available online.
  • Additional required readings and videos will be made available to students in advance of each week's assignments. All will be available online at no cost.
  • In addition to the required materials, students may find the following resources helpful in supplementing course materials:
    • Recommended Text: Foundations of Data Science, Avrim Blum, John Hopcroft, and Ravindran Kannan. January 2018. Available free online.
    • Recommended Text: Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2nd Edition, 2009 (Springer). Available free online.
    • Recommended Text: Python for Data Analysis, Wes McKinney. 2nd Edition, October 2017 (O'Reilly). Available online.

Academic dishonesty is prohibited in The City University of New York. Penalties for academic dishonesty include academic sanctions, such as failing or otherwise reduced grades, and/or disciplinary sanctions, including suspension or expulsion.

CUNY Policy on Academic Integrity
Data Science in Practice
Let's talk about job negotiations.
Today's Data
H-1B Visa Data
Please take a second to fill out this survey.
Let's Dive Into Some Data!


First assignment

Due Sunday, September 8 at 11:59pm
  • Email with:
    1. How you prefer to be addressed in class: name and pronunciation.
    2. The email address you prefer to use for class correspondence (and for DataCamp).
    3. If not the same as #2, email for use with DataCamp.
    4. Three interesting facts from the H-1B data set we discussed in class, including one statistic you compute on your own.
      • These should not be things we discussed directly in class.
      • The statistic you send should include a number and a description, e.g., "The average salary for a pancake flipper was $50,000."
      • You need not spend much time on these, but do be sure you are able to load the data and interact with code on your own.
    5. Optional but strongly preferred - a photo of you that will help me recognize you.
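
For item 4, one possible way to compute a statistic is sketched below. It uses only the Python standard library and a few made-up rows so that it runs anywhere. The column names (JOB_TITLE, WAGE) and the sample values are assumptions for illustration, not the actual layout or contents of the H-1B file from class, so adjust them to match the real data.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; these are NOT from the real H-1B data.
# The column names (JOB_TITLE, WAGE) are assumptions, so check the actual file.
SAMPLE_CSV = """JOB_TITLE,WAGE
PANCAKE FLIPPER,48000
PANCAKE FLIPPER,52000
WAFFLE ENGINEER,70000
"""


def average_wage_by_title(csv_file):
    """Return {job title: mean wage} for the rows in an open CSV file."""
    wages = defaultdict(list)
    for row in csv.DictReader(csv_file):
        wages[row["JOB_TITLE"]].append(float(row["WAGE"]))
    return {title: mean(values) for title, values in wages.items()}


# With a real file, pass open("your_file.csv", newline="") instead
# ("your_file.csv" is a placeholder name).
averages = average_wage_by_title(io.StringIO(SAMPLE_CSV))
for title, avg in sorted(averages.items()):
    print(f"The average salary for a {title.lower()} was ${avg:,.0f}.")
```

Each printed line pairs a number with a description, which is the format the assignment asks for.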