City College, Spring 2019

Intro to Data Science

Course Intro: What is Data Science and Why Does It Matter?

January 28, 2019

What is Data Science?
Is that helpful?
Who are Data Scientists?
About Me

About Tech in Residence

About this Course
Official Course Objectives
  1. Explain the key steps in a data science project.
  2. Apply Python to load, clean, and process data sets.
  3. Identify key elements of and patterns in a data set using computational analysis and statistical methods.
  4. Explain and visualize empirical findings using Python and other resources.
  5. Explain fundamental principles of machine learning.
  6. Apply predictive algorithms to a data set.
  7. Work effectively in a team dedicated to analyzing data.
Why Take this Course?
  1. Careers in data are abundant, lucrative, and rewarding.
  2. Learn how to detect BS.
  3. Be a more informed person.
Resources: Coding

Resources: DataCamp
Resources: Notebooks

Resources: Class Communications
Course Page
How to Get Help
Project 30%
Assignments & Quizzes 30%
Midterm Exam 30%
Class Participation 10%

A major part of the course grade (30%) will come from a group project due at the last class on May 13. Students will be expected to work on the project during the second half of the class and will be required to present their progress throughout the semester. Grades will be assigned on the basis of overall project quality, demonstration of the core principles taught in class, and individual contributions to the group's effort. More details on the project will be discussed in the second week of class.

Assignments and Exams
  • Assignments. This class includes frequent assignments to check comprehension, predominantly through DataCamp. All assignments and quizzes will be graded on a 10-point scale. All quizzes will be announced in advance.
    • No late assignments accepted. Assignments not turned in by the set deadline will be scored as 0/10. Exceptions will be granted only as mandated by CUNY policy.
    • The two lowest assignment scores are dropped, including missed assignments.
  • Exam. A short midterm exam will be held in March and will focus on broad concepts the course has surveyed thus far. The format will mimic the style of questions frequently asked in interviews for data-related roles.
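The grading rules above combine a weighted average (30/30/30/10) with a drop-the-two-lowest rule. The sketch below shows one way to compute the result, under stated assumptions: every component except assignments is already a percentage, assignments and quizzes are pooled into one list of 10-point scores, and missed work is entered as 0. The helper names `assignment_percentage` and `course_grade` and the sample scores are hypothetical. This is an illustration, not the official calculation.

```python
# Hypothetical sketch of the grading scheme. Assumes project, midterm and
# participation are percentages (0-100) and that assignments/quizzes are a
# single list of raw 10-point scores. Not the official calculation.

WEIGHTS = {"project": 0.30, "assignments": 0.30, "midterm": 0.30, "participation": 0.10}


def assignment_percentage(scores, drop=2):
    """Average 10-point scores as a percentage after dropping the `drop` lowest.

    Missed or late assignments should be passed in as 0, so they are the
    first candidates to be dropped.
    """
    if len(scores) <= drop:
        raise ValueError("need more scores than the number dropped")
    kept = sorted(scores)[drop:]  # discard the lowest `drop` scores
    return 100 * sum(kept) / (10 * len(kept))


def course_grade(project, assignment_scores, midterm, participation):
    """Weighted final percentage (hypothetical helper)."""
    components = {
        "project": project,
        "assignments": assignment_percentage(assignment_scores),
        "midterm": midterm,
        "participation": participation,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


if __name__ == "__main__":
    # Made-up scores for illustration: two assignments were missed (0/10).
    scores = [9, 0, 8, 10, 7, 0, 9, 10]
    print(assignment_percentage(scores))
    print(course_grade(project=88, assignment_scores=scores, midterm=80, participation=100))
```

With these made-up scores the two missed assignments are the ones dropped, so the assignment component is 53/60 (about 88.3%) and the weighted total is about 86.9%.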
Texts and Materials
  • Required Text: Data Science from Scratch, Joel Grus. 2nd Edition, April 2015 (O'Reilly). Available online.
  • Additional required readings and videos will be made available to students in advance of each week's assignments. All will be available online at no cost.
  • In addition to the required materials, students may find the following resources helpful in supplementing course materials:
    • Recommended Text: Python for Data Analysis, Wes McKinney. 2nd Edition, October 2017 (O'Reilly). Available online.
    • Recommended Text: The Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani and Jerome Friedman. 2nd Edition, 2009 (Springer). Available free online.

Academic dishonesty is prohibited in The City University of New York. Penalties for academic dishonesty include academic sanctions, such as failing or otherwise reduced grades, and/or disciplinary sanctions, including suspension or expulsion.

CUNY Policy on Academic Integrity
Data Science in Practice
Today's Data
H1-B Visa Data
Let's Dive Into Some Data!


First assignment

Due Sunday, February 3 at 11:59pm
  • Send me an email with:
    1. Three datasets you're interested in working with over the course of the semester, including:
      • The URL providing access to and documentation for each of the data sets
      • One question you think you could answer with each of the datasets
      • One interesting summary statistic (mean, median, etc.) from one of the datasets
    2. How you prefer to be addressed in class: name and pronunciation.
    3. The email address you prefer to use for class correspondence (and for DataCamp).
    4. Your GitHub handle. (Sign up for one if you don't already have one; a free account is fine.)
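For the summary-statistic part of the assignment, a few lines of Python are enough. The sketch below uses only the standard library (`csv.DictReader` and the `statistics` module) and skips blank cells as a minimal cleaning step. The inline CSV, its `minutes` column, and the helper `column_summary` are hypothetical stand-ins for whatever dataset you pick. With a real file, pass `open("your_file.csv", newline="")` instead of the `io.StringIO` object.

```python
import csv
import io
import statistics

# Made-up data standing in for a real dataset (hypothetical columns).
SAMPLE_CSV = """name,minutes
a,12
b,30
c,
d,18
e,45
"""


def column_summary(file_obj, column):
    """Return count, mean and median of a numeric column, skipping blank cells."""
    reader = csv.DictReader(file_obj)
    values = []
    for row in reader:
        cell = (row.get(column) or "").strip()
        if cell:  # basic cleaning: ignore missing values
            values.append(float(cell))
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


if __name__ == "__main__":
    print(column_summary(io.StringIO(SAMPLE_CSV), "minutes"))
    # prints {'count': 4, 'mean': 26.25, 'median': 24.0}
```

Reporting the count next to the mean and median makes it clear how many rows were dropped as missing, which matters when you explain where your statistic came from.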