City College, Fall 2018
Intro to Data Science
Week 6: Regression vs. Classification
October 15, 2018
Today's Agenda
- Regression vs. Classification
- Logistic Regression
- Measuring Performance for Classification Models
- Midterm Recap
Week 5 Recap
- Linear Regression
- Assumptions for Linear Models
- Measuring Performance for Linear Models
HW Recap
- How was DataCamp?
- Data Exercise
- Was the exercise clear?
- Why would it be problematic to model with two collinear predictors?
- How would you apply linear regression to your project data?
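One way to see the collinearity problem from the question above is the sketch below. It assumes a no-intercept model with two predictors and made-up data (`x1`, `x2`, `x3` and the helper `gram_determinant` are hypothetical names). When one predictor is an exact multiple of the other, the matrix X'X used in least squares has determinant zero, so the coefficients have no unique solution.

```python
def gram_determinant(a, b):
    """Determinant of X'X for a two-column design matrix X = [a, b]."""
    saa = sum(u * u for u in a)
    sab = sum(u * v for u, v in zip(a, b))
    sbb = sum(v * v for v in b)
    return saa * sbb - sab * sab

# Hypothetical predictors: x2 is exactly 2 * x1, x3 is not a multiple of x1.
x1 = [1, 2, 3, 4]
x2 = [2, 4, 6, 8]
x3 = [1, 0, 2, 1]

det_collinear = gram_determinant(x1, x2)    # zero: X'X cannot be inverted
det_independent = gram_determinant(x1, x3)  # nonzero: a unique fit exists

print(det_collinear, det_independent)
```

With nearly (rather than exactly) collinear predictors the determinant is small instead of zero, and the fitted coefficients become very sensitive to small changes in the data.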
Data Science Models
Generalized linear model (GLM): a flexible generalization of ordinary linear regression that allows the response variable to have an error distribution other than the normal distribution.
Regression vs. Classification
Is this a good forecast?
Regression analysis estimates the conditional expectation of the dependent variable given the independent variables.
Classification is the problem of identifying to which of a set of categories a new observation belongs.
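The two definitions above can be contrasted with a small sketch. The data and the helper names `regress` and `classify` are made up for illustration: the regression helper estimates the conditional mean of a numeric outcome, while the classification helper returns the most common category among matching rows. Real models generalize to unseen inputs; this toy version only looks up values it has already seen.

```python
from collections import Counter
from statistics import mean

# Hypothetical observations: (hours_studied, exam_score, result)
observations = [
    (1, 52.0, "fail"), (1, 58.0, "fail"),
    (2, 61.0, "fail"), (2, 67.0, "pass"), (2, 70.0, "pass"),
    (3, 75.0, "pass"), (3, 79.0, "pass"),
]

def regress(hours):
    """Regression: estimate E[score | hours] as the mean score for that group."""
    return mean(score for h, score, _ in observations if h == hours)

def classify(hours):
    """Classification: pick the most common category for that group."""
    labels = [label for h, _, label in observations if h == hours]
    return Counter(labels).most_common(1)[0][0]

print(regress(3))   # a number on a continuous scale
print(classify(3))  # one label from a fixed set of categories
```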
Logistic Regression
Logistic Regression Output
Logistic regression and many other classification models output a continuous value between 0 and 1.
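A minimal sketch of where that 0-to-1 value comes from in logistic regression: a linear score is passed through the sigmoid function, and a threshold turns the result into a class label. The coefficients (`intercept`, `slope`) and the 0.5 threshold are assumptions chosen for illustration, not fitted values.

```python
import math

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # this branch avoids overflow for very negative z
    return ez / (1.0 + ez)

def predict_probability(x, intercept, slope):
    """Logistic regression output for one feature: sigmoid of a linear score."""
    return sigmoid(intercept + slope * x)

def predict_label(probability, threshold=0.5):
    """Turn the continuous output into a class (threshold is an assumption)."""
    return 1 if probability >= threshold else 0

# Hypothetical coefficients, for illustration only.
intercept, slope = -4.0, 2.0
for x in (1.0, 2.0, 3.0):
    p = predict_probability(x, intercept, slope)
    print(x, round(p, 3), predict_label(p))
```

Raising or lowering the threshold trades one kind of error for another, which is why the performance measures that follow matter.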
Measuring Classification Performance
- Confusion Matrix
- Precision
- Recall
- Accuracy
(as explained by the zombie apocalypse)
Confusion Matrix
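A confusion matrix counts how predicted labels line up with actual labels. The sketch below uses made-up labels in the spirit of the zombie example (1 = zombie, 0 = human); `confusion_counts` is a hypothetical helper, not a library function.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count the four cells of a binary confusion matrix."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["tp"] += 1   # predicted zombie, is a zombie
        elif p == positive:
            counts["fp"] += 1   # predicted zombie, is a human
        elif a == positive:
            counts["fn"] += 1   # predicted human, is a zombie
        else:
            counts["tn"] += 1   # predicted human, is a human
    return counts

# Hypothetical labels for ten individuals.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

cells = confusion_counts(actual, predicted)
print("            pred zombie  pred human")
print(f"is zombie   {cells['tp']:>11}  {cells['fn']:>10}")
print(f"is human    {cells['fp']:>11}  {cells['tn']:>10}")
```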
Precision
zombie apocalypse use case: you're hunting zombies, and you need to kill as many zombies as possible without killing any humans
Recall
zombie apocalypse use case: you discover a cure for zombies, but can only apply it to k infected people
Accuracy
zombie apocalypse use case: zombies have infected roughly half the population, and you're throwing them a party. You are putting together an invite list and want to make sure you invite an equal number of zombies and humans.
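The three measures above have standard definitions in terms of the four confusion-matrix counts. The sketch below uses made-up counts (`tp`, `fp`, `fn`, `tn` are hypothetical values) and assumes each denominator is nonzero.

```python
def precision(tp, fp):
    """Of everything predicted positive, the fraction that really is positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything that really is positive, the fraction that was found."""
    return tp / (tp + fn)

def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical counts, with "zombie" as the positive class.
tp, fp, fn, tn = 3, 1, 2, 4

print("precision:", precision(tp, fp))
print("recall:   ", recall(tp, fn))
print("accuracy: ", accuracy(tp, fp, fn, tn))
```

Accuracy is most informative when the classes are roughly balanced, as in the party example. When one class is rare, a model can score high accuracy while missing most of the positives, so precision and recall are the safer guides.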
Wrap Up
- Linear Regression
- Assumptions for Linear Models
- Measuring Performance for Linear Models
- Regression vs. Classification
- Logistic Regression
Midterm: October 22, 6:30pm
- 45 Minute Written Exam
- No computer needed, closed book, closed notes
- Part I: Multiple Choice
- Part II: Short Answer
Week 2: Where to Find Data
- The data science lifecycle.
- Structured vs unstructured data.
- Common sources of data.
- Common ways to access data.
Week 3: Processing and Cleaning Data
- Elements of the ETL Process
- Processing Tools
- Data Cleaning Considerations for Data Scientists
- Missing Values
- Handling Outliers
- Normalizing Data
Week 4: Statistics and the Stories We Tell Ourselves
- Types of Data
- Useful Statistical Distributions
- Important Summary Statistics
- Independence
- Key Theorems
Week 5: Intro to Linear Models
- What Makes Linear Regression Linear
- Assumptions for Linear Models
- Measuring Performance for Linear Models
Week 6: Regression vs Classification
- Regression vs. Classification
- Logistic Regression
- Measuring Performance for Classification Models