City College, Spring 2019
Intro to Data Science
Week 6: Regression vs. Classification
March 18, 2019
Today's Agenda
- Regression vs. Classification
- Logistic Regression
- Measuring Performance for Classification Models
- Midterm Recap
Week 5 Recap
- Linear Regression
- Assumptions for Linear Models
- Measuring Performance for Linear Models
Data Science Models
Key Metrics: Linear Regression
- R-squared
- Adjusted R-squared
- Coefficients
- P-values
R-Squared: Share of the target variation that is explained by the model.
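As a rough illustration of the definition above, R-squared can be computed as one minus the ratio of unexplained variation to total variation. The sketch below is plain Python; the function names and the sample numbers are made up for illustration. It also shows the usual adjusted R-squared formula, which penalizes the score for the number of predictors.

```python
def r_squared(y_true, y_pred):
    """Share of the variation in y_true that the predictions explain."""
    mean_y = sum(y_true) / len(y_true)
    # Unexplained variation: squared gaps between actual and predicted values
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total variation: squared gaps between actual values and their mean
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_features):
    """R-squared penalized for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical targets and predictions, for illustration only
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]

r2 = r_squared(y_true, y_pred)
adj_r2 = adjusted_r_squared(r2, n_samples=len(y_true), n_features=1)
print(r2, adj_r2)
```

Adjusted R-squared is never larger than R-squared, so a big gap between the two hints that some predictors add little.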
Generalized linear model (GLM): a flexible generalization of ordinary linear regression that allows for response variables whose error distribution is not normal.
Regression vs. Classification
Is this a good forecast?
Regression analysis estimates the conditional expectation of the dependent variable given the independent variables.
Classification is the problem of identifying to which of a set of categories a new observation belongs.
Logistic Regression
Logistic Regression Output
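Logistic regression and many other classification models output a continuous value between 0 and 1, usually read as the probability of the positive class. A threshold (commonly 0.5) turns that value into a class label.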
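A minimal sketch of how that 0-to-1 output arises in logistic regression: a linear score is passed through the sigmoid function, and a threshold converts the result into a label. The weight, bias, and 0.5 threshold below are hypothetical values chosen for illustration, not fitted parameters.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weight, bias):
    """Model output: a value between 0 and 1 for a single feature x."""
    return sigmoid(weight * x + bias)


def predict_label(x, weight, bias, threshold=0.5):
    """Turn the continuous output into a class label (1 or 0)."""
    return 1 if predict_proba(x, weight, bias) >= threshold else 0


# Hypothetical parameters, for illustration only
weight, bias = 1.5, -3.0

for x in [0.0, 2.0, 4.0]:
    p = predict_proba(x, weight, bias)
    print(x, round(p, 3), predict_label(x, weight, bias))
```

Raising the threshold makes the model more reluctant to predict the positive class; lowering it does the opposite.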
Measuring Classification Performance
- Confusion Matrix
- Precision
- Recall
- Accuracy
(as explained by the zombie apocalypse)
Confusion Matrix
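A confusion matrix tallies predictions against actual labels in four cells: true positives, false positives, false negatives, and true negatives. The sketch below counts them in plain Python, assuming 1 means "zombie" and 0 means "human"; the labels and the helper name are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for binary labels (1 = positive)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # zombie correctly flagged as zombie
        elif actual == 0 and predicted == 1:
            fp += 1  # human wrongly flagged as zombie
        elif actual == 1 and predicted == 0:
            fn += 1  # zombie missed
        else:
            tn += 1  # human correctly left alone
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


# Hypothetical labels: 1 = zombie, 0 = human
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(counts)
```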
Precision
zombie apocalypse use case: you're hunting zombies, and you need to kill as many zombies as possible without killing any humans
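Precision is the share of positive predictions that were actually positive: TP / (TP + FP). In the hunting scenario, every false positive is a human mistaken for a zombie. A small sketch with made-up counts:

```python
def precision(tp, fp):
    """Of everything flagged as positive, what fraction really was positive?"""
    flagged = tp + fp
    if flagged == 0:
        return 0.0  # convention chosen here: no positive predictions means 0.0
    return tp / flagged


# Hypothetical hunt: 10 targets flagged as zombies, 9 of them really were
hunt_precision = precision(tp=9, fp=1)
print(hunt_precision)
```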
Recall
zombie apocalypse use case: you discover a cure for zombies, but can only apply it to k infected people
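Recall is the share of actual positives that the model found: TP / (TP + FN). In the cure scenario, every false negative is an infected person who was missed. A small sketch with made-up counts:

```python
def recall(tp, fn):
    """Of everything that really was positive, what fraction did we find?"""
    actual_positives = tp + fn
    if actual_positives == 0:
        return 0.0  # convention chosen here: no actual positives means 0.0
    return tp / actual_positives


# Hypothetical outbreak: 40 people are infected, the model identifies 30 of them
cure_recall = recall(tp=30, fn=10)
print(cure_recall)
```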
Accuracy
zombie apocalypse use case: zombies have infected roughly half the population, and you're throwing them a party. You are putting together an invite list and want to make sure you invite an equal number of zombies and humans.
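Accuracy is the share of all predictions that were correct: (TP + TN) / total. It is most informative when the classes are roughly balanced, as in the party scenario. The sketch below uses made-up populations to show how a lazy model that always predicts "human" looks on a balanced population compared with an imbalanced one.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return correct / len(y_true)


# Hypothetical populations of 100 people: 1 = zombie, 0 = human
balanced = [1] * 50 + [0] * 50    # half zombies, half humans
imbalanced = [1] * 5 + [0] * 95   # only 5 percent zombies

# A lazy "model" that labels everyone as human
always_human = [0] * 100

balanced_acc = accuracy(balanced, always_human)
imbalanced_acc = accuracy(imbalanced, always_human)
print(balanced_acc, imbalanced_acc)
```

The lazy model scores far higher on the imbalanced population without finding a single zombie, which is why accuracy alone can mislead when one class is rare.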
Midterm: March 25, 6:30pm
- 90 Minute Written Exam
- No computer needed, closed book, closed notes
- Part I: Multiple Choice
- Part II: Short Answer
Week 2: Where to Find Data
- The data science lifecycle.
- Structured vs unstructured data.
- Common sources of data.
- Common ways to access data.
Week 3: Processing and Cleaning Data
- Elements of the ETL Process
- Processing Tools
- Data Cleaning Considerations for Data Scientists
- Missing Values
- Handling Outliers
- Normalizing Data
Week 4: Statistics and the Stories We Tell Ourselves
- Types of Data
- Useful Statistical Distributions
- Important Summary Statistics
- Independence
- Key Theorems
Week 5: Intro to Linear Models
- What Makes Linear Regression Linear
- Assumptions for Linear Models
- Measuring Performance for Linear Models
Week 6: Regression vs Classification
- Regression vs. Classification
- Logistic Regression
- Measuring Performance for Classification Models