### Intro to Data Science

Week 6: Regression vs. Classification

October 15, 2018

###### Today's Agenda
1. Regression vs. Classification
2. Logistic Regression
3. Measuring Performance for Classification Models
4. Midterm Recap
###### Week 5 Recap
1. Linear Regression
2. Assumptions for Linear Models
3. Measuring Performance for Linear Models
###### HW Recap
1. How was DataCamp?
2. Data Exercise
• Was the exercise clear?
• Why would it be problematic to model with two collinear predictors?
• How would you apply linear regression to your project data?
###### Data Science Models
###### Generalized Linear Models

A flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution.
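
The definition above can be written compactly as follows. This is a sketch in standard textbook notation, which the slides do not spell out: the link function $g$, the coefficients $\beta_j$, and the predictors $x_j$ are assumed symbols.

```latex
g\bigl(\mathbb{E}[Y \mid x_1, \dots, x_p]\bigr) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p
```

With the identity link $g(\mu) = \mu$ and normally distributed errors, this reduces to ordinary linear regression. With the logit link $g(\mu) = \log\frac{\mu}{1 - \mu}$ and a binary (Bernoulli) response, it gives logistic regression.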

###### Regression vs. Classification

Is this a good forecast?

Regression analysis estimates the conditional expectation of the dependent variable given the independent variables.

Classification is the problem of identifying to which of a set of categories a new observation belongs.
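
To make the contrast concrete, here is a minimal sketch on made-up data. The dataset, the variable names, and the two helper functions are all hypothetical and not from the course materials. The regression view returns a number (an estimate of the expected outcome at each predictor value), while the classification view returns a category.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical toy data: hours studied (x), exam score (numeric y),
# and pass/fail outcome (categorical y).
hours = [1, 1, 1, 2, 2, 2]
scores = [50.0, 55.0, 60.0, 70.0, 80.0, 90.0]
passed = ["fail", "fail", "pass", "pass", "pass", "fail"]


def conditional_mean(xs, ys):
    """Regression view: estimate E[y | x] as the average y observed at each x."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {x: mean(vals) for x, vals in groups.items()}


def majority_class(xs, labels):
    """Classification view: assign each x the most common category seen at that x."""
    groups = defaultdict(list)
    for x, label in zip(xs, labels):
        groups[x].append(label)
    return {x: Counter(vals).most_common(1)[0][0] for x, vals in groups.items()}


expected_score = conditional_mean(hours, scores)
predicted_label = majority_class(hours, passed)

print(expected_score)   # numeric predictions per value of x
print(predicted_label)  # categorical predictions per value of x
```

Real models generalize to predictor values that were never observed, which these lookup tables cannot do. The sketch only illustrates the difference in what each kind of model outputs.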

###### Logistic Regression
###### Logistic Regression Output

Logistic regression and many other classification models output a continuous value between 0 and 1, which is typically interpreted as the probability of belonging to the positive class.
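
As a rough illustration of where that 0-to-1 value comes from, the sketch below applies the logistic (sigmoid) function to a linear combination of one predictor, then converts the result to a class label with a cutoff. The coefficients and the 0.5 threshold are assumptions chosen for illustration, not values from the course.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, intercept, slope):
    """Logistic regression with one predictor: sigmoid of a linear combination."""
    return sigmoid(intercept + slope * x)


def predict_label(probability, threshold=0.5):
    """Turn the continuous output into a class label (1 or 0) using a cutoff."""
    return 1 if probability >= threshold else 0


# Hypothetical coefficients, chosen only for illustration (not fitted to data).
intercept, slope = -3.0, 1.5

for x in [0.0, 2.0, 4.0]:
    p = predict_probability(x, intercept, slope)
    print(x, round(p, 3), predict_label(p))
```

The threshold is a modeling choice: raising it makes the model more conservative about predicting the positive class, and lowering it makes the model more liberal.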

###### Measuring Classification Performance
1. Confusion Matrix
2. Precision
3. Recall
4. Accuracy
(as explained by the zombie apocalypse)
###### Confusion Matrix
###### Precision

Zombie apocalypse use case: you're hunting zombies, and you need to kill as many zombies as possible without killing any humans.

###### Recall

Zombie apocalypse use case: you discover a cure for zombies, but can only apply it to k infected people.

###### Accuracy

Zombie apocalypse use case: zombies have infected roughly half the population, and you're throwing them a party. You are putting together an invite list and want to make sure you invite an equal number of zombies and humans.
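
The four measures can be sketched in a few lines using their standard definitions. The labels below are made up, "zombie" is treated as the positive class, and returning 0.0 when a denominator is zero is an assumed convention. Precision asks "of everyone I called a zombie, how many really were?", recall asks "of all the real zombies, how many did I find?", and accuracy asks "how many calls did I get right overall?".

```python
def confusion_counts(actual, predicted, positive="zombie"):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # called zombie, really a zombie
        elif p == positive:
            fp += 1  # called zombie, actually human
        elif a == positive:
            fn += 1  # called human, actually a zombie
        else:
            tn += 1  # called human, really human
    return tp, fp, fn, tn


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Hypothetical ground truth and predictions for ten individuals.
actual = ["zombie"] * 4 + ["human"] * 6
predicted = ["zombie", "zombie", "human", "human",
             "zombie", "human", "human", "human", "human", "human"]

tp, fp, fn, tn = confusion_counts(actual, predicted)
print("confusion matrix counts:", tp, fp, fn, tn)
print("precision:", precision(tp, fp))
print("recall:", recall(tp, fn))
print("accuracy:", accuracy(tp, fp, fn, tn))
```

Accuracy is most informative when the classes are roughly balanced, as in the party example. With heavily imbalanced classes, precision and recall usually say more.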

###### Wrap Up
1. Linear Regression
2. Assumptions for Linear Models
3. Measuring Performance for Linear Models
4. Regression vs. Classification
5. Logistic Regression
###### Midterm: October 22, 6:30pm
• 45-Minute Written Exam
• No computer needed, closed book, closed notes
• Part I: Multiple Choice
###### Review
###### Week 2: Where to Find Data
• The data science lifecycle.
• Structured vs unstructured data.
• Common sources of data.
• Common ways to access data.
###### Week 3: Processing and Cleaning Data
• Elements of the ETL Process
• Processing Tools
• Data Cleaning Considerations for Data Scientists
• Missing Values
• Handling Outliers
• Normalizing Data
###### Week 4: Statistics and the Stories We Tell Ourselves
• Types of Data
• Useful Statistical Distributions
• Important Summary Statistics
• Independence
• Key Theorems
###### Week 5: Intro to Linear Models
• What Makes Linear Regression Linear
• Assumptions for Linear Models
• Measuring Performance for Linear Models
###### Week 6: Regression vs. Classification
• Regression vs. Classification
• Logistic Regression
• Measuring Performance for Classification Models