### Intro to Data Science

Week 5: Intro to Linear Models

March 11, 2019

###### Week 4 Recap
1. Types of Data
2. Useful Statistical Distributions
3. Important Summary Statistics
4. Key Theorems
###### Course Schedule
1. Project Proposals: due March 18, details here
2. Midterm: March 25
• Drawn primarily from course lectures
• Mix of multiple choice and short answer
• Similar to fall's exam, but longer in content and time
3. Project Update: April 29
4. Project Deadline & Presentation: Wednesday May 15
###### Today's Agenda
1. Modeling Theory
2. Linear Regression
3. Assumptions for Linear Models
4. Measuring Performance for Linear Models

#### Occam's Razor

"Entities should not be multiplied unnecessarily."

#### Occam's Razor

Simple models are preferable to more complex models.

###### George Box

"All models are wrong... but some are useful."

###### Data Science Models

###### What is linear regression?

###### Why do we call this a linear regression?

A model is linear when each term is either a constant or the product of a parameter and a predictor variable.
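Written out in the usual notation (the symbols $\beta_j$, $x_j$, and $\varepsilon$ are assumed here, not taken from the slides), a linear model with $p$ predictors plus an error term has the form:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon
$$

By this definition $y = \beta_0 + \beta_1 x^2$ still counts as linear, since $x^2$ can be treated as a predictor multiplied by a parameter. In contrast, $y = \beta_0 e^{\beta_1 x}$ is not linear, because the parameter $\beta_1$ sits inside the exponent.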

###### What are we trying to solve for?

Demo
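One common answer, assuming the model is fit by ordinary least squares (the slides do not name the fitting method), is that we solve for the intercept and slope that minimize the sum of squared residuals. The following is a minimal sketch for a single predictor using only the standard library. The function name `fit_simple_ols` and the toy data are hypothetical, chosen for illustration.

```python
from statistics import mean


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Closed-form least-squares solution for one predictor:
    #   slope = Sxy / Sxx,  intercept = y_bar - slope * x_bar
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x values must not all be identical")
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Toy data (made up for illustration): points lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_simple_ols(xs, ys)
print(intercept, slope)  # 1.0 2.0
```

With more than one predictor the same idea applies, but the solution is usually computed with matrix routines rather than these scalar formulas.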
###### Assumptions of Linear Regression
1. Data is linear in form.
2. Sample is random.
3. Error terms have constant variance (homoscedasticity).
4. Error terms have a mean of zero based on the observed data.
5. Predictors are independent (no multicollinearity).
6. Errors are normally distributed.
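Some of these assumptions can be checked roughly from the residuals of a fitted line. The sketch below covers only three of them (zero-mean errors, constant variance, and correlated predictors), and the checks are informal heuristics rather than formal tests. All function names and the toy data are hypothetical.

```python
from statistics import mean, pvariance


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def residuals(xs, ys):
    """Observed minus fitted values for the least-squares line."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def variance_ratio(xs, res):
    """Crude constant-variance check: residual variance in the larger-x half
    divided by residual variance in the smaller-x half. Values far from 1
    hint at heteroscedasticity."""
    ordered = [r for _, r in sorted(zip(xs, res))]
    half = len(ordered) // 2
    return pvariance(ordered[half:]) / pvariance(ordered[:half])


def pearson(a, b):
    """Pearson correlation; values near +1 or -1 between two predictors
    suggest multicollinearity."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    norm = (sum((x - a_bar) ** 2 for x in a) * sum((y - b_bar) ** 2 for y in b)) ** 0.5
    return cov / norm


# Toy data (made up): the noise grows with x, so the variance is not constant.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
ys = [2.0 * x + e for x, e in zip(xs, noise)]

res = residuals(xs, ys)
print("mean residual:", mean(res))               # close to zero by construction of OLS
print("variance ratio:", variance_ratio(xs, res))  # much larger than 1 for this data
print("predictor correlation:", pearson(xs, [2.0 * x for x in xs]))
```

Note that a least-squares fit with an intercept always produces residuals that average to zero on the observed data, so the first check confirms the fit rather than the underlying assumption. Linearity and normality of errors are usually judged from residual plots, which this sketch does not draw.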
###### Data is linear in form.

###### Sample is random.

###### Error terms have constant variance (homoscedasticity).

###### Errors are uncorrelated.

###### Predictors are independent (no multicollinearity).

in the case of:

###### Errors are normally distributed.

###### Output

###### Key Metrics of Success
1. R-squared
2. Coefficients
3. P-values
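To make these metrics concrete, the sketch below computes all three for a one-predictor least-squares fit using only the standard library. The function name `summarize_fit` and the toy data are hypothetical. The p-value uses a normal approximation, which is an assumption of this sketch; statistical libraries such as SciPy's `scipy.stats.linregress` use the t distribution instead.

```python
import math
from statistics import mean


def summarize_fit(xs, ys):
    """Fit y = intercept + slope * x by least squares and report key metrics.

    Requires more than 2 points and data that is not perfectly linear,
    otherwise the standard error of the slope is zero.
    """
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Coefficients: estimated change in y per unit of x, and the baseline.
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    # R-squared: share of the variance in y explained by the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1.0 - sse / sst

    # P-value for the null hypothesis that the true slope is zero.
    slope_se = math.sqrt(sse / (n - 2) / s_xx)
    t_stat = slope / slope_se
    # ASSUMPTION: two-sided p-value from the standard normal, a large-sample
    # approximation to the t distribution with n - 2 degrees of freedom.
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))

    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": r_squared,
        "p_value": p_value,
    }


# Toy data (made up for illustration): roughly y = 2x with small noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
for name, value in summarize_fit(xs, ys).items():
    print(f"{name}: {value:.4g}")
```

For small samples like this one, the normal approximation understates the p-value relative to the t distribution, so treat the number as illustrative only.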
###### R-Squared

###### Coefficients

###### P-values

(xkcd comic)
###### Does linear regression prove causality? 