Lecture 5

City College, Fall 2018

Intro to Data Science

Week 5: Intro to Linear Models

October 1, 2018

Today's Agenda

Linear Regression
Assumptions for Linear Models
Measuring Performance for Linear Models
Regression vs. Classification
Logistic Regression

Week 4 Recap

Types of Data
Useful Statistical Distributions
Important Summary Statistics
Key Theorems

HW Recap

How was DataCamp?
Project Updates

What are the key relationships from the data sets you've selected?
How might you apply a model to your data?

Data Science Models

What is linear regression?

Why do we call this a linear regression?

A model is linear when each term is either a constant or the product of a parameter and a predictor variable.
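To illustrate the definition above: a model can contain a transformed predictor such as x squared and still be linear, because each term is a parameter times a predictor. The sketch below uses made-up, noise-free data and NumPy; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 2 + 3 * x**2.
x = np.arange(1.0, 7.0)
y = 2.0 + 3.0 * x**2

# Design matrix: a column of ones (the constant term) and the
# transformed predictor x**2. The model is linear in its parameters.
X = np.column_stack([np.ones_like(x), x**2])

# Least squares recovers both parameters from the data.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope_on_x_squared = coef
print(intercept, slope_on_x_squared)
```

By contrast, a model such as y = exp(b * x) is not linear, because the parameter b does not simply multiply a predictor.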

What are we trying to solve for?
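Assuming the fitting method is ordinary least squares, we solve for the intercept and slope that minimize the sum of squared residuals. A minimal sketch for one predictor, using made-up data and the closed-form solution:

```python
import numpy as np

# Hypothetical data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least-squares estimates for a single predictor.
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Sum of squared residuals for the fitted line.
residuals = y - (intercept + slope * x)
sse = np.sum(residuals ** 2)

# A line with a slightly different slope fits worse (larger SSE).
sse_other = np.sum((y - (intercept + (slope + 0.1) * x)) ** 2)
print(slope, intercept, sse, sse_other)
```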

You try.

Demo

Assumptions of Linear Regression

Data is linear in form.
Sample is random.
Error terms have constant variance (homoscedasticity).
Error terms have a mean of zero, given the observed predictors.
Predictors are independent (no multicollinearity).
Errors are normally distributed.
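Several of the assumptions above are usually checked by looking at the residuals of a fitted model. The sketch below is a rough check on simulated data, not a formal test: it looks at the mean of the residuals and compares their spread for low and high fitted values. The data and the split into halves are assumptions made for illustration.

```python
import numpy as np

# Simulated data that satisfies the assumptions (hypothetical example).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=200)

# Fit a line by least squares and compute the residuals.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
residuals = y - fitted

# With an intercept in the model, the residuals average to zero.
mean_residual = residuals.mean()

# Rough constant-variance check: compare residual variance in the
# lower and upper halves of the fitted values. A ratio near 1 is
# consistent with constant variance.
order = np.argsort(fitted)
low_half = residuals[order[:100]]
high_half = residuals[order[100:]]
variance_ratio = high_half.var(ddof=1) / low_half.var(ddof=1)
print(mean_residual, variance_ratio)
```

In practice a plot of residuals against fitted values is the more common way to spot a funnel shape or curvature.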

Data is linear in form.

Sample is random.

Error terms have constant variance (homoscedasticity).

Errors are uncorrelated.

Predictors are independent (no multicollinearity).
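One common way to detect multicollinearity is the variance inflation factor (VIF): regress each predictor on the others and compute 1 / (1 - R-squared). The sketch below is a NumPy-only version on simulated data; the helper name `variance_inflation_factors` is hypothetical and is not a library function.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on the remaining predictors plus a constant.
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r_squared = 1.0 - resid.var() / target.var()
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)

# Simulated predictors: c is nearly a copy of a, b is unrelated.
rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + 0.1 * rng.normal(size=300)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(vifs)
```

A VIF near 1 suggests a predictor is not explained by the others. Values above roughly 5 or 10 are often treated as a warning sign, though that threshold is a rule of thumb.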

Errors are normally distributed.

Output

Key Metrics of Success

R-squared
Adjusted R-squared
Coefficients
P-values

R-squared

Adjusted R-squared
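R-squared is the share of variance in y explained by the model. Adjusted R-squared penalizes it for the number of predictors. A minimal sketch using the standard formulas, with made-up values and illustrative function names:

```python
import numpy as np

def r_squared(y, y_pred):
    """R-squared = 1 - SS_residual / SS_total."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjust R-squared for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical observed values and model predictions.
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([3.5, 4.5, 7.0, 9.5, 10.5])

r2 = r_squared(y, y_pred)
adj_r2 = adjusted_r_squared(r2, n_obs=len(y), n_predictors=1)
print(r2, adj_r2)
```

Adding a predictor never lowers R-squared on the training data, but it can lower adjusted R-squared, which is why the adjusted version is preferred when comparing models of different sizes.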

Coefficients

P-values

xkcd

Does linear regression prove causality?
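A regression slope measures association, not causation. The simulation below is a sketch on made-up data: a hidden variable z drives both x and y, so regressing y on x gives a clearly nonzero slope even though x has no effect on y. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
z = rng.normal(size=n)               # hidden common cause
x = z + 0.5 * rng.normal(size=n)     # x depends on z only
y = z + 0.5 * rng.normal(size=n)     # y depends on z only, never on x

# Regressing y on x alone shows a strong positive slope.
slope_x_only = np.polyfit(x, y, 1)[0]

# Including z in the model makes the coefficient on x shrink toward zero.
X = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
slope_x_given_z = coef[1]
print(slope_x_only, slope_x_given_z)
```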

This Week's Data

Your turn.

Assignment 5: Due Monday, October 15 by 6:30pm

Part I: DataCamp

This week's DataCamp assignment covers two chapters of Supervised Learning with scikit-learn, but does not require the full course. It should be shorter than past DataCamp assignments and will appear in your DataCamp account.

Part II: Linear Regression Practice

The assignment notebook has a few short questions building on today's lecture and data exercise. To obtain full credit, complete the exercises as instructed in each notebook cell. To submit, upload your completed Jupyter notebook (*.ipynb files only) through Blackboard.

Project Proposal: Due Friday, October 19 by 11:59pm

Details on the project are now available here.