Week 10: Performance Evaluation and Ensemble Models

November 12, 2018

- Project Notes
- Model Fit
- Ensemble Models

*Democratic strategist David Axelrod predicted that poll performance “is going to prompt another round of soul-searching about whether and how you can poll accurately, because a lot of these races that were blowouts tonight or apparently blowouts tonight polled as tough races.”*

*Democratic pollster Andrew Baumann called the pre-election polls “quite accurate, particularly for a midterm that ended up being totally different than any previous midterm.”*

| | Actual Dem | Actual Rep |
|---|---|---|
| Predicted Dem | 215 | 9 |
| Predicted Rep | 16 | 195 |

Based on the FiveThirtyEight (538) Deluxe model for 11/05/18.
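
The usual summary metrics can be read straight off the confusion matrix. The sketch below treats "Dem" as the positive class, which is an arbitrary choice for illustration; the variable names are hypothetical.

```python
# Counts from the confusion matrix above, treating "Dem" as the positive class.
tp = 215  # predicted Dem, actually Dem
fp = 9    # predicted Dem, actually Rep
fn = 16   # predicted Rep, actually Dem
tn = 195  # predicted Rep, actually Rep

total = tp + fp + fn + tn
accuracy = (tp + tn) / total    # share of all races called correctly
precision = tp / (tp + fp)      # share of predicted Dem seats that went Dem
recall = tp / (tp + fn)         # share of actual Dem seats that were predicted Dem
specificity = tn / (tn + fp)    # share of actual Rep seats that were predicted Rep

print(f"races={total} accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f}")
# prints: races=435 accuracy=0.943 precision=0.960 recall=0.931 specificity=0.956
```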

Rank all observations by their predicted probability of belonging to the positive class, and chart the cumulative share of actual positives captured by the first x observations, where x ranges from 1 to the total number of observations.

*Demonstrates the model's ability to outperform a random ordering at identifying positives across decision thresholds.*

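
A minimal sketch of the ranking procedure described above, often called a cumulative gains chart. The function name `cumulative_gains` and the toy labels and scores are hypothetical; plotting is left out, but `gains` and `baseline` are the two curves one would chart against x.

```python
import numpy as np

def cumulative_gains(y_true, y_score):
    """Share of all actual positives captured by the top-x ranked observations, x = 1..n."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    order = np.argsort(-y_score, kind="stable")  # highest predicted probability first
    captured = np.cumsum(y_true[order])           # positives found among the first x rows
    return captured / y_true.sum()                # normalize by the total number of positives

# Toy labels (1 = positive) and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

gains = cumulative_gains(y_true, y_score)
# A random ordering is expected to capture x/n of the positives after x observations.
baseline = np.arange(1, len(y_true) + 1) / len(y_true)
print(gains)
print(baseline)
```

A useful model's curve rises above the diagonal baseline early, because its highest-ranked observations are disproportionately positive.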

Plot the true positive rate against the false positive rate at every possible threshold from highest to lowest.

*Demonstrates the model's ability to outperform random guessing across decision thresholds while weighing false positives against false negatives.*

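
The threshold sweep can be sketched by hand: start above every score (nothing predicted positive), lower the threshold one distinct score at a time, and record the (false positive rate, true positive rate) pair at each step. The helper `roc_points` and the toy data are hypothetical; the area under the resulting curve is computed with the trapezoidal rule and checked against scikit-learn's `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def roc_points(y_true, y_score):
    """(FPR, TPR) pairs, sweeping the threshold from the highest score to the lowest."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing predicted positive
    for t in np.unique(y_score)[::-1]:
        predicted_pos = y_score >= t
        tpr.append(np.sum(predicted_pos & (y_true == 1)) / n_pos)
        fpr.append(np.sum(predicted_pos & (y_true == 0)) / n_neg)
    return np.array(fpr), np.array(tpr)

# Toy labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

fpr, tpr = roc_points(y_true, y_score)
# Trapezoidal rule: sum of segment widths times average segment heights.
auc_manual = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
print(auc_manual, roc_auc_score(y_true, y_score))
```

In practice `sklearn.metrics.roc_curve` returns the FPR/TPR points directly (by default it drops collinear intermediate points, so its arrays can be shorter than the hand-rolled version).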

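The total area under the ROC curve (AUC). An AUC of 0.5 corresponds to random guessing; 1.0 corresponds to a perfect ranking of positives above negatives.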

scikit-learn can calculate this for you (`sklearn.metrics.roc_auc_score`).

- Logistic Regression
- Decision Trees
- Random Forest
- Gradient Boosting Machine
- Naive Bayes
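
All five model families above have scikit-learn implementations. A minimal comparison sketch, assuming a synthetic dataset from `make_classification` as a stand-in for real project data and mostly default hyperparameters (the `max_depth=5` tree and `n_estimators=200` forest are arbitrary choices), scores each on held-out AUC:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (a stand-in for a real dataset).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Column 1 of predict_proba is the predicted probability of the positive class.
    scores[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name:20s} test AUC = {scores[name]:.3f}")
```

The ranking of models on synthetic data says little about any particular project; the point is that one metric (here AUC) on one held-out split makes the models directly comparable.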

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.
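
One simple way to combine different algorithms is soft voting, which averages the members' predicted probabilities. This sketch uses scikit-learn's `VotingClassifier` on synthetic data; the choice of members is arbitrary, and the ensemble is not guaranteed to beat its best member on every dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

base_models = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("nb", GaussianNB()),
]

# voting="soft" averages predict_proba across members (VotingClassifier fits clones).
ensemble = VotingClassifier(estimators=base_models, voting="soft")
ensemble.fit(X_train, y_train)

auc = {}
for name, model in base_models:
    model.fit(X_train, y_train)  # fit each member separately to compare it alone
    auc[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
auc["ensemble"] = roc_auc_score(y_test, ensemble.predict_proba(X_test)[:, 1])
print(auc)
```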

*Random forests construct a multitude of decision trees at training time and output the class that is the mode of the classes predicted by the individual trees.*

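
A from-scratch sketch of the idea: each tree is trained on a bootstrap sample and considers a random subset of features at each split (`max_features="sqrt"`), and the forest predicts the mode of the trees' votes. The helper names `fit_forest` and `predict_forest` are hypothetical. Note that scikit-learn's own `RandomForestClassifier` averages the trees' predicted probabilities rather than taking a hard majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, seed=0):
    """Fit trees on bootstrap samples; each split considers a random feature subset."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap sample, drawn with replacement
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_forest(trees, X):
    """Mode (majority vote) of the trees' predicted class labels for each row."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_rows)
    return np.array([np.bincount(col).argmax() for col in votes.T])

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
trees = fit_forest(X[:400], y[:400])
pred = predict_forest(trees, X[400:])
print("hold-out accuracy:", np.mean(pred == y[400:]))
```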


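*Gradient boosting produces a prediction model in the form of an ensemble of weak prediction models, typically shallow decision trees. It builds the model in a stage-wise fashion, as other boosting methods do, and generalizes them by allowing optimization of an arbitrary differentiable loss function.*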

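
A minimal regression sketch of the stage-wise idea, assuming squared-error loss: each new shallow tree is fit to the negative gradient of the loss (the "pseudo-residuals"), and its prediction is added with a small learning rate. For squared error the negative gradient is just the ordinary residual; swapping in another differentiable loss changes only that line. The helper names and toy sine data are hypothetical; scikit-learn's `GradientBoostingRegressor` and `GradientBoostingClassifier` are the library versions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_stages=100, learning_rate=0.1, max_depth=2):
    """Stage-wise boosting under squared-error loss L(y, F) = (y - F)**2 / 2."""
    f0 = float(np.mean(y))            # constant model that minimizes squared error
    F = np.full(len(y), f0)           # current ensemble prediction for each row
    trees = []
    for _ in range(n_stages):
        residuals = y - F             # negative gradient of the loss with respect to F
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)        # weak learner fit to the pseudo-residuals
        F = F + learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, learning_rate, trees

def predict_gradient_boosting(model, X):
    f0, learning_rate, trees = model
    prediction = np.full(len(X), f0)
    for tree in trees:
        prediction = prediction + learning_rate * tree.predict(X)
    return prediction

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

model = fit_gradient_boosting(X, y)
mse_constant = np.mean((y - model[0]) ** 2)
mse_boosted = np.mean((y - predict_gradient_boosting(model, X)) ** 2)
print(mse_constant, mse_boosted)
```

The small learning rate (shrinkage) trades more stages for less overfitting per stage; these are training-set errors, so a real evaluation would use held-out data.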