Week 11: NLP, Text as Data, and Bayes Rule
November 18, 2019Due this Saturday:
Models with multicollinearity can still be useful.
Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages.
Map words
to feature vectors.
Advantages:
Disadvantages:
How to calculate?
Make some assumptions.
The conditional independence assumption: For conditional independence to hold true, we're assuming no words are more likely to appear with each other than any others.
Examples: "hot" and "dog", "ball" and "game", harry" and "potter", "computer" and "science"
For more, check out the Stanford parser and named entity recognizer and this interactive topic modeling explorer.
Multiple Choice | Short Answer | Exam Total | |
---|---|---|---|
Points Possible | 70 | 66 | 136 |
Mean | 56 | 48 | 104 |
Median | 57 | 49 | 104 |
Std Dev | 5.4 | 11.9 | 16.0 |