Analyzing product sentiment

In this module, we focused on classifiers, applying them to analyzing product sentiment, and understanding the types of errors a classifier makes. We also built an exciting IPython notebook for analyzing the sentiment of real product reviews.
In this assignment, we are going to explore this application further: training a sentiment analysis model using a set of key polarizing words, verifying the weights learned for each of these words, and comparing the results of this simpler classifier with those of the one using all of the words. These techniques will be a core component of your capstone project.
Follow the rest of the instructions on this page to complete your program. When you are done, instead of uploading your code, you will answer a series of quiz questions (see the quiz after this reading) to document your completion of this assignment. The instructions will indicate what data to collect for answering the quiz.

Learning outcomes

Execute sentiment analysis code with the IPython notebook
Load and transform real text data
Use the .apply() function to create new columns (features) for our model
Compare results of two models, one using all words and the other using a subset of the words
Compare learned models with majority class prediction
Examine the predictions of a sentiment model
Build a sentiment analysis model using a classifier
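
The majority class comparison listed above can be sketched in a few lines. This is a minimal illustration in plain Python, not the notebook's code: the helper name `majority_class_accuracy` and the label values are made up for the example. The idea is that a classifier that always predicts the most common label sets a baseline accuracy that any learned model should beat.

```python
from collections import Counter

def majority_class_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    # most_common(1) returns [(label, count)] for the most frequent label.
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)

# Made-up sentiment labels: 1 = positive review, 0 = negative review.
labels = [1, 1, 1, 0, 1, 0, 1, 1]
baseline = majority_class_accuracy(labels)
print(baseline)  # 0.75, since 6 of the 8 labels are positive
```

If a learned model's accuracy on the same data is not clearly above this baseline, it has learned little beyond the class balance.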

Resources you will need

You will need to install the software tools or use the free Amazon EC2 machine. Instructions for both options are provided in the reading for Module 1.

Download the data and starter code

Before getting started, you will need to download the dataset and the starter IPython notebook that we used in the module.

Download the product review dataset here in SFrame format

What you will do

Now you are ready! We are going to do four tasks in this assignment. There are several results you need to gather along the way to enter into the quiz after this reading.
In the IPython notebook above, we used the word counts for all words in the reviews to train the sentiment classifier model.
Now, we are going to follow a similar path, but only use this subset of the words:

Often, ML practitioners will throw out words they consider "unimportant" before training their model. This procedure can often be helpful in terms of accuracy. Here, we are going to throw out all words except for the very few above. Using so few words in our model will hurt our accuracy, but help us interpret what our classifier is doing.
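
As a rough sketch of this filtering step, the snippet below reduces a review's word counts to a chosen subset of words. It uses plain Python rather than the notebook's SFrame tools, and the three-word list, the helper names `word_counts` and `count_selected`, and the sample review are all hypothetical placeholders, not the assignment's actual word list or data.

```python
from collections import Counter

# Hypothetical placeholder subset; substitute the assignment's word list.
selected_words = ["great", "love", "terrible"]

def word_counts(text):
    """Count lowercase, whitespace-separated tokens in a review."""
    return Counter(text.lower().split())

def count_selected(counts, words):
    """Keep only the counts for the chosen words (0 when a word is absent)."""
    return {w: counts.get(w, 0) for w in words}

# Made-up review text for illustration.
review = "great stroller and great price we love it"
features = count_selected(word_counts(review), selected_words)
print(features)  # {'great': 2, 'love': 1, 'terrible': 0}
```

In the notebook, a function like `count_selected` would typically be applied to every row's word counts with .apply(), giving one new column (feature) per selected word. With only a handful of features, each learned weight can be read directly as how strongly that word pushes a prediction toward positive or negative.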