overview:
Review text usually carries emotional information, which is very useful for evaluating the quality of a product. In this project, we will train a model to classify a sentence into one of three sentiments: positive, negative, or neutral.
dataset:
We will use the Amazon Customer Reviews Dataset, which is provided by Amazon. The dataset covers many product categories; we will use the book review data, which is about 4.4 GB in size. This is a link with full information about the data. The dataset has many attributes, but we will mainly use the review text as training data.
amazon_reviews_parquet(
marketplace string,
customer_id string,
review_id string,
product_id string,
product_parent string,
product_title string,
star_rating int,
helpful_votes int,
total_votes int,
vine string,
verified_purchase string,
review_headline string,
review_body string,
review_date bigint,
year int)
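The schema has no explicit sentiment field, so the three labels have to come from somewhere. One possible approach, sketched below, is to derive them from `star_rating`. The thresholds (1-2 stars negative, 3 neutral, 4-5 positive) and the helper names `star_to_sentiment` and `build_examples` are assumptions made for illustration; the dataset itself does not define them, and the sample rows are invented.

```python
def star_to_sentiment(star_rating):
    """Map a 1-5 star rating to a sentiment label (assumed thresholds)."""
    if not 1 <= star_rating <= 5:
        raise ValueError(f"star_rating out of range: {star_rating}")
    if star_rating <= 2:
        return "negative"
    if star_rating == 3:
        return "neutral"
    return "positive"


def build_examples(rows):
    """Turn review rows (dicts keyed by schema column names) into (text, label) pairs.

    Rows with an empty or missing review_body are skipped.
    """
    examples = []
    for row in rows:
        text = (row.get("review_body") or "").strip()
        if text:
            examples.append((text, star_to_sentiment(row["star_rating"])))
    return examples


# Hypothetical rows, made up for illustration only.
sample_rows = [
    {"review_body": "Loved every chapter.", "star_rating": 5},
    {"review_body": "It was fine, nothing special.", "star_rating": 3},
    {"review_body": "", "star_rating": 1},
    {"review_body": "Could not finish it.", "star_rating": 2},
]
examples = build_examples(sample_rows)
```

Treating 3 stars as neutral is only one convention; the cut-offs may need adjusting once the label distribution of the book reviews is known.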
method:
We will use a convolutional neural network (CNN) as the main classification model, since CNNs have been shown to be effective for sentiment analysis. We will use GloVe and SSWE as word embeddings, which are also an important part of NLP problems.
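To make the pipeline concrete, here is a minimal sketch of the forward pass of a text CNN in plain Python: embedding lookup, one convolution filter per feature, max-over-time pooling, and a softmax over the three classes. It is not the project's actual architecture. The toy 2-dimensional vectors, the hand-picked weights, and all function names are assumptions for illustration; the only format detail relied on is that a GloVe text file stores one word per line followed by its space-separated vector values. A real model would be trained with a deep-learning framework.

```python
import math


def parse_glove_line(line):
    """Parse one line of a GloVe-style text file: a word followed by floats."""
    parts = line.rstrip().split(" ")
    return parts[0], [float(x) for x in parts[1:]]


def embed(tokens, vectors, dim):
    """Look up each token; unknown words get a zero vector."""
    return [vectors.get(tok, [0.0] * dim) for tok in tokens]


def conv_max_pool(matrix, filt, bias):
    """Slide one filter over consecutive word windows, apply ReLU, max-pool over time.

    `filt` has shape (window, dim), matching `window` stacked word vectors.
    """
    window = len(filt)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(matrix) - window + 1):
        total = bias
        for offset in range(window):
            total += sum(w * x for w, x in zip(filt[offset], matrix[start + offset]))
        best = max(best, max(0.0, total))
    return best


def softmax(scores):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def classify(tokens, vectors, dim, filters, out_weights, out_bias):
    """Forward pass: embedding -> convolution + max pooling -> linear -> softmax."""
    matrix = embed(tokens, vectors, dim)
    features = [conv_max_pool(matrix, f, b) for f, b in filters]
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(out_weights, out_bias)
    ]
    return softmax(scores)


# Toy 2-dimensional embeddings and hand-picked weights, purely illustrative.
lines = ["good 1.0 0.0", "bad -1.0 0.0", "book 0.0 1.0"]
vectors = dict(parse_glove_line(line) for line in lines)
filters = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),   # responds to "good book"
    ([[-1.0, 0.0], [0.0, 1.0]], 0.0),  # responds to "bad book"
]
out_weights = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]  # rows: positive, negative, neutral
out_bias = [0.0, 0.0, 1.0]
probs = classify(["good", "book"], vectors, 2, filters, out_weights, out_bias)
```

With these toy weights, `probs` puts the most mass on the first class (positive) for "good book", and swapping in "bad" shifts it to the second class. The same lookup table could in principle be filled from either embedding source, as long as the vectors share one dimension.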
plan:
We divided the project into three parts: data processing, model training, and model evaluation.
Name | Work |
---|---|
Xi | data processing |
Martin | model training |
Ziyuan | model evaluation |
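For the evaluation part, a three-class classifier is commonly scored with accuracy plus per-class precision, recall, and F1, averaged into a macro F1. The sketch below assumes those metrics; the plan does not yet name specific ones, and the function names and example predictions are invented for illustration.

```python
LABELS = ("positive", "negative", "neutral")


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Compute precision, recall, and F1 for each label (0.0 when undefined)."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_scores(y_true, y_pred, labels)
    return sum(s["f1"] for s in scores.values()) / len(labels)


# Made-up labels and predictions, for illustration only.
y_true = ["positive", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Macro averaging weights the three classes equally, which can matter if neutral reviews turn out to be much rarer than positive ones.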