Week3_Clean/Filter Data and Make Plots

After retrieving web data and storing it in MongoDB (via pymongo), the next steps are to clean the data into a consistent format, filter it with an aggregation "pipeline", and plot the results with the "chart" module. All of the code was written in Jupyter Notebook.

  1. Create a new collection, transfer the retrieved data (.json format) into it, and make a copy of that collection using either the mongo shell or the command line (cmd):
hw3_1.png
hw3_2.png
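The screenshots show this step done from the mongo shell and the command line. For reference, a rough Python equivalent is sketched below, assuming the scraped data was saved either as a single JSON array or as one JSON object per line: the records are parsed, inserted with pymongo's `insert_many`, and the collection is copied server-side with an aggregation `$out` stage. The file, database, and collection names in the usage comment are hypothetical.

```python
import json
from typing import Any, Dict, List


def parse_records(text: str) -> List[Dict[str, Any]]:
    """Parse either a JSON array or newline-delimited JSON into a list of dicts."""
    text = text.strip()
    if text.startswith("["):
        # the whole file is one JSON array
        return json.loads(text)
    # otherwise treat each non-empty line as one JSON object
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def read_records(path: str) -> List[Dict[str, Any]]:
    """Read a .json file from disk and parse it with parse_records()."""
    with open(path, encoding="utf-8") as f:
        return parse_records(f.read())


def load_and_copy(db, records: List[Dict[str, Any]], name: str, copy_name: str) -> None:
    """Insert records into db[name], then copy every document into db[copy_name]."""
    if records:  # insert_many() raises on an empty list
        db[name].insert_many(records)
    # $out writes the pipeline's result to copy_name, replacing it if it exists
    db[name].aggregate([{"$match": {}}, {"$out": copy_name}])


# Usage (requires pymongo and a running MongoDB server; names are hypothetical):
#   from pymongo import MongoClient
#   db = MongoClient("localhost", 27017)["my_database"]
#   load_and_copy(db, read_records("item_info.json"), "item_info", "item_info_copy")
```

From the command line, `mongoimport` (with `--jsonArray` when the file is one JSON array) covers the import half of this step.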
  2. Below is a link to the code that shows the top 3 most-posted categories in one selected zone:
    https://anaconda.org/tangli666/week3_hw_v2/notebook
hw3_3.png
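The notebook behind the link is not reproduced here. A minimal sketch of the same idea, assuming each document stores its zone in a field `area` and its category in a field `cates` (both hypothetical field names): `$match` keeps one zone, `$group` counts posts per category, then `$sort` and `$limit` keep the top three. A plain-Python `Counter` version is included so the grouping logic can be checked without a database, and `to_series` reshapes the results into Highcharts-style column series; whether the "chart" module used in the notebook expects exactly this shape is an assumption.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def top_categories_pipeline(zone: str, n: int = 3) -> List[Dict[str, Any]]:
    """Aggregation pipeline: count posts per category within one zone, keep the top n."""
    return [
        {"$match": {"area": zone}},  # 'area' is a hypothetical field name
        {"$group": {"_id": "$cates", "counts": {"$sum": 1}}},  # 'cates' is hypothetical too
        {"$sort": {"counts": -1}},
        {"$limit": n},
    ]


def top_categories_local(docs: Iterable[Dict[str, Any]], zone: str, n: int = 3) -> List[Tuple[Any, int]]:
    """Same computation in plain Python, returning (category, count) pairs."""
    counts = Counter(d["cates"] for d in docs if d.get("area") == zone)
    return counts.most_common(n)


def to_series(results: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn {'_id': category, 'counts': k} results into Highcharts-style column series."""
    return [{"name": r["_id"], "data": [r["counts"]], "type": "column"} for r in results]


# Usage against a real collection (requires pymongo and a MongoDB server):
#   results = list(item_info.aggregate(top_categories_pipeline("some zone")))
#   series = to_series(results)
```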
  3. Below is a link to the code that shows the relationship between item condition and average price:
    https://anaconda.org/tangli666/week3_hw_v10/notebook
    Note: to filter and format the 'price' field, each price was converted to an integer and written back to the new collection:
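    """
    from pymongo import MongoClient
    # hypothetical host/database/collection names; replace with your own
    item_info = MongoClient('localhost', 27017)['my_database']['item_info']
    # only visit documents whose price is still a string, so a re-run is harmless
    for i in item_info.find({'price': {'$type': 'string'}}):
        try:
            # keep only the number before the first space in the price string
            price = int(i['price'].split(' ')[0])
        except ValueError:
            price = 0  # prices that are not plain numbers are stored as 0
        # Collection.update() was removed in PyMongo 4; update_one() replaces it
        item_info.update_one({'_id': i['_id']}, {'$set': {'price': price}})
    """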
hw3_4.png
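Once `price` is numeric, the average price per item condition is a single `$group` stage with `$avg`. The sketch below assumes the condition is stored in a field named `condition` (a hypothetical name) and leaves out the placeholder prices of 0 that the cleaning step writes for non-numeric entries, so they do not drag the averages down; whether the original notebook excluded them is not stated. A plain-Python version is included for checking the logic without a database.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def avg_price_pipeline() -> List[Dict[str, Any]]:
    """Average numeric price per item condition, highest first."""
    return [
        {"$match": {"price": {"$gt": 0}}},  # skip the 0 placeholders from the cleaning step
        {"$group": {"_id": "$condition", "avg_price": {"$avg": "$price"}}},  # 'condition' is hypothetical
        {"$sort": {"avg_price": -1}},
    ]


def avg_price_local(docs: Iterable[Dict[str, Any]]) -> Dict[Any, float]:
    """Plain-Python equivalent: {condition: average price}, ignoring non-positive prices."""
    sums: Dict[Any, float] = defaultdict(float)
    counts: Dict[Any, int] = defaultdict(int)
    for d in docs:
        price = d.get("price", 0)
        if isinstance(price, (int, float)) and price > 0:
            sums[d["condition"]] += price
            counts[d["condition"]] += 1
    return {cond: sums[cond] / counts[cond] for cond in counts}


# Usage against a real collection (requires pymongo and a MongoDB server):
#   for row in item_info.aggregate(avg_price_pipeline()):
#       print(row["_id"], row["avg_price"])
```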
  4. Finally, the command used to export the collection to a CSV file:
hw3_5.png
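The screenshot shows the command-line route (most likely MongoDB's `mongoexport` tool, which needs an explicit field list when writing CSV). If the export is wanted from the notebook instead, a minimal sketch with Python's standard `csv` module could look like this; the field names in the usage comment are hypothetical. The file is written as `utf-8-sig`, which adds a BOM so that Excel recognises UTF-8 (useful when the data contains Chinese text).

```python
import csv
from typing import Any, Dict, Iterable, Sequence


def export_csv(docs: Iterable[Dict[str, Any]], path: str, fields: Sequence[str]) -> int:
    """Write the chosen fields of each document to a CSV file; return the row count."""
    rows = 0
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        # extrasaction="ignore" drops fields not listed (e.g. '_id');
        # restval="" fills in any listed field a document lacks
        writer = csv.DictWriter(f, fieldnames=list(fields), extrasaction="ignore", restval="")
        writer.writeheader()
        for doc in docs:
            writer.writerow(doc)
            rows += 1
    return rows


# Usage (requires pymongo and a MongoDB server; names are hypothetical):
#   export_csv(item_info.find(), "item_info.csv", ["title", "price", "condition"])
```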