Data Wrangling Summary

Gather

The steps for gathering data vary with the source of the data and the format it is in.

High-level gathering process: 

    Obtaining data (downloading a file from the internet, scraping a web page, querying an API, etc.).

    Importing that data into your programming environment (e.g., Jupyter Notebook).
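The two gathering steps above can be sketched as follows. This is a minimal illustration, assuming the data is a CSV file and that pandas is available; the helper names `download_text` and `import_csv` and the sample rows are hypothetical, and the download helper is defined but not called so the sketch runs offline.

```python
import io
import urllib.request

import pandas as pd


def download_text(url: str) -> str:
    """Step 1 (obtain): fetch the raw text of a file over HTTP."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def import_csv(raw_text: str) -> pd.DataFrame:
    """Step 2 (import): load CSV text into a pandas DataFrame."""
    return pd.read_csv(io.StringIO(raw_text))


# Stand-in for downloaded content (made-up rows).
# With a real source you would call download_text() with your own URL instead.
raw = "name,age\nAda,36\nGrace,45\n"

df = import_csv(raw)
print(df.shape)
```

Keeping "obtain" and "import" as separate functions mirrors the two steps in the list, so either one can be swapped (for example, an API query instead of a file download) without touching the other.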

Assess

Assess data for:

Quality: issues with content. Low quality data is also known as dirty data.

Tidiness: issues with structure that prevent easy analysis. Untidy data is also known as messy data. Tidy data requirements:

    Each variable forms a column.

    Each observation forms a row.

    Each type of observational unit forms a table.
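As an illustration of the first two requirements, here is a small sketch that reshapes an untidy table with pandas' `melt` method. The table, its column names, and its values are made up for the example; the untidy version stores the variable "year" in the column headers rather than in a column of its own.

```python
import pandas as pd

# Hypothetical untidy table: the variable "year" is spread across column headers.
untidy = pd.DataFrame({
    "country": ["A", "B"],
    "2019": [10, 20],
    "2020": [11, 22],
})

# Reshape so each variable is a column and each observation
# (one country in one year) is a row.
tidy = untidy.melt(id_vars="country", var_name="year", value_name="cases")

# The year labels were column names (strings); store them as integers.
tidy["year"] = tidy["year"].astype(int)

print(tidy)
```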

Types of assessment:

    Visual assessment: scrolling through the data in your preferred software application (Google Sheets, Excel, a text editor, etc.).

    Programmatic assessment: using code to view specific portions and summaries of the data (pandas' head, tail, and info methods, for example).
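A short sketch of programmatic assessment, using the `head`, `tail`, and `info` methods named above on a made-up DataFrame. The `isnull` and `duplicated` checks are additions that go beyond the original list; they are two common ways to surface quality issues, not the only ones.

```python
import pandas as pd

# Made-up data with deliberate quality issues:
# a missing name, a duplicated row, and ages stored as text.
df = pd.DataFrame({
    "name": ["Ada", "Grace", "Grace", None],
    "age": ["36", "45", "45", "n/a"],
})

print(df.head(2))   # first rows
print(df.tail(2))   # last rows
df.info()           # column types and non-null counts

# Extra checks (not in the original list): missing values and duplicate rows.
missing_per_column = df.isnull().sum()
duplicate_rows = int(df.duplicated().sum())
print(missing_per_column)
print(duplicate_rows)
```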

Clean

Types of cleaning:

    Manual (not recommended unless the issues are single occurrences)

    Programmatic

The programmatic data cleaning process:

    Define: convert your assessments into defined cleaning tasks. These definitions also serve as an instruction list, so that others (or you in the future) can look at your work and reproduce it.

    Code: convert those definitions to code and run that code.

    Test: test your dataset, visually or with code, to make sure your cleaning operations worked.

Always make copies of the original pieces of data before cleaning!
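The Define, Code, Test cycle, together with the advice to copy the data first, might look like this in pandas. The data and the three cleaning tasks are hypothetical and chosen only to keep the sketch short.

```python
import pandas as pd

# Made-up original data; it is never modified below.
original = pd.DataFrame({
    "name": [" ada ", "Grace", "Grace"],
    "age": ["36", "45", "45"],
})

# Copy before cleaning so the original stays available for comparison.
clean = original.copy()

# Define:
#   1. Strip surrounding whitespace from names and title-case them.
#   2. Convert age from text to integer.
#   3. Drop duplicate rows.

# Code:
clean["name"] = clean["name"].str.strip().str.title()
clean["age"] = clean["age"].astype(int)
clean = clean.drop_duplicates().reset_index(drop=True)

# Test:
assert clean["name"].tolist() == ["Ada", "Grace"]
assert clean["age"].dtype.kind == "i"       # integer dtype
assert original.loc[0, "name"] == " ada "   # the original is untouched
```

Writing the definitions as comments directly above the code keeps the instruction list and its implementation side by side, which makes the work easier to reproduce.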

Reassess and Iterate

After cleaning, always reassess your data, and repeat any of the data wrangling steps if necessary.

Store (Optional)

Store the data (in a file or a database, for example) if you need to use it in the future.
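Both storage options can be sketched with pandas. This example assumes a CSV file and a SQLite database as the targets; the file names, the table name `people`, and the data are hypothetical, and a temporary directory is used so that nothing is left behind after the sketch runs.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Made-up cleaned data.
clean = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 45]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Option 1: store in a file.
    csv_path = os.path.join(tmp_dir, "clean.csv")
    clean.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

    # Option 2: store in a database (SQLite, from the standard library).
    db_path = os.path.join(tmp_dir, "clean.db")
    conn = sqlite3.connect(db_path)
    try:
        clean.to_sql("people", conn, index=False, if_exists="replace")
        from_db = pd.read_sql("SELECT * FROM people", conn)
    finally:
        conn.close()

print(from_csv)
print(from_db)
```

Passing `index=False` keeps the DataFrame's row index out of the stored data, so reading it back yields the same columns that were written.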


From the Udacity Advanced Data Analyst Nanodegree
