2021-04-27

Hands-On: Explore Your Data

In the last hands-on lesson, you imported your first dataset into a Dataiku DSS project. Returning to that project, let’s explore that dataset.

The Explore tab of a dataset provides a tabular view of your data where you can start to examine it.

Sampling

In the Sampling concept video, we learned that Dataiku DSS shows only a sample of a dataset when you work with it interactively.

To see the sample settings of a dataset, near the top left of the page, click Configure sample, which opens a panel on the left.

By default, the sample in the Explore tab includes the first 10,000 records of the dataset.
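To make the "first records" idea concrete, here is a minimal sketch in plain Python of what a head sample does. It is not Dataiku DSS code: the helper name head_sample and the in-memory CSV are hypothetical stand-ins, and only the 10,000-record limit comes from the text above.

```python
import csv
import io
from itertools import islice


def head_sample(rows, limit=10_000):
    """Return at most `limit` rows taken from the start of an iterable."""
    return list(islice(rows, limit))


# Hypothetical in-memory CSV with 25,000 rows, standing in for a real dataset.
raw = io.StringIO(
    "customer_id,tshirt_category\n"
    + "".join(f"{i},White\n" for i in range(25_000))
)

sample = head_sample(csv.DictReader(raw))
print(len(sample))  # number of rows kept in the sample
```

Because the sample is taken from the top of the dataset, anything that appears only in later rows is not visible in it, which is worth remembering when reading the gauges and charts that follow.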

Storage Type and Meaning

Beneath each column name is the storage type and meaning.

Dataiku DSS detects a meaning of “Integer” for customer_id because most of the values of customer_id are integers. The gauge shows red for the few values that do not match this meaning. This lets us judge whether those values are truly invalid customer IDs or, as is the case here, whether Integer is too restrictive a meaning for customer_id.

Click on the meaning and update it to Text. Now the gauge for customer_id is entirely green.

Note

In this dataset, we do not have any missing values. But if we did, they would be represented by the color gray in the data quality bar.
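The gauge described above can be approximated as three counts: values that match the meaning (green), values that do not (red), and empty values (gray). The sketch below is an illustration in plain Python, not the detection logic Dataiku DSS actually uses. The function names, the integer pattern, and the sample IDs are all hypothetical, and one empty value is included only to show the gray case, even though the tutorial dataset has none.

```python
import re

# Assumed definition of "looks like an integer" for this illustration only.
INTEGER_PATTERN = re.compile(r"[+-]?\d+")


def is_integer(text):
    return INTEGER_PATTERN.fullmatch(text) is not None


def is_text(text):
    # Any non-empty string is valid text.
    return True


def quality_counts(values, is_valid):
    """Count valid, invalid, and empty values for a given validity check."""
    counts = {"valid": 0, "invalid": 0, "empty": 0}
    for value in values:
        if value is None or value.strip() == "":
            counts["empty"] += 1
        elif is_valid(value):
            counts["valid"] += 1
        else:
            counts["invalid"] += 1
    return counts


# Hypothetical customer IDs: mostly numeric, one alphanumeric, one empty.
customer_ids = ["1001", "1002", "1003", "1004", "1005",
                "1006", "1007", "1008", "a4f9", ""]

as_integer = quality_counts(customer_ids, is_integer)
as_text = quality_counts(customer_ids, is_text)
print("Integer meaning:", as_integer)
print("Text meaning:   ", as_text)
```

Under the Integer check, the alphanumeric ID counts as invalid. Under the Text check, it counts as valid, which mirrors the gauge turning green after the meaning is changed. The empty value stays in its own bucket in both cases.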

Charts

You can use charts to explore a dataset. For example, we might want to know how often each type of t-shirt is ordered.

  • Click on the Charts tab.
  • From the panel on the left, drag and drop Count of records as the Y variable.
  • Drag and drop tshirt_category as the X variable.

Dataiku DSS shows a column chart of Count of records by tshirt_category for the current sample.

The chart reveals that the values of tshirt_category are not recorded consistently. Black shirts are sometimes recorded as “Black” and sometimes as “Bl”. Similarly, white shirts are sometimes recorded as “White” and sometimes as “Wh”.
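The same count of records per category can be sketched outside the Charts tab. The example below uses Python's collections.Counter on a small hypothetical list of tshirt_category values. The exact strings are made up for illustration; only the “Black”/“Bl” and “White”/“Wh” spellings come from the text above.

```python
from collections import Counter

# Hypothetical tshirt_category values showing the inconsistent spellings.
categories = [
    "White T-Shirt", "Black T-Shirt", "Wh T-Shirt", "Black T-Shirt",
    "Bl T-Shirt", "White T-Shirt", "White T-Shirt",
]

counts = Counter(categories)

# One line per category, most frequent first, like the bars of a column chart.
for category, n in counts.most_common():
    print(f"{category:<15} {n}")
```

Each distinct spelling gets its own row, just as each gets its own bar in the chart, so the abbreviated variants split what should be a single category into two.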

What’s next?

Congratulations! You’ve created your first project, imported your first dataset, and created your first chart. In the next hands-on lesson, we’ll handle these issues with a Prepare recipe.
