What is a data scientist? Notes to go with the DataCamp course (not a single programming exercise)


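Rather than asking customers directly whether they are interested in your new product, ask how it compares with the old product, or with a competitor's similar product.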

Tabular data is stored in relational databases.

[Screenshot: 2019-06-30 11.42.45]

The three shown in the screenshot above, plus a stacked bar chart.

What can you use to build a dashboard?

Be consistent across the organization.

A cluster is a group of servers working together.

Data analytics, machine learning, and deep learning are three different things.

Data science workflow: data collection comes first, then exploration and visualization, then experimentation and prediction (which covers both classification and prediction; a linear regression used to estimate a value counts as well).

OKR stands for Objectives and Key Results.

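Add Transparency to Company KPIs:
This is not about transparency as such; read it as making the information easy to take in at a glance. Newsletters and dashboards both serve this purpose.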

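Personas and profiles
You can have several personas/profiles (user personas, user profiles). For a car seller, for example, a homemaker's needs may be similar to an office worker's; if so, the two count as one persona.
1. A persona is not a user segment.
What we care about is how users perceive and use the product and how they interact with it, which is a fairly continuous process; demographic attributes are not the main drivers of user behavior. Group users by usage and needs instead.
Directions worth exploring: motivation, ability (the points where users get blocked), environment, influence from other people, and how the personas relate to one another (a four-quadrant chart, also called a matrix chart).
2. A persona is not the average user.
3. A persona is not a real user.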

Regular jobs: A/B tests, dashboards.
Jobs that steal your time: ad-hoc requests (one-off questions about a particular point in time that do not need ongoing updates).

A/B test

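The less sensitive our test (i.e., the larger the minimum detectable effect), the smaller the sample size we need.
As the baseline conversion rate increases, it becomes easier to reach significance (for the same relative lift, the absolute difference is larger relative to the noise).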
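To see both rules in numbers, here is a rough sketch using the common normal-approximation sample-size formula for comparing two conversion rates: n per group = (z_{1-α/2} + z_{1-β})² · (p1(1-p1) + p2(1-p2)) / (p2 - p1)². The function name, the 5% significance level and 80% power defaults, and the example rates are assumptions for illustration, not values from the course.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline, relative_mde, alpha=0.05, power=0.8):
    """Approximate users needed per variant for a two-sided two-proportion z-test.

    baseline: current conversion rate, e.g. 0.10
    relative_mde: smallest relative lift we want to detect, e.g. 0.05 for +5%
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # conversion rate we want to be able to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)  # value giving the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Larger MDE (less sensitive test): fewer users needed
print(sample_size_per_group(0.10, 0.05), sample_size_per_group(0.10, 0.20))
# Higher baseline, same relative lift: fewer users needed
print(sample_size_per_group(0.10, 0.10), sample_size_per_group(0.20, 0.10))
```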

Machine learning (ML) is for prediction: it is a set of methods for making predictions based on existing data.

Supervised learning means the data has both labels and features.
Case study: subscribe or churn (a customer subscribes and later cancels); the outcome is the label.

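Training data
Find features (the factors that influence the outcome)
Train a model and evaluate it on test data first
Then make predictions. If a customer is predicted not to churn, you can count them toward next month's revenue; if they are predicted to churn, reach out and offer a special promotion.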
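A minimal sketch of the train, test, predict loop above, assuming made-up customer data with two hypothetical features (months subscribed, support tickets last month). It uses k-nearest neighbours from the standard library only; the notes do not name a model, so kNN is a stand-in chosen for brevity, and `predict` and `next_action` are hypothetical helper names.

```python
import math

# Hypothetical toy data: (months_subscribed, support_tickets_last_month) -> label
# label 1 = churned, 0 = stayed
train = [
    ((1, 4), 1), ((2, 3), 1), ((1, 5), 1), ((3, 4), 1),
    ((12, 0), 0), ((10, 1), 0), ((8, 0), 0), ((15, 1), 0),
]
test = [((2, 4), 1), ((11, 0), 0)]


def predict(features, k=3):
    """k-nearest neighbours: majority label among the k closest training customers."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes > k / 2 else 0


# Step 1: check the model on held-out test data before trusting it
accuracy = sum(predict(x) == y for x, y in test) / len(test)


# Step 2: turn each prediction into a business action
def next_action(features):
    if predict(features) == 1:
        return "offer promotion"
    return "count toward next month's revenue"


print(accuracy, next_action((1, 5)), next_action((14, 0)))
```

In practice the features would be on very different scales and would usually be standardized before computing distances; the toy data skips that step.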

Unsupervised learning: for example, clustering.
Unsupervised learning uses data with features only (no labels), for example to group customers into categories.

Select features, define the number of clusters
Use the clusters to solve business problems (tying back to supervised learning above, you can make predictions separately for each cluster).
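A minimal k-means sketch of "select features, define the number of clusters": each customer is a point in feature space and k is fixed up front. The two features (monthly spend in hundreds, visits per month) and the data are hypothetical, and k-means is one common clustering method, not necessarily the one the course has in mind.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: assign points to the nearest centroid, then move centroids to the mean."""
    centroids = [points[i] for i in range(k)]  # simple deterministic init: first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its points; keep the old one if empty
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[c]
            for c, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers: (monthly_spend_hundreds, visits_per_month)
customers = [(1, 2), (1.5, 1.8), (8, 9), (8.5, 9.5), (1.2, 2.2), (9, 8.8)]
centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Each resulting cluster could then get its own supervised model, as the note above suggests.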

Special topics in ML

Topic 1: time series forecasting (time is treated as a feature; this is supervised learning).
Business questions about seasonality are one application of time series forecasting.
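Treating time as a feature can be sketched as an ordinary least-squares line on the time index, plus an average seasonal offset for each position in the cycle (for example, each quarter). This simple trend-plus-seasonality approach, the function names, and the quarterly toy series are assumptions for illustration.

```python
def fit_trend(ys):
    """Ordinary least squares fit of y = a + b*t, with the time index t as the only feature."""
    n = len(ys)
    ts = range(n)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    b = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    a = y_mean - b * t_mean
    return a, b


def forecast(ys, period, steps):
    """Forecast `steps` future values as linear trend + average seasonal offset."""
    a, b = fit_trend(ys)
    # Seasonal offset = mean residual at each position in the cycle (e.g. each quarter)
    residuals = [y - (a + b * t) for t, y in enumerate(ys)]
    offsets = [sum(residuals[s::period]) / len(residuals[s::period]) for s in range(period)]
    n = len(ys)
    return [a + b * t + offsets[t % period] for t in range(n, n + steps)]


# Hypothetical quarterly sales: upward trend plus a repeating quarterly pattern
sales = [10 + 2 * t + [3, -3, -3, 3][t % 4] for t in range(12)]
print(forecast(sales, period=4, steps=4))
```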
Topic 2: NLP, natural language processing (depending on the specific question, extract information from text to create features).
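Creating features from text can be as simple as counting chosen words (a bag-of-words representation). The function name, vocabulary, and review text below are invented for illustration; which words to count depends on the business question.

```python
import re
from collections import Counter


def text_features(text, vocabulary):
    """Turn raw text into a fixed-length count vector over a chosen vocabulary."""
    tokens = re.findall(r"[a-z']+", text.lower())  # crude tokenizer: lowercase words only
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary for a churn question: words that might signal unhappy customers
vocab = ["cancel", "refund", "slow", "love"]
review = "The app is slow. I want a refund, or I will cancel. Slow slow slow!"
print(text_features(review, vocab))  # → [1, 1, 4, 0]
```

The resulting vectors can be fed to a supervised model like any other numeric features.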

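Some data-intensive problems are beyond traditional machine learning and call for deep learning (which is built from artificial neurons). Typical applications are language learning (for example, automatically summarizing meeting minutes) and image classification.
Anything involving a physical device is probably an IoT (Internet of Things) problem.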

Deep Learning and Explainable AI:
Deep learning is also known as neural networks.

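It requires much, much more data than traditional ML.
It works best with less structured input, such as text and images.
It can give highly accurate predictions, but people do not know why the model made a given prediction (DL lacks explanatory power; it tells you the what).
Some people online say deep learning can do anything related to data mining.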

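Explainable AI can spell out the influencing factors, i.e., why the inputs lead to such results. Traditional ML can also explain why. AI has the computer act on its own; ML trains the computer to act.
Others put it this way: DL is part of ML, which is part of AI.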

