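This post originally had a preface: "170528 Rant | A not-so-smooth half year of self-study".
It was too long-winded, so I moved it out into a post of its own.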
A survey
Company stage and the status/value of data analysis
sample project
Projects I come across that have an interesting topic or a worthwhile analysis approach go here.
build a portfolio
Build a personal blog to hold your projects, using the Python library Pelican; see here
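
For orientation, here is a minimal sketch of what a Pelican settings file (`pelicanconf.py`) might look like. The setting names are standard Pelican settings, but every value below is a hypothetical placeholder, not something taken from the linked guide.

```python
# pelicanconf.py: a hypothetical minimal configuration.
# All values are placeholders; replace them with your own.
AUTHOR = "Your Name"                 # shown as the article author
SITENAME = "Data Science Portfolio"  # title of the blog
SITEURL = ""                         # left empty for local development
PATH = "content"                     # folder holding the Markdown/reST posts
TIMEZONE = "UTC"
DEFAULT_LANG = "en"
DEFAULT_PAGINATION = 10              # articles per index page
```

With a file like this in place, running `pelican content -o output -s pelicanconf.py` should render the posts in `content/` into static HTML under `output/`.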
build project motivator
- from dataquest
- from analytics vidhya
- from quora
- competition websites
Learning path
- What classes should I take if I want to become a data scientist?
  - **The top-voted answer by Rahul Agarwal offers an excellent course list of related topics: Mathematics, Statistics, Computer Science, Machine Learning, and Distributed and Parallel Computing.**
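- 罗文益's answer: a voracious reader in the Chinese-speaking world.
  Among the many learning paths for "business data operations", I found this answer, and it suits my taste. It has drawbacks, though:
  - First, it was written by a pure CS person and barely covers math, statistics, or algorithms.
  - Second, I probably don't want to track down these Chinese books.

  The fundamentals stay the same. Roughly, the content is:
  - Computer fundamentals: basic principles, database principles
  - Computer applications: data acquisition and cleaning
  - Algorithms: probability, statistics, data mining, algorithms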
- YouTube video: How I'm Learning AI and Machine Learning
  The topics below are learned in sequence.
  - Math: MIT OCW
    - Single Variable Calculus
    - Multivariable Calculus
    - Linear Algebra
  - AI: MIT OCW Artificial Intelligence
  - Advanced mathematics: Numerical Analysis with Justin Solomon
- What does it take to be a data scientist?
  - Florian Goossens's answer gives a view of the programming skills expected at each level:
    - Junior DS: A high-level rapid prototyping language such as Python or R. I recommend Python very strongly.
    - Data Scientist: A low-level deployment language such as Java, C++, C#, etc.
    - Senior Data Scientist: A scalable/Big Data language such as Scala/Spark
- What's the best way to learn data science as a beginner?
  - Karlijn Willems's answer mentions some resources from DataCamp, along with a nice picture illustrating each step, but it is not as well organized as the answer by **Rahul Agarwal**.
what matters in a working scenario
- What Kaggle has learned from almost a million data scientists - Anthony Goldbloom (Kaggle)
  Kaggle steps:
  - Understand the data, EDA
  - Come up with features (these often matter more than which algorithm to choose)
  - Feed the data to a training algorithm
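
The three Kaggle steps above can be sketched with nothing but the standard library. This is an illustration under stated assumptions, not anything shown in the talk: the toy data, the `area` feature, and the nearest-centroid classifier are all hypothetical choices made to keep the example small.

```python
from statistics import mean

# Hypothetical toy data: (width, height, label)
ROWS = [
    (2.0, 1.0, "small"), (1.0, 3.0, "small"), (2.0, 2.0, "small"),
    (4.0, 5.0, "large"), (6.0, 3.0, "large"), (5.0, 5.0, "large"),
]


def summarize(rows):
    """Step 1 (EDA): count and mean of each raw column, per label."""
    summary = {}
    for label in sorted({row[2] for row in rows}):
        group = [row for row in rows if row[2] == label]
        summary[label] = {
            "count": len(group),
            "mean_width": mean(row[0] for row in group),
            "mean_height": mean(row[1] for row in group),
        }
    return summary


def make_features(width, height):
    """Step 2 (features): derive area, which separates the labels
    better than width or height alone in this toy data."""
    return (width * height,)


def train_centroids(rows):
    """Step 3 (training): nearest-centroid model, i.e. the mean
    feature vector of each label."""
    by_label = {}
    for width, height, label in rows:
        by_label.setdefault(label, []).append(make_features(width, height))
    return {
        label: tuple(mean(column) for column in zip(*features))
        for label, features in by_label.items()
    }


def predict(centroids, width, height):
    """Return the label whose centroid is closest (squared distance)."""
    x = make_features(width, height)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


if __name__ == "__main__":
    print(summarize(ROWS))
    model = train_centroids(ROWS)
    print(predict(model, 3.0, 1.0), predict(model, 5.0, 4.0))
```

The point of the sketch is the second step: the classifier stays trivial, and most of the separation comes from the engineered feature.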
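* Excel
The column has three articles on Excel. They are fairly simple, give a general introduction to using Excel, and can serve as a guide for people new to the workplace.
Part 1: Data analysis, functions. A brief walkthrough of commonly used functions and their SQL/Python counterparts.
Part 2: Data analysis, tips. A brief walkthrough of the features I consider the best value for the effort, for working more efficiently.
Part 3: Data analysis, hands-on. Puts the first two parts into practice with a simple data analysis. The data source is real scraped data: 5,000 rows of data analyst job postings.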
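Analytical thinking
- The Pyramid Principle (Douban)
- XMind Chinese website
- How to develop McKinsey-style analytical thinking.
- How to build a thinking framework for data analysis.
- Here are three golden rules:
A business without metrics can be neither grown nor analyzed.
A good metric should be a rate or a ratio.
A good analysis should compare or correlate.
An example: if I tell you a supermarket had 1,000 visitors today, how would you analyze that?
Is 1,000 more or fewer than at other nearby supermarkets? (comparison)
Is 1,000 more or fewer than yesterday? (comparison)
How many of the 1,000 actually made a purchase? (conversion rate)
How many people passed by outside the supermarket? (conversion rate)
This is a quick way to build an analysis framework. Looking at the 1,000 alone, you cannot analyze your way to any conclusion.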
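
The supermarket example can be turned into a small calculation. In this sketch only the 1,000 visitors come from the text; yesterday's count, the nearby store's count, the number of buyers, and the number of passersby are hypothetical numbers chosen for illustration.

```python
def ratio(part, whole):
    """Share of `whole` represented by `part` (e.g. a conversion rate)."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


def relative_change(current, baseline):
    """Change of `current` against `baseline`, as a fraction of baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline


today_visitors = 1000          # from the example in the text
yesterday_visitors = 800       # hypothetical
nearby_store_visitors = 1250   # hypothetical
buyers = 300                   # hypothetical
passersby = 5000               # hypothetical

report = {
    # Comparison: against yesterday and against a nearby store
    "vs_yesterday": relative_change(today_visitors, yesterday_visitors),
    "vs_nearby_store": relative_change(today_visitors, nearby_store_visitors),
    # Ratios: who walked in, and who bought after walking in
    "walk_in_rate": ratio(today_visitors, passersby),
    "purchase_conversion": ratio(buyers, today_visitors),
}

if __name__ == "__main__":
    for name, value in report.items():
        print(f"{name}: {value:+.1%}")
```

Each entry is either a comparison or a ratio, which matches the second and third rules: the raw count of 1,000 only becomes informative once it is set against a baseline or a denominator.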
comparison of training alternatives
How do I become a data scientist? An evaluation of 3 alternatives
An article comparing Master's degree, bootcamp, and MOOC, well-illustrated with stories under each circumstance.
Leave these to another day
Look into these links when I have time.
- I completed CS109 course from Harvard, which other courses other than CS109 shall I take for a career as a data scientist?
- How can I become a data scientist?
- How do I get a job as a data scientist if I have no prior experience as a data scientist?
  - Josh Devlin suggests the Dataquest style: do real projects.
- Are data science certificates worth it?
- What is your review of Coursera Data Science Specialization Track?
- Is there a data science bootcamp for someone with previous experience in the field?
- William Chen's profile and his data-science-related answers
- If I'm not a maths savant should I bother learning data science or will the hardcore maths people get all of the data science jobs?
- I am choosing between masters in data science on King's College London or Berkeley online masters. Which you would recommend?
- Which data science career track course should I take for career transition?
- Can you pay to get bad boot camp reviews removed from Course Report?
- Is a master in Data Science worth it?
- Which are some good websites to learn more about Data Science?
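- How can I learn data analysis in my spare time?
- How did you get started on the path of data analysis?
- I like the data analysis direction; what are the job prospects for the more popular majors?
- [I graduated from a 211/985 university in China with a marketing major. Given my current knowledge and skills, am I suited to be a data analyst?](https://www.zhihu.com/question/28021598/answer/40279672)
- What are the must-read books for doing data analysis?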
other findings
- try_jupyter
  Official introduction from jupyter.org to using R, Haskell, Python, Spark with Python, and Spark with Scala in Jupyter Notebook.
- Which are some good websites to learn more about Data Science?
  All the answers seem good to me, but they take time to explore and absorb.
- Comparison of schools' data science programs. Quora might be helpful when I'm ready to pursue a master's degree.
  - Here's one concerning UC Berkeley
- Class Central
  A counterpart of 果壳MOOC, and maybe more active. It also comes with a recommendation system, which might be a nice thing to have.
- [GrowingIO](https://www.growingio.com/)
  Founded by 张溪梦 (Zhang Ximeng), former senior director of business analytics at LinkedIn, who has turned data analysis into a back-end service and taken over the whole job (facepalm).
20170818 discard info: certifications
--------
- If I want to get into Data Analytics, what type of certifications should I get?
  - Options mentioned:
    - Python/R
    - Excel
    - SQL
    - Tableau
      Tableau is a data visualization software that lets you import data from multiple data sources, like Excel and SQL, and visualize that data through the use of interactive dashboards and charts. Tableau offers in-depth training for its products and has a certification exam. This would absolutely look good.
  - Resources mentioned:
    - k2datascience. Mainly focuses on data analyst training.
20170817 discard info: about "best course on edx"
--------
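20170817 postscript
I deleted this section because:
- The question itself points in the wrong direction.
  - First, it limits the scope to edX.
  - Second, the so-called "best course" cannot be defined. What counts as good?

  Isn't the right approach to first look at the knowledge structure I need, and then choose learning materials according to that content?
- MITx 15.071x is not to my taste.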
---------
What are the best data science courses on edX?
- Anilkumar Panda mentions MITx: 15.071x The Analytics Edge. I have seen this course at least 5 times in answers to other questions.
- Anant Agarwal mentions Data Science for Executives, especially one of its component courses, Statistical Thinking for Data Science and Analytics
- Arjun Narayanan mentions Microsoft DAT210x Course Info | edX
20170714 discard info: about harvard cs109
--------
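20170714 postscript
Because I inexplicably dislike the style of this course:
- Not fully public, and the videos lag
- Recordings of live lectures, often hours long, which wastes time
- Too much filler talk
- Somehow overwhelming; three days in, it had me depressed (0529-0601)

So I dropped it quickly.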
-------- 2015 video too slow --------
-------- 2016 CS109a --------
Available on canvas.harvard.edu. Many pages require a student ID, but I can somehow view some core pages without one.
- These three pages are organized in chronological order and act as a good reminder of what to do next:
- lecture and project material
  Others, like homework and quizzes, are unavailable most of the time.