Ultimate Skill Checklist for Data Analysts
Contents
- Programming
- Statistics
- Mathematics
- Machine Learning
- Data Wrangling
- Communication and Data Visualization
- Data Intuition
Programming
- Python programming language
- [ ] numpy
- [ ] pandas
- [ ] matplotlib
- [ ] scipy
- [ ] scikit-learn
- R programming language
- [ ] ggplot2
- [ ] dplyr
- [ ] ggally
- [ ] reshape2
- Optional
- [ ] ipython
- [x] ipython notebook
- [ ] anaconda
- [ ] ggplot
- [ ] seaborn
- [ ] Spreadsheet tools (like Excel)
- Additional Skills
- [ ] JavaScript and HTML for D3.js
- [ ] D3.js
- [ ] AJAX implementation
- [ ] jQuery
- [ ] C/C++ or Java
Statistics
- Descriptive and Inferential statistics
- [x] Mean, median, mode
- [ ] Data distributions
- [ ] Standard normal
- [ ] Exponential/Poisson
- [ ] Binomial
- [ ] Chi-square
- [ ] Standard deviation and variance
- [ ] Hypothesis testing
- [ ] P-values
- [ ] Test for significance
- [ ] Z-test, t-test, Mann-Whitney U
- [ ] Chi-squared and ANOVA testing
- Experimental design
- [ ] A/B Testing
- [ ] Controlling variables and choosing good control and testing groups
- [ ] Sample size and statistical power
- [ ] Formulating the hypothesis to test
- [ ] Confidence level
- [ ] SMART experiments: Specific, Measurable, Actionable, Realistic, Timely
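Several of the statistics items above can be tried with nothing but the Python standard library. The sketch below is a minimal illustration under stated assumptions, not a complete treatment: it computes descriptive statistics with the `statistics` module, runs a two-sided two-proportion z-test of the kind often used to read out an A/B test, and estimates the per-group sample size needed to detect a given lift at a chosen significance level and power. The sample data and conversion counts are made up, and the helper names `two_proportion_z_test` and `sample_size_per_group` are hypothetical.

```python
import math
from statistics import NormalDist, mean, median, mode, stdev, variance

# Descriptive statistics on a small, made-up sample.
data = [2, 3, 3, 5, 7, 10]
print("mean:", mean(data))                 # 5
print("median:", median(data))             # 4.0 (average of the two middle values)
print("mode:", mode(data))                 # 3
print("sample variance:", variance(data))  # divides by n - 1
print("sample std dev:", stdev(data))


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for H0: p_a == p_b, using the pooled proportion under H0.

    Hypothetical helper for reading out an A/B test on conversion rates.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def sample_size_per_group(p_baseline, p_target, alpha=0.05, power=0.8):
    """Approximate n per group to detect p_baseline -> p_target (two-sided test).

    Hypothetical helper using the common normal-approximation formula
    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    var_sum = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    return math.ceil((z_alpha + z_power) ** 2 * var_sum / (p_target - p_baseline) ** 2)


# Made-up A/B numbers: 200/2000 conversions in control, 250/2000 in the variant.
z, p = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
print("n per group for 10% -> 12%:", sample_size_per_group(0.10, 0.12))
```

The z-test relies on the normal approximation, which is reasonable when each group has plenty of conversions and non-conversions; for small samples an exact test is usually preferred.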
Mathematics
- [x] Translate numbers and concepts into a mathematical expression: 4 times the square root of one-third of a gallon of water (with the gallon expressed as g) is 4√(g/3)
- [x] Solve for missing values in algebraic equations: 14 = 2x + 29
- [ ] Reason about how a parameter (for example, a factor of 1/2) changes the shape of a graph
- [ ] Linear algebra and calculus
- [ ] Matrix manipulations. The dot product is crucial to understand.
- [ ] Eigenvalues and eigenvectors: understand the significance of these two concepts
- [ ] Multivariable derivatives and integration in calculus
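The algebra and linear-algebra items above can be checked with a short standard-library Python sketch. It solves the example equation 14 = 2x + 29 by isolating x, computes a dot product, and finds the eigenvalues and eigenvectors of a 2×2 matrix from its characteristic polynomial λ² − tr(A)·λ + det(A) = 0. The helper names are hypothetical, and the 2×2 closed form assumes real eigenvalues; for larger matrices one would normally reach for a library routine such as `numpy.linalg.eig`.

```python
import math


def solve_linear(a, b, c):
    """Solve c = a*x + b for x (assumes a != 0)."""
    return (c - b) / a


def dot(u, v):
    """Dot product: the sum of element-wise products of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(u, v))


def eig_2x2(m):
    """Eigenvalue/eigenvector pairs of a 2x2 matrix [[a, b], [c, d]] with real eigenvalues."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    disc = trace ** 2 - 4 * det  # discriminant of the characteristic polynomial
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    pairs = []
    for lam in ((trace + root) / 2, (trace - root) / 2):
        # Solve (A - lam*I) v = 0 using a row with a non-zero off-diagonal entry.
        if b != 0:
            v = (b, lam - a)
        elif c != 0:
            v = (lam - d, c)
        else:
            # Diagonal matrix: the eigenvectors are the standard basis vectors.
            v = (1.0, 0.0) if abs(lam - a) <= abs(lam - d) else (0.0, 1.0)
        norm = math.hypot(*v)
        pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return pairs


print(solve_linear(2, 29, 14))    # -7.5, since 14 - 29 = -15 and -15 / 2 = -7.5
print(dot([1, 2, 3], [4, 5, 6]))  # 32
for lam, vec in eig_2x2([[2, 1], [1, 2]]):
    print(lam, vec)  # each vec satisfies A @ vec == lam * vec
```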
Machine Learning
- Supervised Learning
- [ ] Decision trees
- [ ] Naive Bayes classification
- [ ] Ordinary Least Squares regression
- [ ] Logistic regression
- [ ] Neural networks
- [ ] Support vector machines
- [ ] Ensemble methods
- Unsupervised Learning
- [ ] Clustering Algorithms
- [ ] Principal Component Analysis (PCA)
- [ ] Singular Value Decomposition (SVD)
- [ ] Independent Component Analysis (ICA)
- Reinforcement Learning
- [ ] Q-learning
- [ ] TD (temporal-difference) learning
- [ ] Reinforcement Learning
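As one concrete example from the supervised-learning list, ordinary least squares with a single feature has a closed-form solution: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope·x̄. The sketch below implements that formula directly in standard-library Python on made-up data, with hypothetical helper names `ols_fit` and `r_squared`. In practice one would typically use a library such as scikit-learn's `LinearRegression` (or `statistics.linear_regression` on Python 3.10+), which also handle multiple features or edge cases this sketch ignores.

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (single feature)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data that is roughly linear.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
intercept, slope = ols_fit(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} * x")  # y = 0.14 + 1.96 * x
print(f"R^2 = {r_squared(xs, ys, intercept, slope):.4f}")
```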
Data Wrangling
- Python
- [ ] Learn Python's string methods and string library for string manipulation
- [ ] Parsing common file formats such as CSV and XML
- [ ] Regular Expressions
- [x] Mathematical transformations
- [x] Reduce right skew with a log-10 transformation (for example, making log-normally distributed data approximately normal)
- Database systems (SQL and NoSQL): databases act as a central hub for storing information
- [ ] Relational databases such as PostgreSQL, MySQL, Netezza, Oracle, etc.
- [ ] Optional: Hadoop, Spark, MongoDB
- [x] SQL
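Several of the wrangling skills above fit in one short standard-library Python sketch: parsing CSV text with `csv.DictReader`, cleaning fields with string methods and a regular expression, applying a log-10 transformation to a right-skewed column, and querying the result with SQL through the built-in `sqlite3` module. SQLite in an in-memory database stands in here for the relational databases listed; the CSV data, the `sales` table, and its column names are invented for illustration.

```python
import csv
import io
import math
import re
import sqlite3

# Made-up CSV: messy city names, currency-formatted prices, right-skewed revenue.
raw_csv = """city,price,revenue
 boston ,"$1,200.50",120
Denver,$980.00,1500
 austin,"$2,050.75",98000
"""

rows = []
for record in csv.DictReader(io.StringIO(raw_csv)):
    city = record["city"].strip().title()                   # string methods: trim, normalize case
    price = float(re.sub(r"[^0-9.]", "", record["price"]))  # regex: drop "$" and thousands commas
    log_revenue = math.log10(float(record["revenue"]))      # log-10 compresses the long right tail
    rows.append((city, price, log_revenue))

# Load the cleaned rows into an in-memory SQLite table and query it with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, price REAL, log_revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
expensive = conn.execute(
    "SELECT city, price FROM sales WHERE price > ? ORDER BY price DESC", (1000,)
).fetchall()
conn.close()

for city, price in expensive:
    print(city, price)
```

The log transform is only defined for positive values, and it mainly helps with right-skewed data; it will not make every non-normal distribution normal.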
Communication and Data Visualization
- [ ] Understand visual encoding and how to communicate what you want the audience to take away from your visualizations
- [ ] Programming
- [ ] matplotlib
- [ ] ggplot
- [ ] D3.js
- [ ] Presenting data and convincing people with your data
- [ ] Know the business context of the situation with regard to your data
- [ ] Think five steps ahead: anticipate the audience's questions and where they will challenge your assumptions and conclusions
- [ ] Send out pre-reads before your presentations and hold pre-alignment meetings with interested parties before the actual meeting