DS Interviews: The Take Home Challenge

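In data scientist interviews, the toughest format is arguably the Take Home Challenge. The interviewer gives you anywhere from 3 to 48 hours to carry out a data analysis and submit your code along with a written report. It is an exhausting process, so here are some guidelines to help you get through this part of the interview.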

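1. Try to get the format changed

Ask the interviewer whether you can switch to a different type of interview. Some companies assign interview formats at random, so a swap may be possible. A Take Home Challenge is something to avoid whenever you can.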

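2. Ask about the grading criteria and the interviewer's expectations

As the saying goes, if the work is not what the boss wanted, you get no credit even if you work yourself to death. Knowing even a little about the goal helps the whole analysis enormously. Information is power.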

Sample email: [image]

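3. Ask questions and state your assumptions

If something in the data looks wrong or unclear, ask promptly. If there is no time to ask, write your assumptions into the report. For example, if you find a problem in the data and the rest of the analysis only holds under a certain assumption, state that assumption explicitly.

You can also write down your limitations: why didn't you use a more sophisticated method for missing values? Why XGBoost rather than TensorFlow? Perhaps you ran out of time, or your hardware was not powerful enough. Put all of this in writing so the interviewer knows you could have done better.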

4. Follow a standard workflow

For a classification task, for example, the analysis might include the following parts:

Data cleaning
Minimal feature selection: pick the features you will use
Impute missing values
Create a classification pipeline
Train the model
Tune hyperparameters with grid search
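
The steps above can be sketched roughly as follows. This is only a minimal illustration, not a prescribed solution: it assumes scikit-learn and NumPy are available, uses a synthetic data set from `make_classification` as a stand-in for the interview data, and the choice of `LogisticRegression`, `SelectKBest`, and the parameter grid are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the interview data (assumption: numeric features only).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Blank out roughly 5% of the values to mimic missing data.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Hold out a test set before any fitting, so the final score is honest.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Impute -> scale -> select features -> classify, all inside one pipeline
# so that every step is fitted on the training folds only.
pipeline = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hyperparameters to try; step names are prefixed with "<step>__".
param_grid = {
    "impute__strategy": ["mean", "median"],
    "select__k": [4, 8],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Putting imputation and feature selection inside the pipeline, rather than applying them to the full data set up front, keeps information from the validation folds out of the fitted steps, which is the kind of detail worth mentioning in the report.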

5. Keep the code readable

Interviewers usually read the code. It should not be a mess; keep it readable and commented.
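
As a small illustration of what readable, commented code might look like, here is a sketch of a helper with type hints, a docstring, and a short usage example. The function name `missing_rate` and the sample rows are hypothetical and exist only for this example.

```python
from typing import Dict, List, Optional

# One record of the data set: column name -> value (None means missing).
Row = Dict[str, Optional[float]]


def missing_rate(rows: List[Row], column: str) -> float:
    """Return the fraction of rows where `column` is missing.

    A value counts as missing if the key is absent or its value is None.
    An empty data set has no rows to inspect, so the rate is defined as 0.0.
    """
    if not rows:
        return 0.0
    n_missing = sum(1 for row in rows if row.get(column) is None)
    return n_missing / len(rows)


rows = [
    {"age": 31.0, "income": 5200.0},
    {"age": None, "income": 4100.0},
    {"age": 45.0},                    # "income" key is absent
    {"age": 28.0, "income": None},
]

print(missing_rate(rows, "age"))     # 0.25
print(missing_rate(rows, "income"))  # 0.5
```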

For the directory structure of the whole project, you can follow this template (linked on GitHub in the original post):

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- Make this project pip installable with `pip install -e .`
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io
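
If you want to start from this layout quickly, the directories can be created with a few lines of standard-library Python. This is only a sketch: the function name `create_skeleton` and the project name `take_home` are hypothetical, and the files in the tree (README.md, Makefile, and so on) are deliberately left out.

```python
import tempfile
from pathlib import Path

# Directories taken from the layout above; top-level files are omitted.
DIRECTORIES = [
    "data/external", "data/interim", "data/processed", "data/raw",
    "docs", "models", "notebooks", "references", "reports/figures",
    "src/data", "src/features", "src/models", "src/visualization",
]


def create_skeleton(root: Path) -> None:
    """Create the empty directory skeleton under `root`."""
    for relative in DIRECTORIES:
        (root / relative).mkdir(parents=True, exist_ok=True)
    # An empty __init__.py makes src importable as a Python package.
    (root / "src" / "__init__.py").touch()


# A temporary directory is used here so the example leaves no clutter behind.
project_root = Path(tempfile.mkdtemp()) / "take_home"
create_skeleton(project_root)

for path in sorted(project_root.rglob("*")):
    print(path.relative_to(project_root).as_posix())
```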
