DS Interview Question--Cross Validation

Q: What is cross-validation? How do you do it right?

A:

Cross-validation is a technique for evaluating a predictive model and estimating how accurately it will perform in practice. The original sample is partitioned into a training set, used to fit the model, and a validation set, used to measure its error.

K-fold CV Steps:

1. Split the dataset into a training dataset and a test dataset;

2. Set the test dataset aside and partition the training dataset into K subsets (folds) of roughly equal size;

3. For k = 1, 2, ..., K, fit the model on the other K-1 subsets and calculate the validation error on the k-th subset, so that each subset serves as the validation set exactly once;

4. Average the K validation errors and take this average as the estimate of model performance;

5. Select the model with the lowest average validation error, retrain it on the whole training dataset, and evaluate it once on the held-out test dataset.
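
The steps above can be sketched in plain Python. This is only a minimal illustration, not a reference implementation: the toy data, the two candidate models (`fit_mean`, `fit_line`), the helper names, and the choices K = 5 and a 20% test split are all assumptions made for the example. In practice a library routine would normally handle the splitting.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Hypothetical baseline model: always predict the mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Hypothetical candidate model: simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mse(predict, xs, ys):
    """Mean squared prediction error."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_val_error(fit, xs, ys, k=5, seed=0):
    """Steps 2-4: average validation error over k folds."""
    folds = k_fold_indices(len(xs), k, seed)
    errors = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the i-th, validate on the i-th.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        errors.append(mse(model, [xs[j] for j in val_idx],
                          [ys[j] for j in val_idx]))
    return sum(errors) / k

# Toy data (assumed for illustration): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

# Step 1: hold out 20% of the data as the test dataset.
order = list(range(len(xs)))
random.Random(1).shuffle(order)
test_idx, train_idx = order[:20], order[20:]
x_train, y_train = [xs[j] for j in train_idx], [ys[j] for j in train_idx]
x_test, y_test = [xs[j] for j in test_idx], [ys[j] for j in test_idx]

# Steps 2-4: estimate each candidate's error by 5-fold cross-validation.
candidates = {"mean": fit_mean, "line": fit_line}
cv_errors = {name: cross_val_error(fit, x_train, y_train, k=5)
             for name, fit in candidates.items()}

# Step 5: pick the best candidate, retrain on the whole training dataset,
# then evaluate once on the untouched test dataset.
best = min(cv_errors, key=cv_errors.get)
final_model = candidates[best](x_train, y_train)
test_error = mse(final_model, x_test, y_test)
print(best, cv_errors, test_error)
```

Note that the test dataset is never touched during model selection; it is used only once at the end, so the final error is not biased by the choice among candidates.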


Interview questions are from DataAppLab (Wechat: Datalaus)

Jun.28th, 2017  Seattle
