2019-10-18

https://www.kaggle.com/kernels/scriptcontent/20478888/data
Gradient accumulation: because of GPU memory limits, gradients have to be accumulated over several iterations before doing a single weight update.
https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255

1: loss = loss / self.accumulation_steps. The loss is divided by self.accumulation_steps; I think there is no need to do that, so why divide it?

2: if (itr + 1) % self.accumulation_steps == 0: self.optimizer.step(); self.optimizer.zero_grad()
The gradients are not applied immediately. Is that because of the very small batch size?

That's part of gradient accumulation: 32 means we accumulate the gradients from 32 samples and only then take an optimizer step. In this way we can train with an effective batch size of 32 even though GPU memory constraints don't allow us to use 32 samples at a time. See the Hugging Face post linked above for details.
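Below is a minimal PyTorch sketch of that loop; the tiny linear model, random data, and hyperparameters are placeholders, not the actual setup from the Kaggle kernel. It also answers question 1: backward() adds gradients into the .grad buffers, so scaling each micro-batch loss by 1/accumulation_steps makes the accumulated gradient equal to the average over the effective batch, the same as one large batch would give.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy setup (illustrative only; the real kernel trains a much larger model).
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=1)   # micro-batch that fits in GPU memory

accumulation_steps = 32                      # effective batch size = 1 * 32 = 32

optimizer.zero_grad()
for itr, (inputs, targets) in enumerate(loader):
    loss = criterion(model(inputs), targets)

    # Question 1: backward() *adds* to the .grad buffers, so dividing each
    # micro-batch loss by accumulation_steps makes the accumulated gradient
    # the average over the 32 samples, matching a single batch of 32.
    loss = loss / accumulation_steps
    loss.backward()

    # Question 2: the weights are updated only every accumulation_steps
    # iterations, not immediately; in between, gradients keep accumulating.
    if (itr + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()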

