03-07 Dyna

https://classroom.udacity.com/courses/ud501/lessons/5326212698/concepts/54629888620923

hallucinate: to generate imagined experience

Dyna-Q: a blend of model-free and model-based learning

After every interaction with the real world, the agent performs 100 further updates on its own.

T'[s,a,s']: the probability of ending up in state s' after taking action a in state s
R'[s,a]: the reward for taking action a in state s
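
The self-driven updates can be sketched with these two tables. This is a minimal illustration rather than the course's reference code: the function names, the values of `alpha` and `gamma`, and the choice to pick `s` and `a` uniformly at random are all assumptions, and every row `T[s, a, :]` is assumed to already sum to 1.

```python
import numpy as np

def q_update(Q, s, a, s_next, r, alpha=0.2, gamma=0.9):
    """One-step Q-learning update, applied in place (alpha, gamma are assumed values)."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target

def hallucinate(Q, T, R, n_updates=100, rng=None):
    """Run n_updates Q-updates on experience simulated from the model (T, R)."""
    rng = np.random.default_rng() if rng is None else rng
    n_states, n_actions = R.shape
    for _ in range(n_updates):
        s = rng.integers(n_states)                 # assumed: random state
        a = rng.integers(n_actions)                # assumed: random action
        s_next = rng.choice(n_states, p=T[s, a])   # sample s' from T[s, a, :]
        r = R[s, a]                                # reward taken from the model
        q_update(Q, s, a, s_next, r)
    return Q
```

No real-world step is taken inside `hallucinate`; it only reuses what the model already encodes.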

Update T based on how many times each transition actually occurred in the real world.
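
A small sketch of this counting idea, under stated assumptions: `Tc`, `record_transition`, and `transition_probs` are hypothetical names, and the tiny initial count is only a convenience so that unvisited rows do not divide by zero.

```python
import numpy as np

n_states, n_actions = 3, 2

# Count table; the small starting value (an assumption) avoids division by zero.
Tc = np.full((n_states, n_actions, n_states), 1e-5)

def record_transition(Tc, s, a, s_next):
    """Count one real-world observation of (s, a) -> s_next."""
    Tc[s, a, s_next] += 1

def transition_probs(Tc):
    """Normalize counts over s' so that each T[s, a, :] sums to 1."""
    return Tc / Tc.sum(axis=2, keepdims=True)

# Three observed transitions from state 0 under action 1.
for s, a, s_next in [(0, 1, 2), (0, 1, 2), (0, 1, 1)]:
    record_transition(Tc, s, a, s_next)

T = transition_probs(Tc)
```

Here `T[0, 1, 2]` comes out close to 2/3, since two of the three observed transitions from `(0, 1)` landed in state 2.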

Exercise: How To Evaluate T?

Correction: The expression should be:

Computing transition probabilities using counts
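
Written out under the assumption that T_c[s,a,s'] is the number of times action a in state s led to state s' (T_c is a name introduced here, not one taken from these notes), a count-based estimate would be:

```latex
T[s, a, s'] = \frac{T_c[s, a, s']}{\sum_{i} T_c[s, a, i]}
```

The denominator sums over every state i that could follow (s, a), so the estimates for a given (s, a) add up to 1.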

R: the reward stored in the model
r: the actual immediate reward
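
One common way to connect the two is to nudge the modelled reward toward each observed one. The sketch below assumes a learning rate `alpha` and a blending rule of the form (1 - alpha) * R + alpha * r; both the rule and the function name are assumptions rather than something these notes state.

```python
def update_reward_model(R, s, a, r, alpha=0.2):
    """Move the modelled reward R[s][a] a fraction alpha toward the observed r."""
    R[s][a] = (1 - alpha) * R[s][a] + alpha * r
    return R

# Two states, two actions, all modelled rewards start at zero.
R = [[0.0, 0.0], [0.0, 0.0]]
update_reward_model(R, 0, 1, r=10.0)
```

After this single call, `R[0][1]` is 2.0: the model has moved 20% of the way from 0 toward the observed reward of 10.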

Summary

The Dyna architecture consists of a combination of:

direct reinforcement learning from real experience tuples gathered by acting in an environment,
updating an internal model of the environment, and,
using the model to simulate experiences.

Sutton and Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.

Resources

Richard S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, Austin, TX, 1990.

Sutton and Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.

RL course by David Silver (videos, slides), Lecture 8: Integrating Learning and Planning.
