海街diary - 简书


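Policy Iteration
1. The policy iteration algorithm: Initialization. Policy evaluation: (in general, the policy in the expression below is held fixed, since it changes only in the policy update step). Policy update: if the policy is unchanged from the previous iteration, stop; otherwise, go back to step 2. ...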
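The three steps named in this teaser (initialization, policy evaluation, policy update) can be sketched as follows. This is a minimal illustration for a small tabular MDP, not the article's own code: the names `P`, `R`, `gamma`, the iterative evaluation loop, and the two-state example are all assumptions made for the sketch.

```python
def q_value(P, R, V, s, a, gamma):
    """One-step lookahead: reward plus discounted expected next-state value."""
    return R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))


def policy_evaluation(P, R, policy, gamma, tol=1e-10):
    """Step 2: approximate the value of a fixed policy by repeated sweeps."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v_new = q_value(P, R, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def policy_iteration(P, R, gamma=0.9):
    n_states, n_actions = len(P), len(P[0])
    policy = [0] * n_states                          # Step 1: initialization
    while True:
        V = policy_evaluation(P, R, policy, gamma)   # Step 2: policy evaluation
        new_policy = []
        for s in range(n_states):                    # Step 3: policy update
            best = policy[s]
            for a in range(n_actions):
                # Switch only on a strict improvement, so ties cannot cause cycling.
                if q_value(P, R, V, s, a, gamma) > q_value(P, R, V, s, best, gamma) + 1e-9:
                    best = a
            new_policy.append(best)
        if new_policy == policy:                     # unchanged: stop
            return policy, V
        policy = new_policy                          # otherwise go back to step 2


# Hypothetical 2-state, 2-action MDP: P[s][a] lists next-state probabilities,
# R[s][a] is the immediate reward. Only staying in state 1 pays a reward.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]
policy, V = policy_iteration(P, R, gamma=0.9)
```

In this example the returned policy moves from state 0 to state 1 and then stays there. Policy evaluation is done iteratively here for simplicity; solving the linear system for the fixed policy directly is an equally valid choice.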

Model-based RL
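Note: the following is based on CS598. 1. Estimate Model: given a dataset, the model is estimated by maximum likelihood, with counts denoting the number of samples observed. 2. Analys...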
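For the tabular case, the maximum likelihood estimate of the model reduces to empirical frequencies. The sketch below assumes a dataset of `(s, a, r, s_next)` tuples; that data layout and the names `estimate_model`, `P_hat`, `R_hat`, and `counts` are assumptions for illustration, not taken from the course material.

```python
from collections import Counter, defaultdict


def estimate_model(dataset):
    """Tabular maximum likelihood estimate from (s, a, r, s_next) samples.

    Returns the estimated transition probabilities, the estimated mean
    rewards, and the number of samples seen for each (s, a) pair.
    """
    counts = Counter()                   # number of samples per (s, a)
    next_counts = defaultdict(Counter)   # number of samples per (s, a, s_next)
    reward_sums = defaultdict(float)
    for s, a, r, s_next in dataset:
        counts[(s, a)] += 1
        next_counts[(s, a)][s_next] += 1
        reward_sums[(s, a)] += r
    # MLE of a categorical distribution: observed count / total count.
    P_hat = {sa: {s2: c / counts[sa] for s2, c in nxt.items()}
             for sa, nxt in next_counts.items()}
    R_hat = {sa: reward_sums[sa] / counts[sa] for sa in counts}
    return P_hat, R_hat, counts


# Hypothetical data: four samples of action 0 taken in state 0.
data = [(0, 0, 1.0, 1), (0, 0, 0.0, 0), (0, 0, 1.0, 1), (0, 0, 0.0, 1)]
P_hat, R_hat, counts = estimate_model(data)
```

State-action pairs that never appear in the dataset get no estimate at all in this sketch; how to handle them is a separate design decision.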


CS598 Homework 1
Question 1 Solution Thus, although there exists constant , it doesn't af...

Statistical Reinforcement Learning
Prof. Nan Jiang (姜楠) at UIUC has started a new course, Statistical Reinforcement Learning; the course homepage is here. ...

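Reinforcement Learning Resources
Reinforcement learning is an important branch of machine learning. If you already have a basic grounding in machine learning and deep learning, you can start on reinforcement learning directly; here I recommend UC Berkeley's cs294...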

Solution to MDP (Markov Decision Process)
1. Markov Decision Process: a Markov Decision Process is a tuple of 4 elements: the states; the actions;...

Introduction to Coordination in Multi-Agent Reinforcement Learning
It is a fact that we live in a world involving interaction with others, ...


Papers on Multi-Agent Reinforcement Learning (MARL)
Papers in Multi-Agent Reinforcement Learning(MARL) This is my paper list...

Notes of Reinforcement Learning in ICML 2018
ICML-2018 Seminar Notes There is a productive and meaningful seminar in ...
