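  • Policy Iteration

    1. The policy iteration algorithm: Initialization. Policy evaluation: (in general, the policy in the expression below is held fixed, owing to the policy update) Policy update: if there is no change from the previous iteration, stop; otherwise, return to step 2. ...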

  • Model-based RL

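    Note: the following is based on CS598. 1. Estimate Model: given a dataset, the model is estimated by maximum likelihood, with a counter denoting the number of samples. 2. Analys...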

  • CS598 Homework 1

    Question 1 Solution Thus, although there exists constant , it doesn't af...

  • Statistical Reinforcement Learning

    Prof. Nan Jiang (姜楠) at UIUC has started a new course, Statistical Reinforcement Learning; the course homepage is here. ...

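  • Reinforcement Learning Resources

    Reinforcement learning is an important branch of machine learning. If you already have a basic grounding in machine learning and deep learning, you can start on reinforcement learning directly; recommended here is UC Berkeley's cs294...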

  • Solution to MDP (Markov Decision Process)

    1. Markov Decision Process: A Markov Decision Process is a tuple of 4 elements: the states; the actions; ...

  • Introduction to Coordination in Multi-Agent Reinforcement Learning

    It is a fact that we live in a world involving interaction with others, ...

  • Papers on Multi-Agent Reinforcement Learning (MARL)

    Papers in Multi-Agent Reinforcement Learning(MARL) This is my paper list...

  • Notes of Reinforcement Learning in ICML 2018

    ICML-2018 Seminar Notes There is a productive and meaningful seminar in ...

About me
Do you know that the hard thing and the right thing are the same thing?