  • Soft Actor Critic

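    SAC is an algorithm that optimizes a stochastic policy in an off-policy way, bridging stochastic policy optimization and DDPG-style approaches. The central feature of SAC is entropy re...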

  • deep reinforcement learning: some key concepts

    Action spaces are either discrete (a fixed set of action options, as in Go) or continuous (real-valued vectors, as in a video game). Pol...

  • DDPG and TD3

    DDPG [Deep Deterministic Policy Gradient] Quick facts: off-policy. Only used for...

  • deep reinforcement learning: Q-learning Continuous action space

    Action a is a continuous vector. Discrete action space: discrete actions such as firing or moving left in a video game. Continuous action space:...

  • deep reinforcement learning: training a critic

    Critic: state value function. A critic doesn't determine the action; given a...

  • deep reinforcement learning: Q-learning

    Critic: a critic evaluates how good the actor is. Value Function: usin...

  • deep reinforcement learning: DQN variant

    Double DQN: the Q function tends to favor actions whose reward is overestimated. Double DQN uses two Q functions, one to estimate v...

  • deep reinforcement learning: on-policy off-policy PPO

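    on-policy: the agent that interacts with the environment and the agent that learns are the same agent. off-policy: the agent that interacts with the environment and the agent that learns are different...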

  • Hung-yi Lee (李宏毅) deep reinforcement learning: Policy Gradient

    Policy-based approach: learning an actor. Machine learning for learning an...