bc5082fefa31 - Jianshu

IP location: Beijing

Soft Actor Critic
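SAC is an algorithm that optimizes a stochastic policy in an off-policy way, bridging stochastic policy optimization and DDPG-style approaches. The central feature of SAC is entropy re...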

deep reinforcement learning: some key concepts
Action spaces are either discrete (a fixed set of action choices, as in Go) or continuous (real-valued vectors, as in a video game). Pol...


DDPG and TD3
DDPG [Deep Deterministic Policy Gradient] Quick facts: off-policy. Can only be used for...

deep reinforcement learning: Q-learning Continuous action space
Action a is a continuous vector. Discrete action space: e.g., choosing discrete actions such as firing or moving left in a video game. Continuous action space: ...

deep reinforcement learning: training a critic
Critic: state value function. A critic does not determine the action; given a...

deep reinforcement learning: Q-learning
Critic: a critic evaluates how good the actor is. Value function: usin...

deep reinforcement learning: DQN variant
Double DQN: the Q function tends to favor actions whose reward is overestimated. Double DQN uses two Q functions, one to estimate v...


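deep reinforcement learning: on-policy, off-policy, PPO
On-policy: the agent that interacts with the environment and the agent that learns are the same agent. Off-policy: the agent that interacts with the environment and the agent that learns are different...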

Hung-yi Lee (李宏毅) deep reinforcement learning: Policy Gradient
Policy-based approach: learning an actor. Machine learning for learning an...
