
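SAC is an algorithm that optimizes a stochastic policy in an off-policy way, bridging stochastic policy optimization and DDPG-style approaches. The core feature of SAC is entropy re...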
Action Spaces: divided into discrete action spaces (a fixed set of action options, e.g. Go) and continuous action spaces (real-valued vectors, e.g. video games). Pol...
DDPG [Deep Deterministic Policy Gradient] Quick facts: off-policy. Only used for...
Action a is a continuous vector. Discrete action space: e.g. choosing discrete actions such as firing or moving left in a video game. Continuous action space: ...
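To make the distinction above concrete, here is a minimal sketch of sampling from each kind of action space. The action labels, the function names `sample_discrete` and `sample_continuous`, and the Gaussian noise model are all illustrative assumptions, not something taken from the posts listed here.

```python
import random

# Hypothetical discrete action set, modeled on the "fire / move left" example.
DISCRETE_ACTIONS = ["fire", "move_left", "move_right"]


def sample_discrete(probs, rng=random):
    """Pick one action label, weighting each option by its probability."""
    return rng.choices(DISCRETE_ACTIONS, weights=probs, k=1)[0]


def sample_continuous(mean, std, rng=random):
    """Draw a real-valued action vector: one Gaussian sample per dimension."""
    return [rng.gauss(m, std) for m in mean]


if __name__ == "__main__":
    print(sample_discrete([0.2, 0.5, 0.3]))      # one of the three labels
    print(sample_continuous([0.0, 1.0], 0.1))    # a 2-dimensional real vector
```

In the discrete case the policy only has to score a fixed menu of options; in the continuous case it has to produce a whole vector, which is why methods such as DDPG output the action directly.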
Critic: state value function. A critic doesn't determine the action; given a...
Critic: the critic evaluates how good the actor is. Value Function: usin...
Double DQN: the Q function tends to favor actions whose reward is overestimated. Double DQN uses two Q functions, one to estimate v...
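As a rough illustration of the two-Q-function idea, the sketch below contrasts the vanilla DQN target with the commonly published Double DQN target, where one Q function selects the next action and the other evaluates it. The function names, the plain-list representation of Q values, and the default `gamma` are assumptions made for this example; the excerpt above is truncated, so this follows the standard formulation rather than the post's own wording.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Vanilla DQN target: take the max over the target network's own estimates."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online Q picks the action, the target Q scores it."""
    if done:
        return reward
    # Action selection uses the online network's estimates for the next state.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Action evaluation uses the target network's estimate for that same action.
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    next_q_online = [1.0, 2.0, 0.5]   # hypothetical Q(s', a) from the online network
    next_q_target = [1.5, 0.8, 3.0]   # hypothetical Q'(s', a) from the target network
    print(dqn_target(1.0, next_q_target, gamma=0.5))
    print(double_dqn_target(1.0, next_q_online, next_q_target, gamma=0.5))
```

Because the evaluating network is not the one that chose the action, an overestimated value is propagated only when both networks agree on it, so the Double DQN target can never exceed the vanilla one for the same inputs.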
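On-policy: the agent that interacts with the environment and the agent that learns are the same agent. Off-policy: the agent that interacts with the environment and the agent that learns are different...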
Policy-based approach: learning an actor. Machine learning for learning an...