Best Paper Award, ICRA 2019
[pdf] [site] [ppt]
Abstract
In unstructured environments, contact-rich manipulation tasks typically require both haptic and visual feedback, and it is non-trivial to manually design controllers that combine features from such different modalities. Deep reinforcement learning (DRL) has succeeded in learning control policies from high-dimensional inputs, but due to sample complexity these algorithms are often hard to deploy on real robots. We use self-supervision to learn a compact, multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of policy learning. We evaluate our method on a peg insertion task, generalizing over different geometries, configurations, and clearances while remaining robust to external perturbations. We present results both in simulation and on a real robot.
Introduction
Fig. 1: Force sensor readings in the z-axis (height) and visual observations are shown with corresponding stages of a peg insertion task. The force reading transitions from (1) the arm moving in free space to (2) making contact with the box. While aligning the peg, the forces capture the sliding contact dynamics on the box surface (3, 4). Finally, in the insertion stage, the forces peak as the robot attempts to insert the peg at the edge of the hole (5), and decrease when the peg slides into the hole (6).
The main contributions are:
- A multimodal representation learning model, from which contact-rich manipulation policies can be learned.
- A demonstration on insertion tasks that effectively uses haptic and visual feedback for hole search, peg alignment, and insertion (see Fig. 1). An ablation study compares the effect of each modality on task performance.
- An evaluation of generalization to tasks with different peg geometries, and of robustness to perturbations and sensor noise.
Multimodal Representation Model
Fig. 2: Neural network architecture for multimodal representation learning with self-supervision. The network takes data from three different sensors as input: RGB images, F/T readings over a 32ms window, and end-effector position and velocity. It encodes and fuses this data into a multimodal representation based on which controllers for contact-rich manipulation can be learned. This representation learning network is trained end-to-end through self-supervision.
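The fusion step in Fig. 2 can be sketched as follows. This is a minimal, dependency-free illustration, not the published architecture: the feature dimensions, the stand-in encoders, and the toy linear projection are all assumptions made for clarity. The structure it shows — encode each modality into a fixed-length vector, then fuse by concatenation and projection into a shared representation — follows the caption above.

```python
import random

random.seed(0)

# Hypothetical feature sizes -- illustrative assumptions, not the
# dimensions used in the paper.
IMG_FEAT, FORCE_FEAT, PROPRIO_FEAT = 128, 32, 8
FUSED_DIM = 64

def encode(raw, out_dim):
    """Stand-in encoder: any mapping from raw sensor data to a fixed-length
    feature vector (e.g. a CNN for RGB images, a network over the 32 ms
    force/torque window, an MLP for end-effector position/velocity)."""
    return [random.random() for _ in range(out_dim)]

def linear(x, out_dim):
    """Toy fully connected layer with random weights."""
    w = [[random.gauss(0, 0.1) for _ in x] for _ in range(out_dim)]
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def multimodal_representation(rgb, forces, proprio):
    # 1. Encode each modality separately.
    z_img = encode(rgb, IMG_FEAT)
    z_frc = encode(forces, FORCE_FEAT)
    z_pos = encode(proprio, PROPRIO_FEAT)
    # 2. Fuse by concatenation, then project into the shared
    #    representation on which the manipulation policy is learned.
    fused = z_img + z_frc + z_pos
    return linear(fused, FUSED_DIM)

z = multimodal_representation(rgb=None, forces=None, proprio=None)
print(len(z))  # 64
```

In the actual system these encoders are neural networks trained end-to-end with self-supervised objectives (such as the action-conditioned optical flow prediction shown later in Fig. 5), rather than the random stand-ins used here.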
Policy Learning and Controller Design
Fig. 3: Our controller takes end-effector position displacements from the policy at 20Hz and outputs robot torque commands at 200Hz. The trajectory generator interpolates high-bandwidth robot trajectories from low-bandwidth policy actions. The impedance PD controller tracks the interpolated trajectory. The operational space controller uses the robot dynamics model to transform Cartesian-space accelerations into commanded joint torques. The resulting controller is compliant and reactive.
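The control cascade in Fig. 3 can be sketched in a few lines, under simplifying assumptions: a single Cartesian axis, linear interpolation in the trajectory generator, and illustrative PD gains that are not from the paper. The operational-space mapping to joint torques is only noted in a comment, since it requires the robot dynamics model.

```python
# Hypothetical 1-DoF sketch of the 20 Hz policy -> 200 Hz control cascade.
POLICY_HZ, CONTROL_HZ = 20, 200
SUBSTEPS = CONTROL_HZ // POLICY_HZ  # 10 control ticks per policy action

def interpolate(x_current, dx_policy):
    """Trajectory generator: spread one low-bandwidth position
    displacement over SUBSTEPS high-bandwidth setpoints."""
    return [x_current + dx_policy * (i + 1) / SUBSTEPS
            for i in range(SUBSTEPS)]

def pd_impedance(x, v, x_des, kp=100.0, kd=20.0):
    """Impedance PD law: commanded acceleration toward the setpoint.
    An operational space controller would then use the robot dynamics
    model to turn this Cartesian acceleration into joint torques."""
    return kp * (x_des - x) - kd * v

# One policy step: the arm at x = 0.0 m receives a +1 mm displacement.
setpoints = interpolate(0.0, 0.001)
accel = pd_impedance(x=0.0, v=0.0, x_des=setpoints[0])
print(len(setpoints), round(accel, 6))  # 10 setpoints; 0.01 m/s^2
```

Because the PD law acts on position error rather than commanding positions rigidly, the resulting behavior is compliant: contact forces that push the end-effector off the setpoint produce bounded restoring accelerations instead of large tracking torques.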
Experiments: Design and Setup
Fig. 4: Simulated Peg Insertion: Ablative study of representations trained on different combinations of sensory modalities. We compare our full model, trained with a combination of visual and haptic feedback and proprioception, with baselines that are trained without vision, or haptics, or either. The graph shows partial task completion rates with different feedback modalities, and we note that both the visual and haptic modalities play an integral role for contact-rich tasks.
Reward Design
Experiments: Results
Real Robot Experiments
Fig. 5: (a) 3D printed pegs used in the real robot experiments and their box clearances. (b) Qualitative predictions: We visualize examples of optical flow predictions from our representation model (using color scheme in [22]). The model predicts different flow maps on the same image conditioned on different next actions indicated by projected arrows.
Fig. 6: Real Robot Peg Insertion: We evaluate our Full Model on the real hardware with different peg shapes, indicated on the x-axis. The learned policies achieve the tasks with a high success rate. We also study transferring the policies and representations from trained pegs to novel peg shapes (last four bars). The robot effectively re-uses previously trained models to solve new tasks.