Efficient Reinforcement Learning for Robotic Control

We design efficient reinforcement learning (RL) methods that generalize across diverse tasks and environments, as well as from simulation to real robots. We have made two main efforts in this direction:

TD-MPC2

Building on our previous work on TD-MPC, a model-based RL algorithm that performs local trajectory optimization in the latent space of a learned implicit world model, we propose TD-MPC2, which introduces several improvements that enable it to significantly outperform baselines across 104 online RL tasks spanning four diverse task domains. The algorithm successfully trains a 317M-parameter agent capable of executing 80 tasks across multiple domains, embodiments, and action spaces, all with a single set of hyperparameters. This development addresses a critical challenge in creating generalist embodied agents: learning diverse control behaviors from large, uncurated datasets with RL. Key advancements include (i) enhanced algorithmic robustness through refined core design choices and (ii) a carefully designed architecture that can handle varied datasets without relying on extensive domain knowledge. As a result, TD-MPC2 is scalable, robust, and versatile, applying to a wide range of single-task and multi-task continuous control problems and pushing the boundaries of what generalist RL agents can achieve.
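The core planning loop, local trajectory optimization in a learned latent space, can be illustrated with a minimal sketch. Everything below is a toy stand-in: `encode`, `latent_dynamics`, `reward_fn`, and `value_fn` are hypothetical fixed functions replacing the learned neural networks, and the sampling-based planner is a generic MPPI/CEM-style loop, not TD-MPC2's exact procedure.

```python
import numpy as np

# Toy stand-ins for the learned components (illustrative only; the real
# model uses neural networks trained with TD learning).
LATENT_DIM, ACTION_DIM = 4, 2

def encode(obs):
    # Encoder: map an observation to a latent state (identity on a toy space).
    return obs[:LATENT_DIM]

def latent_dynamics(z, a):
    # Latent dynamics z' = d(z, a); a fixed linear toy model here.
    A = np.eye(LATENT_DIM) * 0.95
    B = np.ones((LATENT_DIM, ACTION_DIM)) * 0.1
    return A @ z + B @ a

def reward_fn(z, a):
    # Reward head: prefer latents near the origin and small actions.
    return -np.sum(z ** 2) - 0.01 * np.sum(a ** 2)

def value_fn(z):
    # Terminal value estimate used to bootstrap beyond the planning horizon.
    return -np.sum(z ** 2)

def plan(z0, horizon=5, samples=256, elites=32, iters=4, seed=0):
    """Sampling-based local trajectory optimization: sample action
    sequences, roll them out in the latent model, and refit a Gaussian
    to the top-scoring (elite) sequences."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, ACTION_DIM))
    std = np.ones((horizon, ACTION_DIM))
    for _ in range(iters):
        acts = mean + std * rng.standard_normal((samples, horizon, ACTION_DIM))
        returns = np.zeros(samples)
        for i in range(samples):
            z = z0
            for t in range(horizon):
                returns[i] += reward_fn(z, acts[i, t])
                z = latent_dynamics(z, acts[i, t])
            returns[i] += value_fn(z)  # bootstrap with the value estimate
        elite = acts[np.argsort(returns)[-elites:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean[0]  # execute only the first action (receding horizon)

obs = np.array([1.0, -0.5, 0.3, 0.2])
first_action = plan(encode(obs))
```

At each control step the agent replans from the current latent state and executes only the first action, which is what makes the optimization "local" rather than a global policy.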

ExBody

We also propose a method that enables a humanoid robot to imitate human motion using RL and imitation learning. To let humanoid robots produce rich, diverse, and expressive motions in the real world, our ExBody method trains a whole-body control policy on large-scale human motion capture data in an RL framework. It tackles the difficulty of directly mimicking full human motions by imitating only the upper-body motions while allowing the legs to follow robust velocity commands. Through training in simulation and Sim2Real transfer, the policy enables humanoid robots to walk in various styles, shake hands, and dance with humans. Using the Unitree H1 robot and the CMU MoCap dataset, we show that this approach not only enhances expressiveness but also improves walking robustness. Extensive evaluations in both simulation and real-world settings demonstrate the effectiveness of ExBody, highlighting its potential for broader applications in humanoid control and navigation.
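The decoupled objective, imitation for the upper body and velocity tracking for the legs, can be sketched as a reward function. This is a simplified illustration under assumed conventions: the joint layout (`NUM_LEG_DOFS`), weights, and exponential tracking terms are hypothetical placeholders, not ExBody's actual reward terms.

```python
import numpy as np

# Hypothetical joint layout for illustration: the first 8 DoFs are the
# legs, the remaining DoFs are the upper body (arms/torso).
NUM_LEG_DOFS = 8

def exbody_style_reward(qpos, ref_upper, root_vel, cmd_vel,
                        w_imitate=1.0, w_vel=1.0):
    """Sketch of a decoupled objective: reward the upper body for tracking
    reference joint angles from motion capture, while the legs are rewarded
    only for tracking a commanded root velocity (not for matching the
    mocap leg motion, which is hard to reproduce on real hardware)."""
    upper = qpos[NUM_LEG_DOFS:]
    # Upper-body imitation term: exponentiated joint-angle tracking error.
    r_imitate = np.exp(-np.sum((upper - ref_upper) ** 2))
    # Velocity-tracking term encouraging robust locomotion.
    r_vel = np.exp(-np.sum((root_vel - cmd_vel) ** 2))
    return w_imitate * r_imitate + w_vel * r_vel

# Perfect tracking of both terms yields the maximum reward w_imitate + w_vel.
qpos = np.concatenate([np.zeros(NUM_LEG_DOFS), np.array([0.1, 0.2])])
r = exbody_style_reward(qpos, np.array([0.1, 0.2]),
                        np.array([1.0, 0.0]), np.array([1.0, 0.0]))
```

Because the legs are never penalized for deviating from the mocap clip, the policy is free to discover gaits that keep the robot balanced, which is one intuition for why this decoupling improves real-world robustness.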

Team Members

Hao Su1
Xiaolong Wang1

1. UC San Diego

Publications

ICLR 2024
RSS 2024