World Models (WMs) have significant potential for robotics, particularly as a training engine for policy learning. In our approach, policies are executed autonomously inside the learned world model, allowing large amounts of interaction data to be generated without physical robot rollouts. We use Generative Value Learning (GVL) to score rollouts and retain the successful ones. When policies fail, we perform planning within the world model to generate recovery trajectories, which provide additional training data. This process enables iterative policy improvement through large-scale, model-based data generation.
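The loop described above can be sketched in toy form. Everything here is illustrative: the "world model" is a scalar dynamics stub, `gvl_value` is a stand-in progress score playing the role of GVL, and `plan_recovery` is a simple random-shooting planner inside the model; none of these names reflect the actual system's interfaces.

```python
import random

random.seed(0)

GOAL = 10.0

def world_model(state, action):
    # Hypothetical learned dynamics: next state = state + action effect.
    return state + action

def policy(state, noise=0.5):
    # Stochastic stand-in policy: push the state toward GOAL, with noise.
    return max(-1.0, min(1.0, GOAL - state)) + random.uniform(-noise, noise)

def gvl_value(states):
    # Proxy for a GVL-style score: fraction of progress toward the goal
    # made over the imagined rollout (1.0 = reached the goal).
    return max(0.0, 1.0 - abs(GOAL - states[-1]) / (abs(GOAL - states[0]) + 1e-8))

def rollout(start, horizon=20):
    # Execute the policy autonomously inside the world model.
    states, actions = [start], []
    for _ in range(horizon):
        a = policy(states[-1])
        actions.append(a)
        states.append(world_model(states[-1], a))
    return states, actions

def plan_recovery(state, horizon=5, candidates=32):
    # Random-shooting planner in the world model: sample action sequences,
    # keep the one whose imagined endpoint lands closest to the goal.
    best, best_dist = None, float("inf")
    for _ in range(candidates):
        seq = [random.uniform(-1.0, 1.0) for _ in range(horizon)]
        s = state
        for a in seq:
            s = world_model(s, a)
        if abs(GOAL - s) < best_dist:
            best, best_dist = seq, abs(GOAL - s)
    return best

# One round of model-based data generation: keep GVL-filtered successes,
# and turn failures into recovery trajectories via planning.
dataset = []
for _ in range(50):
    states, actions = rollout(start=0.0)
    if gvl_value(states) > 0.9:
        dataset.append((states, actions))          # successful rollout
    else:
        recovery = plan_recovery(states[-1])       # recover from failure state
        dataset.append(([states[-1]], recovery))
```

In a full system, `dataset` would then be used to retrain the policy, and the generate-filter-recover cycle repeated for iterative improvement.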