World Models (WMs) have significant potential for robotics, particularly as a training engine for policy learning. In our approach, policies are executed autonomously inside the learned world model, allowing large amounts of interaction data to be generated without physical robot rollouts.
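As a rough illustration of what executing a policy inside a learned world model means, here is a minimal sketch. All names (`rollout_in_world_model`, `toy_policy`, `toy_world_model`) are hypothetical stand-ins rather than our actual code, and the toy dynamics replace what would in practice be a learned predictive model over observations.

```python
from typing import Callable, List, Tuple

State = List[float]
Action = List[float]


def rollout_in_world_model(
    policy: Callable[[State], Action],
    world_model: Callable[[State, Action], State],
    initial_state: State,
    horizon: int,
) -> List[Tuple[State, Action]]:
    """Run a policy for `horizon` steps using only the world model's predictions."""
    trajectory: List[Tuple[State, Action]] = []
    state = initial_state
    for _ in range(horizon):
        action = policy(state)
        trajectory.append((state, action))
        # The next state is predicted by the model; no physical robot is involved.
        state = world_model(state, action)
    return trajectory


# Hypothetical toy stand-ins: simple additive dynamics and a proportional policy.
def toy_world_model(state: State, action: Action) -> State:
    return [s + 0.1 * a for s, a in zip(state, action)]


def toy_policy(state: State) -> Action:
    return [-s for s in state]  # push every state dimension toward zero


trajectory = rollout_in_world_model(
    toy_policy, toy_world_model, initial_state=[1.0, -2.0], horizon=5
)
```

Because each step only calls the model, the same loop can be repeated across many initial states and policies to generate interaction data cheaply.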
Below are bridge-trained policy rollouts generated by the world model:
We use TopReward to filter for good rollouts, and found that it works well after some modifications. The goal is to distill desirable behaviors from multiple policies into a single policy.
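The filtering step can be pictured roughly as follows. This is only a sketch under stated assumptions: `score_fn` is a hypothetical placeholder for a reward model and does not reflect TopReward's real interface or our modifications, and the threshold rule and rollout fields are invented for illustration.

```python
from typing import Callable, Dict, List, Sequence

Rollout = Dict[str, object]


def filter_rollouts(
    rollouts: Sequence[Rollout],
    score_fn: Callable[[Rollout], float],
    threshold: float,
) -> List[Rollout]:
    """Keep only rollouts whose score is at least `threshold`."""
    return [r for r in rollouts if score_fn(r) >= threshold]


# Hypothetical rollouts from three different policies (made-up fields and values).
rollouts = [
    {"policy": "A", "final_distance": 0.02},
    {"policy": "B", "final_distance": 0.50},
    {"policy": "C", "final_distance": 0.05},
]


def toy_score(rollout: Rollout) -> float:
    # Placeholder reward: closer to the goal means a higher score.
    return 1.0 - float(rollout["final_distance"])


kept = filter_rollouts(rollouts, toy_score, threshold=0.9)
kept_policies = [r["policy"] for r in kept]
```

Pooling the kept rollouts across policies gives one plausible form for the dataset on which a single distilled policy could then be trained.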