Evaluating generalist robot policies in the real world is slow and expensive, which makes it a major bottleneck for progress. To address this, we build a simulation-based evaluation pipeline from reconstructions of real-world scenes. We use the MuJoCo simulator to model the robot and its dynamics, and simulate the objects in the scene within the same physics environment. The static background is reconstructed with 3D Gaussian Splatting (3DGS) for photorealistic rendering.
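How the physics-simulated foreground and the 3DGS background are merged into a single camera image is not detailed here. One common option is a per-pixel depth test. The sketch below assumes that both renderers output RGB and metric depth from the same camera pose, and that pixels with no simulated geometry have infinite depth. The function name `composite` and these conventions are hypothetical, not taken from this pipeline.

```python
import numpy as np


def composite(sim_rgb, sim_depth, bg_rgb, bg_depth):
    """Merge a simulator render (robot + objects) with a 3DGS background render.

    Assumed inputs (hypothetical conventions):
      sim_rgb, bg_rgb:     float arrays of shape (H, W, 3)
      sim_depth, bg_depth: float arrays of shape (H, W), metric depth from the
                           same camera; np.inf where the simulator drew nothing.
    """
    sim_rgb, bg_rgb = np.asarray(sim_rgb), np.asarray(bg_rgb)
    sim_depth, bg_depth = np.asarray(sim_depth), np.asarray(bg_depth)
    if sim_rgb.shape != bg_rgb.shape or sim_depth.shape != bg_depth.shape:
        raise ValueError("both renders must share the same resolution")
    # Depth test: a simulated surface wins wherever it is closer to the camera
    # than the reconstructed background.
    fg_mask = sim_depth < bg_depth
    return np.where(fg_mask[..., None], sim_rgb, bg_rgb)
```

A depth test lets the reconstructed background occlude simulated objects that sit behind it; a plain alpha mask would always draw the simulated geometry on top.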
Within this reconstructed environment, we evaluate several representative policy architectures, including ACT, Diffusion Policy, and π0.5. Our goal is to identify the key parameters and design choices that must be carefully controlled so that policy evaluations in simulation correlate closely with real-world performance.
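One way to check whether simulated evaluations track real ones is to compare per-policy success rates, measured on matched tasks in both settings. The sketch below reports the Pearson correlation and the fraction of policy pairs that simulation orders the same way as the real world. The helper `sim_real_agreement` and both metrics are illustrative choices, not necessarily the ones used in this work.

```python
import itertools

import numpy as np


def sim_real_agreement(sim_success, real_success):
    """Compare per-policy success rates measured in simulation and in the real world.

    Returns (pearson_r, pairwise_rank_agreement). Requires at least two policies,
    and the success rates must not be all identical in either setting.
    """
    sim = np.asarray(sim_success, dtype=float)
    real = np.asarray(real_success, dtype=float)
    if sim.shape != real.shape or sim.ndim != 1 or sim.size < 2:
        raise ValueError("expected two 1-D arrays with one entry per policy")
    # Linear agreement between simulated and real success rates.
    pearson_r = float(np.corrcoef(sim, real)[0, 1])
    # Fraction of policy pairs ordered the same way (better / worse / tied) in both settings.
    pairs = list(itertools.combinations(range(sim.size), 2))
    agree = sum(np.sign(sim[i] - sim[j]) == np.sign(real[i] - real[j]) for i, j in pairs)
    return pearson_r, float(agree) / len(pairs)
```

A high correlation with low pairwise agreement would mean simulation captures overall difficulty but misorders policies of similar quality. That distinction matters when simulation is used to choose between policies or checkpoints.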
First, we match the dynamics of the simulation to the real world. The video below shows how closely the two match; the remaining constant offset comes from inaccurate camera extrinsics, which place the simulated camera slightly differently from the real one. We found velocity control to be more reliable than position control for this matching.
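To put a number on how closely sim and real match, one could log a real rollout, replay the same command sequence in simulation, and compare the time-aligned trajectories. The hypothetical helper below assumes both trajectories are already resampled to the same timestamps. It splits the error into a constant bias and a time-varying residual. Applied to image-space keypoint tracks, a large bias with a small residual would be consistent with a static calibration error such as the extrinsics offset. A large residual would instead point to a dynamics mismatch.

```python
import numpy as np


def decompose_tracking_error(real, sim):
    """Split the sim-vs-real error of aligned trajectories into bias and residual.

    real, sim: arrays of shape (T, D) sampled at the same timestamps, e.g. joint
    positions (D = number of joints) or 2D keypoint tracks in image space.
    Returns (bias, residual_rms, total_rms), each of shape (D,).
    """
    real = np.asarray(real, dtype=float)
    sim = np.asarray(sim, dtype=float)
    if real.shape != sim.shape:
        raise ValueError("trajectories must be time-aligned and have the same shape")
    err = sim - real
    bias = err.mean(axis=0)                          # constant offset per dimension
    residual = err - bias                            # time-varying part of the error
    residual_rms = np.sqrt((residual ** 2).mean(axis=0))
    total_rms = np.sqrt((err ** 2).mean(axis=0))
    return bias, residual_rms, total_rms
```

Comparing `residual_rms` for the same replayed commands under velocity and position actuation is one way to check which control mode tracks the real robot more closely.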