Evaluating generalist robot policies in the real world is slow and expensive, which makes it a major bottleneck for progress. To address this, we build a simulation-based evaluation pipeline that reconstructs real-world scenes. The robot and its dynamics are modeled in the MuJoCo simulator, and the objects in the scene are simulated in the same physics environment. The static background is reconstructed with 3D Gaussian Splatting (3DGS) to achieve photorealistic rendering.
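The text does not say how the MuJoCo rendering of the robot and objects is merged with the 3DGS background. One common option, assumed here purely for illustration, is a per-pixel depth test. The simulator renders RGB and depth for the dynamic elements, the 3DGS model renders RGB and depth for the background from the same camera pose, and each pixel keeps whichever surface is nearer. The function `composite_by_depth`, the nested-list image types, and the convention that empty simulator pixels have infinite depth are all hypothetical. None of them is part of the described pipeline.

```python
from typing import List, Tuple

# Hypothetical image types: row-major nested lists, one RGB tuple per pixel.
RGB = Tuple[int, int, int]
Image = List[List[RGB]]
DepthMap = List[List[float]]


def composite_by_depth(
    sim_rgb: Image,
    sim_depth: DepthMap,
    bg_rgb: Image,
    bg_depth: DepthMap,
) -> Image:
    """Merge a simulator render (robot + objects) with a 3DGS background render.

    Assumes both renders share the same camera pose and intrinsics, and that
    pixels where the simulator drew nothing carry depth float("inf").
    """
    if not sim_rgb:
        return []
    height, width = len(sim_rgb), len(sim_rgb[0])
    # All four inputs must be pixel-aligned for a per-pixel depth test.
    for grid in (sim_rgb, sim_depth, bg_rgb, bg_depth):
        if len(grid) != height or any(len(row) != width for row in grid):
            raise ValueError("all inputs must have the same height and width")

    out: Image = []
    for r in range(height):
        row: List[RGB] = []
        for c in range(width):
            # Depth test: the nearer surface wins; ties go to the background.
            if sim_depth[r][c] < bg_depth[r][c]:
                row.append(sim_rgb[r][c])
            else:
                row.append(bg_rgb[r][c])
        out.append(row)
    return out
```

A real pipeline would operate on arrays rather than nested lists. The list form only keeps the sketch dependency-free. Provided the 3DGS depth is accurate, a depth test handles occlusion in both directions. For example, a robot arm passing behind a reconstructed shelf is hidden by the shelf.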
Within this reconstructed environment, we evaluate several representative policy architectures, including ACT, Diffusion Policy, and π0.5. Our goal is to identify the key parameters and design choices that must be carefully controlled so that policy performance measured in simulation correlates closely with real-world performance.
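One way to make "correlates with real-world performance" concrete is to compare per-policy success rates measured in simulation and on the real robot. The sketch below does this with Pearson correlation, which captures linear agreement, and Spearman rank correlation, which captures agreement in how the policies are ordered. Both the choice of metrics and the helper names are assumptions for illustration. The text does not say which statistics are used.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Given one success rate per policy (for example ACT, Diffusion Policy, and π0.5) from matched simulated and real trials, a `spearman(sim, real)` value near 1 means the simulator orders the policies the same way the real world does. With only a handful of policies, both coefficients are noisy. They are best read alongside the per-policy success rates.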