MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation

Navigation tasks in photorealistic 3D environments are challenging because they require perception and effective planning under partial observability. Recent work shows that map-like memory is useful for long-horizon navigation tasks. However, a focused investigation of the impact of maps on navigation tasks of varying complexity has not yet been performed.
We propose the multiON task, which requires navigation to an episode-specific sequence of objects in a realistic environment. MultiON generalizes the ObjectGoal navigation task [1, 2] and explicitly tests the ability of navigation agents to locate previously observed goal objects. We perform a set of multiON experiments to examine how a variety of agent models perform across a spectrum of navigation task complexities. Our experiments show that: i) navigation performance degrades dramatically with escalating task complexity; ii) a simple semantic map agent performs surprisingly well relative to more complex neural image feature map agents; and iii) even oracle map agents achieve relatively low performance, indicating the potential for future work in training embodied navigation agents using maps.

In an episode of multiON, the agent must navigate to an ordered set of objects placed within the environment. The number of objects m determines the overall complexity of the navigation episode. We use m-ON to refer to an episode with m ordered goal objects. The m objects are selected from a set of k available objects where k ≥ m.
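The m-ON definition above can be made concrete with a minimal sketch. It assumes goals are drawn uniformly without replacement from the k available objects and that the order of the draw is the order the agent must follow. The function name `sample_episode` and the object labels are hypothetical placeholders, not names from the MultiON code or dataset.

```python
import random


def sample_episode(available_objects, m, seed=None):
    """Return an ordered list of m distinct goals drawn from k available objects.

    Assumption: uniform sampling without replacement; the sampled order is
    treated as the required visiting order for the m-ON episode.
    """
    k = len(available_objects)
    if not 1 <= m <= k:
        raise ValueError(f"need 1 <= m <= k, got m={m}, k={k}")
    rng = random.Random(seed)  # local generator so episodes are reproducible
    return rng.sample(available_objects, m)


# Hypothetical object labels, for illustration only.
objects = ["red", "green", "blue", "yellow", "white"]
episode = sample_episode(objects, m=3, seed=0)  # one 3-ON episode with k = 5
```

Larger m yields longer goal sequences from the same object pool, which is how a single set of k objects can cover a range of task complexities.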

NoMap(RNN): Baseline agent that does not use any map information.
OracleMap: Uses the entire ground-truth map, which contains occupancy information and object goal information.
OracleEgoMap: Uses the ground-truth map, which is progressively revealed as the agent explores the environment.
ObjRecogMap: Does not use oracle information. It progressively constructs the global map by predicting the categories of goal objects visible in the sensory inputs, and is trained with an auxiliary classification loss.
ProjNeuralMap: Neurally projects the image features into a global map using the projection module.
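To illustrate the difference between OracleMap and OracleEgoMap, the sketch below reveals a ground-truth grid incrementally as an agent moves. It is a simplification under stated assumptions: the map is a 2D grid of integer cell codes, and each step reveals a square window around the agent rather than the agent's actual field of view. The names `UNKNOWN`, `reveal`, and `radius` are hypothetical and do not come from the MultiON implementation.

```python
UNKNOWN = -1  # hypothetical code for cells the agent has not observed yet


def reveal(ground_truth, revealed, agent_rc, radius):
    """Copy ground-truth cells within a square window around the agent.

    Assumption: a square window of the given radius stands in for the
    agent's real sensor footprint. The revealed map is updated in place.
    """
    rows, cols = len(ground_truth), len(ground_truth[0])
    r0, c0 = agent_rc
    for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1)):
            revealed[r][c] = ground_truth[r][c]
    return revealed


# A 4x4 ground-truth grid: 0 = non-navigable, 1 = navigable, 2 = goal.
ground_truth = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 2, 1],
    [0, 0, 1, 1],
]
# An OracleMap-style agent would see ground_truth in full from the start;
# an OracleEgoMap-style agent starts with everything unknown.
ego_map = [[UNKNOWN] * 4 for _ in range(4)]
reveal(ground_truth, ego_map, agent_rc=(0, 0), radius=1)
```

After the first step only the top-left 2x2 block is known, and the goal cell stays hidden until the agent moves close enough to observe it.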

Sample visualizations of OracleMap are shown here. White represents non-navigable locations, gray represents navigable locations, and each colored square marks the goal object of the corresponding color. The first, second, and third maps show the 2-ON, 3-ON, and 4-ON tasks, respectively.
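The color convention described above can be sketched as a small rendering function. This is an illustration only: the input layout (a boolean navigability grid plus a dictionary of goal positions) and the exact RGB values are assumptions, and `render_oracle_map` is a hypothetical name rather than a function from the MultiON code.

```python
WHITE = (255, 255, 255)  # non-navigable locations
GRAY = (128, 128, 128)   # navigable locations (assumed shade)


def render_oracle_map(navigable, goals):
    """Build an RGB image (nested lists of tuples) from a map description.

    navigable: 2D list of booleans, True where the agent can walk.
    goals: dict mapping (row, col) to the RGB color of the goal at that cell.
    """
    image = [[GRAY if cell else WHITE for cell in row] for row in navigable]
    for (r, c), color in goals.items():
        image[r][c] = color  # goal squares are drawn over the base layer
    return image


# A 2-ON example with a red goal and a blue goal on a 3x3 map.
navigable = [
    [True, True, False],
    [True, True, True],
    [False, True, True],
]
goals = {(0, 0): (255, 0, 0), (2, 2): (0, 0, 255)}
image = render_oracle_map(navigable, goals)
```

A 3-ON or 4-ON map would differ only in the number of entries in `goals`.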
