MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation

Saim Wani*
Shivansh Patel*
Unnat Jain*
Angel X. Chang
Manolis Savva
* equal contribution by SW, SP and UJ
Published at NeurIPS 2020


Navigation tasks in photorealistic 3D environments are challenging because they require perception and effective planning under partial observability. Recent work shows that map-like memory is useful for long-horizon navigation tasks. However, a focused investigation of the impact of maps on navigation tasks of varying complexity has not yet been performed.
We propose the multiON task, which requires navigation to an episode-specific sequence of objects in a realistic environment. MultiON generalizes the ObjectGoal navigation task [1, 2] and explicitly tests the ability of navigation agents to locate previously observed goal objects. We perform a set of multiON experiments to examine how a variety of agent models perform across a spectrum of navigation task complexities. Our experiments show that: i) navigation performance degrades dramatically with escalating task complexity; ii) a simple semantic map agent performs surprisingly well relative to more complex neural image feature map agents; and iii) even oracle map agents achieve relatively low performance, indicating the potential for future work in training embodied navigation agents using maps.


In an episode of multiON, the agent must navigate to an ordered set of objects placed within the environment. The number of objects m determines the overall complexity of the navigation episode. We use m-ON to refer to an episode with m ordered goal objects. The m objects are selected from a set of k available objects where k ≥ m.
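As a minimal sketch of how an m-ON goal sequence could be drawn from the k available objects, the snippet below samples m distinct objects in a random order. The function name `sample_episode_goals`, the object names, and the uniform sampling are illustrative assumptions, not the dataset's actual generation procedure.

```python
import random


def sample_episode_goals(available_objects, m, seed=None):
    """Return an ordered list of m distinct goals drawn from k available objects.

    Assumption: goals are sampled uniformly without replacement; the real
    dataset may use a different procedure.
    """
    k = len(available_objects)
    if not 1 <= m <= k:
        raise ValueError(f"need 1 <= m <= k, got m={m}, k={k}")
    rng = random.Random(seed)
    # random.sample returns m distinct items; their order is the goal order.
    return rng.sample(available_objects, m)


# Hypothetical object names, used only for illustration (k = 8).
OBJECTS = ["red", "green", "blue", "yellow", "white", "pink", "black", "cyan"]
goals = sample_episode_goals(OBJECTS, m=3, seed=0)  # one 3-ON episode
```

Raising m while keeping k fixed gives longer goal sequences and therefore harder episodes, which is the complexity axis the experiments vary.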

Agent Architecture


NoMap(RNN): Baseline agent that does not use any map information.
OracleMap: Uses the entire ground-truth map, which contains occupancy information and object goal information.
OracleEgoMap: Uses the ground-truth map, which is progressively revealed as the agent explores the environment.
ObjRecogMap: Does not use oracle information and progressively constructs the global map by predicting the object categories of goals visible in the sensory inputs. Trained using an auxiliary classification loss.
ProjNeuralMap: Neurally projects the image features into a global map using the projection module.
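To illustrate the difference between OracleMap and OracleEgoMap, here is a rough sketch of a progressively revealed map. It assumes a 2D grid, a hypothetical cell encoding, and a square reveal window around the agent; the actual agent presumably reveals cells according to its field of view, so this only conveys the idea.

```python
UNKNOWN = -1  # hypothetical marker for cells the agent has not yet seen


def reveal(ground_truth, revealed, agent_rc, radius=1):
    """Copy ground-truth cells in a square window around the agent into `revealed`.

    Assumption: a square window stands in for the agent's real field of view.
    """
    rows, cols = len(ground_truth), len(ground_truth[0])
    r0, c0 = agent_rc
    for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1)):
            revealed[r][c] = ground_truth[r][c]
    return revealed


# Hypothetical encoding: 0 = navigable, 1 = non-navigable, 2 = goal object.
ground_truth = [
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 2],
]
# OracleMap-style agents would see `ground_truth` directly;
# an OracleEgoMap-style agent starts with nothing revealed.
revealed = [[UNKNOWN] * 4 for _ in range(3)]
for position in [(0, 0), (1, 0), (2, 1)]:  # a short, made-up trajectory
    reveal(ground_truth, revealed, position)
```

After this trajectory the goal cell in the bottom-right corner is still unknown, so the agent would have to keep exploring before the map could help it reach that goal.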


Here we show the performance of different models across different tasks. We report the PPL metric, which is a version of SPL based on progress (the fraction of object goals successfully FOUND).
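The metric can be sketched as follows, assuming PPL keeps the usual SPL form (success weighted by the ratio of the shortest path length to the path actually taken) with progress in place of binary success. The function name `ppl`, its parameters, and the choice of reference path are assumptions for illustration; the paper gives the exact definition.

```python
def ppl(num_found, num_goals, shortest_path_len, agent_path_len):
    """Progress weighted by path length (sketch, assuming the SPL-style form).

    progress = num_found / num_goals
    PPL      = progress * d / max(p, d)
    where d is the reference shortest path length and p the agent's path length.
    Assumption: the caller supplies d; how d is measured for partially
    completed episodes follows the paper and is not reproduced here.
    """
    if num_goals <= 0:
        raise ValueError("num_goals must be positive")
    if shortest_path_len <= 0:
        raise ValueError("shortest_path_len must be positive")
    progress = num_found / num_goals
    return progress * shortest_path_len / max(agent_path_len, shortest_path_len)


# Made-up numbers: 2 of 3 goals found, on a path twice the reference length.
score = ppl(num_found=2, num_goals=3, shortest_path_len=10.0, agent_path_len=20.0)
```

Under this form, an agent that finds every goal along the shortest path scores 1, and the score shrinks both when goals are missed and when the path is longer than necessary.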

OracleMap Visualizations

Here are some sample visualizations of OracleMap. White represents non-navigable locations, gray represents navigable locations, and each colored square marks the goal object of the corresponding color. The first, second, and third maps are for the 2-ON, 3-ON, and 4-ON tasks, respectively.

Code and Dataset


Paper and Bibtex

[ArXiv] [Poster]

Saim Wani*, Shivansh Patel*, Unnat Jain*, Angel X. Chang, Manolis Savva. MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation. In NeurIPS 2020.

  Author = {Saim Wani and Shivansh Patel and
  Unnat Jain and Angel X. Chang and Manolis Savva},
  Title = {MultiON: Benchmarking Semantic 
  Map Memory using Multi-Object Navigation},
  Booktitle = {NeurIPS},
  Year = {2020}


We thank the anonymous reviewers for their helpful suggestions. Unnat Jain thanks Alexander Schwing and Svetlana Lazebnik for their support. Angel X. Chang is supported by the Canada CIFAR AI Chair program. Manolis Savva is supported by an NSERC Discovery Grant. This research was enabled in part by support provided by WestGrid and Compute Canada.
Template credits: Deepak and Richard.