Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents

Shivansh Patel*
Saim Wani*
Unnat Jain*
Alexander Schwing
Svetlana Lazebnik
Manolis Savva
Angel X. Chang
* equal contribution by SP, SW and UJ
Published at ICCV 2021


Communication between embodied AI agents has received increasing attention in recent years. Despite its use, it is still unclear whether the learned communication is interpretable and grounded in perception. To study the grounding of emergent forms of communication, we first introduce the collaborative multi-object navigation task CoMON. In this task, an oracle agent has detailed environment information in the form of a map. It communicates with a navigator agent that perceives the environment visually and is tasked to find a sequence of goals. To succeed at the task, effective communication is essential. CoMON hence serves as a basis to study different communication mechanisms between heterogeneous agents, that is, agents with different capabilities and roles. We study two common communication mechanisms and analyze their communication patterns through an egocentric and spatial lens. We show that the emergent communication can be grounded to the agent observations and the spatial structure of the 3D environment.

MultiON Task

In an episode of MultiON, the agent must navigate to an ordered sequence of objects placed within the environment. The number of objects m determines the overall complexity of the navigation episode. We use m-ON to refer to an episode with m ordered goal objects. The m objects are selected from a set of k available objects, where k ≥ m.
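The episode setup described above can be illustrated with a minimal sketch. It assumes that goals are drawn without replacement (so the m goals are distinct) and that a report for the wrong object is simply not counted; the names `sample_episode_goals` and `GoalTracker` are hypothetical and not taken from the released code.

```python
import random


def sample_episode_goals(available_objects, m, seed=None):
    """Pick m distinct goals, in visiting order, from the k available objects."""
    k = len(available_objects)
    if m > k:
        raise ValueError(f"need k >= m, got k={k}, m={m}")
    rng = random.Random(seed)
    # random.sample draws without replacement, so the returned list
    # is an ordered sequence of m distinct objects.
    return rng.sample(available_objects, m)


class GoalTracker:
    """Tracks progress through the ordered goal sequence of an m-ON episode."""

    def __init__(self, goals):
        self.goals = list(goals)
        self.next_index = 0  # index of the goal that must be found next

    @property
    def done(self):
        return self.next_index == len(self.goals)

    def report_found(self, obj):
        """Return True and advance only if obj is the current goal."""
        if not self.done and obj == self.goals[self.next_index]:
            self.next_index += 1
            return True
        # Assumption: an out-of-order report is ignored rather than penalized.
        return False


# A 3-ON episode with k = 4 available objects.
goals = sample_episode_goals(["red", "green", "blue", "yellow"], m=3, seed=0)
tracker = GoalTracker(goals)
for obj in goals:
    tracker.report_found(obj)
print(tracker.done)
```

The tracker only advances when goals are reported in the sampled order, which is what distinguishes an m-ON episode from visiting an unordered set of m objects.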

CoMON Task

In CoMON, an episode involves two heterogeneous agents, A_O and A_N. A_O is a disembodied oracle that cannot navigate in the environment but has access to top-down information about the environment's state. A_N is an embodied navigator that navigates and interacts with the environment and carries out the MultiON task. The two agents perform the task collaboratively by communicating over a limited-bandwidth channel.
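The division of roles between the oracle and the navigator can be sketched as a simple interaction loop. This is a toy illustration only: the message here is hand-crafted (the goal offset on an obstacle-free grid), whereas in the paper the communication is learned and the navigator perceives the environment visually. `MESSAGE_SIZE`, `Oracle`, `Navigator`, and `run_episode` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MESSAGE_SIZE = 2  # hypothetical bandwidth limit: floats per message


@dataclass
class Oracle:
    """Disembodied agent: knows the top-down state but cannot move."""
    goal_position: Tuple[int, int]

    def speak(self, navigator_position: Tuple[int, int]) -> List[float]:
        # Hand-crafted message (an assumption): offset from navigator to goal.
        dx = self.goal_position[0] - navigator_position[0]
        dy = self.goal_position[1] - navigator_position[1]
        message = [float(dx), float(dy)]
        assert len(message) == MESSAGE_SIZE  # enforce the channel limit
        return message


@dataclass
class Navigator:
    """Embodied agent: moves in the environment, guided only by messages."""
    position: Tuple[int, int]

    def act(self, message: List[float]) -> str:
        dx, dy = message
        if dx == 0 and dy == 0:
            return "found"
        # Take one grid step along the axis with the larger remaining offset.
        if abs(dx) >= abs(dy):
            step = (1 if dx > 0 else -1, 0)
        else:
            step = (0, 1 if dy > 0 else -1)
        self.position = (self.position[0] + step[0], self.position[1] + step[1])
        return "move"


def run_episode(oracle: Oracle, navigator: Navigator, max_steps: int = 50) -> Optional[int]:
    """Return the step at which the navigator declares 'found', or None."""
    for step in range(max_steps):
        message = oracle.speak(navigator.position)
        if navigator.act(message) == "found":
            return step
    return None


oracle = Oracle(goal_position=(3, -2))
navigator = Navigator(position=(0, 0))
print(run_episode(oracle, navigator))  # prints 5
```

The point of the sketch is the information asymmetry: all state knowledge flows from the oracle through a fixed-size message, and only the navigator can act on it.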

Communication Mechanisms

Communication Architecture

Interpretation of Communication

U-Comm interpretation

S-Comm interpretation


Code and Dataset


Paper and Bibtex

[Paper] [Poster]

Shivansh Patel*, Saim Wani*, Unnat Jain*, Alexander Schwing, Svetlana Lazebnik, Manolis Savva, Angel X. Chang. Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents. In ICCV 2021.

  Author = {Shivansh Patel and Saim Wani and Unnat Jain and Alexander Schwing and Svetlana Lazebnik and Manolis Savva and Angel X. Chang},
  Title = {Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents},
  Booktitle = {ICCV},
  Year = {2021}


This work was funded in part by a Canada CIFAR AI Chair, a Canada Research Chair, and an NSERC Discovery Grant, and enabled in part by support provided by WestGrid and Compute Canada. This work is also supported in part by NSF under grants #1718221, #2008387, and #2045586.