Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents
      
      
      
      
          
            * Equal contribution by SP, SW and UJ
          
      
      
      
      
      
      
        Communication between embodied AI agents has received increasing attention in recent years.
        Despite its growing use, it remains unclear whether the learned communication is interpretable and grounded in perception.
        To study the grounding of emergent forms of communication, we first introduce the collaborative multi-object navigation task CoMON.
        In this task, an oracle agent has detailed environment information in the form of a map.
        It communicates with a navigator agent that perceives the environment visually and is tasked with finding a sequence of goals.
        To succeed at the task, effective communication is essential.
        CoMON hence serves as a basis to study different communication mechanisms between heterogeneous agents, that is, agents with different capabilities and roles.
        We study two common communication mechanisms and analyze their communication patterns through an egocentric and spatial lens.
        We show that the emergent communication can be grounded to the agent observations and the spatial structure of the 3D environment.
      
      
      
      
      
      
      
      MultiON Task
      
        In an episode of MultiON, the agent must navigate to an ordered set of objects placed within the environment.
        The number of objects m determines the overall complexity of the navigation episode.
        We use m-ON to refer to an episode with m ordered goal objects.
        The m objects are selected from a set of k available objects where k ≥ m.
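
        As a rough illustration of how the goals of an m-ON episode could be drawn, the sketch below samples m distinct objects from a pool of k and treats the returned order as the required visiting order. The function name, the object names and the uniform sampling are assumptions made for illustration; they are not taken from the MultiON codebase.

```python
import random
from typing import List, Optional, Sequence


def sample_m_on_goals(
    available_objects: Sequence[str],
    m: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return m distinct goal objects in the order they must be visited.

    Hypothetical helper: uniform sampling without replacement is an
    assumption, not the documented MultiON procedure.
    """
    k = len(available_objects)
    if not 1 <= m <= k:
        raise ValueError(f"need 1 <= m <= k, got m={m}, k={k}")
    if rng is None:
        rng = random.Random()
    # random.Random.sample returns m unique elements in selection order,
    # which we interpret as the ordered goal sequence.
    return rng.sample(list(available_objects), m)


# Example: a 3-ON episode drawn from k = 5 placeholder objects.
pool = ["obj_a", "obj_b", "obj_c", "obj_d", "obj_e"]
goals = sample_m_on_goals(pool, m=3, rng=random.Random(0))
```

        Larger m yields longer goal sequences and therefore harder episodes, in line with the description above.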
      
      CoMON Task
      
        In CoMON, an episode involves two heterogeneous agents A_O and A_N.
        A_O is a disembodied oracle, which cannot navigate in the environment.
        However, A_O has access to oracle top-down information of the environment's state.
        A_N is an embodied navigator, which navigates and interacts with the environment.
        A_N carries out the MultiON task.
        A_O and A_N can perform the task collaboratively by communicating via a limited-bandwidth channel.
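
        The division of roles can be sketched with a toy grid world. In the sketch below, the oracle sees all goal positions and sends a fixed-size message each step, while the navigator only acts on the message it receives. Everything here is an assumption made for illustration: the class names, the 2-float channel width, the hand-written message (an offset to the current goal) and the greedy movement rule. In the actual work the messages are learned, and the navigator perceives the environment visually.

```python
from dataclasses import dataclass
from typing import List, Tuple

MESSAGE_SIZE = 2  # hypothetical channel width: floats per message


@dataclass
class Oracle:
    """Toy stand-in for A_O: knows the goal layout but cannot move."""
    goal_positions: List[Tuple[int, int]]  # ordered goals on a grid

    def speak(self, navigator_pos: Tuple[int, int], goal_index: int) -> List[float]:
        gx, gy = self.goal_positions[goal_index]
        nx, ny = navigator_pos
        # Hand-written message: offset from navigator to current goal,
        # cut to the channel width.
        return [float(gx - nx), float(gy - ny)][:MESSAGE_SIZE]


@dataclass
class Navigator:
    """Toy stand-in for A_N: moves one grid cell per step."""
    position: Tuple[int, int]

    def act(self, message: List[float]) -> None:
        dx, dy = message
        x, y = self.position
        if dx != 0:
            x += 1 if dx > 0 else -1
        elif dy != 0:
            y += 1 if dy > 0 else -1
        self.position = (x, y)


def run_episode(oracle: Oracle, navigator: Navigator, max_steps: int = 100) -> bool:
    """Return True if all goals are reached in order within max_steps."""
    goal_index = 0
    for _ in range(max_steps):
        message = oracle.speak(navigator.position, goal_index)
        navigator.act(message)
        if navigator.position == oracle.goal_positions[goal_index]:
            goal_index += 1
            if goal_index == len(oracle.goal_positions):
                return True
    return False


success = run_episode(Oracle([(2, 0), (2, 3)]), Navigator((0, 0)))
```

        The navigator in this sketch never reads the goal list directly, so it can only succeed through the messages, which mirrors why communication is essential in the task.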
        
      
      Communication Mechanisms
      
      Communication Architecture
      
      Interpretation of Communication
      U-Comm interpretation
      
      
      S-Comm interpretation
      
      
    
      Results
      
      
      
      Code and Dataset
      
      
      
        Paper and Bibtex
        
        
        
        
         
        
        
        
        [Paper]
        
        
        [Poster]
        
        
        
         Citation: Shivansh Patel*, Saim Wani*, Unnat Jain*, Alexander Schwing, Svetlana Lazebnik, Manolis Savva, Angel X. Chang. Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents. In ICCV 2021.
        
        [Bibtex]
              
                
@inproceedings{patel2021interpretation,
  Author = {Shivansh Patel and Saim Wani and 
  Unnat Jain and Alexander Schwing and 
  Svetlana Lazebnik and  Manolis Savva
  and Angel X. Chang},
  Title = {Interpretation of Emergent Communication 
  in Heterogeneous Collaborative Embodied Agents},
  Booktitle = {ICCV},
  Year = {2021}
  }
           
          
      
    
    
    
      
      Acknowledgements
      This work was funded in part by a Canada CIFAR AI Chair, a Canada Research
       Chair and NSERC Discovery Grant, and enabled in part by support provided by 
       WestGrid and Compute Canada.
       This work is supported in part by NSF under grants #1718221, #2008387, and #2045586.