Curious Representation Learning
for Embodied Intelligence
Yilun Du
Chuang Gan
Phillip Isola
[Download Paper]
[Github Code]

Self-supervised representation learning has achieved remarkable success in recent years. By subverting the need for supervised labels, such approaches are able to utilize the numerous unlabeled images that exist on the Internet and in photographic datasets. Yet to build truly intelligent agents, we must construct representation learning algorithms that can learn not only from datasets but also from environments. An agent in a natural environment will not typically be fed curated data. Instead, it must explore its environment to acquire the data it will learn from. We propose a framework, curious representation learning (CRL), which jointly learns a reinforcement learning policy and a visual representation model. The policy is trained to maximize the error of the representation learner, and in doing so is incentivized to explore its environment. At the same time, the learned representation becomes stronger and stronger as the policy feeds it ever harder data to learn from. Our learned representations enable promising transfer to downstream navigation tasks, performing better than or comparably to ImageNet pretraining without using any supervision at all. In addition, despite being trained in simulation, our learned representations can obtain interpretable results on real images.


Illustration of the CRL exploration policy navigating in an unseen test environment. CRL is able to effectively look around the surrounding environment.

Source Code and Demo

We have released the PyTorch implementation for the paper. It builds upon the Habitat Lab GitHub repository. Try our code!

Curious Representation Learning (CRL)

We introduce CRL, an approach to embodied representation learning in which a representation learning model plays a minimax game with an exploration policy. An exploration policy learns to explore the surrounding environment to maximize the contrastive loss. A representation learning model is then trained on the diverse images the policy gathers. For more details, refer to the paper.
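The minimax game can be sketched in a few lines. The snippet below is a minimal illustration, not the released implementation: it assumes an InfoNCE-style contrastive loss (the text above says only "contrastive loss"), it assumes each observation's own loss is handed to the policy as its reward, and the names `info_nce_per_sample` and `crl_signals` are hypothetical. It uses plain Python lists in place of PyTorch tensors and leaves out the encoder, the reinforcement learning update, and the Habitat environment entirely.

```python
import math


def info_nce_per_sample(anchors, positives, temperature=0.1):
    """Per-sample InfoNCE loss (assumed form of the contrastive loss).

    anchors[i] and positives[i] are embeddings of two views of the same
    observation; the other positives in the batch act as negatives.
    """
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]

    losses = []
    for i, anchor in enumerate(a):
        # Cosine similarity of this anchor to every positive, scaled by temperature.
        logits = [sum(x * y for x, y in zip(anchor, pos)) / temperature for pos in p]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        # Cross-entropy with the matching positive as the target.
        losses.append(log_z - logits[i])
    return losses


def crl_signals(anchors, positives, temperature=0.1):
    """Split one batch of contrastive losses into the two players' signals.

    Hypothetical helper: the policy receives each loss as a reward to
    maximize, while the representation model minimizes the mean loss.
    """
    losses = info_nce_per_sample(anchors, positives, temperature)
    policy_rewards = list(losses)
    representation_loss = sum(losses) / len(losses)
    return policy_rewards, representation_loss


# Views the encoder already matches well yield little reward for the policy...
easy_rewards, easy_loss = crl_signals([[1.0, 0.0], [0.0, 1.0]],
                                      [[1.0, 0.0], [0.0, 1.0]])
# ...while views it confuses yield a large reward, steering exploration there.
hard_rewards, hard_loss = crl_signals([[1.0, 0.0], [0.0, 1.0]],
                                      [[0.0, 1.0], [1.0, 0.0]])
```

In this sketch the same quantity drives both players with opposite signs, which is what makes the setup a minimax game: observations the representation model already handles earn the policy little, so the policy is pushed toward observations that are still hard to embed.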


[Paper 5MB]  [arXiv]

Yilun Du, Chuang Gan, and Phillip Isola.
Curious Representation Learning for Embodied Intelligence.
ICCV 2021.

@inproceedings{du2021curious,
  title={Curious representation learning for embodied intelligence},
  author={Du, Yilun and Gan, Chuang and Isola, Phillip},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2021}
}


We thank MIT-IBM for support that led to this project. Yilun Du is funded by an NSF Graduate Research Fellowship.