Yilun Du

I am currently a senior research scientist at Google DeepMind and an incoming Assistant Professor at Harvard, starting in Fall 2025 in the Kempner Institute and CS. I received my PhD from MIT EECS, advised by Prof. Leslie Kaelbling, Prof. Tomas Lozano-Perez and Prof. Joshua B. Tenenbaum. Previously, I obtained my bachelor's degree from MIT, was a research fellow at OpenAI and an intern and visiting researcher at FAIR and Google DeepMind, and won a gold medal at the International Biology Olympiad. My research focuses on generative models, decision making, robot learning, embodied agents, and the applications of such tools to scientific domains.

My research is driven by the goal of developing intelligent embodied agents in the physical world. I have primarily focused on building models of the world using generative AI, which lets us integrate systematic planning and iterative reasoning into learned agents. A major challenge in applying generative AI in this setting is the scarcity of modeling data and the need for models to generalize well to previously unseen situations. My work addresses this by constructing composable generative models based on learned energy landscapes (energy-based models, EBMs) as a means to generalize beyond the limited data that is available, and some of my early work on EBMs led to the development of diffusion models in 2020. In the visual domain, EBMs enable compositional visual generation and compositional scene understanding. In decision-making, compositional models enable the synthesis of new trajectories in trajectory planning, allowing flexible adaptation to novel goals and rewards both in synthesized videos and on real robots. Finally, an energy optimization perspective on prediction lets us combine foundation models in a zero-shot manner, enabling both hierarchical planning and multimodal perception.

More broadly, I am interested in constructing a decentralized generative architecture for decision-making, consisting of a society of different multimodal models, each with separate responsibilities such as 3D perception, memory, and auditory understanding, which jointly cooperate to make decisions in an environment. I am further interested in integrating systematic reasoning with deep learning approaches, and believe that test-time search is crucial for constructing effective multimodal and decision-making agents. Finally, I am also interested in general approaches to improving generative modeling, as well as their broader applications in scientific domains such as computational biology and inverse design.

Research Highlights
  • Generative Modeling: constructing compositional generative models of the world.
  • Reasoning: building agents that can reason and understand the world.
  • Perception: inferring the structure of the world from sensory inputs.
  • Interactive Agents: building agents that can interact, explore and learn from the world.
Publications

Topics: Generative Modeling / Reasoning / Perception / Interactive Agents (* indicates equal contribution and + indicates equal advising)

Few-Shot Task Learning through Inverse Generative Modeling
Aviv Netanyahu, Yilun Du, Antonia Bronars, Jyotish Pari, Joshua B. Tenenbaum, Tianmin Shu, Pulkit Agrawal

NeurIPS 2024

Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann

NeurIPS 2024 / Website / Paper / Code

Learning Generalizable Systems by Learning Composable Energy Landscapes
Yilun Du

PhD Thesis 2024 / Thesis

Large-scale Reinforcement Learning for Diffusion Models
Yinan Zhang, Eric Tzeng, Yilun Du, Dmitry Kislyuk

ECCV 2024 / Paper / Website / Code

Disentangled Acoustic Fields For Multimodal Physical Scene Understanding
Jie Yin, Andrew Luo, Yilun Du, Anoop Cherian, Tim K. Marks, Jonathan Le Roux, Chuang Gan

IROS 2024 / Paper

"Set It Up!": Functional Object Arrangement with Compositional Generative Models
Yiqing Xu, Jiayuan Mao, Yilun Du, Tomas Lozano-Perez, Leslie Kaelbling, David Hsu

RSS 2024 / Paper

PoCo: Policy Composition from and for Heterogeneous Robot Learning
Lirui Wang, Alan Zhao, Yilun Du, Ted Adelson, Russ Tedrake

RSS 2024 / Website / Paper

Compositional Generative Modeling: A Single Model is Not All You Need
Yilun Du, Leslie Kaelbling

ICML 2024 / Paper

Improving Factuality and Reasoning in Language Models through Multiagent Debate
Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, Igor Mordatch

ICML 2024 / Website / Paper / Code

Learning Iterative Reasoning through Energy Diffusion
Yilun Du*, Jiayuan Mao*, Joshua B. Tenenbaum

ICML 2024 / Website / Paper / Code

Potential Based Diffusion Motion Planning
Yunhao Luo, Chen Sun, Joshua B. Tenenbaum, Yilun Du

ICML 2024 / Website / Paper / Code

Compositional Image Decomposition with Diffusion Models
Jocelin Su*, Nan Liu*, Yanbo Wang*, Joshua B. Tenenbaum, Yilun Du

ICML 2024 / Website / Paper / Code

RoboDreamer: Learning Compositional World Models for Robot Imagination
Siyuan Zhou, Yilun Du, Jiaben Chen, Yandong Li, Dit-Yan Yeung, Chuang Gan

ICML 2024 / Website / Paper / Code

Video as the New Language for Real-World Decision Making
Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans

ICML 2024 / Paper

3D-VLA: A 3D Vision-Language-Action Generative World Model
Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan

ICML 2024 / Website / Paper / Code

Learning to Act from Actionless Video through Dense Correspondences
Po-Chen Ko, Jiayuan Mao, Yilun Du, Shao-Hua Sun, Joshua B. Tenenbaum

ICLR 2024 (Spotlight) / Website / Paper / Code

Learning Interactive Real-World Simulators
Mengjiao Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie Kaelbling, Dale Schuurmans, Pieter Abbeel

ICLR 2024 (Outstanding Paper Award) / Website / Paper

Compositional Generative Inverse Design
Tailin Wu*, Takashi Maruyama*, Long Wei*, Tao Zhang*, Yilun Du*, Gianluca Iaccarino, Jure Leskovec

ICLR 2024 (Spotlight) / Paper

Probabilistic Adaptation of Text-to-Video Models
Mengjiao Yang*, Yilun Du*, Bo Dai, Dale Schuurmans, Joshua B. Tenenbaum, Pieter Abbeel

ICLR 2024 / Website / Paper

Training Diffusion Models with Reinforcement Learning
Kevin Black*, Michael Janner*, Yilun Du, Ilya Kostrikov, Sergey Levine

ICLR 2024 / Website / Paper / Code

Building Cooperative Embodied Agents Modularly with Large Language Models
Hongxin Zhang, Weihua Du, Jiaming Shan, Qinhong Zhou, Yilun Du, Joshua B. Tenenbaum, Tianmin Shu, Chuang Gan

ICLR 2024 / Website / Paper / Code

HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments
Qinhong Zhou, Sunli Chen, Yisong Wang, Haozhe Xu, Weihua Du, Hongxin Zhang, Yilun Du, Joshua B. Tenenbaum, Chuang Gan

ICLR 2024 / Website / Paper

Learning Universal Policies via Text-Guided Video Generation
Yilun Du*, Mengjiao Yang*, Bo Dai, Hanjun Dai, Ofir Nachum, Joshua B. Tenenbaum, Dale Schuurmans, Pieter Abbeel

NeurIPS 2023 (Spotlight) / Website / Paper

DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models
Tsun-Hsuan Wang, Juntian Zheng, Pingchuan Ma, Yilun Du, Byungchul Kim, Andrew Spielberg, Joshua Tenenbaum, Chuang Gan, Daniela Rus

NeurIPS 2023 (Oral) / Website / Paper

Adaptive Online Replanning with Diffusion Models
Siyuan Zhou, Yilun Du, Shun Zhang, Mengdi Xu, Yikang Shen, Wei Xiao, Dit-Yan Yeung, Chuang Gan

NeurIPS 2023 / Website / Code / Paper

3D-LLM: Injecting the 3D World into Large Language Models
Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan

NeurIPS 2023 (Spotlight) / Website / Paper / Code

FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses via Pixel-Aligned Scene Flow
Cameron Smith, Yilun Du, Ayush Tewari, Vincent Sitzmann

NeurIPS 2023 / Website / Paper / Code

Secure Out-of-Distribution Task Generalization with Energy-Based Models
Shengzhuang Chen, Long-Kai Huang, Jonathan Schwarz, Yilun Du, Ying Wei

NeurIPS 2023 / Paper

Compositional Diffusion-Based Continuous Constraint Solvers
Zhutian Yang, Jiayuan Mao, Yilun Du, Jiajun Wu, Joshua B. Tenenbaum, Tomas Lozano-Perez, Leslie Kaelbling

CoRL 2023 / Website / Code / Paper

Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models
Nan Liu*, Yilun Du*, Shuang Li*, Joshua B. Tenenbaum, Antonio Torralba

ICCV 2023 / Website / Paper / Code

Foundation Models for Decision Making: Problems, Methods, and Opportunities
Mengjiao Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans

ArXiv Preprint / Paper

Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC
Yilun Du, Conor Durkan, Robin Strudel, Joshua B. Tenenbaum, Sander Dieleman, Rob Fergus, Jascha Sohl-Dickstein, Arnaud Doucet, Will Grathwohl

ICML 2023 / Website / Colab / Tapestry Colab / Code / Paper

Inferring Relational Potentials in Interacting Systems
Armand Comas, Yilun Du, Christian Fernandez, Sandesh Ghimire, Mario Sznaier, Joshua B. Tenenbaum, Octavia Camps

ICML 2023 (Oral) / Website / Code / Paper

NeuSE: Neural SE(3)-Equivariant Embedding for Consistent Spatial Understanding with Objects
Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard

RSS 2023 / ICRA 2023 RAP4Robots Workshop (Outstanding Paper Award) / Website / Paper

StructDiffusion: Language-Guided Creation of Physically-Valid Structures using Unseen Objects
Weiyu Liu, Yilun Du, Tucker Hermans, Sonia Chernova, Chris Paxton

RSS 2023 / Website / Paper

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, Shuran Song

RSS 2023 / Website / Code / Paper

Learning to Render Novel Views from Wide-Baseline Stereo Pairs
Yilun Du, Cameron Smith, Ayush Tewari, Vincent Sitzmann

CVPR 2023 / Website / Code / Colab / Paper

3D Concept Learning and Reasoning from Multi-View Images
Yining Hong, Chunru Lin, Yilun Du, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan

CVPR 2023 / Website / Paper

Is Conditional Generative Modeling all You Need for Decision-Making?
Anurag Ajay*, Yilun Du*, Abhi Gupta*, Joshua B. Tenenbaum, Tommi S. Jaakkola, Pulkit Agrawal

ICLR 2023 (Top 5% Notable) / Website / Paper / Code

Composing Ensembles of Pre-trained Models via Iterative Consensus
Shuang Li*, Yilun Du*, Joshua B. Tenenbaum, Antonio Torralba, Igor Mordatch

ICLR 2023 / Website / Paper

Planning with Sequence Models through Iterative Energy Minimization
Hongyi Chen*, Yilun Du*, Yiye Chen*, Joshua B. Tenenbaum, Patricio Antonio Vela

ICLR 2023 / Website / Paper / Code

Seeing 3D Objects in a Single Image via Self-Supervised Static-Dynamic Disentanglement
Prafull Sharma, Ayush Tewari, Yilun Du, Sergey Zakharov, Rares Ambrus, Adrien Gaidon, William T. Freeman, Fredo Durand, Joshua B. Tenenbaum, Vincent Sitzmann

ICLR 2023 / Website / Paper

Local Neural Descriptor Fields: Locally Conditioned Object Representations for Manipulation
Ethan Chun, Yilun Du, Anthony Simeonov, Tomas Lozano-Perez, Leslie Kaelbling

ICRA 2023 / Paper / Website / Code

Visibility-Aware Navigation Among Movable Objects
Jose Iturralde*, Aiden Curtis*, Yilun Du, Leslie Kaelbling, Tomas Lozano-Perez

ICRA 2023 / Paper

Language Models Generalize Beyond Natural Proteins
Robert Verkuil*, Ori Kabeli*, Yilun Du, Basile Wicky, Lukas Milles, Justas Dauparas, David Baker, Sergey Ovchinnikov, Tom Sercu, Alexander Rives

bioRxiv Preprint / Paper

Self-conditioned Embedding Diffusion for Text Generation
Robin Strudel, Corentin Tallec, Florent Altche, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Remi Leblond

ArXiv Preprint / Paper

SE(3)-Equivariant Relational Rearrangement with Neural Descriptor Fields
Anthony Simeonov*, Yilun Du*, Yen-Chen Lin, Alberto Rodriguez, Leslie Kaelbling, Tomas Lozano-Perez, Pulkit Agrawal

CoRL 2022 / Website / Paper / Code

3D Concept Grounding on Neural Fields
Yining Hong, Yilun Du, Chunru Lin, Joshua B. Tenenbaum, Chuang Gan

NeurIPS 2022 / Website / Paper

Pre-Trained Language Models for Interactive Decision-Making
Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyurek, Anima Anandkumar+, Jacob Andreas+, Igor Mordatch+, Antonio Torralba+, Yuke Zhu+

NeurIPS 2022 (Oral) / Website / Paper / Code

(Last five authors contributed equally; order determined alphabetically.)

Compositional Visual Generation with Composable Diffusion Models
Nan Liu*, Shuang Li*, Yilun Du*, Antonio Torralba, Joshua B. Tenenbaum

ECCV 2022 / Website / Paper / Code / Colab

Learning Object Based State Estimators for Household Autonomy
Yilun Du, Tomas Lozano-Perez, Leslie Kaelbling

IROS 2022 / Website / Paper

Robust Change Detection Based on Neural Descriptor Fields
Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard

IROS 2022 / Website / Paper

Planning with Diffusion for Flexible Behavior Synthesis
Michael Janner*, Yilun Du*, Joshua B. Tenenbaum, Sergey Levine

ICML 2022 (Long Talk) / Website / Paper / Code / Colab

Learning Iterative Reasoning through Energy Minimization
Yilun Du, Shuang Li, Joshua B. Tenenbaum, Igor Mordatch

ICML 2022 / Website / Paper / Code

Streaming Inference for Infinite Feature Models
Rylan Schaeffer, Yilun Du, Gabrielle Liu, Ila Fiete

ICML 2022 / Paper

Energy-Based Models for Continual Learning
Shuang Li, Yilun Du, Gido M. van de Ven, Antonio Torralba, Igor Mordatch

CoLLA 2022 (Oral) / Paper / Project Page / Code

Streaming Inference for Infinite Non-Stationary Clustering
Rylan Schaeffer, Gabrielle Liu, Yilun Du, Scott Linderman, Ila Fiete

CoLLA 2022 / Paper

Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation
Anthony Simeonov*, Yilun Du*, Andrea Tagliasacchi, Joshua B. Tenenbaum, Alberto Rodriguez, Pulkit Agrawal+, Vincent Sitzmann+

ICRA 2022 / Website / Paper / Code / Colab

(First two authors contributed equally; order determined by coin toss. Last two authors equal advising.)

Unsupervised Learning of Compositional Energy Concepts
Yilun Du, Shuang Li, Yash Sharma, Joshua B. Tenenbaum, Igor Mordatch

NeurIPS 2021 / Website / Paper / Code

Learning to Compose Visual Relations
Nan Liu*, Shuang Li*, Yilun Du*, Joshua B. Tenenbaum, Antonio Torralba

NeurIPS 2021 (Spotlight) / NeurIPS 2021 Workshop on Controllable Generative Modeling (Outstanding Paper Award) / Website / Paper / Code / MIT News

Learning Signal-Agnostic Manifolds of Neural Fields
Yilun Du, Katie Collins, Joshua B. Tenenbaum, Vincent Sitzmann

NeurIPS 2021 / Website / Paper / Code

The Neural MMO Platform for Massively Multiagent Research
Joseph Suarez, Yilun Du, Clare Zhu, Igor Mordatch, Phillip Isola

NeurIPS 2021 Track on Datasets and Benchmarks / Website / Paper

3D Shape Generation and Completion through Point-Voxel Diffusion
Linqi Zhou, Yilun Du, Jiajun Wu

ICCV 2021 (Oral) / Project Page / Paper / Code

Curious Representation Learning for Embodied Intelligence
Yilun Du, Chuang Gan, Phillip Isola

ICCV 2021 / Project Page / Paper / Code

Neural Radiance Flow for 4D View Synthesis and Video Processing
Yilun Du, Yinan Zhang, Hong-Xing Yu, Joshua B. Tenenbaum, Jiajun Wu

ICCV 2021 / Paper / Project Page / Code

Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions
Shuang Li, Yilun Du, Antonio Torralba, Josef Sivic, Bryan Russell

ICCV 2021 / Paper / Project Page / Code / Dataset

Improved Contrastive Divergence Training of Energy Based Models
Yilun Du, Shuang Li, Joshua B. Tenenbaum, Igor Mordatch

ICML 2021 / ICLR 2021 EBM Workshop (Oral) / Paper / Project Page / Code

Unsupervised Discovery of 3D Physical Objects from Video
Yilun Du, Kevin Smith, Tomer Ullman, Joshua B. Tenenbaum, Jiajun Wu

ICLR 2021 / Paper / Code / Project Page

Compositional Visual Generation with Energy Based Models
Yilun Du, Shuang Li, Igor Mordatch

NeurIPS 2020 (Spotlight) / Paper / Project Page / Code

A Long Horizon Planning Framework for Manipulating Rigid Pointcloud Objects
Anthony Simeonov, Yilun Du, Beomjoon Kim, Francois Hogan, Joshua B. Tenenbaum, Pulkit Agrawal, Alberto Rodriguez

CoRL 2020 / Paper

Energy-based models for atomic-resolution protein conformations
Yilun Du, Joshua Meier, Jerry Ma, Rob Fergus, Alexander Rives

ICLR 2020 (Spotlight) / MLCB 2020 (Oral / Travel Award) / Paper / Code

Observational Overfitting in Reinforcement Learning
Xingyou Song, Yiding Jiang, Yilun Du, Behnam Neyshabur

ICLR 2020 / Paper

Model Based Planning with Energy Based Models
Yilun Du, Toru Lin, Igor Mordatch

CoRL 2019 / ICML MBRL Workshop 2019 (Oral) / Paper / Code

Implicit Generation and Generalization with Energy Based Models
Yilun Du, Igor Mordatch

NeurIPS 2019 (Spotlight) / Paper / Website / Code / OpenAI Blog

Task-Agnostic Dynamics Priors for Deep Reinforcement Learning
Yilun Du, Karthik Narasimhan

ICML 2019 / Paper / Code

Neural MMO: A massively multiplayer game environment for intelligent agents
Joseph Suarez, Yilun Du, Phillip Isola, Igor Mordatch

AAMAS 2020 Extended Abstract / Paper / Code / OpenAI Blog

Learning to Exploit Stability for 3D Scene Parsing
Yilun Du, Zhijian Liu, Hector Basevi, Ales Leonardis, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

NeurIPS 2018 / Paper