Yilun Du

Email: ydu [at] seas [dot] harvard [dot] edu
Twitter: https://twitter.com/du_yilun
Github: https://github.com/yilundu
Research Statement: PDF
CV: CV

I am an Assistant Professor at Harvard in the Kempner Institute and CS, and a senior research scientist at Google DeepMind. I received my PhD from MIT EECS, advised by Prof. Leslie Kaelbling, Prof. Tomas Lozano-Perez, and Prof. Joshua B. Tenenbaum. Previously, I obtained my bachelor's degree from MIT, was a research fellow at OpenAI, was an intern and visiting researcher at FAIR and Google DeepMind, and won a gold medal at the International Biology Olympiad. My research focuses on generative models, decision making, robot learning, embodied agents, and the applications of such tools to scientific domains.

My research is driven by the goal of developing intelligent embodied agents in the physical world. It has primarily focused on building models of the world using generative AI, which lets us integrate systematic planning and iterative reasoning into learned agents. A major challenge in applying generative AI in this setting is the scarcity of modeling data and the need for models to generalize well to previously unseen situations. My work addresses this by constructing composable generative models that learn energy landscapes (energy-based models, or EBMs) as a means to generalize beyond the limited data that is available; some of my early work on EBMs led to the development of diffusion models in 2020. In the visual domain, EBMs enable compositional visual generation and compositional scene understanding. In decision-making, compositional models enable the synthesis of new trajectories in trajectory planning, allowing flexible adaptation to novel goals and rewards both in synthesized videos and on real robots. Finally, an energy optimization perspective on prediction lets us combine foundation models in a zero-shot manner, enabling both hierarchical planning and multimodal perception.

More broadly, I am interested in constructing a decentralized generative architecture for decision-making, consisting of a society of different multimodal models, each with separate responsibilities such as 3D perception, memory, and auditory understanding, which jointly cooperate to make decisions in an environment. I am further interested in integrating systematic reasoning with deep learning approaches, and believe that test-time search is crucial to constructing effective multimodal and decision-making agents. Finally, I am also interested in general approaches to improving generative modeling, as well as their broader applications in scientific domains such as computational biology and inverse design.

News

I recently gave a talk at the Kempner Frontiers of NeuroAI Symposium on compositional world models for embodied intelligence (available here)!
We are organizing a workshop on physically grounded world models at ICML 2025 and on world models in robotics at CoRL 2025!
I defended my PhD at MIT EECS! (Video: Link, Thesis: Link)
Check out a list of our work on energy-based models!

Research Highlights

Generative Modeling: constructing compositional generative models of the world.
Reasoning: building agents that can reason and understand the world.
Perception: inferring the structure of the world from sensory inputs.
Interactive Agents: building agents that can interact, explore and learn from the world.

Publications

(* indicates equal contribution and ^† indicates equal advising)

Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers

Shalev Lifshitz, Sheila A. McIlraith, Yilun Du

TesserAct: Learning 4D Embodied World Models

Haoyu Zhen*, Qiao Sun*, Hongxin Zhang, Junyan Li, Siyuan Zhou, Yilun Du, Chuang Gan

MoELoco: Mixture of Experts for Multitask Locomotion

Runhan Huang*, Shaoting Zhu*, Yilun Du, Hang Zhao

Compositional Scene Understanding through Inverse Generative Modeling

Yanbo Wang, Justin Dauwels, Yilun Du

History-Guided Video Diffusion

Kiwhan Song*, Boyuan Chen*, Max Simchowitz, Yilun Du, Russ Tedrake, Vincent Sitzmann

Learning Adaptable World Models with Latent Actions

Shenyuan Gao, Siyuan Zhou, Yilun Du, Jun Zhang, Chuang Gan

Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability

Zishen Wan, Jiayi Qian, Yuhang Du, Jason Jabbour, Yilun Du, Yang (Katie) Zhao, Arijit Raychowdhury, Tushar Krishna, Vijay Reddi

3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning

Yuncong Yang, Han Yang, Jiachen Zhou, Peihao Chen, Hongxin Zhang, Yilun Du, Chuang Gan

Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models

Sigmund H. Hoeg, Yilun Du, Olav Egeland

Inference-Time Policy Steering through Human Interactions

Yanwei Wang, Lirui Wang, Yilun Du, Balakumar Sundaralingam, Xuning Yang, Yu-Wei Chao, Claudia Perez-D'Arpino, Dieter Fox, Julie Shah

Anomalies-by-Synthesis: Anomaly Detection using Generative Diffusion Models for Off-Road Navigation

Siddharth Ancha, Sunshine Jiang, Travis Manderson, Laura Brandt, Yilun Du, Phillip Osteen, Nicholas Roy

Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains

Vighnesh Subramaniam*, Yilun Du*, Joshua B. Tenenbaum, Antonio Torralba, Shuang Li, Igor Mordatch

Grounding Video Models to Actions through Goal Conditioned Exploration

Yunhao Luo, Yilun Du

Solving New Tasks by Adapting Internet Video Knowledge

Calvin Luo, Zilai Zeng, Yilun Du, Chen Sun

Looped Transformers for Length Generalization

Ying Fan, Yilun Du, Kannan Ramchandran, Kangwook Lee

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Wenhao Chai*, Enxin Song*, Yilun Du, Chenlin Meng, Vashisht Madhavan, Omer Bar-Tal, Jenq-Neng Hwang, Saining Xie, Christopher Manning

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

Hongxin Zhang*, Zeyuan Wang*, Qiushi Lyu*, Zheyuan Zhang, Sunli Chen, Tianmin Shu, Yilun Du, Chuang Gan

Compositional Diffusion Models for Powered Descent Trajectory Generation with Flexible Constraints

Julia Briden, Yilun Du, Enrico Zucchelli, Richard Linares

Few-Shot Task Learning through Inverse Generative Modeling

Aviv Netanyahu, Yilun Du, Antonia Bronars, Jyotish Pari, Joshua B. Tenenbaum, Tianmin Shu, Pulkit Agrawal

Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann

Learning Generalizable Systems by Learning Composable Energy Landscapes

Yilun Du

Large-scale Reinforcement Learning for Diffusion Models

Yinan Zhang, Eric Tzeng, Yilun Du, Dmitry Kislyuk

Disentangled Acoustic Fields For Multimodal Physical Scene Understanding

Jie Yin, Andrew Luo, Yilun Du, Anoop Cherian, Tim K. Marks, Jonathan Le Roux, Chuang Gan

"Set It Up!": Functional Object Arrangement with Compositional Generative Models

Yiqing Xu, Jiayuan Mao, Yilun Du, Tomas Lozano-Perez, Leslie Kaelbling, David Hsu

PoCo: Policy Composition from and for Heterogeneous Robot Learning

Lirui Wang, Alan Zhao, Yilun Du, Ted Adelson, Russ Tedrake

Compositional Generative Modeling: A Single Model is Not All You Need

Yilun Du, Leslie Kaelbling

Improving Factuality and Reasoning in Language Models through Multiagent Debate

Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, Igor Mordatch

Learning Iterative Reasoning through Energy Diffusion

Yilun Du*, Jiayuan Mao*, Joshua B. Tenenbaum

Potential Based Diffusion Motion Planning

Yunhao Luo, Chen Sun, Joshua B. Tenenbaum, Yilun Du

Compositional Image Decomposition with Diffusion Models

Jocelin Su*, Nan Liu*, Yanbo Wang*, Joshua B. Tenenbaum, Yilun Du

RoboDreamer: Learning Compositional World Models for Robot Imagination

Siyuan Zhou, Yilun Du, Jiaben Chen, Yandong Li, Dit-Yan Yeung, Chuang Gan

Video as the New Language for Real-World Decision Making

Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans

3D-VLA: A 3D Vision-Language-Action Generative World Model

Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan

Video Language Planning

Yilun Du, Sherry Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson

Learning to Act from Actionless Video through Dense Correspondences

Po-Chen Ko, Jiayuan Mao, Yilun Du, Shao-Hua Sun, Joshua B. Tenenbaum

Learning Interactive Real-World Simulators

Mengjiao Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie Kaelbling, Dale Schuurmans, Pieter Abbeel

Compositional Generative Inverse Design

Tailin Wu*, Takashi Maruyama*, Long Wei*, Tao Zhang*, Yilun Du*, Gianluca Iaccarino, Jure Leskovec

Probabilistic Adaptation of Text-to-Video Models

Mengjiao Yang*, Yilun Du*, Bo Dai, Dale Schuurmans, Joshua B. Tenenbaum, Pieter Abbeel

Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine

Anurag Ajay, Seungwook Han, Yilun Du*, Shuang Li, Abhi Gupta, Tommi Jaakkola, Joshua B. Tenenbaum, Leslie Kaelbling, Akash Srivastava, Pulkit Agrawal

Yilun Du, Mengjiao Yang, Bo Dai, Hanjun Dai, Ofir Nachum, Joshua B. Tenenbaum, Dale Schuurmans, Pieter Abbeel

Nan Liu, Yilun Du, Shuang Li*, Joshua B. Tenenbaum, Antonio Torralba

Yilun Du, Cameron Smith, Ayush Tewari^†, Vincent Sitzmann^†

Anurag Ajay, Yilun Du, Abhi Gupta*, Joshua B. Tenenbaum, Tommi S. Jaakkola, Pulkit Agrawal

Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Igor Mordatch

Hongyi Chen, Yilun Du, Yiye Chen*, Joshua B. Tenenbaum, Patricio Antonio Vela

Jose Iturralde, Aiden Curtis, Yilun Du, Leslie Kaelbling, Tomas Lozano-Perez

Robert Verkuil, Ori Kabeli, Yilun Du, Basile Wicky, Lukas Milles, Justas Dauparas, David Baker, Sergey Ovchinnikov, Tom Sercu, Alexander Rives

Anthony Simeonov, Yilun Du, Yen-Chen Lin, Alberto Rodriguez, Leslie Kaelbling, Tomas Lozano-Perez, Pulkit Agrawal

Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyurek, Anima Anandkumar⁺, Jacob Andreas⁺, Igor Mordatch⁺, Antonio Torralba⁺, Yuke Zhu⁺

Nan Liu, Shuang Li, Yilun Du*, Antonio Torralba, Joshua B. Tenenbaum