Neural Radiance Flow for 4D View Synthesis and Video Processing

ICCV 2021

Yilun Du1, Yinan Zhang2, Hong-Xing Yu2, Joshua B. Tenenbaum1, Jiajun Wu2

1 MIT CSAIL    2 Stanford University

Paper Code

We present a method, Neural Radiance Flow (NeRFlow), to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images. Key to our approach is a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene. By enforcing consistency across different modalities, our representation enables multi-view rendering in diverse dynamic scenes, including water pouring, robotic interaction, and real images, outperforming state-of-the-art methods for spatial-temporal view synthesis. Our approach works even when input images are captured with only two separate cameras. We further demonstrate that the learned representation can serve as an implicit scene prior, enabling video processing tasks such as image super-resolution and denoising without any additional supervision.
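The core idea above can be sketched as a single implicit function that maps a 4D query point (x, y, z, t) to occupancy/density, radiance, and a 3D scene-flow vector. The minimal sketch below is an illustrative assumption, not the paper's actual architecture: the layer sizes, activations, and two-layer network are stand-ins for the learned model.

```python
import numpy as np

# Hypothetical sketch of a 4D spatial-temporal implicit field: one function
# maps (x, y, z, t) to density, RGB radiance, and 3D flow. The tiny random
# two-layer network here is a placeholder for the trained representation.
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (4, 64))   # input: (x, y, z, t)
W2 = rng.normal(0.0, 0.1, (64, 7))   # output: density(1) + rgb(3) + flow(3)

def query_field(xyzt):
    """Query the implicit 4D field at points of shape (N, 4)."""
    h = np.tanh(xyzt @ W1)
    out = h @ W2
    density = np.log1p(np.exp(out[:, :1]))    # softplus keeps density >= 0
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:4]))  # sigmoid keeps colors in [0, 1]
    flow = out[:, 4:]                         # unconstrained 3D velocity
    return density, rgb, flow

# The same spatial point queried at two timestamps can yield different
# density, radiance, and flow, which is what makes the representation 4D.
pts = np.array([[0.1, 0.2, 0.3, 0.0],
                [0.1, 0.2, 0.3, 0.5]])
density, rgb, flow = query_field(pts)
```

Rendering then proceeds as in NeRF-style volume rendering, except that every sample along a camera ray is queried at the frame's timestamp.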

Full View Synthesis

We present NeRFlow, which learns a 4D spatial-temporal representation of a dynamic scene. In a scene with water pouring, we are able to render novel images (left), infer the depth map (middle left), the underlying continuous flow field (middle right), and denoise input observations (right).

Our approach also produces synthesis results on the iGibson robot, shown below, with rendered images on the left and the inferred depth map in the middle. We are further able to synthesize dynamic animations (right).

Sparse View Synthesis

Our approach is able to represent a dynamic scene given only a limited number of views of the scene, captured at a sparse set of timestamps. We illustrate novel view synthesis on a dynamic scene captured from a single real monocular video.

Next, we assess the ability of NeRFlow to synthesize scenes captured at a sparse set of timestamps. We fit the pouring scene using only 1 in 10 frames of the animation, then render the full animation across all frames. NeRFlow with consistency (left) performs significantly better than an approach without consistency (right).
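The consistency compared above can be illustrated with a simple temporal check: radiance sampled at a point should agree with radiance sampled at the position that point flows to a short time later. The sketch below is a hedged toy version; the closed-form stand-in field, the single Euler advection step, and the function names are illustrative assumptions, not the paper's training objective.

```python
import numpy as np

def toy_field(xyzt):
    """Stand-in for a learned 4D field: returns (rgb, flow) at (x, y, z, t)."""
    x, y, z, t = xyzt
    rgb = np.array([np.sin(x + t), np.cos(y), z]) * 0.5 + 0.5
    flow = np.array([1.0, 0.0, 0.0])  # constant velocity along x
    return rgb, flow

def consistency_loss(xyzt, dt=0.1):
    """Penalize radiance change along the flow over a small time step dt."""
    rgb, flow = toy_field(xyzt)
    xyz_next = xyzt[:3] + flow * dt  # advect the point by its own flow
    rgb_next, _ = toy_field(np.append(xyz_next, xyzt[3] + dt))
    return float(np.mean((rgb - rgb_next) ** 2))

loss = consistency_loss(np.array([0.1, 0.2, 0.3, 0.0]))
```

A penalty of this form lets observations at captured timestamps constrain the scene at the uncaptured frames in between, which is why the consistency variant degrades less when only 1 in 10 frames is available.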

Video Processing

NeRFlow captures and aggregates radiance information across different viewpoints and timesteps. By rendering from this aggregate representation, NeRFlow can serve as a scene prior for video processing tasks. When NeRFlow is fit on noisy input images, renders from the learned representation denoise and super-resolve those inputs.



@inproceedings{du2021nerflow,
  author    = {Yilun Du and Yinan Zhang and Hong-Xing Yu and Joshua B. Tenenbaum and Jiajun Wu},
  title     = {Neural Radiance Flow for 4D View Synthesis and Video Processing},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2021},
}