Seeking Physics in Diffusion Noise

¹Brown University    ²University of Edinburgh    ³Massachusetts Institute of Technology

"A ray of light is shining diagonally on a plastic cup in the dark, with the shadow of the plastic cup appearing at the bottom"

Baseline

Ours

"A piece of wood block is gently placed on the surface of a bowl filled with water"

Baseline

Ours

Abstract

Do video diffusion models encode signals predictive of physical plausibility? We probe intermediate denoising representations of a pretrained Diffusion Transformer (DiT) and find that physically plausible and implausible videos are partially separable in mid-layer feature space across noise levels. This separability cannot be fully attributed to visual quality or generator identity, suggesting recoverable physics-related cues in frozen DiT features. Leveraging this observation, we introduce progressive trajectory selection, an inference-time strategy that scores parallel denoising trajectories at a few intermediate checkpoints using a lightweight physics verifier trained on frozen features, and prunes low-scoring candidates early. Extensive experiments on PhyGenBench demonstrate that our method improves physical consistency while reducing inference cost, achieving results comparable to Best-of-K sampling with substantially fewer denoising steps.
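The selection strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the callables `denoise_step` and `score`, the checkpoint schedule, and the `keep_fraction` pruning ratio are all hypothetical stand-ins, and the toy example uses plain floats in place of video latents.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

State = TypeVar("State")


def progressive_select(
    candidates: Sequence[State],
    denoise_step: Callable[[State, int], State],  # hypothetical: one denoising step
    score: Callable[[State, int], float],         # hypothetical: verifier score, higher is better
    num_steps: int,
    checkpoints: Sequence[int],                   # assumed: step indices at which to prune
    keep_fraction: float = 0.5,                   # assumed: share of candidates kept per checkpoint
) -> Tuple[State, int]:
    """Return the selected trajectory and the total denoising steps spent."""
    alive: List[State] = list(candidates)
    prune_at = set(checkpoints)
    steps_spent = 0
    for t in range(num_steps):
        # Advance every surviving trajectory by one step.
        alive = [denoise_step(x, t) for x in alive]
        steps_spent += len(alive)
        # At a checkpoint, rank survivors and drop the low-scoring ones.
        if t in prune_at and len(alive) > 1:
            ranked = sorted(alive, key=lambda x: score(x, t), reverse=True)
            keep = max(1, int(len(ranked) * keep_fraction))
            alive = ranked[:keep]
    # If several trajectories survive to the end, return the best-scoring one.
    best = max(alive, key=lambda x: score(x, num_steps - 1))
    return best, steps_spent


# Toy run: floats stand in for latents, the "verifier" is the identity.
toy_candidates = [0.1, 0.9, 0.5, 0.3]
best, spent = progressive_select(
    toy_candidates,
    denoise_step=lambda x, t: x,
    score=lambda x, t: x,
    num_steps=10,
    checkpoints=[2, 5],
)
print(best, spent)  # 0.9 22
```

In this toy setting, four candidates run for three steps, two for the next three, and one for the last four, for 12 + 6 + 4 = 22 denoising steps. Running all four candidates to completion, as Best-of-K would, costs 40.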

Pipeline Overview

Video Presentation

Experimental Results

Cross-Backbone Results on PhyGenBench

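| Backbone     | Method   | Overall | S1   | S2   | S3   | Mech  | Opti  | Ther  | Mate  | Win % |
|--------------|----------|---------|------|------|------|-------|-------|-------|-------|-------|
| CogVideoX-2B | Baseline | 0.370   | -    | -    | -    | 0.38  | 0.43  | 0.34  | 0.39  | -     |
| CogVideoX-2B | Ours     | 0.515   | 1.98 | 0.91 | 1.69 | 0.49  | 0.58  | 0.47  | 0.49  | 66.1% |
| CogVideoX-5B | Baseline | 0.363   | 1.54 | 0.58 | 1.21 | 0.283 | 0.493 | 0.322 | 0.308 | -     |
| CogVideoX-5B | Ours     | 0.365   | 1.52 | 0.53 | 1.30 | 0.292 | 0.456 | 0.256 | 0.408 | 62.5% |
| Wan 2.1-14B  | Baseline | 0.569   | 2.05 | 1.28 | 1.79 | 0.525 | 0.740 | 0.489 | 0.458 | -     |
| Wan 2.1-14B  | Ours     | 0.612   | 2.09 | 1.46 | 1.86 | 0.600 | 0.767 | 0.533 | 0.492 | -     |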

S1: VQAScore (single-frame), S2: multi-frame physics (GPT-4o), S3: naturalness (GPT-4o). Mech: mechanics, Opti: optics, Ther: thermal, Mate: material properties. Win %: pairwise preference judged by GPT-4o, excluding ties.
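One plausible reading of "excluding ties" is that the win rate is computed over decided comparisons only. The sketch below shows that arithmetic. The function name and the counts are illustrative assumptions, not values or code from the paper.

```python
def win_rate_excluding_ties(wins: int, losses: int, ties: int = 0) -> float:
    """Win percentage over decided pairwise comparisons (assumed definition).

    `ties` is accepted for bookkeeping but deliberately left out of the
    denominator, which is what "excluding ties" is taken to mean here.
    """
    decided = wins + losses
    if decided == 0:
        raise ValueError("no decided comparisons")
    return 100.0 * wins / decided


# Illustrative counts only: 6 wins, 4 losses, 10 ties.
print(win_rate_excluding_ties(6, 4, ties=10))  # 60.0
```

Under this definition a large number of ties does not dilute the percentage, so the Win % column should be read alongside the absolute scores.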


BibTeX

@misc{tang2026seekingphysicsdiffusionnoise,
    title={Seeking Physics in Diffusion Noise},
    author={Chujun Tang and Lei Zhong and Fangqiang Ding},
    year={2026},
    eprint={2603.14294},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2603.14294},
}