Dual-branch self-attention
FreeSpec replaces self-attention with global full-window and local sliding-window branches during inference while keeping the pretrained short-video model frozen.
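The two branches can be illustrated with a minimal NumPy sketch. It assumes a single attention head over a 1D token sequence, with each token standing in for a frame. The helper names (`sliding_window_mask`, `dual_branch_attention`) and the `window` parameter are hypothetical and not taken from the FreeSpec code. The frozen model's projections are represented only by the precomputed `q`, `k`, `v` arrays.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sliding_window_mask(num_tokens, window):
    # True where token j lies within window // 2 positions of token i.
    idx = np.arange(num_tokens)
    return np.abs(idx[:, None] - idx[None, :]) <= window // 2

def attention(q, k, v, mask=None):
    # Plain scaled dot-product attention with an optional boolean mask.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ v

def dual_branch_attention(q, k, v, window):
    # Global branch: every token attends to the full sequence.
    out_global = attention(q, k, v)
    # Local branch: each token attends only to its sliding window.
    out_local = attention(q, k, v, sliding_window_mask(q.shape[0], window))
    return out_global, out_local
```

Both branches reuse the same `q`, `k`, `v`, so no weights are added or updated. Only the attention pattern differs between the two outputs.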
Global and local outputs are decomposed by SVD, and global information is injected through timestep- and rank-aware singular-spectrum modulation.
The modulated spectrum is reconstructed under the local singular basis, preserving high-rank spatial-temporal variations while retaining controlled low-rank structural guidance.
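The decomposition and reconstruction steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The exact modulation rule is not given here, so the sketch substitutes a simple linear blend of singular values, gated to the lowest ranks and scaled by a normalized timestep `t` in [0, 1]. The names `modulation_weights`, `spectral_reconstruct`, and `low_rank_cutoff` are hypothetical.

```python
import numpy as np

def modulation_weights(num_ranks, t, low_rank_cutoff=4):
    # Assumed schedule: inject global spectrum only into the leading
    # (low-rank, structural) components, scaled by timestep t in [0, 1].
    ranks = np.arange(num_ranks)
    rank_gate = (ranks < low_rank_cutoff).astype(float)
    return t * rank_gate

def spectral_reconstruct(out_global, out_local, t, low_rank_cutoff=4):
    # Decompose the local output; its singular vectors form the basis we keep.
    U_l, s_l, Vt_l = np.linalg.svd(out_local, full_matrices=False)
    # Only the singular values of the global output are needed.
    s_g = np.linalg.svd(out_global, compute_uv=False)
    # Blend spectra per rank (assumed linear rule, see lead-in).
    w = modulation_weights(s_l.shape[0], t, low_rank_cutoff)
    s_mod = (1.0 - w) * s_l + w * s_g
    # Reconstruct under the local singular basis with the modulated spectrum.
    return (U_l * s_mod) @ Vt_l
```

With `t = 0` the weights vanish and the function returns the local output unchanged. Larger `t` moves the leading singular values toward those of the global branch, while components beyond `low_rank_cutoff` keep their local values.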
All results are evaluated at 4× the native base-model inference length. Inference time denotes runtime on an A100 GPU. Best scores are shown in bold, second-best scores are underlined, and the FreeSpec row is highlighted.
Additional side-by-side examples. The left column shows Direct and the right column shows FreeSpec (Ours).
@article{chen2026freespec,
title={FreeSpec: Training-Free Long Video Generation via Singular-Spectrum Reconstruction},
author={Fangda Chen and Shanshan Zhao and Longrong Yang and Chuanfu Xu and Zhigang Luo and Long Lan},
year={2026},
journal={arXiv preprint arXiv:2605.06509},
}