FlashPortrait: 6× Faster Infinite Portrait Animation with Adaptive Latent Prediction

1Fudan University   2Microsoft Research Asia   3Xi'an Jiaotong University   4Tencent Inc   5Wan Team, Tongyi Lab, Alibaba Group

Abstract

Current diffusion-based acceleration methods for long portrait animation struggle to preserve identity (ID) consistency. This paper presents FlashPortrait, an end-to-end video diffusion transformer that synthesizes ID-preserving, infinite-length videos while accelerating inference by up to 6×.

Specifically, FlashPortrait first extracts identity-agnostic facial expression features with an off-the-shelf extractor. It then introduces a Normalized Facial Expression Block, which aligns the facial features with the diffusion latents by normalizing them with their respective means and variances, improving identity stability in facial modeling.
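The abstract does not specify how the normalization is wired into the transformer. The sketch below shows one plausible reading, an AdaIN-style alignment: the face features are standardized per channel with their own mean and variance, then rescaled to the per-channel mean and standard deviation of the latents. The token-by-channel layout, the function names, and the choice to rescale to latent statistics are all assumptions for illustration, not the authors' code.

```python
import math
from typing import List, Tuple

# Hypothetical layout: rows are tokens, columns are channels.
Matrix = List[List[float]]


def channel_stats(x: Matrix, eps: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Per-channel mean and standard deviation, computed across tokens."""
    n, dims = len(x), len(x[0])
    means = [sum(row[c] for row in x) / n for c in range(dims)]
    stds = [math.sqrt(sum((row[c] - means[c]) ** 2 for row in x) / n + eps)
            for c in range(dims)]
    return means, stds


def align_face_to_latent(face: Matrix, latent: Matrix, eps: float = 1e-6) -> Matrix:
    """Assumed AdaIN-style alignment: standardize the face features with
    their own statistics, then rescale them to the latent's statistics."""
    f_mean, f_std = channel_stats(face, eps)
    z_mean, z_std = channel_stats(latent, eps)
    return [[(v - fm) / fs * zs + zm
             for v, fm, fs, zm, zs in zip(row, f_mean, f_std, z_mean, z_std)]
            for row in face]


face = [[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]]   # toy expression features
latent = [[0.0, -1.0], [2.0, 1.0], [4.0, 0.0]]   # toy diffusion latents
aligned = align_face_to_latent(face, latent)
```

Under this reading only the first two moments are matched, so the relative pattern of the expression features is kept while their overall scale follows the latents.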

During inference, FlashPortrait uses a dynamic sliding-window scheme with weighted blending in the overlapping regions, ensuring smooth transitions and ID consistency across long animations. Within each context window, guided by the latent variation rate at particular timesteps and the derivative magnitude ratio across diffusion layers, FlashPortrait uses higher-order latent derivatives at the current timestep to predict the latents of future timesteps directly, thereby skipping several denoising steps.
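Both inference mechanisms in this paragraph can be illustrated with a small stdlib-only sketch. It rests on several assumptions not stated in the abstract: each frame's or step's latent is a short list of floats, window weights ramp linearly across the overlap, derivatives are second-order backward finite differences over unit-spaced steps, and the skip rule uses only a relative-change threshold `tau` with a cap `max_skip` (the cross-layer derivative magnitude ratio is omitted). All function names and parameters are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Latent = List[float]  # stand-in for a flattened latent tensor


def window_starts(num_frames: int, window: int, overlap: int) -> List[int]:
    """Start frame of each context window; the last window is shifted
    so that it ends exactly on the final frame."""
    assert num_frames >= window > overlap >= 0
    stride = window - overlap
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    return starts


def blend_windows(outputs: List[List[float]], starts: List[int],
                  num_frames: int, overlap: int) -> List[float]:
    """Weighted average of per-frame values from overlapping windows.
    Assumed linear ramps: a window fades out where its neighbour fades in."""
    acc = [0.0] * num_frames
    total = [0.0] * num_frames
    for out, start in zip(outputs, starts):
        n = len(out)
        for i, value in enumerate(out):
            w = min(1.0, (i + 1) / (overlap + 1), (n - i) / (overlap + 1))
            acc[start + i] += w * value
            total[start + i] += w
    return [a / t for a, t in zip(acc, total)]


def taylor_predict(hist: List[Latent]) -> Latent:
    """Predict the next latent one unit step ahead with a 2nd-order
    Taylor expansion; derivatives come from the last three latents."""
    z2, z1, z0 = hist[-3], hist[-2], hist[-1]  # oldest -> newest
    pred = []
    for a, b, c in zip(z2, z1, z0):
        d1 = (3 * c - 4 * b + a) / 2  # 2nd-order accurate first derivative
        d2 = c - 2 * b + a            # second derivative
        pred.append(c + d1 + 0.5 * d2)
    return pred


def relative_change(prev: Latent, cur: Latent) -> float:
    """||cur - prev|| / ||prev||, a simple proxy for the latent variation rate."""
    diff = math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur)))
    return diff / (math.sqrt(sum(p * p for p in prev)) + 1e-8)


def denoise_with_skipping(z: Latent, steps: int,
                          model_step: Callable[[Latent, int], Latent],
                          tau: float, max_skip: int) -> Tuple[Latent, int]:
    """Run `steps` denoising steps, replacing a model call with a Taylor
    prediction when the latent changed slowly (relative change < tau),
    at most `max_skip` times in a row. Returns (latent, model calls)."""
    hist, calls, skipped = [z], 0, 0
    for k in range(steps):
        if (len(hist) >= 3 and skipped < max_skip
                and relative_change(hist[-2], hist[-1]) < tau):
            z = taylor_predict(hist)
            skipped += 1
        else:
            z = model_step(z, k)
            calls += 1
            skipped = 0
        hist.append(z)
    return z, calls


def decay(z: Latent, k: int) -> Latent:
    """Toy stand-in for a denoising step: shrink the latent by 10%."""
    return [0.9 * v for v in z]


z_full, calls_full = denoise_with_skipping([1.0], 10, decay, tau=0.0, max_skip=2)
z_fast, calls_fast = denoise_with_skipping([1.0], 10, decay, tau=1.0, max_skip=2)
# calls_full is 10 (no skipping); calls_fast is 4 (6 steps predicted)
```

In this toy run the skipped steps come from extrapolation rather than the model, so `z_fast` only approximates `z_full`; capping consecutive skips with `max_skip` is one simple way to bound that drift.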

Human Image drives Human Video

Human Image drives Anime Character Video

Multi-Output Animation Results

Comparisons with SOTA methods

Ablation Study

BibTeX

@article{tu2025flashportrait,
  title={FlashPortrait: 6$\times$ Faster Infinite Portrait Animation with Adaptive Latent Prediction},
  author={Tu, Shuyuan and Pan, Yueming and Huang, Yinming and Han, Xintong and Xing, Zhen and Dai, Qi and Qiu, Kai and Luo, Chong and Wu, Zuxuan},
  journal={arXiv preprint arXiv:2512.16900},
  year={2025}
}