MotionEditor: Editing Video Motion via Content-Aware Diffusion

Shuyuan Tu1,2, Qi Dai3, Zhi-Qi Cheng4, Han Hu3, Xintong Han5, Zuxuan Wu1,2, Yu-Gang Jiang1,2
1Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University, 2Shanghai Collaborative Innovation Center of Intelligent Visual Computing, 3Microsoft Research Asia, 4Carnegie Mellon University, 5Huya Inc

Video Motion Editing Results


Abstract

Existing diffusion-based video editing models have made significant advances in editing the attributes of a source video over time, but they struggle to manipulate motion information while preserving the original protagonist's appearance and background. To address this, we propose MotionEditor, the first diffusion model for video motion editing. MotionEditor incorporates a novel content-aware motion adapter into ControlNet to capture temporal motion correspondence. While ControlNet enables direct generation based on skeleton poses, it encounters challenges when modifying the source motion in the inverted noise, because the noise (source) and the condition (reference) carry contradictory signals. Our adapter complements ControlNet by incorporating source content so that the adapted control signals transfer seamlessly. Further, we build a two-branch architecture (a reconstruction branch and an editing branch) with a high-fidelity attention injection mechanism that facilitates interaction between the branches. This mechanism enables the editing branch to query the keys and values from the reconstruction branch in a decoupled manner, so that the editing branch retains the original background and protagonist appearance. We also propose a skeleton alignment algorithm to address discrepancies in pose size and position. Experiments demonstrate the promising motion editing ability of MotionEditor, both qualitatively and quantitatively.
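
The cross-branch attention idea in the abstract (editing-branch queries attending to reconstruction-branch keys and values) can be illustrated with a minimal sketch. This is not the MotionEditor implementation: the function names, the toy tensors, and the use of plain scaled dot-product attention are all assumptions made for illustration, and the decoupled handling of foreground and background mentioned above is not modeled here.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Plain scaled dot-product attention on lists of token vectors."""
    dim = len(queries[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out

def injected_attention(q_edit, k_recon, v_recon):
    """Hypothetical helper: queries come from the editing branch, while
    keys and values are taken from the reconstruction branch."""
    return attention(q_edit, k_recon, v_recon)

# Toy example with two tokens per branch (values chosen arbitrarily).
q_edit = [[1.0, 0.0], [0.0, 1.0]]
k_recon = [[10.0, 0.0], [0.0, 10.0]]
v_recon = [[1.0, 2.0], [3.0, 4.0]]

out = injected_attention(q_edit, k_recon, v_recon)
print(out)  # each output row lies close to the matching row of v_recon
```

Because every output row is a convex combination of the reconstruction-branch values, the editing branch can only produce features drawn from the source content, which is the intuition behind using injection to preserve appearance and background.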

Video Motion Editing Comparison (1/2)


Methods compared: Liquid Warping GAN, Motion Representations for Articulated Animation, Tune-A-Video, Follow-Your-Pose, ControlVideo, MasaCtrl, FateZero, MotionEditor (Ours).
Columns in each panel: Source Video, Reference Video, Edited Video.

Video Motion Editing Comparison (2/2)


Methods compared: Liquid Warping GAN, Motion Representations for Articulated Animation, Tune-A-Video, Follow-Your-Pose, ControlVideo, MasaCtrl, FateZero, MotionEditor (Ours).
Columns in each panel: Source Video, Reference Video, Edited Video.