We introduce VeloEdit, a training-free method that enables consistent and continuous image editing by decomposing and manipulating the velocity field in diffusion models. Our method automatically identifies preservation and editing regions, achieving smooth multi-intensity editing results without any additional training.
Instruction-based image editing aims to modify source content according to textual instructions. However, existing methods built upon flow matching often struggle to maintain consistency in non-edited regions due to denoising-induced reconstruction errors that cause drift in preserved content. Moreover, they typically lack fine-grained control over edit strength. To address these limitations, we propose VeloEdit, a training-free method that enables highly consistent and continuously controllable editing. VeloEdit dynamically identifies editing regions by quantifying the discrepancy between the velocity fields responsible for preserving source content and those driving the desired edits. Based on this partition, we enforce consistency in preservation regions by substituting the editing velocity with the source-restoring velocity, while enabling continuous modulation of edit intensity in target regions via velocity interpolation. Unlike prior works that rely on complex attention manipulation or auxiliary trainable modules, VeloEdit operates directly on the velocity fields. Extensive experiments on Flux.1 Kontext and Qwen-Image-Edit demonstrate that VeloEdit improves visual consistency and editing continuity with negligible additional computational cost.
We extend instruction-driven image editing models to provide continuous control over edit strength while maintaining visual consistency in non-edited regions. Unlike existing methods that require training or complex attention manipulation, VeloEdit operates directly on the velocity field of diffusion models, enabling a simple yet effective approach to controllable editing. By analyzing velocity field similarity, we automatically identify which regions should be preserved and which should be edited, then smoothly interpolate between source and editing velocities to achieve fine-grained control.
VeloEdit operates by decomposing the velocity field into preservation and editing components based on similarity analysis. The key insight is that regions where the source-restoring and editing velocities are highly similar should remain unchanged, while regions of low similarity are where the edit takes effect and require explicit control. Our method consists of three main steps: (1) Velocity Field Decomposition - we compute cosine similarity between source and editing velocities at each spatial location; (2) Consistency Preservation - in high-similarity regions, we replace the editing velocity with the source velocity; (3) Continuous Intensity Control - in low-similarity regions, we interpolate between the two velocities using a strength parameter α ∈ [0, 1]. This approach requires no training and can be applied to any diffusion-based editing model, providing both consistency and continuous control in a unified framework.
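The three steps above can be sketched per denoising step as follows. This is a minimal NumPy illustration, not the reference implementation: the channel-last field shapes, the function name, and the similarity threshold `tau` are assumptions for exposition.

```python
import numpy as np

def blend_velocities(v_src, v_edit, alpha=0.5, tau=0.9, eps=1e-8):
    """Illustrative sketch of VeloEdit-style velocity blending (shapes and
    threshold are assumptions, not the paper's exact configuration).

    v_src, v_edit: velocity fields of shape (H, W, C), channel-last.
    alpha: edit strength in [0, 1] (continuous intensity control).
    tau: cosine-similarity threshold separating preservation from edit regions.
    """
    # (1) Velocity field decomposition: per-location cosine similarity.
    dot = np.sum(v_src * v_edit, axis=-1)
    norm = np.linalg.norm(v_src, axis=-1) * np.linalg.norm(v_edit, axis=-1)
    sim = dot / (norm + eps)                       # shape (H, W)

    # (2) Consistency preservation: high-similarity locations keep v_src,
    #     which steers the trajectory back toward the source content.
    preserve = sim >= tau                          # boolean mask, (H, W)

    # (3) Continuous intensity control: interpolate in low-similarity regions;
    #     alpha=0 reproduces the source, alpha=1 applies the full edit.
    v_interp = (1.0 - alpha) * v_src + alpha * v_edit
    return np.where(preserve[..., None], v_src, v_interp)
```

In a flow-matching sampler, the blended field would simply replace the editing velocity before each integration step, so the overhead is one similarity map and one interpolation per step.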