Flow Matching
Flow Matching [1] is a generative framework that trains a velocity field \(v_\theta\) to transform a simple distribution (Gaussian noise \(p_0\)) into a complex data distribution (\(p_1\)) along a continuous trajectory in time.
Compared to diffusion models such as DDPM, Flow Matching tends to produce straighter trajectories, which typically require fewer integration steps at inference, and its training objective is simpler.
Intuition: Moving Particles
Imagine you have particles scattered according to a Gaussian \(p_0 = \mathcal{N}(0, I)\). You want to move them so that, by time \(t=1\), they are distributed like your training data \(p_1\).
Flow Matching learns a vector field \(v_\theta(x, t)\) that "pushes" each particle in the right direction at every instant \(t \in [0, 1]\).
Mathematical Formulation
The goal is to learn a velocity field \(v_\theta : \mathbb{R}^d \times [0,1] \to \mathbb{R}^d\) such that integrating the ODE
\[
\frac{dx_t}{dt} = v_\theta(x_t, t), \qquad x_0 \sim p_0,
\]
from \(t = 0\) to \(t = 1\) produces \(x_1 \sim p_1\).
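To make the ODE concrete, here is a minimal sketch in which the velocity field is known in closed form rather than learned. It assumes a 1-D Gaussian target \(p_1 = \mathcal{N}(\mu, \sigma^2)\) (the values `MU` and `SIGMA` are illustrative) and the flow \(x_t = t\mu + s_t x_0\) with \(s_t = \sqrt{(1-t)^2 + t^2\sigma^2}\), whose velocity follows by differentiating in \(t\). The helper names `velocity` and `integrate` are hypothetical.

```python
import random
import statistics

MU, SIGMA = 2.0, 0.5  # illustrative 1-D target p_1 = N(MU, SIGMA^2)


def velocity(x: float, t: float) -> float:
    """Velocity of the flow x_t = t*MU + s_t*x_0, with s_t^2 = (1-t)^2 + (t*SIGMA)^2.

    Differentiating gives dx/dt = MU + (s_t'/s_t) * (x - t*MU),
    where s_t'/s_t = (t*SIGMA^2 - (1-t)) / s_t^2.
    """
    s_sq = (1 - t) ** 2 + (t * SIGMA) ** 2
    return MU + (t * SIGMA ** 2 - (1 - t)) / s_sq * (x - t * MU)


def integrate(x0: float, n_steps: int = 200) -> float:
    """Explicit Euler integration of dx/dt = velocity(x, t) from t = 0 to t = 1."""
    dt = 1.0 / n_steps
    x = x0
    for i in range(n_steps):
        x += dt * velocity(x, i * dt)
    return x


rng = random.Random(0)  # fixed seed so the run is reproducible
particles = [integrate(rng.gauss(0.0, 1.0)) for _ in range(2000)]  # x_0 ~ N(0, 1)

# The empirical mean and standard deviation should land close to MU and SIGMA.
print(statistics.fmean(particles), statistics.pstdev(particles))
```

In Flow Matching the closed-form `velocity` is replaced by the learned network \(v_\theta\); the integration loop stays the same.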
Conditional Flow Matching (CFM)
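Given a conditional path \(x_t = (1-t)x_0 + t x_1\) (linear interpolation between a noise sample \(x_0 \sim p_0\) and a data sample \(x_1 \sim p_1\), often called the "Optimal Transport path"), the conditional velocity field is simply its time derivative:
\[
u_t(x_t \mid x_0, x_1) = \frac{dx_t}{dt} = x_1 - x_0
\]
The CFM loss:
\[
\mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x_0 \sim p_0,\; x_1 \sim p_1} \left\| v_\theta(x_t, t) - (x_1 - x_0) \right\|^2
\]
where \(x_t = (1-t)x_0 + t x_1\). The loss is simply the MSE between the predicted field and the linear interpolation direction \(x_1 - x_0\), with no complex noise schedule.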
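The CFM objective can be sketched as a short PyTorch training loop. This is an illustrative sketch, not a reference implementation: `TinyVelocityNet`, the toy 2-D dataset, and all hyperparameters are assumptions made for the example. Note that `.mean()` averages over feature dimensions as well as the batch, which rescales the squared norm by a constant factor \(1/d\).

```python
import torch
import torch.nn as nn


class TinyVelocityNet(nn.Module):
    """Hypothetical toy model: an MLP mapping (x, t) to a velocity."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Append the time as one extra input feature.
        return self.net(torch.cat([x, t[:, None]], dim=-1))


def cfm_loss(model: nn.Module, x1: torch.Tensor) -> torch.Tensor:
    """Monte Carlo estimate of the CFM loss for a data batch x1 of shape (batch, dim)."""
    x0 = torch.randn_like(x1)                      # x0 ~ p_0 = N(0, I)
    t = torch.rand(x1.shape[0], device=x1.device)  # t ~ U[0, 1]
    t_b = t[:, None]                               # broadcast over the feature dim
    xt = (1 - t_b) * x0 + t_b * x1                 # point on the linear path
    target = x1 - x0                               # conditional velocity
    return ((model(xt, t) - target) ** 2).mean()


torch.manual_seed(0)
model = TinyVelocityNet(dim=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Toy "dataset": a shifted, narrow 2-D Gaussian standing in for p_1.
data = 0.5 * torch.randn(512, 2) + torch.tensor([2.0, -1.0])

losses = []
for step in range(200):
    batch = data[torch.randint(0, data.shape[0], (128,))]
    loss = cfm_loss(model, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Fresh noise \(x_0\) and times \(t\) are drawn on every call, so each data sample is paired with many different points along the path over the course of training.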
Flow Matching vs. Diffusion
| Aspect | DDPM | Flow Matching |
|---|---|---|
| Trajectory | Curved (incremental noise) | Straight (OT path) |
| Inference steps (typical) | 50–1000 | 10–50 |
| Loss | Noise prediction \(\epsilon\) | Velocity prediction \(v\) |
| Schedule | Complex \(\beta_t\) | Uniform \(t \in [0,1]\) |
| Inference speed | Slower (more steps) | Roughly 2–10× faster (fewer steps) |
FLUX.1 — State of the Art (2024)
FLUX.1 [2] (Black Forest Labs) combines Flow Matching with a Diffusion Transformer (DiT) backbone, replacing the U-Net with Transformer blocks:
```mermaid
flowchart LR
    A["Text\n(prompt)"] --> B["Text Encoder\n(CLIP + T5-XXL)"]
    N["Noise\nz₀ ~ N(0,I)"] --> C
    B --> C["Diffusion Transformer\n12B params\n(Flow Matching)"]
    C -->|"ODE: 20-50 steps"| D["Latent z₁"]
    D --> E["VAE Decoder"]
    E --> F["Image 1024x1024"]
```

- 12B parameters (FLUX.1-dev, open weights under a non-commercial license)
- Supports multiple aspect ratios natively
- Reported to surpass SDXL and SD3 on quality benchmarks
Inference: Solving the ODE
```python
import torch


@torch.no_grad()
def sample_flow_matching(model, n_samples, data_shape, n_steps=50, device='cuda'):
    """Euler sampler: integrate dx/dt = model(x, t) from t = 0 to t = 1.

    data_shape is the shape of one sample, e.g. (C, H, W).
    """
    dt = 1.0 / n_steps
    x = torch.randn(n_samples, *data_shape, device=device)  # z0 ~ N(0,I)
    for i in range(n_steps):
        t = torch.full((n_samples,), i * dt, device=device)
        v = model(x, t)  # predicted velocity field
        x = x + dt * v   # simple Euler integration
    return x             # z1, approximately distributed as p_data


# With Heun solver (2nd order, better quality, two model calls per step):
@torch.no_grad()
def sample_heun(model, n_samples, data_shape, n_steps=20, device='cuda'):
    dt = 1.0 / n_steps
    x = torch.randn(n_samples, *data_shape, device=device)
    for i in range(n_steps):
        t = torch.full((n_samples,), i * dt, device=device)
        v1 = model(x, t)                  # slope at the start of the step
        x_pred = x + dt * v1              # Euler predictor
        t2 = torch.full((n_samples,), (i + 1) * dt, device=device)
        v2 = model(x_pred, t2)            # slope at the predicted end point
        x = x + dt * (v1 + v2) / 2        # Heun average
    return x
```
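To see why a second-order solver can pay off, the sketch below compares Euler and Heun at the same budget of velocity evaluations. It is a toy check under stated assumptions: the field is the closed-form velocity of the 1-D flow \(x_t = s_t x_0\) with \(s_t = \sqrt{(1-t)^2 + t^2\sigma^2}\) (so the exact end point is \(\sigma x_0\)), and the names `velocity`, `euler`, and `heun` are hypothetical stand-ins for a learned model and its samplers.

```python
from typing import Callable

SIGMA = 0.5  # illustrative 1-D target p_1 = N(0, SIGMA^2)

Field = Callable[[float, float], float]


def velocity(x: float, t: float) -> float:
    """Exact field of the flow x_t = s_t * x_0, s_t^2 = (1-t)^2 + (t*SIGMA)^2."""
    s_sq = (1 - t) ** 2 + (t * SIGMA) ** 2
    return (t * SIGMA ** 2 - (1 - t)) / s_sq * x


def euler(v: Field, x: float, n_steps: int) -> float:
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x += dt * v(x, i * dt)           # one evaluation per step
    return x


def heun(v: Field, x: float, n_steps: int) -> float:
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        v1 = v(x, t)                     # slope at the start of the step
        v2 = v(x + dt * v1, t + dt)      # slope at the Euler-predicted end
        x += dt * (v1 + v2) / 2          # two evaluations per step
    return x


x0, exact = 1.0, SIGMA                   # the exact flow maps x_0 = 1 to SIGMA
euler_err = abs(euler(velocity, x0, 20) - exact)  # 20 evaluations
heun_err = abs(heun(velocity, x0, 10) - exact)    # 10 steps x 2 = 20 evaluations
print(f"Euler error: {euler_err:.4f}, Heun error: {heun_err:.4f}")
```

On this toy field Heun is noticeably more accurate for the same number of evaluations. With a learned network the gain depends on how smooth the predicted field is, so the step counts are worth tuning per model.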
[1] Lipman, Y. et al. (2022). Flow Matching for Generative Modeling.
[2] Black Forest Labs. (2024). FLUX.1: State-of-the-art text-to-image generation.
[3] Liu, X. et al. (2022). Flow Straight and Fast: Rectified Flow.