22. Flow-Matching

Flow Matching¹ is a generative framework that trains a velocity field \(v_\theta\) to transform a simple distribution (Gaussian noise \(p_0\)) into a complex data distribution (\(p_1\)) along a continuous trajectory in time.

Compared with diffusion models such as DDPM, Flow Matching tends to produce straighter sampling trajectories³, so it usually needs fewer integration steps at inference, and its training objective is a plain regression loss with no noise schedule to tune.

Intuition: Moving Particles

Imagine you have particles scattered according to a Gaussian \(p_0 = \mathcal{N}(0, I)\). You want to move them so that, by time \(t=1\), they are distributed like your training data \(p_1\).

Flow Matching learns a vector field \(v_\theta(x, t)\) that "pushes" each particle in the right direction at every instant \(t \in [0, 1]\).
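
To make the particle picture concrete, here is a minimal NumPy sketch under a deliberately simple, hypothetical setup: 1D particles with \(p_0 = \mathcal{N}(0, 1)\) and \(p_1 = \mathcal{N}(\mu, s^2)\), and a velocity field written down by hand instead of learned. Pairing each \(x_0\) with \(x_1 = \mu + s x_0\) (the monotone map between the two Gaussians, which in 1D is also the optimal-transport map) makes every particle travel in a straight line at constant speed \(x_1 - x_0\). The names `MU`, `S`, `velocity`, and `push_particles` are illustrative, not from any library.

```python
import numpy as np

# Hypothetical toy setup: p0 = N(0, 1) and p1 = N(MU, S^2) in 1D.
MU, S = 3.0, 0.5

def velocity(x, t):
    """Hand-written field: velocity of the particle found at position x at time t.

    Each particle starts at x0 and travels straight to x1 = MU + S * x0, so
    x_t = x0 * (1 + t * (S - 1)) + t * MU. Invert that to recover x0, then
    return the constant speed x1 - x0 = MU + (S - 1) * x0.
    """
    x0 = (x - t * MU) / (1.0 + t * (S - 1.0))
    return MU + (S - 1.0) * x0

def push_particles(x0, n_steps):
    """Move particles from t=0 to t=1 with explicit Euler steps."""
    x, dt = x0.copy(), 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity(x, i * dt)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(100_000)     # particles scattered like p0
x1 = push_particles(x0, n_steps=4)
print(x1.mean(), x1.std())            # close to MU = 3.0 and S = 0.5
```

Because each trajectory is straight and the field is exact, even a single Euler step lands every particle on \(\mu + s x_0\). A learned field is only approximate and its trajectories are generally not perfectly straight, which is why practical samplers take more steps.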

Mathematical Formulation

The goal is to learn a velocity field \(v_\theta : \mathbb{R}^d \times [0,1] \to \mathbb{R}^d\) such that integrating the ODE:

\[ \frac{dx}{dt} = v_\theta(x, t), \quad x_0 \sim p_0 \]

produces \(x_1 \sim p_1\).
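
As a worked statement of what this requirement means for densities: let \(p_t\) be the distribution of \(x_t\) obtained by integrating the ODE up to time \(t\). Then \(p_t\) and the velocity field satisfy the continuity equation (the standard result the Flow Matching paper¹ builds on), and training must find a \(v_\theta\) whose path matches both endpoints:

\[ \frac{\partial p_t(x)}{\partial t} + \nabla \cdot \big( p_t(x)\, v_\theta(x, t) \big) = 0, \qquad p_{t=0} = p_0, \quad p_{t=1} = p_1 \]

Intuitively, probability mass is neither created nor destroyed; it only flows along \(v_\theta\), like the particles in the intuition above.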

Conditional Flow Matching (CFM)

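Given a conditional path \(x_t = (1-t)x_0 + t x_1\) between a noise sample \(x_0\) and a data sample \(x_1\) (linear interpolation, which Lipman et al.¹ call the optimal transport (OT) path, up to a small minimum noise level), differentiating with respect to \(t\) gives the conditional velocity field: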

\[ u_t(x \mid x_0, x_1) = x_1 - x_0 \]

The CFM loss:

\[ \mathcal{L}_{\text{CFM}} = \mathbb{E}_{t \sim \mathcal{U}[0,1],\, x_0 \sim p_0,\, x_1 \sim p_1} \left[ \left\| v_\theta(x_t, t) - (x_1 - x_0) \right\|^2 \right] \]

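where \(x_t = (1-t)x_0 + t x_1\), with \(x_0\) and \(x_1\) drawn independently. The loss is a plain MSE between the predicted field and the direction \(x_1 - x_0\); there is no noise schedule to design. Although each target depends on the pair \((x_0, x_1)\) rather than on \(x_t\) alone, Lipman et al.¹ show that this conditional loss has the same gradients with respect to \(\theta\) as regressing onto the marginal velocity field that transports \(p_0\) to \(p_1\), so minimizing it trains the field the ODE needs.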
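
The loss translates almost line for line into code. Below is a minimal NumPy sketch of a one-batch Monte Carlo estimate of \(\mathcal{L}_{\text{CFM}}\). The helper `cfm_loss` and the `model(x, t)` interface (with `t` shaped `(batch, 1)` so it broadcasts over features) are hypothetical; in practice `model` would be a neural network trained by gradient descent on this value.

```python
import numpy as np

def cfm_loss(model, x1, rng):
    """One-batch Monte Carlo estimate of the CFM loss (hypothetical helper).

    model: callable v(x, t) returning an array shaped like x
    x1:    data samples, shape (batch, dim)
    rng:   numpy Generator used to draw x0 and t
    """
    batch = x1.shape[0]
    x0 = rng.standard_normal(x1.shape)           # x0 ~ p0 = N(0, I), independent of x1
    t = rng.uniform(0.0, 1.0, size=(batch, 1))   # t ~ U[0, 1], one time per sample
    xt = (1.0 - t) * x0 + t * x1                 # point on the linear conditional path
    target = x1 - x0                             # conditional velocity u_t(x | x0, x1)
    v = model(xt, t)
    # squared error summed over features, averaged over the batch
    return np.mean(np.sum((v - target) ** 2, axis=1))

# Sanity check on a degenerate dataset where every x1 equals c.
rng = np.random.default_rng(0)
c = np.array([2.0, -1.0])
x1 = np.tile(c, (256, 1))
exact_model = lambda x, t: (c - x) / (1.0 - t)   # equals x1 - x0 on this path
zero_model = lambda x, t: np.zeros_like(x)
print(cfm_loss(exact_model, x1, rng))  # near 0 (floating-point error only)
print(cfm_loss(zero_model, x1, rng))   # around ||c||^2 + dim = 7 on average
```

The check works because, when every \(x_1\) equals \(c\), the target satisfies \(x_1 - x_0 = (c - x_t)/(1-t)\) exactly, so a model computing that expression reaches zero loss, while a model that always predicts zero pays \(\lVert c \rVert^2 + d\) in expectation.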

Flow Matching vs. Diffusion

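| Aspect | DDPM | Flow Matching |
|---|---|---|
| Trajectory | Curved (incremental noise) | Straight conditional paths (OT path); learned sampling trajectories are typically straighter, though not exactly straight |
| Inference steps | 50–1000 | 10–50 |
| Loss | Noise prediction \(\epsilon\) | Velocity prediction \(v\) |
| Schedule | Hand-designed noise schedule \(\beta_t\) | None; \(t\) sampled uniformly from \([0,1]\) |
| Inference speed | Slower | Faster (cost scales with the number of steps) |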

FLUX.1: State of the Art (2024)

FLUX.1² (Black Forest Labs) uses Flow Matching with a Diffusion Transformer (DiT), which replaces the U-Net of earlier latent diffusion models with Transformer blocks:

```mermaid
flowchart LR
    A["Text\n(prompt)"] --> B["Text Encoder\n(CLIP + T5-XXL)"]
    N["Noise\nz₀ ~ N(0,I)"] --> C
    B --> C["Diffusion Transformer\n12B params\n(Flow Matching)"]
    C -->|"ODE: 20-50 steps"| D["Latent z₁"]
    D --> E["VAE Decoder"]
    E --> F["Image 1024x1024"]
```

- 12B parameters (FLUX.1-dev, released with open weights under a non-commercial license)
- Supports multiple aspect ratios natively
- Reported² to outperform SDXL and SD3 on image-quality benchmarks

Inference: Solving the ODE

```python
import torch

@torch.no_grad()  # sampling only: no gradients needed
def sample_flow_matching(model, n_samples, data_shape, n_steps=50, device='cuda'):
    """Integrate dx/dt = v_theta(x, t) from t=0 (noise) to t=1 (data) with Euler steps."""
    dt = 1.0 / n_steps
    x = torch.randn(n_samples, *data_shape, device=device)  # x0 ~ N(0, I)

    for i in range(n_steps):
        t = torch.full((n_samples,), i * dt, device=device)
        v = model(x, t)          # predicted velocity field
        x = x + dt * v           # explicit Euler step

    return x  # approximately distributed as p_data

# With Heun's method (2nd order, two model calls per step, better quality per step):
@torch.no_grad()
def sample_heun(model, n_samples, data_shape, n_steps=20, device='cuda'):
    dt = 1.0 / n_steps
    x = torch.randn(n_samples, *data_shape, device=device)  # x0 ~ N(0, I)
    for i in range(n_steps):
        t = torch.full((n_samples,), i * dt, device=device)
        v1 = model(x, t)                    # slope at the start of the step
        x_pred = x + dt * v1                # Euler predictor
        t2 = torch.full((n_samples,), (i + 1) * dt, device=device)
        v2 = model(x_pred, t2)              # slope at the predicted end point
        x = x + dt * (v1 + v2) / 2          # Heun corrector: average the two slopes
    return x
```
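
To see why the second-order solver helps, the NumPy sketch below swaps the trained network for a field known in closed form (a hypothetical toy setup, not FLUX or any trained model): 1D \(p_0 = \mathcal{N}(0,1)\) and \(p_1 = \mathcal{N}(\mu, s^2)\), with \(x_0\) and \(x_1\) drawn independently as in the CFM loss. The marginal velocity \(\mathbb{E}[x_1 - x_0 \mid x_t = x]\), which is the field an ideal CFM-trained model would learn, then has a closed form, and its exact flow maps \(x_0\) to \(\mu + s x_0\) along the curved path \(x_t = t\mu + \sigma_t x_0\) with \(\sigma_t^2 = (1-t)^2 + t^2 s^2\). The names `marginal_velocity` and `integrate` are illustrative.

```python
import numpy as np

# Hypothetical toy setup: p0 = N(0, 1), p1 = N(MU, S^2) in 1D, x0 and x1 independent.
MU, S = 2.0, 0.5

def marginal_velocity(x, t):
    """Closed-form E[x1 - x0 | x_t = x] for the Gaussian toy problem."""
    var_t = (1.0 - t) ** 2 + (t * S) ** 2     # Var[x_t]
    cov_t = t * S**2 - (1.0 - t)              # Cov[x1 - x0, x_t]
    return MU + cov_t / var_t * (x - t * MU)  # Gaussian conditional mean

def integrate(x0, n_steps, method):
    """Integrate dx/dt = marginal_velocity(x, t) from t=0 to t=1."""
    x, dt = x0.copy(), 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        v1 = marginal_velocity(x, t)
        if method == "euler":
            x = x + dt * v1
        elif method == "heun":
            v2 = marginal_velocity(x + dt * v1, t + dt)  # slope at the Euler predictor
            x = x + dt * (v1 + v2) / 2
        else:
            raise ValueError(f"unknown method: {method}")
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(10_000)
exact = MU + S * x0                           # where the exact flow sends each x0
for method, steps in [("euler", 10), ("euler", 20), ("heun", 10)]:
    err = np.max(np.abs(integrate(x0, steps, method) - exact))
    print(f"{method:5s} {steps:2d} steps: max error {err:.4f}")
```

Even though every conditional path is straight, these marginal trajectories bend (\(\sigma_t\) is not linear in \(t\)), which is where Euler's first-order error comes from. In this toy run, Heun with 10 steps is more than an order of magnitude more accurate than Euler with 10 steps; since Heun uses two model calls per step, the fair comparison is Euler with 20 steps, whose error only roughly halves.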

1. Lipman, Y. et al. (2022). Flow Matching for Generative Modeling.
2. Black Forest Labs. (2024). FLUX.1: State-of-the-art text-to-image generation.
3. Liu, X. et al. (2022). Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow.