DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics

*Equal contribution.
1 Tsinghua University, 2 Shengshu Technology, 3 Pazhou Lab (Huangpu)
NeurIPS 2023 (Poster)

Abstract

Diffusion probabilistic models (DPMs) have exhibited excellent performance for high-fidelity image generation while suffering from inefficient sampling. Recent works accelerate the sampling procedure by proposing fast ODE solvers that leverage the specific ODE form of DPMs. However, they rely heavily on a specific parameterization during inference (such as noise/data prediction), which might not be the optimal choice. In this work, we propose a novel formulation towards the optimal parameterization during sampling that minimizes the first-order discretization error of the ODE solution. Based on this formulation, we propose DPM-Solver-v3, a new fast ODE solver for DPMs that introduces several coefficients efficiently computed on the pretrained model, which we call empirical model statistics. We further incorporate multistep methods and a predictor-corrector framework, and propose some techniques for improving sample quality at small numbers of function evaluations (NFE) or large guidance scales. Experiments show that DPM-Solver-v3 achieves consistently better or comparable performance in both unconditional and conditional sampling with both pixel-space and latent-space DPMs, especially at 5 to 10 NFE. We achieve FIDs of 12.21 (5 NFE) and 2.51 (10 NFE) on unconditional CIFAR10, and an MSE of 0.55 (5 NFE, guidance scale 7.5) on Stable Diffusion, bringing a speed-up of 15% to 30% compared to previous state-of-the-art training-free methods.


For pretrained diffusion models, our sampler can generate images with richer and less biased color, higher saturation level and more visual details in few steps.

Stable-Diffusion, 5 steps (s = 7.5, on popular prompts): DPM-Solver++ (MSE 0.60), UniPC (MSE 0.65), DPM-Solver-v3 (MSE 0.55)

Prompts:

pixar movie still portrait photo of madison beer, jessica alba : : woman : : as hero catgirl cyborg woman by pixar : : by greg rutkowski, wlop, rossdraws, artgerm, weta, marvel, rave girl, leeloo, unreal engine, glossy skin, pearlescent, wet, bright morning, anime, sci - fi, maxim magazine cover, : :

oil painting with heavy impasto of a pirate ship and its captain, cosmic horror painting, elegant intricate artstation concept art by craig mullins detailed

environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light

the living room of a cozy wooden house with a fireplace, at night, interior design, d & d concept art, d & d wallpaper, warm, digital art. art by james gurney and larry elmore.

Full page concept design how to craft life Poison, intricate details,infographic of alchemical, diagram of how to make potions, captions, directions, ingredients, drawing , magic,wuxia

Fantasy art, octane render, 16k, 8k, cinema 4d, back-lit, caustics, clean environment, Wood pavilion architecture, warm led lighting, dusk, Landscape, snow, arctic, with aqua water, silver Guggenheim museum spire, with rays of sunshine, white fabric landscape, tall building, zaha hadid and Santiago calatrava, smooth landscape, cracked ice, igloo, warm lighting, aurora borialis,3d cgi, high definition, natural lighting, realistic, hyper realism

tree house in the forest, atmospheric, hyper realistic, epic composition, cinematic, landscape vista photography by Carr Clifton & Galen Rowell, 16K resolution, Landscape veduta photo by Dustin Lefevre & tdraw, detailed landscape painting by Ivan Shishkin, DeviantArt, Flickr, rendered in Enscape, Miyazaki, Nausicaa Ghibli, Breath of The Wild, 4k detailed post processing, artstation, unreal engine

A trail through the unknown, atmospheric, hyper realistic, 8k, epic composition, cinematic, octane render, artstation landscape vista photography by Carr Clifton & Galen Rowell, 16K resolution, Landscape veduta photo by Dustin Lefevre & tdraw, 8k resolution, detailed landscape painting by Ivan Shishkin, DeviantArt, Flickr, rendered in Enscape, Miyazaki, Nausicaa Ghibli, Breath of The Wild, 4k detailed post processing, artstation, rendering by octane, unreal engine

postapocalyptic city turned to fractal glass, ctane render, 8 k, exploration, cinematic, trending on artstation, by beeple, realistic, 3 5 mm camera, unreal engine, hyper detailed, photo – realistic maximum detai, volumetric light, moody cinematic epic concept art, realistic matte painting, hyper photorealistic, concept art, volumetric light, cinematic epic, octane render, 8 k, corona render, movie concept art, octane render, 8 k, corona render, cinematic, trending on artstation, movie concept art, cinematic composition, ultra – detailed, realistic, hyper – realistic, volumetric lighting, 8 k

“WORLDS”: zoological fantasy ecosystem infographics, magazine layout with typography, annotations, in the style of Elena Masci, Studio Ghibli, Caspar David Friedrich, Daniel Merriam, Doug Chiang, Ivan Aivazovsky, Herbert Bauer, Edward Tufte, David McCandless

Method

Our main theoretical contributions are two-fold:

Our new parameterization and ODE solution.

(1) We are the first to systematically study parameterizations (e.g., noise/data prediction) in the sampling of diffusion models. We propose a novel parameterization that approaches the optimal one within a wide family of parameterizations. It is inspired by Rosenbrock-type exponential integrators and is chosen to minimize the first-order discretization error. The framework unifies the previous DPM-Solver and DPM-Solver++, and explains why DPM-Solver++ outperforms DPM-Solver.
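
To make the notion of a sampling parameterization concrete, the following is a minimal sketch of the two existing choices named above, not of DPM-Solver-v3 itself. It uses the standard first-order exponential-integrator updates of DPM-Solver (noise prediction) and DPM-Solver++ (data prediction) on a toy 1-D problem. The cosine schedule, the Gaussian data distribution with standard deviation S0, and all function names are illustrative assumptions. For Gaussian data the ideal noise predictor and the exact ODE solution are available in closed form, so the discretization error can be measured directly.

```python
import math

S0 = 2.0  # std of the toy 1-D Gaussian data distribution (illustrative assumption)

def alpha_sigma(t):
    """Toy variance-preserving schedule (assumption): alpha^2 + sigma^2 = 1."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)

def half_log_snr(t):
    """lambda_t = log(alpha_t / sigma_t)."""
    a, s = alpha_sigma(t)
    return math.log(a / s)

def eps_pred(x, t):
    """Ideal noise prediction E[eps | x_t] when x_0 ~ N(0, S0^2)."""
    a, s = alpha_sigma(t)
    return s * x / (a * a * S0 * S0 + s * s)

def data_pred(x, t):
    """Data prediction obtained from the noise prediction."""
    a, s = alpha_sigma(t)
    return (x - s * eps_pred(x, t)) / a

def step_noise(x, s, t):
    """First-order update in the noise-prediction parameterization."""
    a_s, _ = alpha_sigma(s)
    a_t, sig_t = alpha_sigma(t)
    h = half_log_snr(t) - half_log_snr(s)
    return a_t / a_s * x - sig_t * math.expm1(h) * eps_pred(x, s)

def step_data(x, s, t):
    """First-order update in the data-prediction parameterization."""
    _, sig_s = alpha_sigma(s)
    a_t, sig_t = alpha_sigma(t)
    h = half_log_snr(t) - half_log_snr(s)
    return sig_t / sig_s * x - a_t * math.expm1(-h) * data_pred(x, s)

def exact(x, s, t):
    """Closed-form ODE solution for Gaussian data: x scales with the marginal std."""
    a_s, sig_s = alpha_sigma(s)
    a_t, sig_t = alpha_sigma(t)
    var_s = a_s * a_s * S0 * S0 + sig_s * sig_s
    var_t = a_t * a_t * S0 * S0 + sig_t * sig_t
    return x * math.sqrt(var_t / var_s)

def sample(step, x, t_start, t_end, n_steps):
    """Apply a one-step update repeatedly on a uniform time grid."""
    ts = [t_start + (t_end - t_start) * i / n_steps for i in range(n_steps + 1)]
    for s, t in zip(ts[:-1], ts[1:]):
        x = step(x, s, t)
    return x

if __name__ == "__main__":
    ref = exact(1.0, 0.9, 0.1)
    for n in (5, 10, 100):
        err_noise = abs(sample(step_noise, 1.0, 0.9, 0.1, n) - ref)
        err_data = abs(sample(step_data, 1.0, 0.9, 0.1, n) - ref)
        print(n, err_noise, err_data)
```

At first order the two updates are algebraically identical (both reduce to DDIM), so the two error columns agree up to rounding. The parameterizations only begin to behave differently once higher-order terms are approximated, which is the regime in which the choice of parameterization matters.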

Multistep predictor-corrector solver.

(2) Based on the proposed parameterization and the corresponding exact solution of diffusion ODEs, we develop a high-order solver with a multistep predictor-corrector algorithm, with guarantees on both the local order of accuracy and the global order of convergence.
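
As a generic illustration of the multistep predictor-corrector idea, and not the DPM-Solver-v3 algorithm, the sketch below integrates a plain scalar ODE with a classical two-step Adams-Bashforth predictor and a trapezoidal corrector. The function name pc_solve and the test problem y' = -y are assumptions chosen for illustration. The derivative evaluated at the predicted point is reused as history for the next step, so each step costs one new function evaluation.

```python
import math

def pc_solve(f, t0, y0, t1, n_steps):
    """Integrate y' = f(t, y) from t0 to t1 with a multistep predictor-corrector.

    Predictor: 2-step Adams-Bashforth (Euler on the very first step).
    Corrector: trapezoidal rule, using the derivative at the predicted point.
    """
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    f_prev = None      # derivative stored from the previous step (multistep history)
    f_curr = f(t, y)   # single evaluation needed to start
    for _ in range(n_steps):
        # Predict the next value from stored derivatives only.
        if f_prev is None:
            y_pred = y + h * f_curr
        else:
            y_pred = y + h * (1.5 * f_curr - 0.5 * f_prev)
        # The only new function evaluation in this step.
        f_pred = f(t + h, y_pred)
        # Correct the prediction with the trapezoidal rule.
        y = y + 0.5 * h * (f_curr + f_pred)
        t += h
        # Reuse f_pred as history instead of re-evaluating at the corrected y.
        f_prev, f_curr = f_curr, f_pred
    return y

if __name__ == "__main__":
    decay = lambda t, y: -y
    for n in (10, 20, 40):
        print(n, abs(pc_solve(decay, 0.0, 1.0, 1.0, n) - math.exp(-1.0)))
```

Halving the step size should shrink the printed error by roughly a factor of four, consistent with a second-order scheme. Diffusion ODE solvers replace these linear multistep formulas with exponential-integrator counterparts, but the predict, evaluate, correct structure is the same.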

Comparison with Other Methods

Comparison with other methods.

Quantitative Results

DPM-Solver-v3 achieves SOTA performance among fast training-free samplers under various settings.

Settings covered (pixel-space and latent-space DPMs; unconditional and conditional generation):

CIFAR10 (ScoreSDE, pixel DPM)
CIFAR10 (EDM, pixel DPM)
LSUN-Bedroom (Latent-Diffusion, latent DPM)
ImageNet-256 (Guided-Diffusion, pixel DPM; classifier guidance, s = 2.0)
MS-COCO2014 (Stable-Diffusion, latent DPM; classifier-free guidance, s = 1.5)
MS-COCO2014 (Stable-Diffusion, latent DPM; classifier-free guidance, s = 7.5)

More Generation Results

ScoreSDE on CIFAR10, 5 and 10 steps

5 steps: DPM-Solver++ (FID 28.53), UniPC (FID 23.71), DPM-Solver-v3 (FID 12.76)
10 steps: DPM-Solver++ (FID 4.01), UniPC (FID 3.93), DPM-Solver-v3 (FID 3.40)

EDM on CIFAR10, 5 and 10 steps

5 steps: DPM-Solver++ (FID 24.54), UniPC (FID 23.52), DPM-Solver-v3 (FID 12.21)
10 steps: DPM-Solver++ (FID 2.91), UniPC (FID 2.85), DPM-Solver-v3 (FID 2.51)

Latent-Diffusion on LSUN-Bedroom, 5 steps

DPM-Solver++ (FID 18.59), UniPC (FID 12.24), DPM-Solver-v3 (FID 7.54)

Guided-Diffusion on ImageNet-256, 7 steps (s = 2.0)

DPM-Solver++ (FID 11.02), UniPC (FID 10.19), DPM-Solver-v3 (FID 9.70)

BibTeX

@inproceedings{zheng2023dpm,
  title={DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics},
  author={Zheng, Kaiwen and Lu, Cheng and Chen, Jianfei and Zhu, Jun},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}