Implementing Denoising Diffusion Models
Sep 22, 2024
Implementation of Diffusion Models: DDPM
Overview
Focus on implementing Denoising Diffusion Probabilistic Model (DDPM)
Future videos will cover Stable Diffusion with text prompts
Training and sampling implementation for DDPM
Aim to implement architecture used in latest diffusion models
Diffusion Process
Forward Process
Create noisier versions of an image by adding Gaussian noise step-by-step
After many steps, the result is indistinguishable from a sample of a standard normal distribution
A transition function is applied at every time step t
Beta (β_t) is the scheduled noise variance used to go from the image at step t−1 to the image at step t
Alpha defined as:
( \alpha_t = 1 - \beta_t )
Cumulative products of alphas ( \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s ) allow jumping directly from the original image to the noisy image at any time step
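The jump from the original image straight to any noise level can be sketched with the closed-form forward equation x_t = √ᾱ_t·x_0 + √(1−ᾱ_t)·ε. A minimal NumPy sketch, assuming the linear schedule mentioned later in these notes (1e-4 to 0.02 over 1000 steps):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # cumulative products of alphas

def forward_sample(x0, t, noise):
    """Jump directly from x0 to the noisy image x_t at time step t."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

# By the last step, alpha_bar is tiny, so x_T is almost pure Gaussian noise.
x0 = np.zeros((28, 28))
xT = forward_sample(x0, T - 1, np.random.randn(28, 28))
```

Note that `alpha_bars[-1]` is on the order of 1e-5, which is why x_T looks like a plain normal sample.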
Reverse Process
Model learns reverse process distribution
Same functional form as the forward process
The reverse distribution has a mean and a variance the model could predict
Goal: minimize the KL divergence between the ground-truth denoising distribution and the model's prediction
Fixing the variance to match the target distribution leaves only the mean to learn
The objective then reduces to minimizing the squared difference between the predicted and the original noise
Training Method
Sample an image, a random time step t, and a noise sample
Feed the noisy version of the image (and the time step) to the model
Loss becomes Mean Squared Error (MSE) between original noise and model prediction
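The simplified objective described above amounts to a plain MSE between the noise that was actually added and the model's prediction of it; a one-liner sketch:

```python
import numpy as np

def ddpm_loss(noise, noise_pred):
    """Simplified DDPM objective: mean squared error between the
    sampled noise and the model's prediction of that noise."""
    return np.mean((noise - noise_pred) ** 2)
```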
Implementation Steps
Create noise scheduler to handle forward and reverse processes
Utilize a linear noise schedule from 1e-4 to 0.02 over 1000 time steps
Noise Scheduler Functions
Forward Process: returns the noisy image given an image, a noise sample, and time step t
Reverse Process: given x_t and the model's noise prediction, returns a sample of x_{t-1}
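One reverse step can be sketched with the standard DDPM update: subtract the scaled noise prediction from x_t, rescale, and add fixed-variance Gaussian noise (none at the final step). A NumPy sketch under the same assumed linear schedule:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def sample_prev(xt, noise_pred, t, rng=np.random):
    """One reverse step: map x_t to a sample of x_{t-1} given the
    model's noise prediction at time step t."""
    mean = (xt - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * noise_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean                    # no noise is added at the last step
    sigma = np.sqrt(betas[t])          # fixed variance, as in the DDPM paper
    return mean + sigma * rng.randn(*xt.shape)
```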
Model Architecture
Use U-Net architecture
Input and output shapes must match; include time step information
Time Embedding Block: converts time steps into a tensor representation through embedding and linear layers
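The notes mention an embedding followed by linear layers; a common choice for the embedding part is the sinusoidal encoding. A sketch of just that encoding (the linear layers would be applied on top of it), assuming an even embedding dimension:

```python
import numpy as np

def time_embedding(t, dim):
    """Sinusoidal embedding of integer time steps into a (len(t), dim)
    tensor: first half sines, second half cosines over log-spaced frequencies."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = t[:, None] * freqs[None, :]
    return np.concatenate([np.sin(args), np.cos(args)], axis=-1)
```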
U-Net Structure
Encoder: downsampling blocks reduce spatial size and increase channels
Mid Block: operates at a constant spatial resolution (no resizing)
Decoder: upsampling blocks increase spatial size and reduce channels
Skip connections between corresponding encoding and decoding layers
Down Block Implementation
Residual connection, self-attention, downsampling
Each residual block consists of normalization, activation, and convolutional layers
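The down block described above can be sketched in PyTorch. This is a minimal version under assumed layer choices (GroupNorm with SiLU, one attention head count of 4, strided-conv downsampling); the actual block in the video's code may be organized differently:

```python
import torch
import torch.nn as nn

class DownBlock(nn.Module):
    """Sketch: residual block (norm, activation, conv) with a time-embedding
    projection, then self-attention, then downsampling."""
    def __init__(self, in_ch, out_ch, t_dim):
        super().__init__()
        self.resnet1 = nn.Sequential(
            nn.GroupNorm(8, in_ch), nn.SiLU(),
            nn.Conv2d(in_ch, out_ch, 3, padding=1))
        self.t_proj = nn.Sequential(nn.SiLU(), nn.Linear(t_dim, out_ch))
        self.resnet2 = nn.Sequential(
            nn.GroupNorm(8, out_ch), nn.SiLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1))
        self.attn_norm = nn.GroupNorm(8, out_ch)
        self.attn = nn.MultiheadAttention(out_ch, num_heads=4, batch_first=True)
        self.skip = nn.Conv2d(in_ch, out_ch, 1)      # residual connection
        self.down = nn.Conv2d(out_ch, out_ch, 4, stride=2, padding=1)

    def forward(self, x, t_emb):
        h = self.resnet1(x)
        h = h + self.t_proj(t_emb)[:, :, None, None]  # inject time information
        h = self.resnet2(h) + self.skip(x)            # residual connection
        b, c, hh, ww = h.shape                        # self-attention over pixels
        a = self.attn_norm(h).reshape(b, c, hh * ww).transpose(1, 2)
        a, _ = self.attn(a, a, a)
        h = h + a.transpose(1, 2).reshape(b, c, hh, ww)
        return self.down(h)                           # halve spatial size
```

The up block would mirror this with a transposed convolution (or interpolation) in place of `self.down`, and the mid block would stack the residual/attention parts without any resizing.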
Mid Block Implementation
Similar structure to the down block, but with additional self-attention layers and no downsampling
Up Block Implementation
Same structure as the down block, but with an upsampling layer in place of downsampling; the corresponding encoder output arrives via the skip connection
Coding the U-Net
Initialize parameters and create down, mid, and up blocks based on the image channels
Time embedding processed at input to get necessary representation
Training and Sampling
Dataset class handles loading and converting images to tensors
Training loop samples random noise, applies noise scheduler, and backpropagates loss
Sampling method creates random noise sample and iteratively calls reverse process
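The training and sampling loops described above can be sketched end to end. Here `model` is a toy stand-in for the U-Net (a single conv that ignores the time step) and the data batch is random, purely to show the flow of the two loops:

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

model = nn.Conv2d(1, 1, 3, padding=1)    # placeholder for the U-Net
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Training loop: sample random time steps and noise, then regress the noise.
for step in range(10):
    x0 = torch.randn(8, 1, 28, 28)       # stand-in for a batch of images
    t = torch.randint(0, T, (8,))
    noise = torch.randn_like(x0)
    ab = alpha_bars[t].view(-1, 1, 1, 1)
    xt = ab.sqrt() * x0 + (1 - ab).sqrt() * noise
    loss = nn.functional.mse_loss(model(xt), noise)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sampling: start from pure noise and iteratively call the reverse process.
with torch.no_grad():
    x = torch.randn(1, 1, 28, 28)
    for ts in reversed(range(T)):
        eps = model(x)                   # a real U-Net would also take ts
        mean = (x - betas[ts] / (1 - alpha_bars[ts]).sqrt() * eps) / alphas[ts].sqrt()
        x = mean if ts == 0 else mean + betas[ts].sqrt() * torch.randn_like(x)
```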
Configuration File
Contains dataset parameters, model parameters, and training parameters
Allows flexibility in model block configurations
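A hypothetical layout for such a config, grouped into the three parameter sections mentioned above; the actual keys and values in the video's repository may differ:

```python
# Hypothetical configuration; key names are illustrative, not the repo's actual schema.
config = {
    "dataset": {"im_path": "data/train/images", "im_size": 28, "im_channels": 1},
    "model": {
        "down_channels": [32, 64, 128],   # block configuration is adjustable
        "time_emb_dim": 128,
        "num_down_layers": 1,
        "num_mid_layers": 1,
        "num_up_layers": 1,
    },
    "training": {
        "batch_size": 64,
        "epochs": 40,
        "lr": 1e-4,
        "num_timesteps": 1000,
        "beta_start": 1e-4,
        "beta_end": 0.02,
    },
}
```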
Results
Trained on MNIST and a texture dataset (28x28 resized images)
MNIST shows faster results because its images share similar characteristics
Texture dataset takes longer to converge but produces decent images by the end
Conclusion
Steps covered: Scheduler, U-Net implementation, training, and sampling code.
Encouraged to check previous videos for more detailed information on diffusion models.
If you found this helpful, consider subscribing for more content!