A Brief History of Diffusion in Image-Generating AI

Introduction

2022 saw significant advancements in text-to-image AI.
Systems such as Stable Diffusion and OpenAI's DALL-E 2 have been integrated into creative and branding platforms.
Diffusion technology is expanding beyond art generation into fields like music, DNA synthesis, and drug discovery.

Earlier applications such as deepfakes relied on Generative Adversarial Networks (GANs), which suffer from training instability and require large amounts of training data.
GANs consist of two competing networks: a generator, which produces synthetic samples, and a discriminator, which tries to tell those samples apart from real data.
When training succeeds, GANs have been used to generate 3D models, video clips, speech, and music samples.
These training difficulties helped motivate the shift toward diffusion models.
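To make the generator/discriminator tug-of-war concrete, here is a minimal sketch of the two losses in the standard binary cross-entropy formulation of GANs, using the common "non-saturating" generator loss. The discriminator outputs below are made-up probabilities standing in for a real network, and the function names are illustrative, not from any particular library.

```python
import math


def bce(prob, label, eps=1e-12):
    """Binary cross-entropy for one prediction; prob is clamped away from 0 and 1."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake):
    """The discriminator wants real samples scored near 1 and generated samples near 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    """The generator wants its samples scored near 1 (non-saturating form)."""
    return bce(d_fake, 1.0)


# d_real / d_fake: the discriminator's probability that a real / generated sample is real.
for d_real, d_fake in [(0.5, 0.5), (0.9, 0.1), (0.6, 0.7)]:
    print(
        f"D(real)={d_real}, D(fake)={d_fake}: "
        f"D loss={discriminator_loss(d_real, d_fake):.3f}, "
        f"G loss={generator_loss(d_fake):.3f}"
    )
```

Whenever the discriminator gets better at spotting fakes (lower D loss), the generator's loss rises, and vice versa. Training two networks against opposing objectives like this can oscillate or collapse instead of settling, which is one source of the instability mentioned above.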

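Diffusion models are inspired by physics, specifically the process by which substances spread from regions of higher concentration to regions of lower concentration.
Diffusion systems progressively add noise to data over many steps, increasing its randomness until only noise remains.
Unlike natural diffusion, machine learning diffusion systems learn to reverse this process, removing noise step by step to reconstruct data; running the learned reversal starting from pure noise is how they generate new images.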
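The forward (noising) half of this process can be sketched in a few lines. This is a minimal sketch in the style of DDPM (Ho et al., 2020), not the method of any specific product: the 1,000-step linear noise schedule with variances from 0.0001 to 0.02 is an assumption borrowed from that paper, and the four-value "image" is a toy. In a real system a neural network is trained to predict the added noise; here the true noise stands in for that prediction to show why predicting it is enough to recover the data.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variance added at each step, rising linearly (assumed DDPM-style values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] = prod over s <= t of (1 - beta_s): share of signal variance left at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Forward process in closed form: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, eps ~ N(0, 1)."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def estimate_x0(xt, eps_pred, t, alpha_bars):
    """Invert the formula above given a noise estimate (a trained network would supply eps_pred)."""
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_pred)]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [1.0, -0.5, 0.25, 0.8]  # toy "image": four pixel values

# The signal fraction shrinks toward zero, so late steps are almost pure noise.
for t in (0, 250, 500, 999):
    xt, _ = add_noise(x0, t, alpha_bars, rng)
    print(f"t={t:4d}  signal variance left={alpha_bars[t]:.4f}  x_t={[round(v, 2) for v in xt]}")

# With a perfect noise prediction (here: the true eps), the clean data is recovered.
xt, eps = add_noise(x0, 500, alpha_bars, rng)
print([round(v, 6) for v in estimate_x0(xt, eps, 500, alpha_bars)])
```

The closed-form jump to any step t is what makes training practical: a model can be shown a randomly chosen noise level at each training step instead of replaying the whole chain.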

OpenAI's CLIP (Contrastive Language-Image Pre-Training) improved diffusion systems by scoring how well the images produced during generation match a text prompt.
CLIP helps guide the image generation process in systems such as DALL-E 2 and Stable Diffusion.
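CLIP maps images and text into a shared embedding space, and its score for an image/prompt pair is the cosine similarity of their embeddings. The sketch below shows only that scoring step. It assumes made-up three-dimensional vectors in place of real CLIP encoder outputs (which have hundreds of dimensions), and the candidate names are hypothetical; it does not show how a diffusion system feeds the score back into generation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(text_embedding, image_embeddings):
    """Score each candidate image against the prompt and sort best-first."""
    scores = {
        name: cosine_similarity(text_embedding, emb)
        for name, emb in image_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical stand-ins for CLIP's text and image encoder outputs.
text_embedding = [0.9, 0.1, 0.3]
candidates = {
    "candidate_a": [0.8, 0.2, 0.4],
    "candidate_b": [0.1, 0.9, 0.2],
    "candidate_c": [-0.5, 0.3, 0.1],
}

for name, score in rank_candidates(text_embedding, candidates):
    print(f"{name}: {score:.3f}")
```

Cosine similarity ignores vector length and compares only direction, which is why embeddings from two different encoders (text and image) can be compared on a common scale.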

Art Generation: Diffusion models can produce many kinds of artwork, though they have sometimes been shown to replicate images from their training data, which is controversial.
Music Composition: Projects like Harmonai and Riffusion use diffusion models for music generation.
Biomedicine: Research teams are using diffusion models to design proteins and regulatory DNA sequences.
- Generate Biomedicines and the University of Washington have created diffusion models for designing new proteins.
- OpenBioML is developing DNA-Diffusion, which aims to generate regulatory DNA sequences that drive gene expression in specific cell types.

Diffusion also shows potential for generating video, compressing images, and synthesizing speech.
Although diffusion may eventually be superseded, its versatility makes it the leading architecture for generative AI today.

Diffusion models represent a significant leap over previous technologies like GANs.
The development and application of diffusion models across many fields suggest their broad potential.