Understanding AI Generative Models

Jul 10, 2024

Introduction

  • AI image generators can create images from text descriptions.
  • Generative AI models can produce text, audio, code, and videos.
  • All of these are based on deep neural networks.

Neural Networks & Prediction Tasks

  • Neural nets are trained to predict labels based on input data.
  • Example: Predicting objects in images from labeled training data.
  • Prediction is essentially curve fitting: learning a mapping from inputs to outputs (see the sketch below).
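
Below is a minimal sketch of the curve-fitting view of prediction, using plain numpy and a toy 1-D dataset instead of a deep network; the sine target and polynomial fit are illustrative assumptions, not part of the original material.

```python
import numpy as np

# Toy "curve fitting" view of prediction: fit a function that maps
# inputs x to labels y, then use it to predict unseen inputs.
# (A real vision model would be a deep neural net on images.)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = np.sin(3 * x) + 0.1 * rng.normal(size=x.shape)  # noisy training labels

coeffs = np.polyfit(x, y, deg=5)   # "training": fit the curve to the data
predict = np.poly1d(coeffs)        # the learned input -> output mapping

print(predict(0.5), np.sin(3 * 0.5))  # prediction vs. the true value
```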

Generative Models

  • Generative models also function as predictors.
  • Producing novel art can be seen as curve fitting.

Training a Generative Model

  • Idea: use the images themselves as the labels a predictor must output, even without a clear input-output mapping.
  • Training a predictor directly on whole images produces blurred outputs, because the model averages over the many valid images for a given input (see the sketch below).
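
A tiny sketch of why direct regression blurs: when several sharp images are equally valid targets for the same input, the prediction that minimizes mean-squared error is their average. The two four-pixel "images" here are made-up stand-ins.

```python
import numpy as np

# Two equally valid "images" (flattened to 4 pixels) for the same input.
image_a = np.array([1.0, 0.0, 1.0, 0.0])
image_b = np.array([0.0, 1.0, 0.0, 1.0])

def mse_loss(pred):
    # Total squared error against both valid targets.
    return np.mean((pred - image_a) ** 2) + np.mean((pred - image_b) ** 2)

average = (image_a + image_b) / 2   # the MSE-optimal prediction: a grey blur
print(average, mse_loss(average), mse_loss(image_a))
```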

Completing Images

  • Train a model to predict the value of missing parts of an image.
  • Start with a single missing pixel, then expand to many missing pixels.
  • Predicting one pixel at a time avoids blurring.
  • Sample each pixel's value from the predicted probability distribution, rather than averaging, so that outputs are sharp and not identical every time (see the sketch below).
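
The sketch below assumes the model outputs a categorical distribution over 256 possible values for one missing pixel (a made-up bimodal example). Sampling from that distribution gives a sharp, varied value, while taking the mean would give a washed-out grey.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend the network predicted a distribution over the 256 possible
# values of one missing pixel: dark and bright are both plausible.
logits = np.zeros(256)
logits[30] = 5.0     # a dark value is likely
logits[220] = 5.0    # a bright value is equally likely
probs = np.exp(logits - logits.max())
probs /= probs.sum()

sampled_value = rng.choice(256, p=probs)       # sharp: either ~30 or ~220
mean_value = np.sum(np.arange(256) * probs)    # blurry: a grey ~125
print(sampled_value, mean_value)
```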

Auto-Regressive Models

  • Generate images by removing and predicting one pixel at a time.
  • The oldest family of generative models; conceptually, ChatGPT works the same way, generating one token at a time.
  • Inefficient for large images, since generation requires one network evaluation per pixel (see the sketch below).
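
A toy auto-regressive loop, with a hypothetical `predict_distribution` standing in for the trained network; it makes the cost concrete: one network evaluation per generated pixel.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_distribution(pixels_so_far):
    """Stand-in for a trained network: a distribution over the next
    pixel value, given all pixels generated so far."""
    logits = rng.normal(size=256)            # placeholder prediction
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

height, width = 8, 8
pixels = []
for _ in range(height * width):              # one evaluation per pixel
    probs = predict_distribution(pixels)
    pixels.append(rng.choice(256, p=probs))  # sample, don't average

image = np.array(pixels).reshape(height, width)
print(image.shape)   # an 8x8 image after 64 network calls
```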

Improving Efficiency

  • Removing and generating larger patches of pixels speeds things up, but speed must be balanced against quality.
  • Ideally, the pixels removed together should be statistically independent, so they can be predicted in parallel without loss.
  • Diffusion models are more efficient: they add noise, spreading the removal of information across the whole image.

Diffusion Models

  • Instead of removing pixels outright, add noise to the entire image.
  • Because information is removed gradually from every pixel, images can be generated in far fewer steps.
  • Generation starts from a pure-noise image, so different noise samples lead to different outputs rather than everything collapsing to the same average values (see the sketch below).
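
A schematic of the two directions, assuming a simple linear noise schedule and a hypothetical `denoise_step` in place of the trained network: the forward process mixes the image with Gaussian noise, and generation runs the denoiser backwards from pure noise in a handful of steps.

```python
import numpy as np

rng = np.random.default_rng(0)
num_steps = 10
clean = rng.uniform(0, 1, size=(8, 8))    # stand-in for a clean image

def add_noise(image, t):
    """Forward process: mix the image with Gaussian noise (linear schedule)."""
    alpha = 1.0 - t / num_steps
    return alpha * image + (1 - alpha) * rng.normal(size=image.shape)

def denoise_step(noisy, t):
    """Hypothetical stand-in for the trained denoising network."""
    return noisy * 0.9                    # placeholder only

noisy_example = add_noise(clean, t=5)     # a partly destroyed image

# Generation: start from pure noise and denoise step by step,
# instead of producing the image one pixel at a time.
x = rng.normal(size=(8, 8))
for t in reversed(range(num_steps)):
    x = denoise_step(x, t)
print(noisy_example.shape, x.shape)
```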

Implementation Details

  • For efficiency, a single neural network handles every denoising step, trained on randomly chosen steps.
  • Causal architectures allow faster, albeit slightly less accurate, training.
  • The network predicts the added noise rather than the clean image directly, which is more robust (see the training sketch below).
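
A sketch of the training loop under those choices, with a hypothetical `model` callable, a simple linear noise schedule, and the noise-prediction objective; these specifics are assumptions beyond the ideas listed above.

```python
import numpy as np

rng = np.random.default_rng(0)
num_steps = 1000

def model(noisy_image, t):
    """Hypothetical single denoising network, shared across all steps."""
    return np.zeros_like(noisy_image)     # placeholder prediction

def training_step(clean_image):
    t = rng.integers(1, num_steps)        # a random step for this example
    alpha = 1.0 - t / num_steps           # simple linear schedule
    noise = rng.normal(size=clean_image.shape)
    noisy = alpha * clean_image + (1 - alpha) * noise

    predicted_noise = model(noisy, t)
    # Train the network to predict the noise, not the clean image.
    return np.mean((predicted_noise - noise) ** 2)

print(training_step(rng.uniform(0, 1, size=(8, 8))))
```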

Conditioning on Text

  • Feed the (noisy) image and a text prompt into the model together, so the generated image is conditioned on the text.
  • Trained on large numbers of image-text pairs sourced from internet data (see the sketch below).
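
A minimal sketch of conditioning: the denoiser simply receives an embedding of the prompt as an extra input. `embed_text` and `model` are hypothetical placeholders, not a real API.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_text(prompt):
    """Hypothetical text encoder: maps a prompt to a fixed-size vector."""
    seed = sum(ord(c) for c in prompt)    # toy deterministic "encoding"
    return np.random.default_rng(seed).normal(size=64)

def model(noisy_image, t, text_embedding):
    """Hypothetical denoiser that also sees the prompt embedding."""
    return np.zeros_like(noisy_image)     # placeholder prediction

noisy = rng.normal(size=(8, 8))
prediction = model(noisy, t=500, text_embedding=embed_text("a cat on a skateboard"))
print(prediction.shape)
```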

Classifier-Free Guidance

  • A technique for making generations follow the conditioning input more closely.
  • The model is trained both with and without the conditioning input.
  • At generation time the two predictions are combined, pushing outputs to follow the prompt more faithfully (see below).
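
The combination rule itself is simple; the sketch below assumes a noise-predicting model evaluated once with and once without the prompt, and a typical guidance scale (scale = 1 recovers the plain conditional prediction, larger values follow the prompt more strictly).

```python
import numpy as np

def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale=7.5):
    """Blend conditional and unconditional noise predictions.
    Larger guidance_scale pushes the sample toward the prompt."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy noise predictions from the same model, with and without the prompt.
eps_cond = np.array([0.2, -0.1])
eps_uncond = np.array([0.1, -0.3])
print(classifier_free_guidance(eps_cond, eps_uncond))
```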

Conclusion

  • Generative AI is fundamentally about curve fitting.
  • The audience was invited to suggest topics for future sessions.

Thanks for attending!