Understanding AI Generative Models

Jul 10, 2024

Introduction

  • AI image generators can create images from text descriptions.
  • Generative AI models can produce text, audio, code, and videos.
  • All of these are based on deep neural networks.

Neural Networks & Prediction Tasks

  • Neural nets are trained to predict labels based on input data.
  • Example: Predicting objects in images from labeled training data.
  • Prediction is essentially curve fitting: learning a mapping from inputs to outputs (see the sketch below).
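
Below is a minimal sketch of the curve-fitting view of prediction, using plain numpy and a toy 1-D dataset instead of a deep network; the sine target and polynomial fit are illustrative assumptions, not part of the original material.

```python
import numpy as np

# Toy "curve fitting" view of prediction: fit a function that maps
# inputs x to labels y, then use it to predict unseen inputs.
# (A real vision model would be a deep neural net on images.)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = np.sin(3 * x) + 0.1 * rng.normal(size=x.shape)  # noisy training labels

coeffs = np.polyfit(x, y, deg=5)   # "training": fit the curve to the data
predict = np.poly1d(coeffs)        # the learned input -> output mapping

print(predict(0.5), np.sin(3 * 0.5))  # prediction vs. the true value
```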

Generative Models

  • Generative models also function as predictors.
  • Producing novel art can be seen as curve fitting.

Training a Generative Model

  • Idea: use the images themselves as the labels a predictor must output, even without a clear input-output mapping.
  • Training a predictor directly on whole images produces blurred outputs, because the model averages over the many valid images for a given input (see the sketch below).
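
A tiny sketch of why direct regression blurs: when several sharp images are equally valid targets for the same input, the prediction that minimizes mean-squared error is their average. The two four-pixel "images" here are made-up stand-ins.

```python
import numpy as np

# Two equally valid "images" (flattened to 4 pixels) for the same input.
image_a = np.array([1.0, 0.0, 1.0, 0.0])
image_b = np.array([0.0, 1.0, 0.0, 1.0])

def mse_loss(pred):
    # Total squared error against both valid targets.
    return np.mean((pred - image_a) ** 2) + np.mean((pred - image_b) ** 2)

average = (image_a + image_b) / 2   # the MSE-optimal prediction: a grey blur
print(average, mse_loss(average), mse_loss(image_a))
```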

Completing Images

  • Train a model to predict the value of missing parts of an image.
  • Start with a single missing pixel, then expand to many missing pixels.
  • Predicting one pixel at a time avoids blurring.
  • Sample each pixel's value from the predicted probability distribution, rather than averaging, so that outputs are sharp and not identical every time (see the sketch below).
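
The sketch below assumes the model outputs a categorical distribution over 256 possible values for one missing pixel (a made-up bimodal example). Sampling from that distribution gives a sharp, varied value, while taking the mean would give a washed-out grey.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend the network predicted a distribution over the 256 possible
# values of one missing pixel: dark and bright are both plausible.
logits = np.zeros(256)
logits[30] = 5.0     # a dark value is likely
logits[220] = 5.0    # a bright value is equally likely
probs = np.exp(logits - logits.max())
probs /= probs.sum()

sampled_value = rng.choice(256, p=probs)       # sharp: either ~30 or ~220
mean_value = np.sum(np.arange(256) * probs)    # blurry: a grey ~125
print(sampled_value, mean_value)
```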

Auto-Regressive Models

  • Generate images by removing and predicting one pixel at a time.
  • The oldest family of generative models; conceptually, ChatGPT works the same way, generating one token at a time.
  • Inefficient for large images, since generation requires one network evaluation per pixel (see the sketch below).
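
A toy auto-regressive loop, with a hypothetical `predict_distribution` standing in for the trained network; it makes the cost concrete: one network evaluation per generated pixel.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_distribution(pixels_so_far):
    """Stand-in for a trained network: a distribution over the next
    pixel value, given all pixels generated so far."""
    logits = rng.normal(size=256)            # placeholder prediction
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

height, width = 8, 8
pixels = []
for _ in range(height * width):              # one evaluation per pixel
    probs = predict_distribution(pixels)
    pixels.append(rng.choice(256, p=probs))  # sample, don't average

image = np.array(pixels).reshape(height, width)
print(image.shape)   # an 8x8 image after 64 network calls
```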

Improving Efficiency

  • Removing and generating larger patches of pixels speeds things up, but speed must be balanced against quality.
  • Ideally, the pixels removed together should be statistically independent, so they can be predicted in parallel without loss.
  • Diffusion models are more efficient: they add noise, spreading the removal of information across the whole image.

Diffusion Models

  • Instead of removing pixels outright, add noise to the entire image.
  • Because information is removed gradually from every pixel, images can be generated in far fewer steps.
  • Generation starts from a pure-noise image, so different noise samples lead to different outputs rather than everything collapsing to the same average values (see the sketch below).
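
A schematic of the two directions, assuming a simple linear noise schedule and a hypothetical `denoise_step` in place of the trained network: the forward process mixes the image with Gaussian noise, and generation runs the denoiser backwards from pure noise in a handful of steps.

```python
import numpy as np

rng = np.random.default_rng(0)
num_steps = 10
clean = rng.uniform(0, 1, size=(8, 8))    # stand-in for a clean image

def add_noise(image, t):
    """Forward process: mix the image with Gaussian noise (linear schedule)."""
    alpha = 1.0 - t / num_steps
    return alpha * image + (1 - alpha) * rng.normal(size=image.shape)

def denoise_step(noisy, t):
    """Hypothetical stand-in for the trained denoising network."""
    return noisy * 0.9                    # placeholder only

noisy_example = add_noise(clean, t=5)     # a partly destroyed image

# Generation: start from pure noise and denoise step by step,
# instead of producing the image one pixel at a time.
x = rng.normal(size=(8, 8))
for t in reversed(range(num_steps)):
    x = denoise_step(x, t)
print(noisy_example.shape, x.shape)
```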

Implementation Details

  • For efficiency, a single neural network handles every denoising step, trained on randomly chosen steps.
  • Causal architectures allow faster, albeit slightly less accurate, training.
  • The network predicts the added noise rather than the clean image directly, which is more robust (see the training sketch below).
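
A sketch of the training loop under those choices, with a hypothetical `model` callable, a simple linear noise schedule, and the noise-prediction objective; these specifics are assumptions beyond the ideas listed above.

```python
import numpy as np

rng = np.random.default_rng(0)
num_steps = 1000

def model(noisy_image, t):
    """Hypothetical single denoising network, shared across all steps."""
    return np.zeros_like(noisy_image)     # placeholder prediction

def training_step(clean_image):
    t = rng.integers(1, num_steps)        # a random step for this example
    alpha = 1.0 - t / num_steps           # simple linear schedule
    noise = rng.normal(size=clean_image.shape)
    noisy = alpha * clean_image + (1 - alpha) * noise

    predicted_noise = model(noisy, t)
    # Train the network to predict the noise, not the clean image.
    return np.mean((predicted_noise - noise) ** 2)

print(training_step(rng.uniform(0, 1, size=(8, 8))))
```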

Conditioning on Text

  • Feed the (noisy) image and a text prompt into the model together, so the generated image is conditioned on the text.
  • Trained on large numbers of image-text pairs sourced from internet data (see the sketch below).
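
A minimal sketch of conditioning: the denoiser simply receives an embedding of the prompt as an extra input. `embed_text` and `model` are hypothetical placeholders, not a real API.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_text(prompt):
    """Hypothetical text encoder: maps a prompt to a fixed-size vector."""
    seed = sum(ord(c) for c in prompt)    # toy deterministic "encoding"
    return np.random.default_rng(seed).normal(size=64)

def model(noisy_image, t, text_embedding):
    """Hypothetical denoiser that also sees the prompt embedding."""
    return np.zeros_like(noisy_image)     # placeholder prediction

noisy = rng.normal(size=(8, 8))
prediction = model(noisy, t=500, text_embedding=embed_text("a cat on a skateboard"))
print(prediction.shape)
```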

Classifier-Free Guidance

  • A technique for making generations follow the conditioning input more closely.
  • The model is trained both with and without the conditioning input.
  • At generation time the two predictions are combined, pushing outputs to follow the prompt more faithfully (see below).
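
The combination rule itself is simple; the sketch below assumes a noise-predicting model evaluated once with and once without the prompt, and a typical guidance scale (scale = 1 recovers the plain conditional prediction, larger values follow the prompt more strictly).

```python
import numpy as np

def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale=7.5):
    """Blend conditional and unconditional noise predictions.
    Larger guidance_scale pushes the sample toward the prompt."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy noise predictions from the same model, with and without the prompt.
eps_cond = np.array([0.2, -0.1])
eps_uncond = np.array([0.1, -0.3])
print(classifier_free_guidance(eps_cond, eps_uncond))
```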

Conclusion

  • Generative AI is fundamentally about curve fitting.
  • The audience was invited to suggest topics for future sessions.

Thanks for attending!