Understanding AI Generative Models
Jul 10, 2024
Introduction
AI image generators can create images from text descriptions.
Generative AI models can produce text, audio, code, and videos.
All of these are based on deep neural networks.
Neural Networks & Prediction Tasks
Neural nets are trained to predict labels based on input data.
Example: Predicting objects in images from labeled training data.
Prediction involves curve fitting: mapping inputs to outputs.
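As a rough illustration of prediction-as-curve-fitting, here is a minimal sketch in Python (PyTorch assumed); the toy sine data and tiny network are illustrative, not the setup from the video:

    import torch
    import torch.nn as nn

    x = torch.linspace(-3, 3, 200).unsqueeze(1)        # inputs
    y = torch.sin(x) + 0.1 * torch.randn_like(x)       # noisy labels (toy training data)

    net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)

    for step in range(500):
        pred = net(x)                                   # predicted outputs
        loss = ((pred - y) ** 2).mean()                 # how far the fitted curve is from the data
        opt.zero_grad()
        loss.backward()
        opt.step()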
Generative Models
Generative models also function as predictors.
Producing novel art can be seen as curve fitting.
Training a Generative Model
A naive approach: train a predictor using the training images themselves as the labels, with no meaningful input to map from.
Such direct labeling makes the network average over all training images, producing blurred outputs (see the sketch below).
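A tiny sketch of why this averaging produces blur (Python, NumPy assumed); the random "target images" are purely illustrative:

    import numpy as np

    targets = np.random.rand(1000, 8, 8)         # many different "correct" images (toy data)
    best_constant_output = targets.mean(axis=0)  # the single output that minimizes mean squared error
    # The MSE-optimal prediction is the pixelwise average: smooth, and unlike any sharp individual target.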
Completing Images
Train a model to predict the value of missing parts of an image.
Start with a single missing pixel, then expand to multiple missing pixels.
Models predict one pixel at a time to avoid blurring.
Use random sampling from probability distributions to avoid identical outputs.
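A minimal sketch (Python, PyTorch assumed) of predicting a distribution over one missing pixel and sampling from it; the toy 8x8 image and the untrained network are illustrative only:

    import torch
    import torch.nn as nn

    num_levels = 256                        # possible 8-bit pixel values
    visible = torch.rand(1, 63)             # the 63 known pixels of a toy 8x8 image (one pixel missing)
    net = nn.Sequential(nn.Linear(63, 128), nn.ReLU(), nn.Linear(128, num_levels))

    logits = net(visible)                                    # a score for each possible value
    probs = torch.softmax(logits, dim=-1)                    # probability distribution over values
    missing_pixel = torch.multinomial(probs, num_samples=1)  # random sample -> varied, non-averaged outputs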
Auto-Regressive Models
Generate images by removing and predicting one pixel at a time.
The oldest type of generative model; conceptually, ChatGPT is also an auto-regressive model (one token at a time).
Inefficient for large images: one network evaluation is needed per pixel.
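A rough sketch (Python, PyTorch assumed) of the auto-regressive sampling loop; the network is an untrained stand-in, and the point is the one-evaluation-per-pixel structure:

    import torch
    import torch.nn as nn

    num_pixels, num_levels = 64, 256                       # toy 8x8 image, 8-bit values
    net = nn.Sequential(nn.Linear(num_pixels, 128), nn.ReLU(), nn.Linear(128, num_levels))

    image = torch.zeros(1, num_pixels)                     # start with an empty canvas
    for i in range(num_pixels):                            # 64 sequential network evaluations
        logits = net(image)                                # predict a distribution for pixel i
        value = torch.multinomial(torch.softmax(logits, dim=-1), 1).item()
        image[0, i] = value / (num_levels - 1)             # write the sampled pixel back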
Improving Efficiency
Removing and regenerating larger patches of pixels speeds things up, but hurts quality, so there is a trade-off.
Ideally, the pixels removed in one step should be statistically independent of each other, so they can be predicted together without losing quality.
Diffusion models achieve this more efficiently: instead of deleting whole pixels, they add noise, spreading the information removal evenly across the image.
Diffusion Models
Core idea: add noise to the whole image instead of removing individual pixels.
Generate images in fewer steps by spreading out information removal.
Keep partially noisy images as the starting point for each step, so the model does not collapse toward a single averaged value.
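A simplified sketch (Python, PyTorch assumed) of diffusion-style sampling: start from pure noise, then alternately predict the clean image and re-noise to a lower level. The linear noise schedule and the tiny denoiser are assumptions for illustration, not the exact scheme from the video:

    import torch
    import torch.nn as nn

    T = 10                                                  # far fewer steps than pixels
    denoiser = nn.Sequential(nn.Linear(64 + 1, 128), nn.ReLU(), nn.Linear(128, 64))

    x = torch.randn(1, 64)                                  # start: pure noise
    for step in reversed(range(1, T + 1)):
        t = torch.full((1, 1), step / T)                    # current noise level
        x0_hat = denoiser(torch.cat([x, t], dim=-1))        # predicted clean image
        t_prev = (step - 1) / T
        noise = torch.randn_like(x)
        x = (1 - t_prev) * x0_hat + t_prev * noise          # re-noise down to the next (lower) level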
Implementation Details
For efficiency, use a single neural net for all steps, trained on randomly sampled noise levels.
Causal architectures allow faster, albeit slightly less accurate, training.
Rather than predicting the clean image directly, have the net predict the added noise (the clean image is then recovered indirectly); this makes training more robust (see the training sketch below).
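A rough sketch (Python, PyTorch assumed) of one training step for a single shared network: pick a random noise level, corrupt a clean image, and train the net to predict the added noise. The shapes and the linear schedule are illustrative assumptions:

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(64 + 1, 128), nn.ReLU(), nn.Linear(128, 64))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    clean = torch.rand(32, 64)                              # a batch of toy training images
    t = torch.rand(32, 1)                                   # random noise level per image
    noise = torch.randn_like(clean)
    noisy = (1 - t) * clean + t * noise                     # same simplified schedule as above

    pred_noise = net(torch.cat([noisy, t], dim=-1))         # predict the noise, not the image
    loss = ((pred_noise - noise) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()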
Conditioning on Text
Feed the text prompt into the model alongside the (noisy) image, so generation is conditioned on the text.
Trained on image-text pairs, sourced from internet data.
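A minimal sketch (Python, PyTorch assumed) of conditioning: a text embedding is simply fed into the denoiser alongside the noisy image and the noise level. The embedding here is a random placeholder; a real system would use a learned text encoder:

    import torch
    import torch.nn as nn

    emb_dim = 16
    denoiser = nn.Sequential(nn.Linear(64 + 1 + emb_dim, 128), nn.ReLU(), nn.Linear(128, 64))

    noisy = torch.randn(1, 64)
    t = torch.full((1, 1), 0.5)
    text_embedding = torch.randn(1, emb_dim)                # placeholder for an encoded prompt
    pred_noise = denoiser(torch.cat([noisy, t, text_embedding], dim=-1))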
Classifier-Free Guidance
A technique for making generations follow the conditioning input (e.g., the text prompt) more closely.
Model trained with and without conditioning input.
At sampling time, the two predictions are combined to improve detail fidelity and follow the prompt more faithfully (see the sketch below).
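A small sketch (Python, PyTorch assumed) of the standard classifier-free guidance combination at sampling time; the guidance weight value is illustrative:

    import torch

    def guided_noise(eps_uncond: torch.Tensor, eps_cond: torch.Tensor, w: float = 5.0) -> torch.Tensor:
        # w = 1 reproduces plain conditional sampling; larger w follows the prompt more closely,
        # typically at some cost in diversity.
        return eps_uncond + w * (eps_cond - eps_uncond)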
Conclusion
Generative AI is fundamentally about curve fitting.
The audience is invited to suggest topics for future videos.
Thanks for attending!