Generative Adversarial Networks - Part Two

Jul 21, 2024

Overview

  • Focus on deeper understanding and theoretical proof of GANs
  • Goal: Understand the key equation, the algorithm used to optimize it, and the proof that the optimal generator recovers the real data distribution
  • Sign-up mentioned for early access and exclusive content on the presenter’s blog

Data Flow in GANs

  • Components: Noise vector (Z), Generator (G), Real Data (X), Discriminator (D)
  • Generator (G): Transforms noise (Z) into fake samples (G(Z))
  • Discriminator (D): Takes either real (X) or fake (G(Z)) samples and outputs a probability of the sample being real
    • D(X): Probability that real sample (X) is real
    • D(G(Z)): Probability that fake sample (G(Z)) is real
  • Labels: 1 for real samples, 0 for fake samples
  • Supervised Learning Transformation: The unsupervised learning problem is converted into a supervised one by labeling the real and generated samples (a minimal sketch of this data flow follows below)
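
A minimal sketch of this data flow, assuming a toy fully connected generator and discriminator in PyTorch; the layer sizes, the 100-dimensional noise vector, and the 784-dimensional samples are illustrative assumptions, not values from the talk:

    import torch
    import torch.nn as nn

    latent_dim = 100   # size of the noise vector Z (assumed)
    data_dim = 784     # size of a flattened real sample X (assumed)

    # Generator G: transforms noise Z into a fake sample G(Z)
    G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                      nn.Linear(256, data_dim), nn.Tanh())

    # Discriminator D: maps a sample (real or fake) to the probability it is real
    D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
                      nn.Linear(256, 1), nn.Sigmoid())

    z = torch.randn(16, latent_dim)    # a batch of noise vectors Z
    fake = G(z)                        # fake samples G(Z)
    real = torch.rand(16, data_dim)    # placeholder batch standing in for real data X

    d_real = D(real)   # D(X): probability each real sample is real (target label 1)
    d_fake = D(fake)   # D(G(Z)): probability each fake sample is real (target label 0)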

Cost Function

  • Two main terms:
    1. Discriminator on Real Data: Expectation of the log of D(X)
    2. Discriminator on Fake Data: Expectation of log(1 - D(G(Z)))
  • Discriminator Goals:
    • Wants D(X) to be large for real samples
    • Wants D(G(Z)) to be small for fake samples
  • Generator Goals:
    • Maximize D(G(Z)) for fooling the discriminator
  • Adversarial Framework (the full minimax objective is written out below this list):
    • Discriminator seeks to maximize the cost function
    • Generator seeks to minimize the cost function
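
Written out in full, this is the value function from the original GAN paper: the two expectation terms above, maximized over D and minimized over G:

    \min_G \max_D V(D, G) =
        \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
        + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]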

Training Algorithm

  1. Discriminator Loop:
    • Pull M noise samples → generate M fake data samples
    • Sample M real data samples
    • Label real samples (1) and fake samples (0)
    • Calculate the loss on the labeled outputs
    • Update the discriminator's parameters to maximize the cost function (take gradients and update)
  2. Generator Loop:
    • Pull M noise samples → generate M fake samples
    • Calculate the reduced cost function using only the fake-sample term (no real data needed)
    • Update the generator's parameters to minimize the cost function (take gradients and update); a minimal training-loop sketch follows this list
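
A minimal, self-contained sketch of the two alternating loops in PyTorch; the network sizes, batch size, learning rates, step count, and the random placeholder standing in for the real dataset are all illustrative assumptions:

    import torch
    import torch.nn as nn

    latent_dim, data_dim, m = 100, 784, 64        # illustrative sizes (assumed)
    G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                      nn.Linear(256, data_dim), nn.Tanh())
    D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
                      nn.Linear(256, 1), nn.Sigmoid())
    opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
    opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
    real_data = torch.rand(10_000, data_dim)      # placeholder standing in for the real dataset

    for step in range(1000):                       # number of steps is an arbitrary choice
        # --- Discriminator loop: maximize E[log D(x)] + E[log(1 - D(G(z)))] ---
        z = torch.randn(m, latent_dim)             # pull M noise samples
        fake = G(z).detach()                       # M fake samples; detach so G is not updated here
        real = real_data[torch.randint(0, len(real_data), (m,))]   # M real samples
        # labels 1 (real) and 0 (fake) determine which term each batch enters;
        # minimizing the negated value function is a gradient-ascent step on it
        d_loss = -(torch.log(D(real) + 1e-8).mean() +
                   torch.log(1 - D(fake) + 1e-8).mean())
        opt_D.zero_grad(); d_loss.backward(); opt_D.step()

        # --- Generator loop: only the fake-sample term is needed (no real data) ---
        z = torch.randn(m, latent_dim)
        g_loss = torch.log(1 - D(G(z)) + 1e-8).mean()   # generator minimizes this term
        opt_G.zero_grad(); g_loss.backward(); opt_G.step()

In practice the generator is often trained to maximize log D(G(z)) instead (the non-saturating variant from the original paper), but the version above follows the cost function as described here.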

Theoretical Proof

  • Objective: Prove that optimal generator matches real data distribution
  • Optimal Discriminator:
    • Expressed in terms of the real and generated data distributions; it outputs 1/2 everywhere when the two distributions match (see the expression below)
    • With the optimal discriminator plugged in, the cost function reaches its minimum value of minus log 4 exactly when the distributions match
    • Derived with basic calculus and algebra by maximizing the integrand pointwise
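
For a fixed generator whose samples have distribution p_g, maximizing the value function pointwise (a term of the form a log y + b log(1 - y) is maximized at y = a / (a + b)) gives the optimal discriminator from the original paper:

    D^{*}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)},
    \qquad D^{*}(x) = \tfrac{1}{2} \ \text{for all } x \ \text{when } p_g = p_{\mathrm{data}}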

Jensen-Shannon Divergence

  • Objective: Minimize the JS Divergence
  • JS Divergence as a distance measure between real and fake data distributions
  • Cost Function Rewritten (see the expression below):
    • Equals a constant (minus log 4) plus twice the JS divergence between the real and generated distributions
    • The minimum JS divergence is 0, reached exactly when the distributions match
    • Minimizing the cost therefore forces the generator's distribution to match the real data distribution
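
Substituting the optimal discriminator back into the value function gives the generator's criterion: a constant plus a scaled Jensen-Shannon divergence:

    C(G) = \max_D V(G, D) = -\log 4 + 2 \cdot \mathrm{JSD}\big(p_{\mathrm{data}} \,\|\, p_g\big)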

Recap

  • GANs Architecture and Training Process
  • Using random noise vectors to generate fake samples
  • Discriminator training framed as a binary (real vs. fake) classification problem
  • Alternating training loops for discriminator and generator
  • Theoretical proof validating the approach

Additional Resources

  • Mention of a three-part blog series on GANs
  • Links to the original paper and the blog sign-up are provided in the description