What are the four steps in the neural network training process?
1. Sample a small batch from the dataset. 2. Forward propagate to get the loss. 3. Backpropagate to compute gradients. 4. Perform parameter update.
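A minimal NumPy sketch of these four steps on a toy linear regression problem; the data, model, and hyperparameters are all illustrative, not taken from the notes:

```python
import numpy as np

X = np.random.randn(1000, 5)
y = X @ np.array([1., -2., 0.5, 3., -1.]) + 0.1 * np.random.randn(1000)
w, learning_rate, batch_size = np.zeros(5), 1e-2, 64

for step in range(200):
    idx = np.random.choice(len(X), batch_size)    # 1. sample a mini-batch
    xb, yb = X[idx], y[idx]
    pred = xb @ w
    loss = np.mean((pred - yb) ** 2)              # 2. forward pass -> loss
    grad = 2 * xb.T @ (pred - yb) / batch_size    # 3. backprop -> gradient
    w -= learning_rate * grad                     # 4. parameter update
```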
What inspired the development of Convolutional Neural Networks (CNNs)?
CNNs were inspired by Hubel and Wiesel's studies of the visual cortex, which identified a hierarchy of simple and complex cells.
What is a typical method to implement learning rate decay?
Common methods include step decay and exponential decay, which involve gradually reducing the learning rate over time.
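A sketch of the two schedules as functions of the epoch; the base rate, drop factor, step size, and decay constant are illustrative hyperparameters:

```python
import math

def step_decay(base_lr, epoch, step_size=10, drop=0.5):
    # Multiply the learning rate by `drop` every `step_size` epochs.
    return base_lr * (drop ** (epoch // step_size))

def exponential_decay(base_lr, epoch, k=0.05):
    # Smooth decay: lr = lr0 * exp(-k * epoch).
    return base_lr * math.exp(-k * epoch)
```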
What is the significance of Xavier initialization?
Xavier initialization provides a balanced start by scaling the initial weights according to the number of inputs to each layer (e.g. dividing by the square root of the fan-in), keeping activation variances roughly constant across layers so they neither explode nor vanish.
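A sketch of one common form of Xavier initialization for a fully connected layer; the layer sizes are illustrative:

```python
import numpy as np

fan_in, fan_out = 512, 256
W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)  # weight variance ~ 1/fan_in
b = np.zeros(fan_out)
```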
What is the role of momentum in parameter updates?
Momentum helps accelerate learning in regions with small gradients and reduces oscillations in steeper regions.
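A minimal sketch of the momentum update; mu, learning_rate, and the stand-in gradient are illustrative:

```python
import numpy as np

w, v = np.zeros(10), np.zeros(10)
mu, learning_rate = 0.9, 1e-2
grad = np.random.randn(10)      # stand-in for a computed gradient

v = mu * v - learning_rate * grad   # velocity accumulates past gradients
w += v                              # parameter step follows the velocity
```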
Describe the primary drawback of AdaGrad.
Because AdaGrad accumulates squared gradients indefinitely, its effective learning rate decays toward zero, which can halt further learning prematurely.
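A minimal sketch of the AdaGrad update; since the cache only grows, the effective step size shrinks over time. Values are illustrative:

```python
import numpy as np

w, cache = np.zeros(10), np.zeros(10)
learning_rate = 1e-2
grad = np.random.randn(10)      # stand-in for a computed gradient

cache += grad ** 2                                   # monotonically growing cache
w -= learning_rate * grad / (np.sqrt(cache) + 1e-8)  # per-parameter scaled step
```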
Explain the advantage of Nesterov momentum over standard momentum.
Nesterov momentum evaluates the gradient at the look-ahead position reached by the momentum step, which typically gives faster convergence than standard momentum.
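A minimal sketch of Nesterov momentum in its common look-ahead reformulation; grad stands in for the gradient evaluated at the look-ahead point, and the hyperparameters are illustrative:

```python
import numpy as np

w, v = np.zeros(10), np.zeros(10)
mu, learning_rate = 0.9, 1e-2
grad = np.random.randn(10)      # gradient at the look-ahead position

v_prev = v.copy()
v = mu * v - learning_rate * grad
w += -mu * v_prev + (1 + mu) * v
```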
What is the purpose of dropout in neural networks?
Dropout prevents overfitting by randomly setting neurons to zero during training, encouraging feature redundancy.
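A sketch of standard dropout applied to a hidden activation; the keep probability and the stand-in activations are illustrative:

```python
import numpy as np

p = 0.5                               # keep probability
h = np.random.randn(4, 10)            # stand-in hidden activations
mask = np.random.rand(*h.shape) < p   # training: randomly zero units
h_train = h * mask
h_test = h * p                        # test: scale by p to match expected activation
```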
How can model ensembles improve performance?
Model ensembles improve performance by averaging predictions from multiple models, reducing generalization error.
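A sketch of prediction averaging; `models`, `predict_proba`, and `x_test` are placeholders for independently trained classifiers and held-out data, not defined in the notes:

```python
import numpy as np

# Average class probabilities across the ensemble, then take the argmax.
probs = np.mean([m.predict_proba(x_test) for m in models], axis=0)
predictions = np.argmax(probs, axis=1)
```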
How does RMSProp improve over AdaGrad?
RMSProp prevents the learning rates from decaying to zero by using a moving average of squared gradients.
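A minimal sketch of the RMSProp update; because the cache is a leaky moving average rather than a running sum, the effective learning rate does not vanish. Values are illustrative:

```python
import numpy as np

w, cache = np.zeros(10), np.zeros(10)
decay_rate, learning_rate = 0.99, 1e-2
grad = np.random.randn(10)      # stand-in for a computed gradient

cache = decay_rate * cache + (1 - decay_rate) * grad ** 2
w -= learning_rate * grad / (np.sqrt(cache) + 1e-8)
```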
Why are activation functions critical in neural networks?
Without nonlinear activation functions, a stack of layers collapses into a single linear transformation, so the network acts as a linear classifier and cannot fit complex data.
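A sketch showing why nonlinearities matter: two stacked linear layers collapse into one linear map, whereas inserting a ReLU between them does not. The shapes are illustrative:

```python
import numpy as np

x = np.random.randn(3)
W1, W2 = np.random.randn(4, 3), np.random.randn(2, 4)
linear_stack = W2 @ (W1 @ x)            # identical to (W2 @ W1) @ x, still linear
with_relu = W2 @ np.maximum(0, W1 @ x)  # no longer expressible as one matrix
```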
Why is Adam often considered the best default choice for optimization?
Adam combines both momentum and RMSProp, providing effective and adaptive learning rates and generally leading to better convergence.
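A minimal sketch of the Adam update with the bias-correction terms omitted for brevity; beta1, beta2, and learning_rate are illustrative hyperparameters:

```python
import numpy as np

w, m, v = np.zeros(10), np.zeros(10), np.zeros(10)
beta1, beta2, learning_rate = 0.9, 0.999, 1e-3
grad = np.random.randn(10)      # stand-in for a computed gradient

m = beta1 * m + (1 - beta1) * grad           # momentum-like first moment
v = beta2 * v + (1 - beta2) * grad ** 2      # RMSProp-like second moment
w -= learning_rate * m / (np.sqrt(v) + 1e-8)
```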
How is 'inverted dropout' different from standard dropout?
Inverted dropout scales the activations by 1/p during training rather than scaling at test time, so the test-time forward pass is unchanged and remains consistent with training.
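A sketch of inverted dropout: units are dropped and rescaled by 1/p at training time, so the test-time pass is the identity. The keep probability and activations are illustrative:

```python
import numpy as np

p = 0.5                                    # keep probability
h = np.random.randn(4, 10)                 # stand-in hidden activations
mask = (np.random.rand(*h.shape) < p) / p  # drop and rescale in one step
h_train = h * mask
h_test = h                                 # test time: no scaling needed
```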
How does batch normalization improve training?
Batch normalization normalizes each layer's inputs to zero mean and unit variance over the mini-batch, reducing sensitivity to weight initialization and making training more robust.
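A sketch of the batch-normalization forward pass at training time; gamma and beta are the learned scale and shift parameters, initialized here only for illustration:

```python
import numpy as np

x = np.random.randn(64, 100)            # a mini-batch of activations
gamma, beta, eps = np.ones(100), np.zeros(100), 1e-5

mu, var = x.mean(axis=0), x.var(axis=0)
x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance per feature
out = gamma * x_hat + beta              # learned rescale and shift
```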
List some applications of CNNs beyond traditional visual tasks.
CNNs are used in self-driving cars, facial recognition, medical imaging, and non-visual tasks such as speech and text processing.