Question 1
What is meant by 'use sum of squared residuals as the loss function'?
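For context, here is a minimal Python sketch of the sum of squared residuals; the data values are made up for illustration, not taken from the lesson:

```python
# Sum of squared residuals (SSR): for each point, take the difference
# between the observed value and the value the line predicts, square it,
# and add the squares up. Gradient Descent searches for the intercept
# and slope that make this total as small as possible.
observed = [1.4, 1.9, 3.2]    # made-up observed y values
predicted = [1.1, 2.0, 3.0]   # made-up values predicted by the line

ssr = sum((obs - pred) ** 2 for obs, pred in zip(observed, predicted))
print(ssr)  # 0.14
```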
Question 2
Which function does Gradient Descent optimize in linear regression?
Question 3
What does the derivative of the loss function with respect to the intercept help determine?
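For reference, for a straight line with intercept $b$ and slope $m$, the derivative of the sum of squared residuals with respect to the intercept follows from the chain rule; this is standard calculus, not specific to any one dataset:

```latex
\frac{\partial \,\mathrm{SSR}}{\partial b}
  = \frac{\partial}{\partial b} \sum_{i=1}^{n} \bigl(y_i - (b + m x_i)\bigr)^2
  = -2 \sum_{i=1}^{n} \bigl(y_i - (b + m x_i)\bigr)
```

The sign of this derivative indicates which direction to move the intercept, and its magnitude, scaled by the learning rate, sets the size of each step.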
Question 4
Which of the following is NOT a step in Gradient Descent as presented?
Question 5
For which types of loss functions can Gradient Descent be applied?
Question 6
What type of Gradient Descent uses subsets of data to reduce computation?
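As background, a minimal sketch of the mini-batch idea in Python; the batch size, learning rate, and data here are illustrative assumptions, not values from the lesson:

```python
import random

data = [(0.5, 1.4), (2.3, 1.9), (2.9, 3.2)]  # made-up (x, y) pairs

def minibatch_gradient_step(data, intercept, slope,
                            learning_rate=0.01, batch_size=2):
    """One Gradient Descent step using a random subset (mini-batch)
    of the data instead of every point, which reduces the computation
    per step on large datasets."""
    batch = random.sample(data, batch_size)
    # Gradients of the SSR with respect to intercept and slope,
    # computed on the batch only.
    d_intercept = sum(-2 * (y - (intercept + slope * x)) for x, y in batch)
    d_slope = sum(-2 * x * (y - (intercept + slope * x)) for x, y in batch)
    return (intercept - learning_rate * d_intercept,
            slope - learning_rate * d_slope)

intercept, slope = 0.0, 1.0  # arbitrary starting guesses
intercept, slope = minibatch_gradient_step(data, intercept, slope)
print(intercept, slope)
```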
Question 7
Why is it important to use different step sizes at different stages of Gradient Descent?
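To illustrate why steps shrink as the minimum gets closer, here is a toy Python example minimizing f(w) = (w - 3)^2; the learning rate and starting point are arbitrary choices:

```python
# Step size = derivative * learning rate, so steps are large when the
# current guess is far from the minimum (steep slope) and automatically
# shrink as the guess approaches it (slope near zero).
learning_rate = 0.1
w = 0.0  # arbitrary starting guess; the minimum of (w - 3)^2 is at w = 3

for i in range(5):
    derivative = 2 * (w - 3)
    step = learning_rate * derivative
    w -= step
    print(f"iteration {i}: step = {step:+.4f}, w = {w:.4f}")
```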
Question 8
What is the goal when fitting a line to the example dataset using Gradient Descent?
Question 9
In the example dataset, what are the variables on the X and Y axes?
Question 10
Which step size adjustment strategy is used in Gradient Descent?
Question 11
What criteria signal the stopping point in Gradient Descent?
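As a reference for the usual stopping rules, a minimal Python sketch; the step-size threshold and iteration cap are illustrative values:

```python
def gradient_descent_1d(derivative, w=0.0, learning_rate=0.1,
                        min_step=0.001, max_iterations=1000):
    """Stop when the step size becomes very small (meaning the guess
    is close to the minimum) or when a maximum number of iterations
    is reached."""
    for _ in range(max_iterations):
        step = learning_rate * derivative(w)
        if abs(step) < min_step:
            break
        w -= step
    return w

# Example: minimize (w - 3)^2, whose derivative is 2 * (w - 3).
print(gradient_descent_1d(lambda w: 2 * (w - 3)))  # approaches 3
```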
Question 12
How does Gradient Descent generalize to more than two parameters?
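For context, a sketch of the generalization: take the partial derivative of the loss with respect to each parameter (the gradient) and step all of the parameters downhill together. The gradient function below is a stand-in for illustration, not the lesson's example:

```python
def gradient_descent(gradient, params, learning_rate=0.01, steps=100):
    """With more than two parameters, Gradient Descent works the same
    way: compute the partial derivative of the loss for every parameter
    and update each one simultaneously."""
    for _ in range(steps):
        grads = gradient(params)
        params = [p - learning_rate * g for p, g in zip(params, grads)]
    return params

# Example with three parameters: minimize (a-1)^2 + (b+2)^2 + (c-3)^2.
grad = lambda p: [2 * (p[0] - 1), 2 * (p[1] + 2), 2 * (p[2] - 3)]
print(gradient_descent(grad, [0.0, 0.0, 0.0], learning_rate=0.1, steps=200))
```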
Question 13
In the initial setup for the example dataset, what value is used as the starting estimate for the slope?
Question 14
What key topic did Josh Starmer of StatQuest present?
Question 15
In the context of Gradient Descent, what does the loss function represent?