Question 1
In what year did Google introduce its original Transformer?
Question 2
When sampling next-word predictions, what does the temperature parameter affect?
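For reference while answering: temperature divides the logits before the softmax, so higher values flatten the distribution (more random picks) and lower values sharpen it. A minimal sketch with hypothetical next-word scores:

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by temperature before softmax: T > 1 flattens the
    # distribution (more randomness), T < 1 sharpens it (more greedy).
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()            # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.1]              # hypothetical next-word scores
print(softmax_with_temperature(logits, temperature=0.5))   # peaked
print(softmax_with_temperature(logits, temperature=2.0))   # flatter
```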
Question 3
What does the embedding matrix (W_e) contain?
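For reference: the embedding matrix W_e holds one learned vector per vocabulary token, and embedding a token amounts to looking up its row. A toy sketch with a hypothetical three-word vocabulary:

```python
import numpy as np

vocab = ["the", "cat", "sat"]                 # hypothetical toy vocabulary
d_model = 4
rng = np.random.default_rng(0)
W_e = rng.normal(size=(len(vocab), d_model))  # one row per vocabulary token

def embed(token):
    # Embedding a token is just a row lookup in W_e.
    return W_e[vocab.index(token)]

print(embed("cat").shape)                     # (4,)
```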
Question 4
Which of the following is not an application of transformers?
Question 5
What is the purpose of the softmax function in transformers?
Question 6
What operation is done in parallel in feed-forward layers?
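For reference: the same two-layer MLP is applied to every sequence position at once, so a single matrix multiply processes all positions in parallel. A minimal sketch with hypothetical dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 3, 4, 8
X  = rng.normal(size=(seq_len, d_model))   # one row per token position
W1 = rng.normal(size=(d_model, d_ff))
W2 = rng.normal(size=(d_ff, d_model))

# One matrix multiply applies the same MLP to all rows (positions)
# simultaneously; no position waits on another.
H = np.maximum(X @ W1, 0.0)                # ReLU nonlinearity
Y = H @ W2
print(Y.shape)                             # (3, 4)
```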
Question 7
What does the 'P' in GPT stand for?
Question 8
What allows transformers to handle various NLP tasks after initial training?
Question 9
Which model size is mentioned as a reference for the scalability of transformers?
Question 10
What is the main function of the Attention Mechanism in a transformer?
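For reference: scaled dot-product attention lets each position's output become a weighted average of value vectors, weighted by how well its query matches every key. A minimal sketch with random placeholder matrices:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Scores = query-key dot products, scaled by sqrt(d_k); softmax turns
    # each row of scores into weights that mix the value vectors.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(attention(Q, K, V).shape)    # (3, 4): one updated vector per position
```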
Question 11
Which mechanism updates word vectors with contextual meaning?
Question 12
What steps are repeated to generate text during text prediction?
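For reference: generation repeats a loop of predict, sample, append, and feed the longer context back in. A toy sketch that stands in for the model with hypothetical bigram probabilities:

```python
import numpy as np

vocab = ["the", "cat", "sat"]      # hypothetical toy vocabulary
P = np.array([[0.1, 0.6, 0.3],    # next-word probabilities after "the"
              [0.2, 0.1, 0.7],    # after "cat"
              [0.8, 0.1, 0.1]])   # after "sat"

def generate(start, steps, rng):
    # Repeat: predict a distribution over the next word, sample from it,
    # append the sample, and condition the next prediction on it.
    tokens = [start]
    for _ in range(steps):
        probs = P[vocab.index(tokens[-1])]
        tokens.append(str(rng.choice(vocab, p=probs)))
    return tokens

print(generate("the", 5, np.random.default_rng(0)))
```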
Question 13
What operation is critical for predicting the next word in transformers?
Question 14
What do dot products in transformers measure?
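For reference: a large positive dot product means two vectors point in similar directions, near zero means they are roughly unrelated, and negative means they point apart. A sketch with hypothetical embedding values:

```python
import numpy as np

def dot_similarity(a, b):
    # The dot product grows when vectors align and shrinks (or goes
    # negative) when they do not, so it acts as a similarity score.
    return float(np.dot(a, b))

king  = np.array([1.0, 0.9, 0.1])   # hypothetical word embeddings
queen = np.array([0.9, 1.0, 0.2])
apple = np.array([-0.8, 0.1, 1.0])

print(dot_similarity(king, queen))  # large and positive: similar
print(dot_similarity(king, apple))  # negative: dissimilar
```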
Question 15
How is a word's position in high-dimensional space used in word embeddings?