Question 1
What does the term 'multi-headed attention' refer to?
Question 2
Which layer in a transformer model updates embeddings based on surrounding context?
Question 3
What is the purpose of the softmax function in the attention mechanism?
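For reference when answering, a minimal sketch in plain Python (hypothetical score values) of what softmax does inside attention: it turns raw query-key dot products into weights that are all positive and sum to 1, so they can be used to mix value vectors.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability, then exponentiate and normalize.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw query-key dot products for one query against three keys:
scores = [2.0, 1.0, 0.1]
weights = softmax(scores)
# All weights are positive and sum to 1, forming a distribution
# over which tokens this query attends to.
```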
Question 4
What is the significance of embeddings in transformer models?
Question 5
What role does the attention mechanism play in understanding context?
Question 6
What kind of influence does the attention mechanism have on embeddings?
Question 7
What is the primary objective of a transformer model?
Question 8
Why is the parallelizability of the attention mechanism important?
Question 9
What differentiates self-attention from cross-attention?
Question 10
What method is used to prevent later tokens from influencing earlier tokens in self-attention?
Question 11
What was the main highlight of the 2017 paper 'Attention is All You Need'?
Question 12
What is the approximate parameter count for one attention head in a transformer?
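A worked arithmetic sketch for this question, assuming GPT-3-scale dimensions (embedding size 12,288 and key/query space 128; these specific numbers are assumptions, not given in the questions): one head holds four matrices of shape 12,288 x 128 (query, key, and the value map factored into a value-down and a value-up projection), which comes to roughly 6.3 million parameters.

```python
d_embed = 12_288   # assumed embedding dimension (GPT-3 scale)
d_head = 128       # assumed key/query (head) dimension

# Four d_embed x d_head matrices per head: the query and key matrices,
# plus the value map factored into value-down and value-up projections.
params_per_head = 4 * d_embed * d_head
print(params_per_head)  # prints 6291456, about 6.3 million
```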
Question 13
What happens during the masking process in the attention mechanism?
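A minimal sketch of the masking step (hypothetical score values): before softmax, every grid entry where an earlier position would attend to a later one is set to negative infinity, so that after softmax those weights become exactly zero and later tokens cannot influence earlier ones.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3x3 grid of raw attention scores (query index i, key index j).
scores = [[0.5, 1.2, -0.3],
          [0.1, 0.9,  2.0],
          [1.5, 0.2,  0.7]]

n = len(scores)
# Causal mask: a position may not attend to positions after it,
# so overwrite every entry above the diagonal (j > i) with -infinity.
for i in range(n):
    for j in range(i + 1, n):
        scores[i][j] = float("-inf")

weights = [softmax(row) for row in scores]
# After softmax the masked entries are exactly 0, so each row is still
# a valid distribution over only the current and earlier positions.
```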
Question 14
In transformers, which matrix is used to produce the value vectors?
Question 15
How is the query vector (q) formed during an attention head computation?
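A minimal sketch for this last question, assuming a query matrix conventionally called W_Q and toy dimensions (embedding size 4, query/key size 2; all values hypothetical): the query vector q is the matrix-vector product of W_Q with the token's embedding.

```python
def matvec(W, x):
    # Multiply matrix W (rows x cols) by vector x.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

# Hypothetical learned query matrix mapping a 4-dim embedding
# into a 2-dim query/key space.
W_Q = [[0.1, 0.0, -0.2, 0.3],
       [0.0, 0.5,  0.1, 0.0]]
embedding = [1.0, 2.0, 0.0, -1.0]  # one token's embedding

q = matvec(W_Q, embedding)  # the token's query vector, length 2
```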