What is the significance of training context size in transformers like GPT-3?
It fixes how many token vectors the model can process at once; for GPT-3 this is 2048 tokens.
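A minimal sketch of what a fixed context window means in practice (2048 is GPT-3's published figure; `clip_to_context` is a hypothetical helper, not part of any real API):

```python
CONTEXT_SIZE = 2048  # GPT-3's fixed context window, in tokens

def clip_to_context(token_ids: list[int], context_size: int = CONTEXT_SIZE) -> list[int]:
    """Keep only the most recent tokens that fit inside the context window."""
    return token_ids[-context_size:]
```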
How do embedding matrices (We) work?
The embedding matrix assigns each token in the vocabulary its own vector; these vectors start out random and are optimized during training.
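A sketch of the lookup, assuming toy sizes so it runs quickly (GPT-3's actual values are a 50,257-token vocabulary and 12,288-dimensional embeddings):

```python
import numpy as np

VOCAB_SIZE, EMBED_DIM = 1000, 64  # toy sizes; GPT-3 uses 50257 and 12288

# W_e starts as random noise; training nudges each row toward a useful meaning.
W_e = np.random.randn(VOCAB_SIZE, EMBED_DIM) * 0.02

def embed(token_ids: list[int]) -> np.ndarray:
    """One row of W_e per token: result has shape (len(token_ids), EMBED_DIM)."""
    return W_e[token_ids]
```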
What is the role of the 'Unembedding Matrix (WU)' in text prediction?
It maps the final context-rich vector to a logit for every token in the vocabulary, from which the next token is predicted.
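A one-line sketch of that mapping, assuming `W_u` has shape (vocab_size, embed_dim):

```python
import numpy as np

def next_token_logits(final_vector: np.ndarray, W_u: np.ndarray) -> np.ndarray:
    """Project the last context-rich vector back onto the vocabulary:
    the result is one logit per token."""
    return W_u @ final_vector
```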
What is the purpose of temperature in text generation with GPT?
Temperature controls randomness in sampling: higher values flatten the output distribution (more random choices), while lower values sharpen it (more predictable ones).
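A minimal sketch of temperature-scaled sampling (the function name is illustrative):

```python
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Divide logits by T before softmax: T > 1 flattens the distribution,
    T < 1 sharpens it toward the most likely token."""
    scaled = logits / temperature
    scaled = scaled - scaled.max()                 # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(np.random.choice(len(probs), p=probs))
```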
How do transformers use attention blocks?
Attention blocks allow vectors to update their values based on the context provided by other relevant tokens.
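A bare-bones sketch of that update as scaled dot-product attention; a real decoder block additionally masks future tokens, uses learned query/key/value projections, and runs many heads in parallel:

```python
import numpy as np

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Each output row is a weighted sum of value vectors, weighted by
    how well its query aligns with every key."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V
```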
What are tokens in the context of transformers?
Small pieces of input data, such as words or image patches.
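A toy illustration only; GPT models actually use subword tokenizers (byte-pair encoding), not whitespace splitting:

```python
def toy_tokenize(text: str) -> list[str]:
    """Whitespace split as a stand-in for a real subword tokenizer."""
    return text.split()

print(toy_tokenize("Transformers process small pieces of input"))
# ['Transformers', 'process', 'small', 'pieces', 'of', 'input']
```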
What are embeddings in the context of transformers?
Lists of numbers (vectors) associated with tokens, encoding their meanings.
What makes attention mechanisms the heart of transformers?
They let each vector absorb meaning from surrounding tokens, so the model can interpret every word in its context.
What computational technique is crucial for next-word predictions in transformers?
Dot products, which measure how well vectors align.
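A quick numeric illustration of that alignment measure:

```python
import numpy as np

u = np.array([1.0, 2.0, 0.5])
v = np.array([0.9, 2.1, 0.4])

print(np.dot(u, v))   # large positive: the vectors point the same way
print(np.dot(u, -v))  # negative: they point in opposite directions
```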
How are high-dimensional spaces used in embeddings?
Positions in these spaces represent word meanings, with directions encoding semantics.
What is the key function of the Softmax function in the transformer?
To transform logits into a probability distribution.
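A minimal softmax sketch, with the standard max-subtraction trick:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Exponentiate and normalize: outputs are positive and sum to 1."""
    exps = np.exp(logits - logits.max())  # subtracting the max avoids overflow
    return exps / exps.sum()
```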
In what year did Google introduce the original transformer for text translation?
2017
Define semantic directions in word embeddings with an example.
Semantic directions are relationships encoded as consistent offsets between embeddings, e.g., `king + woman - man ≈ queen`: the man→woman direction applied to king lands near queen.
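A sketch of that vector arithmetic, assuming `emb` is a hypothetical dictionary of trained word embeddings (e.g., from word2vec or GloVe):

```python
import numpy as np

def analogy(emb: dict[str, np.ndarray], a: str, b: str, c: str) -> np.ndarray:
    """Apply the a -> b offset to c: emb[c] + (emb[b] - emb[a])."""
    return emb[c] + emb[b] - emb[a]

# With trained embeddings, analogy(emb, "man", "woman", "king")
# should land closest to emb["queen"].
```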
Describe the text prediction process in GPT:
1. Tokenize the input text.
2. Embed each token as a vector.
3. Update the vectors through attention blocks.
4. Pass them through feed-forward (MLP) layers.
5. Convert the final vector into a probability distribution over the vocabulary.
6. Sample the next token and repeat to generate text.
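A combined sketch of the loop, reusing the hypothetical helpers from the earlier cards (`clip_to_context`, `embed`, `attention`, `next_token_logits`, `softmax`); it is illustrative only, with a single attention pass, no feed-forward stack, and untrained weights:

```python
import numpy as np

def generate(token_ids: list[int], W_u: np.ndarray, steps: int = 10) -> list[int]:
    """Append `steps` sampled tokens to `token_ids`, one per iteration."""
    for _ in range(steps):
        context = clip_to_context(token_ids)              # 1. tokenized input
        vectors = embed(context)                          # 2. embedding lookup
        vectors = attention(vectors, vectors, vectors)    # 3. self-attention
        # 4. feed-forward (MLP) layers would further refine each vector here
        probs = softmax(next_token_logits(vectors[-1], W_u))  # 5. distribution
        token_ids.append(int(np.random.choice(len(probs), p=probs)))  # 6. sample
    return token_ids
```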
What does the 'Generative' part in GPT stand for?
It indicates that the model is capable of generating new text.