Introduction to Natural Language Processing with Deep Learning

Jul 9, 2024

Stanford CS224N / LING 284: Natural Language Processing with Deep Learning

Instructor

  • Christopher Manning

Lecture Goals

  1. Course Introduction: Overview of course structure and primary learning objectives.
  2. Human Language and Word Meaning:
    • Discuss how deep learning represents word meanings.
    • Introduction to Word2Vec algorithm.
  3. Deep Learning for NLP:
    • Computing gradients of the Word2Vec objective function and using them for optimization.
  4. Practical Application:
    • Demonstration of word vector usage.

Key Topics

Course Objectives

  1. Foundations of Deep Learning in NLP:
    • Basics to advanced methods in NLP (e.g., recurrent networks, attention, transformers).
  2. Big Picture Understanding of Human Language:
    • Complexity of understanding and producing human languages.
  3. Building Systems with PyTorch:
    • Practical implementation for major NLP tasks (word meanings, dependency parsing, machine translation, question answering).

Human Language

  • Language as a Social System: Language is a socially constructed system that keeps changing as people adapt how they use it.
  • Complexity in Language: Language is not a purely formal system; meaning depends on social convention and interpretation.
  • Recent Development in Human History: Language, relatively recent in human evolution, has been a powerful tool for communication.

Machine Translation and NLP Progress

  • Machine Translation: Works moderately well now, allowing information retrieval across languages.
  • GPT-3: A large language model from OpenAI that points toward universal models trained on massive text corpora.

Word Meaning Representation

  • Traditional NLP: Used hand-built lexical resources like WordNet for synonyms and hypernyms (see the example below).
  • Limitations: Misses nuance and new word senses, and cannot keep pace as language evolves.
  • Word2Vec and Deep Learning: Encode word similarity as real-valued vectors.
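
As a contrast with learned vectors, here is a minimal sketch of the traditional WordNet route using NLTK. This is an assumption about tooling (the lecture's own demo may differ), and the WordNet corpus must already be downloaded via nltk.download('wordnet'):

```python
from nltk.corpus import wordnet as wn

# Synonym sets ("synsets") containing the word "good"
for synset in wn.synsets("good"):
    print(synset.name(), [lemma.name() for lemma in synset.lemmas()])

# Hypernyms ("is-a" relations) for one noun sense of "panda"
panda = wn.synset("panda.n.01")
print(panda.hypernyms())
```

Because resources like this are curated by hand, they list synonyms and hypernyms well but give no graded notion of similarity and lag behind newly coined senses.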

Distributional Semantics

  • Concept: Meaning derived from the words that frequently appear close to a target word.
  • Implementation: Use large text corpora to learn word vectors that predict the words appearing around each target word (a toy co-occurrence count is sketched below).
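
A toy illustration of the distributional idea; the sentence, target word, and window size are made up for the example:

```python
from collections import Counter

corpus = "the quick brown fox jumps over the lazy dog".split()
window = 2
target = "fox"

# Count words appearing within `window` positions of each occurrence of `target`
context_counts = Counter()
for i, word in enumerate(corpus):
    if word == target:
        left = max(0, i - window)
        context = corpus[left:i] + corpus[i + 1:i + 1 + window]
        context_counts.update(context)

print(context_counts)  # Counter({'quick': 1, 'brown': 1, 'jumps': 1, 'over': 1})
```

The words that co-occur with a target across a large corpus are what the learned vector for that target ends up summarizing.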

Word2Vec Algorithm

  • Objective: Predict context words from center words in a text corpus.
  • Steps:
    • Iterate through words in the text.
    • Calculate the probability of context words given the center word.
    • Adjust word vectors to maximize the probability of the actual context words (see the sketch below).
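
A minimal NumPy sketch of the probability computation in these steps; the toy vocabulary, dimensionality, and random initialization are illustrative assumptions, and no training loop is shown:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
dim = 4

# Each word gets a center vector (row of V) and a context vector (row of U)
V = rng.normal(scale=0.1, size=(len(vocab), dim))  # center vectors
U = rng.normal(scale=0.1, size=(len(vocab), dim))  # context vectors

def p_context_given_center(o: int, c: int) -> float:
    """P(o | c): softmax over dot products u_w . v_c, evaluated at word o."""
    scores = U @ V[c]                           # dot product with every context vector
    exp_scores = np.exp(scores - scores.max())  # numerically stable softmax
    return exp_scores[o] / exp_scores.sum()

# Probability of seeing "sat" (index 2) in the context of center word "cat" (index 1)
print(p_context_given_center(2, 1))
```

Training would repeat this for every (center, context) pair in the corpus and nudge U and V along the gradient of the log probability.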

Gradient Descent and Optimization

  • Objective Function: Maximize the log likelihood of the context words, equivalently minimize the average negative log likelihood.
  • Softmax Function: Converts dot products of word vectors into probabilities.
  • Gradient Calculation: Derivatives of the loss guide vector adjustments (see the formulas below).
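
For reference, the standard skip-gram objective, softmax, and center-vector gradient, with v_c a center vector, u_o a context vector, m the window size, and T the corpus length:

```latex
J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \; \sum_{\substack{-m \le j \le m \\ j \ne 0}} \log P(w_{t+j} \mid w_t; \theta)

P(o \mid c) = \frac{\exp(u_o^{\top} v_c)}{\sum_{w \in V} \exp(u_w^{\top} v_c)}

\frac{\partial}{\partial v_c} \log P(o \mid c) = u_o - \sum_{w \in V} P(w \mid c)\, u_w
```

The gradient has an "observed minus expected" form: it moves v_c toward the vector of the context word actually seen and away from the average context vector predicted by the current model.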

Practical Demonstration

  • Using Gensim Package: Loading and manipulating word vectors.
  • Word Similarity: Examples of most similar words to given terms (e.g., 'croissant', 'USA').
  • Analogies Task: Demonstrates vector arithmetic in word embeddings (e.g., 'king' - 'man' + 'woman' ≈ 'queen').
  • Effectiveness: Word vectors capture a surprising amount of contextual meaning (see the Gensim sketch below).
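
A minimal Gensim sketch of this demonstration; the pretrained model name ('glove-wiki-gigaword-100') is an assumption, as the lecture may have used a different vector set:

```python
import gensim.downloader as api

# Load a set of pretrained word vectors (model choice is illustrative)
wv = api.load("glove-wiki-gigaword-100")

# Nearest neighbours by cosine similarity (this vector set is lowercased)
print(wv.most_similar("croissant"))
print(wv.most_similar("usa"))

# Analogy via vector arithmetic: king - man + woman ≈ queen
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```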

Practical Questions and Clarifications

  1. Two Vectors Per Word: Each word has a center and a context vector; typically averaged in practice.
  2. Polysemy: Single vectors can represent words with multiple meanings reasonably well, though sometimes context-specific vectors are better.
  3. Function Words: Common words like 'and', 'not' are harder to contextualize but are learned through advanced language models.
  4. Optimizations: The skip-gram model with negative sampling makes training efficient (its loss is sketched below).
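
For reference, the negative-sampling loss that replaces the full softmax for one (center, context) pair, where sigma is the logistic sigmoid and the K negative words w_k are drawn from a unigram distribution raised to the 3/4 power:

```latex
J_{\text{neg}}(v_c, o, U) = -\log \sigma(u_o^{\top} v_c) \;-\; \sum_{k=1}^{K} \log \sigma(-u_{w_k}^{\top} v_c),
\qquad w_k \sim P_n(w) \propto \mathrm{count}(w)^{3/4}
```

Instead of normalizing over the whole vocabulary, the model only has to score the true context word against K sampled "noise" words, which makes each update cheap.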

Course Context

  • Focus on Text Analysis: This course emphasizes text rather than speech, although a separate speech-based course (CS224S) is available.