Introduction to Natural Language Processing with Deep Learning

Jul 9, 2024

Stanford CS224N / LING 284: Natural Language Processing with Deep Learning

Instructor

  • Christopher Manning

Lecture Goals

  1. Course Introduction: Overview of course structure and primary learning objectives.
  2. Human Language and Word Meaning:
    • Discuss how deep learning represents word meanings.
    • Introduction to Word2Vec algorithm.
  3. Deep Learning for NLP:
    • Computing gradients of the Word2Vec objective function and using them for optimization.
  4. Practical Application:
    • Demonstration of word vector usage.

Key Topics

Course Objectives

  1. Foundations of Deep Learning in NLP:
    • Basics to advanced methods in NLP (e.g., recurrent networks, attention, transformers).
  2. Big Picture Understanding of Human Language:
    • Complexity of understanding and producing human languages.
  3. Building Systems with PyTorch:
    • Practical implementation for major NLP tasks (word meanings, dependency parsing, machine translation, question answering).

Human Language

  • Language as a Social System: Language is a socially constructed system that keeps changing as people adapt how they use it.
  • Complexity in Language: Language is not a purely formal system; meaning depends on social convention and interpretation.
  • Recent Development in Human History: Language, relatively recent in human evolution, has been a powerful tool for communication.

Machine Translation and NLP Progress

  • Machine Translation: Works moderately well now, allowing information retrieval across languages.
  • GPT-3: A large language model from OpenAI that points toward universal models trained on massive text corpora.

Word Meaning Representation

  • Traditional NLP: Used hand-built lexical resources like WordNet for synonyms and hypernyms (see the example below).
  • Limitations: Misses nuance and new word senses, and cannot keep pace as language evolves.
  • Word2Vec and Deep Learning: Encode word similarity as real-valued vectors.
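
As a contrast with learned vectors, here is a minimal sketch of the traditional WordNet route using NLTK. This is an assumption about tooling (the lecture's own demo may differ), and the WordNet corpus must already be downloaded via nltk.download('wordnet'):

```python
from nltk.corpus import wordnet as wn

# Synonym sets ("synsets") containing the word "good"
for synset in wn.synsets("good"):
    print(synset.name(), [lemma.name() for lemma in synset.lemmas()])

# Hypernyms ("is-a" relations) for one noun sense of "panda"
panda = wn.synset("panda.n.01")
print(panda.hypernyms())
```

Because resources like this are curated by hand, they list synonyms and hypernyms well but give no graded notion of similarity and lag behind newly coined senses.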

Distributional Semantics

  • Concept: Meaning derived from the words that frequently appear close to a target word.
  • Implementation: Use large text corpora to learn word vectors that predict the words appearing around each target word (a toy co-occurrence count is sketched below).
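
A toy illustration of the distributional idea; the sentence, target word, and window size are made up for the example:

```python
from collections import Counter

corpus = "the quick brown fox jumps over the lazy dog".split()
window = 2
target = "fox"

# Count words appearing within `window` positions of each occurrence of `target`
context_counts = Counter()
for i, word in enumerate(corpus):
    if word == target:
        left = max(0, i - window)
        context = corpus[left:i] + corpus[i + 1:i + 1 + window]
        context_counts.update(context)

print(context_counts)  # Counter({'quick': 1, 'brown': 1, 'jumps': 1, 'over': 1})
```

The words that co-occur with a target across a large corpus are what the learned vector for that target ends up summarizing.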

Word2Vec Algorithm

  • Objective: Predict context words from center words in a text corpus.
  • Steps:
    • Iterate through words in the text.
    • Calculate the probability of context words given the center word.
    • Adjust word vectors to maximize the probability of the actual context words (see the sketch below).
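
A minimal NumPy sketch of the probability computation in these steps; the toy vocabulary, dimensionality, and random initialization are illustrative assumptions, and no training loop is shown:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
dim = 4

# Each word gets a center vector (row of V) and a context vector (row of U)
V = rng.normal(scale=0.1, size=(len(vocab), dim))  # center vectors
U = rng.normal(scale=0.1, size=(len(vocab), dim))  # context vectors

def p_context_given_center(o: int, c: int) -> float:
    """P(o | c): softmax over dot products u_w . v_c, evaluated at word o."""
    scores = U @ V[c]                           # dot product with every context vector
    exp_scores = np.exp(scores - scores.max())  # numerically stable softmax
    return exp_scores[o] / exp_scores.sum()

# Probability of seeing "sat" (index 2) in the context of center word "cat" (index 1)
print(p_context_given_center(2, 1))
```

Training would repeat this for every (center, context) pair in the corpus and nudge U and V along the gradient of the log probability.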

Gradient Descent and Optimization

  • Objective Function: Maximize the log likelihood of the context words, equivalently minimize the average negative log likelihood.
  • Softmax Function: Converts dot products of word vectors into probabilities.
  • Gradient Calculation: Derivatives of the loss guide vector adjustments (see the formulas below).
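
For reference, the standard skip-gram objective, softmax, and center-vector gradient, with v_c a center vector, u_o a context vector, m the window size, and T the corpus length:

```latex
J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \; \sum_{\substack{-m \le j \le m \\ j \ne 0}} \log P(w_{t+j} \mid w_t; \theta)

P(o \mid c) = \frac{\exp(u_o^{\top} v_c)}{\sum_{w \in V} \exp(u_w^{\top} v_c)}

\frac{\partial}{\partial v_c} \log P(o \mid c) = u_o - \sum_{w \in V} P(w \mid c)\, u_w
```

The gradient has an "observed minus expected" form: it moves v_c toward the vector of the context word actually seen and away from the average context vector predicted by the current model.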

Practical Demonstration

  • Using Gensim Package: Loading and manipulating word vectors.
  • Word Similarity: Examples of most similar words to given terms (e.g., 'croissant', 'USA').
  • Analogies Task: Demonstrates vector arithmetic in word embeddings (e.g., 'king' - 'man' + 'woman' ≈ 'queen').
  • Effectiveness: Word vectors capture a surprising amount of contextual meaning (see the Gensim sketch below).
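
A minimal Gensim sketch of this demonstration; the pretrained model name ('glove-wiki-gigaword-100') is an assumption, as the lecture may have used a different vector set:

```python
import gensim.downloader as api

# Load a set of pretrained word vectors (model choice is illustrative)
wv = api.load("glove-wiki-gigaword-100")

# Nearest neighbours by cosine similarity (this vector set is lowercased)
print(wv.most_similar("croissant"))
print(wv.most_similar("usa"))

# Analogy via vector arithmetic: king - man + woman ≈ queen
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```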

Practical Questions and Clarifications

  1. Two Vectors Per Word: Each word has a center and a context vector; typically averaged in practice.
  2. Polysemy: Single vectors can represent words with multiple meanings reasonably well, though sometimes context-specific vectors are better.
  3. Function Words: Common words like 'and', 'not' are harder to contextualize but are learned through advanced language models.
  4. Optimizations: The skip-gram model with negative sampling makes training efficient (its loss is sketched below).
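
For reference, the negative-sampling loss that replaces the full softmax for one (center, context) pair, where sigma is the logistic sigmoid and the K negative words w_k are drawn from a unigram distribution raised to the 3/4 power:

```latex
J_{\text{neg}}(v_c, o, U) = -\log \sigma(u_o^{\top} v_c) \;-\; \sum_{k=1}^{K} \log \sigma(-u_{w_k}^{\top} v_c),
\qquad w_k \sim P_n(w) \propto \mathrm{count}(w)^{3/4}
```

Instead of normalizing over the whole vocabulary, the model only has to score the true context word against K sampled "noise" words, which makes each update cheap.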

Course Context

  • Focus on Text Analysis: This course emphasizes text rather than speech, although a separate speech-based course (CS224S) is available.