Overview of Graph Neural Networks Lecture

Aug 2, 2024

Stanford CS224W Course - Lecture on Graph Neural Networks

Instructor Introduction

  • Instructor: Josan (guest instructor on behalf of Jure)
  • Background: Former head TA of previous course offering
  • Topic: Graph Neural Networks

Recap of Previous Lecture

  • Key Concept: Node embedding
    • Encode nodes into a low-dimensional vector space
    • Learn a function that maps nodes of the graph into that low-dimensional space
  • Core Problem: Define the embedding function (f)

Node Embedding

  • Similarity Function: Measures how similar (close) two nodes are in the original network
  • Encoder Function: Maps nodes in the graph to embedding space
  • Components:
    • Encoder: Maps nodes to embeddings
    • Decoder: Defines the similarity function on the embeddings (see the sketch below)
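
A minimal sketch of the encoder/decoder split described above, assuming a tiny toy graph and a dot-product decoder (both are illustrative choices, not the lecture's specification):

```python
import torch

# Hypothetical setup: 5 nodes embedded in a 3-dimensional space.
num_nodes, dim = 5, 3
Z = torch.randn(num_nodes, dim, requires_grad=True)  # one embedding per node

def encode(v: int) -> torch.Tensor:
    """Encoder: map a node (here just an index) to its embedding."""
    return Z[v]

def decode(z_u: torch.Tensor, z_v: torch.Tensor) -> torch.Tensor:
    """Decoder: similarity in embedding space, here a simple dot product."""
    return z_u @ z_v

# Training would push decode(encode(u), encode(v)) toward the chosen
# similarity of u and v in the original network.
score = decode(encode(0), encode(1))
```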

Similarity Functions

  • Random Walk-Based Similarity: Nodes co-occurring in short random walks are similar
  • Shallow Encoder: Uses an embedding matrix with columns representing nodes
    • **Limitations** (made concrete in the sketch below):
      • Scalability: The embedding matrix has one entry per node, so the parameter count grows with graph size; impractical for very large graphs
      • Transductive: Can only produce embeddings for nodes seen during training; cannot handle new nodes
      • Attribute Ignorance: Does not make use of node attributes/features
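
A minimal sketch of a shallow encoder as a plain embedding lookup; the graph size and dimensions below are made-up numbers, used only to make the limitations concrete:

```python
import torch
import torch.nn as nn

num_nodes, dim = 1_000_000, 128          # hypothetical large graph
encoder = nn.Embedding(num_nodes, dim)   # the embedding matrix: one learnable vector per node

# Scalability: the parameter count is num_nodes * dim (128M floats here).
num_params = sum(p.numel() for p in encoder.parameters())

# Transductive: only nodes with an entry in the table can be embedded;
# a node added after training has no entry. Node attributes are never used.
z_v = encoder(torch.tensor([42]))
```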

Deep Graph Encoders

  • Improvement Over Shallow Encoders: More expressive and powerful
  • Deep Graph Encoder: Multiple layers of neural network transformations
  • Output Types: Node embeddings, subgraph embeddings, entire graph embeddings

Machine Learning Tasks on Graphs

  • Node-Level: Node classification (e.g., drug toxicity prediction)
  • Link Prediction: Recommender systems (e.g., user-item interactions)
  • Community Detection: Anomaly detection in financial networks
  • Graph-Level: Network similarity and whole-graph classification (e.g., drug molecule classification)

Basics of Deep Learning

  • Optimization Problem: Given input X, learn parameters that minimize a loss between the prediction and the label Y
  • Input Data Types: Vectors, sequences, matrices, graphs
  • Loss Functions: L2 loss for regression, cross-entropy loss for classification (sketched below)
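
A minimal sketch of the two losses named above, on made-up tensors (the values are placeholders):

```python
import torch
import torch.nn.functional as F

# L2 / mean-squared-error loss for a regression target.
pred = torch.tensor([2.5, 0.1])
target = torch.tensor([3.0, 0.0])
l2_loss = F.mse_loss(pred, target)

# Cross-entropy loss for a 3-class classification problem.
logits = torch.tensor([[1.2, -0.3, 0.4]])  # unnormalized scores for one example
label = torch.tensor([0])                  # index of the true class
ce_loss = F.cross_entropy(logits, label)
```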

Multi-Layer Perceptron (MLP)

  • Building Block: Linear transformation + bias + non-linear activation (ReLU, sigmoid)
  • Example: Two-dimensional input, three-dimensional hidden layer, one-dimensional output (see the sketch below)
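
A minimal sketch of that example MLP (2-d input, 3-d hidden layer, 1-d output); the ReLU activation and random input batch are assumptions for illustration:

```python
import torch
import torch.nn as nn

# One hidden layer: linear transformation + bias, then a non-linearity.
mlp = nn.Sequential(
    nn.Linear(2, 3),   # 2-d input  -> 3-d hidden (weights and bias)
    nn.ReLU(),         # non-linear activation
    nn.Linear(3, 1),   # 3-d hidden -> 1-d output
)

x = torch.randn(4, 2)  # a batch of four 2-d inputs
y = mlp(x)             # output has shape (4, 1)
```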

Deep Learning for Graphs

Plan and Summary

  1. Local Network Structure: Describe the aggregation function for local neighborhood information
  2. Computational Graph: Specify computational architecture with multiple layers
  3. Training Model: Supervised and unsupervised learning examples

Graph Learning Method Properties

  • Permutation Invariance: A graph has no canonical node ordering, so graph-level outputs must be identical for every ordering of the nodes
  • Permutation Equivariance: Node-level outputs must permute consistently with the node ordering, so each node receives the same representation no matter how the nodes are labeled
  • Aggregation: Combine a node's embedding with its neighbors' embeddings using shared neural-network weights and a permutation-invariant function such as sum, mean, or max (see the sketch below)
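
A small sketch (an assumed example, not from the lecture) of why mean aggregation is permutation invariant: shuffling the neighbors' order leaves the aggregate unchanged.

```python
import torch

# Embeddings of a node's three neighbors, stacked as rows.
neighbors = torch.tensor([[1.0, 0.0],
                          [0.0, 2.0],
                          [3.0, 1.0]])

perm = torch.tensor([2, 0, 1])  # an arbitrary reordering of the neighbors

# Mean (like sum or max) ignores ordering, so both aggregates match.
assert torch.allclose(neighbors.mean(dim=0), neighbors[perm].mean(dim=0))
```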

Graph Convolutional Networks (GCN)

  • **Process:**
    • Define computational graph based on local neighborhood structure
    • Propagate and transform information iteratively
  • Example: Unroll computational graph iteratively for node A, considering neighbors and neighbors of neighbors
  • Number of Layers: Determines how many hops of the network each node sees (e.g., a 2-layer GCN already covers a graph of diameter 2)
  • Neighborhood Aggregation: Average the neighbors' embeddings, then apply a neural transformation (a per-node sketch follows below)
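
A minimal per-node sketch of one round of neighborhood aggregation with shared weights; the tiny 4-node graph, the 8-dimensional embeddings, and the weight names W and B are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Hypothetical undirected 4-node graph as adjacency lists.
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
H = torch.randn(4, 8)            # current embeddings, one 8-d row per node

W = nn.Linear(8, 8, bias=False)  # shared weights for the averaged neighbors
B = nn.Linear(8, 8, bias=False)  # shared weights for the node's own embedding

# One layer: average the neighbors, transform, add the self term, apply ReLU.
H_next = torch.stack([
    torch.relu(W(H[neighbors[v]].mean(dim=0)) + B(H[v]))
    for v in range(4)
])
```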

Mathematical Representation of GCN

  • Input: Node attributes (X_v)
  • Output: Node embedding (z_v) produced by the final layer
  • Layer-wise Transformation: Combine a node's own embedding with the average of its neighbors' embeddings (written out below)
  • Non-linearity: Apply activation functions like ReLU
  • Permutation Properties: Ensured through shared weights and aggregation functions
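
Written out, the layer-wise rule the bullets above describe takes the standard GCN form below (the symbols h, W_k, B_k, and σ are the usual choices and may differ from the lecture's exact notation):

```latex
\begin{aligned}
h_v^{(0)} &= x_v
  && \text{initialize with the node attributes} \\
h_v^{(k+1)} &= \sigma\Big( W_k \sum_{u \in N(v)} \frac{h_u^{(k)}}{|N(v)|} + B_k\, h_v^{(k)} \Big)
  && \text{average neighbors, transform, add self term, apply non-linearity} \\
z_v &= h_v^{(K)}
  && \text{final embedding after } K \text{ layers}
\end{aligned}
```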

Efficient Training

  • Matrix Form: Rewrite the per-node updates as sparse matrix multiplications over the whole graph (e.g., H_agg = D^-1 * A * H, with adjacency matrix A, degree matrix D, and embedding matrix H; see the sketch below)
  • Parallel Computing: Utilize GPUs for efficient sparse matrix operations
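
A minimal sketch of the same aggregation in matrix form, H_agg = D^-1 * A * H, using a sparse adjacency matrix; the toy 4-node graph is assumed:

```python
import torch

# Sparse adjacency matrix of a hypothetical undirected 4-node graph.
edge_index = torch.tensor([[0, 1, 0, 2, 1, 2, 1, 3],
                           [1, 0, 2, 0, 2, 1, 3, 1]])
A = torch.sparse_coo_tensor(edge_index, torch.ones(8), (4, 4))

H = torch.randn(4, 8)                        # current node embeddings
deg = torch.sparse.sum(A, dim=1).to_dense()  # node degrees (row sums of A)

# D^-1 * A * H: one sparse matrix product plus a per-row rescaling computes
# the neighbor mean for every node at once, which GPUs handle efficiently.
H_agg = torch.sparse.mm(A, H) / deg.unsqueeze(1)
```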

Training Paradigms

Supervised Learning

  • Example: Drug toxicity classification using node embeddings and cross-entropy loss (sketched below)
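
A minimal sketch of this supervised setup: a classification head on top of the node embeddings, trained with cross-entropy (the embeddings, labels, and head here are placeholders rather than the lecture's pipeline):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

Z = torch.randn(10, 16)              # final node embeddings from a GNN (placeholder)
labels = torch.randint(0, 2, (10,))  # e.g. toxic / non-toxic label per node

head = nn.Linear(16, 2)              # classification head on the embeddings
loss = F.cross_entropy(head(Z), labels)
loss.backward()                      # in a full model, gradients also flow into the GNN
```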

Unsupervised Learning

  • Objective: Use graph structure for self-supervision
  • Example: Random walk-based similarity, matrix factorization, node proximity (a random-walk-style objective is sketched below)
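
A minimal sketch of one self-supervised variant, a random-walk-style objective: dot products of co-occurring node pairs are pushed up and those of sampled negative pairs pushed down (the pairs below are illustrative placeholders):

```python
import torch
import torch.nn.functional as F

Z = torch.randn(10, 16, requires_grad=True)  # node embeddings (placeholder)

pos_pairs = torch.tensor([[0, 1], [2, 3]])   # pairs co-occurring on random walks
neg_pairs = torch.tensor([[0, 7], [2, 9]])   # sampled non-co-occurring pairs

def scores(pairs):
    # Dot product between the embeddings of each pair of nodes.
    return (Z[pairs[:, 0]] * Z[pairs[:, 1]]).sum(dim=1)

logits = torch.cat([scores(pos_pairs), scores(neg_pairs)])
targets = torch.cat([torch.ones(2), torch.zeros(2)])
loss = F.binary_cross_entropy_with_logits(logits, targets)
loss.backward()
```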

Inductive Learning

  • Capability: Use trained network on unseen nodes or new graphs
  • Use Cases: Predicting interactions in new protein interaction networks, new nodes in online platforms like YouTube

GCN vs. CNN

Similarities and Differences

  • Neighbor Aggregation: GCN applies the same weight to every neighbor; a CNN learns a separate weight for each position in a fixed-size neighborhood (the filter)
  • Permutation Invariance: GCN handles arbitrary graph structures and neighbor orderings; CNN relies on the fixed spatial ordering of pixels and is not permutation invariant
  • Parameter Sharing: GCN shares one set of parameters across all neighbors; CNN has distinct weights for each relative pixel position within the filter

Conclusion

  • Summary: Basics of neural networks, motivation for graph deep learning, properties and architecture of GCN, training methods, comparison with CNN