Overview of Graph Networks Lecture
Aug 2, 2024
Stanford CS224W Course - Lecture on Graph Networks
Instructor Introduction
Instructor:
Josan (guest instructor on behalf of Yuri)
Background:
Former head TA of previous course offering
Topic:
Graph Networks
Recap of Previous Lecture
Key Concept:
Node embedding
Encode nodes into a low-dimensional vector space
Learn a function to embed graphs into low-dimensional space
Core Problem:
Define the embedding function (f)
Node Embedding
Similarity Function:
Defines how similar (close) two nodes are in the original network
Encoder Function:
Maps nodes in the graph to embedding space
Components:
Encoder: Maps nodes to embeddings
Decoder: Defines similarity function
Similarity Functions
Random Walk-Based Similarity:
Nodes co-occurring in short random walks are similar
Shallow Encoder:
An embedding lookup: each column of a learned embedding matrix stores one node's embedding (see the sketch after this list)
Limitations:
Scalability:
Large parameter size, not suitable for large graphs
Transductive:
Can only produce embeddings for nodes seen during training
Attribute Ignorance:
Doesn't incorporate valuable node attributes
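A minimal sketch of a shallow encoder with a dot-product decoder (all sizes and names are illustrative, not from the lecture); note that the parameter count grows with the number of nodes, which is the scalability limitation above:

```python
import torch

num_nodes, d = 1000, 64          # illustrative sizes

# Shallow encoder: one learnable d-dimensional column per node,
# so the parameter count grows linearly with the number of nodes.
Z = torch.nn.Parameter(torch.randn(d, num_nodes))

def encode(v):
    """Look up the embedding of node v (a simple column lookup)."""
    return Z[:, v]

def decode(u, v):
    """Dot-product decoder: similarity score for a node pair."""
    return torch.dot(encode(u), encode(v))

score = decode(0, 1)             # higher score = more similar under the model
```

Because `Z` has one column per node, a node unseen during training has no column to look up, which is the transductive limitation, and node attributes never enter the computation, which is the attribute-ignorance limitation.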
Deep Graph Encoders
Improvement Over Shallow Encoders:
More expressive and powerful
Deep Graph Encoder:
Multiple layers of neural network transformations
Output Types:
Node embeddings, subgraph embeddings, entire graph embeddings
Machine Learning Tasks on Graphs
Node-Level:
Node classification (e.g., drug toxicity prediction)
Link Prediction:
Recommender systems (e.g., user-item interactions)
Community Detection:
Anomaly detection in financial networks
Network Similarity:
Graph-level tasks (e.g., drug molecule classification)
Basics of Deep Learning
Optimization Problem:
Given input (X), learn parameters that minimize a loss function against the label (Y)
Input Data Types:
Vectors, sequences, matrices, graphs
Loss Functions:
L2 loss for regression, cross-entropy loss for classification
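For illustration, both losses are available as built-ins in PyTorch (an assumed framework, not specified in the lecture); the tensors below are made-up examples:

```python
import torch
import torch.nn.functional as F

# Regression: L2 (mean squared error) loss between predictions and targets.
pred_reg = torch.tensor([0.8, 1.2])
target_reg = torch.tensor([1.0, 1.0])
l2 = F.mse_loss(pred_reg, target_reg)

# Classification: cross-entropy over raw logits and integer class labels.
logits = torch.tensor([[2.0, 0.5], [0.1, 1.5]])   # 2 examples, 2 classes
labels = torch.tensor([0, 1])
ce = F.cross_entropy(logits, labels)
```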
Multi-Layer Perceptron (MLP)
Building Block:
Linear transformation + bias + non-linear activation (ReLU, sigmoid)
Example:
Two-dimensional input, three-dimensional hidden layer, one-dimensional output
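A minimal PyTorch sketch of that example (assumed framework; the layer sizes match the example above, and a sigmoid could replace the ReLU):

```python
import torch

# Linear transformation + bias + non-linear activation, stacked:
# 2-d input -> 3-d hidden layer (ReLU) -> 1-d output.
mlp = torch.nn.Sequential(
    torch.nn.Linear(2, 3),
    torch.nn.ReLU(),
    torch.nn.Linear(3, 1),
)

x = torch.randn(4, 2)   # a batch of 4 two-dimensional inputs
y = mlp(x)              # shape: (4, 1)
```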
Deep Learning for Graphs
Plan and Summary
Local Network Structure:
Describe the aggregation function for local neighborhood information
Computational Graph:
Specify computational architecture with multiple layers
Training Model:
Supervised and unsupervised learning examples
Graph Learning Method Properties
Permutation Invariance:
Graphs have no canonical node ordering, so graph-level outputs must be identical for every ordering
Permutation Equivariance:
Permuting the input node order permutes the node representations in the same way, so each node keeps the same embedding regardless of ordering
Aggregation:
Combine node and its neighborhood embeddings through shared neural weights and permutation-invariant functions
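A small sanity-check sketch (assumed, not from the lecture) of why a mean aggregator is permutation invariant: reordering the neighbors leaves the aggregated result unchanged:

```python
import torch

neighbor_embs = torch.randn(5, 8)          # 5 neighbors, 8-d embeddings (illustrative)
perm = torch.randperm(5)                   # an arbitrary reordering of the neighbors

agg = neighbor_embs.mean(dim=0)            # mean ignores the order of its inputs
agg_perm = neighbor_embs[perm].mean(dim=0)

assert torch.allclose(agg, agg_perm)       # identical output for any ordering
```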
Graph Convolutional Networks (GCN)
Process:
Define computational graph based on local neighborhood structure
Propagate and transform information iteratively
Example:
Unroll computational graph iteratively for node A, considering neighbors and neighbors of neighbors
Number of Layers:
Determines how many hops each node aggregates from (e.g., a 2-layer GCN reaches the 2-hop neighborhood, enough to cover a graph of diameter 2)
Neighborhood Aggregation:
Average neighbors' embeddings, apply neural transformations
Mathematical Representation of GCN
Input:
Node attributes (X_v)
Output:
Node embedding (z_v, the final-layer representation)
Layer-wise Transformation:
Combine the node's own embedding with the average of its neighbors' embeddings (see the update rule after this list)
Non-linearity:
Apply activation functions like ReLU
Permutation Properties:
Ensured through shared weights and aggregation functions
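One layer-wise update consistent with the description above (notation assumed: h_v^(k) is node v's embedding at layer k, N(v) its neighbors, W_k and B_k trainable weight matrices, sigma a non-linearity such as ReLU, K the number of layers):

```latex
h_v^{(0)} = x_v, \qquad
h_v^{(k+1)} = \sigma\!\Big( W_k \sum_{u \in N(v)} \frac{h_u^{(k)}}{|N(v)|} \;+\; B_k\, h_v^{(k)} \Big), \qquad
z_v = h_v^{(K)}
```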
Efficient Training
Matrix Form:
Rewrite per-node computations as global sparse matrix multiplications (e.g., D^-1 * A * H, with adjacency matrix A and diagonal degree matrix D; sketch below)
Parallel Computing:
Utilize GPUs for efficient sparse matrix operations
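A sketch of one GCN layer in matrix form (illustrative sizes and names; the adjacency matrix is stored as a sparse COO tensor so the multiplications stay sparse):

```python
import torch

n, d = 4, 8                                   # illustrative sizes
H = torch.randn(n, d)                         # current node embeddings (layer k)
W = torch.randn(d, d)                         # neighbor-aggregation weights
B = torch.randn(d, d)                         # self-transformation weights

# Sparse adjacency A (undirected edges 0-1, 1-2, 2-3) and node degrees.
idx = torch.tensor([[0, 1, 1, 2, 2, 3],
                    [1, 0, 2, 1, 3, 2]])
A = torch.sparse_coo_tensor(idx, torch.ones(6), (n, n))
deg = torch.sparse.sum(A, dim=1).to_dense()   # diagonal of the degree matrix D

# One GCN layer in matrix form: H' = ReLU(D^-1 A H W + H B).
neighbor_avg = torch.sparse.mm(A, H) / deg.unsqueeze(1)
H_next = torch.relu(neighbor_avg @ W + H @ B)
```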
Training Paradigms
Supervised Learning
Example:
Drug toxicity classification using node embeddings and cross-entropy loss
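A hedged sketch of the supervised setup: a linear head on top of GNN node embeddings, trained with cross-entropy on the labeled nodes only (all names and sizes are placeholders):

```python
import torch

node_embs = torch.randn(100, 64, requires_grad=True)   # GNN output embeddings (placeholder)
labels = torch.randint(0, 2, (100,))                    # e.g., toxic / non-toxic (placeholder)
labeled_idx = torch.arange(30)                          # nodes with known labels (placeholder)

head = torch.nn.Linear(64, 2)                           # classification head on top of the GNN
logits = head(node_embs[labeled_idx])
loss = torch.nn.functional.cross_entropy(logits, labels[labeled_idx])
loss.backward()   # in the full model, gradients continue back through the GNN layers
```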
Unsupervised Learning
Objective:
Use graph structure for self-supervision
Example:
Random walk-based similarity, matrix factorization, node proximity
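A sketch of one common form of this objective (an assumption; the lecture may use a different exact loss): node pairs that co-occur on random walks should score high under a dot-product decoder, random pairs low:

```python
import torch
import torch.nn.functional as F

Z = torch.randn(100, 64, requires_grad=True)                  # node embeddings (placeholder)

pos_u, pos_v = torch.tensor([0, 1]), torch.tensor([2, 3])     # pairs that co-occur on walks
neg_u, neg_v = torch.tensor([0, 1]), torch.tensor([50, 60])   # random negative pairs

pos_score = (Z[pos_u] * Z[pos_v]).sum(dim=1)                  # dot-product similarity
neg_score = (Z[neg_u] * Z[neg_v]).sum(dim=1)

# Push co-occurring pairs toward label 1 and random pairs toward label 0.
loss = F.binary_cross_entropy_with_logits(
    torch.cat([pos_score, neg_score]),
    torch.cat([torch.ones(2), torch.zeros(2)]),
)
loss.backward()
```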
Inductive Learning
Capability:
Use trained network on unseen nodes or new graphs
Use Cases:
Predicting interactions in new protein interaction networks, new nodes in online platforms like YouTube
GCN vs. CNN
Similarities and Differences
Neighbor Aggregation:
GCN averages all neighbors with shared weights; CNN aggregates a fixed-size pixel neighborhood (e.g., 3x3) with position-specific learned filters
Permutation Invariance:
GCN handles arbitrary graph structures and is permutation invariant; CNN relies on the fixed pixel ordering of images and is not
Parameter Sharing:
GCN shares the same parameters across all neighbors; CNN learns a distinct weight for each relative pixel position in its filter
Conclusion
Summary:
Basics of neural networks, motivation for graph deep learning, properties and architecture of GCN, training methods, comparison with CNN