Overview of Graph Networks Lecture
Aug 2, 2024
Stanford CS224W Course - Lecture on Graph Networks
Instructor Introduction
Instructor:
Josan (guest instructor on behalf of Yuri)
Background:
Former head TA of previous course offering
Topic:
Graph Networks
Recap of Previous Lecture
Key Concept:
Node embedding
Encode nodes into a low-dimensional vector space
Learn a function to embed graphs into low-dimensional space
Core Problem:
Define the embedding function (f)
Node Embedding
Similarity Function:
Defines how similar (close) two nodes are in the original network
Encoder Function:
Maps nodes in the graph to embedding space
Components:
Encoder: Maps nodes to embeddings
Decoder: Defines similarity function
Similarity Functions
Random Walk-Based Similarity:
Nodes co-occurring in short random walks are similar
Shallow Encoder:
An embedding lookup: each column of a learned embedding matrix stores one node's embedding (see the sketch after this list)
Limitations:
Scalability:
Large parameter size, not suitable for large graphs
Transductive:
Can only produce embeddings for nodes seen during training
Attribute Ignorance:
Doesn't incorporate valuable node attributes
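A minimal sketch of a shallow encoder with a dot-product decoder (all sizes and names are illustrative, not from the lecture); note that the parameter count grows with the number of nodes, which is the scalability limitation above:

```python
import torch

num_nodes, d = 1000, 64          # illustrative sizes

# Shallow encoder: one learnable d-dimensional column per node,
# so the parameter count grows linearly with the number of nodes.
Z = torch.nn.Parameter(torch.randn(d, num_nodes))

def encode(v):
    """Look up the embedding of node v (a simple column lookup)."""
    return Z[:, v]

def decode(u, v):
    """Dot-product decoder: similarity score for a node pair."""
    return torch.dot(encode(u), encode(v))

score = decode(0, 1)             # higher score = more similar under the model
```

Because `Z` has one column per node, a node unseen during training has no column to look up, which is the transductive limitation, and node attributes never enter the computation, which is the attribute-ignorance limitation.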
Deep Graph Encoders
Improvement Over Shallow Encoders:
More expressive and powerful
Deep Graph Encoder:
Multiple layers of neural network transformations
Output Types:
Node embeddings, subgraph embeddings, entire graph embeddings
Machine Learning Tasks on Graphs
Node-Level:
Node classification (e.g., drug toxicity prediction)
Link Prediction:
Recommender systems (e.g., user-item interactions)
Community Detection:
Anomaly detection in financial networks
Network Similarity:
Graph-level tasks (e.g., drug molecule classification)
Basics of Deep Learning
Optimization Problem:
Given input (X), learn parameters that minimize a loss function against the label (Y)
Input Data Types:
Vectors, sequences, matrices, graphs
Loss Functions:
L2 loss for regression, cross-entropy loss for classification
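For illustration, both losses are available as built-ins in PyTorch (an assumed framework, not specified in the lecture); the tensors below are made-up examples:

```python
import torch
import torch.nn.functional as F

# Regression: L2 (mean squared error) loss between predictions and targets.
pred_reg = torch.tensor([0.8, 1.2])
target_reg = torch.tensor([1.0, 1.0])
l2 = F.mse_loss(pred_reg, target_reg)

# Classification: cross-entropy over raw logits and integer class labels.
logits = torch.tensor([[2.0, 0.5], [0.1, 1.5]])   # 2 examples, 2 classes
labels = torch.tensor([0, 1])
ce = F.cross_entropy(logits, labels)
```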
Multi-Layer Perceptron (MLP)
Building Block:
Linear transformation + bias + non-linear activation (ReLU, sigmoid)
Example:
Two-dimensional input, three-dimensional hidden layer, one-dimensional output
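A minimal PyTorch sketch of that example (assumed framework; the layer sizes match the example above, and a sigmoid could replace the ReLU):

```python
import torch

# Linear transformation + bias + non-linear activation, stacked:
# 2-d input -> 3-d hidden layer (ReLU) -> 1-d output.
mlp = torch.nn.Sequential(
    torch.nn.Linear(2, 3),
    torch.nn.ReLU(),
    torch.nn.Linear(3, 1),
)

x = torch.randn(4, 2)   # a batch of 4 two-dimensional inputs
y = mlp(x)              # shape: (4, 1)
```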
Deep Learning for Graphs
Plan and Summary
Local Network Structure:
Describe the aggregation function for local neighborhood information
Computational Graph:
Specify computational architecture with multiple layers
Training Model:
Supervised and unsupervised learning examples
Graph Learning Method Properties
Permutation Invariance:
Graphs have no canonical node ordering, so graph-level outputs must be identical for every ordering
Permutation Equivariance:
Permuting the input node order permutes the node representations in the same way, so each node keeps the same embedding regardless of ordering
Aggregation:
Combine node and its neighborhood embeddings through shared neural weights and permutation-invariant functions
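A small sanity-check sketch (assumed, not from the lecture) of why a mean aggregator is permutation invariant: reordering the neighbors leaves the aggregated result unchanged:

```python
import torch

neighbor_embs = torch.randn(5, 8)          # 5 neighbors, 8-d embeddings (illustrative)
perm = torch.randperm(5)                   # an arbitrary reordering of the neighbors

agg = neighbor_embs.mean(dim=0)            # mean ignores the order of its inputs
agg_perm = neighbor_embs[perm].mean(dim=0)

assert torch.allclose(agg, agg_perm)       # identical output for any ordering
```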
Graph Convolutional Networks (GCN)
Process:
Define computational graph based on local neighborhood structure
Propagate and transform information iteratively
Example:
Unroll computational graph iteratively for node A, considering neighbors and neighbors of neighbors
Number of Layers:
Determines how many hops each node aggregates from (e.g., a 2-layer GCN reaches the 2-hop neighborhood, enough to cover a graph of diameter 2)
Neighborhood Aggregation:
Average neighbors' embeddings, apply neural transformations
Mathematical Representation of GCN
Input:
Node attributes (X_v)
Output:
Node embedding (z_v, the final-layer representation)
Layer-wise Transformation:
Combine the node's own embedding with the average of its neighbors' embeddings (see the update rule after this list)
Non-linearity:
Apply activation functions like ReLU
Permutation Properties:
Ensured through shared weights and aggregation functions
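One layer-wise update consistent with the description above (notation assumed: h_v^(k) is node v's embedding at layer k, N(v) its neighbors, W_k and B_k trainable weight matrices, sigma a non-linearity such as ReLU, K the number of layers):

```latex
h_v^{(0)} = x_v, \qquad
h_v^{(k+1)} = \sigma\!\Big( W_k \sum_{u \in N(v)} \frac{h_u^{(k)}}{|N(v)|} \;+\; B_k\, h_v^{(k)} \Big), \qquad
z_v = h_v^{(K)}
```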
Efficient Training
Matrix Form:
Rewrite per-node computations as global sparse matrix multiplications (e.g., D^-1 * A * H, with adjacency matrix A and diagonal degree matrix D; sketch below)
Parallel Computing:
Utilize GPUs for efficient sparse matrix operations
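A sketch of one GCN layer in matrix form (illustrative sizes and names; the adjacency matrix is stored as a sparse COO tensor so the multiplications stay sparse):

```python
import torch

n, d = 4, 8                                   # illustrative sizes
H = torch.randn(n, d)                         # current node embeddings (layer k)
W = torch.randn(d, d)                         # neighbor-aggregation weights
B = torch.randn(d, d)                         # self-transformation weights

# Sparse adjacency A (undirected edges 0-1, 1-2, 2-3) and node degrees.
idx = torch.tensor([[0, 1, 1, 2, 2, 3],
                    [1, 0, 2, 1, 3, 2]])
A = torch.sparse_coo_tensor(idx, torch.ones(6), (n, n))
deg = torch.sparse.sum(A, dim=1).to_dense()   # diagonal of the degree matrix D

# One GCN layer in matrix form: H' = ReLU(D^-1 A H W + H B).
neighbor_avg = torch.sparse.mm(A, H) / deg.unsqueeze(1)
H_next = torch.relu(neighbor_avg @ W + H @ B)
```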
Training Paradigms
Supervised Learning
Example:
Drug toxicity classification using node embeddings and cross-entropy loss
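A hedged sketch of the supervised setup: a linear head on top of GNN node embeddings, trained with cross-entropy on the labeled nodes only (all names and sizes are placeholders):

```python
import torch

node_embs = torch.randn(100, 64, requires_grad=True)   # GNN output embeddings (placeholder)
labels = torch.randint(0, 2, (100,))                    # e.g., toxic / non-toxic (placeholder)
labeled_idx = torch.arange(30)                          # nodes with known labels (placeholder)

head = torch.nn.Linear(64, 2)                           # classification head on top of the GNN
logits = head(node_embs[labeled_idx])
loss = torch.nn.functional.cross_entropy(logits, labels[labeled_idx])
loss.backward()   # in the full model, gradients continue back through the GNN layers
```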
Unsupervised Learning
Objective:
Use graph structure for self-supervision
Example:
Random walk-based similarity, matrix factorization, node proximity
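A sketch of one common form of this objective (an assumption; the lecture may use a different exact loss): node pairs that co-occur on random walks should score high under a dot-product decoder, random pairs low:

```python
import torch
import torch.nn.functional as F

Z = torch.randn(100, 64, requires_grad=True)                  # node embeddings (placeholder)

pos_u, pos_v = torch.tensor([0, 1]), torch.tensor([2, 3])     # pairs that co-occur on walks
neg_u, neg_v = torch.tensor([0, 1]), torch.tensor([50, 60])   # random negative pairs

pos_score = (Z[pos_u] * Z[pos_v]).sum(dim=1)                  # dot-product similarity
neg_score = (Z[neg_u] * Z[neg_v]).sum(dim=1)

# Push co-occurring pairs toward label 1 and random pairs toward label 0.
loss = F.binary_cross_entropy_with_logits(
    torch.cat([pos_score, neg_score]),
    torch.cat([torch.ones(2), torch.zeros(2)]),
)
loss.backward()
```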
Inductive Learning
Capability:
Use trained network on unseen nodes or new graphs
Use Cases:
Predicting interactions in new protein interaction networks, new nodes in online platforms like YouTube
GCN vs. CNN
Similarities and Differences
Neighbor Aggregation:
GCN averages all neighbors with shared weights; CNN aggregates a fixed-size pixel neighborhood (e.g., 3x3) with position-specific learned filters
Permutation Invariance:
GCN handles arbitrary graph structures and is permutation invariant; CNN relies on the fixed pixel ordering of images and is not
Parameter Sharing:
GCN shares the same parameters across all neighbors; CNN learns a distinct weight for each relative pixel position in its filter
Conclusion
Summary:
Basics of neural networks, motivation for graph deep learning, properties and architecture of GCN, training methods, comparison with CNN