Embedding Entire Graphs
Overview
- Focus: Embedding entire graphs (or sub-graphs) rather than individual nodes.
- Method: Anonymous random walks.
- Applications: Molecule classification, graph anomaly detection.
Simple Node Embedding Summation
- Idea: Run standard node embedding techniques (e.g., DeepWalk, node2vec) and sum/average node embeddings of the entire graph or sub-graph.
- Application: Used in 2016 for molecule classification—found effective despite simplicity.
Virtual Node Technique
- Improvement: Introduce a virtual node to represent the entire graph or sub-graph.
- Method:
- Create a virtual node connected to all targeted nodes.
- Run node embedding technique (e.g., DeepWalk, node2vec) on the graph.
- Use the virtual node's embedding as the graph's embedding.
Anonymous Walks
- Concept: States in anonymous walks correspond to indexes of the first visit time.
- Example:
- Random Walk 1: A -> B -> C -> B -> C
- Random Walk 2: C -> D -> B -> D -> B
- Both represented as: 1 -> 2 -> 3 -> 2 -> 3
- Characteristics: Agnostic to node identities, only considering the order of visits.
- Walk Generation: Number of anonymous walks increases exponentially with length.
Embedding Graphs with Anonymous Walks
- Method:
- Simulate anonymous walks of length L.
- Record their counts and represent the graph as a probability distribution over these walks.
- Dimensionality:
- Length 3: 5-dimensional vector representation.
- Increase length for higher dimensions.
- Accuracy: Use mathematical formula with parameters Epsilon (accuracy) and Delta (probability) to determine the number of walks needed for accurate distribution estimates.
Learning Embeddings for Anonymous Walks
- Extension: Learn embeddings for walks themselves (Z_i for walk W_i) in addition to graph embedding (Z_G).
- Approach: Embed walks to predict adjacent walks, similar to DeepWalk/node2vec.
- Objective Function: Predict adjacency of walks occurring in a window size Delta around sampled walks.
- Optimization: To find both graph and walk embeddings maximizing prediction accuracy.
- Usage: Z_G can be used in downstream tasks (e.g., graph classification).
Three Main Ideas Recap
- Sum/Average Node Embeddings: Use node embeddings summed/averaged for the entire graph, computed by DeepWalk or node2vec.
- Virtual Node: Create and embed a virtual node connected to sub-graph or entire graph.
- Anonymous Walks: Sample anonymous walks, represent the graph by walk occurrence fractions, and learn embeddings for walks and entire graph.
Future Approaches
- Hierarchical Clustering: Explore hierarchical aggregation of networks for graph embeddings.
- Graph Neural Networks: Consider GNNs in future lectures for embedding graphs.
Node Embeddings in Practice
- Clustering: Use embeddings for clustering algorithms to detect communities or social roles.
- Node Classification: Predict node labels based on embeddings.
- Link Prediction:.
- Combine node embeddings (e.g., concatenate, sum) to predict links.
- Adjust combination methods for directed or undirected graphs.
- Graph Classification: Embed entire graphs through node aggregation or anonymous walks.
Summary
- Goal: Learn node/graph embeddings independently of tasks, without manual feature engineering.
- Methods: Encoder-Decoder framework, random walks for similarity, DeepWalk and node2vec for node embeddings, graph embeddings through aggregation and anonymous walks.
- Extensions: Enhancements and practical applications discussed.
Thank you for attending the lecture!