đź§ 

Graph Embedding Techniques Lecture Summary

Jul 11, 2024

Embedding Entire Graphs

Overview

  • Focus: Embedding entire graphs (or sub-graphs) rather than individual nodes.
  • Method: Anonymous random walks.
  • Applications: Molecule classification, graph anomaly detection.

Simple Node Embedding Summation

  • Idea: Run standard node embedding techniques (e.g., DeepWalk, node2vec) and sum/average node embeddings of the entire graph or sub-graph.
  • Application: Used in 2016 for molecule classification, where it proved effective despite its simplicity.
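The summation/averaging idea above can be sketched in a few lines. The embedding vectors here are hypothetical placeholders standing in for actual DeepWalk/node2vec output:

```python
import numpy as np

# Hypothetical per-node embeddings, standing in for DeepWalk/node2vec output.
node_embeddings = {
    "A": np.array([0.2, 0.8, -0.1]),
    "B": np.array([0.4, 0.6, 0.3]),
    "C": np.array([-0.2, 0.5, 0.1]),
}

def graph_embedding(embeddings):
    """Embed a (sub)graph as the mean of its node embeddings."""
    return np.mean(list(embeddings.values()), axis=0)

z_G = graph_embedding(node_embeddings)
```

Summing instead of averaging is the same idea; averaging simply normalizes for graph size.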

Virtual Node Technique

  • Improvement: Introduce a virtual node to represent the entire graph or sub-graph.
  • Method:
    • Create a virtual node connected to all targeted nodes.
    • Run node embedding technique (e.g., DeepWalk, node2vec) on the graph.
    • Use the virtual node's embedding as the graph's embedding.
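The graph-augmentation step can be sketched on a plain adjacency-set representation (the node names and the `__virtual__` label are illustrative; the actual node embedding step, e.g. node2vec, is not shown):

```python
def add_virtual_node(adj, targets, name="__virtual__"):
    """Return a copy of the graph with a new virtual node connected
    (undirected) to every target node."""
    adj = {u: set(vs) for u, vs in adj.items()}
    adj[name] = set()
    for t in targets:
        adj[name].add(t)
        adj.setdefault(t, set()).add(name)
    return adj

graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
augmented = add_virtual_node(graph, ["A", "C"])
# After running e.g. DeepWalk/node2vec on `augmented`, the embedding of
# "__virtual__" serves as the embedding of the {A, C} sub-graph.
```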

Anonymous Walks

  • Concept: States in an anonymous walk are the indices of each node's first visit, rather than the node identities themselves.
  • Example:
    • Random Walk 1: A -> B -> C -> B -> C
    • Random Walk 2: C -> D -> B -> D -> B
    • Both represented as: 1 -> 2 -> 3 -> 2 -> 3
  • Characteristics: Agnostic to node identities, only considering the order of visits.
  • Walk Generation: Number of anonymous walks increases exponentially with length.
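The mapping from a concrete walk to its anonymous form can be sketched directly; both example walks above collapse to the same anonymous walk:

```python
def anonymize(walk):
    """Replace each node in a walk by the (1-based) index of the
    step at which it was first visited."""
    first_visit = {}
    out = []
    for node in walk:
        if node not in first_visit:
            first_visit[node] = len(first_visit) + 1
        out.append(first_visit[node])
    return out

anonymize(["A", "B", "C", "B", "C"])  # -> [1, 2, 3, 2, 3]
anonymize(["C", "D", "B", "D", "B"])  # -> [1, 2, 3, 2, 3]
```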

Embedding Graphs with Anonymous Walks

  • Method:
    • Simulate anonymous walks of length L.
    • Record their counts and represent the graph as a probability distribution over these walks.
  • Dimensionality:
    • With walks of length 3 there are 5 possible anonymous walks, so the graph is represented as a 5-dimensional vector.
    • Longer walks yield higher-dimensional representations.
  • Accuracy: Use a sampling bound with parameters epsilon (maximum estimation error) and delta (failure probability) to determine how many walks must be sampled so that the estimated distribution is within epsilon of the true one with probability at least 1 − delta.
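The counting procedure above can be sketched by sampling random walks and tallying their anonymous forms (the graph, walk length, and sample count here are illustrative, not from the lecture):

```python
import random
from collections import Counter

def random_walk(adj, start, length, rng):
    """Sample a uniform random walk visiting `length` nodes."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(sorted(adj[walk[-1]])))
    return walk

def anonymize(walk):
    """Map a walk to its anonymous walk (tuple of first-visit indices)."""
    seen = {}
    return tuple(seen.setdefault(n, len(seen) + 1) for n in walk)

def awe_distribution(adj, length, num_walks, seed=0):
    """Estimate the probability distribution over anonymous walks of a
    given length by sampling; the graph is represented by this vector."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    counts = Counter()
    for _ in range(num_walks):
        walk = random_walk(adj, rng.choice(nodes), length, rng)
        counts[anonymize(walk)] += 1
    return {w: c / num_walks for w, c in counts.items()}

# Triangle graph: only two anonymous walks of 3 nodes can occur.
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
dist = awe_distribution(graph, length=3, num_walks=1000)
```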

Learning Embeddings for Anonymous Walks

  • Extension: Learn embeddings for walks themselves (Z_i for walk W_i) in addition to graph embedding (Z_G).
  • Approach: Embed walks to predict adjacent walks, similar to DeepWalk/node2vec.
  • Objective Function: Predict adjacency of walks occurring in a window size Delta around sampled walks.
    • Optimization: Jointly learn the graph embedding and the walk embeddings that maximize this prediction objective.
  • Usage: Z_G can be used in downstream tasks (e.g., graph classification).
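Following the DeepWalk-style formulation sketched above, the prediction objective can be written as (a reconstruction of the standard form, with T sampled walks, window size Delta, and walk embeddings z_i alongside the graph embedding z_G):

```latex
\max_{\{z_i\},\, z_G} \;\; \frac{1}{T} \sum_{t=\Delta}^{T}
  \log P\!\left(w_t \mid w_{t-\Delta}, \ldots, w_{t-1}, z_G\right)
```

Here P is typically modeled as a softmax over the vocabulary of anonymous walks, conditioned on the window's walk embeddings together with z_G.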

Three Main Ideas Recap

  1. Sum/Average Node Embeddings: Use node embeddings summed/averaged for the entire graph, computed by DeepWalk or node2vec.
  2. Virtual Node: Create and embed a virtual node connected to sub-graph or entire graph.
  3. Anonymous Walks: Sample anonymous walks, represent the graph by walk occurrence fractions, and learn embeddings for walks and entire graph.

Future Approaches

  • Hierarchical Clustering: Explore hierarchical aggregation of networks for graph embeddings.
  • Graph Neural Networks: Consider GNNs in future lectures for embedding graphs.

Node Embeddings in Practice

  • Clustering: Use embeddings for clustering algorithms to detect communities or social roles.
  • Node Classification: Predict node labels based on embeddings.
  • Link Prediction:
    • Combine node embeddings (e.g., concatenate, sum) to predict links.
    • Adjust combination methods for directed or undirected graphs.
  • Graph Classification: Embed entire graphs through node aggregation or anonymous walks.
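The embedding-combination step for link prediction can be sketched with a few common operators (the operator names follow common practice, e.g. node2vec-style edge features; the vectors are illustrative):

```python
import numpy as np

def combine(z_u, z_v, op="hadamard"):
    """Combine two node embeddings into one edge feature vector.
    Symmetric operators suit undirected graphs; concatenation is
    order-sensitive and so can encode edge direction."""
    if op == "hadamard":
        return z_u * z_v                    # element-wise product (symmetric)
    if op == "average":
        return (z_u + z_v) / 2              # symmetric
    if op == "concat":
        return np.concatenate([z_u, z_v])   # order-sensitive (directed)
    raise ValueError(f"unknown operator: {op}")

z_u, z_v = np.array([1.0, 2.0]), np.array([3.0, 4.0])
edge_feature = combine(z_u, z_v, "hadamard")  # -> array([3., 8.])
```

The resulting edge feature is then fed to a classifier that predicts whether the link exists.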

Summary

  • Goal: Learn node/graph embeddings independently of tasks, without manual feature engineering.
  • Methods: Encoder-Decoder framework, random walks for similarity, DeepWalk and node2vec for node embeddings, graph embeddings through aggregation and anonymous walks.
  • Extensions: Enhancements and practical applications discussed.

Thank you for attending the lecture!