Papers I Read Notes and Summaries

Pre-training Graph Neural Networks with Kernels


  • The paper proposes a pretraining technique that can be used with the GNN architecture for learning graph representation as induced by powerful graph kernels.

  • Paper


  • Graph Kernel methods can learn powerful representations of the input graphs but the learned representation is implicit as the kernel function actually computes the dot product between the representations.

  • GNNs are flexible and powerful in terms of the representations they can learn but they can easily overfit if a large amount of training data is not available as is commonly the case of graphs.

  • Kernel methods can be used to learn an unsupervised graph representation that can be finetuned using the GNN architectures for the supervised tasks.


  • Given a dataset of graphs g1, g2, …, gn, use a relevant kernel function to compute k(gi, gj) for all pairs of graphs.

  • A siamese network is used to encode the pair of graphs into representations f(gi) and f(gj) such that dot(f(gi), f(gj)) equals k(gi, gj).

  • The function f is trained to learn the compressed representation of kernel’s feature space.



  • Biological node-labeled graphs representing chemical compounds - MUTAG, PTC, NCI1


  • Graphlet Kernel (GK)
  • Random Walk Kernel
  • Propogation Kernel
  • Weisfeiler-Lehman subtree kernel (WL)


  • Pretraining uses the WL kernel

  • Pretrained model performs better than the baselines for 2 datasets but lags behind WL method (which was used for pretraining) for the NCI1 dataset.


  • The idea is straightforward and intuitive. In general, this kind of pretraining should help the downstream model. It would be interesting to try it on more datasets/kernels/GNNs so that more conclusive results can be obtained.