Pretraining Graph Neural Networks with Kernels
02 Jan 2019

Introduction

The paper proposes a pretraining technique for GNN architectures that learns the graph representations induced by powerful graph kernels.
Idea

Graph kernel methods can learn powerful representations of the input graphs, but the learned representation is implicit, as the kernel function only computes the dot product between the representations.
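To make the "implicit representation" point concrete, here is a toy illustration (my own, not from the paper): a node-label histogram kernel whose explicit feature map phi counts label occurrences, so the kernel value is exactly the dot product of those counts. Real graph kernels such as WL or graphlet kernels follow the same pattern with much richer, often implicit, feature spaces.

```python
import numpy as np

NUM_LABELS = 3  # assumed label alphabet {0, 1, 2}

def phi(node_labels):
    """Explicit feature map: histogram of node labels."""
    counts = np.zeros(NUM_LABELS)
    for label in node_labels:
        counts[label] += 1
    return counts

def k(g1_labels, g2_labels):
    """Kernel value = dot product in the explicit feature space."""
    return float(phi(g1_labels) @ phi(g2_labels))

g1 = [0, 1, 1, 2]   # node labels of a toy graph
g2 = [1, 2, 2, 2]

print(k(g1, g2))  # phi(g1) = [1, 2, 1], phi(g2) = [0, 1, 3], so k = 5.0
```

In practice one usually has only k (the Gram matrix), not phi; the paper's idea is to recover an explicit, compressed stand-in for phi with a neural network.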

GNNs are flexible and powerful in terms of the representations they can learn, but they can easily overfit when a large amount of training data is not available, as is commonly the case for graph datasets.

Kernel methods can be used to learn an unsupervised graph representation that can then be finetuned with GNN architectures for supervised tasks.
Architecture

Given a dataset of graphs g_{1}, g_{2}, …, g_{n}, use a relevant kernel function to compute k(g_{i}, g_{j}) for all pairs of graphs.
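This first step can be sketched as precomputing the full Gram matrix K[i, j] = k(g_i, g_j), exploiting symmetry. The `kernel` function below is a hypothetical stand-in (a dot product over toy per-graph feature vectors) for a real graph kernel such as WL.

```python
import numpy as np

def kernel(g1, g2):
    # Stand-in for a real graph kernel: dot product of
    # toy per-graph feature vectors.
    return float(np.dot(g1, g2))

graphs = [np.array([1., 0., 2.]),
          np.array([0., 1., 1.]),
          np.array([2., 2., 0.])]

n = len(graphs)
K = np.zeros((n, n))
for i in range(n):
    for j in range(i, n):  # kernels are symmetric, so fill both halves
        K[i, j] = K[j, i] = kernel(graphs[i], graphs[j])

print(K)
```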

A siamese network is used to encode each pair of graphs into representations f(g_{i}) and f(g_{j}) such that dot(f(g_{i}), f(g_{j})) approximates k(g_{i}, g_{j}).

The function f is trained to learn a compressed representation of the kernel's feature space.
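The pretraining objective above can be sketched in a few lines. The assumptions here are mine: a linear encoder f(x) = x @ W stands in for the GNN, the per-graph input descriptors X are given, and the target kernel matrix K is synthesized to be positive semidefinite. Gradient descent drives f(g_i) . f(g_j) toward K[i, j] over all pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 8, 5, 4

X = rng.normal(size=(n, d_in))       # per-graph input descriptors
V = rng.normal(size=(n, d_out))      # hidden "true" embeddings
K = V @ V.T                          # synthetic PSD kernel matrix

W = rng.normal(size=(d_in, d_out)) * 0.1  # encoder weights, f(x) = x @ W
lr = 1e-3

def loss(W):
    Z = X @ W
    return float(np.sum((Z @ Z.T - K) ** 2))

initial = loss(W)
for _ in range(2000):
    Z = X @ W
    R = Z @ Z.T - K                  # residual on all graph pairs
    grad = 4 * X.T @ R @ Z           # gradient of sum(R**2) w.r.t. W
    W -= lr * grad
final = loss(W)
print(initial, final)  # the pairwise reconstruction loss should drop
```

After this unsupervised phase, the encoder weights would initialize the GNN for supervised finetuning; in the paper the encoder is a GNN rather than the linear map used in this sketch.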
Experiments
Datasets
 Biological node-labeled graphs representing chemical compounds: MUTAG, PTC, NCI1
Baselines
 DGCNN
 Graphlet Kernel (GK)
 Random Walk Kernel
 Propagation Kernel
 Weisfeiler-Lehman subtree kernel (WL)
Results

Pretraining uses the WL kernel.

The pretrained model performs better than the baselines on 2 of the datasets but lags behind the WL method (which was used for pretraining) on the NCI1 dataset.
Notes
 The idea is straightforward and intuitive. In general, this kind of pretraining should help the downstream model. It would be interesting to try it on more datasets/kernels/GNNs so that more conclusive results can be obtained.