Papers I Read Notes and Summaries

Revisiting Semi-Supervised Learning with Graph Embeddings


Problem Setting

  • Given a graph G = (V, E), feature vectors xL and xU for the labelled and unlabelled nodes, and labels yL for the labelled nodes, the problem is to learn a mapping (classifier) f: x -> y.

  • There are two settings possible:

    • Transductive - Predictions are made only for those nodes which are already observed in the graph at training time.

    • Inductive - Predictions are made for nodes whether they have been observed in the graph at training time or not.


  • The general semi-supervised learning loss is LS + λLU, where LS is the supervised learning loss, LU is the unsupervised learning loss and λ weights the unsupervised term.

  • The unsupervised loss is a variant of the Skip-gram loss with negative edge sampling.

  • More specifically, first a random walk sequence S is sampled. Then either a positive edge is sampled from S (within a given context distance) or a negative edge is sampled.

  • The label information is injected by also using the label as a context: the distance between the nodes of a positive pair (two nodes with the same label) is minimised and the distance between the nodes of a negative pair (two nodes with different labels) is maximised.
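
The sampling steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `sample_pair`, `pair_loss`, `p_graph`, `p_positive`, `walk_len` and `window` are hypothetical, the toy graph is made up, and the loss shown is the usual Skip-gram negative-sampling term, -log sigmoid(gamma * w_c . e_i). The sketch assumes a graph without isolated nodes and at least two labelled nodes per class.

```python
import math
import random

def random_walk(adj, start, length, rng):
    """Uniform random walk with `length` nodes (assumes no isolated nodes)."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

def sample_pair(adj, labels, rng, walk_len=5, window=2, p_graph=0.5, p_positive=0.5):
    """Return (node, context, gamma); gamma is +1 for a positive pair, -1 for a negative one.

    p_graph    (hypothetical name): probability of using the random-walk context
                                    rather than the label context.
    p_positive (hypothetical name): probability of drawing a positive pair.
    """
    gamma = 1 if rng.random() < p_positive else -1
    nodes = sorted(adj)
    if rng.random() < p_graph:
        # Graph context: pick a position on a random walk.
        walk = random_walk(adj, rng.choice(nodes), walk_len, rng)
        j = rng.randrange(len(walk))
        node = walk[j]
        if gamma == 1:
            # Positive pair: another position within `window` steps on the walk.
            lo, hi = max(0, j - window), min(len(walk) - 1, j + window)
            context = walk[rng.choice([k for k in range(lo, hi + 1) if k != j])]
        else:
            # Negative pair: a node drawn uniformly from the whole graph.
            context = rng.choice(nodes)
    else:
        # Label context: same label gives a positive pair, different label a negative one.
        node = rng.choice(sorted(labels))
        want_same = gamma == 1
        pool = [n for n in sorted(labels)
                if n != node and (labels[n] == labels[node]) == want_same]
        context = rng.choice(pool)
    return node, context, gamma

def pair_loss(embedding, context_vec, gamma):
    """Negative-sampling loss for one pair: -log sigmoid(gamma * <w_c, e_i>)."""
    score = gamma * sum(a * b for a, b in zip(embedding, context_vec))
    return math.log1p(math.exp(-score))

# Toy graph (made up for illustration): adjacency lists and a few labelled nodes.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
labels = {0: "a", 1: "a", 4: "b", 5: "b"}
node, context, gamma = sample_pair(adj, labels, random.Random(0))
```

Summing `pair_loss` over many sampled pairs would give an estimate of the unsupervised term LU under these assumptions.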

Transductive Formulation

  • Two separate fully connected networks are applied over the node features and node embeddings.

  • These 2 representations are then concatenated and fed to a softmax classifier to predict the class label.
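
A forward pass for this architecture might look like the sketch below. It is only an illustration under stated assumptions: one hidden layer per branch, ReLU activations, random weights, and the names `dense`, `init` and `transductive_predict` are all hypothetical choices rather than details taken from the paper. In the transductive setting the embedding `e` would be a free parameter learnt per node.

```python
import math
import random

def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in z]
    total = sum(exps)
    return [e / total for e in exps]

def init(rows, cols, rng):
    """Small random weight matrix (illustrative initialisation)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def transductive_predict(x, e, params):
    """Class probabilities from node features `x` and node embedding `e`."""
    h_x = relu(dense(x, *params["feat"]))  # branch over the node features
    h_e = relu(dense(e, *params["emb"]))   # branch over the node embedding
    return softmax(dense(h_x + h_e, *params["out"]))  # concatenate, then classify

rng = random.Random(0)
n_feat, n_emb, n_hidden, n_classes = 5, 4, 3, 2
params = {
    "feat": (init(n_hidden, n_feat, rng), [0.0] * n_hidden),
    "emb": (init(n_hidden, n_emb, rng), [0.0] * n_hidden),
    "out": (init(n_classes, 2 * n_hidden, rng), [0.0] * n_classes),
}
x = [1.0, 0.0, 0.0, 1.0, 0.0]  # made-up feature vector
e = [0.2, -0.1, 0.4, 0.0]      # made-up embedding for the same node
probs = transductive_predict(x, e, params)
```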

Inductive Formulation

  • In the inductive setting, it is difficult to obtain the node embeddings at test time. One naive approach is to retrain the network to obtain embeddings for the previously unobserved nodes, but that is inefficient.

  • Instead, the embedding of a node is parameterized as a function of its input feature vector x and is learnt by applying a fully connected neural network to x.

  • This provides a simple way to extend the original approach to the inductive setting.
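
The difference from the transductive case can be sketched as below: the embedding is no longer looked up per node but computed from the features, so a node that was never seen during training still gets one. As before, this is a hedged illustration; the layer sizes, ReLU activations, random weights and the names `embed` and `inductive_predict` are assumptions, not the paper's code.

```python
import math
import random

def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in z]
    total = sum(exps)
    return [e / total for e in exps]

def init(rows, cols, rng):
    """Small random weight matrix (illustrative initialisation)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def embed(x, params):
    """Embedding computed from the features alone, so it works for unseen nodes."""
    return relu(dense(x, *params["embed"]))

def inductive_predict(x, params):
    """Class probabilities using only the node features `x`."""
    e = embed(x, params)                   # embedding as a function of x
    h_x = relu(dense(x, *params["feat"]))  # separate branch over the features
    return softmax(dense(h_x + e, *params["out"]))  # concatenate, then classify

rng = random.Random(0)
n_feat, n_emb, n_hidden, n_classes = 5, 4, 3, 2
params = {
    "embed": (init(n_emb, n_feat, rng), [0.0] * n_emb),
    "feat": (init(n_hidden, n_feat, rng), [0.0] * n_hidden),
    "out": (init(n_classes, n_hidden + n_emb, rng), [0.0] * n_classes),
}
x_new = [0.0, 1.0, 1.0, 0.0, 0.0]  # made-up features of a node unseen in training
probs = inductive_predict(x_new, params)
```

In training, the output of `embed` would also be the vector fed to the unsupervised context loss, so the graph structure still shapes the embedding function.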


  • The proposed approach is evaluated in 3 settings (text classification, distantly supervised entity extraction and entity classification) and it consistently outperforms approaches that use just node features or node embeddings.

  • The key takeaway is that joint training in the semi-supervised setting has several benefits over the unsupervised setting, and that using the graph context (in terms of node embeddings) is much more effective than using a graph Laplacian-based regularization term.