Papers I Read Notes and Summaries

Representation Tradeoffs for Hyperbolic Embeddings


  • The paper describes a combinatorial approach to embed trees into hyperbolic spaces without performing optimization.

  • The resulting mechanism is analyzed to obtain dimensionality-precision tradeoffs.

  • To embed any metric spaces in the hyperbolic spaces, a hyperbolic generalization of the multidimensional scaling (h-MDS) is proposed.

  • Link to the paper


  • Hyperbolic Spaces

    • Have the “tree” like property ie the shortest path between a pair of points is almost the same as the path through the origin.

    • Generally, Poincare ball model is used given its advantages like conformity to the Euclidean spaces.

  • Fidelity Measures

    • Mean Average Precision - MAP

      • A local metric that ranks between distances of the immediate neighbors.
    • Distortion

      • A global metric that depends on the underlying distances and not just the local relationship between distances.

Combinatorial Construction for embedding hierarchies into Hyperbolic spaces

  • Embed the given graph G = (V, E) into a tree T.

  • Embed the tree T into the poincare ball Hd of dimensionality d.

Sarkar’s construction to embed points in a 2-d Poincare ball

  • Consider two points a and b (from the tree) where b is the parent of a.

  • Assume that a is embedded as f(a) and b is embedded as f(b) and the children of a needs to be embedded.

  • Reflect f(a) and f(b) across a geodesic such that f(a) is mapped to 0 (origin) while f(b) is mapped to some new point z.

  • Children of a are placed at points yi which are equally placed around a circle of radius (er - 1) / (er + 1) and maximally seperated from z, where r is the scaling factor.

  • Then all the points are reflected back across the geodesic so that all children are at a distance r from f(a).

  • To embed the tree itself, place the root node at the origin, place its children around it in a circle, then place their children and so on.

  • In this construct, precision scales logarithmically with the degree of the tree but linearly with the maximum path length.

d-dimensional hyperbolic spaces

  • In the d-dimensional space, the points are embedded into hyperspheres (instead of circles).

  • The number of children node that can be placed for a particular angle grows with the dimension.

  • Increasing dimension helps with bushy trees (with high node degree).

Hyperbolic multidimensional scaling (h-MDS)

  • Given the pairwise distance from a set of points in the hyperbolic space, how to recover the points?

  • The corresponding problem in the Euclidean space is solved using MDS.

  • A variant of MDS called as h-MDS is proposed.

  • MDS makes a centering assumption that points have 0 mean. In h-MDS, a new mean (called as the pseudo-Euclidean mean) is introduced to enable recovery via matrix factorization.

  • Instead of the Poincare model, the hyperboloid model is used (though the points can be mapped back and forth).

pseudo-Euclidean Mean

  • A set of points can always be centered without affecting their pairwise distance by simply finding their mean and sending it to 0 via isometry

Recovery via matrix factorization

  • Given the pairwise distances, a new matrix Y is constructed by applying cosh on the pairwise distances.

  • Running PCA on -Y recovers X up to rotation.

Dimensionality Reduction with PGA (Principal Geodesic Analysis)

  • PGA is the counterpart of PCA in the hyperbolic spaces.

  • First the Karcher mean of the given points is computed.

  • All points xi are reflected so that their mean is 0 in the Poincare disk model.

  • Combining that with Euclidean reflection formula and hyperbolic metrics leads to a non-convex loss function which can be optimized using gradient descent algorithm.


  • Datasets

    • Trees: fully balanced and phylogenic trees expressing genetic heritage.
    • Tree-like hierarchy: WordNet hypernym and graph of Ph.D. advisor-advisee relationships.
    • No-tree like disease relationships, proteins interactions etc
  • Results

    • Combinatorial construction outperforms approaches based on optimization in terms of both MAP and distortion.
    • eg on WordNet, the combinatorial approach achieves a MAP of 0.989 with just 2 dimensions while the previous best was 0.87 with 200 dimensions.