HyperE: Hyperbolic Embeddings for Entities

Beliz Gunel, Fred Sala, Albert Gu, Christopher Ré

Hyperbolic embeddings have captured the attention of the machine learning community through proposals from Nickel&Kiela (2017) and Chamberlain et al. (2017). The motivation is to embed structured, discrete objects such as knowledge graphs into a continuous representation that can be used with modern machine learning methods. Hyperbolic embeddings can preserve graph distances and complex relationships in very few dimensions, particularly for hierarchical graphs. In this website, we release hyperbolic embeddings that can be further integrated into applications related to knowledge base completion or can be supplied as features into various NLP tasks such as Question Answering. We provide a brief introduction both to hyperbolic geometry and to our work in this blogpost. In our paper, we present new approaches for solving the underlying optimization problems and show trade-offs for each of these approaches.

We provide the hyperbolic embeddings for chosen hierarchies from WordNet, Wikidata and MusicBrainz and evaluate the intrinsic quality of the embeddings through hyperbolic-analogues of "similarity" and "analogy". We use our combinatorial construction algorithm and our optimization-based approach implemented in PyTorch for all of the embeddings. Preliminary code for the embedding algorithms is publicly available here. Our selected 10-dimensional entity embeddings are available to download in GloVe format below.




Download

You can download our 10-dimensional entity embeddings with 64-bit precision for selected relationships from WordNet, Wikidata and MusicBrainz in the following in GloVe format. We also include the scale factor for the embeddings in the top line, since hyperbolic embeddings are not scale invariant, unlike Euclidean embeddings. Input graph distances between entities can be approximately recovered through dividing the hyperbolic distance between their corresponding hyperbolic embeddings by the provided scale factor. You can also find the preprocessing scripts to create edge lists for input graphs here. You can also get started with our demo notebook.

WordNet: hyperE.wn.10d.zip

Wikidata: hyperE.Wikidata.10d.zip

MusicBrainz: hyperE.MusicBrainz.10d.zip

Intrinsic Results

Analogy

State-of-the-art word embeddings used in many natural language processing applications successfully encode the semantic relationships between words in the Euclidean space. One of the main intrinsic evaluation tasks for word embeddings is "word analogy", which consists of questions like "Athens is to Greece as Berlin is to what?" GloVe model answers the question "a is to b as c is to what?" by finding the word d whose word embedding ed is closest to the vector eb-ea+ec according to cosine similarity.

We define the analogous linear operations in hyperbolic space and use the hyperbolic distance as the similarity metric. In other words, we output the vector ed that has the smallest hyperbolic distance to the hyperbolic-analogue of eb-ea+ec. We extend this analogy experiment to the entity hierarchies in Wikidata and include some samples for different types of relationships in the following. For reference, these entity embeddings were produced using our combinatorial approach with 10 dimensions and 64 bits of precision.


Knowledge Graph Relationship ea eb ec Top Result
WordNet Hypernym guitarist musician novelist writer
WordNet Domain Topic fluorocarbon chemistry angular velocity physics
Wikidata Student of Eubilides Euclides Dicaearchus Aristotle
Wikidata Part of Malaysia Asia Amazon Rainforest South America


Detecting Relations

In our music hierarchy embedding, songs-album edges have weight 2, and album-artist edges have weight 1. We can use these weights and the embedded hierarchy to detect relations by simply computing hyperbolic distances between embedded points. For example, the first two songs below are from the Beatles' "Sgt. Pepper's Lonely Hearts Club Band" album, and indeed the embedding expresses this relation. The next two songs are by the Beatles, but from different albums. Lastly, songs by the Rolling Stones and AC/DC are shown as unrelated (note the much larger distance).


Song 1 Song 2 Distance Relation
With a Little Help From My Friends Lucy in the Sky With Diamonds 3.761 Same album and artist
A Hard Day's Night While My Guitar Gently Weeps 4.823 Same artist, different album
(I Can't Get No) Satisfaction Back in Black 23.430 Unrelated


Preserving Relationships

Hyperbolic space enables us to compress the hierarchical relationship information in a knowledge graph to very few dimensions compared to what is possible in Euclidean space. We report precision and recall results for preserving relationships between entity pairs for several input relationship graphs in order to show how well we can preserve graph distances when we embed a multi-layer hierarchy in hyperbolic space. In this task, "correct" means the following: If an entity pair in the input graph is one edge away, hyperbolic distance between final entity embeddings stays within 1(± 0.01), when divided by the scale factor calculated based on the input graph. For reference, each of these embeddings were produced using our combinatorial approach with 64 bits of precision and 10 dimensions, and the largest connected component for each relationship in WordNet was taken.


Knowledge Graph Relationship Precision Recall
WordNet Hypernym 98.11% 98.04%
WordNet Domain Topic 97.92% 98.42%
WordNet Member Holonym 99.43% 99.75%

Visualizations

In Euclidean space, circle circumference and disc area grow linearly and quadratically with radius, respectively. However, in hyperbolic space, they both grow exponentially with respect to radius, which allows particularly efficient embeddings for hierarchical structures like trees. MAP refers to mean average precision, which is a metric that captures how well each vertex's neighborhoods are preserved.

We include some visualizations for our different embedding approaches in the following. Left hand-side is embedding a simple tree using our stochastic gradient descent-based optimization approach, which minimizes a loss function based on hyperbolic distance. For the same input graph, the right-hand side is using our combinatorial construction, a deterministic algorithm that places tree vertices exactly.

The combinatorial construction first embeds the input graph into a tree, and then it embeds this tree into the Poincaré ball. The second step of the construction builds on Sarkar's approach for two-dimensional hyperbolic space. Combinatorial construction can embed tree-like input graphs in hyperbolic space with very low distortion in linear time, while the optimization-based approach can better handle incomplete information and it is more robust to different kinds of input graphs. This shows that input graphs that aren't necessarily tree-like can sometimes be embedded in hyperbolic space with good MAP and distortion. You can also warm-start our optimization-based approach implemented in PyTorch with the embedding outputted from our combinatorial construction. We include detailed analysis of these algorithms in our paper.
alternate textOptimization-based construction
alternate textCombinatorial construction
Related Work

Our study of hyperbolic embeddings was motivated by recent works showing that hyperbolic representations are suitable for many hierarchies, with applications including the question answering (QA) system HyperQA in Tay et al. (2018), vertex classifiers in Chamberlain et al. (2017), and link prediction in Nickel&Kiela (2017).

Embedding entailment relations, i.e. directed acyclic graphs with hyperbolic cones as a heuristic was proposed in Ganea et al. (2018). Hyperbolic word and sentence embeddings are studied by Dhingra et al. (2018). Many of these works use the Poincaré model of hyperbolic geometry; our work and Nickel&Kiela (2018) suggest that other models, such as the hyperboloid model of hyperbolic space, may be simpler for certain tasks. Of course, it is always possible to translate between models.

A pair of recent approaches seek to add hyperbolic operations to neural networks. Gulcehre et al. (2018) introduces a hyperbolic version of the attention mechanism using the hyperboloid model. In Ganea et al. (2018a), building blocks from certain networks are generalized to operate with Riemannian manifolds.

Hyperbolic operations can also be applied to Support Vector Machines, as shown by Cho et al. (2018). Finally, descent methods suitable for hyperbolic optimization have gained renewed research interest; such works include Enokida et al. (2018) for geodesic updates, Zhang and Sra (2018) for accelerated Riemannian gradient methods, and Wilson&Leimeister (2018) for performing gradient descent in hyperbolic space using the hyperboloid model.

Email us!

This project is maintained by HazyResearch, as part of the Stanford DAWN project. Send us an email here for new potential applications or datasets you would like to see using HyperE until we release our engine.

Website template credits to Jon Barron